Generative Grammar

This volume brings together for the first time papers by the distinguished linguist Robert Freidin, Professor of Linguistics at Princeton University. Robert Freidin's research is focused on generative grammar, which provides a formal theory of linguistic structure that underlies linguistic performance. This collection of papers deals with topics central to the study of generative grammar, including theories of movement, case and binding, as well as their intersections and empirical motivation. Also included are papers covering the broader history of generative grammar, which seek to understand the evolution of linguistic theory by careful investigation of how and why it has changed over the past sixty years. The history of the theory provides a context for a fuller understanding of current proposals, while current theoretical discussions contribute to the ongoing history and often provide important clarifications of earlier work.

Generative Grammar is an essential resource for those seeking to understand both the history of generative grammar and current developments in the field.

Robert Freidin is Professor of the Council of the Humanities at Princeton University.
Routledge Leading Linguists
Series editor: Carlos P. Otero, University of California, Los Angeles, USA

1 Essays on Syntax and Semantics
James Higginbotham

2 Partitions and Atoms of Clause Structure: Subjects, agreement, case and clitics
Dominique Sportiche

3 The Syntax of Specifiers and Heads: Collected essays of Hilda J. Koopman
Hilda J. Koopman

4 Configurations of Sentential Complementation: Perspectives from Romance languages
Johan Rooryck

5 Essays in Syntactic Theory
Samuel David Epstein

6 On Syntax and Semantics
Richard K. Larson

7 Comparative Syntax and Language Acquisition
Luigi Rizzi

8 Minimalist Investigations in Linguistic Theory
Howard Lasnik

9 Derivations: Exploring the dynamics of syntax
Juan Uriagereka

10 Towards an Elegant Syntax
Michael Brody

11 Logical Form and Linguistic Theory
Robert May

12 Generative Grammar: Theory and its history
Robert Freidin

13 Theoretical Comparative Syntax: Studies in macroparameters
Naoki Fukui
Generative Grammar
Theory and its history
Robert Freidin
LONDON AND NEW YORK
First published 2007 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Simultaneously published in the USA and Canada by Routledge
270 Madison Ave, New York, NY 10016

Routledge is an imprint of the Taylor & Francis Group, an informa business

This edition published in the Taylor & Francis e-Library, 2007.

"To purchase your own copy of this or any of Taylor & Francis or Routledge's collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk."

© 2007 Robert Freidin

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
A catalog record for this book has been requested

ISBN 0-203-64356-9 Master e-book ISBN
ISBN10: 0-415-33181-1 (hbk)
ISBN10: 0-203-64356-9 (ebk)
ISBN13: 978-0-415-33181-4 (hbk)
ISBN13: 978-0-203-64356-3 (ebk)
Contents

Acknowledgements

1 Introduction

PART I Theory

§A: Movement
2 Cyclicity and the theory of grammar
3 Superiority, Subjacency and Economy
4 Cyclicity and minimalism

§B: Case
5 Core grammar, Case theory, and markedness (with H. Lasnik)
6 Lexical Case phenomena (with Rex Sprouse)
7 The subject of defective T(ense) in Slavic (with James E. Lavine)

§C: Binding
8 Disjoint reference and wh-trace (with H. Lasnik)
9 On the fine structure of the binding theory: Principle A and reciprocals (with Wayne Harbert)
10 Fundamental issues in the theory of binding
11 Binding theory on minimalist assumptions

PART II History
12 The analysis of Passives
13 Review of Ideology and Linguistic Theory: Noam Chomsky and the deep structure debates by Geoffrey J. Huck and John A. Goldsmith
14 Linguistic theory and language acquisition: a note on structure-dependence
15 Conceptual shifts in the science of grammar: 1951–92
16 Review of The Minimalist Program by Noam Chomsky
17 Exquisite connections: some remarks on the evolution of linguistic theory (with Jean-Roger Vergnaud)
18 Syntactic Structures Redux

References
Name index
Subject index
Acknowledgements

The authors and publishers would like to thank the following for granting permission to reproduce material in this work:

Cambridge University Press for permission to reprint R. Freidin, "Linguistic Theory and Language Acquisition: A note on structure-dependence," Behavioral and Brain Sciences 14:2 (1991), pp. 618–619.

M.I.T. Press for permission to reprint R. Freidin, "Cyclicity and the Theory of Grammar," Linguistic Inquiry 9:4 (1978), pp. 519–549; R. Freidin and H. Lasnik, "Disjoint Reference and Wh-trace," Linguistic Inquiry 12:1 (1981), pp. 39–53; R. Freidin and R. Sprouse, "Lexical case phenomena," in R. Freidin (ed.), Principles and Parameters in Comparative Grammar (1991), pp. 392–416; and R. Freidin, "Cyclicity and minimalism," in S. Epstein and N. Hornstein (eds.), Working Minimalism (1999), pp. 95–126.

Indiana University for permission to reprint R. Freidin's review of G. Huck and J. Goldsmith, Ideology and Linguistic Theory: Noam Chomsky and the deep structure debates (Routledge, 1995).

Blackwell Publishing for permission to reprint R. Freidin, "Syntactic Structures Redux," Syntax 7 (2004), pp. 100–126.

Elsevier for permission to reproduce R. Freidin and J.-R. Vergnaud, "Exquisite Connections: some remarks on the evolution of linguistic theory," Lingua 111 (2001), pp. 639–666.

Georgetown University Press for permission to reprint R. Freidin, "Superiority, Subjacency and Economy," in H. Campos and P. Kempchinsky (eds.), Evolution and Revolution in Linguistic Theory: Essays in honour of Carlos Otero (Georgetown University Press, 1995), pp. 138–167.

With kind permission of Springer Science and Business Media to reprint R. Freidin, "Fundamental Issues in the Theory of Binding," in B. Lust (ed.), Studies in the Acquisition of Anaphora: Defining the constraints (D. Reidel Publishing Co., Dordrecht, 1986), pp. 151–188. ISBN 90-277-2121-1.

Every effort has been made to contact copyright holders for their permission to reprint material in this book. The publishers would be grateful to hear from any copyright holder who is not here acknowledged and will undertake to rectify any errors or omissions in future editions of this book.
1 Introduction

The biolinguistic approach to the study of human language is a fairly recent development in the history of linguistics, a perspective that developed within modern generative grammar (from LSLT to the present). This approach seeks to understand what presumably unique biological properties account for human language, more specifically its structure, use and biological origin. It has been pursued by postulating explicit computational models of what a speaker of a human language must know to be able to use the language, the knowledge of linguistic structure that underlies linguistic performance. Such models have been the subject of study in a subfield of linguistics called generative grammar.

The papers in this volume deal with central topics within the study of generative grammar—primarily, the theories of movement, case and binding, as well as their intersections and empirical motivation. They also cover the broader history of the field, which is rich and intricate. This history provides a context for a fuller understanding of current proposals, which after all also form an integral part of this history. Thus the separation of these essays into two parts, theory and history, is somewhat artificial. Current theoretical discussions simply contribute to the ongoing history and often provide important clarifications of previous work. Historical discussions usually clarify the past and often create a context in which to understand what progress, if any, has been or is being made. Furthermore, ideas that have been abandoned along the way can be resurrected and refurbished in the current context—the revival of generalized transformations being a spectacular example.

A Theory

The syntactic cycle has played a central role in the theory of movement since its inception in Chomsky 1965. "Cyclicity and the theory of grammar" (1978) (chapter 2) resulted from research on how the cycle (in particular the Strict Cycle Condition of Chomsky 1973 (henceforth SCC)) operated under the trace theory of movement rules. The empirical motivation for the SCC discussed in Chomsky 1973 cited one example of what is commonly referred to as a wh-island violation. One derivation of this example violated no known movement constraints and yet yielded deviant output. Thus the SCC appeared to be necessary to rule out this derivation. Under trace theory, however, this derivation yielded the same output as a derivation that violated some other constraints on movement (e.g., the Subjacency Condition). While the SCC could only be interpreted as a condition on derivations, Subjacency could also be interpreted as a condition on representations—more precisely, a locality condition on trace binding. Under this interpretation, the SCC becomes superfluous. "Cyclicity and the Theory of Grammar"
generalized this result to other cases, not considered in Chomsky 1973, whose derivations also violated the SCC. It demonstrated how most wh-movement cases could be handled by Subjacency,1 whereas NP-movement cases involved other conditions.2 By deriving the empirical results of the principle of the syntactic cycle from other independently motivated general principles of grammar, this paper demonstrated how the cyclicity of the computational system was in fact built into the general architecture of UG. This result also raised the issue of the derivational vs. representational interpretation of general principles and provided an argument for the latter interpretation, given that some violations of the SCC also violated Subjacency interpreted as a condition on representations.

Like the previous chapter, chapter 3, "Superiority, subjacency, and economy" (1995), concerns the potential for overlap among conditions—the two mentioned in the title and Chomsky's Shortest Movement economy constraint—as applied to the derivation of certain wh-island violations. For example, constructions like (1) might be derived in two different ways, one of which violates both the Subjacency and Superiority conditions while the other only violates the latter, depending on whether who moves to Spec-CP in the complement clause.

(1) *What did you forget who had borrowed?
This paper attempts to refine the analysis of wh-island violations by using wh-phrases of the form which-N instead of bare interrogative pronouns. The Superiority Condition does not apply to movement involving which-N phrases in single clauses, as illustrated by the well-formedness of examples like (2).

(2) Which books did which students borrow?
This result extends to such wh-phrases in complex sentences. Thus (3), in contrast to (1), appears to be relatively well-formed.

(3) Which books did you forget which students had borrowed?
To account for the contrast between (1) and (3) within a minimalist feature checking analysis circa 1994, this paper adopts a rather radical stance, including the Form Chain analysis of Chomsky 1993, countercyclic movement (which reinforces the conclusion of the previous chapter that the SCC cannot be a primitive, but rather its appropriate empirical effects are derived from other principles of grammar), and the rejection of the Shortest Move condition, which ought to block (3) and possibly (2) as well.3

Between the publication of chapter 2 in 1978 and the publication of "Cyclicity and minimalism" (chapter 4) in 1999, the theory of phrase structure underwent a radical revision. Starting in 1979 it became clear that PS rules redundantly stipulated properties that followed from the interaction of general principles of grammar (e.g., the Case Filter and the θ-Criterion) and idiosyncratic properties of individual lexical items (Chomsky
class lectures 1979; see also Stowell 1981). Thus phrase structure rules were abandoned as the mechanism for building phrase structure. However, it took over a decade before the notion of generalized transformation was revived as the mechanism for constructing phrase structure from lexical items (Chomsky 1993, 1995b).4 With PS rules, derivations of phrase structure were purely top-down, whereas with Merge, these derivations are exclusively bottom-up. This fundamental difference prompted a reevaluation of how cyclic derivation worked, as well as of the empirical motivation for a cyclic principle and the possibility of deriving its empirical effects from other independent principles of grammar, the major topics of chapter 4.

It is interesting to note that a cyclic principle (e.g., the SCC) ensures that the phrase marker of a sentence will be processed strictly bottom-up (i.e., from smaller to larger domains) even (or especially) when phrase structure is constructed top-down. Under minimalist analysis the strictly bottom-up creation of phrase structure is determined by an extension condition (Chomsky 1993), which requires that each step of a derivation extend the right or left edge of the phrase marker affected.5 This applies to movement operations as well, and thereby serves as a replacement for the SCC as a cyclic principle, without mentioning the notion of cyclic domain. In effect, Chomsky's Extension Condition (1993, p. 22) provides another way to derive the empirical effects of a cyclic principle.

Under minimalism the details of derivations involving multiple movements (e.g., the derivation of the wh-island violations covered in the previous two chapters) are both more complicated and less well determined by the theory of grammar because of the wide choice of analytic options. Is movement the result of Move or Attract? Do some features move independently of the rest of the constituent they occur in, which then undergoes displacement by some form of generalized pied piping (Chomsky 1995d), or do whole constituents move to check features? Is there a strong/weak feature distinction in addition to the interpretable/uninterpretable distinction? And if so, does an unchecked strong feature cancel a derivation—an analysis that leads to a further, perhaps questionable, distinction between deletion and erasure?

Chapter 4 attempts to sort through these options for constructions whose derivations constitute SCC violations—including super-raising, the Head Movement Constraint (HMC, see Travis 1984) and Constraint on Extraction Domains (CED, see Huang 1982) violations, as well as wh-islands. It shows how, under current minimalist analysis, the empirical evidence for cyclic derivation follows from other independently motivated grammatical principles, and thus eliminates the need to stipulate an independent cyclic principle. For example, in the case of wh-island violations, if [+wh] is a strong feature of C that motivates movement to Spec-CP, then if the feature is not checked immediately after it enters the derivation, the derivation cancels. If the feature is checked immediately after it enters the derivation, then it is no longer active and hence cannot motivate a further counter-cyclic movement of a wh-phrase. In this case the empirical effects of cyclicity follow from the principles of feature checking and so there is no need to invoke an independent cyclic principle.6

Chapter 4 also discusses three additional proposals for deriving cyclicity from other principles of grammar.
Kitahara 1995 proposes that the economy condition requiring shortest derivations always blocks a countercyclic movement because it involves an extra derivational step. Collins 1997 suggests that countercyclic movements result in configurations that violate the Linear Correspondence Axiom of Kayne 1994. Chapter 4 identifies potential flaws in these proposals and proposes instead that countercyclic Merge might be ruled out because the elementary operation that performs merger is incapable by its nature of applying countercyclically. Thus no condition on derivations or on representations is needed to block countercyclic operations. The formulation of the elementary operation suffices, an optimal solution on minimalist assumptions.
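To make the bottom-up, root-extending character of Merge concrete, here is a minimal sketch, not from any of the chapters; the class and function names are my own, and labels, features, and movement are deliberately omitted.

```python
# A minimal sketch (illustrative, not from the text) of bare binary Merge
# as bottom-up, root-extending structure building, in the spirit of
# Chomsky (1995b). All names here are my own choices.
from dataclasses import dataclass
from typing import Union

SyntacticObject = Union[str, "Phrase"]

@dataclass(frozen=True)
class Phrase:
    left: SyntacticObject
    right: SyntacticObject

    def __str__(self) -> str:
        return f"[{self.left} {self.right}]"

def merge(a: SyntacticObject, b: SyntacticObject) -> Phrase:
    """Combine two syntactic objects into a new root.

    Because merge only ever creates a new root from its two inputs, each
    application extends the edge of the phrase marker; the operation has
    no way to target a position properly contained in an object already
    built. On this formulation, countercyclic merger is excluded without
    any added condition on derivations or representations.
    """
    return Phrase(a, b)

# Bottom-up derivation: smaller domains are assembled first, the root last.
embedded = merge("who", merge("saw", "what"))
root = merge("John", merge("knows", embedded))
print(root)  # [John [knows [who [saw what]]]]
```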
Like the analysis of cyclicity, the analysis of grammatical Case has played a fundamental role in the theory of syntactic movement. The theoretical importance of Case for modern generative grammar was first spelled out in an unpublished letter by Jean-Roger Vergnaud to Chomsky and Lasnik about their paper "Filters and Control" (1977).7 Chomsky 1980 adapts Vergnaud's Case theory in a formulation of a Case filter, which limits the distribution of NPs with phonetic content (as opposed to phonetically empty NPs—e.g., trace and PRO) to Casemarked positions. The Case filter analysis provides a more general and more principled account of the distribution of phonetically realized NPs, as well as a principled motivation for the movement of nominal expressions in general. Thus, NPs with phonetic content that enter a derivation in a Caseless position must move to a Casemarked position to yield a legitimate syntactic construction.

Chapter 5 ("Core Grammar, Case Theory, and Markedness" (1981)), an early study of the new Case theory, works out some ramifications of the Case filter analysis. In particular, it is concerned with the interaction (and hence the ordering) of Case assignment and Deletion, and also of the Case Filter and Deletion. The paper identifies empirical evidence that determines how these mechanisms must be ordered. It also demonstrates how Case theory distinguishes between NP-trace and wh-trace, where only the latter is subject to the Case Filter. This distinction is further supported by Binding Theory, as discussed in chapter 8. This result raised a question about the nature of the Case Filter, which had been assumed to apply only to phonetically realized NPs. The inclusion of wh-trace, which is obviously not phonetically realized, suggests that the Case filter analysis may not be properly formulated. This concern led to a "visibility" approach to Case (Chomsky 1981, 1986) that integrates Case theory and θ-theory. This approach attempts to explain the evidence from wh-movement as a violation of the θ-Criterion rather than the Case filter.8

Initially Case theory was formulated primarily on the basis of English, a language without a rich morphological Case system. Expanding Case analysis to languages that have rich morphological Case systems (e.g., Russian and Icelandic) revealed a further general principle as well as some refinements of Case theory. Such languages usually manifest two distinct types of morphological Case: configurational and lexical (a.k.a. quirky Case). Configurational Case is assigned purely in terms of syntactic position, whereas lexical Case is assigned via selection by a specific lexical head (where different heads of the same category may select different lexical Cases). In constructions where configurational and lexical Case could be in conflict (e.g., the object of a verb that assigns lexical Case), the lexical Case assignment must be satisfied and therefore the configurational Case is morphologically suppressed. This follows from the principle of Lexical Satisfaction of Freidin & Babby 1984, the ramifications of which are investigated in chapter 6, "Lexical case phenomena" (1991). Furthermore, lexical Case phenomena establish a distinction between Case assignment and Case licensing. In a clause whose main verb selects a lexically Casemarked subject, that lexically Casemarked subject must
occur in a position that is configurationally licensed for Case. It is necessary but not sufficient that the subject bear the appropriate lexical Case. Thus a phonetically realized NP must be Case licensed as well as Casemarked. For configurational Case, licensing and Casemarking appear to be indistinguishable, but for lexical Case these are distinct processes.

Under minimalism, the Case Filter has been replaced by the Principle of Full Interpretation (FI), which has subsumed its empirical effects.9 This follows given that all phonetically realized NPs enter a derivation with unvalued Case features and that, because these features are uninterpretable at PF and LF (with or without values), they must be eliminated via checking during the course of the derivation. If not, these unchecked features violate FI at PF and LF. It is further assumed that the valuation and checking of Case features is a reflex of the checking of agreement features (henceforth φ-features). Nominative Case is valued and checked via the φ-features of T, and accusative Case via the φ-features of v.

The role of Case in the theory of movement has also changed significantly under minimalism. In the initial discussions of Case theory it was assumed that NP-movement (e.g., passive and raising) was driven by the need for an NP with phonetic content to be Case-marked. Under minimalism, Case-marking (i.e., valuation of Case features) is a secondary effect, the result of φ-feature agreement. Whether movement is driven by the need to check uninterpretable φ-features depends on whether agreement only obtains under a local Spec-head relation or can occur long distance (as in Chomsky 2000b). If the latter, then movement is motivated by neither agreement nor Case considerations. Instead, movement must be driven by some other general requirement—presumably the Extended Projection Principle (EPP) of Chomsky 1982 (see also Chomsky 1981, p. 27).

In languages like English the interactions of Case, agreement, and EPP considerations tend to overlap and hence it is difficult to tease apart the unique effects of each. However, in languages with rich morphological Case systems (e.g., Russian and Ukrainian) we find phenomena (dubbed accusative unaccusative constructions) whose analysis yields a separation of the former two factors from the latter. This constitutes the focus of chapter 7, "The subject of defective T(ense) in Slavic" (2002). Russian, for example, has constructions in which an accusative NP occurs in subject position but does not agree with the finite verb. Instead, the verb manifests a default agreement, indicating a lack of agreement between the syntactic subject and the verb. The accusative Case-marking is configurational, therefore established by agreement with v rather than T. The displacement of the accusative NP to Spec-TP cannot be motivated by a Case or agreement relation with T. Therefore the movement appears to be purely the result of the EPP.

This result provides some independent empirical motivation for the EPP. It argues against recent attempts to reduce EPP effects to properties of Case and agreement systems. Furthermore, it suggests that the EPP does not fall under checking theory, where in recent formulations (e.g., Chomsky 2000b) probe/goal relations are restricted to active elements (i.e., two elements that each contain at least one unvalued feature). The analysis of these accusative unaccusative constructions further contradicts the claim that once the Case features of an NP have been valued, that NP is frozen in place.
If this analysis is correct, then the freezing effect for some Case-marked NPs must be derivable from some other
principle. Chapter 7 explores a prohibition against multiple agreement (as proposed in chapter 17) as an alternative.

In the late 1970s, when the modular approach of the Principles and Parameters framework was just coming into focus, various attempts were made to connect different modules by utilizing the concepts in one to formulate principles for another. The formulation of the Nominative Island Condition (NIC) of binding theory (Chomsky 1980) is one obvious example. Furthermore, the major thrust of theorizing, as usual, was for the most general application of these principles across the widest range of phenomena. In particular, researchers attempted to explain the distribution of various empty categories (e.g., trace vs. PRO) in terms of the binding principles that were independently motivated for an account of lexical expressions (anaphor vs. pronoun). The accuracy of the empirical analyses as well as the viability of the particular formulations of principles remained to be determined. The following four chapters provide an explication and partial critique of the binding theory as it has developed over the past two and a half decades.

The research reported in chapter 8, "Disjoint reference and wh-trace" (1981), began with two observations concerning May's demonstration (1979, 1981) that certain violations of the COMP-to-COMP condition on wh-movement (Chomsky 1973) yielded a trace with a contradictory index. Details aside, May's analysis was based on the argument in Chomsky 1976 that a wh-trace functions as a variable, on a par with a name. The empirical evidence for this argument involved the analysis of strong crossover constructions—e.g., (4) as compared to (5) (examples from Chomsky 1976).

(4) Who did he say Mary kissed?
(5) Who said Mary kissed him?

While (5) allows for two distinct interpretations—a question about three people or about two—(4) can only be interpreted as a question about three people. Chomsky's analysis correlated the possible interpretations of (4–5) with corresponding interpretations for (6–7).

(6) He said Mary kissed John.
(7) John said Mary kissed him.
(6) is a statement about three people, but (7) could also be a statement about just two (i.e., where him is anaphoric on John). The fact that a pronoun must be disjoint in reference from a name (or variable) that it c-commands accounts for the limitation on the interpretations of (4) and (6). Building on Chomsky’s analysis, May established an analytic connection between strong crossover phenomena and the COMP-to-COMP condition on wh-movement. However, like the discussion in Chomsky 1976, May’s analysis was limited to wh-movement out of object position. In discussing May’s result, Lasnik and I realized immediately that if it was viable then it would apply as well to violations of the COMP-to-COMP condition where the wh-phrase moves out of a complement finite clause subject position. And given the connection between COMP-to-
COMP condition violations and strong crossover, there must be a strong crossover case involving movement of a wh-phrase from the subject of a finite clause—e.g., (8) (cf. (9)), with the corresponding interpretative possibilities.

(8) Who did he say likes Mary?
(9) Who said he likes Mary?
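The disjoint reference facts in (4)–(9) turn on c-command. Purely as an illustration, and not part of the chapter itself, the computation can be sketched as follows; the encoding of the trees for (8) and (9) as nested tuples, and all names, are my own simplifications.

```python
# An illustrative sketch (not from the chapter) of the c-command test
# behind strong crossover. Trees are nested binary tuples; a leaf
# c-commands exactly the material dominated by its sister.

def leaves(tree):
    """Yield the terminal strings of a nested-tuple tree."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree

def path_to(tree, leaf):
    """Return the subtrees from the root down to `leaf`, or None."""
    if tree == leaf:
        return [tree]
    if isinstance(tree, tuple):
        for child in tree:
            below = path_to(child, leaf)
            if below is not None:
                return [tree] + below
    return None

def c_commands(tree, a, b):
    """True if leaf `a` c-commands leaf `b` (leaves assumed unique)."""
    parent = path_to(tree, a)[-2]      # first branching node above `a`
    sisters = [t for t in parent if t != a]
    return any(b in leaves(s) for s in sisters)

# (8) Who did he say [t_who likes Mary]? -- `he` c-commands the trace,
# so disjoint reference is forced (only the three-person reading):
q8 = ("who", ("did", ("he", ("say", ("t_who", ("likes", "Mary"))))))
print(c_commands(q8, "he", "t_who"))   # True

# (9) Who [t_who said he likes Mary]? -- the pronoun does not c-command
# the trace, so the anaphoric (two-person) reading remains available:
q9 = ("who", ("t_who", ("said", ("he", ("likes", "Mary")))))
print(c_commands(q9, "he", "t_who"))   # False
```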
Chapter 8 spells out the ramifications of these observations for the theory of grammar, leading to a simplification of the current theories of binding and indexing. One important ramification concerned the NIC (as above), which Chomsky and others wanted to use to derive the empirical effects of the that-trace filter (Chomsky and Lasnik 1977). This required that a wh-trace in the subject position of a finite clause be analyzed as an anaphor and therefore in violation of the NIC. Chapter 8 demonstrates definitively that this analysis is not viable, thereby undercutting some of the motivation for the NIC as the correct formulation of the binding principle that covered anaphors in the subject position of a finite clause.10 (See also chapter 5 for further discussion against the NIC.)

The NIC did, however, have an empirical advantage over its predecessors (e.g., the TSC)—namely, it distinguished (10) from (11), which were both excluded under the earlier proposals.

(10) *John thinks that himself is clever.
(11) John thinks that pictures of himself are always unflattering.
Formulations of binding theory after the NIC handled the difference between (10) and (11) in terms of a notion of accessible "subject", attempting to unify the prohibition against nominative anaphors with the SSC. This was achieved by treating the agreement element of the Inflection category as another instance of "subject" on a par with the syntactic subject. According to the binding principle for anaphors, an anaphor must be bound to an antecedent in the domain of an accessible SUBJECT (either syntactic subject or agreement element (i.e., in a finite clause)). The agreement element would be accessible to the syntactic subject it was linked to, but not to another NP properly contained within the syntactic subject, thereby accounting for (10) vs. (11).11

Chapter 9, "On the Fine Structure of the Binding Theory: Principle A and Reciprocals" (1983), demonstrates that the notion of accessible syntactic subject should be further refined. It shows that pleonastic syntactic subjects (non-referential it) do not function as accessible subjects for the binding principle for anaphors. The fact that a reciprocal expression can be properly bound across non-referential it suggests that only referential or θ-marked subjects are relevant to the binding principle for anaphors. Presumably only syntactic subjects that are potential antecedents function as accessible syntactic subjects. This is somewhat unexpected given that the agreement element, which is clearly not a potential antecedent, can also function as an accessible subject for binding theory. This analysis of pleonastic subjects makes it possible to test whether the agreement element is generally an accessible subject for anaphor binding. Notice that there are two
cases to consider. One involves the NIC effect, where the anaphor in question is directly linked to the agreement element (via Spec-head agreement). The second concerns an agreement element that is linked to a pleonastic subject. The pleonastic subject is not accessible, but the agreement element could be, provided the resulting coindexing would not result in an i-within-i configuration, as in (12).

(12) They expected it would be reported to each other that John was lying.

(12) contrasts with the deviant (13).

(13) *They expected John would be reported to each other to be lying.
While (13) clearly violates the binding principle for anaphors, (12) does not, even though the anaphor is not antecedent-bound in the domain of an accessible SUBJECT (i.e., the agreement element of the sentential complement of expected). This shows that the binding principle for anaphors requires a fundamental reformulation, as is discussed in chapter 9, which also examines the noncomplementarity of anaphor vs. pronoun binding, as well as the divergence between NP-trace binding and anaphor binding.

Chapter 10, "Fundamental issues in the theory of binding" (1986), provides a critical explication of GB binding theory, expanding on the critical investigations of binding theory presented in the previous two chapters. It argues against the standard view of the complementarity of binding for anaphors and pronouns—namely, that the principles for anaphors and pronouns have the same binding domain, and tries to show how the near complementarity follows from the overlap between two different binding domains. It examines how the theory of binding might apply to empty categories and thereby provide a typology for empty categories on a par with the typology of lexical NPs it yields. This section includes some critical discussion of the PRO-Theorem of Chomsky 1981, as well as a comparison of the functional determination of empty categories vs. their derivational determination. Finally, the paper addresses the question of the level(s) of representation at which the various binding principles apply. Based on evidence from reconstruction phenomena, the paper argues that binding principles apply solely at LF.

The claim in chapter 10 that binding theory held at LF was based on empirical evidence. With the advent of the Minimalist Program (MP) in 1992, this analysis was further supported by an overwhelming conceptual argument—namely, given the elimination of D-structure and S-structure, LF was the only level of representation available at which binding theory could apply. The MP also ruled out indices as a grammatical device on the grounds that they violated a core requirement of minimalist analysis, an inclusiveness condition that restricted any structure formed by the computational system of human language (i.e., PF and LF) to only those elements already present in the lexical items selected, hence barring the introduction of new syntactic objects such as indices.

Chomsky's Inclusiveness Condition also bears directly on the theory of movement, specifically trace theory. In previous discussions traces were conceived of as "empty categories" left behind when a constituent was moved to another syntactic position. Traces were represented in analyses as an indexed symbol t or e, the index being necessary to identify the antecedent of the trace (i.e., a non-empty constituent). The elimination of indices rendered the empty category analysis of traces problematic because
there was now no simple way to identify its antecedent. Furthermore, inclusiveness itself prohibits representations of traces as special symbols (i.e., t or e). Whether it also rules out bare categorial features (e.g., [+N, −V] for an NP-trace) is not obvious. And even if such syntactic objects were allowed under inclusiveness, the empty category interpretation of trace (i.e., the syntactic element that movement leaves in the position from which the constituent is moved) is computationally more complex than the copy/deletion theory of movement, whereby movement leaves behind a full copy of the constituent moved, which is later deleted at PF. The copy theory, in comparison, solves the problem of antecedence in a straightforward way. In these ways, the MP condition on inclusiveness radically alters the basis on which a theory of binding can be constructed.

Adopting this perspective, chapter 11, "Binding theory on minimalist assumptions" (1997), submits standard GB binding theory to a minimalist critique. It explores the effects of overt vs. covert movement on generating appropriate LF representations for the interpretation of anaphoric relations. Along the way it identifies a serious empirical problem for the analysis of "reconstruction" phenomena (where the structural relation of elements at PF does not correspond to the structural relation necessary for the appropriate interpretation at LF). This concerns the apparent failure of reconstruction when two names are involved ((15) as compared to (14)).

(14) How many pictures of Alice did she really like?
(15) How many pictures of Alice did Alice really like?
While (14) can only be construed as involving two people, (15) also has an interpretation in which the question concerns only one person, Alice. Such evidence supports Lasnik's 1991 analysis, which separates Principle C of the binding theory into two independent principles, one concerning the binding of names by pronouns and the other concerning the binding of names by names. Chapter 11 suggests that on conceptual grounds the treatment of names as anaphoric expressions is suspect at best and therefore the part of Principle C that deals with pronouns ought to be construed as a principle about what elements in which structural configurations can function as an antecedent to a pronoun. This discussion supports a substantial reformulation of binding theory. Thus, for example, taking Principle A as a rule of interpretation for bound anaphors (reflexive pronouns and reciprocal expressions), we can now unite the NIC and SSC under a single principle—essentially, an anaphor cannot be antecedent-free, which itself may reduce to a violation of FI.

B History

The next group of papers is generally concerned with the evolution of generative theory—specifically, changes in theoretical perspective, their motivation and consequences. Chapter 12, "The Analysis of Passives" (1975), one of my earliest published papers, based on a chapter from my Ph.D. dissertation (1970), begins with an observation about a
shift in theory from Syntactic Structures to Aspects of the Theory of Syntax that eliminated what then (1969–1970) appeared to be one of the strongest motivations for transformational analysis—namely, that the passive transformation accounted for the intuitive relation between active and passive sentences by deriving them from the same underlying structure. Under the standard theory of Aspects, passive sentences had a different underlying structure from their corresponding active counterparts, one that involved a new syntactic device, the empty node. Chapter 12 criticizes the use of empty nodes and attempts to replace the transformational analysis that uses them with a purely lexicalist account of passives that relies solely on phrase structure rules.12 It attempts to address another strong argument for a passive transformation (i.e., that it provided a natural account for the otherwise idiosyncratic heavy restrictions on the passive auxiliary (be-en)) by reanalyzing passive predicates as adjectives and the passive auxiliary as a (main) copular verb.13

The lexicalist analysis of passives developed in chapter 12 was at the time a distinctly minority view and has remained so, even though other versions were developed by Michael Brame and Joan Bresnan in the mid 1970s. It is of interest primarily as a foil to a transformational account and as an occasion for critically evaluating the putative empirical motivation for the transformation. With hindsight it is obvious that a monolithic passive transformation was already under attack even within transformational grammar, starting with Chomsky 1970 (written 1966), which split the passive rule into two parts, neither of which involved the insertion of the passive auxiliary or the passive by. See chapter 15 for a detailed discussion of the evolution of the passive transformation in generative grammar.

It is worth noting that the proposals in chapter 12 were responding to a fundamental background assumption in transformational generative grammar. Transformations were viewed as a necessary additional mechanism for enhancing a phrase structure grammar account of human language—see, for example, Chomsky 1957 and Postal 1964. It took over three and a half decades to recognize that this view was fundamentally wrong, with something stronger than the converse being true—i.e., that transformations were the only mechanism needed to construct phrase structure and therefore phrase structure rules were superfluous (also because they stipulated properties that otherwise followed from principles of the general theory).

A comparison of core assumptions forms the basis of chapter 13, "Ideology in linguistic theory?" (1997), my review of Huck and Goldsmith's Ideology and Linguistic Theory: Noam Chomsky and the Deep Structure Debates. The main thesis of their monograph is that the Generative Semantics (GS) program was not abandoned because its core assumptions were proved to be wrong, but rather because its adherents were overwhelmed by the ideological rhetoric of its major antagonist, Noam Chomsky. The evidence that Chomsky was motivated by ideology is that once GS was eliminated, he then reintroduced GS analyses in his own work. The review evaluates this thesis and related claims by comparing them to the published record, Chomsky's three major papers of the early 1970s, reprinted in Chomsky 1972b.
It demonstrates that the written record, including the correspondence cited in the book, does not provide evidence of a debate between Chomsky and the generative semanticists. It also shows that on Huck and Goldsmith’s account of the core propositions of GS there is no difference between the Extended Standard Theory and GS, which supports Chomsky’s 1972 assessment. If there
is no difference in the core propositions, then it would be natural for analyses that are compatible with GS assumptions to appear in Chomsky's later work. However, as the review notes, the claim that Chomsky has reintroduced specific GS analyses in recent work is demonstrably false.

Sometimes developments in linguistic theory can undermine previous analyses so that empirical evidence that has been interpreted as supporting some fundamental concept no longer supports that concept. Chapter 14, "Linguistic theory and language acquisition: a note on structure-dependence" (1991), explicates one such example. This note is ostensibly about the relation between linguistic theory and psycholinguistic studies of language acquisition, dealing with the Crain and Nakayama experiments on the structure-dependence of rules (1987) as discussed in Crain 1991. These experiments concerned the formation of yes/no questions in English. Crain and Nakayama demonstrated that although young children made some errors in forming these questions, the errors never involved the kind of question that would constitute the output of a structure-independent formulation of the yes/no question rule. Their experiments were based on the discussions of structure-dependence in Chomsky 1971b and 1975, which concerned a particular format for transformational rules—one that involved the specific mention of constant terms (e.g., subject NP) to specify a context for the application of the transformation in the structural description. However, with the advent of the Move α formulation of movement transformations under the Principles and Parameters framework, where no context is mentioned in the formulation of that rule, the evidence that Chomsky had discussed (and Crain and Nakayama were testing) no longer bore on the issue of the structure-dependence of transformational rules. Move α is capable of generating the same deviant sentences as a structure-independent formulation of the yes/no question rule. Within the Principles and Parameters framework, the deviance is handled by general principles of grammar (e.g., the HMC) and thus the Crain and Nakayama experiments really speak to the innateness of these principles rather than the innateness of the structure-dependence of rules.14

The shift from the LSLT format for transformations (Chomsky 1975a) to Move α (beginning in the mid to late 1970s) was a technical change that was motivated by more fundamental conceptual shifts in theoretical perspective. Chapter 15, "Conceptual shifts in the science of grammar: 1951–1992" (1994), elucidates several conceptual shifts in the history of transformational generative grammar, from its inception in the late 1940s and early 1950s to the advent of the Minimalist Program in the early 1990s. Chomsky's first work in linguistics, his undergraduate thesis, which he expanded into a master's thesis (The morphophonemics of Modern Hebrew (1951)), constitutes a major shift of focus from the methods of linguistic analysis, mostly taxonomic procedures, to the construction of grammars and a theory of structural analyses. As this work developed, another equally fundamental shift in perspective occurred—from E-language to I-language, i.e., the adoption of the psychological interpretation of grammar. Much of this is illustrated in the chapter by a comparison of Chomsky's work with that of his teacher Zellig Harris.
The psychological interpretation of grammar led eventually to a shift in focus from rule systems for particular languages to more general properties of the language faculty as instantiated in general principles of grammar. This shift is illustrated with a brief history of the passive transformation, which exemplifies how language-particular and construction-specific grammatical rules were gradually replaced by general principles
and mechanisms of UG. Reanalyzing the principles and mechanisms of the theory of UG in terms of notions of economy leads to another conceptual shift from systems of principles to questions of language design, leading ultimately to the concerns of the Minimalist Program.

My 1997 review article on Chomsky's The Minimalist Program (chapter 16) attempts to present and evaluate the MP in terms of its fundamental assumptions and the changes in theoretical perspective that they have produced. The MP shares several of the most basic assumptions with previous theories in generative grammar. Given that the MP fits very much within the Principles and Parameters framework, it also shares several assumptions with that framework. Thus separating those assumptions unique to the MP from the rest provides a clearer understanding of the special contribution of the MP to linguistic theory. These assumptions range from the very general and abstract ("a theory of grammar must meet a criterion of conceptual necessity" and "a linguistic expression is the optimal realization of interface conditions") to the relatively specific and concrete ("the interface levels LF and PF are the only relevant linguistic levels" and "all conditions are interface conditions" and "phrase structure representation is bare").15 Chapter 16 attempts to explicate and evaluate these assumptions as they apply to some specific details of analysis. It also offers a tentative answer to the question of whether the MP constitutes a major breakthrough in the study of language and mind, a guardedly affirmative one—one which continues to hold, I think, in the current period.16

The discussion of the MP is developed further in chapter 17, "Exquisite connections: some remarks on the evolution of linguistic theory" (2001). This chapter addresses three important and interrelated issues: the relation between the MP and its predecessors (GB theory in particular), the empirical and conceptual motivation of the MP, and the relation of theoretical linguistics to the natural sciences. As detailed in the chapter, the shift from GB to the MP is motivated to a large extent by the longstanding general methodological requirements that overlapping conditions ought to be eliminated whenever possible and that analyses ought to maximize empirical generalizations. In practice, economy conditions, which were introduced several years prior to the advent of the MP, have replaced significant portions of the older GB principles. Such developments relate directly to general considerations of conceptual naturalness such as simplicity, economy and non-redundancy that play a role in both linguistics and the natural sciences—physics in particular. Some examples from the history of physics are discussed in the paper. In addition, a new analysis of Principle C effects for pronouns involving symmetries across levels of analysis as a means for enhancing the economy of computations is proposed as an example for linguistics.17

The final chapter in this collection, "Syntactic Structures redux" (2004), extends the commentary in Lasnik 2000 on Chomsky's celebrated 1957 monograph, the transformational analysis of the English verbal morphology system, and the developments that followed up through the 1990s. It reviews Lasnik's critique of the transformational machinery in Syntactic Structures and shows how the problems he identifies are resolved under a more minimal theory of transformations that developed circa 1980.
Considering the three central topics in the analysis of this system (the order and form of verbal elements, the main verb movement parameter, and the distribution of periphrastic do), it discusses some problems that arise in Lasnik's hybrid account and proposes a lexicalist alternative that eliminates the rules of affix-hopping and do-support.
This new analysis, like Lasnik's hybrid analysis, falls roughly within the guidelines of the MP and therefore constitutes another minimalist commentary on the transformational analysis of the verbal morphology system. It demonstrates in detail how past and present linguistic theory can be mutually illuminating.

The work in this volume, like all my work in linguistics, has benefitted enormously from exchanges with colleagues and students. The coauthored papers are special cases where the conversations began long before the first word was written and continued right up through the final proofs. I would like to thank my coauthors, Wayne Harbert, Howard Lasnik, James Lavine, Rex Sprouse, and Jean-Roger Vergnaud for their permission to reprint our joint articles in this volume and for many happy hours of conversation. I am also especially indebted to Carlos Otero for his essential contribution to this work, including thirty years of enlightening conversation. And, like so many of my colleagues, my intellectual debt to Noam Chomsky is enormous and incalculable.

Notes
1 Appendix 2 discusses one case that cannot be ruled out by Subjacency.

(i) Who was known what to see?

The complement CP violates Chomsky & Lasnik's [NP to VP] filter (1977). Notice, however, that (i) does not violate the Case Filter, which superseded the Chomsky-Lasnik filter, given that who moves from complement subject to Casemarked matrix subject position before it moves to matrix Spec-CP. The second wh-movement is not a crucial factor because subject raising is also blocked out of wh-infinitivals, as in (ii).

(ii) *John was known which authors to admire.

However, if the complement object wh-phrase moves to the matrix Spec-CP, the result is no longer deviant.

(iii) Which authors was John known to admire?
In fact, it is not obvious how (i–ii) can be accounted for under current proposals, especially in light of (iii).

2 Some constructions could be handled with the Tensed-S and Specified Subject conditions of Chomsky 1973, also construed as conditions on trace binding (but see chapter 8 for evidence against applying the TSC and SSC to wh-traces). Others required the postulation of principles of predicate/argument structure—ultimately incorporated into the θ-Criterion of Chomsky 1981. A third part of the θ-Criterion was first proposed in Freidin 1975b, footnote 20 (chapter 12).

3 This depends, of course, on a precise analysis of how wh-movement functions. Therefore conclusions may differ depending on whether we are assuming movement occurs
via Move vs. Attract, and independently on whether the movement operation applies to an entire constituent or just selected features of the constituent.

4 Chomsky 1993 proposes the substitution operation as the generalized transformation that constructs phrase structure from lexical items and also yields movement. In this analysis, substitution "takes a phrase marker K1 and inserts it in a designated empty position Ø in a phrase marker K, forming the new phrase marker K*, which satisfies X-bar theory" (p. 22). Ø is not part of the lexicon, but rather an element provided by the substitution operation itself. When substitution applies to a single phrase marker, movement results. When it concatenates two separate phrase markers, it builds structure. In Chomsky 1995b the adjunction operation (designated as Merge) replaces the substitution operation on natural simplicity grounds: substitution involves an extra step to introduce a designated empty position, whereas adjunction simply creates the position automatically at the adjunction site. Note that under classical substitution analyses, the element to be substituted is generated independently by another process, whereas under the generalized transformation analysis, the creation of the empty position and its replacement are part of the same process. In a sense the formulation of substitution appears to be an overly complicated formulation of adjunction. Notice further that the postulation of the designated empty position Ø violates the Inclusiveness Condition of Chomsky 1995d.

5 The account in Chomsky 1993 is more nuanced. It applies only to substitution operations, thereby allowing for the possibility that adjunction operations need not obey "cyclicity".

6 Note that the Extension Condition is also redundant with respect to these wh-island violations. Similar arguments are available for the other constructions offered as empirical motivation for a cyclic principle—namely, super-raising and violations of the HMC and the CED. Notice that this analysis also supersedes the Subjacency analysis of SCC violations involving wh-islands. Thus we no longer have an argument for treating Subjacency as a condition on representations rather than a condition on the application of rules. Each analysis is still possible, but neither is preferable (in contrast to the situation in chapter 2). Whether a given condition should be interpreted derivationally or representationally is ultimately an empirical issue. Under current minimalist analyses, it appears that UG involves both conditions on representations (e.g., Full Interpretation) and conditions on derivations (e.g., the Phase Impenetrability Condition); however some researchers have argued that conditions in UG should be uniform (i.e., one kind or the other)—see Epstein and Seely 2002 for discussion.

7 To be published in Freidin and Lasnik (2005).

8 Actually, the visibility proposal reduces all Case Filter violations to θ-Criterion violations, thereby dispensing with the Case Filter. Thus phonetically realized NPs and traces that are not marked for Case are invisible at LF and therefore cannot be assigned a θ-role. This approach leads to the controversial null-Case analysis of PRO (Martin 1996, Chomsky and Lasnik 1993). The analysis shifts Case considerations from the PF-side to the LF-side, thereby suggesting that the PF phenomenon of morphological Case must be separated from the notion of abstract Case. But exactly what this latter notion is now remains obscure.
9 This follows under the assumption that unvalued Case features (essentially equivalent to valued but unchecked Case features) on nominal expressions are uninterpretable at one or more interfaces (PF and/or LF) and therefore violate FI. Under the visibility analysis discussed above, unvalued Case features would prevent the assignment of a θ-role to a nominal expression, which would then violate the functional relatedness part of the θ-Criterion. This too might be considered to be ultimately a violation of FI if the nominal expression is nonetheless legible at LF but without a role in the predicate/argument architecture.

10 Given that no wh-trace functioned as an anaphor, part of the analysis in chapter 2 must be abandoned—i.e., the use of the TSC and SSC to derive some of the empirical effects of the SCC. Thus the analysis in chapter 8 provided additional support for interpreting Subjacency as a condition on representations.
11 The notion "accessible" was formulated in terms of an analysis of indexing. A SUBJECT was accessible if coindexation between it and the anaphor did not result in a configuration where some NP properly contained in another NP bore the same index as the NP that contained it (the so-called i-within-i prohibition).

12 The proposal was more general, suggesting that all structure-preserving (in the sense of Emonds 1970) transformations—at that time the standard NP movements (passive, raising, dative)—could be handled in a similar fashion.

13 Wasow 1977 argues against this analysis by trying to show that passive constructions share properties that are common to transformationally derived structures rather than those derived by lexical rules. From the current perspective, almost thirty years later, some aspects of the lexical analysis still seem plausible. Given Case theory, the passive predicate must be distinguished from its corresponding verb. In terms of Case it behaves like an adjective rather than a verb. With the elimination of phrase structure rules, the problem of an AP rule where the A uncharacteristically takes an NP object disappears. The assignment of Case to the direct object in a double object construction is a special case for the active verb as well as the passive predicate. Whether the grammatical subject in a passive has been displaced from another position during the derivation or is interpreted as filling the θ-role linked to that position may be a difference too subtle to distinguish with the evidence available. If the evidence does not distinguish the two analyses, we may well be dealing with notational variants.

14 The discussion in chapter 14 also bears directly on the issue of the poverty of the stimulus. See Lasnik & Uriagereka 2002 for further discussion.

15 Whether bare phrase structure is unique to the MP is debatable. Phrase structure analysis under the GB theory was moving in that direction (cf. Freidin 1992) and furthermore there was no principle in GB that would have precluded it.

16 See Chomsky 2005 for a more recent discussion of what has been achieved within the MP.

17 This extends the discussions of Principle C and binding theory more generally that occur in chapters 8, 10 and 11.
Part I Theory
§A: Movement
2 Cyclicity and the theory of grammar* The general principles governing the operation of the transformational component of a grammar have always been one of the central issues in the theory of syntax. Among these, the syntactic cycle is perhaps the most familiar and widely studied. In what follows, I will show that the cycle (i.e., the syntactic cycle) can in effect be derived from other conditions that are independently motivated. This result suggests a shift of emphasis in syntactic research. This article is organized as follows. Section 1 contains a discussion of the notion “cycle” and its empirical motivation. Section 2 provides an explication of the theoretical framework in which the derivation of the cycle is possible. Section 3 shows how the derivation works in detail. Section 4 and appendices 1 and 2 contain a discussion of what the derivation signifies for syntactic theory. 1 The Syntactic Cycle In its initial formulation—essentially as in (1) (cf. Chomsky (1966))—the cycle is a general principle that determines, in part, the order of application of transformations with respect to syntactic domains in phrase markers. (1) In a derivation, for all syntactic domains α in a phrase marker, a linear sequence of transformations applies to a domain αi before applying to αj, where αj contains αi.
“α” denotes the set of categories that constitute cyclic domains.1 Under (1), rules apply to successively larger cyclic domains until the entire phrase marker has been processed. This mode of application is commonly referred to as “bottom-to-top.” A sharper notion of the cycle is provided by the Strict Cycle Condition (Chomsky (1973, 243)): (2) Strict Cycle Condition (SCC) No rule can apply to a domain dominated by a cyclic node A in such a way as to affect solely a proper subdomain of A dominated by a node B which is also a cyclic node.2
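The following sketch is purely illustrative and not part of the original text: a toy Python rendering of the SCC as a check on rule applications, where the cyclic domains and the containment relation are supplied in a hypothetical encoding.

    # Hypothetical encoding: a rule application records which cyclic domains
    # it affects; it violates the SCC if it operates solely within a proper
    # cyclic subdomain of the cycle currently being processed.
    def violates_scc(current_cycle, affected_domains, contains):
        return all(contains(current_cycle, d) and d != current_cycle
                   for d in affected_domains)

    containment = {"S'1": {"S'2", "S2"}, "S'2": {"S2"}}
    contains = lambda outer, inner: inner in containment.get(outer, set())
    print(violates_scc("S'1", ["S'2"], contains))         # True: solely a subdomain is affected
    print(violates_scc("S'1", ["S'1", "S'2"], contains))  # False: the matrix domain is analyzed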
The SCC ensures that once a cycle has been passed in a derivation, it is inaccessible to any rule that does not analyze it as a subdomain—i.e., by making crucial reference to some constant term in the matrix domain. For example, suppose that, given a complex structure (3), a rule Ri applying on the S1 cycle transformed the structure of S2 in such a
way that a rule Rj could then apply solely within S2, where the structural description of Rj was not met before the application of Ri. (3)
In this derivation Ri feeds Rj, and the feeding relationship holds between a cyclic domain and its cyclic subdomain. The SCC, but not the cyclic principle in (1), rules out such derivations. As a concrete illustration, consider the following derivation (4), in which Wh Movement violates the SCC (but not (1)) and results in misgeneration (i.e., an ungrammatical output).3 (4)
a. [S′1 [C1] [S1 John knows [S′2 [C2] [S2 who saw what]]]]
b. [S′1 [C1] [S1 John knows [S′2 [C2 who] [S2 saw what]]]]
c. [S′1 [C1 who] [S1 John knows [S′2 [C2] [S2 saw what]]]]
d. [S′1 [C1 who] [S1 John knows [S′2 [C2 what] [S2 saw]]]]
Specifically, the movement that maps (4c) onto (4d) violates the SCC. The movement of who from the embedded COMP to the matrix COMP on the S′1 cycle feeds the movement of what in S′2 into the embedded COMP. The latter movement cannot apply when the embedded COMP is filled. When it does apply, the application crucially involves only terms within the cyclic subdomain S′2. (4) gives a partial derivation of (5). (5)
*Who does John know what saw?
(5) constitutes the crucial case for the SCC in previous discussions (see Chomsky (1973)). In section 3, many other cases will be discussed. The SCC is not merely an ancillary principle to (1)—which is usually conceived of as “the cyclic principle.” Rather, the SCC subsumes this notion of cycle (see also Lasnik and Kupin (1977)). Clearly if rules may not apply solely within cyclic subdomains on any given cycle, then the only point in a derivation where they may legally apply solely within those subdomains is on the cycle of the sub-domain. Given the SCC, a rule that can apply to the most deeply embedded cyclic domain of a phrase marker must apply on the cycle of that domain or not at all. Therefore, a stipulation about the order in which
subdomains of a phrase marker must be operated on by rules (e.g., (1)) is superfluous, since this ordering (i.e., bottom-to-top) follows from the SCC alone. In fact, without the SCC or its equivalent, it is questionable whether there is a coherent notion of cycle at all. That is, without the SCC it is possible to construct derivations involving rule orderings that wildly violate bottom-to-top ordering but are consistent with (1). Given a structure of n cyclic domains where some optional rule may apply to each domain, it would be allowed under (1) to refrain from exercising the option to apply these rules until the last cycle. If the structural descriptions of these optional rules are still met in the various cyclic subdomains of the last cycle, then the option to apply them may be exercised. In this case, the rules could apply in every possible order—only one of which is bottom-to-top. Clearly (1) is not sufficient to ensure that only cyclic (bottom-to-top) ordering is permissible. In marked contrast, the SCC permits only the cyclic ordering. The fact that the SCC reduces the class of possible derivations to only those that are cyclic has no empirical significance in the hypothetical case cited above because the output of the noncyclic derivations will be identical to that of the cyclic derivation. The interaction of the SCC and the two possible derivations of (6) provides a concrete illustration. (6)
[S′1 [NP the car [S′2 which had been stolen]] was discovered in a ditch]
Both derivations involve NP Preposing in S′1 and S′2. The SCC prohibits the derivation in which the rule applies to S′1 before it applies to S′2. Yet because the outcome is the well-formed (6) in any case, the exclusion of one possible derivation of (6) does not alter the strong or weak generative capacity of the grammar. Reductions of this sort are therefore without empirical consequence with respect to the class of possible languages a grammar generates. In contrast to (6), there are cases in which the SCC excludes derivations that result in misgeneration. (4) above is one example. Such cases provide the empirical motivation for the SCC. Ill-formed strings whose derivations violate the SCC (e.g., (5)) constitute the empirical content of the SCC. In the next section, (5) will be considered as paradigmatic of SCC violations. It will be demonstrated that the empirical effect of the SCC can be derived within trace theory from an independently motivated condition on traces. This condition makes no reference to notions like “cyclic domain” or “stage of the cycle.” 2 Theoretical framework I presuppose here the general framework of the extended standard theory (EST, as in Chomsky (1972b; 1975c))—in particular, the theory of traces (cf. Chomsky (1976; 1977a); Fiengo (1977)) and such conditions on the application of rules as the Tensed-S Condition and the Specified Subject Condition (cf. Chomsky (1973; 1976; 1977a)). Trace theory is discussed in section 2.1; the Tensed-S and Specified Subject Conditions in section 2.2.
2.1 Trace Theory The notion “trace” assumed here is defined as an empty category—formally [β e], where “β” denotes an arbitrary category of the base and “e” stands for the identity element.4 For example, [NP e] is an NP trace, [PP e] is a PP trace, and so on.5 As empty categories, traces have two distinct sources. They may be transformationally generated by movement operations, which leave an empty category (identical to the moved category) at the site from which the movement occurred. This follows in part from the structure-preserving hypothesis (Emonds (1976))—see Chomsky (1977b) and Freidin (1977) for discussion. Traces may also be base generated, which follows from the optionality of base rules (cf. Chomsky (1965) and Emonds (1976))—assuming a convention that an unexpanded category β is equivalent to [β e]. It is assumed here that all categories in base structures are indexed, including base-generated traces.6 However, coindexing is only a result of movement or construal rules—e.g., the rule that assigns a reflexive pronoun an antecedent (see Chomsky (1977a)). The derivation of “truncated” passives illustrates both sources. (7)
a. Mike was arrested yesterday.
b. [S [NPi e] [VP was arrested [NPj Mike] yesterday]]
c. [S [NPj Mike] [VP was arrested [NPj e] yesterday]]
The base structure (7b) (details omitted) of sentence (7a) undergoes NP Preposing, generating (7c). The trace in (7b) is base-generated; the one in (7c) is transformationally generated. As in (7), base-generated trace is not coindexed with any lexically filled category, whereas transformationally-generated trace is coindexed by definition of movement. The difference in indexing serves to distinguish between the two types of trace throughout a derivation. They are subject to different surface structure conditions, as will be discussed below. The relation between a moved phrase and its trace is interpreted here as that of bound anaphora, following Chomsky (1977a; 1977b) and Fiengo (1977). This interpretation allows us to capture the striking generalization that traces and bound anaphors (e.g., reflexive pronouns and reciprocals (each other)) are subject to exactly the same binding conditions (Chomsky (1976)). For example, traces, like reflexive pronouns, must be bound to an antecedent in S—in contrast to regular pronouns (he, she, etc.), which are essentially free in reference (see Lasnik (1976) and Chomsky (1976)). Binding can be formally expressed in terms of coindexing of categories. Under this interpretation, the distinction between base-generated and transformationally-generated trace translates as unbound vs. bound anaphor. In general, bound anaphora must meet the following conditions on binding: (8)
Proper Binding (PB)
Each bound anaphor αi in a phrase marker Pj must be
a. bound to some antecedent in Pj; and
b. c-commanded by its antecedent.7
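As an informal aside (a minimal sketch, not drawn from the original), the two clauses of (8) can be rendered as a Python check over a toy phrase marker in which each node records its mother; the tree encoding and node names are assumptions for the example.

    def c_commands(parent, a, b):
        # a c-commands b iff a's mother dominates b and a does not dominate b
        def dominates(x, y):
            while y is not None:
                if x == y:
                    return True
                y = parent.get(y)
            return False
        mother = parent.get(a)
        return mother is not None and dominates(mother, b) and not dominates(a, b)

    def properly_bound(parent, anaphor, antecedent):
        # (8a): some antecedent exists; (8b): it c-commands the anaphor
        return antecedent is not None and c_commands(parent, antecedent, anaphor)

    # (9a-i): the subject NP c-commands the object anaphor
    parent = {"NP_Larry": "S", "VP": "S", "V": "VP", "NP_himself": "VP"}
    print(properly_bound(parent, "NP_himself", "NP_Larry"))  # True
    # (9b-i): no antecedent at all
    print(properly_bound(parent, "NP_himself", None))        # False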
PB accounts for the following facts concerning the occurrence of bound anaphors like reflexives, reciprocals, and traces in surface structure. (9)
a. i. Larryi hurt himselfi while playing hockey.
   ii. The childreni ignored each otheri whenever the television was on.
   iii. The booki was published [NPi e] by Penguin last year.
b. i. *Himselfi left early.
   ii. *Each otheri slept late.
   iii. *[NPi e] is here now.
c. i. *Himselfi hurt Larryi while playing hockey.
   ii. *Each otheri ignored the childreni whenever the television was on.
   iii. *[NPi e] was published that booki by Penguin last year.
The examples in (9b) contain “bound anaphors” that are not bound to an antecedent, thus violating (8a). In (9c) the examples contain bound anaphora where the antecedent does not c-command the anaphor, thus violating (8b). Only the examples in (9a) are wellformed with respect to PB. With this preliminary sketch of trace theory in mind, we turn now to the analysis of strict cyclicity. Within the framework of trace theory, the derivation of (5) involves (10) (cf. (4)), which yields (11) as a surface structure.8 (5)
*Who does John know what saw?
(10)
a. [S′1 [C1] [S1 John knows [S′2 [C2] [S2 whoi saw whatj]]]]
b. [S′1 [C1] [S1 John knows [S′2 [C2 whoi] [S2 ti saw whatj]]]]
c. [S′1 [C1 whoi] [S1 John knows [S′2 [C2 ti] [S2 ti saw whatj]]]]
d. [S′1 [C1 whoi] [S1 John knows [S′2 [C2 whatj] [S2 ti saw tj]]]]
(11)
[S′1 [C1 whoi] [S1 does John know [S′2 [C2 whatj] [S2 ti saw tj]]]]
The mapping of (10c) onto (10d) via Wh Movement violates the SCC. It also results in the erasure of a bound trace, ti in COMP2. Recalling that an NP trace is formally [NPi e], we see that the effect of trace erasure is the replacement of the identity element “e” with lexical material and, more importantly, a change of index on the category (see footnote 9). A prohibition against erasing bound traces would also exclude the mapping from (10c) to (10d). (12)
Trace Erasure Prohibition (TEP) A bound trace may not be erased.9
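A minimal sketch (a hypothetical encoding, not the paper's formalism) of the TEP: comparing two successive phrase markers, no position occupied by a bound trace may be overwritten or reindexed.

    def violates_tep(before, after, bound_indices):
        # a bound trace is recorded as the pair ("e", i) with i a bound index
        for pos, content in before.items():
            if isinstance(content, tuple) and content[0] == "e" and content[1] in bound_indices:
                if after.get(pos) != content:
                    return True   # a bound trace was erased
        return False

    # (13b) -> (13c): "a camera" overwrites Bernie's bound trace in object position
    before = {"subj": ("Bernie", "i"), "obj1": ("e", "i"), "obj2": ("a camera", "j")}
    after  = {"subj": ("Bernie", "i"), "obj1": ("a camera", "j"), "obj2": ("e", "j")}
    print(violates_tep(before, after, bound_indices={"i"}))  # True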
The TEP guarantees that once a trace is bound, it is inaccessible to rules and therefore remains invariant throughout the rest of the derivation. Compared with the SCC, it provides an alternative account of why (10) results in the misgeneration of (5) that makes no reference to notions like “cyclic domain” and “stage of the cycle.” The TEP (or its equivalent) is essential for a unified theory of semantic interpretation under trace theory (see Chomsky (1976)). For example, the TEP excludes the following derivation that occurs within a single cyclic S domain. (13)
a. [S [NPh e] [VP was given [NPi Bernie] [NPj a camera]]]
b. [S [NPi Bernie] [VP was given [NPi e] [NPj a camera]]]
c. [S [NPi Bernie] [VP was given [NPj a camera] [NPj e]]]
The mapping from (13b) onto (13c) violates the TEP. The structure (13c) that results from the violation of the TEP is uninterpretable at the level of surface structure, the level at which all interpretation occurs within the unified theory. Given only (13c), there is no way to interpret NPi as the indirect object of the predicate was given. Within trace theory, the fact that (13c) does not receive the correct interpretation—if any at all—constitutes independent empirical motivation for the TEP (that is, aside from violations of the SCC it covers).10 The SCC and TEP account for the misgeneration of (5) only if the derivation of (5) is (10). However, there is an alternative derivation of (5) that violates neither the SCC nor the TEP. (14)
a. [S′1 [C1] [S1 John knows [S′2 [C2] [S2 whoi saw whatj]]]]
b. [S′1 [C1] [S1 John knows [S′2 [C2 whatj] [S2 whoi saw tj]]]]
c. [S′1 [C1 whoi] [S1 John knows [S′2 [C2 whatj] [S2 ti saw tj]]]]
The derivation (14) also yields (11) as a surface structure of (5). Therefore, another condition is needed to prevent the misgeneration of (5); that is, a condition that excludes (14) as a possible derivation. (14) shows that neither the SCC nor the TEP is sufficient by itself to account for the misgeneration of (5). 2.2 Conditions In the derivation (14), the mapping of (14b) onto (14c) violates the Tensed-S Condition. (15)
Tensed-S Condition (TSC) No rule can involve X, Y in the structure …X…[α…Y…]… where α is a tensed sentence. (Chomsky (1973, (20)))
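Read as a condition on rule application, (15) can be sketched as a pre-application check; in the toy code below (all names hypothetical), a rule relating X and Y is blocked whenever some tensed S contains Y but excludes X.

    def tsc_blocks(tensed_clauses_containing, x, y):
        # blocked if a tensed S contains y but not x
        return bool(tensed_clauses_containing(y) - tensed_clauses_containing(x))

    # (14b) -> (14c): who_i inside tensed S2 may not be related to the matrix COMP1
    clauses = {"who_i": {"S1", "S2"}, "COMP1": {"S1"}}
    print(tsc_blocks(lambda n: clauses[n], "COMP1", "who_i"))  # True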
The movement of whoi into the external COMP1 is out of a tensed sentence and therefore prohibited by (15). The effect of the TSC is to make all categories in embedded finite clauses, with the exception of the embedded COMP (to be discussed below), inaccessible to rules that
relate them to categories in matrix clauses. The TSC holds for rules of construal (e.g., the rule that assigns coindexing between a reflexive pronoun and its antecedent) as well as for movement rules (see Chomsky (1973, 241) and Chomsky (1976) for important discussions). Thus, in (16), the TSC can be construed as blocking the coindexing of the reflexive pronoun in the embedded finite clause with the matrix subject. (16)
*Johni thought [S′ that himselfj was clever]
It is important to note that under this interpretation the TSC does not directly account for the ill-formedness of (16). Rather, it is the proper binding condition on bound anaphora (specifically (8a) above) that determines the ill-formedness of (16). (16) is excluded because it contains a “bound anaphor” that is not bound to an antecedent (cf. (9b)). The account of (16) just given crucially depends on construing the TSC as a condition on the application of rules. Suppose though that coindexing by construal rules applies freely without any conditions (or even that indexing in the base is free, thus allowing coindexing in the base). In this instance, a derivation could yield (17) rather than (16). (17)
*Johni thought [S′ that himselfi was clever]
(17) does not violate the conditions on proper binding (8) because the reflexive pronoun is bound to an antecedent that c-commands it. Nonetheless, the binding relation expressed in (17) is not well-formed. This follows automatically if we interpret the TSC as a condition on proper binding rather than as a condition on the application of rules (cf. Chomsky (1976; 1977b)). Interpreted as a condition on proper binding at the level of surface structure (technically, interpreted surface structure—i.e., after the application of construal rules), the TSC would be formulated as shown in (18): (18)
TSC In a structure …X…[α…Y…]… where α is a tensed sentence, X may not properly bind Y.
(A more explicit formulation is given in (27).) Given the convention that “bound anaphors” must be properly bound, (18) accounts for the ill-formedness of (19). (19)
a. *Johni thought [S′ that himselfi was clever] (=(17))
b. *theyi expected [S′ that each otheri would win]
c. *Jilli was reported [S′ that [NPi e] had won a prize]
Here too a condition on proper binding for lexical bound anaphora extends naturally to trace binding as well. Note that the analysis of the relation between a moved phrase (NP
in all the cases considered thus far) and its trace as that of bound anaphora is an empirical issue and not a matter of principle or definition. At this point we might ask whether there is any difference between interpreting the TSC as a condition on the application of rules, as in (15), and interpreting it as a condition on proper binding, as in (18). The answer is in the affirmative; but the relevant facts involve traces rather than lexical bound anaphors like reflexive pronouns. Interpreted as a condition on proper binding, the TSC (i.e., (18)) automatically accounts for the ill-formedness of (20).11 (20)
*[S′1 [C1 whoi] [S1 John knows [S′2 [C2 whatj] [S2 ti saw tj]]]]
whoi in COMP1 binds a trace in a tensed sentence, S′2—thus violating proper binding under (18). (20) is a structure that occurs in the two derivations of (5) discussed above, (10) and (14)—that is, (20)=(10d)=(14c). Interpreted as a condition on proper binding, the TSC is all that is needed to account for the misgeneration of (5). However, if the TSC is interpreted as a condition on the application of rules, only the derivation (14) is accounted for and so we need to stipulate the SCC (or TEP) in addition to account for the derivation (10). Thus, it does make a difference how the TSC is interpreted. Only under the proper binding interpretation is it possible to derive the empirical effect of the SCC and thereby eliminate what turns out to be a redundancy in the content of the theory of grammar. In the case of (5), then, the SCC is superfluous. In section 3 it will be shown how this analysis extends to all cases in which violations of the SCC result in misgeneration.12 Our analysis illustrates how the empirical effects of the SCC can in part be derived from independently motivated conditions on proper binding. This results from an interpretation of the TSC as a condition on proper binding. Note that the Specified Subject Condition (Chomsky (1973; 1976)) is also susceptible to such an interpretation.13 (21) Specified Subject Condition (SSC) In a structure …X…[α…Y…]… where α contains a subject distinct from Y and not controlled by X, X may not properly bind Y. (cf. Chomsky (1976, (11)))
Formulated in this way, the SSC accounts for the ill-formedness of the following structures. (22)
a. *Johni believed [S′ Mary to like himselfi]
b. *wei expected [S′ John to consult each otheri]
c. *Jilli was reported [S′ Jack to have insulted [NPi e]]
Here again, lexical bound anaphors and traces pattern together. Taken in conjunction with one another, the TSC and SSC prohibit binding between all categories in matrix and embedded clauses, with the exception of binding that involves the subject of an embedded nonfinite clause or the embedded COMP. In effect, these
conditions specify “opaque domains” with respect to binding. A domain is “opaque” if a category within it may not be bound to a category outside. Thus, the predicates of embedded clauses and the subjects of finite clauses constitute opaque domains with respect to categories in a matrix clause. Viewed in this way, the TSC and SSC define an “opacity principle” for anaphora in natural language.14 As defined, the Opacity Principle permits binding between matrix categories and the subject of an embedded nonfinite clause as illustrated in (23). (23)
a. Johni believed [S′ himselfi to be clever]
b. wei expect [S′ each otheri to succeed]
c. Jilli was reported [S′ [NPi e] to have insulted Jack]
Comparing (23) with (22), we see that in nonfinite clauses, only the predicates constitute opaque domains. It is important to note that predicates of nonfinite clauses are not opaque in any absolute or global sense. Thus, consider (24): (24)
[ Jilli was reported [S′ ti to have been insulted ti by Jack]]
According to a global interpretation of the Opacity Principle, the binding relation between Jilli in the matrix clause and the trace ti in the embedded predicate would not be permissible. Since (24) yields a grammatical sentence, this interpretation must be rejected. Instead, a local interpretation of the Opacity Principle is required. Towards this end, we observe that there is a crucial difference between the structures in (24) and (22c)—namely, no possible antecedent intervenes between Jilli and ti in (22c), whereas ti in the embedded subject intervenes between Jilli and the ti of the embedded predicate in (24). Thus, in (24), neither the binding between the matrix subject and the embedded subject nor the binding between the embedded subject and the embedded object violates the Opacity Principle. In each instance the binding relation is local, in that each antecedent/anaphor pair is adjacent with respect to any other possible antecedent or anaphor. Each pair constitutes a link in a chain of binding relations. A chain is well-formed if and only if each link is well-formed. Given this rather informal sketch of a linking convention, the Opacity Principle can be said to designate the subject of nonfinite clauses as an “escape hatch.”15 This analysis extends quite naturally to the embedded COMP. Thus, compare (25) and (26). (25)
[S′1 [C1 whoi] [S1 John said [S′2 [C2 ti] [S2 Fred saw ti]]]]
(26)
[S′1 [C1 whoj] [S1 John knew [S′2 [C2 whatk] [S2 tk frightened tj]]]]
Structure (25) yields a grammatical sentence, Who did John say Fred saw?; whereas (26) yields an ungrammatical sentence, *Who did John know what frightened? Given the linking convention, the Opacity Principle predicts this difference. The binding relation between whoj and its trace in (26) violates the TSC and SSC. In (25), the link between the trace ti in S2 and the one in COMP2 does not violate the Opacity Principle. Since (25)
yields a grammatical sentence, it must be assumed that the link between whoi in COMP1 and ti in COMP2 is also well-formed. That is, the Opacity Principle must be formulated in such a way that COMP is also an “escape hatch.” (See Chomsky (1973; 1977b) for a detailed discussion of this issue.) Such a formulation is given in (27). (27)
Opacity Principle
In a structure …X…[α…Y…]… where:
a. α=S′ (or NP) and
b. Y is not bound to any c-commanding category in α (linking convention),
if Y is in the domain of16
i. a finite clause (TSC) or
ii. a subject not controlled by X (SSC),
then X may not properly bind Y.
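As a rough operational gloss (my sketch, with all predicates supplied by hand rather than computed from a real phrase marker), (27) with its linking convention might be coded as follows.

    def opacity_violation(y, x, locally_bound, in_finite_clause, in_subject_domain):
        # (27b): if Y has a closer c-commanding binder in alpha, the chain link
        # is carried by that binder and X...Y is not checked directly
        if locally_bound(y):
            return False
        # (27i) TSC or (27ii) SSC
        return in_finite_clause(y) or in_subject_domain(y, x)

    # (26): who_j binds t_j inside a tensed S2 with no intervening link
    print(opacity_violation("t_j", "who_j",
                            locally_bound=lambda y: False,
                            in_finite_clause=lambda y: True,
                            in_subject_domain=lambda y, x: True))  # True
    # (25): the trace in S2 is locally bound by the trace in COMP2, so the
    # long link from who_i is never evaluated directly
    print(opacity_violation("t_i", "who_i",
                            locally_bound=lambda y: True,
                            in_finite_clause=lambda y: True,
                            in_subject_domain=lambda y, x: True))  # False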
In order that COMP have the required escape hatch property, “finite clause” in (27i) must be construed as S rather than S′. Therefore, the domain of a subject and a finite clause are the same, i.e., S. The linking convention (27b) guarantees that only adjacent categories in a binding chain will be subject to opacity conditions on proper binding.17 The Opacity Principle plays a central role in the derivation of the empirical effects of the SCC, as will be documented in the following section. 3 Analysis of the evidence for Strict Cyclicity In this section, it is assumed that the major movement rules are formulated as in (28) (see Chomsky (1976) for discussion). (28)
a. Move NP
b. Move wh-phrase
These rules apply freely without constraint, and thus there is no question as to whether a grammar will misgenerate any of the examples that follow. In fact, (28) misgenerates a considerably larger class of structures whose derivations violate the SCC than has been discussed previously. It will be demonstrated here that the Opacity Principle (27) and other independently motivated well-formedness conditions on structure account for all such cases of misgeneration—thus extending the analysis of (5) in section 2 to all the empirical effects of the SCC. Schematically, all violations of the SCC resulting in misgeneration involve a derivation (29), where [Pi ] designates a base-generated position. (29)
a. [ …[Pa ]…[ …[Pb X]…[Pc Y]…]…]
b. [ …[Pa X]…[ …[Pb ]…[Pc Y]…]…]
c. [ …[Pa X]…[ …[Pb Y]…[Pc ]…]…]
Because the mapping from (29a) onto (29b) crucially involves the cycle, the mapping from (29b) onto (29c), which affects solely the embedded cyclic subdomain, violates the SCC. Every such violation depends on a pair of movements of the same type. If X and Y are NPs, rule (28a) applies; if they are wh-phrases, rule (28b) does. Beyond this, there are several other parameters. S′2 may contain either a finite or a nonfinite clause. X may have been base-generated in Pb or moved into it by a rule. The latter situation is covered in section 3.1; the former, in section 3.2. Almost all cases of misgeneration resulting from violations of the SCC have alternative derivations that do not violate the SCC (as illustrated in section 2 with respect to (5)). Because these cases will be accounted for by conditions on structure, how they are derived will be of no consequence. Therefore, only the derivations that violate the SCC will be given. 3.1 Multiple movements out of the complement predicate 3.1.1 NP Movements The derivation (30), which violates the SCC, misgenerates (31a).18 (30)
a. [S′1 np was reported [S′2 that [S2 np was given Johni a bookj]]]
b. [S′1 np was reported [S′2 that [S2 Johni was given ti a bookj]]]
c. [S′1 Johni was reported [S′2 that [S2 ti was given ti a bookj]]]
d. [S′1 Johni was reported [S′2 that [S2 a bookj was given ti tj]]]
(31)
a. *John was reported that a book was given.
b. *John was reported a book to have been given.
(31b) would result if the verb in the sentential complement in (30) were nonfinite—e.g., to have been given. In (30d) Johni binds a trace in a tensed sentence (S′2), thereby violating the Opacity Principle (specifically the TSC (27i)). The trace of Johni is also in the domain of a specified subject, a bookj. Therefore, even if the complement in (30) were nonfinite (as in (31b)), the binding between Johni and its trace would still violate the Opacity Principle—in this instance, the SSC (27ii). 3.1.2 Wh movements When a sentential complement contains a double object verb whose objects are both whphrases, a derivation that violates the SCC is similar in result to (30). In the derivation (32), GIVE represents either the finite or the nonfinite verb form. (32)
a. [S′1 [C1 e] [S1 Max knew [S′2 [C2 e] [S2 Mary GIVE whoi whatj]]]]
b. [S′1 [C1 e] [S1 Max knew [S′2 [C2 whoi] [S2 Mary GIVE ti whatj]]]]
c. [S′1 [C1 whoi] [S1 Max knew [S′2 [C2 ti] [S2 Mary GIVE ti whatj]]]]
d. [S′1 [C1 whoi] [S1 Max knew [S′2 [C2 whatj] [S2 Mary GIVE ti tj]]]]
If GIVE is construed as finite, we derive (33a) from (32); if it is construed as nonfinite, we derive (33b). (33)
a. *Who did Max know what Mary gave?
b. *Who did Max know what Mary to have given?
Either way we interpret GIVE in (32), the result violates the Opacity Principle. This holds for the other possible derivation from (32a) and also for all derivations in which the indirect object is moved into COMP from a prepositional phrase (whether or not the preposition is also moved into COMP). In (32d), ti in S2 is in the domain of a subject (Mary) and cannot be properly bound by its antecedent whoi in the external COMP1. (33a,b) involve violations of the SSC, and (33a) involves a violation of the TSC in addition. What (30) and (33) show is that multiple binding into opaque domains is prohibited by the Opacity Principle when this binding involves the same escape hatch. In (30), the escape hatch is the subject of a nonfinite complement and in (33), COMP. The Opacity Principle does not prohibit multiple binding into opaque domains when two different escape hatches are used. Thus, (34) is perfectly grammatical even though its surface structure (35) contains two bound traces in the complement predicate. (34)
What was Mary reported to have been given?
(35)
[S′1 [C1 whatj] [S1 was Maryi reported [S′2 [C2 tj] [S2 ti to have been given ti tj]]]]
The two traces in the complement predicate are properly bound by the two traces in the escape hatches (complement subject and COMP), and the latter two are properly bound by their lexical antecedents—all according to the Opacity Principle.19 3.2 Movements out of complement subject and complement VP When violations of the SCC involve a movement out of the complement subject and the item moved was base-generated in that position, the SSC is inoperative. The relevant cases are as follows. 3.2.1 Wh movements The derivation (36), which violates the SCC, misgenerates (37a) if SEE represents the finite form of the verb, and (37b) if SEE represents the nonfinite form. (36)
a. [S′1 [C1] [S1 John knows [S′2 [C2] [S2 whoi SEE whatj]]]]
b. [S′1 [C1] [S1 John knows [S′2 [C2 whoi] [S2 ti SEE whatj]]]]
c. [S′1 [C1 whoi] [S1 John knows [S′2 [C2 ti] [S2 ti SEE whatj]]]]
d. [S′1 [C1 whoi] [S1 John knows [S′2 [C2 whatj] [S2 ti SEE tj]]]]
(37)
a. *Who does John know what saw? (=(5))
b. *Who does John know what to see?
The explanation for (37a) was given in section 2. We turn now to (37b). The structural analysis of (37b) to which conditions on proper binding apply is (38). (38)
[S′1 [C1 whoi] [S1 John knows [S′2 [C2 whatj] [S2 ti to see tj]]]]
By the Opacity Principle, whoi and whatj properly bind their respective traces in S2. Nonetheless, the binding of ti by whoi is improper with respect to another condition on binding that concerns structures of obligatory control.20 In short, the subject of a wh-infinitive construction is a position of obligatory control, and structures whose analysis is (39) are structures of obligatory control (see Chomsky and Lasnik (1977) for discussion). (39)
[S′ [C wh-phrase, + WH] [S NP to VP]]
(40) contains the relevant evidence.
(40)
a. John knows who to see.
b. *John knows who Bill to see.
c. *John knows who to see Bill.
In (40a), who is interpreted as the object of see; and John, as the subject of see. That this interpretation must hold for all constructions containing the structure (39) is illustrated by (40b,c). (40b) shows that a lexically filled NP may not occur in the subject position of a wh-infinitive. (40c) illustrates the same point if who is analyzed as the complement subject. Yet even if who is analyzed as being in the embedded COMP—thereby binding a trace in the complement subject, as in (41)–(40c) is still ungrammatical. (41)
[ John knows [S′2 [C2 whoi] [S2 ti to see Bill]]]
In this way, (40) demonstrates that the subject of a wh-infinitive may not contain lexical material or its derivational reflex, a bound trace. The interpretation of these subjects is accomplished by a rule of obligatory control.21 It follows from this fact about (39) that (38) is ill-formed and therefore that (37b) is a case of misgeneration. It is assumed here that the above-mentioned fact about (39) holds universally and therefore need not be stipulated as part of any language-particular grammar. Given that this fact is part of Universal Grammar (UG), nothing more is required to account for how (37b) follows from general principles. Of course it may be that this fact about (39) can be derived from other independently motivated principles of UG. However, such considerations are methodological rather than empirical, and as such do not affect the derivation of the SCC in question. This is not to say that the issue is unimportant—see appendix 2 for some discussion.
3.2.2 NP Movements The derivation (42) violates the SCC and misgenerates (43).
(42)
a. [ np was reported [ Fredi HAVE KISSED Maryj]]
b. [ Fredi was reported [ ti HAVE KISSED Maryj]]
c. [ Fredi was reported [ Maryj HAVE KISSED tj]]
(43)
a. *Fred was reported Mary has kissed.
b. *Fred was reported Mary to have kissed.
(43a) results when HAVE KISSED represents the finite form of the verb; (43b), when it represents the nonfinite form. Notice that (42) is the only possible movement derivation of (42c) from (42a), unlike all the other SCC violations discussed so far. (42c) contains only one trace, and it is properly bound to its antecedent in S2. Therefore, (43) is not accounted for by conditions on proper binding in surface structure. Clearly the ill-formedness of (43) must follow from something other than proper binding. Yet it is related to the trace erasure that results from mapping (42b) onto (42c). The trace erasure in (42) is fundamentally different from that which occurs in the derivations (10), (30), (32), and (36). In the latter derivations, the effect of trace erasure is to close an escape hatch that is crucial for binding a trace in an opaque domain. In (42), the erasure eliminates the only existing trace of a lexically filled NP. In this way, the trace erasure in (42) is similar to that which occurs in (13), repeated here. (13)
a. [S [NPh e] [VP was given [NPi Bernie] [NPj a camera]]]
b. [S [NPi Bernie] [VP was given [NPi e] [NPj a camera]]]
c. [S [NPi Bernie] [VP was given [NPj a camera] [NPj e]]]
Given only the surface structures (13c) and (42c), there is no way to recover the deep structure grammatical relations of the matrix subjects—i.e., that Bernie functions as the indirect object of was given; and Fred, as the subject of HAVE KISSED. Given surface structure interpretation, this means that these matrix subjects will be interpreted as filling no deep structure grammatical relation with respect to any predicate in their respective sentences. Suppose that a deep structure grammatical relation (e.g., object of predicate) corresponds to an argument of a predicate in Logical Form (LF). Assuming that the surface subject of a passive is not a deep structure grammatical relation, it follows that the matrix subjects in (13c) and (42c) will not map onto argument positions in LF; instead, they will be free arguments. Given a prohibition against free arguments in LF, we have an explanation for (13c) and (42c). This condition is clearly a reasonable requirement for a system of predicate calculus within LF, and quite independent of trace theory. For example, the prohibition accounts for the ill-formedness of (44a,b), which are generable under the assumption that lexical insertion may apply freely to base structures. (44)
a. *The report was published the book yesterday.
b. *Harry seems that Louise didn’t like the leaky waterbed.
Depending on how (44a,b) are mapped onto LF, one of the NPs in each example will not be in an argument position; that is, it will be essentially a free argument. From this perspective, the function of trace binding in surface structure is to relate a lexically filled NP to an empty argument position that is bound by a predicate in LF.22 Furthermore, it appears that the lexical antecedents of bound traces never occupy surface positions that map onto argument positions in LF (e.g., the object of a verb). This is supported by the fact that trace binding does not occur between two argument positions— e.g., the subject and object of an active verb. In other words, a derivation like (45) must somehow be excluded. (45)
a. [S [NPi e] praised [NPj John]]
b. [S [NPj John] praised [NPj e]]
(Compare (45b) with (13c) and S2 in (42c).) Although the trace in (45b) is properly bound given the condition on proper binding discussed so far, John praised is not interpreted as ‘John praised himself’ and is in fact ill-formed. A prohibition against mapping a single lexical NP onto more than one argument position in LF would exclude (45), and also (13) and (42).23 Additional evidence that the antecedent of a trace does not occupy a surface position that maps directly onto an argument position in LF is provided by the fact that some of these positions sometimes take lexical NPs with null semantic content—e.g., expletive it. The positions in question are the subject of the copula in (46) and the subject of seems-class verbs in (47).24 (46)
a. i. Johni was reported [S′ [NPi e] to have won]
   ii. it was reported [S′ that John had won]
b. i. Johni is likely [S′ [NPi e] to win]
   ii. it is likely [S′ that John will win]
(47)
i. Johni seems [S′ [NPi e] to be winning]
ii. it seems [S′ that John is winning]
In contrast to (46)–(47), expletive it never occurs in a surface position that maps directly onto an argument position in LF—for example, the object of a verb or preposition. The evidence suggests that there is no one-to-one correspondence between surface NP positions and argument positions in LF—specifically, that there are more surface NP positions than argument positions in LF. Trace theory mediates this asymmetry. Unfortunately, in the absence of a substantive theory of LF, little can be said about the details of the actual mapping between surface syntax and the syntax of LF. In lieu of a detailed formal theory of LF, we can express the prohibition against free arguments in LF representations as (48) and the prohibition against trace binding between two argument positions as (49).
(48) Functional Relatedness
In a sentence Si, each lexical NP with nonnull semantic content must fill some argument position in the logical form of Si.
(49) Functional Uniqueness
In a sentence Si, no lexical NP may fill more than one argument position for any given predicate in the logical form of Si.25
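To make the division of labor concrete, here is a small sketch (the LF encoding is an assumption for illustration) in which each lexical NP is mapped to the argument positions it fills; (48) fails when a contentful NP fills none, and (49) fails when one NP fills two positions of the same predicate.

    def functional_violations(np_to_slots, contentful):
        # np_to_slots: lexical NP -> list of (predicate, argument slot) pairs in LF
        found = []
        for np, slots in np_to_slots.items():
            if contentful(np) and not slots:
                found.append(("Functional Relatedness", np))  # a free argument
            preds = [p for p, _ in slots]
            if any(preds.count(p) > 1 for p in set(preds)):
                found.append(("Functional Uniqueness", np))   # doubly linked NP
        return found

    # (45b) "*John praised t": John would fill both argument positions of "praise"
    print(functional_violations({"John": [("praise", "subject"), ("praise", "object")]},
                                contentful=lambda np: np != "it"))
    # -> [("Functional Uniqueness", "John")]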
The stipulation “lexical NP with nonnull semantic content” in (48) allows for those instances in which lexical NPs have null semantic content and occur in nonargument positions in surface structure as in (46a-ii), (46b-ii), and (47ii). Note that we must also assume that lexical NPs with null semantic content may not occur in argument positions in LF. (Again, whether this must be stipulated or whether it follows from more basic principles will depend on how the theory of LF is formalized.) Together, (48)–(49) fix some limits on possible functional structures in LF. As formulated, they are neutral with respect to any formalized theory of LF. Trace theory is only one of several possible ways of instantiating these principles in a formal theory of grammar. Needless to say, an adequate theory of LF must somehow incorporate these rather minimal principles. In light of (48)–(49), the function of trace binding in surface structure is to link lexical NPs with nonnull semantic content that occur in surface positions that do not directly correlate with some argument position in LF to such a position. Given this interpretation of trace theory, we can identify additional cases of misgeneration whose derivation violates the SCC. For example, the matrix predicate in the derivation (42) (i.e., the passive predicate was reported) could be replaced with an adjectival predicate like is likely or a seems-class predicate like appears, and the resulting misgenerations in (50) would still be accounted for by exactly the same principles that account for (43). (50)
a. i. *Fredi is likely [S′ that Maryj has kissed [NPj e]]
   ii. *Fredi is likely [S′ Maryj to have kissed [NPj e]]
b. i. *Fredi appears [S′ that Maryj has kissed [NPj e]]
   ii. *Fredi appears [S′ Maryj to have kissed [NPj e]]
Furthermore, the same analysis would apply if the matrix predicate in (42) were replaced with a predicate whose subject did directly correlate with an argument position in LF (in contrast to the matrix predicates in (42) and (50)). Thus, the following misgenerations are also accounted for. (51)
a. *Fredi reported [S′ that Maryj kissed [NPj e]]
b. *Fredi reported [S′ Maryj to have kissed [NPj e]]
The derivation of (51) would violate the SCC in the following way. (52)
a. [ np reported [ Fredi HAVE KISSED Maryj]]
b. [ Fredi reported [ [NPi e] HAVE KISSED Maryj]]
c. [ Fredi reported [ Maryj HAVE KISSED [NPj e]]]
Either (48) or (49) is sufficient to account for (50); only (49) accounts for (51). Thus, these conditions on functional structure account for a much wider class of SCC violations than was previously considered. This completes the analysis of the empirical evidence for strict cyclicity. 4 Discussion As demonstrated in section 3, under trace theory, all cases of misgeneration by (53) that are excluded by the SCC can also be excluded by (54).
(53)
a. Move NP
b. Move wh-phrase
(54)
a. The Tensed-S Condition (=(18))
b. The Specified Subject Condition (=(21))
c. The theory of obligatory control (in particular, the stipulation about (39))
d. Functional Relatedness (=(48))
e. Functional Uniqueness (=(49))
Therefore, in terms of empirical consequences—and not in principle—the SCC and (54) totally overlap. Significantly, (54a–e) are empirically motivated by considerations that are independent of violations of the strict cycle, whereas the SCC accounts for no cases of misgeneration that are not also covered by (54). Therefore, as the SCC has no independent empirical motivation given (54), there is no reason to stipulate it as part of the theory of grammar. By taking (54a–e) as axioms of the theory of grammar, we derive the empirical effects of the strict cycle as a theorem.26 It is worth noting here that from (54) we also derive the effects of the principle (55) as regards the behavior of the rules (53). (55)
No rule may erase a properly bound trace.
(55) is a subcase of the Trace Erasure Prohibition (12) discussed above. Since every violation of the SCC under trace theory involves the erasure of a properly bound trace, whatever other principles account for the SCC violations also account for the TEP violations as well. As shown above (see section 3.2.2), the independent motivation for the
TEP—i.e., independent of the strict cycle—falls under the conditions on functional structure (48)–(49). Like the SCC then, (55) has no independent empirical motivation (again in contrast to (54)) and thus there is no reason to stipulate it as part of the theory of grammar.27 As noted above, both the SCC and the TEP can only be interpreted as conditions on the application of rules—or, assuming that rules apply freely, as conditions on derivations, whose domain is the mapping from base structures to surface structures. The TSC and SSC ((54a) and (54b), respectively) can also be interpreted as conditions on proper binding in surface structure, that is, as conditions on structures. In fact, only under this interpretation will these conditions account for the strict cycle violations; see section 2.2. As for the condition on wh-infinitives (54c) and the Functional Relatedness and Functional Uniqueness conditions ((54d) and (54e), respectively), they seem most naturally interpreted as conditions on structure also. As a result of this interpretation, we see that the derivation of the cycle illustrates how conditions on structure allow for the elimination of conditions on mapping. In the case under discussion, conditions on structure are clearly more general. The conditions on structure under consideration fall within three partially intersecting domains. The TSC and SSC fall within the theory of binding, as discussed above. Functional Relatedness and Functional Uniqueness fall within a theory of arguments, part of the theory of LF. The condition on the interpretation of the subject of wh-infinitives falls under the theory of obligatory control. It seems likely that all three fall within a theory of LF. The theory of obligatory control as it applies to wh-infinitive constructions involves both binding and the theory of arguments, though to what extent each is involved crucially depends on how a position of obligatory control (henceforth PRO) is analyzed under trace theory; cf. Chomsky and Lasnik (1977) for discussion. Under the assumption that PRO is a base-generated empty NP, it behaves like a trace to the extent that it must be assigned an antecedent. Unlike trace, it may have an unspecified antecedent that does not occur in the sentence, as in (56). (56)
It was unclear what PRO to do.
The binding of PRO differs from the binding of trace in a very striking way. Given Functional Uniqueness, trace binding may not involve two argument positions. PRO binding, in contrast, is immune to Functional Uniqueness because it must involve two distinct argument positions when there is an antecedent in S. Whether the facts about obligatory control can be shown to follow from the interaction of the theories of binding and arguments, or perhaps some other component in the theory of grammar, remains to be determined. A brief discussion of this issue occurs in appendix 2, and see also Chomsky (1980) for a detailed analysis. In this light, the empirical effects of the strict cycle follow from the theories of binding, arguments in LF, and obligatory control. In each instance, the relevant domain is structure at some level of representation rather than the way in which these structures are derived. That is, each theory provides a set of well-formedness conditions on structures— filters, in a word—at some level of representation.
For example, it is a simple matter to reformulate the Opacity Principle (27), which incorporates the TSC and SSC, as a filter.
(57) Opacity Filter
*[…X…[α…Y…]…] where:
a. α=S′ (or NP), and
b. X and Y have identical indices, and
c. Y is not bound to any c-commanding category in α (linking convention), and
d. Y is in the domain of (i) a finite clause (TSC) or (ii) a subject not controlled by X (SSC).
Following Chomsky and Lasnik (1977), we can interpret (57) as a rule that says “assign ‘*’ [the structural change] to any structure […] which meets the following conditions… [the structural description]”. How the condition on wh-infinitive constructions might be formulated in terms of filters is discussed in appendix 2. Unfortunately, without a formal theory of LF for natural language, it is difficult to speculate about what the LF filters might look like. In any event, the rationale for interpreting (54) as a set of filters rather than as conditions on mapping is quite straightforward. In the case of cyclicity what requires an explanation is why a grammar of English, which incorporates (53) in some form, does not misgenerate all the SCC violations discussed above. There are two general approaches to the problem. Either we constrain the function of the rules (53) so that the misgeneration does not result, or we construct a set of filters that will eliminate the misgeneration that does result from the free application of (53). A priori there is no reason to prefer one approach over the other, all things being equal. With respect to cyclicity however, all things are not equal—which will become clear from a comparison of the two approaches as they apply to strict cycle misgeneration. Under the heading “conditions on function,” there are two options. For one, we could formulate (53) in terms of context and target predicates (cf. Bresnan (1976a)) in such a way that no NP Movement or Wh Movement rule will misgenerate the strict cycle violations. Because these predicates are available only under a wider theory of transformations than the theory that restricts the class of possible rules to the form of (53), this option seems undesirable on methodological grounds. Furthermore, context and target predicates are language-particular or, worse yet, construction-particular within a particular language. (Compare for example “Move NP” with the formulations of Passive, Raising, and There Insertion (which it subsumes, except for the insertion of there) in the wider theory that allows transformations to be formulated in terms of context and target predicates.)28 In fact, it is precisely context and target predicates that are what is language-particular about NP Movement and Wh Movement transformations.29 Under the other option, we would attempt to find general conditions that restrict the application of (53)—that is, for all constructions in any particular language and for all languages. The SCC is one such condition. The TEP, or alternatively a condition prohibiting rules from analyzing trace (cf. Pollock (1978)), is another. Comparing such conditions on the function of rules (i.e., on (53)) to the filters (54), we are compelled to choose in favor of filters. That is, although both the filters and the conditions on function seem applicable across languages, the empirical effects of the conditions are subsumed by the filters, but not conversely. Given the filters, then, the
conditions are simply redundant. Furthermore, the filters are motivated independently of any SCC violation resulting in misgeneration and therefore generalize across a larger class of phenomena than was designated by the initial problem of strict cycle violations. What has been demonstrated to a certain degree is that, given a set of general filters, it is possible to maintain the narrower theory of transformations. This is an important result for the following reason. In the wider theory, the behavior of (53) is stipulated in terms of context and target predicates. In the narrower theory, we are not allowed to stipulate how the rules behave. Rather, this must follow from what is allowed to be stipulated—in this case, the general principles characterized by the filters. Thus, the narrower theory gives us an explanation for the behavior of (53), as opposed to a description of that behavior given the wider theory. This illustrates the trading relation that holds between description and explanation. It is worth considering further the difference between the conditions on rule application vs. the filtering approach to misgeneration. Conditions on rules are an indirect means of accounting for the empirical facts, that is, for ill-formed outputs. In contrast, filters provide a direct characterization of what is ill-formed in the output. Moreover, they are independent from any particular formulation of the rules, whereas conditions on rules—more generally, conditions on derivations—crucially depend on what rules are assumed to be operative. As an illustration, consider the following possibility. Suppose that coindexed categories (including bound traces) could be generated by the base rules. Now, every strict cycle violation discussed above could be misgenerated directly in the base. The filters would still apply, but the SCC or TEP—even in conjunction with the TSC and SSC interpreted as conditions on rules—would be insufficient to account for the cases.30 From this perspective, conditions on rules are less preferable than conditions on structure—that is, filters. There are of course still many outstanding problems for a theory of filters. For example, the transformations in (53) clearly misgenerate considerably more structures than have been discussed here. On the other hand, it is also the case that the filters (48)– (49) cover a much wider range of phenomena than has been discussed in this article. From what we have seen thus far, the study of filters as a way of increasing the explanatory content of the theory of grammar seems quite promising. Appendix 1: The motivation for the cycle The syntactic cycle was initially proposed for methodological reasons. Adopting a cyclic principle of rule application allowed for a fundamental simplification of the theory of grammar as formulated in the early 1960s. In particular, some powerful theoretical machinery such as generalized embedding transformations, transformational markers, and the as yet unformulated type 2 projection rules of Fodor-Katz semantic theory could be eliminated in favor of recursive base rules and the cycle. The result of adopting the cycle was a more highly structured theory of grammar that was also weaker in expressive power; specifically, it did not allow for the possibility that singulary transformations would have to apply to a matrix S before a constituent S could be embedded in it. (See Chomsky (1966) for a detailed discussion of this point.) However, there was no
discussion then of the empirical motivation for the cycle; and in fact, it seems as if no empirical differences between the two theories were noticed at that point. In the rather extensive literature on the cycle that has developed since its initial proposal, it has been claimed that the cycle is empirically motivated by grammatical sentences whose derivations require that some pair of rules applies in an A-B-A order— that is, cyclically. Rather than review this literature (and the equally extensive counterliterature claiming that these arguments for the cycle can be subverted), it will be sufficient to examine what is perhaps the strongest case of this sort in English, the interaction between the rules of NP Preposing and There Insertion. Within the framework presented above, There Insertion consists of two independent operations, an NP movement and an insertion of there into an empty NP category (see Milsark (1974)) without affecting the index of that category (cf. Dresher and Hornstein (1979)). The former operation is a rightward movement of an NP around be, one realization of the rule “Move NP.”31 NP Preposing is yet another realization of the general rule. Given that the two NP movements cannot be distinguished from one another under our framework, an A-B-A derivation could only be Move NP—Insert there—Move NP. Consider the following actual case. (58) gives a derivation for (59). (58)
a. [ np were reported [ np to have been sighted tigersi in Spain]]
b. [ np were reported [ tigersi to have been sighted ti in Spain]]
c. [ np were reported [ ti to have been tigersi sighted ti in Spain]]
d. [ np were reported [ therei to have been tigersi sighted ti in Spain]]
e. [ therei were reported [ ti to have been tigersi sighted ti in Spain]]
(59)
There were reported to have been tigers sighted in Spain.
Note that proper binding is observed in (58e). (58) provides an A-B-A derivation for (59) within the framework under consideration. The derivational arguments for the cycle claim that the A-B-A ordering is not merely sufficient, but also unique in yielding the correct result. Clearly this is not true for (59), which could be derived by applying all the NP-movements before inserting there in the matrix subject. The derivation would be identical to (58) except for the substitution of (60) for (58d). (60)
[ ti were reported [ ti to have been tigersi sighted ti in Spain]]
Recall that np in (58), as well as ti, represents an empty NP category, [NPi e], where the category bears an index i. Thus, replacing np with ti merely involves changing the index of np to i (or adding the index i if it is assumed that base-generated categories do not have indices when they are empty). Within the framework under consideration, it seems unlikely that insertion operations that do not affect the indexing of a category will be crucial for any movement operation of the sort in (53). This leaves the possibility of an A-B-A interaction between “Move NP” and “Move wh-phrase.” Here too there will be nothing crucial about such an interaction. Given that once a wh-phrase moves into COMP it may only move into
another COMP (see Chomsky (1977b)), NP Movements and Wh Movements will be essentially independent of one another. That there will be no crucial A-B-A arguments for the cycle within the theoretical framework assumed here is exactly what should be expected. Given filters, the facts of derivation with respect to surface structures are of no empirical significance. Moreover, misgeneration (and not generation) is what constitutes empirical motivation for this or that analysis. Since the major movement rules assumed (e.g., (53)) apply without constraint, we can assume also that they will generate all the grammatical sentences as well as a considerable number of ungrammatical structures. Generation of grammatical sentences is no longer an issue; rather, it is misgeneration that has to be accounted for. From this it follows that only misgeneration resulting from violations of the SCC constitutes empirical motivation for the cycle. Appendix 2: Filters or Control In section 3.2.1, the strict cycle violation involving the misgeneration of sentences containing wh-infinitive constructions (e.g., (61)) was accounted for by a stipulation, apparently universal, that the subjects of wh-infinitives are positions of obligatory control. (61)
*Who does John know what to see? (= (37b))
The surface structure of (61) is (62). (62)
[[C1 whoi] [S1 does John know [[C2 whatj] [S2 ti to see tj]]]]
As noted above, (62) does not meet the structural description of the Opacity Filter (57). The question is, then, does (62) violate any other independently motivated condition on well-formed structures? If so, then we can derive part (optimally, all) of the empirical effect of the stipulation regarding wh-infinitives. One possibility is the Subjacency Condition (cf. Chomsky (1973; 1977b)), provided it can be reinterpreted as a filter.32 (63)
Subjacency Filter
*[…X…[α…[β…Y…]…]…X…]
where:
a. X and Y have identical indices, and
b. α and β are bounding categories (i.e., S or NP), and
c. Y is not bound to any c-commanding category in α (linking convention).
Given (63), the binding relation between whoi and its trace in (62) is ill-formed. Filter (63) is independently motivated because it accounts for the boundedness of rightward movements (see Ross (1967), Akmajian (1975), and Chomsky (1977b)), as
well as for violations of the Complex NP Constraint (Ross (1967)) (as in (64)) and of the Subject Condition (Chomsky (1973, (99))) (as in (65)), independent of (61).33
(64)
a. i. *Who did John know the boy Mary liked?
ii. [S′ [C whoi] [S did John know [NP the boy [S′ [C ti] [S Mary liked ti]]]]]
b. i. *Who did John deny the claim that Fred had cheated?
ii. [S′ [C whoi] [S did John deny [NP the claim [S′ [C that] [S Fred had cheated ti]]]]]
(65)
i. *Who did stories about annoy Sam?
ii. [S′ [C whoi] [S did [NP stories about ti] annoy Sam]]
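The schema in (63) is, in effect, a checkable condition on binding configurations. Purely as an illustration, here is a minimal sketch (in Python; the flat encoding of a binding configuration as the list of category labels separating the bindee from its binder is my own simplification, not anything in the original analysis):

BOUNDING = {"S", "NP"}   # per clause (b) of (63)

def subjacency_filter(crossed, locally_bound=False):
    # crossed: labels of the categories containing the bindee (Y) but
    # not its binder (X); locally_bound: Y is bound to a c-commanding
    # category within alpha, per the linking convention (63c).
    if locally_bound:
        return "ok"
    n = sum(1 for label in crossed if label in BOUNDING)
    return "*" if n >= 2 else "ok"

# Complex NP Constraint configuration, as in (64): Y inside NP and S
print(subjacency_filter(["NP", "S"]))                      # '*'
# Subject Condition configuration, as in (65): Y inside a subject NP
print(subjacency_filter(["S", "NP"]))                      # '*'
# a trace bound by a c-commanding trace in COMP is exempt via (63c)
print(subjacency_filter(["S'", "S"], locally_bound=True))  # 'ok'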
(Note that (65) provides evidence that for Subjacency the bounding node in English is S and not S′, in contrast to Opacity.) However, Subjacency is not sufficient to account for all the strict cycle violations involving wh-infinitive constructions. Thus, consider the following derivation, which violates the SCC and misgenerates (67).
(66)
a. [[C1 ] [S1 np was known [[C2 ] [S2 whoi to see whatj]]]]
b. [[C1 ] [S1 whoi was known [[C2 ] [S2 ti to see whatj]]]]
c. [[C1 ] [S1 whoi was known [[C2 whatj] [S2 ti to see tj]]]]
d. [[C1 whoi] [S1 ti was known [[C2 whatj] [S2 ti to see tj]]]]
(67) *Who was known what to see?
Given the linking convention (63c) for Subjacency, whoi properly binds its trace in S1 and that trace properly binds the trace ti in S2; hence (66d) does not violate filter (63), and Subjacency fails to exclude (67). On the evidence, it appears that Subjacency may not be the best way of accounting for strict cycle violations that misgenerate structures containing wh-infinitive constructions. First, it is clear that Subjacency does not account for all the facts concerning such constructions—e.g., (67) and (40). Second, the Subjacency Filter overlaps in effect with the Opacity Filter. The Subjacency Filter alone accounts for all the strict cycle violations that involve only Wh Movements. Given that the Opacity Filter is more general (it applies to all bound anaphora, whereas Subjacency appears to be limited to trace (see note 32)), there may be a way of deriving all of the empirical effects of Subjacency and thereby eliminating a redundancy in the theory of filters.34
An alternative solution to the problem of obligatory control in wh-infinitives might follow from the theory of filters developed in Chomsky and Lasnik (1977). The putatively universal filter (68) provides a local account for the facts concerning wh-infinitives.
(68)
*[α NP to VP], unless α is in the domain of and adjacent to [−N] (i.e., V or for).35
Filter (68) analyzes lexical NPs (including trace), but not PRO. Thus, it accounts for the following examples.
(69)
a. *John knows whati [Bill to see ti]
b. *Johni was known whatj [ti to see tj]
c. *John knows whoi [ti to see Bill]
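Filter (68) is likewise local enough to state as a toy check. In the sketch below (in Python; the category table and the examples other than (69a) are illustrative assumptions of mine), an NP-to-VP sequence passes only when the immediately preceding element is [−N], i.e., V or for:

CATEGORY = {
    "knows": "V", "expect": "V", "expects": "V",  # verbs are [-N]
    "for": "for",                                 # complementizer for is [-N]
    "what": "wh", "who": "wh",                    # wh-words are [+N]
}

def filter_68(left_neighbor):
    # (68): *[NP to VP] unless adjacent to (and in the domain of) [-N]
    minus_n = CATEGORY.get(left_neighbor) in {"V", "for"}
    return "ok" if minus_n else "*"

# (69a) *John knows what [Bill to see t]: the adjacent wh-word is [+N]
print(filter_68("what"))     # '*'
# a for-infinitive, e.g. "...for [Bill to win]": for is [-N]
print(filter_68("for"))      # 'ok'
# an ECM-type case, e.g. "John expects [Bill to win]": V is [-N]
print(filter_68("expects"))  # 'ok'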
None of the bracketed NP-to-VP structures in (69) is adjacent to a [−N], since the wh-word is [+N]. It is of interest that (68) accounts for (33b) as well as (37b). What remains undetermined under this solution is whether filter (68) relates to the theory of binding or LF. Given the organization of a grammar as outlined in footnote 9, it would seem that filter (68) is in principle unrelated to any consideration of LF. There are of course other options. For example, it may well be that filter (68) should in fact relate to LF, since obligatory control is a question of interpretation. For an alternative analysis to filter (68) that is more closely related to LF, see Chomsky (1980).
Notes
* I am indebted to Noam Chomsky, Jan Koster, and Carlos Quicoli for much valuable discussion of the material covered in this article, and to Noam Chomsky and Norbert Hornstein for helpful comments and suggestions regarding an earlier draft. This work was supported by fellowships from the American Council of Learned Societies and the National Institute of Mental Health (grant #1F32 MH 05879–01).
1 Which nodes constitute cyclic domains will be determined by various empirical considerations. Initially, only S [S′ in current terminology] was considered a cyclic domain. In subsequent work it has been suggested that other domains should also be treated as cyclic—e.g., NP (Chomsky (1970); Akmajian (1975)) and PP (Van Riemsdijk (1978a; 1978b); Baltin (1978)). Williams (1975) shows that if all syntactic domains are taken to be cyclic, the natural linear ordering of a sequence of transformations more or less follows as a consequence.
2 For discussion of how this condition applies within the domain of phonological theory, see Kean (1974) and Mascaro (1976).
3 Throughout the article, “C” as an abbreviation for the category COMP will be used in labeled bracket representations; otherwise, “COMP” will be used.
4 For some discussion of the early history of trace theory, see Lightfoot (1976a, section 1). In Chomsky (1977a, chapter 1; 1977b) and Fiengo (1977), the definition of trace crucially involves coindexing. Surface structures containing base-generated empty categories that receive no interpretation via interpretive rules must be excluded by convention. In the present discussion, an empty category—indexing aside—is considered to be subject to the conditions governing traces. This avoids having to stipulate a special convention for base-generated empty categories that occur in surface structure. The only problem here may be the infelicitous use of the term “trace” for base-generated empty categories, which in the base are not “traces” (in a narrow sense) of another category. However, this is really no different from the situation in which a reflexive pronoun is referred to as a bound anaphor when the sentence containing it has no possible antecedent to bind it—e.g., himself disappeared. Cf. the discussion of examples (8) and (9) below.
5 Whether or not there is any motivation for the full range of traces allowed by our definition remains to be determined. The interpretation of trace theory discussed in section 3.2.2 is suggestive in this direction. For now, the question is left open; but see Williams (1977).
6 The analysis of strict cyclicity that follows does not depend on this assumption. Alternatively, it could be claimed that in base structure, only categories containing lexical material (including grammatical formatives like expletive it and existential there) receive indices. Thus, indexing would be done by lexical insertion; and coindexing, by movement and construal rules. At present it is unclear what empirical consequences differentiate between the two alternatives.
7 (8a) is actually part of the definition of “bound anaphor”. It is given here as a part of PB simply as a matter of expository convenience. C-command is defined as follows: Node A c(onstituent)-commands a node B if neither A nor B dominates the other and the first branching node which dominates A dominates B. (Reinhart (1976, 32)) C-command is essentially the notion “in construction with” of Klima (1964). It is more restrictive than the notions of command given in Langacker (1969) or Lasnik (1976). It should be noted that Reinhart’s study concerns primarily free anaphora rather than bound anaphora and that the definition of c-command given above is not the final version in that study. The final version involves a complication for reasons not directly relevant to the present discussion. It is assumed here that PB applies to partially interpreted surface structures.
8 “ti” is used throughout as an abbreviation for trace—i.e., for [βi e], where β denotes a category of the base.
9 The TEP is essentially a restatement of the Dresher-Hornstein Principle (i).
(i) Only designated NP elements can erase traces. (Dresher and Hornstein (1979 (38)))
Under the Dresher-Hornstein analysis, only grammatical formatives like existential there and expletive it count as designated elements. Both the Dresher-Hornstein Principle and the TEP rule out the possibility of determiner spell-out rules, as in Fiengo (1977). (See their article for independent arguments against such rules.) It also follows from (i) and the TEP that there can be no rule of Agent Postposing (cf. Chomsky (1970)). (See also Emonds (1976), Hornstein (1977), and Chomsky and Lasnik (1977) for additional discussion.) Under the assumption that the insertion of there in [NPi e] has no effect on the index i, and that trace erasure crucially involves a change of index, There Insertion does not involve trace erasure. The analysis extends naturally to expletive it. Alternatively, given the nonmovement analysis of extraposition phenomena in Koster (1978a), the insertion of expletive it does not involve trace erasure. Given this characterization of trace erasure, deletions of traces do not constitute trace erasures in the technical sense. Since deletions of categories remove information required for semantic interpretation, it seems reasonable to separate them from that part of the derivation that relates a base structure to its semantic representation (e.g., logical form). Following Chomsky and Lasnik (1977), I assume that sentence grammar is generally organized as follows:
Base
Transformations
I. Deletions
Filters
Phonology
II. Rules of construal
Quantifier interpretation
etc.
(I) determines phonetic representation; and (II), logical form. Given this model, only trace erasure by change of index (and not by deletion of the category with its index) would affect interpretation. The filters in (I) concern local phenomena at the periphery of sentences (e.g., COMP). Filters relating to logical form (i.e., in (II)) are discussed in section 4.
10 The derivation (13) would be excluded under the SCC if, following Williams (1975), we consider all syntactic domains as cyclic. Note however that the TEP excludes only ill-formed structures, unlike the SCC (see the discussion of example (6) above).
11 (20) is not actually a surface structure in the technical sense, since a rule of Subject-Auxiliary Inversion will map it onto (11). The effects of this rule are without consequence for the issues at hand, and therefore as a matter of expository convenience will not be stated in what follows.
12 It will also be shown how the empirical effects of the TEP are derived from independently motivated principles of Universal Grammar (UG). Thus, much of the content of the TEP follows from UG, and the TEP need not be stipulated.
13 The result of such an interpretation is similar to that involving the TSC, as will be shown in section 3.1.
14 This conceptualization of the conditions is due to Chomsky, class lectures (Fall 1976). See Chomsky (1980) for a discussion of how the phenomena of disjoint reference might be incorporated into this analysis.
15 For further discussion of the notion “locality” in the theory of grammar, see Koster (1977; 1978c). It is worth noting that the Opacity Principle, in conjunction with the linking convention, imposes a high degree of locality on trace theory. For comments on the global character of trace theory see Chomsky (1975, 117f).
16 Using the notion “c-command” as defined in footnote 7, we define “syntactic domain” as follows: The domain of a node A consists of all and only the nodes c-commanded by A. (This differs necessarily from the definition given in Reinhart (1976).) From this formulation of (27), it follows that COMP in the embedded S is not in the domain of either a finite clause (i.e., S) or a specified subject. Thus, the “escape hatch” property of COMP falls out as a consequence of this formulation (cf. Chomsky and Lasnik (1977, fn. 13)). See Chomsky (1980) for further discussion.
17 For example, in (25), even if whoi is analyzed as X and its trace in S2 as Y, the Opacity Principle does not apply because there is a c-commanding trace in α (= S2) that binds the trace in S2. Note that, given linking, it might be possible to drop the qualification “not controlled by X” from the SSC. Note also that the linking convention says in effect that the Opacity Principle applies only to anaphors that are free (i.e., unbound) in α. Cf. Chomsky (1980).
18 “np” denotes a base-generated (i.e., unbound) NP trace, [NPi e].
19 This does not exhaust the possibilities with respect to the functioning of the two escape hatches. See section 3.2.1.
20 Alternatively, the binding of ti by whoi violates the Subjacency Condition (Chomsky (1973; 1975c; 1977b))—that is, the binding occurs over two bounding nodes (taking S rather than S′ to be the relevant node for bounding; see Chomsky (1977b) for discussion). So far, Subjacency has been unnecessary for handling Wh Movement violations of the SCC. For the present, there is no real necessity to invoke Subjacency to account for (37); but see appendix 2 for discussion.
21 The exact formulation of this rule is not crucial for the present discussion. See Chomsky and Lasnik (1977) and Chomsky (1980) for a discussion of the properties of obligatory control. Note that it follows from the Opacity Principle that the subject of embedded nonfinite clauses is the only possible NP position of control (Chomsky and Lasnik (1977, fn. 30)).
22 How this analysis holds in the case of idioms—(i) (in contrast to (ii)) for example—remains to be determined. (i)
Advantagei was taken ti of Bill.
(ii)
Billi was taken advantage of ti.
The issue is whether or not ti in (i) marks an argument position in LF. In other words, what is the representation of idioms like take advantage of in LF? As a first approximation, it seems plausible that the analysis of such idioms as complex predicates is correct. If so, then given the internal structure of such predicates, ti in (i) does mark an argument position: the object of take.
23 Again, the proper formulation of the condition will depend on how the mapping from surface structure to LF is done. (45) could just as easily be excluded under a different assumption— namely, that each lexical NP (e.g., John) maps onto one argument position. In this case, the other argument position remains unfilled. (45) could then be excluded on the grounds that not all the argument positions of the predicate are filled. The treatment of positions of obligatory control (i.e., PRO; see Chomsky (1976)) is an exception to this prohibition as stated. There are several options for dealing with this. PRO and trace may be analyzed differently in LF, or possibly the prohibition is to be formulated so that positions of obligatory control are immune, and so on. For additional comments see the discussion surrounding example (56) below, and Chomsky (1977a, chapter 1). Note that, given the prohibition under discussion, it must follow that either the subject of noncausatives such as the one in (i) is not an argument position in LF or that the trace analysis of such constructions shown in (ii) is untenable. (i)
Ice melts slowly in a cold place.
(ii)
Icei melts ti slowly in a cold place.
See Fiengo (1974) for an analysis along the lines of (ii), and Wasow (1977) for a counterargument.
24 Other non-argument positions like COMP and TOPIC (see Chomsky (1977b)) as well as the subject position of copular constructions whose complements do not contain sentential constructions do not allow lexical NPs with null semantic content. At present, there is no explanation for this. Note that this analysis extends quite naturally to existential sentences. Existential there occurs systematically in the subject position of copular constructions. This is not to say that it does not occur in any other contexts, e.g., (i) (Mark Baltin, personal communication):
(i)
There hung on the mantelpiece a portrait of Louis XIV. However, such constructions differ from copular existentials in that the underlying subject may not occur immediately after the verb in the former, whereas this order is preferred in the latter.
(ii)
*There hung a portrait of Louis XIV on the mantelpiece.
There is a portrait of Louis XIV on the mantelpiece.
?*There is on the mantelpiece a portrait of Louis XIV.
For additional discussion of noncopular existentials, see Milsark (1974).
25 How these prohibitions are to be formalized in a theory of grammar will crucially depend on how the theory of LF is formalized—which is still to be determined. Nonetheless, (48) and (49) have interesting consequences for a theory of transformations. For example, notice that (49) alone is sufficient to block some cases that would violate a recoverability condition on deletions. Thus, suppose that “Move NP” were to apply in such a way as to substitute a lexically filled NP for another lexically filled NP. In the case in which the substitution involves the subject and object of an active verb, the result will be like (45b) and therefore ill-formed according to (49), independent of recoverability. It may be that, given the full set of conditions on LF, recoverability need not be stipulated for mappings from base structure to LF. Another interesting consequence concerns the A-over-A Condition (cf. Chomsky (1973)). This condition accounts for the misgeneration that results when a rule applies to an improperly factored string—for example (i), where the Passive transformation’s structural condition imposes a factorization into (X, NP, V, NP, Y). (This example is taken from Chomsky (1973, 235).)
(i)
John and—Bill—saw—Mary.
The man who saw—Mary—bought—the book.
John’s winning—the race—surprised—me.
The Passive transformation maps (i) onto (ii).
(ii)
*John and Mary was seen by Bill.
*The man who saw the book was bought by Mary.
*John’s winning I was surprised by the race.
Within trace theory (ii) translates as (iii).
(iii)
[NPi Johnj and Maryk] was seen tk by Bill
[NPi the manj who saw the bookk] was bought tk by Mary
[NPi John’sj winning Ik] was surprised tk by the race
Notice that (iii) is automatically excluded via (48), since NPj in each instance will not map onto an argument position in LF. Whether this sort of analysis can account for all the A-over-A violations remains to be determined. Note too that a stronger version of Functional Uniqueness, i.e., that no lexical NP may fill more than one argument position in the logical form of a sentence, automatically excludes
the possibility of a rule of Raising to Object. For discussion of the status of this rule see Chomsky (1973); Postal (1974; 1977); Lightfoot (1976b); and Bresnan (1976b).
26 This is not to say that we have a formal proof; clearly we do not. In fact, it seems unlikely at this point that a formal proof can be constructed. Recall that the SCC excludes a class of derivations that are allowed by (54) (cf. the discussion of example (6) above). As noted in section 1, this reduction has no empirical effects because the structures produced by these derivations are well-formed and otherwise generable in ways that do not violate any conditions. Since (54) does not subsume all the effects of the SCC, deriving the SCC as a principle from (54) is not possible. However, it is important to note that (54) holds across languages and therefore may reflect universal properties. The applicability of the Opacity Principle (i.e., the TSC and SSC) to clitic and quantifier phenomena in some Romance languages has been demonstrated by Kayne (1975) and Quicoli (1976a; 1976b; 1979), among others. The fact that the subject of a wh-infinitive is a position of obligatory control holds for other languages that allow these constructions—for example, Hebrew and French.
(i)
Hebrew: John yodea (et) ma lir’ot. ‘John knows (acc.) what to see.’
French: Jean sait qui visiter. ‘John knows who to visit.’
(ii)
Hebrew: *John yodea (et) ma Bill lir’ot. ‘John knows (acc.) what Bill to see.’
French: *Jean sait qui Bill visiter. ‘John knows who Bill to visit.’
(iii)
Hebrew: *John yodea (et) ma lir’ot Bill. ‘John knows (acc.) what to see Bill.’
French: *Jean sait qui visiter Bill. ‘John knows who to visit Bill.’
(iv)
Hebrew: *John noda ma lir’ot. ‘John is known what to see.’
There is no French equivalent of (iv). (I am indebted to Hagit Borer for the Hebrew data and Dominique Sportiche for the French.) As for the conditions on functional structure, no comment seems necessary.
27 (54) does not prohibit rules like Agent Postposing or determiner spell-out (see note 9). However, it is not entirely clear that the addition of such rules will result in misgeneration (i.e., have empirical effects in the sense discussed in connection with example (6) above); but cf. Dresher and Hornstein (1979) and Chomsky and Lasnik (1977, 476).
28 From this point of view, in a theory in which the form of movement rules is limited to “Move Category”, any condition on the form of context or target predicates is yet another condition on the function of rules. For example, the condition of minimal factorization (Chomsky
(1976, 312)) would be so analyzed from the perspective that the form of movement rules is “Move Category”. However, if we assume that the form of transformations includes context and target predicates, then minimal factorization is a condition on the form of rules. For additional discussion, see Chomsky (1973, 232).
29 Clearly, any language whose grammar involves rules that move NPs or wh-phrases must contain the rules (53) minimally. Thus, (53) may also reflect properties of UG.
30 Given that we already have most of the appropriate filters to handle the increase in misgeneration that would result from free indexing (including coindexing) in the base, it may be that facts concerning the surface distribution of NPs and wh-phrases provide no motivation for a movement analysis over base generation of bound traces. (For more on this issue, see Jenkins (1977).) Thus, (53) may well be just an algorithm for binding in much the same way that the strict cycle provides a correct algorithm for derivations. This of course presupposes a solution to the problem of idioms, which is often cited as perhaps the strongest motivation for a movement analysis.
31 Under trace theory, there is no need to assume that this movement is structure preserving in the sense of Emonds (1976). For discussion, see Milsark (1974, chapter 2). All that is required is that a rule of interpretation maps the moved NP onto an argument position in LF.
32 The fact that some cases of lexical bound anaphora seem well-formed even though they involve Subjacency violations as in (i) suggests that (63) should be limited to trace binding.
(i) Wei expect pictures of ourselvesi (each otheri) to be on sale by Sunday.
Furthermore, it is not clear that the boundedness of rightward movements falls out under this interpretation as it does when Subjacency is interpreted as a condition on movement. See Chomsky (1975c; 1977b) and Koster (1978c) for much relevant discussion.
33 It is important to note that Subjacency may not always be applicable in the case of wh-extraction from object NPs because of the possibility of reanalysis. Thus, (i) is well-formed, but (ii) is not:
(i)
What did John write a book about?
(ii)
*What did John destroy a book about?
See Chomsky (1977b, 112ff.) for discussion.
34 See Koster (1978c) for discussion of how the empirical effects of the Subjacency Condition might be derived from other independently motivated conditions. Given the logic of the approach outlined in this article, this should be possible in principle. Whether it can in fact be done is of course an empirical question.
35 Filter (68) crucially depends on the existence of a distinction between lexical NP and trace as opposed to PRO. In Chomsky and Lasnik (1977) this distinction is given in terms of indexing, so that only lexical NP and trace have indices at the point at which the filter applies; PRO is assigned an index by a rule of construal that does not feed into the filtering system (see footnote 9). Under this treatment, only the alternative analysis of indexing mentioned in footnote 6 seems possible. In contrast, the alternative analysis to filter (68) given in Chomsky (1980) appears to be neutral with respect to the two possible analyses of indexing, since under that treatment trace and PRO fall together as opposed to lexical NP. Also, see Weinberg (1977) for a discussion of how this filter might be generalized to cover the facts concerning gerundives.
3 Superiority, Subjacency, and Economy*
In his introduction to volume four of Noam Chomsky: Critical Assessments, Carlos Otero writes:
In true scientific inquiry (in fact, in any kind of rational inquiry including the natural sciences), productive people try to identify and come to understand major factors and see what can be explained in terms of them. They anticipate that there will always be a periphery of unexplained phenomena (a range of nuances and minor effects that require auxiliary assumptions) which should be very sharply separated. (1994:17–18)
Generative grammar provides an ideal case study, as Otero has documented.1 This inquiry has from the outset been a search for guiding principles, the dominant structures, and the major consequences. In the history of generative grammar over the past four decades, among the major factors investigated have been rule systems and systems of grammatical principles and related parameters. One extremely successful line of research has been the elimination of language-particular and construction-specific transformations in favor of three optimally general and presumably universal transformational rules: Substitute α for β, Adjoin α to β, and Delete α (the former two referred to under the designation “Move α” or all three under the designation “Affect α”). This was brought about by the gradual development (1970 to 1979) of a set of general grammatical principles involving phrase structure, bounding, binding, Case, government, and predicate/argument structure,2 many of which function as conditions on representations, rather than conditions on the application of rules (i.e., on derivations), raising a question about the role of derivations in grammar.
If most grammatical principles function as conditions on representations, then perhaps all conditions on derivations could be replaced by conditions on representations so that derivations would become essentially epiphenomenal. One result in this direction was the demonstration in Freidin (1978) that the empirical effects of the Strict Cycle Condition of Chomsky (1973) could be derived from independently motivated conditions on representations.3 Such derivations highlighted the interconnection of the proposed conditions, beginning with Chomsky’s demonstration of how many of Ross’s island constraints (Ross 1967/1986)4—and in particular the Complex NP Constraint—followed from the Subjacency Condition (Chomsky 1973, 1977b—henceforth Subjacency). This led to the useful research strategy of trying to derive the empirical effects of one condition from another (or others) that appear to be more basic or have a broader empirical range. The guiding principle in this research is to eliminate redundancy (i.e., overlapping conditions) under the assumption that the language faculty is nonredundant. When a construction is prohibited by more than one condition, it is assumed that this construction is more strongly deviant than those that are prohibited by only one of the conditions. This is an empirical hypothesis, though not as yet particularly well motivated.
The Superiority Condition (Chomsky 1973) provides an interesting case study of the connections between grammatical principles and the effort to subsume certain ones under others. The original formulation of the Superiority Condition (henceforth Superiority) in Chomsky (1973) applies exclusively to leftward movement:5 (1)
No rule can involve X, Y in the structure
…X…[α…Z…−WYV…]…
where the rule applies ambiguously to Z and Y, and Z is superior to Y.
A category Z is superior to Y when Z asymmetrically c-commands Y.6 The standard examples involve wh-movement in a sentence containing multiple wh-phrases where one phrase is superior to the other, as in (2a–b): (2)
a. *What did who read?
b. Who read what?
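Since “superior to” is defined purely configurationally, the condition lends itself to a small executable sketch (in Python; the toy tree encoding and all helper names are my own assumptions, not part of the original analysis). The sketch computes c-command in Reinhart’s sense over a flattened phrase marker for (2) and tests for asymmetric c-command:

# Toy phrase marker for "who read what": S immediately dominates
# NP(who) and VP; VP dominates V(read) and NP(what).
TREE = ("S", [("NP", "who"), ("VP", [("V", "read"), ("NP", "what")])])

def collect(node, parent=None, out=None):
    # Flatten the tree into records with parent/children links.
    if out is None:
        out = []
    label, content = node
    ident = len(out)
    out.append({"label": label, "parent": parent, "children": []})
    if parent is not None:
        out[parent]["children"].append(ident)
    if isinstance(content, list):
        for child in content:
            collect(child, ident, out)
    else:
        out[ident]["word"] = content
    return out

def dominates(nodes, a, b):
    while b is not None:
        b = nodes[b]["parent"]
        if b == a:
            return True
    return False

def c_commands(nodes, a, b):
    # Reinhart: neither dominates the other, and the first branching
    # node dominating A also dominates B.
    if dominates(nodes, a, b) or dominates(nodes, b, a):
        return False
    p = nodes[a]["parent"]
    while p is not None and len(nodes[p]["children"]) < 2:
        p = nodes[p]["parent"]
    return p is not None and dominates(nodes, p, b)

def superior(nodes, z, y):
    # Z is superior to Y iff Z asymmetrically c-commands Y.
    return c_commands(nodes, z, y) and not c_commands(nodes, y, z)

nodes = collect(TREE)
who = next(i for i, n in enumerate(nodes) if n.get("word") == "who")
what = next(i for i, n in enumerate(nodes) if n.get("word") == "what")
print(superior(nodes, who, what))  # True: moving 'what' violates (1), cf. (2a)
print(superior(nodes, what, who))  # False: moving 'who' is fine, cf. (2b)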
Given a rule of wh-movement that can move either the subject who or the object what into [Spec,CP],7 Superiority prohibits the movement of what because it is asymmetrically c-commanded by who. Under this formulation Superiority is a condition on the application of wh-movement and is thus a condition on derivations.
One widely discussed attempt to derive the empirical effects of Superiority involves the Empty Category Principle (ECP) of Chomsky (1981), under the crucial assumption that the ECP applies to LF representations.8 Under this proposal (2a) and (2b) would have the LF representations along the lines of (3a) and (3b) respectively:
(3)
a. [CP [NP2 who1 [NP2 what]] [IP x1 read y2]]
b. [CP [NP1 what2 [NP1 who]] [IP x1 read y2]]
Note that this analysis assumes that unmoved wh-phrases in (2) are simply adjoined at LF to the wh-phrase in [Spec,CP]. It follows from the adjunction operation that the head of the phrase to which a wh-phrase is adjoined remains the head of the entire adjunction structure. Thus, in (3a) who fails to antecedent govern its trace because it does not c-command it—in contrast to (3b), where NP1 c-commands and hence antecedent governs the trace in subject position. In both (3a–b) the trace in object position satisfies the ECP because it is lexically governed by the verb. On this analysis Superiority, a condition on derivations, is reduced to the ECP, a condition on LF representations. This analysis crucially depends on a disjunctive formulation of the ECP where either lexical head government or antecedent government suffices (as in Lasnik and Saito 1984, 1992).9
While this reduction may work for Superiority violations involving a superior subject in a finite clause, it fails for all other superior positions. Consider (4a–b) with the representations (5a–b):
(4)
a. *What did John expect who(m) to read?
b. *What did John persuade who(m) to read?10
(5)
a. [CP what1 did [IP John expect [CP t1 [IP who(m) to read t1]]]]
b. [CP what1 did [IP John persuade who(m) [CP t1 [IP PRO to read t1]]]]
In (4a) the complement subject will undergo exceptional Case marking by the matrix verb. Under the standard GB analysis the infinitival subject position is lexically governed by the matrix verb. Thus antecedent government of the trace in complement subject position by the wh-phrase in [Spec, CP] is not required. Under the Spec/head agreement analysis of Case assignment (Chomsky and Lasnik 1993, Chomsky 1993), who(m) would move to [Spec, AgrO-P] in the matrix clause where it would antecedent govern its trace in complement subject position. (5a) therefore involves a Superiority, not an ECP, violation. In (5b) the superior wh-phrase is in object position where it is lexically governed. Thus (5b) is also a Superiority violation, but not an ECP violation.11
Another configurational analysis of superiority effects is given in Pesetsky (1987) involving the relation between paths of moved wh-phrases (or in more contemporary terms, between wh-chains). Specifically, Pesetsky proposes a Nested Dependency Condition as formulated in (6):
(6)
Nested Dependency Condition (NDC)
If two wh-trace dependencies overlap, one must contain the other.
(6) accounts for the following contrast (his (22a–b)): (7)
a. ?What book2 don’t you know who1 to persuade t1 to read t2?
b. *Who1 don’t you know what book2 to persuade t1 to read t2?
In (7a) the chain (who1, t1) is nested inside the chain (what book2, t2), whereas in (7b) the chains intersect because what book lies inside the chain created by the movement of who while its trace is outside that chain.12 To handle Superiority violations within a single clause (e.g., (2a)), Pesetsky adopts the analysis of Jaeggli (1981), where the wh-phrase that remains in-situ at S-structure is adjoined to the clause that dominates the moved wh-phrase—that is, adjunction to CP in our analysis or to S′ in Jaeggli’s. This analysis corresponds to the standard raising analysis of quantifiers (May 1977) in which scope ambiguities are resolved in LF representations—specifically, the quantifier with wider scope will c-command the quantifier with narrower scope at LF. This seems inappropriate since multiple wh-phrase constructions do not manifest such scopal ambiguities. The answer to a question like (2b) is simply a paired list of people and objects. It involves a mapping from readers to things read. In contrast the LF representation allowed by the NDC (i.e., where what takes wide scope over who) corresponds to an interpretation where the mapping is from things read to readers. Yet a list of what was read by whom (e.g., “Ulysses was read by Bill and Mary, and Middlemarch was read by Jane, Fred, and Sally,…”) is a thoroughly unnatural response to a question like (2b).13
The NDC generalizes to certain Subjacency violations.14 Thus (8) violates both the NDC and Subjacency:
(8)
*[CP what1 do [IP you wonder [CP who2 [IP Bill gave t1 to t2]]]]
The chain (who2, t2) intersects the chain (what1, t1) and the link of the latter chain crosses two bounding categories (2 IPs). Nonetheless, (8) cannot be attributed to a crossed binding constraint like the NDC because of examples like (9), which are equally unacceptable but where there is no occurrence of crossed binding, as discussed in Freidin and Lasnik (1981). (9)
*[CP who2 do [IP you wonder [CP what1 [IP Bill gave t1 to t2]]]]
Presumably (9) is only a Subjacency violation (but see note 40 for a different analysis). It is not a Superiority violation because at the point in the derivation where what moves to the internal [Spec, CP], it asymmetrically c-commands who (on virtually every analysis of double object constructions that has been proposed in the literature). In contrast the movement of who to the internal [Spec, CP] in (8) violates Superiority because the wh-phrase what is superior.15 Such paired examples suggest that a geometric approach to movement constraints like the NDC (or the Path Containment Condition of Pesetsky 1982b) is not viable.
At this point we might raise the question of whether the movement in (9) of who to the external [Spec, CP] over what in the internal [Spec, CP] constitutes a Superiority violation. Undoubtedly, the internal [Spec, CP] c-commands the object of the PP in the embedded IP. Nonetheless, the wh-phrase in [Spec, CP] could not move to the external [Spec, CP] because then the selectional requirements of wonder would not be satisfied. If we change the matrix verb to remember, which does not require a [+WH] CP complement, then it is possible to derive (10):
(10)
[CP what1 do [IP you remember [CP t1 [IP Bill gave t1 to whom2]]]]
The question is whether the movement of what to the matrix [Spec, CP] is driven by Superiority as well as Subjacency, which is violated in (9) by the movement of who to the external [Spec, CP]. In other words, is Superiority limited to wh-phrases in grammatical function positions? This seems like a reasonable way to sharpen the formulation of Superiority.16 Note that those Romance languages in which a relative pronoun can be extracted out of a wh-island17 provide some evidence that Superiority does not apply between a superior wh-phrase in [Spec, CP] and a wh-phrase in a grammatical function position. If it did, then those constructions should be blocked by Superiority.
Another connection between Superiority and Subjacency arises in constructions such as (11), which can be associated with two distinct S-structures (12a–b):
(11)
*What did you forget who had borrowed?
(12)
a. *[CP what1 did [IP you forget [CP who2 [IP t2 had borrowed t1]]]]
b. *[CP what1 did [IP you forget [CP t1 [IP who2 had borrowed t1]]]]
The chain (what1, t1) in (12a) violates Subjacency and Superiority (since who, which asymmetrically c-commands the complement object from the internal [Spec, CP], could also move to the external [Spec, CP]). In contrast, derivation of the chain link between the two traces in (12b) violates only Superiority.
Suppose that the language faculty assigns examples like (11) a unique mental representation at each level of representation. Within the recent “minimalist” framework (Chomsky 1993, 1995c) in which the levels of representation are restricted to the interface levels PF and LF, the problem we have identified cannot be discussed in terms of assigning two distinct S-structure representations. Rather, the computation of (11) “crashes”—that is, it fails to converge (meaning “yield a legitimate structural description”) at one or both interface levels. Under a standard GB analysis (11) is computationally ambiguous. Computational ambiguity of this sort seems rather inefficient for language design, so perhaps it can be excluded. Let us conjecture that the only kind of computational ambiguity we will find in languages is that in which the two computations are convergent, as in the standard case of structural ambiguity (e.g., a review of a book by two linguists). Given this conjecture, the language faculty assigns a unique computation to (11) and therefore (11) crashes for a particular reason. If this is correct, then certain aspects of derivations are not epiphenomenal, as some earlier research was interpreted as suggesting.
From this perspective the problem with (11) is to determine why the computation crashes. Does the failure to converge involve PF or LF (or both)? The answer will depend on how and where Superiority and Subjacency apply to derivations or the representations they create, which in turn depends on other factors concerning the derivation of (11). Putting aside the issue of at what level or levels nonconvergence occurs, we have two distinct possibilities: the computation of (11) crashes because of Superiority or Subjacency. The choice between these two possibilities can be resolved if it follows from the theory of grammar that either the wh-phrase in subject position must move into [Spec, CP] prior to the movement of the wh-phrase in object position (in which case we get a Subjacency violation) or the wh-phrase in subject position does not move to a [Spec, CP] until after the wh-phrase in object position moves to the external [Spec, CP]. Under the latter scenario Subjacency is not violated, but Superiority is. Of course, in the absence of a theory that chooses one analysis over the other, we might try to identify some data whose analysis requires one option and excludes the other.
The standard versions of GB do not distinguish the two computations of (11) given in (12). Thus, (11) is excluded either because it violates two conditions or only one. Under the assumption that a construction that violates two general principles will be perceived as more deviant than one that violates only one of the two, this analysis of (11) cannot stand. One way of computing (11) must be wrong. Fortunately, there is a way to tease apart the Superiority versus Subjacency analyses empirically. Consider the well-known fact that superiority effects disappear when bare interrogative pronouns are replaced by which-phrases, as illustrated in the contrast between (13a) and (13b):18
(13)
a. *What did who borrow?
b. Which books did which students borrow?
This fact gives us a tool for testing the sort of violation that occurs in (11). If the corresponding construction with which-phrases instead of bare interrogative pronouns is less deviant, then we know that (11) involves a Superiority violation. (14)
Which books did you forget which students had borrowed?
Given that (14) is significantly better than (11)—some speakers (myself included) find it essentially normal—(11) appears to be a Superiority violation, not a Subjacency violation. Having identified some empirical evidence for a unique analysis of (11), we now need to show how the analysis follows from the theory of grammar. I am assuming that examples like (13b) and (14) are not necessarily part of the primary language data of every child who acquires a grammar of English. Even if they were, the analysis of constructions like (11) does not follow automatically.
Within the current minimalist framework as sketched in Chomsky (1993) there are two economy of derivation principles that could be used to predict that a wh-phrase in subject position that is not moved out of its clause remains in-situ in the derivation to PF19 (henceforth “overt syntax”20): Procrastinate and Greed. Procrastinate is based on the notion that LF movement is somehow preferable to overt movement (see Chomsky 1993:30). Thus, if nothing requires the movement of a subject wh-phrase in overt syntax, it will not move in the overt syntax. The derivation to LF does require the covert movement of a wh-phrase subject to [Spec, CP] in order to create a proper quantifier/variable structure. Otherwise, the derivation crashes at LF because the result violates Full Interpretation (FI)—a quantifier that does not bind a variable cannot be interpreted. Greed, a somewhat related principle, is characterized as self-serving Last Resort, where the Last Resort principle licenses a step in a derivation “only if it is necessary for convergence” (Chomsky 1993:32). “Derivations are driven by the narrow mechanical requirement of feature checking only, not by a ‘search for intelligibility’ or the like” (Chomsky 1993:33). Under Chomsky’s version of Greed, an element moves only to check its own features. As will be discussed below, because this requirement appears to be too strong for wh-movement, a modified version of Greed will be proposed.
Given that quantifier/variable structures need not occur in overt syntax (as in the case of non-wh-quantifiers), it is reasonable to assume that the construction of these structures can be postponed to the derivation to LF. The application of Procrastinate to (11) therefore suggests that (12b) should be a viable PF representation. However, Greed pulls harder in the other direction. Following the proposal for feature checking in Chomsky (1993), let us assume that languages have strong or weak features; that strong features are visible at PF whereas weak features are invisible; that features that are checked “disappear”; and that strong features that remain unchecked at PF cause a derivation to crash. Let us further assume that [+WH] is a strong feature in English.21 Therefore, it must be checked in overt syntax, or the derivation will crash at PF. For reasons that will become clear below, I am going to assume that this wh-feature is the one attached to C and that it is checked via Spec/head agreement when a wh-phrase moves into the relevant [Spec, CP]. Thus, if C contains [+WH], this feature must be checked in the overt syntax, or the derivation will crash at PF.
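The bookkeeping behind this strong-feature logic can be pictured with a toy sketch (in Python; the function and all of its encodings are my own illustrative assumptions, not a formalization from the text): strong features left unchecked in overt syntax crash the derivation at PF, and a wh-phrase that never reaches any operator position crashes it at LF as an FI violation.

def check_derivation(c_heads, wh_phrases, overt_moves):
    # c_heads: feature of each C head, '+WH' or '-WH'.
    # wh_phrases: the wh-phrases in the sentence.
    # overt_moves: wh-phrase -> index of the C whose Spec it reaches
    # in overt syntax (None = the phrase stays in situ until LF).
    checked = set(overt_moves.values()) - {None}
    # PF: a strong [+WH] feature left unchecked is visible and fatal.
    for i, feat in enumerate(c_heads):
        if feat == "+WH" and i not in checked:
            return "crashes at PF (unchecked strong feature)"
    # LF: with no operator position at all, an unmoved wh-phrase
    # cannot form a quantifier/variable structure, violating FI.
    if wh_phrases and not checked and "+WH" not in c_heads:
        return "crashes at LF (FI violation)"
    return "converges"

# cf. (15b): a [+WH] matrix C with the wh-phrase left in situ
print(check_derivation(["+WH", "-WH"], ["what"], {"what": None}))
# cf. (15c): no [+WH] C anywhere, so the wh-phrase never moves
print(check_derivation(["-WH", "-WH"], ["what"], {"what": None}))
# "who read what": 'who' checks the one [+WH] C; 'what' adjoins at LF
print(check_derivation(["+WH"], ["who", "what"], {"who": 0, "what": None}))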
Consider a verb such as expect that does not allow indirect question complements. The sentence (15a) could be represented as either (15b) or (15c): (15)
a. *John expects the students read what.
b. [CP [C +WH] [IP John expects [CP [C −WH] [IP the students read what]]]]
c. [CP [C −WH] [IP John expects [CP [C −WH] [IP the students read what]]]]
In (15b) the wh-phrase has failed to move to the external [Spec, CP], hence the strong wh-feature in the external C is not checked, and the derivation crashes at PF because it is visible. At LF the wh-phrase can still move to the external [Spec, CP] to check the wh-feature (too late in this instance); however, this will create the requisite quantifier/variable structure needed to interpret the wh-phrase at LF. So the derivation converges at LF. Greed applies both to LF, where it governs the movement of an object or infinitival complement subject NP to [Spec, AgrO-P], and to PF, where it determines the movement of the subject of a finite clause from [Spec, VP] to [Spec, AgrS-P] (= [Spec, IP]). In (15c) there is no strong feature to be checked, thus no reason for the wh-phrase to move at all in the overt syntax or the covert syntax, given Greed. Thus, the derivation of (15c) crashes at LF because the wh-phrase cannot be properly interpreted—a violation of FI.
If we substitute the verb wonder, which obligatorily selects a [+WH] C, for the matrix verb in (15a), the derivation crashes at PF because this strong feature is not checked. Greed would allow the wh-phrase to move to the internal [Spec, CP] at LF, hence the derivation would converge at LF. If that movement did not occur, then the derivation crashes at LF too for the familiar reason. If the matrix verb in (15a) is changed to forget, which optionally selects a [+WH] C, then either the matrix C or the complement C or both could contain [+WH]. If both do and there is only one wh-phrase, then one will not be checked, and the derivation crashes at PF. If there are two wh-phrases in the construction, then each wh-feature can be checked and the derivation converges at PF (as in (14)). If neither matrix nor complement C contains the wh-feature, then Greed causes the derivation to crash at LF as an FI violation.
Now consider the case in which there are multiple wh-phrases and only one [+WH] C—e.g., who read what? If feature checking applied to a feature of the wh-phrase, then such questions should crash at PF, contrary to fact. For this reason we have been assuming that a wh-phrase checks the wh-feature in C and not conversely.22 As for the LF representation of multiple wh-questions in which one wh-phrase remains in-situ, let us assume that the moved wh-phrase provides an adjunction site to which the wh-phrase in-situ can move at LF to create the appropriate quantifier/variable structure. This analysis requires that we modify the analysis of Greed for wh-movement. The wh-phrase is not moving into [Spec, CP] to check its own [+WH] feature, but rather the one contained in C. Let us assume that the purpose of Greed is to maximize convergent derivations (i.e., profits). If a particular movement does not lead to a convergent derivation, then Greed blocks it. Apparently the wh-feature in C is independent of whatever feature accounts for the formation of yes/no questions. Thus (16a) crashes either at PF because there is a [+WH] feature in the external C that is not checked or at LF because the wh-phrase cannot move to create a quantifier/variable structure.
(16)
a. *Does John really expect [−WH] the students to read what?
b. What does John really expect [−WH] the students to read?
c. Does John really expect [−WH] the students to read Barriers?
In contrast the derivation of (16b) converges at PF because the [+WH] feature in the matrix C is checked by the wh-phrase, and at LF for the familiar reason. (16c) does not involve a [+WH] C, as indicated by the contrast between (16a) and (16c).
With this analysis in mind we can return to the computation of (11) as a violation of Superiority rather than Subjacency. This concerns the way feature checking is carried out when a wh-phrase occurs in [Spec, IP]. To see what this involves, let us consider the verbs expect, forget, and wonder, which differ in their ability to take indirect question complements. As the paradigm in (17) illustrates, expect cannot take an indirect question complement, forget may (but does not have to), and wonder must.
(17)
a. *John expects who to borrow the book.
b. Who does John expect to borrow the book?
c. John forgot who had borrowed the book.
d. Who did John forget had borrowed the book?
e. John wondered who had borrowed the book.
f. *Who did John wonder had borrowed the book?
(17a) crashes at LF for the same reasons that (15a) does, depending on which analysis of the external C we choose. In (17b) who winds up in the matrix [Spec, CP] to have its [+WH] feature checked by [+WH] C. The chain it forms involves the complement [Spec, CP]—in which case Subjacency is satisfied. (17c) and (17e) seem to be identical in the relevant respects. In both complements the wh-phrase subject raises to [Spec, CP] to check the [+WH] feature.23 (17f) crashes at PF because the [+WH] feature of the complement C has not been checked.24 In contrast the feature of the complement C in (17d) is [−WH] and therefore does not require overt material in [Spec, CP].
Notice that we have been assuming that the wh-phrase in [Spec, IP] raises to [Spec, CP] so that checking of the wh-feature can occur. There is another possibility—namely, that checking occurs without movement to [Spec, CP]. If this is feasible, then Greed will rule out the string-vacuous movement of the wh-phrase in the overt syntax. The movement of the wh-phrase from [Spec, IP] to [Spec, CP] would occur in the covert syntax, the derivation to LF, as predicted by Procrastinate. Such derivations would require that the [+WH] feature of complement C be checked by the wh-phrase in complement subject position. While this analysis might appear feasible for (17c) and (17e), it would create a problem for constructions such as (18) where the matrix verb selects a [−WH] C:
(18)
a. Which books does John expect which students to borrow?
b. [CP which books1 did [IP John expect [CP t1 [IP which students2 to borrow t1]]]]
If the selectional feature on C for forget and wonder can be transferred to I, then presumably the same thing should happen with expect, thereby creating a Spec/head agreement violation. Since (18) is perfectly acceptable, its computation should not violate
any constraint on movement or selectional restriction. In this way constructions such as (18) argue against the possibility that the [+WH] feature on C can be checked by a wh-phrase in [Spec, IP]. (As we will see below, an even stronger empirical argument can be given against this analysis.)
Given Greed, the computation of (18a) cannot involve successive cyclic Move α through the internal [Spec, CP] because movement of the wh-phrase to this position is not motivated by feature checking. Also, a two-step derivation would violate the principle of Least Effort (Chomsky 1991), provided there is a way of accounting for this long-distance movement in a single step. Following Chomsky (1993), let us now assume that the basic movement operation is not Move α, but rather Form Chain. Form Chain constructs (18b) in one step by moving which books to the external [Spec, CP] and at the same time constructing the chain consisting of the trace in the internal [Spec, CP] and the trace in complement object position.25 The chain is well formed since its links are minimal (i.e., satisfy Subjacency etc.). Given this formulation, it follows that Subjacency, now a condition on chain links, must be construed as a condition on representations.26 In this way (18) provides empirical evidence for Form Chain under a theory that includes Greed and a principle of Least Effort.
In (18) the wh-phrase in the complement subject position could not be raised to the internal [Spec, CP] position, which would have created a putative Subjacency violation. However, a corresponding example with forget (e.g., (14) above) is susceptible to this analysis. Assuming the Form Chain analysis, the derivation of (14) with respect to wh-movement can be computed as two steps in (19) or one step in (20):
(19) [CP which books1 did [IP John forget [CP which students2 [IP t2 had borrowed t1]]]]
(20) [CP which books1 did [IP John forget [CP t1 [IP which students2 had borrowed t1]]]]
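The choice between (19) and (20) is an economy comparison over derivations, which can be pictured with a toy sketch (in Python; the operation strings are merely mnemonic labels of mine): Form Chain counts as one step no matter how many links the resulting chain has, and Least Effort selects the derivation with fewer steps.

def least_effort(derivations):
    # Each derivation is a list of operations; one application of
    # Form Chain is one step, however many chain links it creates.
    fewest = min(len(d) for d in derivations)
    return [d for d in derivations if len(d) == fewest]

# cf. (19): two steps, 'which students' first, then 'which books'
two_step = ["FormChain(which students -> complement Spec,CP)",
            "FormChain(which books -> matrix Spec,CP)"]
# cf. (20): one step, 'which books' forms a two-link chain directly
one_step = ["FormChain(which books -> matrix Spec,CP, two links)"]
print(least_effort([two_step, one_step]))  # only the one-step (20) survives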
In (20) the one-step derivation consists of moving the complement object to the external [Spec, CP] and forming a chain, which satisfies Subjacency. The two-step derivation in (19) can be given in two ways, depending on the order of the operations. If the complement subject which students is moved first to the complement [Spec, CP], then the application of Form Chain to the complement object which books will create a chain that violates Subjacency. The alternative derivation is countercyclic: which books moves long distance to the matrix [Spec, CP], forming a chain that satisfies Subjacency, and then which students moves into the complement [Spec, CP]. A principle of Least Effort selects (20) over (19) ceteris paribus. Given the status of (14), it seems clear that the proper computation for constructions such as (14) is (20) and not (19). As noted earlier, Subjacency violations do not exhibit varying effects depending on whether a which-phrase or a bare interrogative is involved.
Furthermore, forget can, like expect, select a [−WH] C, as discussed above. If the complement C in (20) does not contain [+WH], Greed prevents the complement subject from moving to complement [Spec, CP] at LF to create a quantifier/variable structure. Instead, it must move to the external [Spec, CP], adjoining to the wh-phrase there to create a complex quantifier that is appropriate for the pair-list reading of such constructions.27 However, when both wh-phrases in a multiple wh-question have moved to a [Spec, CP] in the overt syntax, the pair-list reading is blocked, as illustrated in (21):
(21)
a. Which student did the professor tell which book to read?
b. Which book did the professor tell which student to read?
In (21b) the question asks for a pair-list answer, whereas in (21a) it does not—instead, it asks for the identity of student x such that the professor told x which book to read. The difference in interpretation is directly related to the difference in the syntactic structures of (21a) and (21b). In (21a) both wh-phrases have moved in the overt syntax to [Spec, CP] positions; therefore, at LF they form independent quantifier/variable structures. In contrast only one wh-phrase has moved to [Spec, CP] in (21b), the other remaining in-situ. However, at LF, the in-situ wh-phrase (which student) must move to [Spec, CP] in order to form a quantifier/variable structure. The only c-commanding [Spec, CP] position is already filled with a wh-phrase; therefore the in-situ wh-phrase adjoins to the wh-phrase in [Spec, CP], forming an absorption structure that gives the pair-list reading. Notice also that the non-pair-list reading of (14) may be computationally more complex than that of (21a) because it involves two intersecting quantifier/variable chains, while those in (21a) do not intersect.
Although this analysis accounts for expect and forget (e.g., (18) and (14)), the corresponding construction with wonder, which obligatorily selects [+WH] C, unlike forget, raises some further problems:28
(22)
Which books did John wonder which students had borrowed?
The status of (22) is on a par with (14), though perhaps slightly degraded because of the computational complexity noted above. Significantly, (22) does not have a pair-list reading like (14), but only the one asking “for which books x did John wonder which students had borrowed x.” The fact that (18) can only have a pair-list reading and (22) cannot would follow if the complement subject of (22) moves to [Spec, CP] in the overt syntax. In this way (22) provides evidence against the Vacuous Movement Hypothesis (cf. Chomsky 1986b). As with the corresponding example with forget, (22) has two potential analyses, given in (23):
(23)
a. [CP which books1 did [IP John wonder [CP which students2 [IP t2 had borrowed t1]]]]
b. [CP which books1 did [IP John wonder [CP t1 [IP which students2 had borrowed t1]]]]
It would seem that (23b), the blocked Superiority violation derivation, and not (23a), the Subjacency violation, provides the appropriate computation. And yet the selectional feature of wonder requires (23a) at PF. If (23a) must feed the PF computation of (22), the countercyclic derivation seems the only way to avoid a Subjacency violation.
We have been assuming that feature checking of wh-features applies at PF for good reason. Consider what results if we adopt the alternative assumption—namely, that wh-features are checked at LF. (22) is no longer a problem because the wh-phrase in [Spec, IP] could raise to the complement [Spec, CP] at LF to check the wh-feature in C. However, this mechanism would also check the same feature in (24a), so that the explanation for the deviance of such constructions is lost.
(24)
a. *John wondered Mary really admires who(m)
b. *John forgot Mary really admired who(m)
If forget in (24b) selects [+ WH], checking features at LF will not account for such constructions—in contrast to feature checking at PF, where the wh-phrase remains in-situ. Thus, checking of wh-features occurs at PF, not LF. It is worth noting that this account of (22) provides evidence for the derivational (as opposed to a representational) approach to grammar:
Under a derivational approach, computation typically involves simple steps expressible in terms of natural relations and properties, with the context that makes them natural wiped out by later steps of the derivation and not visible in the representations to which it converges.29 (Chomsky 1995b)
The second step in the derivation of (23a) wipes out the context that allowed the first step. Ironically, the proposed derivation for (22) violates the Strict Cycle Condition (SCC) as formulated in Chomsky (1973).30 This would be a serious problem if the SCC were an axiom of the theory, but not if the empirical effects of the SCC followed from other considerations (see Freidin 1978, Kitahara 1995). Given the Form Chain analysis of wh-movement outlined above, it should be clear that Subjacency alone no longer accounts for any deviant wh-island construction.31 Consider the derivation of (25):
(25)
* Which students did John wonder which books had borrowed?
As in (22), the movement of each wh-phrase in (25) is motivated by Greed. If which books moves to the internal [Spec, CP] first, then the chain formed by a second movement of which students to the external [Spec, CP] will violate Subjacency. Presumably, this derivation converges at PF but crashes at LF. If, however, the two applications of Form Chain are reversed, then neither chain formed will violate Subjacency. The reason that (25) is deviant, but (22) is not, must follow from some other condition—in this case the ECP. The chain (which students1, t1) in (25) violates the ECP at LF because the trace is not properly governed. In contrast the trace in the LF chain (which books2, t2) in (22) must be properly governed. In the latter case it does not seem as if antecedent government will hold since both (22) and (25) appear to be relativized minimality violations of the same sort. This suggests that we may need to retain head government to explain the difference between (22) and (25), which involve another subject/object asymmetry.32 Chomsky (1993) proposes that the notion “shortest link,” expressible in terms of the operation Form Chain, might be used to incorporate parts of Subjacency and the ECP under the intuitive formulation of a Shortest Movement Condition expressed in (26):
(26) “Shortest Movement” Condition
Given two convergent derivations D1 and D2, both minimal and containing the same number of steps, D1 blocks D2 if its links are shorter. (Chomsky op. cit.: 34)
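To make the comparison in (26) concrete, here is a minimal Python sketch, assuming a toy encoding of positions as integers; the encoding and the function names are illustrative only, not part of the theory. It treats a derivation as a list of movement links and lets the derivation with uniformly shorter links block the other:

def link_lengths(derivation):
    # a derivation is modeled as a list of (landing_site, extraction_site)
    # pairs, with positions encoded as indices into the terminal string
    return sorted(abs(landing - site) for landing, site in derivation)

def blocks(d1, d2):
    # D1 blocks D2 if, link for link, D1's links are no longer and at
    # least one is strictly shorter (cf. (26))
    l1, l2 = link_lengths(d1), link_lengths(d2)
    assert len(l1) == len(l2)  # same number of steps, as (26) requires
    return all(a <= b for a, b in zip(l1, l2)) and l1 != l2

# toy positions for the overt links of (27a-b), discussed just below:
# in (27b) who moves string-adjacently, while in (27a) what crosses the subject
d_27b = [(0, 1)]  # who -> its subject trace
d_27a = [(0, 4)]  # what -> its object trace
print(blocks(d_27b, d_27a))  # True: the shorter-link derivation wins

On the same toy encoding the LF links of (3a–b) come out identical, which anticipates the point made below about where the condition can apply.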
Chomsky suggests that the phenomena of Superiority and Relativized Minimality (including superraising, the Head Movement Constraint, and [Spec, CP] islands, including wh-islands) should fall out from such economy considerations, though an explicit account is not provided. (See Kitahara 1993 for a more detailed analysis of Superiority effects.) To see how the Shortest Movement Condition (henceforth Shortest Movement) subsumes the effects of Superiority, consider how it would account for the simplest case, exemplified by (2). A question immediately arises about what part of the two derivations is relevant to the computation of this condition. Every derivation will involve an overt part, the derivation to PF, and a covert part, the derivation to LF. In the overt part, (2a–b) would involve (27a–b) respectively—pretending that the subject does not originate inside VP to keep the representations simple.
(27)
a. [CP what2 [C′ did [IP who1 [VP read t2]]]]
b. [CP who1 [C′ [IP t1 [VP read what2]]]]
The link between what and its trace in object position is obviously longer than the one between who and its trace. However, in the LF representations (3a–b) (repeated here for convenience), the links are the same for each derivation since both phrases have moved to the same two positions:33
(3)
a. [CP [NP2 who1 [NP2 what]] [IP x1 read y2]]
b. [CP [NP1 what2 [NP1 who]] [IP x1 read y2]]
This shows that Shortest Movement applies in the overt syntax, not to LF.34 It also shows that Shortest Movement will not account for the difference between (22) and (25) since each example involves the movement of both wh-phrases to the same two positions. The same situation obtains in the more complicated case of (28), where the two wh-phrases start out in different clauses:35
(28)
a. Who(m)1 did John persuade t1 [to visit who(m)2]
b. *Who(m)2 did John persuade who(m)1 [to visit t2]
Taking (28a–b) to represent a pair of convergent derivations in the overt syntax (PF), the link between who1 and its trace is, again obviously, shorter than the link between who2 and its trace. Thus, the difference in acceptability between (28a) and (28b) follows from Shortest Movement. Given Greed, the wh-phrase in complement object position cannot move to the [Spec, CP] of the complement because persuade does not select a [+ WH] C. (28b) is only a Superiority violation under the Form Chain analysis given that a trace of
the moved wh-phrase will occur in the complement [Spec, CP], whereas under a Move α analysis (assuming Greed) it would be both a Superiority and a Subjacency violation. As with the analysis of (11), these two analyses can be distinguished by replacing the bare interrogative pronouns in (28) with which-phrases: (29)
a. Which students1 did John persuade t1 [to visit which professors2]
b. Which professors2 did John persuade which students1 [to visit t2]
(29b) is perfectly normal.36 However, the derivation of (29b) is identical to that of (28b), which is blocked by Shortest Movement. This contrast argues against Shortest Movement as formulated in (26). The condition is simply too general to make the fine distinctions apparently needed.37 Having dispensed with Shortest Movement, we can return to the two analyses of (28b) under Greed. With Form Chain, (28b) is only a Superiority violation. Given this, we would expect (29b) to be normal, as it is. Under the Move α analysis (29b) would still be a Subjacency violation since (28b) is both a Superiority and Subjacency violation. Thus (29b) shows that (28b) is only a Superiority violation, thereby providing further empirical evidence for the Form Chain analysis of movement. That is, Move α appears to be incompatible with Greed. Notice that there are other constructions whose derivations involve no apparent difference between the Move α and Form Chain analyses and about which Shortest Movement also makes the wrong prediction. The derivation of (30a–b) involves one NP-movement from complement subject to matrix subject and one wh-movement from matrix clause to [Spec, CP]:
(30)
a. Which books1 [t1 seem to which students2 [t1 to be boring]]
b. To which students2 do [which books1 seem t2 [t1 to be boring]]
(30b) should be blocked by (30a) under Shortest Movement.38 As might be expected, the same phenomena occur with passive constructions, which also involve one NP-movement:
(31)
a. Which students2 were [t2 persuaded t2 [to read which books1]]
b. Which books1 were [which students2 persuaded t2 [t1 [to read t1]]]
As illustrated in (32) and (33), these constructions can be recast as indirect question complements of forget and wonder:
(32)
a. I forgot which students were persuaded to read which books.
b. I forgot which books which students were persuaded to read.
(33)
a. I wonder which students were persuaded to read which books.
b. I wonder which books which students were persuaded to read.
Moreover, all of these except (33a) can be turned into direct questions with only a slight degradation in acceptability, as in (34) and (35):39
(34)
a. Which students did you forget were persuaded to read which books?
b. Which books did you forget which students were persuaded to read?
(35)
a. *Which students do you wonder were persuaded to read which books?
b. Which books do you wonder which students were persuaded to read?
(35a) violates the selectional property of wonder, hence the derivation crashes at PF because the [+ WH] feature in C of the CP complement of wonder is not checked. In (35b) that feature is checked by the movement of the complement subject which students into the complement [Spec, CP]. Notice that there are two possible derivations here using Form Chain. Given that there is a [+ WH] feature in C of the CP complement of wonder, which books could move to the [Spec, CP] of that complement, forming a chain, and then move to the matrix [Spec, CP], forming a second chain. Alternatively, the wh-phrase could move directly to the matrix [Spec, CP], forming a single chain. Economy considerations along the lines of Least Effort (see Chomsky 1991) would preclude the first option. The wh-feature attached to the CP complement of wonder will be checked in PF by the overt movement of which students from [Spec, IP] to [Spec, CP]. Note that this derivation provides another more complicated example that necessarily violates the Strict Cycle Condition. So far nothing precludes the possibility of a cyclic derivation of constructions like (22) and (35b), where the long-distance movement would invariably result in a chain that violates Subjacency. This is not a problem, however, as long as the countercyclic derivation is available. In general when the theory allows a convergent and a nonconvergent derivation for the same string, the convergent derivation usually masks the nonconvergent one. Thus, we can ignore the cyclic derivation that gives a bad result, though ideally the cyclic derivation would be blocked by some principle of grammar. Under the countercyclic Form Chain analysis of wh-movement discussed above, no Subjacency violation could occur in a construction with a single embedded clausal complement. Thus the standard cases of wh-island violations involving the movement of two wh-phrase complements of V, as in (36), can no longer be explained as Subjacency violations. (36)
a. *Which books did you forget to which students Bill recommended?
b. *To which students did you forget which books Bill recommended?
On the cyclic derivation the examples in (36) violate Subjacency in the usual way, but not on the countercyclic derivation. Therefore, something aside from Subjacency must prohibit the countercyclic movement inside the complement CP. Just prior to this movement Form Chain creates the following structures:
(37)
a. [CP which books1 did [IP you forget [CP t1 [IP Bill recommended t1 to which students2]]]]
b. [CP to which students2 did [IP you forget [CP t2 [IP Bill recommended which books1 t2]]]]
Since a trace is an empty category analogue of its antecedent, t1 in (37a) is an NP, whereas t2 in (37b) is a PP. The countercyclic movement of the wh-PP in (37a) or the wh-NP in (37b) should be ruled out by the nondistinctness condition on substitutions because the traces are categorially distinct from the phrases that replace them. In this way both possible derivations of (36) have a bad outcome.40
At this point we might wonder whether Subjacency plays any role at all regarding the movement of multiple wh-phrases in a sentence. That it does play a crucial role can be demonstrated with examples containing at least two successive sentential embeddings:
(38)
*Which books couldn’t Mary remember which students John expected to read?
With respect to wh-movement the derivation of (38) involves two steps: which students moves from the subject of read to [Spec, CP] of the complement of remember, and which books moves from the object of read to the [Spec, CP] of the root clause. This can be accomplished either cyclically or countercyclically. In the cyclic derivation the movement of which books is a straightforward Subjacency violation. The application of Form Chain in this instance creates a chain, one of whose links will cross two IPs because the intervening [Spec, CP] is occupied by which students. The countercyclic derivation is somewhat more interesting. Form Chain moves which books to the root CP and creates a chain with traces in all the intervening [Spec, CP] positions. This chain observes Subjacency. The next step in the derivation requires Form Chain to move which students to the [Spec, CP] of the complement of remember. If the resulting structure is as given in (39) (where t2 is the trace of which books), then the chain formed by which students and its trace constitutes another Subjacency violation. (39)
…[CP which students1 [IP John expected [CP t2 [IP t1 to read t2]]]]
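Before turning to what (39) shows, the chain-and-link reasoning above can be made concrete with a minimal Python sketch, assuming a toy encoding that is entirely illustrative (each chain member is recorded simply as the depth of the clause, i.e., IP, containing it, and a link violates Subjacency if it crosses more than one IP boundary):

def observes_subjacency(chain):
    # chain: clause depths of its members, head first, foot last
    return all(lower - upper <= 1 for upper, lower in zip(chain, chain[1:]))

# countercyclic derivation of (38): which books leaves a trace in every
# intervening [Spec, CP], so each link crosses exactly one IP
which_books = [0, 1, 2, 3]
print(observes_subjacency(which_books))     # True

# which students then moves to the [Spec, CP] below remember, but the next
# [Spec, CP] down holds a trace of which books that Form Chain cannot
# replace, so its single link crosses two IPs, as in (39)
which_students = [1, 3]
print(observes_subjacency(which_students))  # False: a Subjacency violation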
(39) results only if Form Chain cannot replace a trace with another trace. In contrast, Form Chain can substitute a wh-phrase for a trace (for feature checking, as required by Greed). Note that even the cyclic derivation can be interpreted in this way. Thus Subjacency violations for such constructions follow as a consequence of the restriction on chain formation by the operation Form Chain.41
The analysis of wh-movement presented here has been motivated by principles of economy, primarily Greed with assistance from Least Effort. Taking Greed to be perhaps the major motivating force for substitution operations leads to the choice of Form Chain over Move α as the proper formulation for the movement transformation. As a result, successive cyclic movement is revealed as an illusion, its effects subsumed under Subjacency (applied to the chain links created by Form Chain) and a rather natural prohibition against replacing one trace with another via Form Chain. The analysis of Superiority phenomena demonstrates that Form Chain is not subject to a Shortest Movement constraint (as proposed in Chomsky 1993), which therefore plays no role in
motivating Form Chain over Move α. Furthermore, the analysis of wh-movement discussed in this paper suggests that some derivations may have to be countercyclic, not strictly cyclic. Therefore derivations may be countercyclic or cyclic, with improper consequences ruled out by other factors.
Much research in generative grammar over the past four decades has quite properly attempted to identify and explore large-scale generalizations about syntactic processes, including the A-over-A Principle, the wh-Island Condition, the Strict Cycle Condition, Subjacency, Koster’s Locality Principle (Koster 1978b), Connectedness (Kayne 1983), Global Harmony (Koster 1986), and currently the Shortest Movement Condition. It seems to me that the history of the field has shown that these large-scale syntactic generalizations tend to break down at various points and have to be restricted or replaced by more fine-grained principles (e.g., Greed, which essentially concerns morphological properties). It is therefore not surprising to find that a closer look at superiority phenomena and wh-movement under minimalist assumptions leads to the conclusion that successive cyclic movement is illusory, that derivations can be countercyclic and hence the empirical effects of the Strict Cycle Condition must be derivable from other principles of grammar, that the Shortest Movement Condition does not appear to be viable, and that the concept “wh-island” turns out to be completely spurious.
In the middle of Samuel Beckett’s play Endgame a character exclaims (“With fervour”), “Ah the old questions, the old answers, there’s nothing like them!” This is a profoundly irrational attitude, as anyone actively engaged in rational inquiry (including the natural sciences, as Carlos Otero pointedly notes in the quote at the beginning of this paper) knows. And as any generative grammarian who has been following the field over the past few decades knows all too well, it is demonstrably false. The great power of rational inquiry is that it can lead us to see the world in a new and hopefully clearer light. It liberates us from the tyranny of habit and the domination of concepts whose authority over us is without real justification. The more accurate our understanding of the world, the better our chances of finding real solutions to the problems we study. One hopes that this uniquely human ability can be applied to other aspects of human activity to improve the way we live as individuals and in society.
Notes
* I am indebted to Len Babby, Sam Epstein, and Howard Lasnik for discussion of various portions of the material covered here and to audiences at USC and UC Irvine for comments on oral presentations of this material. I am further indebted to Epstein and Lasnik for extensive comments on a draft of this paper.
1 See also Otero (1984, 1991, and in preparation).
2 See Chomsky (1981) for the classic discussion and Freidin (1994b) for further discussion of the historical development.
3 For discussion of how strict cyclicity might be derived from the economy conditions of Chomsky (1993) see Kitahara (1995). See also Collins (1994), which attempts to derive the empirical effects of the Generalized Proper Binding Condition from conditions on the economy of derivations.
4 Henceforth, two dates separated by a slash indicate the date of an unpublished Ph.D. dissertation followed by the date of a subsequent published version.
5 It turns out, however, that Superiority, like Subjacency, applies to rightward movement as well. Consider the paradigm in (i):
(i)
a. A report about the new power station that was written by three senators just appeared.
b. A report about the new power station just appeared that was written by three senators.
c. *A report that was written by three senators just appeared about the new power station.
Given that the PP about the new power station can extrapose when it is the sole adjunct in an NP, as shown in (ii), we now have to explain why it cannot extrapose when it occurs with another adjunct—in this case a reduced relative clause—in the NP:
(ii)
A report just appeared about the new power station.
Under the natural assumption that the two adjuncts in the NP are in an asymmetric c-command relation, (i-c) is a straightforward Superiority violation now applied to rightward movement. This result is quite general across a range of NP constructions. For details and discussion see Freidin (1994c). It is worth mentioning that if Superiority generalizes to rightward movement, then it probably cannot be reduced to a condition on Operator Disjointness as proposed in Lasnik and Saito (1992) (see also Epstein 1993 for further discussion).
6 Recall that in 1973 the notion “c-command” had not been developed, and therefore Chomsky’s original definition of the relation “superior” is not identical. Chomsky (1973) gives the following definition: “We say that the category A is ‘superior’ to the category B in the phrase marker if every major category dominating A dominates B as well but not conversely.” The term “major category” designates N, V, and A and the categories that dominate them. However, if the rightward movement cases mentioned in the previous note fall under Superiority, then the formulation of the condition might be simplified to just asymmetric c-command. The formulation of Superiority in (1) requires a further clarification under the Move α analysis. Since Move α does not distinguish between wh-movement and NP-movement (i.e., of non-wh-NPs), the condition should not be interpreted as applying to a construction where, for instance, a non-wh-NP is superior to a wh-phrase. Thus, (i) does not constitute a Superiority violation even though Move α applies to both the complement subject and object: (i)
Who does Mary seem to like?
It is not that Move α applies ambiguously to two constituents in an asymmetric c-command relation. What is crucial is that the application of Move α affects the same constituent X.
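As an aside, the simplification of Superiority to asymmetric c-command mentioned in this note can be rendered as a small Python sketch; the encoding is entirely illustrative (nested tuples stand in for phrase markers, and the function names are invented for the purpose):

def contains(tree, x):
    # x occurs somewhere inside tree (or is tree itself)
    return tree == x or (isinstance(tree, tuple) and any(contains(c, x) for c in tree))

def c_commands(tree, a, b):
    # a c-commands b if some sister of a contains (or is) b
    if not isinstance(tree, tuple):
        return False
    for i, child in enumerate(tree):
        sisters = [s for j, s in enumerate(tree) if j != i]
        if child == a and any(contains(s, b) for s in sisters):
            return True
        if c_commands(child, a, b):
            return True
    return False

def superior(tree, a, b):
    # on the simplified formulation: a is superior to b iff a
    # asymmetrically c-commands b
    return c_commands(tree, a, b) and not c_commands(tree, b, a)

clause = ("who", ("read", "what"))      # subject over VP-internal object
print(superior(clause, "who", "what"))  # True
print(superior(clause, "what", "who"))  # False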
7 To avoid complicating the exposition, I will use current analyses rather than the historically accurate ones unless the difference is significant. In this case adopting the S′ → Comp S analysis rather than the CP/IP analysis of clauses does not affect the discussion. 8 This proposal appears in Jaeggli (1981), who credits it to Chomsky (class lectures 1979). There is a somewhat different analysis involving the same idea in Aoun, Hornstein, and Sportiche (1980), which utilizes a special mechanism of Comp-indexing at S-structure. As we will see below, there is no need to posit such a mechanism under minimalist assumptions, hence the LF representations here are different from previous analyses (though most similar to May 1985), but the difference does not affect the ECP account. It should be noted here that this was not the first attempt to derive Superiority. See Koster (1978b), where a general c-command condition on locality (his Locality Principle) is
proposed as an alternative to the Specified Subject Condition, parts of Subjacency, and Superiority, among other phenomena. See also the Priority Filter of Fiengo (1980).
9 See the discussion of examples (22) and (25) below for some evidence that seems to support the disjunctive formulation. This particular analysis also crucially depends on not adopting the May/Chomsky theory of adjunction structures (May 1985, Chomsky 1986b, 1993), where an adjunction structure creates a two-segment category. Under this theory a category α Dominates (the capitalization indicates the special definition of the term) a category β if all the segments of α dominate β. (Notice that we still rely on the standard definition of domination to define “Dominates.”) We then substitute “Dominate” for “dominate” in the definition of c-command. Thus, both who in (3a) and what in (3b) will c-command their respective traces because the first category that Dominates them is CP. We might try to save the ECP analysis by adjoining the in-situ wh-phrase to CP. For the examples in (2) this would have the effect that the phrase adjoined to CP would not c-command its trace because there would be no category that Dominates it. However, this strategy would immediately fail with embedded constructions as in (i), with the LF representation (ii):
(i) *I wonder what who read.
(ii) I wonder [CP who1 [CP what2 [IP x1 read y2]]]
In (ii) who would c-command its trace because the matrix VP is the first branching category that Dominates who.
10 See Pesetsky (1982b) and Hendrick and Rochemont (1988) where such cases are discussed as ‘pure’ superiority effects.
11 Now we might try to save the ECP analysis of superiority effects by noting that at LF, who(m) in (4a) will be adjoined to what in the external [Spec, CP]. Although the trace of who(m) in complement subject position will be antecedent governed by the trace in [Spec, AgrO-P], the trace in [Spec, AgrO-P] would not be antecedent governed by who, if—crucially—we reject the May/Chomsky adjunction theory (see note 9). A similar analysis can be constructed for (4b). Nonetheless, this analysis creates a fatal problem. Given that an object must move to [Spec, AgrO-P] at LF for Case checking, the trace of what in [Spec, AgrO-P] for (2b) will also fail to be antecedent governed. Under the May/Chomsky adjunction theory and on the analysis that the LF movement of the wh-phrase involves adjunction to the wh-phrase in the matrix [Spec, CP], the adjoined wh-phrase in [Spec, CP] will c-command and hence antecedent govern its trace in [Spec, AgrO-P]. Thus none of the Superiority violations in (2a) and (4) will violate the ECP. It is worth noting that the Pollock/Chomsky analysis of the functional category structure of clauses creates a problem for (2b) unless we abandon the LF analysis of the ECP or we adopt the May/Chomsky theory of adjunction. See Cheng and Demirdash (1990) for an alternative ECP analysis of the superiority effects in constructions like (4). Under this analysis the ECP applies at two distinct levels: antecedent (XP) government at S-structure and head (X0) government at LF. Such analyses are excluded under the minimalist framework of Chomsky (1993), given that S-structure is not a legitimate level of syntactic representation.
12 Pesetsky claims that the following pair of examples also demonstrates a difference that involves the Nested Dependency Condition:
(i)
a. ?This is one book which2 I do know who1 to talk to t1 about t2.
b. *John is one guy who1 I do know what book2 to talk to t1 about t2.
These examples are unconvincing because they are essentially on a par with the corresponding examples where the to-phrase and about-phrase have been switched. (ii)
a. ?This is one book which2 I do know who1 to talk about t2 to t1.
b. *John is one guy who1 I do know what book2 to talk about t2 to t1.
(ii-a) is no more or less deviant than (i-a) even though the two wh-chains intersect in (ii-a) while they are nested in (i-a). The deviance of (ii-b) seems equal to that of (i-b), even though the former involves nested chains.
13 This problem does not arise for the analysis of (2b) if we assume that adjunction of what to who at LF creates an absorption structure along the lines of Higginbotham and May (1981), where the NP that dominates both wh-phrases bears the index of each. Alternatively, if we adopt the May/Chomsky analysis of adjunction, then both wh-phrases will c-command their respective traces. Further, it will not matter under either analysis whether adjunction occurs to the right or the left. For the NDC to hold for (2a) adjunction would have to occur to the right. Yet even if it did, it is not obvious that the resulting structure involves an instance of crossed binding—especially if the NP that dominates both wh-phrases bears the index of who.
14 Note that (7a) and (i-a) of note 12 also apparently violate Subjacency though not the NDC.
15 Thus (i) (in contrast to (ii)–(iii)) is a Superiority violation because the direct object asymmetrically c-commands the object of the PP:
(i) *Who(m) did John give what to?
(ii) To whom did John give what?
(iii) What did John give to whom?
Note that (ii) is problematic for some binary branching analyses of double object verbs (e.g., Larson 1988) where the NP what is superior to the PP to whom.
16 However, under the hypothesis that violations of multiple conditions increase the perceived deviance of a construction, this formulation would not explain why (8) and (9) are perceived as equivalently deviant. If we abandon this hypothesis, nothing follows about the proper formulation of Superiority. Note that restricting Superiority to wh-phrases in A-positions eliminates a redundancy between Superiority and Subjacency that otherwise holds for wh-island violations in English-type languages.
17 See Rizzi (1980) and Sportiche (1981) for discussions of Italian and French respectively.
18 See Freidin (1974). The observation has also been credited to Richard Kayne and also to Michael Brame, and doubtless should be to many others who read Chomsky (1973) when it first appeared. Thus, whereas (2a) constitutes a Superiority violation, (13b) is absolutely normal. We will return to this and other related facts below.
19 Cf. the Vacuous Movement Hypothesis of Chomsky (1986b:89–90)—namely, “vacuous movement is not obligatory at S-structure.” See Freidin and Lasnik (1981:n. 14), where the nonmovement of the wh-phrase complement subject in (ii) is proposed to explain the noticeable difference in acceptability between (i) and (ii), a problem raised in Chomsky (1980):
(i) Who did you wonder what saw?
(ii) What did you wonder who saw?
Thus only (i) would be a Subjacency violation, whereas (ii) would be a Superiority violation. Freidin and Lasnik suggest that the deviance of (ii) involves a relaxing of the requirement of wonder that its Comp (i.e., complement CP) contain a wh-phrase to the less stringent requirement of adjacency between the verb and the wh-phrase. See also George (1980) for further discussion of an analysis in which wh-phrase subjects do not move string vacuously. As we will see below, this particular analysis is not available under recent theory, nor is the Vacuous Movement Hypothesis more generally.
20 Note that we cannot say that the wh-phrase in subject position remains in-situ at PF, since at the level of PF the relevant structure is missing. See Chomsky (1991:n. 10). Since there is no level of S-structure in the framework, we can only talk about the differences between overt word order and LF representations in this more complicated way.
21 See Watanabe (1992) for an analysis in which the feature is also strong in wh-in-situ languages such as Chinese and Japanese.
22 Related to this is the analysis of echo questions in which the wh-phrase remains in-situ. Presumably the interpretation of such questions is not identical to the corresponding question in which the wh-phrase has been fronted. Let us assume provisionally that there is no wh-feature in C that requires checking in these constructions. Exactly how they are interpreted at LF remains to be determined.
23 There is a derivation of (17c) that would converge at PF and crash at LF—namely, the one in which the complement C was [−WH]. However, when we process a string to which the grammar can assign two distinct derivations, one legitimate and the other not, we generally ignore the illegitimate derivation. That is, we do not treat such cases as true structural ambiguity, where we can view the string one way and then another.
24 Note that this analysis does not need to refer to satisfaction of a selectional property, along the lines of Lexical Satisfaction as in Freidin and Babby (1984) and Freidin (1992), not to be confused with Satisfy of Chomsky (1993).
25 Form Chain is proposed in Chomsky (1993) to resolve a conflict between two natural notions of economy, shortest move and fewest steps. In what follows some empirical evidence will be discussed that supports Form Chain over Move α under certain assumptions about the economy of derivations. This evidence also concerns the choice between shortest move and fewest steps. Form Chain bears a strong resemblance to the proposal for wh-movement in Bresnan (1971), which was offered to counter the argument in Postal (1970) against successive cyclic wh-movement, concerning prepositions stranded by successive cyclic wh-movement. Translating Bresnan’s proposal to the contemporary analysis, a wh-phrase moves directly to a [Spec, CP] governed by a [+WH] C. The rest of the Form Chain proposal is missing. However, Bresnan and Grimshaw (1978) proposes a single long-distance movement of a wh-phrase followed by coindexation of the Comp nodes intervening between the wh-phrase and its extraction site provided a Comp does not contain its own wh-phrase.
26 Notice that in this analysis Subjacency is not an interface condition. It applies immediately to the output of Form Chain; thus it has the flavor of a condition on rule application. Although it might be argued that Subjacency is a condition on the operation Form Chain, this interpretation seems weak. We do not know that any given application of Form Chain violates Subjacency until we have formed the chain and computed its links. If the argument that Subjacency applies to Form Chain as a condition of rule application were valid, then any condition on representations could be similarly construed as a condition on the application of
rules. Thus, if filter F blocks the output of rule R, then we would be able to say that F blocks the application of R—which is true. However, F says nothing specifically about R, as is the case with true conditions on derivations.
27 This contradicts Pesetsky’s D-linking analysis of which-phrases, in which it is claimed that they do not move at LF, in contrast to bare interrogative pronouns—thus, they do not create crossed binding configurations that violate the NDC. However, without the NDC there is no explanation for the difference in behavior between D-linked versus non-D-linked wh-phrases. I am assuming that the previous discussion renders the NDC suspect at best. See Lasnik and Saito (1992) for further criticism of Pesetsky’s analysis.
28 Chomsky (1986b) refers to the corresponding example with bare interrogative pronouns (i) as “at worst a weak wh-island Condition violation”:
(i) What did you wonder who saw?
(See note 19 above for additional discussion and references.) Let us assume for the moment that (i) is also a Superiority violation under the Form Chain analysis. However, the derivation of (22) would not be, since a which-phrase may move over a superior which-phrase without inducing unacceptability. See note 18. The selectional feature of wonder can also be checked by being morphologically instantiated as if or whether.
29 The examples Chomsky cites concern head movement and segmental phonology. (22), and perhaps (14) as well, provides an example involving XP movement.
30 It also violates the extension version of the strict cycle mentioned in Chomsky (1993). However, I will not pursue this issue here.
31 Under the Move α analysis, Subjacency would have to be a condition on derivations because the intermediate trace in the internal [Spec, CP] would be destroyed by the movement of the complement subject wh-phrase to that [Spec, CP]. Thus, we arrive at a conclusion similar to that of Lasnik and Saito (1984), though for different reasons. In their analysis the wh-phrase does not have to leave a trace in [Spec, CP] and therefore does not. Hence Subjacency is violated if it is interpreted as a condition on representations, but not if it applies to derivations. What is correct about their analysis is the insight that Subjacency is directly tied to the operation of movement rules. If the analysis proposed here is on the right track, it shows again how misleading our descriptive generalizations can be. Wh-islands come in varying strengths, depending on what grammatical principles their computations violate (i.e., ECP vs. Superiority).
32 This analysis requires a disjunctive formulation of the ECP as argued for most recently in Lasnik and Saito (1992). See also Chomsky (1981), Huang (1982), and Lasnik and Saito (1984) for discussion of the disjunctive formulation of the ECP.
33 It makes no difference whether we create absorption structures at LF or adjoin the in-situ wh-phrase to CP.
34 See Kitahara (1993) for a detailed discussion of how Shortest Movement accounts for a range of Superiority violations.
35 (28) is essentially the one cited in Chomsky (1993:14), except that the who/whom distinction is quite weak in my idiolect. Thus I prefer (28a) with who in both wh-phrase positions.
36 For me (29a) is slightly degraded compared to (29b), which is on a par with (28a).
37 (29b) not only provides empirical evidence against the notion of shortest move, but it also provides evidence for Form Chain as opposed to Move α, crucially assuming Greed. Thus, the motivation for Form Chain can no longer be credited to a conflict between two natural notions of economy, shortest movement and fewest steps, as mentioned in note 25.
38 The corresponding paradigm with bare interrogatives is interesting because both cases seem to be deviant. (i)
a. ?*What seems to whom to be boring?
b. *To whom does what seem to be boring?
(i-b) is a straightforward Superiority violation; (i-a) seems only slightly less deviant than (i-b) for reasons that remain obscure.
39 There may be a reading where both (35a–b) are acceptable—namely, where the string do you wonder is parsed as a parenthetical. We are not considering that analysis, but rather the one in which wonder functions as the main verb of the sentence.
40 At best Subjacency provides only a partial explanation for the strong deviance of these constructions. Ideally, the cyclic derivation would be unavailable so that Subjacency would not be involved at all in explaining these examples. Note that (8) and (9) still pose a problem under the assumption that only the movement of NPs is involved. However, given that such constructions are strongly deviant, on a par with those in (37) if not worse, a different analysis seems to be involved. Suppose that (8–9), like (37), involve the movement of two distinct categories (NP and PP) in spite of appearances. If so, then the stranding of the preposition must come about by some other process. This could be achieved under the copying+deletion analysis of movement discussed in Chomsky (1993). See Freidin (1999b) for details.
41 It is worth noting that such examples cannot be subsumed under a nested dependency or path containment condition of the sort proposed in Pesetsky (1982, 1987).
4 Cyclicity and minimalism*
For well over three decades, some notion of the cyclic nature of transformations, and of derivations more generally, has guided theorizing about the nature of the language faculty. In this chapter, I evaluate current discussions of cyclicity in terms of previously identified empirical and conceptual motivations for a syntactic cycle. In section 4.1, I give a brief history of the cycle in generative grammar, showing in particular that the original motivation for a cycle is no longer valid under current basic assumptions. In section 4.2, I discuss the current status of the syntactic cycle under various proposals within the Minimalist Program. I demonstrate how all of the empirical motivation cited for stipulating a cyclic principle falls under other independently motivated principles and analyses, so that stipulating a cyclic principle is redundant with respect to other parts of grammar (cf. Freidin 1978). In section 4.3, I evaluate recent proposals for deriving the empirical effects of a generalized cyclic principle from other principles of grammar. I subject these proposals to a minimalist critique, showing that they are based on nonminimal assumptions. Instead, I propose that the empirical effects of a generalized cyclic principle can be derived from a theory of elementary transformational operations that is optimal from a minimalist perspective.
4.1 The cycle in generative grammar: a brief history
The transformational cycle was first proposed around 1964 as a way to eliminate generalized transformations from the theory while accounting in a natural way for some empirical observations about the general operation of transformations. Within the earliest formulations of transformational grammar, phrase structure rules were assumed to be nonrecursive. Recursion in phrase structure was handled instead by generalized transformations, which applied to two or more phrase markers, either embedding one inside another or conjoining them. In addition to these generalized transformations, there were singulary transformations, which operated on a single phrase marker (P-marker). As a result, a derivation was represented by a transformation marker (T-marker), indicating how the system of transformations applied. Consider the example discussed in Chomsky’s 1964 lectures at the Linguistic Society of America Summer Institute (Chomsky 1966). Sentence (1) would be derived from three base P-markers underlying the (kernel) sentences (2), the derivation being represented by the T-marker (3), where B1, B2, and B3 represent each of the base P-markers.
(1) I expected the man who quit work to be fired
(2)
a. I expected it
b. someone fired the man
c. the man quit work
(3) [T-marker diagram not reproduced; the derivation it encodes is described in the following paragraph]
The T-marker (3) shows that the three base P-markers in (2) are processed as follows. First, (2c) (= B3) undergoes the transformation that turns it into a relative clause. Next, that derived P-marker is embedded in the P-marker for (2b) (= B2), after which the passive transformation converts the expanded P-marker into a passive construction (the man who quit work was fired by someone) and a deletion transformation removes the phrase by someone. Finally, the resulting P-marker is embedded into the P-marker underlying (2a) (= B1) and the embedded sentential structure is converted into an infinitival. Putting aside the antiquated nature of the particulars of this analysis, the incorporation of T-markers in the theory of grammar created several complications. One concerned the then-evolving semantic theory for generative grammar first proposed by Katz and Fodor (1963). Katz and Fodor had to propose two types of projection rule, whose purpose was to construct readings for constituents. Type I projection rules operated on underlying P-markers to construct a reading. Type II projection rules were proposed to account for the effects of generalized transformations that applied to pairs of P-markers. Katz and Postal (1964) demonstrated that the contribution of generalized transformations could be limited to the amalgamation of readings of the P-markers joined together by the operation. Thus, the only function of type II rules was to assign the reading of the embedded structure (a base P-marker) to the P-marker into which it was embedded. Another complication concerning generalized transformations involved restrictions on the organization of T-markers as discussed by Fillmore (1963). As interpreted by Chomsky (1966), Fillmore’s observations were essentially that (a) generalized transformations need not be ordered (in contrast to singulary transformations), (b) there are no cases where a matrix P-marker must be operated on by a singulary transformation before a constituent P-marker is embedded in it by a generalized transformation (though there are cases where a singulary transformation must apply to the matrix P-marker after the constituent P-marker has been embedded), and (c) the embedding operation should be viewed as a substitution operation that inserts a sentential P-marker in place of a “dummy symbol” ∆ (equivalent to a categorially unspecified position). As Chomsky noted, the earlier theory of T-markers allowed for more complex orderings between singulary and generalized transformations. He concluded, “It is therefore quite natural to generalize from these empirical observations, and to propose as a general condition on T-markers that they must always meet Fillmore’s conditions,” as illustrated in (3) (1966, 62).
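Purely as an illustration of how the T-marker (3) sequences these operations, here is a toy Python sketch; it is not from the original text, and flat strings stand in for structured P-markers (real transformations are structure-dependent), so it is only a mnemonic for the order of steps:

# the three base P-markers of (2), as strings
b1 = "I expected it"
b2 = "someone fired the man"
b3 = "the man quit work"

rel = "the man who quit work"                           # relativize B3
embedded_b2 = b2.replace("the man", rel)                # embed in B2
passive = "the man who quit work was fired by someone"  # passivize
agentless = passive.replace(" by someone", "")          # delete the agent phrase
infinitival = agentless.replace(" was fired", " to be fired")  # infinitivalize
print(b1.replace("it", infinitival))                    # embed in B1 -> (1)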
However, as Chomsky also noted, this condition as formulated “appears to be quite ad hoc” (p. 62). Chomsky’s solution was to eliminate generalized transformations in favor of allowing recursion in phrase structure rules so that they could construct generalized P-markers directly. To account for the kind of ordering restrictions Fillmore observed, Chomsky introduced the transformational cycle as a general principle of rule application.1 The linear sequence of singulary transformations applies to the most deeply embedded sentential structure in the generalized P-marker. Then, as Chomsky outlines the process, “Having completed the application of the rules to each such structure, reapply the sequence to the ‘next-higher’ structure dominated by S in the generalized phrase-marker” (1966, 63). This continues until the root S has been so processed. Chomsky gave the argument for the cycle as follows:
The advantages of this modification are obvious. It provides a more highly structured theory which is weaker in expressive power; in other words, it excludes in principle certain kinds of derivational pattern that were permitted by the earlier version of transformational theory, but never actually found. Since the primary goal of linguistic theory is to account for specific properties of particular languages in terms of hypotheses about language structure in general, any such strengthening of the constraints on a general theory is an important advance. Furthermore, there is good internal motivation for enriching the structure (and hence decreasing the expressive power) of transformational theory in this way, namely, in that this modification permits us to eliminate the notion of “generalized transformation” (and with it, the notion “T-marker”) from the theory of syntax. Hence the theory is conceptually simpler. Finally, the theory of the semantic component can be simplified in that type two projection rules are no longer necessary at all. (1966, 65)
As stated, the argument rests on a mixture of empirical and conceptual/methodological issues. The reduction in expressive power is motivated by the empirical fact that certain derivational patterns are not found in natural languages. The elimination of generalized transformations and T-markers, as well as of type II projection rules in the semantic component, simplifies the theory conceptually. The general methodological goal of accounting for the specific properties of individual languages via general principles of language also favors postulating the cycle. At this stage in the development of the theory of transformational grammar, the cycle was used as an argument for eliminating generalized transformations. Yet the current theory of bare phrase structure has reintroduced generalized transformations into the theory of grammar, at the same time relying on a version of the cyclic principle. Since this does not lead to any obvious problem, it should be clear that there is no conceptual incompatibility between the cycle and generalized transformations. Actually, generalized transformations became superfluous with the introduction of recursion in phrase structure rules. Without phrase structure rules, generalized transformations are again necessary to embed one piece of phrase structure within another.
Following the enunciation of the cyclic principle, a new empirical argument for the cycle was introduced. The general form of the argument involved identifying a pair of transformations α and β such that the derivation of some sentence of a language required the applications α>β>α, where α had to apply before β in a particular sentential domain and then α had to apply after β in a higher sentential domain. Such arguments tried to establish that the application of β depended on the prior application of α and the second application of α depended on the prior application of β.2 Such arguments turned out to be quite fragile, primarily because they were based on particular formulations of transformational rules. Change the formulation of the rule and the argument collapsed. For example, Ross’s (1976) important analysis of the cyclic nature of pronominalization depended on the existence of a transformational operation (called pronominalization) that converts a referential expression into a coreferential pronoun. The argument vanishes under current analyses where pronouns are inserted in a derivation directly from the lexicon and later interpreted. Several other specific empirical arguments were based on the existence of a There-Insertion transformation and of a Raising transformation that included the extraposition of the infinitival predicate—rules that play no role in the current principles-and-parameters framework. However, even where these rules were assumed, this form of empirical argument for the cycle turned out to be defeasible. Kimball (1972) demonstrated that transformations could apply in a strict linear fashion: transformation T1 applies iteratively to increasingly larger domains and then another transformation T2 applies to the derived P-marker in the same fashion, so that the strict linear ordering T1>T2 holds globally. Kimball showed how such linear derivations generated the same structures that cyclic derivations produced. (See Freidin 1976 for further discussion of the empirical motivation for the cycle based on rule-ordering arguments.) The formulation of the cycle in Chomsky 1965 and 1966 left open the possibility that a singulary transformation could still apply to a cyclic subdomain of the current cycle, in violation of the spirit though not the letter of the cyclic principle. Thus, the application of a rule to a matrix sentential domain would create a context for the application of a rule to a constituent sentential domain. Chomsky (1973) proposed a sharper formulation of the cyclic principle that eliminated this loophole. This reformulation, called the Strict Cycle Condition (henceforth SCC), is given in (4).
(4) Strict Cycle Condition
No rule can apply to a domain dominated by a cyclic node A in such a way as to affect solely a proper subdomain of A dominated by a node B which is also a cyclic node. (1973, 243)
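Schematically, cyclic application and the check that (4) imposes might be sketched as follows; this is an assumption-laden Python toy of my own devising, not the original formalism, with cyclic domains indexed from the most deeply embedded (0) outward:

def cyclic_apply(domains, rules):
    # the sequence of singulary transformations applies first to the
    # innermost sentential domain, then to each successively higher one
    for i in range(len(domains)):
        for rule in rules:
            domains[i] = rule(domains[i])
    return domains

def scc_permits(current_cycle, affected_cycle):
    # the SCC (4) bars a rule applying on cycle n from affecting solely
    # a proper subdomain belonging to a lower cycle
    return affected_cycle >= current_cycle

# e.g., a rule applying on the matrix cycle (1) that affects solely the
# embedded domain (0) is barred
print(scc_permits(current_cycle=1, affected_cycle=0))  # False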
Chomsky (1973) mentioned one example relevant to the SCC, given in (5) (= his (57) and (58)), but did not go into the details of the analysis. (5)
a. *what did he wonder where John put
b. COMP he wondered [S COMP John put what where]
Given an underlying structure (5b), if where moves into the embedded COMP position, then movement of what into the matrix COMP is blocked by other principles—notably
the Subjacency Condition, which prohibits movements that extract an element out of more than one bounding node at a time. The derivation of (5a) violating the SCC is given in (6). (6)
a. COMP he wondered [S [COMP what] John put t where]
b. [COMP what] he wondered [S [COMP t] John put t where]
c. [COMP what] he wondered [S [COMP where] John put t t]
In (6a–b), what moves successive-cyclically to each COMP position. Then, in (6c), the movement of where to the empty embedded COMP violates the SCC.3 This derivation is somewhat different from the kind that the original formulation of the cycle was designed to prohibit—that is, derivations where a rule could apply solely to the matrix domain prior to the embedding of the constituent domain in a way that would facilitate the application of a rule applying solely to the constituent domain. In this instance, the rule applying to the matrix domain involves a term in the constituent domain. Nonetheless, examples like (5a) constitute the empirical content of the SCC. In Freidin 1978, I identified other cases that are ruled out by the SCC, including cases of NP-movement, and showed that, given trace theory, the empirical effects of the SCC can be subsumed under other independently motivated principles—specifically, parts of the θ-Criterion, what eventually became the Case Filter, and the Subjacency Condition construed, crucially, as a condition on representations. Given that the SCC appears to be totally redundant with respect to other necessary parts of the theory, it was argued that the SCC is superfluous and can be dropped as an axiom of Universal Grammar (UG). The elimination of the SCC suggested that the facts of derivation may be epiphenomenal and that what was actually important were the representations generated by the rules of grammar. Thus, in the case of (5a) it didn’t matter whether the derivation of the sentence violated just the SCC or Subjacency (or both together on one derivation) because the output gave the same representation, which violated Subjacency construed as a condition on representations. In short, it didn’t matter whether derivations conformed to the SCC or not because any deviant sentence whose derivation violated it would be excluded by independently motivated conditions.
4.2 The current status of the syntactic cycle
With the advent of the Minimalist Program, syntactic theory has undergone several fundamental changes that radically affect the former analyses of the cycle, including the empirical content of the principle. In this section, I review how the relevant changes affect the earlier analyses. Consider the derivation of (5a) under minimalist assumptions. Since there are no phrase structure rules or X-bar schemas, the phrase structure of the example is built up from the interaction of the concatenation operation (Merge) with the lexicon (perhaps via a numeration). Given that the relevant portion of the derivation concerns only the movement of the wh-phrases, I will ignore the movement and feature checking of the subjects and objects. One stage of (5a)’s derivation will be (7).
(7) +Q [John put what where]
The Q-feature is strong in English because it induces overt wh-movement. Therefore, under current analyses, the feature must be checked as soon as it is introduced into the derivation; otherwise, the derivation will cancel. More precisely, if the Q-feature is a feature of a complementizer C, then the derivation cancels unless it is checked within CP, a projection of C.4 Therefore, one of the wh-phrases in (7) must raise to check the Q-feature to prevent the derivation from canceling. Since this movement takes place prior to Spell-Out, the whole wh-phrase must move to [Spec, CP], giving (8).
(8)
[CP what +Q [John put what where]]
Notice that the movement of what does not produce an empty category. Rather, the trace of the moved phrase is just a copy of the phrase. The derivation continues to build on (8) by adding lexical material until it reaches the external complementizer. (9)
+Q [he wondered [CP what +Q [John put what where]]]
At this point, in order to get the derivation to violate the SCC, the Q-feature of what, assuming it is + Interpretable and therefore cannot delete, should be available to raise to check the Q-feature in the matrix C. That this is ever possible seems unlikely. Generally, when a wh-phrase checks a strong Q-feature, it cannot move on. Thus, (10) is never possible even when the derivation involves movement of what through the sentential complement CP. (10)
*what did he wonder Bill bought
However, even if it were possible to raise what to the matrix [Spec, CP], the copy of what in the complement CP would block the movement of where (via substitution) to the internal CP. Furthermore, Last Resort would also block the movement of where since the Q-feature in the complement CP has already been checked and therefore there would be no motivation for moving where at all. This also holds for the possible adjunction of where to the complement CP. There is of course another derivation of (5a), which violates the SCC but would not be blocked by Last Resort. Consider the derivation that has reached the stage given in (11). (11)
+Q [he wondered [CP +Q [John put what where]]]
Assume for the moment that the unchecked strong Q-feature in the complement CP has not caused the derivation to cancel, as it would under the analysis in Chomsky 1995. The
derivation that violates the SCC involves movement of what directly to the matrix CP, followed by the movement of where to the complement CP. In this derivation, each movement checks a strong feature and therefore satisfies Last Resort. Whether the interclausal movement to the matrix CP can occur before the intraclausal movement to the complement CP depends on how the movement operation is construed. If the operation is Move, then the closest landing site is the complement CP and therefore the interclausal movement violates the Minimal Link Condition (MLC; see Chomsky 1995, 296, (82)). But if the operation is Attract, then the interclausal movement should be allowed because there is no closer wh-phrase to attract. The intraclausal movement is allowed under either interpretation of the movement operation. Now if we reverse the order of the two movements, the results are different. If where moves to the complement CP first, then the movement of what to the matrix CP will violate the MLC even on the interpretation of the movement operation as Attract since where in the complement CP is closer to the attractor Q-feature in the matrix CP than what in complement object position.
We have discovered here that under current analytical assumptions concerning the construction of phrase structure and the interpretation of movement as a feature-checking operation, the empirical motivation for the SCC as an independent principle is limited to a very particular interpretation of the theory. Thus, there is a derivation of (5a) that violates the SCC but would be tolerated by the current theory only if the movement operation is construed as Attract and not Move and, crucially, an unchecked strong feature does not cancel a derivation. Otherwise, the empirical effects of the SCC appear to be handled by an independently motivated principle (Last Resort) plus the definition of feature strength and therefore provide no motivation for postulating the SCC as an axiom of the theory.
Before we consider cyclicity in the literature on minimalism, it is worth asking whether the original motivation for the cycle reappears, given that generalized transformations are again responsible for generating generalized P-markers. The answer appears to be negative because the procedure for constructing phrase structure does not allow for embedding operations of the sort that existed in the earlier theory. Embedding is not brought about by a substitution operation that replaces a dummy node ∆ with a sentential construct. Rather, the sentential complement becomes embedded in a VP by concatenating with a verb that constitutes the head of the phrase created. Therefore, there is no way that the matrix domain is constructed separately from the complement domain so that the matrix domain could undergo some transformational operation prior to the embedding of the complement domain.5 Therefore, the general procedure for constructing phrase structure itself precludes the possibility that the original formulation of the cycle was meant to prohibit.
Chomsky (1993) proposes a constraint on substitution operations that gives in effect a version of the strict cycle. Recall that the 1993 proposal includes two substitution operations: a binary operation GT (generalized transformation), which maps two P-markers onto a single generalized P-marker, and a singulary operation Move α, which maps one P-marker onto another. Both operations work in the same fashion.
Each operation targets a P-marker K, adds Ø (“a designated empty position”), and substitutes α for Ø. For Move α, α is a P-marker contained within the target K. The new version of the strict cycle requires that “substitution operations always extend their target” (p. 23).
(12) Extension Condition
For substitution, Ø must be external to the targeted P-marker K.

(12) prevents the substitution operation (as opposed to the adjunction operation, which creates two-part categories) from doing any kind of embedding with respect to the P-marker it operates on.
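As an informal aside, the content of (12) can be made concrete in a few lines of code. The sketch below is an expository invention, not part of the formalism under discussion: P-markers are encoded as nested lists, and the two functions contrast target-extending substitution with the internal embedding that (12) rules out.

```python
# A minimal sketch of the Extension Condition (12), assuming a toy
# encoding of P-markers as nested lists [label, daughter1, daughter2].
# All names here are illustrative, not part of any published formalism.

def extend_target(K, alpha, label):
    """Cyclic substitution: the empty position is created EXTERNAL to
    the targeted P-marker K, by projecting a new node above it."""
    return [label, alpha, K]            # alpha lands outside K

def embed_in_target(K, alpha, position):
    """Countercyclic substitution: the empty position is created INSIDE
    K, which is precisely the embedding that (12) prohibits."""
    K.insert(position, alpha)
    return K

ip = ["IP", ["I", "was"], ["VP", "elected", ["NP", "Adam"]]]
print(extend_target(ip, ["NP", "Adam"], "IP"))
# extend_target satisfies (12); embed_in_target violates it, since its
# output does not extend the target but alters it internally.
```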
Chomsky offers two empirical arguments for this condition. First, without (12) “we would lose the effects of those cases of the ECP [Empty Category Principle] that fall under Relativized Minimality” (p. 23). Chomsky cites examples relating to superraising, the Head Movement Constraint, and wh-islands. Consider the superraising example cited in (13) (= Chomsky’s (19a)). (13)
[I′ seems [IP is certain [John to be here]]]
Chomsky is assuming here that (13) could be a stage in a legitimate derivation, whereas under the analysis in Chomsky 1995 this derivation would presumably cancel because the strong D-feature of the complement I of seems has not been checked within its maximal projection (see below for further discussion). The derivation that violates the Extension Condition involves the insertion of it into the embedded IP after John has moved to the matrix [Spec, IP], extending the targeted embedded projection of I but not the whole P-marker, yielding (14). (14)
*[IP John [I′ seems [IP it is certain [IP John to be here]]]]
The strength of the argument for a cyclic principle based on Relativized Minimality violations like superraising depends on three points: (a) that such constructions could actually be generated in this way, (b) that this derivation is not prohibited by any other principle or analysis that is more general, and (c) that there are no other possible derivations of these constructions that are not ruled out by the Extension Condition. In the latter case, we might wonder whether some generalization is being missed.

Regarding (a), it is not obvious that a [Spec, IP] position in the complement of seems can be created after that IP has been embedded in the matrix VP by concatenating with the matrix verb. For one thing, such an operation raises nontrivial questions about derived constituent structure (see section 4.3 for further discussion). Nonetheless, there is a derivation of the appropriate sort that will not run into such problems: namely, moving John successive-cyclically first to the [Spec, IP] of the complement of seems and then to the matrix [Spec, IP], then substituting it for the intermediate trace of John. Under the copy theory of movement operations, we might be inclined to disallow the substitution of it on the grounds that it is distinct from John and therefore the operation would violate the nondistinctness conditions on substitutions (cf. Chomsky 1965). The effect of the nondistinctness condition is to prohibit the deletion of information from a derivation. In this way, it is really just another way to express the recoverability condition on deletions. Suppose we recognize this redundancy between the two conditions and eliminate nondistinctness in favor of recoverability, which has broader coverage. Now, does the deletion of a trace violate recoverability? Presumably, it does not in the case of PF, since trace deletion there is a general requirement (see Nunes 1995). In the case of LF, the trace of the foot of a chain may be necessary for interpretation, but apparently the intermediate traces of the chain in question are not and thus presumably could be deleted without violating recoverability. If so, then substitution of it for John should be allowed. This suggests that the copy theory of movement does not block trace erasure via substitution. The analysis carries over to all the examples considered in Freidin 1978—for instance, (5)–(6).

Interestingly, both (5a) and the superraising case (14) contain chains that violate Subjacency. However, if there is no level of S-Structure, as assumed in current work, then Subjacency interpreted as a condition on representations would have to apply at LF. Nonetheless, we probably do not want to rely on such an analysis because the notion of bounding category is essentially unmotivated under minimalist assumptions and therefore suspect.

Fortunately, there is another, more fundamental way to eliminate these derivations without recourse to suspect notions like bounding category or appeals to a cyclic principle. Let us suppose that there are no substitution operations in UG. That is, we assume that the elementary operations of UG are limited to concatenation (adjunction) and deletion. In earlier theories, the substitution operation was required for lexical insertion, and therefore it was reasonable to assume that it was generally available for movement operations as well. Now that lexical items are inserted into a derivation by the concatenation operation, there is no reason to assume the existence of movement by substitution. If this analysis is on the right track, then it is at least questionable whether there is a countercyclic derivation of the superraising construction (14). Furthermore, the proposed analysis automatically addresses point (b) above—namely, whether the countercyclic derivation is prohibited by some other principle or more general analysis. Putting aside the fact that the derivation violates Subjacency construed as a condition on chain representations at LF, it is prohibited by the elimination of substitution as an elementary transformational operation of UG. The only remaining question is whether concatenation can apply to a subphrase of a P-marker. See section 4.3 for discussion of why this may be generally prohibited.

Let us turn now to point (c), concerning other possible derivations of these constructions that do not violate the Extension Condition. Given the separation of feature-moving operations and category-moving operations, it may be possible to derive the superraising violations without violating either the Extension Condition or the strong feature analysis. Consider, for example, the following derivation. First, the D-feature of John in (13) raises to check the strong D-feature of is. Next, it is merged with the resulting P-marker, checking the agreement and Case features of is and the Case of it. Notice that the insertion of it here does not violate the Extension Condition, nor does it involve leaving an unchecked strong feature, which would automatically cancel the derivation. At this point, the D-feature of John can raise to the matrix I, checking another strong D-feature. Suppose then that the nominal phrase John merges with the matrix projection of I, thereby generating the violation of superraising that we are attempting to avoid by invoking cyclicity in some form, but without violating cyclicity.
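The steps of this derivation can be tabulated. The following trace is purely illustrative bookkeeping under the assumptions just stated; the data structure and names are mine, not part of any proposal in the literature.

```python
# A schematic, purely illustrative trace of the derivation just
# described; the feature bookkeeping is a toy of my own devising.

steps = [
    ("Move F", "D-feature of John -> embedded I (is)",
     ["strong D-feature of is"]),
    ("Merge",  "expletive it -> embedded [Spec, IP]",
     ["agreement features of is", "Case feature of is", "Case of it"]),
    ("Move F", "D-feature of John -> matrix I",
     ["strong D-feature of matrix I"]),
    ("Merge",  "John -> matrix [Spec, IP]", []),
]

checked = []
for op, description, features in steps:
    checked.extend(features)
    print(f"{op:7} {description}")

# No step leaves a strong feature unchecked and none targets a nonroot,
# yet the result is the superraising output (14).
print("features checked:", checked)
```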
This derivation raises several questions. One crucial question concerns whether feature movement involves only the movement of a single feature or, as assumed in Chomsky 1995c,6 the set of formal features of a lexical item—FF(LI). If the former, then the cyclic merger of it could be involved in checking the agreement and Case features of is, as proposed. If the latter, and if the pied-piped FFs enter into a checking relation with I, then the Case feature of it would not be checked, causing the derivation to crash at both LF and PF. Under this analysis, the superraising construction violates Full Interpretation, not some formulation of the SCC.

Suppose, however, that even under the interpretation on which Move F automatically involves the pied-piping of FF(LI), feature checking of Case and agreement features can involve the specifier-head relation. This ought to be possible because that is exactly how all the features of I will be checked in a simple expletive construction (e.g., (15)). (15)
it is likely that John will be here on time
That is, it will merge with is likely that John will be here on time, creating a specifier-head relation with is and checking not only its −Interpretable Case and agreement features but also its strong D-feature. So even on the pied-piping analysis of feature movement, it may not be necessary for the "free rider" features, those that have been carried along, to enter into feature-checking relations when the feature that carries them does.

Another question that arises concerns the movement of John rather than it to the matrix [Spec, IP]. If it were a simple question of movement of categories, then movement of the nominal John over the nominal it would violate the MLC. However, under the Move F analysis, the situation is not so straightforward. Given that minimally only a feature (along with other associated FF(LI)) will move to check the strong D-feature of the matrix I, it is possible that the moved D-feature of John will move again, rather than the features of it in the complement [Spec, IP]. Notice that the two positions are equidistant, because they are both in the checking domain of I (as discussed above), and therefore cannot be distinguished so that the MLC will prefer one movement option over the other.7

The final step in the derivation of (14) involves the movement of the nominal category John to the matrix [Spec, IP]. If category movement is motivated solely for the purpose of convergence at PF but not for feature checking, then John moves directly from the IP complement of certain, presumably attracted by its FFs that merged with seem. This movement violates the MLC only if it can be attracted to the matrix [Spec, IP] instead of John. But if all the FFs of it have been checked in the complement of seems, then there is no reason to move it at all. If this is correct, then the derivation of the superraising case converges.

The convergent derivation of the superraising construction involves at least one highly suspect property: namely, that the features of a finite verb (or I) can be checked by the features of different nominal expressions. This could be avoided if feature checking is obligatory. Thus, the Case feature of John will be checked by is and therefore could not be raised to the matrix clause to check the Case feature of seems. This would follow from the economy condition on fewest steps. Checking the D-feature of I by raising the FFs of a nominal expression and then checking the Case and φ-features by merger of an expletive counts as two steps, whereas checking all these features using only the raising operation counts as one step. This interpretation would ensure that the expletive winds up in the matrix [Spec, IP], since this is the only position in which its Case feature will be checked.8
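As a rough illustration of the arithmetic, the fewest-steps comparison can be stated in a few lines; the step inventories below simply paraphrase the text and carry no theoretical content of their own.

```python
# A sketch of the fewest-steps comparison just described; the step
# inventories are paraphrases of the text, not a worked-out formalism.

two_step_option = [
    "Move F: raise FFs of a nominal (checks the D-feature of I)",
    "Merge: insert expletive (checks the Case and phi-features of I)",
]
one_step_option = [
    "Move F: a single raising operation checks all the features of I",
]

preferred = min([two_step_option, one_step_option], key=len)
print(len(preferred), "step(s):", preferred)
# Fewest steps selects the one-step option, so the expletive must be
# merged in the matrix [Spec, IP], the one position where its own Case
# feature is checked.
```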
Under this analysis, (14) could result only if the expletive checks all the relevant features of is. Then the movement of John from the complement of certain directly to the matrix clause would be blocked by the MLC if Move F applies to the D-feature or the φ-features of the nominal. However, if Move F applies to the Case feature, then the MLC would not block the movement because the Case feature of it, having been checked, is no longer accessible. Thus, (14) could still converge, even if the two-step feature-checking option discussed above is blocked. Alternatively, the MLC could be strengthened so that the FFs that get carried along and enter into a checking relation are also considered in the determination of minimal links. Then, even when the Case feature chain does not violate the MLC, the D- and φ-feature chains will.

If the superraising construction (14) does actually converge, then we might expect that some principle of economy will prefer the derivation of (16) to that of (14). (16)
[IP it [I′ seems [IP John is certain [IP John to be here]]]]
Thus, at the point in the derivation shown in (17) where there is a choice between moving the features of John and merging it, Move F is preferred to Merge. (17)
[I′ is certain [IP John to be here]]
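The choice point at (17) can be schematized as follows; the preference ranking encoded here is exactly the hypothesis under discussion, and the encoding itself is an expository invention.

```python
# A sketch of the choice point at (17): a toy preference ordering on
# which the singulary operation (Move F) beats the generalized one
# (Merge). The ranking itself is the hypothesis under discussion.

PRIORITY = {"Move F": 0, "Merge": 1}   # smaller = preferred

def choose(options):
    return min(options, key=PRIORITY.get)

options_at_17 = ["Merge", "Move F"]    # merge it vs. move features of John
print(choose(options_at_17))           # -> 'Move F': derives (16), not (14)
```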
If this generalizes to all derivations, then singulary transformations will be preferred to generalized transformations. Presumably, the computational system for human language (CHL) tries to make maximal use of what is already in a P-marker before incorporating additional material (but cf. Chomsky 1995c, 347).

Like the superraising case we have considered in such detail, the Head Movement Constraint (HMC) case is assumed to involve the insertion of a lexical item within a P-marker. An HMC violation can be generated from (18) if fix is moved to C and then can is inserted in VP or IP. (18)
[C′ C [IP John [VP fix the car]]]
The Extension Condition blocks the insertion of the modal. Such potential derivations are dubious on quite general grounds. C normally attracts only the finite form of the verb. If so, then there will be no reason for a nonfinite verb form to move to C and hence no way to generate HMC violations. If this line of reasoning is viable, then perhaps the entire range of HMC problems is just the result of an overly general and imprecise analysis.

Unlike the superraising and HMC cases, the wh-island case does not involve an instance of lexical insertion that is supposed to violate the Extension Condition. One derivation from (19) violates the Extension Condition straightforwardly. (19)
[C′ C [IP John wondered [C′ C [IP Mary fixed what how]]]]
If Move α targets the matrix C′, creating Ø external to C′ and substituting how for Ø, then (12) prevents Move α from targeting the complement C′ and moving what to the complement [Spec, CP].9 This derivation is also ruled out because of the unchecked strong Q-feature in the complement CP at the point at which how is moved to the matrix CP. So far, there is no strong empirical argument for the Extension Condition based on these examples.

There is of course a derivation of the wh-island violation (20) from (19) that violates neither the Extension Condition nor the strong feature cancellation analysis: namely, what moves to the complement [Spec, CP] and then how moves to the matrix [Spec, CP]. (20)
[CP how did [IP John wonder [CP what C [IP Mary fixed what how]]]]
The second movement constitutes a clear violation of the MLC if the Q-feature of what in the [Spec, CP] of the complement could be attracted to check the Q-feature of the matrix C.10 Given these analyses, no Relativized Minimality violation provides strong evidence for a cyclic principle.

Another empirical argument for the Extension Condition that does not involve Relativized Minimality violations concerns raising to complement position. Chomsky (1993) notes that given the Extension Condition, a structure like [X′ X YP] cannot be mapped onto [X′ X YP ZP], where ZP is raised from YP or inserted by GT. The strength of this argument depends on two closely linked assumptions: (a) that this consequence is unique to the Extension Condition and (b) that substitution operations will construct these structures that the Extension Condition is needed to prohibit. The second assumption, on which the first relies, seems far from obvious. It presupposes a substitution operation so unstructured that it can insert a designated empty position virtually anywhere in a P-marker. Furthermore, the operation that could map [X′ X YP] onto [X′ X YP ZP] would be both structure-building and structure-destroying in a way that substitution operations are not supposed to be. So it seems doubtful that this kind of mapping would arise naturally.

Chomsky (1995c, 328) cites another case related to overt cyclicity, involving the interaction of NP- and wh-movements (also A- and Ā-movements). (21)
*who was [α a picture of t_wh] t_was taken t_α by Bill
The wh-movement violates the Condition on Extraction Domain (CED), but (21) can be derived without violating that condition if wh-movement to [Spec, CP] precedes NP-movement into [Spec, IP] countercyclically. Chomsky states that the countercyclic derivation is prohibited by the current analysis of feature strength: the strong feature of I, unchecked by NP-movement, will cause the derivation to cancel before wh-movement can apply. He also suggests two other ways to exclude the countercyclic derivation of (21). But before we examine those, it is worth taking a closer look at the strong feature analysis.

For (21), too, the separation of category movement from feature movement has the potential for undermining the strong feature cancellation analysis. Thus, it might be possible for the strong D-feature of I to be checked via feature movement and still move who from the VP position. This would entail that it is possible to create the [Spec, IP] position after the phrasal projection of I has been embedded in CP as the complement of C. The overt movement of the NP is not forced by the strong feature cancellation analysis unless we assume that the strong feature of I is checked as a result of that movement, contrary to the Move F analysis. Therefore, the countercyclic derivation of (21) appears to argue for retaining the Extension Condition. The problem with the Extension Condition is that it should prohibit feature movement generally, since such operations (including overt head movement) do not extend the targeted P-marker as required.11

Chomsky (1995c) suggests two other ways of preventing the countercyclic derivation of (21). One involves some notion of economy. Although the NP-movement in both derivations is the same, the wh-movement in the countercyclic derivation is longer and therefore might be excluded by comparing length of steps in derivations. As Chomsky notes, this is not appealing because it requires "a 'global' notion of economy of the sort we have sought to avoid" (p. 328). This would require that both derivations converge even though the construction is clearly deviant; therefore, the admissible (by hypothesis) cyclic derivation would have to be blocked in some other way. Another alternative Chomsky considers is the provision that "α can be attracted to K only if it contains no trace" (p. 365, (200)), which he proposes as a strengthening of the proviso that only the head of a chain may be attracted. This rules out the NP-movement in the countercyclic derivation rather than the wh-movement.12 Under this analysis, we lose the generalization expressed by the CED that extractions out of subject phrases in finite clauses are generally prohibited whether the subject phrase is created by movement or merger.

So far, we have been discussing how to eliminate the countercyclic derivation of (21). However, since (21) is deviant, all derivations must be eliminated—either the derivation is prohibited or the representations derived violate some output condition at PF or LF. Although Chomsky mentions the CED in reference to the cyclic derivation, we cannot appeal specifically to this condition because its formulation involves the notion "government," which has been excluded from discussion within the Minimalist Program on the grounds that it is illegitimate. Therefore, the cyclic derivation of (21) must be excluded in some other way. To this end, let us take a closer look at the cyclic derivation under the copy theory of movement, which involves the three steps illustrated in (22). (22)
a. [β a picture of who] was taken [α a picture of who]
b. was [β a picture of who] was taken [α a picture of who]
c. who was [β a picture of who] was taken [α a picture of who]
In (22a,b), α is a trace of its copy β. However, in (22c), β now contains a trace of who whereas α does not. Suppose that this renders α distinct from β, so that α cannot function as the trace of β. The derivation would then crash at LF because the nominal a picture of who could not be assigned a θ-role. In other words, (21) constitutes a θ-Criterion violation—specifically, Functional Relatedness (Freidin 1978), now subsumed under FI. Note that if the cyclic derivation crashes, then we cannot resort to economy to rule out the countercyclic derivation.

Returning now to the countercyclic derivation of (21), we have a number of options for excluding it. Under the copy theory of traces, the relevant steps of the derivation proceed as in (23).
(23)
a. was [IP was taken [α a picture of who]]
b. who was [IP was taken [α a picture of who]]
c. who was [IP [β a picture of who] was taken [α a picture of who]]
In (23b,c), who in α is the trace of who in the matrix [Spec, CP]. In (23c), who in β, which is presumably a copy of α, is also a trace of who in [Spec, CP]. By construction, β binds α and who in [Spec, CP] binds who in α. What is not clear is the relation between who in [Spec, CP] and who in β. If we suppose that who in [Spec, CP] binds who in β, then we might object to the chain (who, whoβ, whoα) on the grounds that the link between whoβ and whoα is ill formed because whoβ does not bind whoα. Alternatively, if we impose the derivational definition of c-command (Epstein 1991), then who in [Spec, CP] will not c-command β, hence who in β. Since by construction who in β is a trace, β contains an unbound trace, in violation of the Proper Binding Condition.

Under the copy theory of movement, there may be yet another explanation for the deviance of (21), regarding trace deletion. Consider the standard assumption that nontrivial chains must reduce at PF to trivial chains; that is, all traces must be erased because nontrivial chains are illegitimate objects at PF, violating FI. The general case involves a chain whose linear order reproduces the linear order of the copies in the P-marker and in which each adjacent pair of copies is in an asymmetric c-command relation such that the head of the chain asymmetrically c-commands the trace adjacent to it, that trace asymmetrically c-commands the trace adjacent to it, and so on to the foot of the chain. Such chains generally reduce to trivial chains at PF. Before we discuss how this situation does not obtain in the derivations (22) and (23), we need to consider whether a copied element is necessarily always a trace of its copy. In this regard, consider (24).

(24) [NP books about language] seem [IP [NP books about language] to be selling well]
Presumably, there is only one chain to consider, the chain involving the NP books about language and its trace in [Spec, IP]. That is, there is no chain between books in the matrix IP and the copy in the complement IP, and similarly for about and language. From this perspective, what is odd about a derivation like (22) is that a copied element who is not a trace when the phrase containing it is moved, but then becomes a trace when it alone is moved later. One way to deal with this is, as suggested above, to treat the phrase containing the nontrace copy as distinct from the phrase containing the trace copy. Presumably this will block the required θ-role assignment. Furthermore, α and β would not form a chain, and hence there would be no reason to delete α at PF.

Alternatively, we might resolve the issue as follows. Suppose that once a head-trace relation is established between two copies in a derivation, all copies must be considered as part of the chain. Thus, in both (22) and (23), the chain (25) is what must undergo chain reduction via trace deletion. (25)
(who, whoβ, whoα)
However, (25) does not conform to the standard case because whoβ does not asymmetrically c-command whoα (cf. Takahashi 1994). The countercyclic derivation adds another failure of asymmetric c-command, between the first two elements of the chain, on the derivational c-command analysis. The first failure is of course sufficient to block trace deletion in both derivations. As a result, both derivations crash at PF because they both contain illegitimate nontrivial chains, in violation of FI. If this is a viable analysis, then it should be preferable to one that requires an additional stipulation, like the Extension Condition, to impose cyclic derivations.

So far, we have yet to identify a strong empirical argument for the strict cycle based on the Extension Condition.13 The empirical motivation for the Extension Condition based on Relativized Minimality violations seems to dissolve when we probe the details of the analysis. Chomsky (1995c) reassesses these cases, noting that they involve two situations: (a) countercyclic operations and (b) skipping an already filled position. As we have seen with the superraising and wh-island cases, the MLC alone blocks countercyclic operations. Chomsky claims that the countercyclic insertion of heads does not arise (viz., HMC violations) because heads are inserted only by pure merger, which satisfies the Extension Condition.14 The other situation, involving skipped positions, is generally blocked by the MLC. This leads Chomsky to conclude that there may be no need to impose the Extension Condition on overt category movement.

4.3 Generalized cyclicity

Although it seems that all the empirical consequences of SCC violations identified so far can be handled by other mechanisms that have been proposed, the question remains whether derivations do in fact adhere to cyclic operations as a result of general constraints on the computational system. In the earliest account that attempted to derive the empirical effects of the SCC from principles of UG (Freidin 1978), these effects are a fortuitous consequence of the interaction of several principles of UG. In the more recent analyses discussed above, countercyclic derivations are blocked by various stipulations—for example, that substitutions must create positions external to the targeted P-marker (the Extension Condition) or that an unchecked strong feature in a maximal projection cancels a derivation. These seem like retreats from earlier deductive accounts because they are based on a descriptive statement rather than on general principles that have some conceptual plausibility. The analysis based on FI and the MLC is an attempt to reconstruct a more deductive account under minimalist assumptions. In this section, I review some more recent attempts, also based on minimalist assumptions, to deduce strict cyclicity from more fundamental principles of grammar. I begin with a brief discussion of the nature and role of transformations under a theory of bare phrase structure and end with a somewhat different perspective on cyclicity that follows naturally from the fundamental concepts of that theory.

For concreteness, let us assume the following version of the bare phrase structure theory. Phrase structure is constructed by classical adjunction, a structure-building operation that concatenates two categories and projects the label of one of them.
This operation can apply to two items in a numeration (or perhaps directly in the lexicon), or to a lexical item and a P-marker that has been constructed as part of the derivation, or to two such P-markers. All three possibilities can generate a specifier-head configuration, whereas perhaps only the first two can generate a head-complement configuration. When the elements adjoined are not from the same P-marker, the operation acts as a generalized transformation. In contrast, the adjunction elementary can apply to two elements of a single P-marker, in which case the operation acts as a singulary transformation. Even though the adjunction elementary is involved in lexical insertion, phrase structure generation, and movement (both overt and covert), the conditions on its application differ depending on whether it operates as a generalized or singulary transformation. In the former case, adjunction operates on categories; in the latter, it is presumably restricted to the adjunction of a feature to a category. In standard practice, the former case is called Merge and the latter, Move (or Attract/Move). In addition, there must be some further operation that accounts for the pied-piping of a phrasal category in the case of overt category movement.

From this perspective, the issue of cyclicity concerns the application of the adjunction elementary to a nonroot category during a derivation, since presumably this application could have occurred at the point in the derivation where the adjunction site was the root node. Thus, adjunction to a nonroot, either by Merge or by Move, constitutes a countercyclic application. In other words, every category in a P-marker marks a cyclic domain in some sense, not just those categories that form domains for movement rules as in earlier analyses.

Kitahara (1997) attempts to account for generalized cyclicity in terms of a new analysis of the operations Move and Merge. He claims that there is a distinction to be made between the cyclic and countercyclic (noncyclic in his terminology) operations of Move and Merge. Cyclic Move, for example, consists of a single elementary operation: concatenate α and Σ, forming Σ′. It should be obvious that this is just classical adjunction. This operation maps (26a) onto (26b).

(26) [tree diagrams omitted: (a) the target Σ; (b) Σ′ formed by concatenating α and Σ]
In contrast, countercyclic Move is more complicated. The targets of the operation, α and K, are contained in Σ, so concatenation of α and K forms L = {γ, {α, K}}. The insertion of L into the P-marker now requires an extra operation, which Kitahara calls "replacement," mapping (27a) onto (27b).
(27) [tree diagrams omitted: (a) Σ containing α and K; (b) Σ′ projected from L, with L substituted for K]
Thus, countercyclic Move on this analysis requires two elementary operations instead of one. Kitahara applies the same analysis to Merge. The preference for cyclic operations over countercyclic operations results from a general economy condition that prefers shorter derivations.

(28) Shortest Derivation Condition
Minimize the number of elementary operations necessary for convergence. (Kitahara 1997, 32)
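The operation counts on which (28) adjudicates can be laid out explicitly. The sketch below merely restates the counts given in the text; nothing in it belongs to Kitahara's formalism.

```python
# A sketch of the operation counts behind (28); the counts follow the
# text: cyclic Move = concatenation alone, countercyclic Move =
# concatenation plus replacement.

derivations = {
    "cyclic Move":        ["concatenate"],               # (26a) -> (26b)
    "countercyclic Move": ["concatenate", "replace"],    # (27a) -> (27b)
}

# Shortest Derivation Condition: minimize elementary operations.
winner = min(derivations, key=lambda d: len(derivations[d]))
print(winner)   # -> 'cyclic Move'
```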
Under this analysis, cyclicity of rule application follows from general economy considerations.15

Even though Kitahara's analysis is more general and therefore may seem more appealing than one based on feature strength, the claim that there is an elementary operation of replacement seems problematic. For one thing, it seems doubtful that this elementary operation plays a role in legitimate derivations. If it does not, then there is no motivation for claiming that it exists at all. Its sole purpose would be to act as an extra operation so that the economy analysis succeeds. We could just as easily say that the function postulated for this operation is illegitimate, thereby eliminating the need to appeal to economy—the line of inquiry I will pursue below.

Kitahara claims that the erasure operation is an instance of replacement in which a feature F is "replaced" by the empty element Ø. However, there is no clear reason why this replacement is required if by erasure F becomes inaccessible to any further operation of CHL.16 The minimal solution should be simply to eliminate the feature. Yet even if we grant that there could be an elementary operation of replacement that is responsible for erasure of features, it alone could not account for the work Kitahara attributes to it in the case of countercyclic Move. In erasure, the replacement operation has the effect of simply substituting the empty element for a feature. In Move, as shown in (27), replacement results in the deletion of Σ and the projection of Σ′ from L, as well as the substitution of L for K.17 The deletion of Σ here cannot be the result of replacement, so some other elementary operation would have to be involved.

Under earlier theories of elementary operations (see Freidin 1992), there were at most three such operations, substitution, adjunction, and deletion, each of which performed one of the three basic operations on structure: structure preservation, structure creation, and structure destruction, respectively. Although a structure-preserving operation has been dropped in recent analyses, the other two kinds of operation are surely conceptually necessary. On minimalist assumptions, we would expect that elementary operations perform only these basic and simple structural tasks. From this perspective, Kitahara's replacement operation, when it applies countercyclically, performs several truly elementary operations, including the structure-destroying erasure of Σ and the structure-building replacement of K with L.18 Hence, it is clear that the operation of replacement as characterized for countercyclic movement operations in Kitahara 1997 is simply not an elementary operation; therefore, it seems unlikely that it would be part of CHL.

Kawashima and Kitahara (1996), Collins (1995, 1997), and Epstein et al. (1998) make proposals to block countercyclic derivations generally as violations of Kayne's (1994) Linear Correspondence Axiom (LCA), a general constraint on linear ordering in P-markers. Rather than discuss the different proposals here, I will restrict my comments to the analysis presented in Collins (1997). Collins claims that countercyclic operations generally yield structures that violate the LCA. Consider, for example, his discussion of countercyclic Merge. He assumes that given a set of syntactic objects Σ (i.e., a numeration) containing three elements (SOs), a countercyclic derivation like (29) can occur. (29)
a. Σ = {α, β, γ}
b. Merge (α, β): Σ = {γ, {α, β}}
c. Merge (γ, α): Σ = {{γ, α}, {α, β}}
The step (29c), which is allowed by Collins's definition of Merge (which I will not discuss here), creates a structure in which the two distinct nonterminals dominating β and γ are not in a c-command relation; hence, neither asymmetrically c-commands the other, in violation of the LCA. Collins gives (30) as an illustration of the structure generated by (29).

(30) [tree diagram omitted: the two rooted objects {γ, α} and {α, β} sharing the terminal α]
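Before turning to Collins's concrete example, the failure can be reconstructed informally. In the sketch below, terminals are strings, merged objects are tuples, and a shared-root test stands in for the c-command relation that the LCA requires; all of these encodings are simplifications of my own, not Collins's definitions.

```python
# A toy reconstruction of derivation (29), assuming terminals are
# strings and merged objects are tuples; the single-root test stands in
# for c-command. These encodings are mine, not Collins's definitions.

alpha, beta, gamma = "alpha", "beta", "gamma"

sigma = {alpha, beta, gamma}        # (29a): the numeration
ab = (alpha, beta)                  # (29b): Merge(alpha, beta)
sigma = {gamma, ab}
ga = (gamma, alpha)                 # (29c): countercyclic Merge(gamma, alpha)
sigma = {ga, ab}                    # two rooted objects sharing alpha

def dominates(node, x):
    if node == x:
        return True
    return isinstance(node, tuple) and any(dominates(d, x) for d in node)

# c-command presupposes a common root; beta and gamma have none, so
# neither asymmetrically c-commands the other and the LCA cannot
# order them.
common_roots = [r for r in sigma if dominates(r, beta) and dominates(r, gamma)]
print(common_roots)   # -> []: the LCA violation
```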
As a concrete example, Collins gives a derivation for the countercyclic insertion of a subject into [Spec, VP] after VP has been merged with T. The relevant steps are given in (31). (31)
a. Σ = {John, {T, {saw, me}}}
b. Merge (John, {saw, me}) = {John, {saw, me}}
   Σ = {{T, {saw, me}}, {John, {saw, me}}}
In this case, John and T are the terminals that fail to be ordered with respect to each other. Collins notes that this analysis for countercyclic merger applies as well to countercyclic movement.

Though intriguing, the LCA analysis of countercyclic Merge seems seriously flawed. First, note that Collins's treatment of Merge would not be allowed under Chomsky's characterization:

Clearly, then, CHL must include a second procedure that combines syntactic objects already formed. A derivation converges only if this operation has applied often enough to leave us with just a single object, also exhausting the initial numeration. The simplest such operation takes a pair of syntactic objects (SOi, SOj) and replaces them by a new combined syntactic object SOij. Call this operation Merge. (1995c, 226)

Under Chomsky's formulation, Merge creates a new syntactic object, which surely must be treated as a single object by subsequent legitimate applications of Merge. Presumably, by "replacing" the pair of syntactic objects with a new single object, Merge loses the ability to access the parts that have been replaced.19 Furthermore, there is a crucial difference between the unordered set of elements in the initial numeration and the set of elements created by Merge—namely, the latter constitute ordered sets (assuming of course that Merge gives a linear order to the objects it concatenates).20 So instead of (31a) as input to Merge, we have (32). (32)
Σ = {John, <T, <saw, me>>}
Under Chomsky’s characterization of Merge, (32) contains two syntactic objects (John and the ordered set>), not three (John, T, <saw, me>) or four (John, T, saw, me).21 Turning now to countercyclic Move, it may be that some version of the LCA will prohibit the operation from applying in legitimate derivations—presumably, such derivations will crash at PF because they contain elements that cannot be assigned a linear order. The analysis based on the LCA assumes that countercyclic operations are in some sense possible, though they have consequences that are later filtered out. Alternatively, the bad consequences of a countercyclic operation might be immediate, as I will suggest in what follows. What I would like to propose is that the issue of countercyclic movement crucially concerns the question of derived constituent structure, which is central to transformational analysis. Recall that from the beginning of modern generative grammar (e.g., Chomsky 1975a), transformations are defined as operations that apply to analyzed strings. As Chomsky notes (1975a, 320), “We must provide a derived constituent structure of transforms, for one thing, so that transformations can be compounded.” Thus, if we cannot specify the derived constituent structure for a transform, then it cannot serve as the input to another transformation. In earlier theories, a significant amount of information about derived
Cyclicity and minimalism
91
constituent structure came from phrase structure rules. For example, Emonds’s structurepreserving hypothesis (see Emonds 1970, 1976) proposed that the derived constituent structure from the application of structure-preserving rules was determined by the phrase structure rules of the base (cf. Chomsky’s (1975a) discussion of the derived structure of the passive by phrase). However, with the demise of phrase structure rules and the advent of bare phrase structure theory, we no longer have phrase structure rules or schemas to provide derived constituent structures. Under the bare phrase structure theory, the transformational operations that create constituent structure also specify derived constituent structure. On minimalist assumptions, we might expect that the processes that create constituent structure are the only ones that provide derived constituent structure. In other words, there are no separate processes for determining derived constituent structures. From this perspective, consider the problem of assigning derived constituent structure to an operation that moves a phrase to concatenate with a nonroot category. For concreteness, suppose that the object of a passive predicate is moved to form [Spec, IP] after the IP has been merged with a complementizer as in (33). (33)
[CP that [IP was elected [NP Adam]]]
Notice that in (33), the target of the operation is IP, the maximal projection of I and the complement of that. When the NP Adam concatenates with IP, IP must project a category so that the NP will now by construction c-command the IP. The syntactic relations between the moved phrase and its target are no different from those that would have arisen if the target had been the root category at the point where the movement applied. However, it is not clear that the moved constituent or the newly projected IP bear any relation to the complementizer that c-commands the target IP. To see how this works, consider the structure derived from the cyclic movement of the object to [Spec, IP]. (34)
[CP that [IP [NP Adam] [I′ was elected [NP Adam]]]]
In (33), CP immediately dominates the IP was elected Adam, whereas in (34), CP immediately dominates the IP Adam was elected Adam. To construct (34) from the countercyclic derivation, the syntactic relation between the complementizer that and the moved NP Adam would have to be redefined; and, as a result, the relation between that and the IP was elected would also have to be redefined. These redefinitions would require some additional grammatical mechanism. Therefore, unless such a mechanism can be motivated on independent grounds, there is no reason to assume it is available.22

Given that there is no redefinition mechanism, what happens when we attempt to move a phrase to concatenate with a nonroot category? One possibility is that when the concatenation occurs, the nonroot category projects to create a three-dimensional P-marker, which can then be ruled out by the LCA. However, if there is little motivation for three-dimensional P-markers in the first place, then we want to avoid postulating such entities. And if P-markers are only two-dimensional, there is no way for a category to project once it has become a constituent of another category. If the concatenation via movement cannot be carried out because the target cannot project in the normal way, then it seems reasonable to assume that such operations are not possible.

This analysis raises questions about the status of head movement and feature movement to heads more generally, as well as the status of covert movement. With respect to covert movement, there are two primary cases to consider: Quantifier Raising (QR) and object movement (e.g., in English). There are proposals for the analysis of quantifiers that eliminate QR (see Kitahara 1996; Hornstein 1995). If object movement is actually overt in English, as argued by Lasnik (1995c) and others, then perhaps there is no covert movement of categories at all. Turning to head movement, notice that this operation causes a problem only if we insist that the output is a string consisting of the moved head followed by the functional head it is adjoined to. The adjunction analysis appears to be redundant, given that the head itself enters the derivation with whatever features the functional head represents (e.g., tense or agreement). Given that there is no need for such redundant constructions, a less problematic way to effect head movement would be to substitute the lexical head for the functional head. Thus, there may be a limited role for substitution in grammar after all. The substitution analysis for head movement would eliminate the problem posed by the absence of redefinition mechanisms.23

If this analysis is on the right track, then the reason neither countercyclic Merge nor countercyclic Move can apply is that the elementary operation itself does not permit this. On minimalist assumptions, this should be the optimal explanation. It makes no appeal to external factors like economy, constraints on linear ordering, or other UG principles. Certain kinds of derivations are excluded simply on the grounds that the elementary operations of CHL are so constructed that they cannot occur.

Notes
* A version of this chapter was presented at the Fourth Seoul International Conference on Linguistics (August 1997). I would like to thank Dong-Whee Yang and the other participants for their comments. I would also like to thank Sam Epstein and Norbert Hornstein for useful comments on an earlier draft, and Howard Lasnik for helpful discussion of some of the issues discussed herein.

1 It is worth noting here that Chomsky (1966) does not actually refer to his solution as "the cycle" or mention "the cyclic application of rules." The first mention of the notion "transformational cycle" occurs in Chomsky 1965, 29, in reference to the application of phonological rules. On page 134, Chomsky proposes that "the rules of the base… apply cyclically, preserving their linear order," to generate generalized P-markers. Two paragraphs later, Chomsky proposes that the linear sequence of singulary transformations applies cyclically to generalized P-markers.

2 Note that such arguments never appear in Chomsky's discussion of the cycle, nor are they referred to as evidence for the cycle.

3 This analysis assumes that the trace of the moved wh-phrase what can be replaced by the wh-phrase where. As will be discussed below, such analyses may not be possible under current assumptions about the nature of traces. For detailed discussion, see Freidin 1978.

4 According to Chomsky (1995c, 233), a strong feature is simply "one that a derivation 'cannot tolerate'" and therefore causes a derivation to cancel unless it is checked when it is introduced. This principle of cancellation requires that the strong Q-feature not be attached to the wh-phrase. If it were, any derivation containing a wh-phrase with a strong Q-feature would cancel.
5 Note, however, that the analysis in Chomsky 1993 explicitly assumes that the generalized transformation is a substitution operation and that P-markers corresponding to those of Chomsky 1975a could be constructed. Thus, the original problem can arise for which the cycle was proposed as a solution. Given that such analyses are not possible under a theory of bare phrase structure, I will not discuss Chomsky's (1993) analysis further here.

6 See page 265, where Chomsky claims that "there are strong empirical reasons for assuming that Move F automatically carries along FF(LI)" (though he cites none).

7 This equidistance opens up the possibility that the FFs of it rather than John will raise to check the features of seem. In fact, this would be forced if the Case feature of John was checked by is, leaving the Case feature of it unchecked unless it moves to the matrix clause. Note that this creates the rather odd situation where the Case feature of a phonetically realized nominal expression is checked even though the nominal itself remains in a "Caseless position." In other words, this feature-checking analysis of Case allows us to dissociate Case from the phonetically realized nominal that bears it, whereas under the old Case Filter analysis it was the phonetically realized nominal expression itself that was subject first and foremost to the Case Filter.
If the FFs of it do move, then we might expect that it itself would also raise to the matrix clause, rather than John, because the FFs of it were raised to check the features of seem. Yet it appears that the FFs of it and John are identical (i.e., the φ-features and the Case feature), so perhaps there is no way to distinguish which set of features goes with which nominal category once they have been moved. If θ-roles belong to the set of FFs (see Lasnik 1995c; Bošković and Takahashi 1998), then there is a way to distinguish the two sets of FFs. This would avoid the situation of having the features of one expression move for checking and the category of another expression move to satisfy convergence at PF.

8 There is still the possibility that the expletive will be merged in the complement of seems and raised into the matrix clause. This would presumably block the movement of John to [Spec, IP] of the complement of seems. Strangely, then, even though the Case feature of John has been checked, John remains in a Caseless position. This could be avoided if at least the Case feature is checked only in a specifier-head relation—a return to the more traditional analysis. The problem with this approach is that it runs afoul of the just-mentioned economy condition on fewest steps. Furthermore, it appears to undermine the analysis of feature movement as adjunction to a head. If some features must be checked via the specifier-head relation, then why not all features (e.g., φ- and D-features)? Alternatively, we could block the merger of the expletive in the complement of seems on the grounds that it does not enter into any checking and/or θ-relation in that IP.
12 Chomsky suggests that this restriction "may be too strong even within the core computational system" (p. 365), but says no more about it. Notice that there are other, more general problems with this proposal. First, it is essentially a complicated stipulation, which should be avoided unless it has some as yet undisclosed broad empirical effects that only it can achieve. Second, it is not clear how it applies if feature and category movement are separated. Presumably, categories do not move to check features, so it is not clear how a category (whether it contains a trace or not) will be attracted to K in the first place.

13 In a note, Chomsky (1993, 46, n. 26) remarks,
Depending on other assumptions, some violations might be blocked by various "conspiracies." Let us assume, nevertheless, that overt substitution operations satisfy the extension (strict cycle) condition generally, largely on grounds of conceptual simplicity.

If the analyses of the various empirical problems that appear to bear on the issue of cyclicity are on the right track, then imposing a cyclicity condition on derivations like the Extension Condition cannot be defended "on grounds of conceptual simplicity" because it requires a stipulation that is unnecessary. All the empirical examples cited as evidence for cyclicity can be handled by independently motivated conditions that can be argued to be intrinsic parts of CHL.

14 In Chomsky 1995c, 248, this is virtually stipulated on the grounds that merger with nonroot categories would constitute a complication with potentially serious consequences that would require strong motivation. In the absence of such motivation, Chomsky assumes that "Merge always applies in the simplest possible form: at the root." As we will see, this may follow from the nature of elementary operations and therefore need not be stipulated.

15 This is basically the same sort of analysis that appears in Kitahara 1995. The earlier analysis is not valid under bare phrase structure theory, whereas this one is not obviously incompatible.

16 Kitahara (1997, 34) claims that Ø is "an actual symbol of mental representation with no feature," but does not explain how it serves any particular function. On minimalist grounds, it would be preferable to avoid postulating such apparently functionless elements. It is also not obvious that Ø will not cause a problem for FI at one interface or both.

17 Note that the projection of Σ′ cannot be attributed to the concatenation of α and K, which projects L.

18 Not only Σ is lost, but also the fact that K was an immediate constituent of Σ.

19 If this is correct, it excludes proposals that head movement may be accomplished by first concatenating a head that has been embedded in a P-marker with a functional head via merger outside the P-marker and then reintroducing the complex via merger with the P-marker (i.e., sideward movement (Nunes 1995) or interarboreal movement (Bobaljik and Brown 1997)).

20 The LCA depends on this assumption as well. Since it seems to be a fairly standard assumption for phrase structure analysis under the bare phrase structure theory, I will not comment further.

21 Even if we put all this aside, there is another consideration that may render Collins's analysis untenable. Notice that the countercyclically merged subject John will have to raise to check the EPP feature of T. When it does, John and T will no longer be in violation of the LCA. Since the offending trace will not show up at PF, the countercyclic derivation will not be prohibited by the LCA. Hence, the strict cycle does not follow from the LCA as Collins claims.
22 It is worth noting that Watanabe (1995) proposes the concept of redefinition, formulated as (i), as the conceptual basis of cyclicity.

(i) Avoid Redefinition
Avoid redefinition as much as possible.
He considers various interpretations of (i)—for example, as a global economy condition and as a simplicity measure for generalized transformations. Note that this analysis assumes the existence of redefinition procedures for transforms. The claim I make here is somewhat sharper: there are no redefinition procedures, and therefore it follows that countercyclic operations will yield uninterpretable structures.

23 Still to be dealt with is the feature movement analysis of feature checking. Notice that there is even less motivation for the adjunction of features to a head than for the adjunction of a lexical head to a functional head, since the effects of such adjunction are invisible at PF—and at LF, too, for all intents and purposes. As noted above, the case of expletive insertion suggests that the Case and φ-features of I must be checkable from [Spec, IP]—unless we assume that the FFs of the expletive in [Spec, IP] move down to the head I for the purposes of checking. This would of course require some kind of redefinition mechanism. Alternatively, checking as adjunction of FFs to a head will have to be reanalyzed as something along the lines of specifier-head agreement, as in previous analyses, or a more direct relation between the features moved and the features of the target head, perhaps establishing a chain between the two sets of features.
§B: Case
5 Core grammar, Case theory, and markedness*
with H. Lasnik**

Core grammar (CG) is universal grammar's substantive contribution to the grammar of a particular language. It consists of a set of general rules or rule schemata (e.g., "move α"), conditions on rules (e.g., the recoverability condition for deletions), and filters (e.g., *[that [NP e]])—all of which provide a severely limited set of possible grammars from which certain properties of languages follow.1 The contents of the grammar are organized into various components, each with its specific properties. The various components interact in modular fashion, in a way determined by the general organization of grammars. Following Chomsky and Lasnik (1977), we adopt the organization in (1).
(1)
I.   a. Base
     b. Transformations
II.  a. Deletions
     b. Filters
     c. Phonological rules
     etc.
III. a. Quantifier interpretation
     b. Control
     c. Conditions on binding
     etc.
(I) maps base structures onto S-structures, the output of the transformational component. (II) maps S-structures onto phonetic representations; (III) maps them onto representations in logical form (LF), one aspect of semantic representation. (Henceforth we will refer to (I) as "the Syntax," (II) as "the Phonology," and (III) as "LF.")

(1) constitutes an empirical hypothesis about CG. Different organizations of the components will have different empirical effects. For example, (1) predicts that deletion plays no role in determining semantic representation. The opposite prediction would hold in an alternative where deletions are part of the transformational component.

By itself, CG distinguishes those phenomena which reflect properties of UG from those which are idiosyncratic and language particular. Thus the phenomena of a particular language fall into one of two categories: core or periphery. Periphery phenomena require extra descriptive apparatus beyond that available from CG. For this reason they are considered to be marked, whereas the phenomena that fall under CG are considered to be unmarked (see van Riemsdijk (1978b) and Koster (1978c) for additional discussion).

The way in which CG is formulated determines the cut between core and periphery. Alternative formulations of a "finite clause" condition on binding provide an interesting example. One formulation, the Propositional Island Condition (PIC) as in (2a), is given in terms of the notion "finite clause," while another, the Nominative Island Condition (NIC) as in (2b), is given in terms of the notion "nominative Case."2 (2)
a. PIC: an anaphor may not be free in the domain of a finite clause.
b. NIC: a nominative anaphor may not be free in S.
So defined, the PIC and NIC have different empirical effects. In the case of complex NP’s, the NIC but not the PIC predicts that (3) is grammatical. (See Chomsky (1980) for discussion). (3)
They expect that pictures of each other will be on sale by Friday.
The anaphor each other is free in the embedded finite clause but is not in the nominative Case, though it is bound in the matrix clause. Under the NIC the kind of binding exemplified by (3) belongs to the core and is unmarked. Under the PIC it belongs to the periphery and is marked—i.e., it will require some modification of CG to account for the grammaticality of (3). There is more to say about this issue, and we will return to it below.

The grammaticality of (3) has been used to motivate the formulation of the finite clause condition on binding in terms of Case.3 Thus the argument for the NIC over the PIC is based on greater coverage of data. By itself this argument is inconclusive because it is always possible, as we will argue below, that the grammaticality of sentences like (3) reflects marked phenomena which cannot properly be used as motivation for particular formulations of CG.

Another argument for formulating CG in terms of Case concerns the replacement of the *[NP to VP] filter of Chomsky and Lasnik (1977) with the *N filter of Chomsky (1980).4

(4) *[α NP to VP], unless α is adjacent to and in the domain of [−N].
(5) *N, where N is lexical and has no Case.
As discussed in Chomsky (1980), the properties of (4) stipulated in the unlessclause follow naturally from Case analysis. In particular, a more restricted version of the context “adjacent to and in the domain of [−N]” in (4) is independently needed to account for Case-assignment. The replacement of the *[NP to VP] filter with the *N filter eliminates the seemingly ad hoc unlessclause by absorbing it into rather natural and independently motivated conditions on Case-assignment, to be discussed below. This substitution, though not entirely without problems—see Anderson (1979, 4.3) for discussion—also clears up several technical problems with the *[NP to VP] filter, as noted in Chomsky (1980). The context “adjacent to and in the domain of [−N]” stipulated in the unless-clause of (4) appears in Case theory in terms of the more restricted notion “governed” as it applies to the rules of Case-assignment (6).5
(6) a. NP is oblique when governed by P.
    b. NP is objective when governed by V.
    c. NP is nominative when governed by Tense.
    d. NP is genitive when governed by N̄.
As a first approximation, the notion "govern" may be defined for configurational languages as (7).
(7) A category α governs a category β iff α and β are adjacent and sisters.6
The structural relation "sister" is defined as:
(8) A pair of categories α and β are sisters iff they c-command each other.7
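The structural relations in (7) and (8) lend themselves to a direct procedural statement. The following is a minimal sketch in Python (our illustration, not part of the original text), assuming a toy constituent tree; the class and function names are ours, and adjacency is computed crudely over the terminal string.

```python
# Toy constituent tree for checking "govern" (7) and "sister" (8).
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """True iff a properly dominates b."""
    n = b.parent
    while n is not None:
        if n is a:
            return True
        n = n.parent
    return False

def c_commands(a, b):
    """The first branching node dominating a also dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    n = a.parent
    while n is not None and len(n.children) < 2:
        n = n.parent
    return n is not None and dominates(n, b)

def sisters(a, b):
    """Definition (8): mutual c-command (same mother not required; see fn. 7)."""
    return c_commands(a, b) and c_commands(b, a)

def terminals(root, out=None):
    out = [] if out is None else out
    if not root.children:
        out.append(root)
    for child in root.children:
        terminals(child, out)
    return out

def adjacent(a, b, root):
    """Crude string adjacency: no terminal intervenes between a and b."""
    terms = terminals(root)
    last_a = max(i for i, t in enumerate(terms) if t is a or dominates(a, t))
    first_b = min(i for i, t in enumerate(terms) if t is b or dominates(b, t))
    return first_b - last_a == 1

def governs(a, b, root):
    """Definition (7): a governs b iff a and b are adjacent sisters."""
    return sisters(a, b) and adjacent(a, b, root)

# "kissed Mary": V governs its adjacent sister NP, as in rule (6b).
np = Node("NP", [Node("Mary")])
v = Node("V", [Node("kissed")])
vp = Node("VP", [v, np])
assert governs(v, np, vp)
```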
As it stands, the definition of "govern" (7) is too restrictive because it does not account for Case-marking in all grammatical constructions. For example, in English it does not account for Case-marking of a direct object NP across an indirect object NP in double object constructions or of lexical NP subjects of infinitival complements. If (7) is the correct definition for CG, then these constructions fall outside the core with respect to Case-assignment and so require special modification of core notions to account for their grammaticality. The fact that infinitival complements in most languages do not allow lexical subjects suggests that the grammaticality in English of infinitival complements with lexical subjects should fall outside the core. Note that grammatical exceptions constitute no particular problem given the logic of markedness. Case-assignment as formulated in (6) applies to any NP in an appropriate context, the simplest hypothesis. In addition to lexical NP's, which must be marked for Case given the *N filter analysis of [NP to VP] constructions, (6) also applies to empty NP's—i.e., PRO, NP-trace (henceforth NP-e), and WH-trace (henceforth WH-e). Some empty NP's must be marked for Case—more specifically, subject to (6c)—under the assumption that the NIC is the correct finite clause condition on binding for CG. The NIC accounts for the illformedness of the representations in (9) only if PRO and NP-e are marked nominative by (6c).8
(9) a. *John thought [S̄ (that) [S [NP e] was clever]]
    b. *John thought [S̄ (that) [S PRO was clever]]
This argument does not extend to WH-e, but an argument can be constructed on the assumption that the *N filter (henceforth *N) is, like its predecessor *[NP to VP], a filter of the Phonology. (This assumption is examined below). Then, from the organization of the grammar in (1), *N applies after Deletion. In particular, *N applies after deletion in COMP (see Chomsky and Lasnik (1977) and Chomsky (1980) for discussion of such deletions). Granting this, we can now give an argument that WH-e in an NP position must be subject to *N, in which case it also must
be subject to Case-assignment. Note first that COMP in relative clauses like (10) is a position where deletion may properly occur.
(10) the man [S̄ [COMP who/Ø] [S it seems [S̄ e [S e is here]]]]
Under the assumption that *N applies only to lexical NP's, (11b) ought to be grammatical because the offending item, un-Case-marked who as in (11a), has been deleted.
(11) a. *the man [S̄ [COMP who] [S it seems [S̄ e [S e to be here]]]]
     b. *the man [S̄ [COMP] [S it seems [S̄ e [S e to be here]]]]
Since (11b) is ungrammatical, we may conclude that WH-e is subject to *N.9 Assuming empty NP's have no internal structure, *N must be reformulated as (12).
(12) *NP Filter: *NP, where NP is lexical or the trace of WH, and has no Case.
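Filter (12) amounts to a simple predicate over NP types. Here is a minimal sketch (ours, with hypothetical field names), assuming NPs are represented as small feature records:

```python
def np_filter_violated(np):
    """Filter (12): star a lexical NP or WH-trace that bears no Case."""
    filtered_types = {"lexical", "wh-trace"}    # PRO and NP-e are exempt
    return np["type"] in filtered_types and np.get("case") is None

assert np_filter_violated({"type": "lexical"})              # the *N effect of (5)
assert np_filter_violated({"type": "wh-trace"})             # the (11b) argument
assert not np_filter_violated({"type": "wh-trace", "case": "objective"})
assert not np_filter_violated({"type": "PRO"})              # (12) ignores PRO
assert not np_filter_violated({"type": "np-trace"})         # and NP-e
```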
This filter (henceforth *NP) should not be confused with the *NP filter of Rouveret and Vergnaud (1980). It is worth noting that lexical NP and WH-e fall together as a natural class with respect to opacity (i.e., the PIC/NIC and (Specified) Subject Condition—see Freidin and Lasnik (1981)) as well as *NP. Under this approach, as under the approach of Chomsky (1980), infinitival relatives such as a man to fix the sink are apparent counter-examples if derived by WH-movement, since the subject of the infinitival cannot be marked for Case. Williams (1980) argues, partly on the basis of this fact, that the subject in these constructions is actually PRO. Note that such an analysis is required for infinitival relatives such as a problem to solve.
(13) [NP a problem [S̄ [S e to solve e]]]
A Wh-element could not have been fronted from both e-positions in (13). If Case-assignment applies after movement rules ((I-b) of (1)) as we have indicated, then WH-phrases in COMP will not in general be marked for Case, though their traces in appropriately governed positions will. Presumably, Case-assignment cannot apply prior to movement rules because subjects of passives are assigned Case only after NP-movement.10 Thus *NP will give the wrong result for WH-phrases in COMP without some further modification of the grammar. There are at least two solutions to this apparent problem. A WH-phrase in COMP could be exempted from *NP or assigned the Case of its trace in a governed position by a special rule.11 The discussion which follows is neutral between these two proposals. The question arises as to where the Case-assignment rule applies with respect to the rest of the grammar. We will assume that it applies as a unitary process, the simplest hypothesis.12 As such, it must precede *NP and also the NIC, as well as follow movement. Thus in order to determine the ordering of Case-assignment, it will be useful to determine the ordering of *NP.
Based on an EQUI deletion analysis of nominals like (14), there is an argument that *NP must apply after Deletion.
(14) my desire to win
In the EQUI analysis the underlying structure of (14) is (15).
(15) [NP my desire [S̄ [COMP for] [S PRO-self to win]]]
(14) is derived from (15) by the application of EQUI, which deletes PRO-self (see Chomsky and Lasnik (1977, 468)), and deletion in COMP, which deletes the complementizer for. If *NP applies after Deletions, the derivation goes through without problems. However, if the filter applies before the Deletion component, then self, which is considered lexical by virtue of having the same distribution as other lexical NP's, must be marked for Case. Otherwise *NP will designate (15) as illformed and (14) would have no source as a result. The question then is how self in (15) is assigned Case under the *NP-Deletion ordering. It cannot be assigned Case by the nominal desire because N does not assign Case to NP's that are in its complement. Thus (16) is blocked by *NP because Bernie is not marked for Case.
(16) *[NP my desire [S̄ [S Bernie to win]]]
In contrast, Bernie in (17) is marked for Case by the complementizer for and therefore not subject to the filter.
(17) [NP my desire [S̄ [COMP for] [S Bernie to win]]]
Extending this analysis to (15), we could claim that self is marked for Case by for, making it immune to *NP. Then PRO-self and for would delete, yielding (14). This analysis is not viable. If the complementizer for can delete after assigning Case to self in (15), then there is no reason to expect it not to delete after assigning Case to Bernie in (17), thereby yielding the ungrammatical (16). Since, by hypothesis, Deletion follows *NP, the filter cannot be used to account for the ungrammaticality of (16), an unsatisfactory result. In general it appears that when the complementizer for assigns Case to the subject of an infinitival complement, for cannot delete. As Chomsky notes:
The marked for-infinitive construction has the form (85) in the base:
(85) [S̄ [COMP for] [S NP to VP]]
Only [+ F] verbs [i.e. verbs which can assign Case across an infinitival clause boundary HL & RF] take such complements as (85), though of course (85) appears quite freely in other contexts. To accommodate these constructions within the present framework, we must assume that for assigns Case to NP in (85) (presumably, oblique Case by rule (68a) [our (6a), HL & RF]) and that when it does so it is undeletable. (1980, 30)
Citing the observation that the for-complementizer and the homophonous preposition share some properties, Chomsky goes on to propose the following implementation of the nondeletability of a Case-assigning for-complementizer.
Again following a suggestion of Vergnaud's, let us assume that in the base, the complementizer for may or may not be assigned the feature [+ P] (i.e., assigned to the category Preposition). If it is assigned this feature, it is undeletable under the recoverability condition [footnote deleted, HL & RF]. If for is not assigned [+ P], it will not assign Case and may delete, and in the complement to a [+ F] verb such as want, the subject of the infinitive will be assigned objective Case by the governing [+ F] verb.… (1980, 31)
An alternative account of the relationship between for-deletion and Case-assignment will be presented below. The nondeletability of the for-complementizer when it has assigned Case accounts for why the illformed (16) is not derived from (17) by Deletion. This eliminates one problem for the EQUI analysis of (14) under the assumption that Deletion follows Case-assignment and *NP, but creates another. If in (15) for assigns oblique Case to self (see fn. 5), then for cannot delete. As a result, the grammar will derive (18) and not (14) from (15).
(18) *[NP my desire [S̄ [COMP for] [S to win]]]
The illformedness of (18) falls under the *[for-to] filter of Chomsky and Lasnik (1977). This account generalizes to WH-movement as well. The *[for-to] filter also excludes (19).
(19) *[S̄ [C who] [S is it illegal [S̄ [C e for] [S e to take part]]]]
If deletion applies to COMP (abbreviated as 'C') in (19), then (20) results.
(20) *[S̄ [C who] [S is it illegal [S̄ [C e] [S e to take part]]]]
Since the for-complementizer may delete if it does not assign Case, we could assume that the WH-e in subject position is not Case-marked and therefore illformed under *NP. Alternatively, we could assume that the WH-trace is marked oblique and that the deletion of for is in violation of the recoverability condition. As shown above, the principle governing the deletability of the for-complementizer contributes to a quite general account of (16)–(20). It creates problems for the EQUI analysis of (14), but only if *NP is assumed to apply before Deletion. In this way the EQUI analysis of (14) provides an empirical argument for considering *NP as a filter of the Phonology, in which case it follows Deletion as in (1). We turn now to considerations relevant to the ordering of for-complementizer deletion and Case-assignment. The principle governing the deletability of the for-complementizer has interesting consequences for the Case analysis of [+ F] verbs like want, the verbs which can assign objective Case across a clause boundary. On the basis of examples like (21), we assume that want subcategorizes for for-infinitive constructions (see (85) above) and that the underlying structure of (22) is (23).
(21) John wants very much [S̄ [COMP for] [S Mary to win]]
(22) John wants Mary to win.
(23) John wants [S̄ [COMP for] [S Mary to win]]
Given Chomsky’s principle governing for-complementizer deletion, it must be assumed that for in (23) does not assign Case to Mary and that Mary is assigned Case by want instead. If Case-asignment applies before Deletion, then the context (24) is involved; but if it applies after Deletion, then the context (25) is. (24)
V [ [COMP for] [s NP
(25)
V[
[S NP…
In terms of the definition of "governed" in (7), the relationship between V and NP violates both the adjacency and sisterhood conditions in (24), but only the sisterhood condition in (25). Assuming Chomsky's principle and the ordering Case-assignment before Deletion, we are committed to allowing want to assign Case to the subject of an infinitival complement in violation of both conditions defining the relation "govern." In contrast, the assumption that Case-assignment follows Deletion allows us to maintain at least the adjacency condition for Case-assignment with [+ F] verbs. Structures like (26) provide further motivation for the latter assumption.
(26) *John wants very much [S̄ [S Mary to win]]
In (26) wants is neither adjacent to nor a sister of Mary. If want can assign Case to an infinitival subject in this context—as must be the case to account for (22) under the hypothesis that Case-assignment precedes Deletion—then we are faced with the problem
of explaining why Case-assignment does not apply to Mary in (26). This problem does not arise under the converse hypothesis that Deletion precedes Case-assignment. Under the latter hypothesis, Case-assignment in (22) would not violate the adjacency condition and so the ill-formedness of (26) falls out from the general theory of Case. There is however one class of cases, as in (27), where Case-assignment does apparently apply between a verb and an NP that are neither adjacent nor sisters.
(27) a. Who do you want very much [S̄ [COMP e] [S e to win]]
        Who do you want very much to win?
     b. Who do you believe sincerely [S̄ [C e Ø] [S e to be the best man]]
        Who do you believe sincerely to be the best man?
        (cf. *I believe sincerely John to be the best man.)
(Ø designates the zero morpheme complementizer of Chomsky and Lasnik (1977)). Unless the Wh-e's in S are marked for Case, such constructions should be excluded by *NP. The fact that the questions in (27) appear to be grammatical suggests that once again we are confronting phenomena that fall outside the core. Recall that [NP to VP] structures with full lexical subjects are exceptional among the languages of the world, a fact which supports the analysis under which structures like (22) also fall outside the core. Yet compared with (22), the grammatical status of (27) seems to us significantly less certain. It is worth noting that whereas exceptions to both the adjacency and sisterhood conditions seem to involve only Case-marking of WH-e, exceptions to just the sisterhood condition may involve lexical NP (as in (22)) as well. The WH-e counterpart of (22) is (28).
(28) Who do you want [S̄ [S e to win]]
We suspect that when lexical NP is assigned Case in violation of one or more of the core conditions, WH-e will also be Case-marked in that position, but not conversely. This suggests phenomena which might properly fall under a substantive theory of markedness. Considerations relevant to a theory of markedness provide yet another argument in favor of ordering Case-assignment after Deletion rather than before. Once again let us consider the two ordering theories (sketched schematically below):
A. Case-assignment before Deletion.
B. Deletion before Case-assignment.
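The contrast between the two orderings can be replayed mechanically. The sketch below is ours and deliberately toy-sized: it encodes just enough of the for-deletion and Case-assignment logic to reproduce the (22)/(26) pattern under theory B, and every field and function name is an assumption.

```python
def delete_for(clause):
    # Chomsky's principle: for is undeletable once it has assigned Case.
    # Theory A must stipulate this; under theory B it holds vacuously,
    # since a deleted for is simply absent when Case-assignment applies.
    if clause["comp"] == "for" and not clause.get("for_assigned_case"):
        clause["comp"] = None
    return clause

def assign_case(clause):
    if clause["comp"] == "for":            # for Case-marks the subject oblique
        clause["subject_case"] = "oblique"
        clause["for_assigned_case"] = True
    elif clause["plus_f_verb"] and clause["adjacent"]:
        clause["subject_case"] = "objective"   # [+F] verb under adjacency
    return clause

def derive(clause, order):
    steps = {"Deletion": delete_for, "Case": assign_case}
    for name in order:
        clause = steps[name](clause)
    return clause

# Theory B on (23) "John wants [for [Mary to win]]": for deletes first,
# then want Case-marks Mary under adjacency, deriving (22).
base = {"comp": "for", "plus_f_verb": True, "adjacent": True}
assert derive(dict(base), ["Deletion", "Case"])["subject_case"] == "objective"

# (26) "*John wants very much [Mary to win]": adjacency fails, Mary gets
# no Case, and the structure is starred by *NP.
blocked = {"comp": "for", "plus_f_verb": True, "adjacent": False}
assert "subject_case" not in derive(dict(blocked), ["Deletion", "Case"])
```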
For theory A Chomsky’s principle governing the deletion of for-complementizer must be stipulated in order to account for (16), and (18)—(20). From this it follows that Caseassignment must be able to apply across a complementizer in the derivation of (22) from (23). Thus Case-assignment in the derivation of (22) involves exceptions to both the adjacency and sisterhood conditions on government. Now if Case-assignment in (23) violates both conditions in assigning Case to lexical NP, then we have no principled way
of accounting for the ill-formedness of (29) since presumably Case-assignment could apply to lexical subjects of the infinitival complements in these constructions.
(29) a. *I believe sincerely John to be the best man.
     b. *John wants very much Mary to win.
Under theory A, there appears to be no principled way to distinguish the different possibilities for Case-marking of lexical NP and WH-e in terms of the core notion of government. Considered in terms of the acquisition of a grammar, theory A is problematic. In acquiring the grammar of English, if the language learner must modify the Case-assignment rule to allow for (22) as discussed above, then that modification would, presumably, allow in (27)–(29). How (29) would then be excluded is a mystery since negative evidence of this sort is not available in the corpus that in part determines the particular grammar acquired.13 We could, of course, avoid this problem for theory A by claiming that modifications of CG are made on a case by case basis. If this is the correct way to proceed, then modifications of CG would be truly arbitrary and there would be hardly any motivation for a substantive theory of markedness. In contrast to theory A, theory B allows for a clear distinction between (22) and (27) in terms of the adjacency and sisterhood conditions on government. Case-assignment violates only the latter in the derivation of (22), but both in the derivation of (27). Most significantly, under theory B, it follows automatically that if for assigns Case, it does not delete—since deleted for will not be present when Case-assignment applies. Given the organization of CG in (1), ordering Case-assignment after Deletion has two important consequences. First, *NP must be ordered after Deletion regardless of how nominals like (14) are analyzed. Second, the NIC cannot be maintained as the relevant finite clause condition on binding since LF has no access to Case-marking. As mentioned above, the empirical argument of Chomsky (1980) for the NIC is inconclusive. Though bound anaphors can often occur embedded in the subjects of finite clauses such as (30), freely referring pronouns can also occur in these positions as in (31).
(30) a. Theyi expect [S̄ that [S [NP pictures of each otheri] will be on sale by Friday]]
     b. Theyi expect [S̄ that [S [NP pictures of themselvesi] will be on sale by Friday]]
(31) John expects [S̄ that [S [NP pictures of him] will be on sale by Friday]]
Here John and him can freely co-refer, even though under the NIC, Disjoint Reference should apply (see the appendix of Chomsky (1980)). The question arises, then, as to which is the more “basic” phenomenon—the application of bound anaphora or the blocking of Disjoint Reference. In fact, the pronoun in (31) is apparently free in reference quite generally, while there is substantial variability about the acceptability of (30a and b). All else being equal, this substantially weakens the argument for the NIC. An analysis of the distribution of lexical anaphors in complex NP’s provides an independent argument against the NIC and in favor of the PIC. Formulated as the NIC,
the finite clause condition on binding in conjunction with the (Specified) Subject Condition (SSC, see Chomsky (1980)) allows a lexical anaphor in the subject of a finite clause to be bound by an antecedent outside that clause just in case the anaphor is not marked nominative. The relevant cases involve reflexive pronouns and reciprocals as in (30a and b) above. The PIC, in contrast, rules out (30a and b). It also accounts for (32), whereas the NIC does not.
(32) *[S [NP pictures of each other / themselves] will be on sale soon]
Since the illformedness of structures like (32) should follow from CG, the NIC analysis requires the addition of another condition on binding. An obvious candidate is a stipulation that anaphors must be bound (see Freidin (1978)). This stipulation is redundant with respect to those cases already covered by the NIC and SSC—e.g., (33) and (34) respectively.
(33) *Himself left early.
(34) *It pleases herselfi that Berniej built his whizzing-stick.
Given the PIC instead of the NIC, the stipulation is totally redundant. That is, the stipulation falls out as a theorem given the PIC and SSC as axioms (see Freidin and Lasnik (1981) for further discussion). This situation does not provide a decisive argument against the NIC in favor of the PIC, but it is suggestive. Another argument against the NIC is provided by facts concerning variation across idiolects. If the NIC is the correct core principle, then we might expect a situation where structures like (30a and b) are grammatical in all idiolects. But if this situation does not hold, then we face the serious problem of explaining how the language learner constructs a grammar which excludes (30a and b) in the absence of the relevant evidence—i.e., negative evidence. On the basis of an informal survey, we have found that there is in fact variation across idiolects concerning (30a and b). Interestingly, the variation in part distinguishes between reflexive pronouns and reciprocals. Thus four possible idiolects can be projected based on the grammaticality judgments concerning (30a and b).
(35) a. both grammatical.
     b. only reflexive pronouns.
     c. only reciprocals.
     d. neither.
Only (a), (c), and (d) were attested in our informal survey. This suggests that reciprocals may have a wider distribution than reflexives, perhaps another phenomenon to be covered by a theory of markedness.14 Given that (30a and b) fall outside the core phenomena, it seems reasonable to suppose that idiolectal variation results from very specific modifications of CG which in turn result from conflicts between CG and the initial data for language acquisition. But then it
should be possible that some corpus of initial data contains only the reflexive cases (30b) and not the reciprocal cases (30a). Given the existence of such a corpus, an auxiliary hypothesis is needed to account for the non-existence of idiolect (b). We assume that this hypothesis will be provided by a substantive theory of markedness—i.e., an empirical hypothesis about the limited range of variation among grammars.
Notes
* This article is a revised version of the portions of our GLOW Colloquium presentation dealing with Case analysis and markedness. Excluded here, and developed in Freidin and Lasnik (1981), is the section of our presentation examining binding and opacity. We are indebted to Mona Anderson for comments on an earlier version of this article.
** This work was supported by a Postdoctoral Research Fellowship from the NIMH (grant 1F32MH 05879–01).
1 For further discussion of this and related points, see Chomsky (1980) and Lasnik and Kupin (1977).
2 We assume that the domain of the PIC and NIC is S rather than S̄, based on arguments given in Freidin and Lasnik (1981).
3 Presumably the correct notion of Case is rather abstract since many of the essential properties covered by Case theory (cf. Chomsky (1980)) are manifested in languages which have no overt Case-marking—e.g., Vietnamese.
4 The two filters divide the class of NP types (e.g., trace, PRO, and lexical NP) differently. See Freidin (1978, fn. 35) for discussion.
5 No rule for genitive Case was given in Chomsky (1980). We assume that N̄ rather than N is the relevant node for Case-assignment because of examples like (i) and (ii),
(i) John's interesting proofs
(ii) *The proofs the theorem
which would be analyzed as (iii) and (iv) respectively.
(iii) [N̄ John's [N̄ interesting proofs]]
(iv) [N̄ the [N̄ proofs the theorem]]
In (iii), N̄, but not N, governs John given our definition of "govern," which is discussed immediately below. In (iv), the theorem would be incorrectly assigned genitive Case if (6d) mentioned N and not N̄. If the complementizer for is not analyzed as [+ P], then another rule of Case-assignment dealing with for-infinitive constructions must be added to (6).
6 See fn. 13 for the definition given in Chomsky (1980) and also for a discussion of some potential problems connected with it.
7 Under this definition, two categories need not be immediately dominated by the same category to be sisters. As far as we can tell, this creates no problems.
8 The PRO case is noted in Chomsky (1980, fn. 30) but not the NP-e case.
9 WH-NP’s that are Case-marked under the *N filter analysis will have Case-marked traces under our analysis. WH-NP’s that are not Case-marked will have traces that are also not Case-marked. 10 A conceivable alternative analysis would have Case-assignment apply before all transformations. Then the derived subject of a passive would acquire the Case of the position to which it moves. We have not explored the consequences of this possibility. Below, however, we will argue that Case-assignment follows Deletion. Given this, of course, it must also follow movement. 11 Chomsky (1980) presents a different solution. There, Case-assignment for WH-phrases is taken to be part of the WH-movement rule. This is identical in effect to the second proposal in the text. However this analysis seems contrary to the spirit of the modularity hypothesis for the structure of CG since it complicates the internal structure of the movement component, perhaps unnecessarily. 12 Chomsky (1980) proposes that oblique Case is assigned in the base. Oblique case is then carried along under NP movement, ultimately conflicting with the assigned Case of the derived position of an oblique NP. This proposal was intended to capture the general impossibility of preposition stranding under NP movement. An alternative compatible with surface structure assignment of oblique Case is suggested in Chomsky (1980, fn. 30) and developed in May (1981). The essence of the proposal is an «inverse» *NP filter which assigns * to any empty NP (except WH-e) bearing Case. The object of a preposition will thus be unable to passivize.
Assuming that participles do not assign Case, normal passives and reanalyzed pseudopassives will still be allowed.
13 Taking theory A in conjunction with the definition of government proposed in Chomsky (1980), this situation is significantly worse. In this paper the relation "governed" is defined as:
α is governed by β if α is c-commanded by β and no major category or major category boundary appears between α and β. (p. 25)
Footnote 29 reads:
This convention builds in the "adjacency and c-command" condition of the *[NP-to-VP] filter. Excluded are the structures β [γ … α and β … γ … α, where γ is a major category. The notion "government", like the notion "grammatical relation", must be defined at a level of abstraction that excludes from consideration parenthetical elements, interpolated adverbs, and the like.
This definition is less restrictive than the one given in (7) above. Under this analysis of Case-assignment, (27) and (29) both fall within the core phenomena. What is "marked" under this analysis is the illformedness of (29). It is difficult to imagine how CG would be modified to exclude (29) on the basis of evidence available in the acquisition situation.
14 Henk van Riemsdijk (personal communication) has informed us that reciprocals in Dutch occur in structures analogous to (30a) whereas reflexives may not occur in structures analogous to (30b).
6 Lexical Case phenomena
with Rex Sprouse
1 Basic assumptions
The analysis of Case phenomena presented in this paper is based on a theory of Case structure in generative grammar that has been developed over the past decade (see Chomsky 1986a for some general discussion and Chomsky 1981 for more technical details of the basic theory). Central to this theory of Case is the Case Filter (1), which plays a crucial role in determining the distribution of lexical (that is, phonetically realized) versus nonlexical NPs (trace1 and PRO) in sentences.
(1) Case Filter
    *NP [+phonetic matrix] that is not Case-marked
The Case Filter designates as ill formed any lexical NP that is not marked for Case. Under the well-motivated assumption that the infinitival subject position in such constructions is not marked for Case, the Case Filter excludes infinitival indirect questions with lexical subjects as in (2a) (in contrast to (2b), where the infinitival has a nonlexical subject—in this instance PRO—and to (2c), where the indirect question is a finite clause).
(2) a. *John wondered [CP whati [IP Mary to say ei to Bill]].
    b. John wondered [CP whati [IP PRO to say ei to Bill]].
    c. John wondered [CP whati [IP Mary said ei to Bill]].
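Read as a predicate, the Case Filter (1) separates (2a) from (2b–c) directly. A minimal sketch (ours, with hypothetical field names):

```python
def violates_case_filter(np):
    """Filter (1): star any phonetically realized NP lacking Case."""
    return np["phonetic_matrix"] and np.get("case") is None

mary_infinitival = {"phonetic_matrix": True}               # (2a): no Case source
pro = {"phonetic_matrix": False}                           # (2b): filter is silent
mary_finite = {"phonetic_matrix": True, "case": "NOM"}     # (2c): nominative

assert violates_case_filter(mary_infinitival)
assert not violates_case_filter(pro)
assert not violates_case_filter(mary_finite)
```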
Under this analysis, it must follow that the Case Filter does not apply to D-Structure representations. If it did, then raising constructions such as (3a) would be ruled out at D-Structure, where the lexical subject is in an infinitival subject position, as illustrated in (3b).
(3) a. Fredi seems [IP ei to be happy].
    b. [NP] seems [IP Fred to be happy].
    c. *It seems [IP Fred to be happy].
    d. It seems [IP Fred is happy].
(3c) shows that a lexical subject cannot occur in the infinitival subject position of a raising construction at S-Structure, in contrast to the subject position of a finite complement (for example, (3d)). (3a) demonstrates that the Case Filter applies after movement transformations.2
If Case assignment applies after movement transformations and the Case Filter applies to syntactic representations derived via movement rules, there must be a mechanism for assigning Case to an NP that occurs in a Case-marked position at D-Structure and is moved to a position at S-Structure that is not Case-marked, as with the moved wh-phrase in (2). For the analysis of (2), we assume that the wh-phrase what inherits a Case marking via the trace that it binds in the complement object position. Thus, this mechanism for Case assignment in such representations derived by movement transformations can be construed as Case inheritance via trace binding. Though Case inheritance holds generally for movements from a Case-marked D-Structure grammatical function position (for instance, object position) to an S-Structure position that is not a grammatical function position (for instance, the specifier position of CP), it does not seem to apply in constructions where movement is between two grammatical function positions. In (3a), for example, where movement occurs between two grammatical function positions (matrix subject and complement subject), the moved NP is assigned Case by virtue of moving into a Case-marked position from a position that is not Case-marked. This assumes that a verb like seem does not assign objective Case to the complement subject position—in contrast to a verb like expect, which does, as illustrated in (4).
(4) Bernie expects [IP Adam to win].
The difference between seem and expect that is thought to account for the difference in Case-marking possibilities is that seem, in contrast to expect, does not assign a semantic function (or θ-role) to its subject. Given this correlation between θ-role assignment and Case-marking possibilities, it is assumed that a verb that does not assign a θ-role to its subject may not Case-mark an NP that it governs.3 This correlation generalizes to passive constructions as well, where the passive predicate does not assign a θ-role to its subject.
(5) a. *It was expected [IP Adam to win].
    b. Adami was expected [IP ei to win].
Thus, even though the passive predicate governs the infinitival complement subject, it does not assign objective Case to this NP, in contrast to the corresponding active predicate in (4). Thus, (5a) constitutes a Case Filter violation. In (5b) the D-Structure complement subject has moved into matrix subject position where it is marked for nominative Case in the normal fashion. This correlation between the inability of a verb to assign a θ-role to a subject and its inability to assign Case to an NP it governs is generally referred to in the literature as Burzio's generalization (see Chomsky 1986:139–141 and Burzio 1986: sec. 3.1). It is standardly assumed that this failure of Case assignment results from a mechanism of "Case absorption," which is induced by passive morphology for passive predicates.4 As we will show, this assumption requires some revision when we consider the fuller range of Case phenomena, which includes lexical Case—that is, Case marking that is determined as a lexical property of certain heads (for instance, V and P) in some languages, as opposed to Case marking determined solely in terms of syntactic configuration (henceforth configurational Case).
The Case-theoretic analysis of (2)–(5) given above, essentially the standard analysis, rests on several assumptions that we would like to examine in some detail in this and the following sections. Let us suppose that Case is assigned as an index to the maximal phrasal projection of N (designated as NP, in contrast to nonmaximal phrasal projections, which will be designated as N*). At this point two basic questions arise about the process of Case assignment: (i) What is the formulation of the rule of Case assignment? and (ii) Where does this rule apply in relation to other rules of grammar (in other words, where is Case assignment located in the organization of a grammar)? The formulation of the rule (or rules) of Case assignment crucially affects the interpretation of the Case Filter. Suppose the rule of Case assignment is stated in the optimally simple form (6).
(6) Assign Case to NP.
(Motivation for this formulation comes from lexical Case phenomena, as we will discuss in section 2.) If (6) is interpreted as an optional rule, then its particular behavior with respect to various constructions (when it must apply versus when it cannot apply) will be determined by general principles of grammar. For example, when the rule fails to apply to a lexical NP in a canonical Case-marked position (for instance, subject of a finite clause), then the resulting representation violates the Case Filter. When Case is assigned to a lexical NP in a syntactic position that is not licensed for Case (for instance, infinitival subject position in indirect questions, as in (2)), then the resulting representation violates the general principle of proper Case licensing stated in (7).
(7) Principle of Proper Assignment
    Each Case index must be properly assigned.5
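Taken together with the Case Filter (1), (7) can be rendered as a small checking procedure. The sketch below is our own illustration, assuming a hypothetical licensing table and record fields; it stars an NP either by (1) or by (7).

```python
# Positions paired with the Case index they properly license (assumed data).
LICENSED = {("subject_of_finite", "NOM"), ("object_of_V", "ACC")}

def starred(np):
    case = np.get("case")             # optional rule (6) may or may not apply
    if np["lexical"] and case is None:
        return True                   # Case Filter (1) violation
    if case is not None and (np["position"], case) not in LICENSED:
        return True                   # Proper Assignment (7) violation
    return False

# A lexical subject of an infinitival indirect question, as in (2a), is
# starred whether (6) applies (via (7)) or fails to apply (via (1)).
assert starred({"lexical": True, "position": "subject_of_infinitival_wh"})
assert starred({"lexical": True, "position": "subject_of_infinitival_wh", "case": "ACC"})
assert not starred({"lexical": True, "position": "subject_of_finite", "case": "NOM"})
```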
A Case index will be properly assigned where it is governed by an appropriate element (for instance, accusative Case governed by V and nominative Case governed by agreement). (7) is independently needed to exclude instances where (6) assigns the wrong Case index to an NP (for instance, assigning accusative Case to the subject of a finite clause). Thus, under the formulation of Case assignment as (6), the explanation for why indirect infinitival questions in English may not have lexical subjects has two parts, one of which involves the Case Filter and the other, a principle of Proper Assignment. An alternative to (6) would be a set of specific rules, each of which assigns a particular Case to an NP in a particular configuration. This is assumed in the standard analysis, where the lexical subject of an infinitival indirect question would never be assigned a Case and would therefore always constitute a Case Filter violation. This solution conflates Case assignment with Case licensing, which, we will argue, need to be distinguished for the analysis of lexical Case phenomena. A third alternative would be to consider (6) as an obligatory rule in Universal Grammar (UG). Thus, every NP in a given phrase marker will be assigned Case. Under this analysis the Case Filter is essentially useless. To account for the “Case Filter effects” under the standard analysis, the principle of Case licensing would be restricted to phonetically realized NPs. Note that this is necessary in any event since PRO as well as
lexical NPs will be Case-marked if (6) is obligatory, and presumably Case-marked empty categories are not subject to any particular Case principle. (Evidence that PRO must be Case-marked in some instances is discussed in section 5.)
2 Lexical versus configurational Case in Russian
This section initiates an investigation of lexical Case phenomena that we hope to show has significant consequences for the theory of grammar as it has been formulated in the standard work on generative grammar (see Chomsky 1981, 1986). We begin with a discussion of lexical versus configurational Case in Russian as a striking illustration of the different syntactic properties of these two types of Case. In the following discussion configurational Case designates a Case marking that is licensed solely in terms of a canonical syntactic configuration (for instance, accusative Case on an NP governed by V or nominative Case on an NP governed by the agreement element). Lexical Case designates a Case marking on an NP that is associated with a particular lexical head and that differs from the canonical configurational Case that would otherwise be assigned to the NP that bears the lexical Case.6 In Russian the lexical versus configurational Case distinction shows up clearly with respect to the verbal object.
(7) a. Configurational Case
       Ivan poceloval [NP ètu krasivuju devušku].
       Ivan-NOM kissed that-ACC pretty-ACC girl-ACC
       "Ivan kissed that pretty girl."
    b. Lexical Case
       i.  Ivan pomog [NP ètoj krasivoj devuške].
           Ivan-NOM helped that-DAT pretty-DAT girl-DAT
           "Ivan helped that pretty girl."
       ii. *Ivan pomog [NP ètu krasivuju devušku].
           Ivan-NOM helped that-ACC pretty-ACC girl-ACC
In both (7a) and (7bi) the verb governs its NP object. The verbs poceloval and pomog differ in that the latter requires that its object occur in the dative Case—thus, (7bii), where the object is Case-marked accusative, is ill formed even though the NP occurs in a syntactic configuration that licenses accusative Case. Note that in Russian Case is also morphologically realized on lexical modifiers of N (such as determiners and adjectives). This can be accounted for if Case marking of all such lexical elements of the NP results from propagation of the Case index on NP to the lexical constituents in the government domain of the lexical head N. Russian exhibits a striking difference between lexical and configurational Case with respect to nouns modified by certain numerals.
(8) a. Configurational Case
       Ivan poceloval [NP pjat' krasivyx devušek].
       Ivan-NOM kissed five-ACC pretty-GEN girls-GEN
       "Ivan kissed five pretty girls."
    b. Lexical Case
       i.  Ivan pomog [NP pjati krasivym devuškam].
           Ivan-NOM helped five-DAT pretty-DAT girls-DAT
           "Ivan helped five pretty girls."
       ii. *Ivan pomog [NP pjati krasivyx devušek].
           Ivan-NOM helped five-DAT pretty-GEN girls-GEN
In the “quantified” noun constructions in (8), when the NP is marked for configurational Case, the numeral manifests the appropriate configurational Case marking, whereas the remainder of the NP is obligatorily marked in the genitive. In contrast, a lexically Casemarked NP shows no such “Case splitting,” as illustrated in (8bi) versus (8bii). (For further discussion of these constructions, see Freidin and Babby 1984 and Babby 1987). The failure of Case splitting in lexically Case-marked NPs can be accounted for by assuming that lexical Case marking involves a head-to-head relation (like selection). Thus, the verb pomoc’ ‘help’ has a lexical property that imposes dative Case on the head of its NP object. This can be stated as a syntactic feature+_____DAT that is interpreted like a selectional feature (for instance+_____[+animate] for the verb frighten) and specifies a head-to-head relation between a verb and its object. Satisfaction of lexical properties where they exist in languages seems to be obligatory and to take precedence over structural properties where the latter conflict with the former. This follows if (9) is a principle of grammar. (9)
Principle of Lexical Satisfaction Lexical properties must be satisfied. (Freidin and Babby 1984)
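Because the PLS gives a lexical selection priority over whatever Case the configuration would license, object Case in these Russian examples reduces to a lexical lookup with a configurational default. A minimal sketch (ours; the miniature lexicon is only illustrative):

```python
# Verbs that lexically select a Case for their object (assumed entries).
LEXICAL_CASE = {"pomog": "DAT"}    # pomoč' 'help' imposes dative, as in (7b)

def object_case(verb):
    # The PLS (9): a lexically selected Case must be satisfied, preempting
    # the configurational accusative otherwise licensed under V.
    return LEXICAL_CASE.get(verb, "ACC")

assert object_case("poceloval") == "ACC"    # configurational, (7a)
assert object_case("pomog") == "DAT"        # lexical, (7bi); (7bii) is excluded
```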
Given that lexical Case properties must be satisfied, it follows from our assumption that Case is assigned to a maximal phrasal projection that there can be no Case splitting in lexically Case-marked NPs. In order for a lexical Case index to propagate from NP to its head, all projections of N will be lexically Case-marked and presumably all lexical constituents governed by N as well.
3 The Principle of Lexical Satisfaction
The Principle of Lexical Satisfaction (PLS) has several consequences for the distribution of Case in Russian. In general, it prohibits Case alternations that are possible with configurational Case from occurring when lexical Case is involved. For example, NPs bearing configurational Case may also be marked genitive (the so-called partitive genitive), as illustrated in (10).
Partitive genitive/configurational Case
(10) a. Ja xoču vodu.
        I want water-ACC
        "I want water."
     b. Ja xoču vody.
        I want water-GEN
        "I want some water."
As (11) demonstrates, this kind of alternation is not allowed with lexically Case-marked NPs.
Partitive genitive/lexical Case
(11) a. Ivan prišel [PP s vodoj].
        Ivan arrived with water-INST
        "Ivan arrived with water."
     b. *Ivan prišel [PP s vody].
        Ivan arrived with water-GEN
        "Ivan arrived with some water."
In (11a) the object of the preposition s 'with' is lexically Case-marked instrumental; thus, (11b) violates the PLS. The genitive of negation is another phenomenon in Russian where a configurational Case alternates with genitive Case marking. For example, the subject of a negative finite clause may occur in the nominative or genitive, as in (12).7
(12) a. Pticy bol'še ne pojavljalis'.
        birds-NOM any-more NEG appeared
        "The birds didn't come again."
     b. Ptic bol'še ne pojavljalos'.
        birds-GEN any-more NEG appeared
        "No birds came again."
Since subjects in Russian never occur with lexical Case, this alternation is generally permissible. However, with objects of verbs the genitive of negation is only permissible when it alternates with configurational accusative Case. This alternation, which is similar in character to (12), is illustrated in (13).
(13) a. Oni ne odobrjajut inostrannye metody.
        they-NOM NEG approve-of foreign-ACC methods-ACC
        "They do not approve of (the) foreign methods."
     b. Oni ne odobrjajut inostrannyx metodov.
        they-NOM NEG approve-of foreign-GEN methods-GEN
        "They do not approve of foreign methods."
In contrast to configurational Case, lexical Case does not alternate with the genitive of negation—as predicted by the PLS. Thus, in (14) the genitive of negation is not allowed in alternation with the lexically Case-marked objects in the dative and instrumental Cases.
(14) a. Oni ne podražajut inostrannym metodam.
        they-NOM NEG imitate foreign-DAT methods-DAT
        "They do not imitate foreign methods."
     b. *Oni ne podražajut inostrannyx metodov.
        they-NOM NEG imitate foreign-GEN methods-GEN
     c. Oni ne upravljajut inostrannymi mašinami.
        they-NOM NEG drive foreign-INST cars-INST
        "They do not drive foreign cars."
     d. *Oni ne upravljajut inostrannyx mašin.
        they-NOM NEG drive foreign-GEN cars-GEN
As far as we can determine, the lack of a Case alternation between a lexical Case and the genitive of negation is a purely syntactic phenomenon and has no apparent explanation in terms of semantic differences between verbs that require lexically Case-marked objects and those that take configurationally Case-marked objects. Passive constructions provide yet another instance where lexical and configurational Case are clearly distinguished. As is standard in so many of the world's languages, the accusative object in an active sentence shows up as the nominative subject of the corresponding passive sentence. (15) gives the paradigm for configurational Case active/passive constructions.
(15) a. Active
        Ivan čitaet knigu.
        Ivan-NOM reads book-ACC
        "Ivan is reading the book."
     b. Passive
        i.   Kniga čitaetsja (Ivanom).
             book-NOM is-being-read Ivan-INST
             "The book is being read (by Ivan)."
        ii.  *Knigu čitaetsja (Ivanom).
             book-ACC is-being-read Ivan-INST
        iii. *Čitaetsja knigu (Ivanom).
In effect, (15a) and (15bi) illustrate an accusative/nominative Case alternation for the underlying object knig-. (15bii) shows that accusative Case is not licensed in subject position, and (15biii) demonstrates that the passive form čitaetsja does not allow a Case-marked lexical object. Thus, Russian passives appear to involve something like the "Case absorption" hypothesized for English passives. The corresponding paradigm for a verb that imposes lexical Case on its underlying object is given in (16).
(16) a. Active
        Rabotnik podražaet inostrannym metodam.
        worker-NOM copies foreign-DAT methods-DAT
        "The worker is copying foreign methods."
     b. Passive
        i.   *Inostrannye metody podražajutsja rabotnikom.
             foreign-NOM methods-NOM are-copied worker-INST
             "Foreign methods are being copied by the worker."
        ii.  *Inostrannym metodam podražajutsja rabotnikom.
             foreign-DAT methods-DAT are-copied worker-INST
        iii. *Podražajutsja inostrannym metodam rabotnikom.
             are-copied foreign-DAT methods-DAT worker-INST
(16bi) violates the PLS under the assumption that the lexical property +___ DAT of podražat' is not affected by passive morphology. If passive morphology had the effect of canceling the lexical Case property of the verb, then (16bi) should be well formed, contrary to the facts. Under the assumption that the NP inostrannym metodam is a constituent of VP, (16biii) shows that the passive form of the verb does not allow a lexical NP in object position. Passive morphology therefore has the same effect on configurationally and lexically Case-marked objects. (16bii) demonstrates that the lexically Case-marked NP is not properly licensed in subject position. Thus, there is no possible way to satisfy both the lexical Case property of the verb and the Case properties of the passive construction in general.8
4 Lexical Case phenomena in German
Like Russian, German also exhibits lexical Case marking on the objects of verbs. Though the vast majority of the transitive verbs in the German lexicon take NP objects with accusative Case marking in active clauses, there also exists a set of verbs taking NP objects with dative Case marking. This contrast is illustrated in (17).9
(17) a. daß der Polizist [VP [V′ den Spion beobachtete]]
        that [the policeman]-NOM [the spy]-ACC observed
        "that the policeman observed the spy."
     b. daß der Polizist dem Spion half
        that [the policeman]-NOM [the spy]-DAT helped
        "that the policeman helped the spy."
In (17a) the NP object of the verb beobachten "observe" occurs in the configurational accusative Case (hence den Spion), whereas in (17b) helfen 'help' selects a dative Case object (hence dem Spion). German passive constructions exhibit an asymmetry in the behavior of configurational and lexical Case. A configurationally Case-marked object (that is, in the accusative) in an active construction occurs as a nominative in the subject position of a corresponding passive construction—as in English and Russian. In contrast, a lexically Case-marked object in an active construction remains in object position with the same lexical Case in the corresponding passive construction, as illustrated in (18b) as compared to (18a).10
(18) a. daß der Spion beobachtet wurde
        that [the spy]-NOM observed-PPP was
     b. daß dem Spion/*der Spion geholfen wurde
        that [the spy]-DAT/[the spy]-NOM helped-PPP was
In the two passive constructions, the D-Structure object of beobachten occurs with nominative Case marking, in contrast to the D-Structure object of helfen, which retains its dative Case marking in the passive, as required by the PLS. The grammaticality of daß dem Spion geholfen wurde demonstrates that German differs from Russian in that the selection of lexical Case seems to be sufficient to license the occurrence of Case in a position that is not licensed for configurational Case. Given that German does not have lexically Case-marked subjects,11 it should be expected that a lexically Case-marked object will not occur in subject position at S-Structure. The motivation for this assumption will become clear in the discussion of Icelandic that follows. For German, then, Burzio's generalization does not extend to lexical Case phenomena, though it remains valid for configurational Case phenomena.12
5 Lexical Case phenomena in Icelandic
In this section we will examine the distribution of lexical Case in Icelandic, where unlike what happens in Russian and German, lexically Case-marked NPs may occur in subject position as well as object position. This difference has important consequences for the analysis of passives whose corresponding active forms take lexically Case-marked objects, as we will discuss in section 5.2.
5.1 Verbs selecting lexically Case-marked subjects
Several studies13 have presented tests for syntactic subjecthood in Icelandic. These tests involve a range of phenomena including the binding of reflexives and reciprocals for all speakers, coordination, expletive insertion (það), correspondence with PRO in control structures, and a number of superficially heterogeneous word order facts. Though these properties often coincide with the nominative Case NP with which finite verb forms must agree morphologically, this is not always the case. Here we mention only three of these properties as an illustration. For many Icelandic speakers, reflexive binding is subject-oriented; thus, nonsubjects cannot serve as antecedents to reflexives.14 (19a) gives the standard nominative subject case; (19b) shows that a nonnominative NP may also bind a reflexive.
(19) a. Haralduri las bókina sínai.
        Harald-NOM read book his (+REFL)
     b. Haraldii batnaði veikin hjá bróður sínumi.
        Harald-DAT recovered-from the-disease at-the-home-of brother his (+REFL)
     c. *Haralduri drap Friðrikj hjá bróður sínumi/*j.
        Harald-NOM killed Friðrik-ACC at-the-home-of brother his (+REFL)
It is assumed that this NP is also a structural subject, like its nominative counterpart. (19c) illustrates the fact that nonsubjects cannot bind reflexives. Note that although subject-oriented binding of reflexives is not observed in all idiolects of Icelandic, the fact that it holds in some is enough to establish the argument for lexically Case-marked subjects. A second property of nominative subjects shared with certain lexically Case-marked NPs is that in yes/no questions, the NP immediately follows the finite verb. This parallelism follows if both Haraldur in (20a) and Haraldi in (20b) are structural subjects.
(20) a. Hefur Haraldur lesið bókina?
        has Harald-NOM read the-book
     b. Hefur Haraldi batnað veikin?
        has Harald-DAT recovered-from the-disease
A related property is that both NPs occur in so-called raising constructions where the nominative NP is assumed to be the structural subject of the "seem"-type verb but is interpreted as the subject of the embedded infinitival. The same assumption about the status of the lexically Case-marked NP in (21b) is completely natural.
(21) a. Haralduri virðist [ti hafa lesið bókina].
        Harald-NOM seems to-have read the-book
     b. Haraldii virðist [ti hafa batnað veikin].
        Harald-DAT seems to-have recovered-from the-disease
There seems to be no reason to analyze the trace of Haraldi in (21b) as occupying a nonsubject position if its lexical antecedent occurs in a bona fide subject position at S-Structure. Given this analysis and our analysis of lexical Case as being imposed by selectional features of lexical heads (for instance, V, P, and A), Icelandic differs from both Russian and German in that lexical Case selection extends to subjects as well as complements. The fact that lexical Case selection is possible for subjects leads to further differences in the distribution of lexically Case-marked NPs in Icelandic, as we will discuss in the following section.15
5.2 Verbs selecting lexically Case-marked objects
Icelandic, like Russian and German, also allows for lexically Case-marked objects as well as structurally Case-marked ones. Consider the following examples.
(22) a. Egill drap Harald í gær.
        Egill-NOM killed Harald-ACC yesterday
     b. Egill hjálpaði barninu.
        Egill-NOM helped the-child-DAT
In (22a) the object is assigned accusative Case according to its configuration, whereas in (22b) the verb hjálpa selects an object in the dative Case. This distinction between lexical and configurational Case is illustrated in the passive counterparts to (22a–b), given in (23a–b), respectively, where the corresponding accusative object becomes a nominative subject and the corresponding dative object becomes a syntactic subject but remains in the dative.
(23) a. Haraldur var drepinn í gær.
        Harald-NOM was killed-PPP yesterday
     b. Barninu/*Barnið var hjálpað.
        the-child-DAT/the-child-NOM was helped-PPP
(23b) constitutes a different pattern from the corresponding Russian and German examples. The fact that the S-Structure subject barninu occurs in the lexically determined Case dative, rather than nominative, follows from the PLS. The phenomenon of lexical Case in such constructions strongly indicates the presence of a trace in the object position of passive constructions. Otherwise, the selection of lexical Case in these constructions would be simply unexplained since passive predicates do not select lexical properties of subjects at all. Presumably the reason that lexically Case-marked objects may occur in the subject position of corresponding passive constructions is that this position can be selected for lexical Case in regular active sentences. It appears that once lexical Case is selected in a position, lexically Case-marked NPs can be moved to such positions even when there is no direct selection of lexical Case in the position.
As Zaenen, Maling, and Thráinsson 1985 showed, the evidence that the lexically Case-marked NP in (23b) is in fact a structural subject is virtually identical to the evidence for lexically Case-marked subjects in active sentences reviewed above. The lexically Case-marked NP may be the antecedent of a reflexive in subject-oriented binding dialects.
(24) Stráknumi var hjálpað af bróður sínumi.
     the-boy-DAT was helped-PPP by brother his (+REFL)
Also, in yes/no questions, the lexically Case-marked NP behaves like a subject in that it immediately follows the finite verb.
(25) Var barninu hjálpað í gær?
     was the-child-DAT helped-PPP yesterday
And, finally, it can occur as the surface subject of a "seem"-type verb while being interpreted as the subject argument of the embedded infinitival.
(26) Barninui virðist [ti hafa verið hjálpað].
     the-child-DAT seems to-have been helped-PPP
That is, (26) is a standard case of subject-to-subject raising. In Icelandic, as in German, it is possible for a lexically Case-marked object to remain in its D-Structure position at S-Structure in the presence of passive morphology. This is illustrated clearly in (27).
(27) a. Í gær var hjálpað barni.
        yesterday was helped-PPP a-child-DAT
     b. það var hjálpað barni.
        it was helped-PPP a-child-DAT
In (27) barni (DAT) appears in object position at S-Structure, as in the corresponding active.16 (27) demonstrates that passive morphology in Icelandic blocks neither the assignment nor the licensing of lexical Case with respect to object position. Thus, the movement of a lexically Case-marked object to subject position, as in (23b), is not forced by any Case Filter effect—again in contrast to its configurational Case counterpart. Further evidence that passive predicates may license the occurrence of lexical Case in object positions comes from the analysis of ditransitive verbs, where the Case marking on one or both objects may be lexically selected. With respect to passivization possibilities there are two distinct classes of ditransitive verbs in Icelandic: the gefa-class, where the unmarked order of objects in active sentences is DAT-ACC, and the skila-class, representing all other occurring Case patterns. In both classes the ±recipient NP precedes the theme NP. The classes may be summarized schematically as in Tables 6.1 and 6.2.

Table 6.1 Ditransitive verbs
              ±Recipient   Theme   Examples
Class I       DAT          ACC     gefa "give," sýna "show," meina "refuse"
Class II  a.  DAT          DAT     skila "return," lofa "promise"
          b.  DAT          GEN     óska "wish," synja "deny"
          c.  ACC          DAT     svipta "deprive," sæma "award"
          d.  ACC          GEN     spyrja "ask," minna "remind"

Table 6.2 Case/θ-role correspondences
                 θ1 agent   θ2 recipient   θ3 theme
gefa "give"                 DAT
skila "return"              DAT            DAT
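Tables 6.1 and 6.2 can also be read as a small data set. The sketch below (ours) encodes them, with None marking a configurationally Case-marked argument, and summarizes the passivization asymmetry that (28)–(29) illustrate next; the dictionary entries and function names are assumptions.

```python
# verb: (recipient_case, theme_case); None = configurational (ACC,
# alternating with NOM in the passive).
DITRANSITIVES = {
    "gefa":   ("DAT", None),     # class I: gefa "give," sýna "show," meina "refuse"
    "skila":  ("DAT", "DAT"),    # class II (a)
    "óska":   ("DAT", "GEN"),    # class II (b)
    "svipta": ("ACC", "DAT"),    # class II (c)
    "spyrja": ("ACC", "GEN"),    # class II (d)
}

def passivizable_objects(verb):
    """Class I lets either object move to subject position; class II only
    the ±recipient (the pattern shown in (28)-(29))."""
    recipient, theme = DITRANSITIVES[verb]
    return ("recipient", "theme") if theme is None else ("recipient",)

assert passivizable_objects("gefa") == ("recipient", "theme")
assert passivizable_objects("skila") == ("recipient",)
```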
With verbs of class I either object may move to subject position in the passive. This is illustrated in (28a–c). (28)
a.
Ég sýndi henni bílinn. I-NOM showed her-DAT the-car-ACC
b.
Bíllinn var sýndur henni. the-car-NOM was shown-PPP/NOM her-DAT
c.
Henni var sýndur bíllinn. her-DAT was shown-PPP/NOM the-car-NOM
In (28b–c) either NP object may occur in the subject position, while the other remains in postverbal object position. (28b), like (27), shows that passive morphology blocks neither the assignment nor the licensing of lexical Case on objects. (28c) illustrates how the lexically selected Case marking on the object must be preserved at S-Structure when the object is moved to subject position. The nominative Case marking on the object NP in (28c) cannot be an instance of lexical Case, since nominative alternates with accusative in active sentences.17 A different pattern obtains for class II ditransitives. In this class, only the ± recipient NP may move to subject position. Compare paradigms (28) and (29). (29)
a.
Egill skilaði stelpunni pennanum. Egill-NOM returned the-girl-DAT the-pen-DAT
b.
Stelpunnii var skilað ti pennanum. the-girl-DAT was returned-PPP the-pen-DAT
c.
* Pennanumi var skilað stelpunni ti. the-pen-DAT was returned-PPP the-girl-DAT
Given that passive morphology does not block Case assignment via selection of lexical Case and that an NP that is lexically Case-marked by a passive form may occur in syntactic subject position (as illustrated in (29b)), the ungrammaticality of (29c) is surprising. The most salient structural difference between the two passive constructions is that in the S-Structure representation of (29b) the lexically Case-marked trace is adjacent to the V that governs it, whereas in the S-Structure representation of (29c) it is not. We will suppose therefore that there is a general prohibition (30) against a lexically Case-marked trace that is not strictly adjacent to a governing lexical head. (30)
Strict Adjacency for Lexically Case-marked Trace A lexically Case-marked Trace must be strictly adjacent to a governing head.
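The condition in (30) is mechanical enough to be stated procedurally. The following Python sketch is our own illustration, not part of the original analysis: the Token encoding, the helper names, and the decision to mark government directly on the trace are all hypothetical conveniences. It simply checks a flattened S-Structure string and reproduces the (29b)/(29c) contrast.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    form: str                          # surface form, e.g. "skilað" or "t"
    is_trace: bool = False
    lexical_case: Optional[str] = None  # e.g. "DAT" on a lexically Case-marked trace
    governor: Optional[str] = None      # head assumed to govern this trace

def satisfies_strict_adjacency(clause: list) -> bool:
    # (30): every lexically Case-marked trace must be strictly adjacent
    # (immediately left- or right-adjacent) to its governing head.
    for i, tok in enumerate(clause):
        if tok.is_trace and tok.lexical_case is not None:
            neighbors = {clause[j].form for j in (i - 1, i + 1)
                         if 0 <= j < len(clause)}
            if tok.governor not in neighbors:
                return False
    return True

# (29b) Stelpunni var skilað t pennanum: trace adjacent to "skilað", well formed
ex_29b = [Token("stelpunni"), Token("var"), Token("skilað"),
          Token("t", is_trace=True, lexical_case="DAT", governor="skilað"),
          Token("pennanum")]

# (29c) *Pennanum var skilað stelpunni t: trace separated from "skilað", violates (30)
ex_29c = [Token("pennanum"), Token("var"), Token("skilað"), Token("stelpunni"),
          Token("t", is_trace=True, lexical_case="DAT", governor="skilað")]

assert satisfies_strict_adjacency(ex_29b) is True
assert satisfies_strict_adjacency(ex_29c) is False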
(30) has the flavor of a locality condition, though why this particular condition holds remains to be explained.18 In any event, the explanation cannot be that the lexical antecedent of the lexically Case-marked Trace is in some sense “too far away,” because it is possible to raise such antecedents out of their clauses, as illustrated in (31a).
(31)
a.
Hennii eru taldir [ti hafa verið sýndir ti bílarnir]. her-DAT are believed to-have been shown-PPP the-cars
b.
Bílarniri eru taldir [ti hafa verið sýndir henni ti]. the-cars-NOM are believed to-have been shown-PPP her-DAT
(31) demonstrates that either underlying object NP of a class I ditransitive may be raised to a higher subject position with the familiar pattern of Case retention by the lexically Case-marked NP and Case alternation with the structurally Case-marked NP. Note that (30) would also hold for the intermediate trace in (31a) given that the matrix verb taldir governs the adjacent trace—which is plausible given that the familiar “exceptional Case marking” (ECM) constructions exist in Icelandic, as in (32). (32)
a.
þeir telja [IP hennii hafa verið sýndur ti bíllinn]. they believe her-DAT to-have been shown-PPP/NOM the-car-NOM
b.
þeir telja [IP bíllinni hafa verið sýndan henni ti]. they believe the-car-ACC to-have been shown-PPP/ACC her-DAT
As noted above, we assume that because Icelandic allows lexical Case selection for subjects, it is possible to move lexically Case-marked objects into subject positions under
the proper conditions—one of which is the adjacency constraint (30), as (33)–(34) demonstrate. (33)
a.
Þeir telja [IP stelpunnii hafa verið skilað ti pennanum]. they believe the-girl-DAT to-have been returned-PPP the-pen-DAT
b.
*Þeir telja [IP pennanumi hafa verið skilað stelpunni ti].
(34)
a.
Stelpunni er talið [IP ti hafa verið skilað ti pennanum]. the-girl-DAT is believed to-have been returned-PPP the-pen-DAT
b.
* Pennanumi er talið [IP ti hafa verið skilað stelpunni ti].
Both (33b) and (34b) violate (30).19 Underlying subject NPs that involve lexical Case selection also occur in both ECM and raising constructions.20 (35) provides three standard examples of lexical Case selection for subjects, one in each of the nonnominative Cases of Icelandic. (35)
a.
Verkjanna gætir ekki. the-pains-GEN is-noticeable not
b.
Mér batnaði veikin. me-DAT recovered-from the-disease-NOM
c.
Mig vantar peninga. me-ACC lacks money-ACC
The corresponding ECM constructions are given in (36). (36)
a.
Hann telur [IP mig vanta peninga]. he-NOM believes me-ACC to-lack money-ACC
b.
Hann telur [IP barninu hafa batnað he-NOM believes the-child-DAT to-have recovered-from veikin]. the-disease-NOM
c.
Hann telur [IP verkjanna ekki gæta]. he-NOM believes the-pains-GEN not to-be-noticeable
Raising of the lexical Case selected subject is possible both in “seem”-type constructions and in the passive of ECM constructions. This is illustrated in (37) and (38), respectively. (37)
a.
Mig virðist vanta peninga. me-ACC seems to-lack money-ACC
b.
Barninu virðist hafa batnað veikin.
the child-DAT seems to-have recovered-from the-disease-NOM
c.
Verkjanna virðist ekki gæta. the-pains-GEN seems not to-be-noticeable
(38)
a.
Mig er talið vanta peninga. me-ACC is believed to-lack money-ACC
b.
Barninu er talið hafa batnað veikin. the child-DAT is believed to-have recovered-from the-disease-NOM
c.
Verkjanna er ekki talið gæta. the-pains-GEN is not believed to-be-noticeable
These paradigms demonstrate that lexical Case selected subjects in Icelandic have the same syntactic distribution and behavior as configurationally Case-marked subjects at S-Structure.21

6 Case assignment versus licensing

At the outset we suggested that Case assignment and Case licensing might well be distinct phenomena, which would lead to a very different view of the Case Filter than has standardly been supposed. We can now provide further evidence for this view based on a subject-object asymmetry with respect to lexical Case in Icelandic. As noted in section 5, the selection of lexical Case in object position can license the occurrence of a lexical NP in object position of a passive predicate, in contrast to the configurationally accusative Case-marked object, which is not licensed in the same position. With a lexical Case selected subject, however, selection is not sufficient to license the occurrence of a lexical NP in subject position. The asymmetry in Case licensing for lexically Case-marked subjects versus objects shows up when we compare the failure of passive morphology to block (lexical) Case licensing (as, for example, in (27)) with the following paradigm. (39)
a.
[að [PRO batna veikin]] er venjulegt PRO-DAT to-recover-from the-disease-NOM is usual
b.
*[að [Jóni batna veikin]] er mikilvægt Jon-DAT to-recover-from the-disease-NOM is important
The bracketed construction in (39) is an infinitival sentential subject. Crucially, the verb in this construction is one that selects a lexical Case subject (in the dative). Given the PLS, we assume that PRO is Case-marked dative22 so that the lexical property of Case selection is satisfied in (39a). The fact that a lexical subject in the selected Case is not possible in this construction shows that lexical Case selection for subjects is not sufficient to license the presence of a lexical NP in that position. Thus, (39b) is ruled out as a violation of Case licensing.23 Under this analysis, the lexical Case subjects in the following examples must be licensed configurationally, and not via lexical Case selection.
(40)
a.
Jóni batnaði veikin. Jon-DAT recovered-from the-disease-NOM
b.
Ég tel [Jóni hafa batnað veikin]. I believe Jon-DAT to-have recovered-from the-disease-NOM
In (40a–b) the actual Case of the subject of batna is determined by the verb, but the licensing of the lexical NP is done independently in terms of its structural position. In (40b), for example, the licensing of the lexical NP Jóni is done by the matrix verb tel, which governs the NP. Thus, Case assignment and Case licensing appear to be distinct processes. This same asymmetry holds with respect to lexically Case-marked derived subjects where lexical Case selection is to an object position. The paradigm corresponding to (39) is given in (41). (41)
a.
[að PROi vera hjálpað ei] er erfitt PRO-DAT to-be helped-PPP is difficult
b.
*[(að) Jónii vera hjálpað ei] er erfitt Jon-DAT to-be helped-PPP is difficult
Given the PLS, we assume once again that PRO is Case-marked dative to satisfy the lexical Case selection property of hjálpa (namely, that it selects a dative object). Though this Case selection is sufficient to license the presence of a lexical NP in the object position of the passive predicate, it has no effect when the lexical object is moved to subject position, as illustrated in (41b). Thus, in (42) we have yet another example where Case assignment is determined via lexical Case selection, whereas the licensing of the lexical NP is done configurationally. (42)
Ég tel [Jónii hafa verið hjálpað ei]. I believe Jon-DAT to-have been helped-PPP
In this way, (41)–(42) provide striking confirmation of the subject-object asymmetry in lexical Case selection, and show that Case licensing is primarily a configurational phenomenon, with the exception of lexical objects in passive constructions, which can be licensed by lexical Case selection. The separation of Case assignment and the licensing of lexical NPs leads us to a reconsideration of what have been assumed to be Case Filter effects. Our investigation of lexical Case phenomena suggests that the determination of the Case of an NP is not the relevant factor; rather, it is whether a lexical NP occurs in a configuration that licenses the presence of a lexical NP. If this view is correct, then the Case Filter as formulated in section 1 should be replaced by a licensing principle along the lines of (43).
(43) Case licensing principle A lexical NP (that is, one containing phonetic material) must occur in a Case-licensed position.
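The division of labor behind (43) can be made concrete in a small sketch. The following Python fragment is our own illustration with hypothetical names, not the paper’s formalism; it deliberately omits the Icelandic passive-object exception (the “structural index” option) discussed immediately below. Assignment consults a head’s lexical selection, while licensing consults only the structural position of the NP.

# Case assignment: a lexical property of particular heads.
LEXICAL_CASE_SELECTION = {
    "hjálpa": "DAT",   # 'help' selects a dative object
    "batna": "DAT",    # 'recover-from' selects a dative subject
}

def assign_case(head: str, configurational_default: str) -> str:
    # Lexical selection takes precedence over the configurational default.
    return LEXICAL_CASE_SELECTION.get(head, configurational_default)

def is_case_licensed(position: str, governed_by_lexical_head: bool,
                     finite_agreement: bool) -> bool:
    # Case licensing per (43): a configurational property of positions.
    if position == "object":
        return governed_by_lexical_head
    if position == "subject":
        # licensed by finite agreement, or by government from a matrix
        # ECM verb such as telja (cf. (40b) and (42))
        return finite_agreement or governed_by_lexical_head
    return False

# (39b) *[Jóni batna veikin] er mikilvægt: dative is assigned to the
# infinitival subject, but nothing licenses a lexical NP in that position.
assert assign_case("batna", configurational_default="NOM") == "DAT"
assert not is_case_licensed("subject", governed_by_lexical_head=False,
                            finite_agreement=False)

# (40b) Ég tel [Jóni hafa batnað veikin]: same assignment, but the matrix
# verb governs the subject position, so the lexical NP is licensed.
assert is_case_licensed("subject", governed_by_lexical_head=True,
                        finite_agreement=False)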
Under this analysis, (39b) and (41b) constitute violations of Case Licensing.24 Though it seems that the Case-licensed positions are exactly those where a configurationally Case-marked NP can occur, there is in Icelandic and German the notable exception of the object position for passive predicates, where a lexically Case-marked NP can occur but the canonical configurational Case (accusative) cannot. One way to deal with this exception is to assume that a lexical NP can occur only in positions that receive a “structural index.” Thus, objects of verbs and prepositions receive a structural index under government, and presumably subjects of finite clauses receive such an index via identification with respect to agreement. If, for reasons yet to be determined explicitly, a passive predicate cannot assign such a structural index, then its lexical object must get one from some other position. If, however, lexical Case selection for objects can assign such an index as a marked option (for instance, Icelandic versus Russian), then the Icelandic facts would follow. Note that we would have to assume that assignment of the structural index is to a position and remains on the position even when a lexical NP in that position is moved to another position (as in (41b)). Assuming that structural indices are assigned within a government domain, the asymmetry of licensing with respect to lexically Case-marked subjects and objects would follow since subject position (the position designated as SPEC of IP) is outside the government domain of V.25

What we have demonstrated in this paper is that lexical Case phenomena manifest some rather different properties from configurational Case phenomena. With configurational Case phenomena, Case assignment and Case licensing are not distinguished. The analysis of lexical Case, in contrast, requires a distinction between the assignment and licensing of Case—the former being a lexical property of certain heads and the latter a configurational property of constructions. As we have shown with the analysis of Icelandic, it is licensing rather than Case assignment that distinguishes well-formed from ill-formed constructions. Since this analysis will apply equally well to configurational Case phenomena, it is possible to revise Case theory by replacing the Case Filter with a principle of Case Licensing, as we have proposed.

Acknowledgement

We would like to thank Len Babby, Caroline Heycock, Joan Maling, and Halldór Sigurðsson for helpful comments on this paper.

Notes
1 Lasnik and Freidin (1981) give an argument that wh-trace must also be subject to the Case Filter. The argument concerns the deletion of wh-phrases in infinitival relative clauses where the deleted wh-phrase binds a trace in a subject position that is not marked for Case. The relevant paradigm is given in (i), where ___ indicates the deletion site of the relative pronoun.
(i) a. [NP The man [CP whoi [IP it is believed [CP ti [IP ti is lying]]]]] is a friend of mine.
b. [NP The man [CP ___ [IP it is believed [CP ti [IP ti is lying]]]]] is a friend of mine.
c. *[NP The man [CP whoi [IP it is believed [CP ti [IP ti to be lying]]]]] is a friend of mine.
d. *[NP The man [CP ___ [IP it is believed [CP ti [IP ti to be lying]]]]] is a friend of mine.
In the examples where the wh-phrase originates in the subject of a finite clause (ia–b), the relative pronoun may freely delete. In corresponding examples where the wh-phrase originates in the subject of an infinitival that is not marked for Case, an overt wh-phrase results in a Case Filter violation. The deletion of the wh-phrase, which would eliminate the Case Filter violation, does not change the unacceptability of the example. Lasnik and Freidin (1981) conclude that this results because the Case Filter holds for wh-trace as well as lexical NP. It seems to us that there may be another explanation for these data. For example, we might reasonably conjecture that deletion operations can only affect elements that are “properly licensed.” If being marked for Case is part of proper licensing for lexical NPs, then the example that otherwise leads to the conclusion that wh-trace must be subject to the Case Filter can be ruled out on other (perhaps more general) grounds. This eliminates the apparent anomaly of grouping phonetically realized NP with wh-trace as being subject to the Case Filter.
2 Discussion of how Case assignment and movement transformations are ordered will be deferred to section 6. For the present discussion we will assume that Case assignment follows movement—that is, all movements.
3 We assume here that the relation governs is defined in terms of symmetric m-command, where a category X m-commands a category Y if the first maximal phrasal projection dominating X, a lexical category, also dominates Y, where X and Y are in a linear relation in the phrase marker. In particular, it is assumed that the matrix V governs the infinitival complement subject NP in both (3) and (4)—but crucially not in (2), where CP is assumed to be a barrier to government.
4 Note that the explanation of Case absorption as a consequence of passive morphology fails to generalize to the nonpassive raising cases shown in (3). Thus, the mechanism that underlies Burzio’s generalization remains obscure (but see note 12).
5 This is essentially a principle of “Case checking”—see Jaeggli 1981, Vergnaud 1985, and Chomsky 1981 for discussion.
6 This distinction between lexical and configurational Case is first discussed in Freidin and Babby 1984 (written in 1981), which recasts and extends both the data and some fundamental ideas in Babby 1980b.
7 (12b) is from Babby 1980a:13. Note that the subject NP may be marked with the genitive of negation only if the sentence is existential. In Freidin and Babby 1984 both the partitive genitive and the genitive of negation are treated neither as instances of lexical nor as instances of configurational Case, but rather as instances of “semantic Case,” which is distinct from lexical Case in that it can alternate with configurational Case. See Freidin and Babby 1984 for a more detailed discussion.
8 Crucially, this is true no matter what analysis of Russian word order is adopted.
9 Here we follow the standard practice of giving German examples in the form of subordinate clauses in order to abstract away from effects of verb movements and topicalization. Note also that in German verbs assign Case to the left rather than to the right as in English.
10 The designation “-PPP” stands for “perfect passive participle.” Note that there is no reason to suppose that the NP dem Spion in (18b) is a syntactic subject. As Cole et al. (1980) have demonstrated, the D-Structure object in such sentences fails to exhibit any of the identifiable syntactic properties of S-Structure subjects in German: it fails to behave in coordinate structures as nominative Case subjects do, and there is no corresponding infinitival with a PRO subject in its place—properties in fact exhibited by nominative NPs. This is illustrated in the following paradigms, where the derived nominative subject participates in coordination and control structures in which the corresponding lexically Case-marked NP cannot. (i) a. daß der Spion beobachtet wurde that [the spy]-NOM observed was
b.
daß der Spion Angst hatte und beobachtet wurde that [the spy]-NOM fear had and observed was
c.
daß der Spion hofft [PRO beobachtet zu werden] that [the spy]-NOM hopes observed to be
(ii)
a.
daß dem Spion geholfen wurde that [the spy]-DAT helped-PPP was
b.
*daß der Spion Angst hatte und geholfen wurde that [the spy]-NOM fear had and helped-PPP was
c.
*daß dem Spion hofft [PRO geholfen zu werden] that [the spy]-DAT hopes helped-PPP to be
On the basis of the contrast between (ib–c) and (iib–c) it is assumed that whereas the nominative NP in (ia) occurs in derived subject position, the dative NP in (iia) does not.
11 The same diagnostics mentioned in the preceding note support this conclusion. Thus, for the verb ekeln ‘to be disgusted’ in (i), the accusative NP is analyzed as occurring in VP, and not in the canonical subject position (which contains an empty expletive under some analyses). (i)
a.
[IP [VP Mich ekelt]]. me-ACC is-disgusted
b.
*daß ich Angst habe und ekelt that I fear have and is-disgusted
c.
*daß ich hoffe [PRO nicht zu ekeln] that I hope not to be-disgusted
12 Note that in Burzio 1986:178 the correlation referred to elsewhere as “Burzio’s generalization” is restricted to accusative Case. Burzio does not discuss the possibility of lexical Case. See Babby 1990 for discussion of certain constructions in Russian where Burzio’s generalization fails for configurationally Case-marked NPs as well.
13 The seminal studies in this regard are Andrews 1976 and Cole et al. 1980. Other important ones include Thráinsson 1979 and Zaenen, Maling, and Thráinsson 1985.
14 Here we abstract away from double object constructions in which an indirect object can bind (a subpart of) its direct object. See Sprouse 1989:262–307.
Generative grammar
130
15 Yip, Maling, and Jackendoff (1987) observe that a lexical Case selected by a verb may not show up in the corresponding nominalization. The example cited is the verb kenna ‘to teach’, which selects a dative object, where the corresponding nominal kennsla ‘teaching’ occurs with a (presumably configurational) genitive object. Given that lexical Case selection is a lexical property and that lexical properties often change under nominalization, this fact should not be surprising.
16 There are ungrammatical forms that differ from (27) only in that the object is definite rather than indefinite. (i)
*Í gær var hjálpað barninu. yesterday was helped-PPP the-child-DAT
(ii)
*það var hjálpað barninu. it was helped-PPP the-child-DAT
This is due to a definiteness effect rather than Case absorption via passive morphology, as (27) illustrates. See Sigurðsson 1989 for a detailed analysis.
17 See Zaenen, Maling, and Thráinsson 1985 and Sprouse 1989 for two different approaches to this problem.
18 See Zaenen, Maling, and Thráinsson 1985 for a different analysis based on rules that associate θ-roles and grammatical functions. Crucially, their analysis depends on one language-particular association rule—in contrast to (30), which is assumed to hold at the level of UG.
19 (34b) would be well formed if pennanum is interpreted as topic. If pennanum is placed in an unambiguous subject position, the sentence is ill formed, as illustrated in (i). (i)
*Var pennanum talið [IP ti hafa verið skilað stelpunni ti]? was the-pen believed to-have been returned-PPP the-girl-DAT
We are indebted to Halldór Sigurðsson for this information.
20 Here we ignore the standard assumption of the early 1980s that all nonnominative subjects must be VP-internal at D-Structure, an assumption with purely theory-internal motivation. For us, underlying subject NPs are NPs that can be syntactically realized only in subject position.
21 Again, see the relevant literature cited in note 6.
22 Even though dative Case is not morphologically realized in such infinitivals, we assume it is present to account for agreement between the subject and a modifier that occurs in the predicate. (i) a.
Barninu batnaði veikin einu. the-child-DAT recovered-from the-disease alone-DAT/NEUTER/SG
b. Barnið vonast til að [PRO] batna veikin the-child-NOM hopes [PREP] PRO-DAT to-recover-from the-disease einu. alone-DAT/NEUTER/SG
c. [að [[PRO] batna veikin einum]] er PRO-DAT to-recover-from the-disease-NOM alone-DAT/MASC/SG is venjulegt/erfitt usual/difficult
Joan Maling informs us that the Case of the predicative adjective in these constructions can vary and that they impose an interpretation on noncontrolled PRO. Interestingly, when the infinitival verb does not select a lexical Case for the subject, the predicative adjective shows up with nominative Case.
23 Case assignment is not the relevant mechanism to distinguish between (39a) and (39b). It seems to us highly improbable that if lexical Case is assigned to PRO (as required by the PLS), it would fail to be assigned to the lexical NP.
24 Note that this principle is distinct from the principle of Proper Assignment discussed in section 1. The latter is needed to rule out examples like John saw he that do not violate Case Licensing or the Case Filter.
25 There remains the question of where Case assignment applies with respect to movement operations. Among the several options that are available (for instance, assignment at D-Structure only, or at S-Structure only, or some combination), it is difficult to motivate one proposal over the others. It seems to us that any analysis should be compatible with our proposal regarding Case licensing, and therefore we will not pursue the issue further in this paper.
7 The subject of defective T(ense) in Slavic*

with James E. Lavine

1 Introduction

In human languages, a phrasal unit that is interpreted thematically at Logical Form (LF) as occupying one syntactic position may occur overtly in Phonetic Form (PF) in a different position, a phenomenon referred to as the displacement property. As Chomsky has noted (1995c:221–22), this property appears to be unique to human languages, especially in contrast to artificially constructed formal systems such as logic (in its variety of formulations) and so-called computer languages. It has been a constant focus in the study of modern generative grammar from the outset. Within the Principles and Parameters framework of the past two decades and more particularly within recent refinements of the Minimalist Program, attempts to discover what motivates this property have identified three potential factors: Case, agreement, and the requirement that certain functional categories must have syntactically realized specifier phrases. For example, the displacement that occurs in the English passive construction in (1) below could be motivated in terms of: (i) Case: the nominative (NOM) pronoun must move to the specifier position of the clause (headed by the functional category T(ense) and hence Spec-TP) to be licensed; (ii) Agreement: the pronoun must move to Spec-TP to establish agreement with the passive auxiliary was; and (iii) the property of the functional category T, which requires that the pronoun move to create Spec-TP. (1)
He was attacked t by the visitor.
With only English data it is difficult to determine to what extent each factor alone might motivate displacement. From another point of view, this becomes a problem of separating the operative factor from epiphenomena that might cooccur.1 This article examines unaccusative constructions2 in Russian and Ukrainian where the predicate assigns accusative Case (ACC) to a nominal expression that shows up at PF in the specifier position of TP. These constructions provide empirical evidence for teasing apart the operative roles of the three potential factors in displacement phenomena. They show that T, in the absence of agreement or Case-licensing properties, has an independent requirement that triggers movement to its specifier. They also demonstrate a correlation between defective T lacking agreement features and a υ below, which can check ACC in the case of all unaccusative predicates. The correlation is not merely possible, but necessary. These unaccusative predicates cannot occur with non-defective T because if they did, the result would not yield a convergent derivation: the φ-features of υ would be checked but not the φ-features of T.3 Exactly how this works will be discussed below.
1.1 The framework

In recent refinements of the Minimalist Program (in particular, Chomsky 2000b, 2001), displacement in narrow syntax is driven by an uninterpretable feature of a head of one of the core functional projections: CP, TP, and υP.4 Non-defective T and υ have a full set of uninterpretable φ-features, which are checked via matching of the corresponding interpretable φ-features of nominal expressions. Uninterpretable structural Case features of the nominal expression delete under such matching. Each core functional projection may have a specifier position that is not semantically-selected by the functional head. These are designated “EPP-positions”—i.e., positions not forced by the Projection Principle. These positions are motivated by a functional head’s requirement for an overt specifier.5 A major innovation in recent minimalist syntax is that feature checking of uninterpretable features need not involve movement. Uninterpretable features enter the derivation (from the lexicon) unvalued. They are valued and deleted (hence checked) via the relation Agree, which matches the unvalued features of the probe to the corresponding valued features of the goal. In the case of the unvalued uninterpretable φ-features of a verbal element, matching with the valued interpretable φ-features of an NP results in the valuation of the uninterpretable Case feature of the NP (and its subsequent deletion)—e.g., the φ-features on the probe T and the Case feature on the goal NP in (1). Chomsky 2001 assumes that “a goal as well as a probe must be active for Agree to apply” (cf. his (3i)). To be active is to have an unvalued (uninterpretable) feature. It follows that only N with unvalued structural Case can be active. Once structural Case has been valued via Agree, the NP headed by this N is frozen in place. This would account for the impossibility of raising the subject of a finite clause, as in (2). (2)
*Mary seems [TP t is proud of her work]
Given that NOM on Mary is valued under agreement with the head T of the complement TP, Mary would be frozen in the complement TP. The only way for this derivation to converge is with merger of expletive it in the root Spec-TP. An alternative derivation for (2) in which the Case feature of Mary is valued under matching with the root T would result in the φ-features of the complement T remaining unvalued—in which case this derivation fails to converge. As will be discussed below, the facts from accusative unaccusative constructions suggest that the assumption that an NP is unmovable once its Case has been valued is untenable. The operation Move applies only in those cases in which the probe has an EPP requirement, in addition to its φ- or wh-features. The question of whether the EPP requirement of T can drive movement independently of its φ-features is the central concern of this paper. In the standard case of movement (e.g., the passive construction in (1)), the moved constituent also involves the operation Agree to check structural Case. However, Agree can apply without movement, as in the case of the English existential construction in (3), where the unmoved expletive associate a visitor values the φ-features of T:
(3)
There is a visitor in the kitchen.
In these constructions, the EPP requirement of T is satisfied by the merger of the expletive there.6 In this regard, the satisfaction of the EPP is separate from the checking of Case and agreement. However, these constructions are famously controversial with respect to whether the expletive bears (NOM) Case and, if so, whether the associate bears some other Case (see Belletti 1988 and Lasnik 1995a for discussion). We will show that the Russian and Ukrainian constructions introduced in section 2 are significantly more straightforward in the way the EPP is distinct from properties of Case and agreement, and thus make an important contribution to this discussion. Finally, note that the notion “defective category” plays a central role in the analysis of accusative unaccusative constructions. A defective category is defined in (4): (4)
A category that lacks a full set of φ-features, and hence is φ-incomplete, is defective.
In English, infinitival T (as in (5)) and passive υ (as in (1)) are defective categories. (5)
We expect Len to finish his book this summer.
The independent status of the EPP requirement of T with respect to the Case/agreement system is tested precisely when T is defective. If movement to Spec-TP occurs in the absence of φ-features on T, the EPP alone must be sufficient to cause displacement.

2 The paradigm

In (6–8) we provide examples of accusative unaccusatives in Russian and Ukrainian:
(6) Russian Finite Accusative Unaccusative7
a.
Soldata ranilo pulej. soldierACC wounded[−AGR] bulletINST “A soldier was wounded by a bullet.”
b.
Podvaly zatopilo livnem. basementsACC flooded[−AGR] downpourINST “Basements were flooded by the downpour.”
c.
Ženščinu zadavilo kovrom samoletom v parke womanACC crushed[−AGR] carpet airplaneINST in park Gor’kogo. of Gorky ‘A woman was crushed by the flying carpet [attraction] in Gorky Park.” [Moskovskij komsomolec 9/13/99]
In the case of the Russian construction the indirect internal argument can also appear pre-verbally in discourse-neutral speech. The basic requirement for discourse-neutral word order entails (in descriptive terms) a simple prohibition on V-initial structures. Examples with preverbal instrumental (INST) NPs are given in (7): (7)
Russian Finite Accusative Unaccusative a.
Vetrom i doždjami sbilo seti. windINST and rainsINST knocked-down[–AGR] netsACC “Wind and rains knocked down some nets.”
b.
Volnoj oprokinulo lodku. waveINST overturned[−AGR] boatACC “A wave overturned a boat.” [Kovtunova 1980:354]
The Ukrainian accusative unaccusative construction is a nonagreeing passive. The example in (8c) contains a passive by-phrase: (8)
Ukrainian Non-Finite Accusative Unaccusative8 a.
Inozemcja bulo posadženo do v’jaznyci. foreignerACC was[−AGR] placed[−AGR] to prison “A foreigner was put into prison.”
b.
Nemovlja bulo znajdeno u košyku. babyACC was[−AGR] found[−AGR] in basket “A baby was found in a basket.”
c.
Ja spodivajusja, [ščo cej žart ne bude I hope that this jokeACC NEG will be vykorystano “Pravdoju Ukrajiny”. used [–AGR] PravdaINST of Ukraine “I hope that this joke won’t be used by Ukrainian Pravda.” [Wieczorek 1994:47]
The basic facts of these constructions are given in (9): (9)
Accusative Unaccusative Construction (6–8)
i. the underlying complement of the verb bears ACC case
ii. there is no thematic external argument
iii. discourse-neutral word-order is established by the location of the ACC or INST complement in a preverbal position
iv. the complement is optionally non-D(iscourse)-linked in the preverbal position9
v. the main predicate bears no agreement morphology
These predicate types, referred to in Babby (2000) as “un-Burzio” verbs, contradict Burzio’s Generalization, the correlation between a predicate’s thematic and Case properties, given in (10): (10)
Burzio’s Generalization (Chomsky 1986a:139; Burzio 1986:178) A verb (with an object) Case-marks its object if and only if it θ-marks its subject.
In what follows, we will suggest that Case never drives the A-movement attributed to the correlation in (10) (cf. Marantz 1991). Instead, displacement may reduce to the requirement of T for a specifier (cf. Rothstein’s (1983) “syntactic predication”), i.e., a requirement of core functional projections as in Chomsky 2000b. The central task regarding (9) is to show how these properties are connected in a principled way. We will argue that the underlying ACC- (or INST-) Case-marked complements of these unaccusative predicates move to the specifier position of a defective T, thus providing evidence for separating T’s EPP requirement from its other, purely morphological, uninterpretable φ-features. Thus the structure of accusative unaccusative constructions is given in (11) and (12) on the opposite page.10
(11) [tree diagram not reproduced]
(12) [tree diagram not reproduced]
The dotted lines indicate that either the NP:ACC or the NP:OBL can move to Spec-TP. υP lacks a specifier. This indicates that Russian and Ukrainian, unlike Scandinavian, are not object shift languages. ACC Case on the direct internal argument is valued “long distance” via Agree with υ. The oblique semantic Case on the indirect internal argument is assigned by lexical V when the two merge.11 Because T is φ-incomplete (i.e., defective) it cannot value NOM Case. Note the correlation between Tdef (where Tense lacks agreement morphology, indicated as [–AGR] in (6–8)) and the ability of an unaccusative predicate to converge with an ACC complement. When T is φ-complete, the sole argument of an unaccusative (or one of the internal arguments of a di-unaccusative) would be forced to enter into a relation with T’s uninterpretable φ-features, a relation that is precluded in Slavic for NPs bearing non-NOM Case. While any unaccusative υ can potentially assign ACC to its complement, this operation will survive only in those derivations that do not subsequently require that this ACC NP enter into a relation with T’s φ-features. Consider in (13) the four logical possibilities for the φ-feature composition of T and υ that could occur with unaccusatives:12
(13)
a. Tcomp/υdef
b. Tcomp/υcomp
c. Tdef/υcomp
d. *Tdef/υdef
The configuration in (13a) gives the case of standard unaccusatives, which conform to Burzio’s Generalization. Unaccusative υ is defective and consequently fails to value ACC on its complement. Defective υ creates a weak phase (cf. Chomsky 2001) in which the unvalued complement is still active and, thus, can be valued by the higher T. The result is an unaccusative predicate with a NOM subject (e.g., English arrive and disappear, and their Russian equivalents prijti and isčeznut’, respectively). When both T and υ are φ-complete (as in (13b)), a clause whose verb is unaccusative will converge only in the case of di-unaccusatives, where there are two NPs available: one to value the φ-features of υ and another to value the φ-features of T. The examples in (6a–b) will converge as follows with Tcomp:13 (14)
Russian Accusative Unaccusatives (Tcomp/υcomp) a.
Pulja ranila soldata. bulletNOM.F.SG woundedF.SG soldierACC “A bullet wounded a soldier.”
b.
Liven’ zatopil podvaly. downpourNOM.M.SG floodedM.SG basementsACC “A downpour flooded basements.”
In the case of unaccusatives with a single internal argument, the configuration in (13b) will necessarily fail to converge. Only one of the two probes will be able to value and then delete its uninterpretable features. The configuration of (13c) yields a convergent derivation for the accusative unaccusatives in (6–8). Because unaccusative υ is φ-complete, it values ACC on its complement. The moved ACC NP satisfies the EPP of Tdef.14 The accusative unaccusative construction provides some motivation for not considering the EPP to be a species of feature checking. If it were, then Tdef would enter a one-way checking relation with the ACC NP in which the NP checks the EPP-feature of Tdef but Tdef checks no feature of the NP. This would contradict the otherwise well-grounded assumption that both probe and goal must be active (i.e. have an unvalued feature) to be related by Agree (see Chomsky 2001). Turning briefly to (13d), note that this configuration is hopelessly deviant. Assuming that all unaccusatives select at least one internal argument, if both T and υ are defective (as in (13d)), this argument will not be valued for Case. The resulting uninterpretable Case-feature will cause the derivation to crash. The unaccusative types available with respect to the φ-feature composition of T and υ are identified in (15).
(15)
a. Tcomp/υdef: arrive type
b. Tcomp/υcomp: di-unaccusative
c. Tdef/υcomp: accusative unaccusative
d. *Tdef/υdef: N/A
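The convergence logic behind (13)/(15) can be summarized in a few lines. The Python sketch below is our own gloss on the text, under the simplifying assumption that a derivation converges just in case every φ-complete probe finds an NP to agree with and every internal argument gets its Case valued; the EPP is satisfied independently by movement or merger of a specifier and so is left out here.

def converges(t_complete: bool, v_complete: bool, n_internal_args: int) -> bool:
    # Each phi-complete probe (T or v) must value its phi-features against
    # exactly one NP, and every internal argument must have its structural
    # Case valued by some probe.
    probes = int(t_complete) + int(v_complete)
    return probes == n_internal_args

cases = [
    ("(15a) Tcomp/vdef, one argument (arrive type)",        True,  False, 1),
    ("(15b) Tcomp/vcomp, two arguments (di-unaccusative)",  True,  True,  2),
    ("(15c) Tdef/vcomp, one argument (acc. unaccusative)",  False, True,  1),
    ("(15d) Tdef/vdef, one argument",                       False, False, 1),
    ("(13b) Tcomp/vcomp, one argument",                     True,  True,  1),
]

for label, t, v, n in cases:
    status = "converges" if converges(t, v, n) else "crashes"
    print(f"{label}: {status}")
# (15d) crashes because the argument's Case is never valued; the single-
# argument (13b) configuration crashes because one probe goes unvalued.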
Note that the configuration in (15c) is our main concern. The two central questions posed by this configuration are: (i) Can a defective category impose an EPP requirement?15 and (ii) Is a structurally Case-marked NP obligatorily “frozen” for subsequent narrow-syntactic operations once its Case is valued?

3 The Case for EPP-motivated A-movement

To address question (i), it must first be considered whether some factor other than the EPP drives the displacement of a predicate-internal argument in (6–8). Next, this movement must be distinguished from discourse-oriented short-distance scrambling of arguments. These issues will be taken up in sections 3.1 and 3.2, respectively. In section 3.3, we further discuss the role of defective T in the derivation of accusative unaccusatives.

3.1 Case-driven movement to T of non-nominative NPs

What is new about the data in (6–8) with respect to recent work on A-movement of non-NOM NPs to a preverbal position is that the underlying complements in these examples are structurally Case-marked and optionally non-D-linked (see n. 9). This contrasts with the familiar paradigm from Icelandic in (16), where a lexically Case-marked complement may be displaced into Spec-TP. (16)
Icelandic Quirky-Case-Assigning Passive a.
Stólunum hafði verið stolið á uppboðinu. the chairsDAT had[–AGR] been stolen at the auction “The chairs had been stolen at the auction.”
b.
það hafði verið stolið fjórum stólum á thereEXPL had[–AGR] been stolen four chairsDAT at uppboðinu. the auction
c.
*Hafði verið stolið fjórum stólum á uppboðinu. [Sigurðsson 1992:14]
In (16a), where the main verb assigns lexical (i.e., quirky) Case to its complement, the passive construction exhibits the standard raising of the internal argument to Spec-TP without the standard Case- or agreement-checking motivation. In the absence of such movement, as in (16b), an expletive is merged in Spec-TP. When neither raising nor
merger of an expletive into Spec-TP occurs, the result is deviant, as (16c) illustrates. Under an analysis in which there is no independent EPP requirement, the obligatory presence of preverbal material in (16a–b) must follow from some other feature of T. Boeckx (2000a) argues that the requirement for an overt Spec-TP in such instances follows instead from an enriched machinery for Case-checking, involving the presence of a “Generic Case” feature in T, which attracts the lexically Case-marked complement or an expletive.16 Under this view, EPP-motivated movement in (16a–b) can still be held to overlap with a Case-checking operation. To be sure, the existence of lexically Case-marked complements that are displaced into subject position is among the thorniest problems involved in reducing the EPP to Case (or to some other feature of T). For Icelandic, the Generic-Case proposal and an independent EPP-analysis are roughly equivalent since they account for the same set of data. Note, however, that the Generic-Case proposal cannot be extended to account for the data in (6–8). The main empirical problem with applying Boeckx’s proposal to the Russian and Ukrainian examples is that the preverbal internal arguments in these accusative unaccusatives are structurally Case-marked, in contrast to the derived lexically Case-marked subject of the Icelandic passive in (16a). Under the Generic-Case-checking mechanism for this movement, unaccusative V’s complement would be forced to check two distinct structural Cases (with the peculiar property of bearing a morphological reflex of the former only), thereby creating a chain that is doubly Case-marked. That the direct object of accusative unaccusatives bears structural (rather than lexical) ACC is demonstrated on the basis of the following facts. First, the direct object undergoes Genitive of Negation (GenNeg) and Partitive Genitive (GenPart) formation, processes that are well known to apply to structurally Case-marked elements only.17 Note the examples from Russian and Ukrainian in (17) and (18), respectively:
(17) Russian: GenNeg18
Ne zatopilo ni odnogo podvala. NEG flooded[–AGR] not single basementGEN “Not a single basement flooded.”
(18) Ukrainian
a.
GenNeg Na druhyj den’ ne bulo znajdeno joho čovna. on next day NEG was found[–AGR] his boatGEN “On the following day his boat wasn’t found.” [Shevelov 1969:177]
b.
GenPart Spočatku bulo vypyto vody a potim at first was drunk[–AGR] waterGEN and then vidrizano xliba. sliced[–AGR] breadGEN “First some water was drunk and then some bread was sliced.”
The examples in (17–18) are in contrast to lexically Case-marked NPs, which fail to undergo GenNeg, as shown in the Russian examples in (19):19 (19)
Russian: Lexical Case [Podražat’ “imitate”+DAT] a.
GenNeg *Oni ne podražajut inostrannyx metodov. they NEG imitate foreign methodsGEN
b.
Oni ne podražajut inostrannym metodam. they NEG imitate foreign methodsDAT “They do not imitate foreign methods.” [Freidin and Babby 1984]
Another argument for a structural Case analysis of the ACC complement of accusative unaccusatives is the “heterogeneous” NP-internal case-marking pattern for numerically-quantified nominals, as in (20). When the numeral bears structural Case, the remainder of the NP is obligatorily marked GEN. If the numeral bears lexical Case, the remainder of the NP will be marked “homogeneously” with the same lexical Case, as in (21).20 (20)
Russian Numerically-Quantified NP: Structural Case Ivan poceloval pjat’ krasivyx devušek. Ivan kissed fiveACC prettyGEN girlsGEN “Ivan kissed five pretty girls.”
(21)
Russian Numerically-Quantified NP: Lexical Case [pomoč’ “help” + DAT] a.
Ivan pomog pjati krasivym devuškam. Ivan helped fiveDAT prettyDAT girlsDAT “Ivan helped five pretty girls.”
b.
*Ivan pomog pjati krasivyx devušek. Ivan helped fiveDAT prettyGEN girlsGEN
Using these Case-marking patterns as a diagnostic for structural versus lexical Case, we see in (22–23) that the ACC complement of accusative unaccusatives does indeed bear structural Case. Under the theory that these predicates assign lexical ACC, the homogeneous Case-pattern would be falsely predicted to occur (cf. grammatical (21a)). (22)
Russian Numerically-Quantified Accusative Unaccusative Vetrom razbilo pjat’ okon/*okna. windINST broke[–AGR] fiveACC windowsGEN windowsACC “The wind broke five windows.”
(23)
Russian Numerically-Quantified Accusative Unaccusative21 U nego na vojne vyrvalo pjat’ reber/*rëbra. at him at war tore-out[–AGR] fiveACC ribsGEN ribsACC “He had five ribs torn out [by an explosion] at war.”
Note, finally, that if the ACC complement of accusative unaccusatives were lexically, rather than structurally, Case-marked, this would predict that two ACC NPs (one lexical (an Experiencer) and the other structural) should be able to cooccur in the same clause. Indeed, Sigurðsson (1992) notes this possibility for Icelandic: (24)
Icelandic Experiencer Predicate Okkur vantaði vinnu. usACC lacked[–AGR] a jobACC “We lacked/needed a job.” [Sigurðsson 1992:4]
Similar ACC-ACC predicates do not occur in Russian and Ukrainian (or elsewhere in Slavic) where both ACC NPs are arguments. Note, for example, the ungrammaticality of (25), falsely predicted to converge with two ACC NPs under the theory that one of the two is lexically Case-marked (that is, where the first ACC is an Experiencer, parallel to the Icelandic example in (24)): (25)
Russian *Ee vyrvalo krov’. herACC vomited[–AGR] bloodACC
Returning to the Icelandic paradigm in (16), note that the DAT-NP/EXPL alternation reflects an interpretive distinction that obscures the precise nature of the checking relation with T. The raised internal argument in (16a) is obligatorily interpreted as definite. Under a theory of Generic Case, it would follow that this interpretive effect is, in some sense, built in to the checking operation. This presents an additional empirical obstacle for applying such a Case-checking mechanism to the preverbal ACC NPs in Russian and Ukrainian accusative unaccusatives since these preverbal ACC NPs are optionally interpreted as non-generic indefinites (i.e., they are optionally non-D-linked).22 To summarize, the preverbal ACC NP in Russian and Ukrainian accusative unaccusatives is structurally Case-marked and indefinite (under neutral discourse). We have shown in this section that this ACC NP does not enter into a Case (or agreement) relation with T. Furthermore, we have distinguished the Russian and Ukrainian accusative unaccusatives from the superficially similar lexically Case-marked passive construction in Icelandic, which, unlike the Slavic constructions, involves an obligatory definite interpretation for the raised nominal expression.
3.2 On short scrambling in Russian

In order to establish that the displacement occurring in (6–8) reflects the EPP requirement, it is necessary to show that movement of the internal argument to the preverbal position in such cases is not simply a more general instance of discourse-motivated short (clause-bound) scrambling. In sections 3.2.1 and 3.2.2, we will briefly review facts concerning focus projection and Weak Crossover (WCO) that support the EPP hypothesis.23 In section 3.2.3 we provide evidence for syntactic movement to a functional projection in the Tense/Aspect system.

3.2.1 Focus projection

One striking fact about the accusative unaccusatives in (6–8) is that the displacement of an internal argument does not disrupt focus projection, the process by which, under neutral intonation, the focus (new, or non-D-linked material) projects from the most embedded element to the whole VP or clause, the latter resulting in maximally wide focus (see Cinque 1993, Selkirk 1995, Reinhart 1995, Zubizarreta 1998). Because the wide-focus interpretation asserts the whole sentence as new, it can be felicitously uttered in an out-of-the-blue context. In contrast, scrambling disrupts focus projection (Junghanns and Zybatow 1997:300–12; Kondrashova 1996:138–48). This distinction between constructions with and without scrambling can be demonstrated using the Russian double-object construction. As Junghanns and Zybatow (1997) show, the interpretation of focus indicates that the basic order in double-object structures such as (26) is the one in which the DAT Goal precedes the ACC Theme (see also Kondrashova 1996:142–43).
(26) Russian Double-Object Construction24
a. Basic Order
Odna ženščina podarila mal’čiku jabloko. one womanNOM gave boyDAT appleACC
(i) “A woman gave a boy an apple.”
(ii) “A woman gave the boy an apple.”
b. Scrambled Order
Odna ženščina podarila jablokoi mal’čiku ti. one womanNOM gave appleACC boyDAT
(i) *“A woman gave an apple to a boy.”
(ii) “A woman gave the apple to a boy.” [Junghanns and Zybatow 1997:295]
When the ACC NP scrambles over the DAT NP in (26b), it not only has the effect of removing jablokoACC “apple” from the focus, but, more importantly, focus now can no longer project to the entire VP; the DAT NP, as a result, is narrowly focused. In (26a), there is no discourse-oriented scrambling, with the result that focus can take scope over the entire sentence. The focus projection facts for (26a–b) are schematized in (27a–b), respectively:
(27) Focus Projection
a. for (26a): [IP(FOC) Subj [VP(FOC) V [(FOC) NPDAT] [(FOC) NPACC]]]
b. for (26b): [IP Subj [VP V [NPACC] [(FOC) NPDAT] tACC]]
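The projection pattern in (27) can be mimicked with a toy computation. In the Python sketch below (our own illustration; encoding structures as containment paths is a hypothetical simplification of the four-way ambiguity discussed next), a basic-order structure lets focus project stepwise from the most embedded constituent to the root, while a scrambled structure strands narrow focus on the most embedded constituent.

def focus_domains(path_from_root: list, scrambled: bool) -> list:
    # path_from_root lists the constituents containing the most embedded
    # element, root first, e.g. ["IP", "VP", "NP_ACC"]. Under neutral
    # intonation focus projects upward along this path; scrambling blocks
    # projection, stranding narrow focus on the most embedded element.
    if scrambled:
        return [path_from_root[-1]]
    return list(reversed(path_from_root))   # narrow focus first, then wider

# (27a), basic order (26a): focus may be NP_ACC, the VP, or the whole IP
print(focus_domains(["IP", "VP", "NP_ACC"], scrambled=False))
# -> ['NP_ACC', 'VP', 'IP']

# (27b), scrambled order (26b): only the most embedded NP_DAT is focused
print(focus_domains(["IP", "VP", "NP_DAT"], scrambled=True))
# -> ['NP_DAT']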
Although the SVOO order in (26a) exhibits a well-known focal ambiguity (which in (27a) is shown to be potentially four-ways ambiguous), VO(O) without an S fails to exhibit the same ambiguity as demonstrated for accusative unaccusatives in (28–29): (28)
Russian (cf. (6a)) #Ranilo soldata pulej. wounded[–AGR] soldierACC bulletINST
(29)
Ukrainian (cf. (8a)) #Bulo posadženo inozemcja do v’jaznyci. was[–AGR] placed[–AGR] foreignerACC to prison
In these examples, the V-initial order, which is reported as awkward in discourse-neutral speech, constitutes a non-basic, scrambled structure. The examples in (28–29) either bear a particular focus structure (narrow focus on the right edge, with all other elements interpreted as part of the presupposed segment of the clause) or are interpreted as narrative-inversion structures, familiar from the Germanic literature, with a “folksy” “story-initial” reading.25 Thus, the VO(O) order for Russian and Ukrainian accusative unaccusatives has two possible interpretations: (i) narrow focus on the indirect internal NPOBL argument (or PP adjunct); or (ii) narrative inversion. By analogy with the focus projection pattern in double-object constructions, we are forced to assume that the VO(O) structure (on the non-narrative-inversion reading) involves discourse-oriented scrambling. It is precisely for this reason that the expected maximally-wide focus interpretation is not available. Note that it is not only surprising that the VO(O) structure cannot be interpreted as maximally focused, but that the OVO structures given in (6–8) can be. That is, what we are demonstrating here is that only the OVO structures, as in (30–31) (repeated from (6a) and (7a)), allow for focus projection (with the accompanying focal ambiguities):26 (30)
Russian: Accusative Unaccusative Soldata ranilo pulej. soldierACC wounded[–AGR] bulletINST “A soldier was wounded by a bullet.”
(31)
Russian: Accusative Unaccusative Vetrom i doždjami sbilo seti. windINST and rainsINST knocked down[–AGR] netsACC “Wind and rains knocked down some nets.”
Thus, (30), for example, felicitously responds to (i) čto slučilos’ “what happened?”; (ii) čto slučilos’ s soldatom “what happened to the soldier?”; and (iii) kak ranilo soldata “how was the soldier wounded?”. The possible focus domains for (30)
are given in (32):
(32)
Focus Projection for (30) [IP(FOC) NPACC [VP(FOC) V [(FOC) NPOBL]]]
The central claim being made here is that the surprising focus projection facts in (32) follow from EPP-motivated movement of the direct internal argument. The failure to get maximally-wide focus in the VO(O) structure, then, results from a conflict between a formal syntactic requirement of the clause (T’s EPP) and an underlying argument structure that does not trivially satisfy this requirement via Merge. It follows that the basic-order requirement for focus projection is dependent on the more general requirement of syntactic well-formedness. The non-disruption of focus projection in (6–8), which provides evidence against a scrambling analysis for these predicates, indicates that a non-discourse-oriented relationship is established with Tdef, in terms of its EPP requirement. That is, we distinguish between semantically-driven A-movement (scrambling) and EPP-driven A-movement. Only the latter exhibits focus projection. This is further supported by the fact that focus projection to the clause is not disrupted by movement to Spec-TP in ordinary passives, the prototypical case of A-movement.

3.2.2 Weak Crossover

An analysis that treats the movements in (6–8) as EPP-driven, rather than semantically-motivated scrambling, predicts that the landing site of such movement should exhibit A-position properties. One test that identifies A-movement is the ability to override Weak Crossover (WCO) effects. In English, for example, when a nominal expression is moved to a non-argument position across a pronoun, the pronoun cannot be construed as anaphoric on the moved phrase, even though the anaphoric interpretation is possible when no movement occurs. Thus, compare the overt wh-movement in (33a) and the covert LF movement of the quantified NP in (33b) with (33c), where the nominal expression does not move. (33)
English: WCO a.
*whoi does hisi mother love ti?
b.
*Hisi mother kissed every childi. LF: [every childi [hisi mother kissed ti]]
c.
Hisi mother loves Johni
In contrast, movement to an argument position over a pronoun does not induce a WCO effect, as a comparison of (34a) and (34b) demonstrates (see Mahajan 1990 for further discussion).
(34)
a. *whoi does Jillj seem to hisi mother [α tj to like ti]
b. whoi [IP ti seems to hisi mother [α ti to be a good cook]]
[α in (34) denotes the embedded clause boundary, details aside]
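The WCO configuration just described reduces to a simple check. The following Python sketch is our own schematic rendering, not the authors’ formalism: an anaphoric construal is blocked when the variable bound by the quantifier or wh-binder sits to the pronoun’s right, unless the binder has first landed in an A-position (as in (34b) and the Russian (35b)/(36b) cases below).

def wco_blocks_anaphora(pronoun_pos: int, variable_pos: int,
                        binder_in_a_position: bool) -> bool:
    # A pronoun cannot be anaphoric on a binder whose variable lies to the
    # pronoun's right (weak crossover), unless the binder occupies an
    # A-position from which it crossed the pronoun overtly.
    crossed = variable_pos > pronoun_pos
    return crossed and not binder_in_a_position

# (33a) *who_i does his_i mother love t_i: A-bar movement leaves the
# variable to the right of the pronoun -> anaphora blocked
assert wco_blocks_anaphora(pronoun_pos=2, variable_pos=5,
                           binder_in_a_position=False)

# (34b) who_i t_i seems to his_i mother ...: the A-position trace precedes
# the pronoun, so nothing is crossed -> anaphora allowed
assert not wco_blocks_anaphora(pronoun_pos=4, variable_pos=1,
                               binder_in_a_position=True)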
We have already established that movement to Spec-TP does not affect focus projection in simple cases where no anaphoric relations are involved. The focus projection facts generalize to the weak crossover paradigms for Russian in (35–36).27 (35)
Russian: WCO a.
*(Rano ili pozdno) [egoi vladel’ca]k ub’et tk soon or late its ownerACC will-kill[–AGR] každym pistoletomi. each gunINST “Sooner or later [itsi owner]k will be killed by every guni.” LF: [every guni [[itsi owner]k will kill tk ti]]
b.
(Rano ili pozdno) [každym pistoletomi]k ub’et soon or late each gunINST will-kill[–AGR] egoi vladel’ca tk. its ownerACC “Sooner or later every guni will kill itsi owner.” LF: [[every guni]k [tk [will kill itsi owner tk]]]
(36)
Russian: WCO a.
*[nogu egoi nositelja]k natiraet tk každym novym footACC of its wearer rubs sore[–AGR] each new sapogomi.28 bootINST “The foot of itsi wearer is rubbed sore by every new booti.” LF: [every new booti [[the foot of itsi wearer]k rubs sore tk ti]]
b.
[Každym novym sapogomi]k natiraet nogu each new bootINST rubs sore[–AGR] footACC
egoi nositelja tk. of its wearer “Every new booti rubs sore the foot of itsi wearer.” LF: [[every new booti]k [tk [rubs sore the foot of itsi wearer tk]]]
In the non-deviant (b) examples, the focus projection can include the entire clause. Note that even though the antecedent of the pronoun crosses over the pronoun overtly, no crossover effect occurs (i.e., the anaphoric relation with the pronoun is not blocked), suggesting that the moved constituent lands in an A-position before undergoing QR to a non-argument position.29 This reinforces the analysis that movement of the internal argument in accusative unaccusative constructions is to Spec-TP. We are assuming here that the weak crossover configuration, which is generally prohibited, involves a pronoun whose antecedent is construed as a variable to its right (cf. Chomsky 1976). The sentence (37a) violates weak crossover at LF because the trace of každuju devočku “every girl,” which is construed as a variable, functions as the antecedent of the pronoun to its left. (37)
Russian: WCO (SVO ~ OVS)30 a.
*Eei sobaka ljubit každuju devočkui. her dogNOM loves each girlACC “Heri dog loves every girli.” LF: [every girli [heri dog loves ti]]
b.
Každuju devočkui ljubit eei sobaka ti. each girlACC loves her dogNOM “Every girli heri dog loves.” LF: [every girli [ti [loves heri dog ti]]]
In contrast, in (37b), when každuju devočku “every girl” is scrambled overtly across the subject ee sobaka “her dog” the anaphoric relation is not prohibited. This suggests that at LF, the variable bound by the quantified expression každuju devočku “every girl” occupies the scrambled position rather than the verb complement position in which the accusative NP is assigned its thematic function. The idea is that covert QR creates a weak crossover configuration whereas overt scrambling does not. There is, however, a crucial difference between the examples in (37) and the accusative unaccusative constructions in (35–36). In the former, scrambling in the (b) example disrupts focus projection so that only the nominative NP can be focused, whereas in the latter focus can project to the entire clause. The difference concerns whether the dislocated phrase moves to Spec-TP (EPP-movement) or to some higher projection (scrambling). The weak crossover facts demonstrate that both Spec-TP and the position to which an argument may scramble are A-positions. The difference in focus projection construal now
3.2.3 Movement to Spec-TP
This section takes up the question of whether this non-discourse-oriented preverbal A-position is actually an immediate constituent of a projection of Tense/Aspect. First we examine evidence from υP adverb placement which demonstrates that the landing site of the displaced internal argument is outside of υP. This analysis is further supported by genitive of negation facts and evidence from binding theory. The latter provides additional evidence that the landing site of the displaced internal argument in accusative unaccusative constructions is an A-position, hence within the complement of C. Given that υP adverbs occur at the left periphery of υP, the contrast between the (a) and (b) examples in (38–39) shows that the dislocated accusative internal argument occurs outside the υP projection: (38)
Russian Accusative Unaccusative
a. Rabočegoi [vP sil'no [vP udarilo ti oskolkom plity]].
workerACC strongly hit[–AGR] shardINST of concrete slab
"A worker was hit hard by a shard of concrete slab."
b. ??[vP Sil'no [vP rabočegoi udarilo ti oskolkom plity]].
(39) Ukrainian Accusative Unaccusative
a. Inozemcjai bulo [vP pidpil'no [vP posedženo ti do v'jaznyci]].
foreignerACC was secretly placed[–AGR] to prison
"A foreigner was secretly put into prison."
b. ??Bulo [vP pidpil'no [vP inozemcjai posedženo ti do v'jaznyci]].
The (b) examples show that when the dislocated accusative occurs within the υP projection (i.e., following the υP adverb), the sentences become significantly less acceptable. Given that υP is a complement of the functional head Negation, the analysis proposed for (38–39) is further supported by scope of negation facts. This is based on the contrast between regular accusative constructions and the accusative unaccusative, as illustrated in (40) and (41).32 (40)
Russian GenNeg
a. SVO Ivan ne videl {??ni odnu sobaku / ni odnoj sobaki}.
IvanNOM NEG saw not single dogACC / not single dogGEN
"Ivan didn't see a single dog."
b. OVS {??Ni odnu sobaku / Ni odnoj sobaki} ne videl Ivan.
not single dogACC / not single dogGEN NEG saw IvanNOM
(41)
Russian GenNeg: Accusative Unaccusative a.
{Ni odnu korovu / ??Ni odnoj korovy} ne razdavilo traktorom.
not single cowACC not single cowGEN NEG crushed[–AGR] tractorINST
"Not a single cow was crushed by a tractor."
b.
{Ni odin podval / ??Ni odnogo podvala} ne zatopilo vodoj.
not single basementACC not single basementGEN NEG flooded[–AGR] waterINST
"Not a single basement was flooded with water."
In (40a) the structurally Case-marked internal argument appears in the GEN when it is within the scope of sentential negation. If this internal argument occurs as ACC instead of GEN, the result is significantly degraded for most speakers. We assume that the strong preference for the genitive of negation is conditioned by the presence of ni odin "not a single", presumably concatenated within the object NP. The same preference holds when the internal argument is scrambled to clause-initial position (see (40b)). In marked contrast, in the case of the accusative unaccusatives in (41), ACC on the preverbal dislocated internal argument is preferable under neutral discourse to the GenNeg, which is degraded. The fact that GenNeg on the displaced preverbal argument of the accusative unaccusative is distinctly less preferable than ACC is consistent with our analysis that it occurs in Spec-TP. The contrast with a scrambled object suggests that for GenNeg the scrambled object behaves as if it remains in complement position—possibly by reconstruction. Thus the GenNeg facts provide further evidence for the A-position properties of the displaced argument in the accusative unaccusative—i.e., it does not exhibit reconstructive behavior.
Anaphor binding, as illustrated in (42), provides further evidence that the displaced argument in accusative unaccusative constructions occupies an A-position.33 (42)
Russian: Accusative Unaccusative a.
Milicionerovi ranilo puljami prinadležaščimi drug drugui.
militiamenACC wounded[–AGR] bulletsINST belonging to each otherRECIP
"Militiamen were wounded by bullets belonging to each other."
b.
*Puljami prinadležaščimi drug drugui ranilo milicionerovi.
bulletsINST belonging to each otherRECIP wounded[–AGR] militiamenACC
In (42a) the displaced argument appears to bind the anaphor in VP. Given that the antecedent of an anaphor must be in an A-position, (42a) alone would appear to establish that the movement in accusative unaccusative constructions must be to an A-position. However, as indicated in (11–12) and discussed below in section 3.3, it is possible that the anaphor is bound by the trace of the displaced argument, which asymmetrically c-commands it. Nonetheless, (42b) provides clear evidence that the displaced INST argument is in fact in an A-position. If it were in a non-A-position, then under reconstruction the anaphor would be properly bound. Example (42b) shows that the argument that remains in VP cannot bind an anaphor within the displaced constituent (which we assume occupies an A-position). With respect to anaphor binding, the accusative unaccusative construction again contrasts with the short scrambling examples. Thus compare (42) with (43). (43)
Russian: OVS a.
*[Mašu i Ivana]i poznakomili druz'ja drug drugai.
Maša and IvanACC introducedPL friendsNOM of each otherRECIP
b.
Fotografii drug drugai ljubjat [Maša i Ivan]i.
photographsACC of each otherRECIP like3.PL Maša and IvanNOM
"Maša and Ivan like photographs of each other."
The example in (43a) shows that even though Mašu i Ivana "Maša and Ivan" c-commands the anaphor drug druga "each other," proper binding does not occur between the two. However in (43b), the nominative argument (presumably in an A-position) can bind the reciprocal anaphor even though it does not overtly c-command the anaphor.34 The binding in (43b) appears to reflect reconstructive behavior on the part of the scrambled argument. If so, then reconstruction in (43a) would yield a structure in which the accusative argument does not c-command the anaphor, thereby accounting for the deviance of the example. The reconstructive behavior in (43) identifies the scrambled position as distinct from a canonical A-position. The lack of reconstruction in (42b) supports our analysis of the landing site as an A-position.
To summarize the results of this subsection, we have shown on the basis of adverb and scope facts (with respect to Negation) that the moved constituent in accusative unaccusatives lands in a site higher than υP (and NegP). The fact that this moved constituent can appear non-D-linked in its preverbal position strongly suggests that movement does not occur to positions generally associated with topicalization (e.g. Spec-CP and XP adjoined to TP). Further binding facts indicate that this movement targets an A-position. Locating this movement between υP and C implicates the Tense/Aspect system. This is consistent with our analysis that movement in the case of Russian and Ukrainian accusative unaccusatives is driven by T's EPP requirement.
3.3 The derivation of accusative unaccusatives
We now return to the derivation of Slavic accusative unaccusatives. Russian examples are given in (44a–b) (cf. (6a)) for reference: (44)
Russian: Accusative Unaccusatives a.
Soldata ranilo pulej. soldierACC wounded[–AGR] bulletINST “A soldier was wounded by a bullet.”
b.
Pulej ranilo soldata. bulletINST wounded[–AGR] soldierACC “A soldier was wounded by a bullet.”
The structure for accusative unaccusatives is given in (45) (which combines the trees in (11–12)). [(45): tree diagram not reproduced here.]
The predicate V merges first with the indirect internal argument, and then the resulting constituent merges with the direct object.35 On the first merger, V checks the "inherent" case that is associated with its indirect internal argument. On the next merger, V assigns its remaining theta role. The maximal projection of V then merges with light-υ, a null affix-like element to which lexical V subsequently attaches. Light-υ checks structural ACC on the direct object via Agree, assuming that υ is φ-complete. This operation does not involve movement; no Spec-υP is projected because υ does not have an EPP requirement in these languages (i.e., these are not Object-Shift languages). At this point Tdef merges with υP forming TP. Because T is defective it has no agreement features to check. However one of the internal arguments of υP must concatenate with TP to form a specifier position in order to satisfy the EPP requirement of T. This derivation establishes that displacement need not be motivated by either Case or agreement. As noted above, it further shows that an NP whose Case has been valued previously is not necessarily frozen in place. In this case a goal need not contain an unchecked feature to be active with respect to some probe.
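The derivation just outlined can be replayed schematically. The sketch below is an informal illustration of our own; the list encoding of Merge and the bookkeeping for Agree and the EPP are simplifications, not the authors' formalism.

```python
# Schematic replay of the derivation in (45); all names are ours.
# Steps: V merges with its internal arguments, v values ACC via Agree,
# defective T merges and attracts one internal argument for the EPP.

def derive(acc_arg, obl_arg):
    steps = []
    vp = ["V", acc_arg, obl_arg]          # V + internal arguments; inherent
    steps.append(vp)                      # case on the oblique is checked here
    acc_arg["case"] = "ACC"               # Agree(v, object): v is phi-complete
    vP = ["v", vp]                        # no Spec-vP: v has no EPP requirement
    steps.append(vP)
    tp = ["TP", acc_arg, ["T_def", vP]]   # T_def: no phi-features to check, but
    steps.append(tp)                      # the EPP still fills Spec-TP
    return steps

soldier = {"lex": "soldata", "case": None}
bullet = {"lex": "pulej", "case": "INST"}
for step in derive(soldier, bullet):
    print(step)
print(soldier["case"])  # 'ACC': Case already valued by v, yet the NP moves for the EPP
```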
If T were φ-complete in (45), the derivation would fail to converge. The exact nature of this failure is not obvious. Given that the NP moving to Spec-TP does so only to satisfy the EPP, it's not obvious why the interpretable agreement features on the NP couldn't also check the agreement features of T. To address this puzzle, let us consider three possible solutions. Suppose that T assigns NOM to the NP that checks its agreement features. This would create a form N+ACC+NOM—surely impossible in Slavic and likely universally. We might further assume that the ACC Case affix blocks the attachment of the NOM affix. Thus, the derivation violates Lasnik's stranded affix prohibition (Lasnik 1981). Now if a stranded affix constitutes a piece of phonological material that can't be interpreted as a word, then it might be construed as a violation of Full Interpretation at PF. Alternatively, we might assume that only a NOM Case-marked NP can check the agreement features of T. Then the derivation fails to converge at LF because of the unchecked uninterpretable agreement features of T that survive. This analysis presupposes NOM Case as a precondition for checking the agreement features of T, rather than as a reflex of agreement with T (as in standard analyses). If we abandon the analysis in which agreement values Case, then we lose the explanation for why T and υ cannot both be defective in the same clause—i.e., the uninterpretable Case feature of the internal argument could not be eliminated from the derivation. As a third alternative, consider the possibility that there is a universal prohibition against the φ-features of an NP entering into more than one agreement relation. Thus, in the case of the accusative unaccusatives, the ACC argument, which gets its Case valued via establishing an agreement relation with υ, could not enter into an agreement relation with T—regardless of NOM Case. This unique agreement constraint generalizes to other instances of improper movement, for example (46). (46)
*John was believed t was content.
By hypothesis, the deviance of (46) concerns agreement rather than Case—i.e., the agreement features of the NP can only be used once to value the agreement features of T (see Freidin and Vergnaud (2001) for additional discussion).36
4 Conclusion
This paper has attempted to demonstrate that (i) the EPP requirement of T is still operative with Tdef and (ii) this feature impoverishment of T is a necessary condition for the class of accusative unaccusatives. Fact (i) shows that the EPP requirement is independent from the issue of Case. This contradicts the theory in which the EPP is demoted to the status of a mere "property" or "subfeature" of another feature (as in Pesetsky and Torrego 2001). Note further that the Slavic examples show that the EPP requirement of T is independent of any particular structural Case (see Schütze 1993 and Babyonyshev 1996). In standard unaccusatives, the EPP is satisfied by a NOM NP, whereas in the accusative unaccusative construction it may be satisfied by an ACC NP or an INST NP. Contra Chomsky 2001, T's EPP requirement crucially does not follow from its φ-completeness. The EPP requirement of T is independently instantiated in the absence of both (NOM) Case and agreement. As far as we know, this is the first time that the nonredundancy of T's EPP requirement has been demonstrated on the basis of movement of a non-NOM, structurally Case-marked, optionally non-D-linked NP. We have also shown that it is precisely the incomplete φ-specification of T that allows certain unaccusative predicates to converge with ACC-Case-marked complements (fact (ii)). With standard unaccusatives, where T is φ-complete, the direct internal argument cannot appear in the ACC because that would involve a single NP entering into two agreement relations (with υ as well as T). Stipulating a correlation between the presence of an external argument and the licensing of ACC Case is unnecessary. If so, then in the case of passives with a φ-complete T, there is no need to rule out ACC on the complement by means of the mysterious notion of "Case absorption." The properties of the various constructions follow from general principles of grammar interacting with the specific attributes of the functional categories T and υ—specifically, their φ-feature composition and the EPP.
Notes
* We gratefully acknowledge Stephanie Harves, Tania Ionin, and two anonymous reviewers for JSL for helpful suggestions and useful criticism. This work was initially inspired by conversations with Len Babby, to whom we affectionately dedicate this paper on the occasion of his sixtieth birthday.
1 See, for example, Martin 1999 and Boeckx 2000a,b where it is argued that the requirement for a specifier reduces to a Case requirement. Pesetsky and Torrego 2001 claim that the requirement is not a primary feature of any head (as in Chomsky 1995c), but rather is a subsidiary feature of some other formal feature that triggers displacement (e.g., wh- or φ-). Similarly, Chomsky 2001: fn. 6 conjectures that this requirement correlates with the full set of agreement features on T.
2 The term "unaccusative" should be construed as a technical term for a class of constructions in which a nominal expression that functions as a complement of a predicate shows up in PF in canonical subject position (i.e., Spec-TP).
3 Following Chomsky 2000b, 2001, Pesetsky and Torrego 2001, and much previous work dating back to George and Kornfilt 1981, we will argue that NOM Case is licensed by agreement, rather than Tense, in Slavic (as well as English). Cf. Iatridou 1993 and Ura 2000, where it is shown on the basis of Modern Greek and Japanese, respectively, that the correlation between NOM Case and agreement does not hold universally. In these languages, as well as others, T's φ-features and NOM Case can be checked independently.
4 Since this paper is concerned with the systems of structural Case and agreement, we will not discuss the uninterpretable wh-feature of C.
5 Note that the Extended Projection Principle (EPP) consists of the Projection Principle plus "the requirement that clauses have subjects" (Chomsky 1982a:10). The term EPP in recent theoretical discussions has come to refer just to the requirement that extends the Projection Principle, now generalized to specifier positions of functional categories. To be sure, Chomsky 1982a refers only to the EPP requirement of Infl. In what follows we will use the term EPP to refer to the requirement for a specifier rather than a morphosyntactic feature or a property of such features. Lasnik 2001 gives two arguments for this view of the EPP. One concerns the observation that if Agree applies long distance (as in Chomsky 2000b, 2001), then movement is not motivated by feature checking. The other is based on his analysis of object raising in ellipsis constructions.
6 Chomsky 2001 assumes that the expletive "has the uninterpretable feature [person]". In this way, satisfaction of the EPP via expletive merger also involves matching of agreement features. Presumably the [person] feature of the expletive must be uninterpretable like the expletive itself, hence entering the derivation unvalued, so that it agrees with the value of the [person] feature of its associate. The derivation of the expletive construction in (3) would involve the application of Agree between the probe T and the goal a visitor, followed by the application of Agree between the probe there and the goal is. For the goal a visitor to be active, it must have an uninterpretable Case feature. Note that the expletive itself need not have a structural Case feature given that the uninterpretable [person] feature is sufficient to render it an active probe. Note further that the valuation of the expletive's [person] feature
requires that the valued uninterpretable features of T remain available in the derivation for some period of computational time. That the expletive must have a [person] feature is, however, not obvious. It is only necessary under the hypothesis that EPP-movement occurs as a side-effect of Agree.
7 This predicate type was discussed as early as Rothstein 1983. See Babby 1994 for a detailed discussion of its argument structure.
8 This predicate type is usually referred to in the Slavic literature as "Ukrainian -no/-to"; see Billings and Maling 1995 and Lavine 2000 for discussion.
9 An entity is "D-linked" if it has a pre-established reference in the universe of discourse. We will use the term "non-D-linked" to refer to non-generic indefinites. These terms are borrowed from Pesetsky 1987. They are discussed with respect to the EPP in Russian in Babyonyshev 1996.
10 The trees in (11–12) refer specifically to the Russian accusative unaccusatives in (6–7). Note that the two internal arguments are in the same minimal domain (i.e., VP, or, after V's adjunction to υ, υP) and, thus, are "equidistant" to T (in the sense of Chomsky 1993) with respect to Minimality and Shortest Move. This explains how the EPP-probe in T can identify either of the internal arguments for movement.
Babby (1994) notes that the indirect internal argument of Russian accusative unaccusatives need not be marked INST. The Source theta role, for example, will be realized as an ot “from”+GEN PP, as in (i): (i)
Nos založilo ot pyli.
noseACC clogged[–AGR] from dustGEN
"My nose got stuffed up from the dust."
For this reason, the trees in (11–12) refer to the indirect internal argument as NP:OBL. Note, additionally, that the INST wh-phrase in the Ukrainian nonagreeing passive is not a complement of the verb and, thus, cannot move to the V-initial position under neutral discourse, in contrast to the indirect internal argument in the Russian examples in (7).
11 See Freidin and Babby 1984 on the distinction between semantic and lexical (quirky) Case. One criterion for semantic Case is that it can alternate with structural Case, while lexical Case cannot (cf. (6a) and (14a)).
12 This typology of unaccusative predicates with respect to φ-completeness developed out of discussions with Stephanie Harves. Harves (2002) proposes a different analysis of the accusative unaccusative construction where both T and υ are defective and ACC Case is valued by a φ-complete V. We are assuming that V does not carry φ-features independent of T and υ.
13 Babby (1994) refers to the examples in (14) as "demiactives". Raising of the direct (rather than indirect) internal argument (the Theme) requires the affixation of reflexive -sja, which, by lexical stipulation, is not available for all verbs. The affixation of reflexive -sja in the case of di-unaccusatives is a productive property of Subject-Experiencer (Psych) Verbs, as in (i): (i)
Russian Subject-Experiencer Verb
Ivan udivil-sja ètoj novosti.
IvanNOM-M-SG surprisedM-SG REFL this newsDAT
"Ivan was surprised at this news."
Some verbs in Russian that allow nonagreeing accusative unaccusatives (as in (6)) also allow affixation of reflexive -sja. The example in (6b) shows the same pattern as (i) above: the Theme argument can appear in the NOM (that is, be valued by T’s probe), inducing reflexive -sja. The relevant example is the Middle in (ii): (ii)
Russian Di-Unaccusative
Podvaly zatopili-s' livnem.
basementsNOM-PL floodedPL REFL downpourINST
"Basements were flooded by the downpour."
Note that (i–ii) instantiate the configuration given in (13a): Tcomp/Vdef. These are unaccusative predicates whose surface subject bears an internal theta-role (Experiencer in (i) and Theme in (ii)).
14 The availability of accusative unaccusatives in Russian and Ukrainian, but not English, ultimately depends on lexical properties. For example, the English passive *Him was attacked by the visitor is ruled out by the fact that English contains no finite form of be that lacks agreement morphology. Nonagreeing passives of the Ukrainian -no/-to type in Russian, such as *Vanju bylo ubito na vojne "VanjaACC was killedN.SG at war", are similarly ruled out by the lack of nonagreeing morphology. The word-final morphology on ubito "killed" is either default (occurring, for example, with quantified NPs) or neuter singular. In contrast, Ukrainian /-no/-to/ is a dedicated nonagreeing morpheme. The neuter singular in Ukrainian is marked by /-e/ rather than /-o/. We do not attempt to derive the lack of nonagreeing morphology as a lexical resource.
15 Recall that Chomsky 2001: fn. 6 assumes that it cannot.
16 The term "Generic" Case presumably comes from a quantificational analysis of the raised NP, where a higher position in the tree for indefinites correlates with a generic (rather than existential) interpretation.
17 For discussion, see Freidin and Babby 1984.
18 The EPP in free-word-order languages is typically "overridden" by Focus Structure in a way that we will not make explicit. The word order chosen for the examples in (17–18) reflects the fact that GenNeg and GenPart apply most naturally to new, or focused, constituents, rather than to topics. Focused constituents in Russian and Ukrainian appear on the right edge under neutral intonation.
19 Due to space limitations, similar examples for GenPart, as well as the Ukrainian cognate examples for both GenNeg and GenPart, which show the same phenomena, will not be given.
20 See Babby 1987 for a discussion of these facts.
21 Note that in this example we have chosen the mono-transitive unaccusative vyrvat' "tear-out; vomit" to show that mono-transitive accusative unaccusatives likewise occur with a structurally Case-marked complement. Note in (i) that GenNeg also applies with this verb type:
(i)
Mono-Transitive Accusative Unaccusative
Posle vzryva okazalos', čto nikomu ne vyrvalo kišečnika.
after explosion turned-out that no-oneDAT NEG tore-out[–AGR] intestineGEN
"After the explosion, it turned out that no one's intestine was torn out."
22 On the interpretation of the Russian examples in (6–7), see Kovtunova's (1980) article on "word order" in the Russian Academy Grammar.
23 The facts in section 3.2.1 draw heavily on Lavine 2000, ch. 2.
24 Focus projection is detected here, and elsewhere, on the basis of whether an indefinite reading is available for both internal arguments. The judgments, as well as the examples themselves, are from Junghanns and Zybatow 1997. See Ionin 2001 for additional argumentation (contra Bailyn 1995) for the DAT-ACC order in double-object constructions as basic.
25 The term "story-initial" for V-initial structures in Russian is borrowed from Bailyn 1998. Zwart (1993) proposes, on the basis of V-initial structures in Germanic, that V-initial narrative inversion structures involve a null narrative operator in SpecCP which is responsible for driving movement of V to C, while at the same time blocking XP-movement into C's specifier (yielding the non-V-2 structure). Note additionally that Russian and Ukrainian lack EPP-satisfying expletives of the sort employed in Icelandic (16b). See Babby 1998 and Lavine 1998, 2000 for argumentation against the presence of null expletives in structures such as (28–29).
26 Recall from earlier discussion that either the ACC NP or the INST NP can appear preverbally with focus projection. "OVO" is thus shorthand for both NPACC V NPINST and NPINST V NPACC.
27 Note that in (35–36) it is the indirect internal argument that moves to the preverbal position. The examples are constructed this way in order to provide an appropriate testing ground for crossover.
28 A reviewer points out for (36a–b) that nositel' "wearer" is, strictly speaking, not a word (in the prescriptive sense). Since this does not affect the analysis of (36a–b) and given that native speakers of Russian understand the admittedly non-standard use of the word, we have chosen not to modify the example.
29 It is worth noting here that there is some controversy about whether overt wh-movement in Russian creates a weak crossover effect. Thus (i) is considered marginal, though perhaps not clearly deviant. (i)
??Kogoi ljubit egoi sobaka ti?
whoACC loves his dogNOM
"Who does his dog love?"
King (1995:54–55) states that the anaphoric reading of (i) is possible, though not preferred. Interestingly, when the wh-interrogative is replaced by a name, the sentence is fully grammatical on the anaphoric reading.
(ii)
Vanjui ljubit egoi sobaka ti.
VanjaACC loves his dogNOM
"Hisi dog loves Vanjai" / "Vanjai is loved by hisi dog."
The contrast in relative grammaticality judgments suggests that wh-movement in Russian (in contrast to the short distance scrambling in (ii)) might manifest a weak crossover effect. If it is possible for speakers to analyze (i) as a scrambling construction, then perhaps the weakness of the perceived deviance results from a confusion of the two analyses.
30 OVS structures such as (37b) appear to involve V-movement. See Bailyn (2004) for discussion.
31 Note that accusative unaccusatives in Ukrainian, as in (ia–b), also show weak crossover effects: (i)
a.
*[Johoi hospodarja]k bulo pokusano tk kožnym sobakojui. its ownerACC was bitten[–AGR] each dogINST “Itsi owner was bitten by every dogi.” LF: [every dogi [[itsi owner] k was bitten tk ti]]
b.
[Kožnym sobakojui]k bulo pokusano johoi hospodarja tk each dogINST was bitten[–AGR] its ownerACC “By every dogi was bitten itsi owner.” LF: [[every dogi]k [tk [was bitten itsi owner tk]]]
However, the passive by-phrase in Ukrainian (in contrast to the indirect internal argument in Russian accusative unaccusatives) cannot occupy the preverbal position discourse-neutrally, so there is no focus projection in such cases. For this reason, the Ukrainian weak crossover paradigm does not distinguish EPP- from discourse-driven movement.
32 The examples in (40) are from Ionin 2000.
33 Note that reflexives in Russian normally require a canonical NOM subject antecedent. For this reason, we use reciprocals, which potentially can be bound by any c-commanding antecedent in an A-position. See Rappaport 1986 for details on the application of Binding Theory Condition A in Russian.
34 Tania Ionin (p.c.) reports that the nominative argument in (43b) appears most felicitously in the post-verbal position when accompanied by focal stress.
35 To be sure, not all instances of accusative unaccusatives are ditransitive; cf., for example, (23). In the case of mono-transitive unaccusatives, V merges with the ACC NP as its complement. That is, under Bare Phrase Structure (Chomsky 1995b), there is no fixed position in the tree for the direct object. The position in which it is merged depends on the thematic properties of the predicate.
36 Note also that the unique agreement prohibition overlaps with a constraint on Case uniqueness, interpreted as applying to tokens, not just types (see Freidin (1992)).
§ C: Binding
8 Disjoint reference and wh-trace*
with H. Lasnik
This article explores the ramifications of an analysis of disjoint reference for the theory of core grammar.1 The analysis we will adopt accounts for coreference possibilities between pronouns and wh-traces and, following May (1981), also provides an explanation for the COMP-to-COMP condition on Wh Movement of Chomsky (1973). The central assumption of the analysis—that wh-trace binding is exempt from the Propositional Island Condition (PIC) and the Specified Subject Condition (SSC)—allows for a simplification of the theories of binding and indexing, and also provides an argument that the Subjacency Condition is properly interpreted as a condition on representations rather than a condition on derivations.2 The organization of this article is as follows. Section 1 deals with the theoretical framework presupposed in the discussion. Section 2 contains an explication of indexing and binding as applied to the various NP-types (e.g., lexical NP vs. NP-trace). This section gives the analysis of disjoint reference under which wh-traces must be exempt from the PIC and SSC. Discussions of the resulting simplification in the analysis of binding and indexing follow in section 3. Section 4 deals with the consequences of our analysis for the *[that–e] filter and Subjacency.
1 Theoretical framework
The following discussion presupposes the general framework of the Extended Standard Theory as developed in Chomsky (1975c), Chomsky and Lasnik (1977), and Lasnik and Kupin (1977)—including in particular trace theory (see Chomsky (1980) and the references cited there). In addition, it will be assumed that the PIC and SSC constitute principles of binding (Chomsky (1980) and Freidin (1978)) rather than conditions on rules of grammar (Chomsky (1973; 1976)). Following Chomsky and Lasnik (1977), the organization of core grammar is taken to be roughly as shown in (1). (1)
I. a. Base
   b. Transformations (movement and insertion)
II. a. Deletion
    b. Filters
    c. Phonological rules
III. a. Quantifier interpretation
     b. Control
     c. Binding principles
(See Lasnik and Freidin (1981) for a discussion of how Case theory (as discussed in Chomsky (1980)) relates to this organization.) The components of (I) constitute "the Syntax"; those of (II), "the Phonology"; and those of (III), "Logical Form." (I) maps base structures onto S-structures, the output of the transformational component.3 (II) maps S-structures onto phonetic representations; and (III) maps S-structures onto representations in logical form (LF), one aspect of semantic representation. According to (1), the rules of the phonology in no way determine semantic representations and the rules of LF in no way determine phonetic representations. (1) constitutes an empirical hypothesis and therefore comes with the usual caveats. We return to (1) in section 3. With respect to indexing, we adopt part of the formalism given in Chomsky (1980). Every NP is assigned a referential index, consisting of a single integer i (i≥1). In addition, every nonanaphoric NP is assigned an anaphoric index, given as a set of integers which consists of the referential indices of all c-commanding NPs. An index (i, {j}) indicates that an NP whose referential index is i may not have the same intended referent as an NP whose referential index is j. The relation between the referential and anaphoric indices of an NP is that of disjoint reference (DR). Anaphors are not assigned anaphoric indices; only nonanaphoric NPs enter directly into the relation of DR. Under this analysis, the (i)-examples of (2) would be assigned the indexed representations of the corresponding (ii)-examples. (2)
a. Pronouns:
i. John saw him.
ii. John(1) saw him(2,{1})
b. Names:4
i. John saw the boy.
ii. John(1) saw the boy(2,{1})
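The indexing formalism lends itself to a direct toy implementation. In the following Python sketch (ours, not part of the original article; linear precedence stands in for c-command), each NP receives a referential index and accumulates an anaphoric index from the NPs above it, reproducing (2).

```python
# Toy rendering of the indexing formalism: a referential index plus an
# anaphoric index collecting the referential indices of all c-commanding NPs,
# read as obligatory disjoint reference (DR). Anaphors, which receive no
# anaphoric index, are omitted from this sketch.

class NP:
    def __init__(self, form, ref):
        self.form, self.ref, self.ana = form, ref, set()

def assign_anaphoric_indices(nps):
    # nps are listed so that each NP is c-commanded by all NPs preceding it
    for i, np in enumerate(nps):
        np.ana = {prior.ref for prior in nps[:i]}

# (2a) John saw him.  ->  John(1) saw him(2,{1})
john, him = NP("John", 1), NP("him", 2)
assign_anaphoric_indices([john, him])
assert him.ana == {1}   # him may not have John's intended referent
```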
The algorithms for assigning indices and their location in the grammar will be discussed in section 4.
2 Introduction to binding
In this section it will be assumed that the SSC and PIC are properly construed as principles of binding which hold for the relations of bound anaphora and disjoint reference. We will show that while anaphors and pronouns (in contrast to names) fall under these principles, wh-traces (like names) do not. Considering for the moment just bound anaphora, one formulation of the PIC and SSC as binding principles could be (3).
(3) *NPi, NPi an anaphor, unless NPi is bound in the domain of
a. a subject (the SSC), or
b. tense (the PIC).
We take (3) to be a filter at the level of LF (cf. Freidin (1978)). The domain of a category α is taken to be all the categories c-commanded by α. An anaphor is bound in a domain when there is a c-commanding NP with the same referential index in that domain. This formulation has advantages over previous formulations of the SSC and PIC—for example, the one in Chomsky (1980)—as will be discussed in section 3. (3) accounts for the standard paradigm of bound anaphora, where the class of anaphors includes reflexive pronouns, reciprocals, and also NP-traces. Some of the relevant data are given in (4). (4)
a. Lexical Anaphors:
i. *Harry1 suspected [S̄ Celia to be insulting himself1]
ii. *Harry1 suspected [S̄ that himself1 was insulting Celia]
iii. Harry1 suspected [S̄ himself1 to be insulting Celia]
b. NP-trace:
i. *Harry1 was suspected [S̄ Celia to be insulting e1]
ii. *Harry1 was suspected [S̄ (that) [e1 was insulting Celia]]
iii. Harry1 was suspected [S̄ to be insulting Celia]
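Filter (3) can also be stated as a procedure. The sketch below is our simplification, not the authors' formalism: "domain" is collapsed to clause membership, and each clause merely records whether it supplies a subject or tense and which indices bind into it.

```python
# Filter (3) as code (our toy version): an anaphor survives only if it is
# bound (a c-commanding NP shares its index) in the domain of the first
# subject or tense encountered above it, scanning clauses inside-out.

def obeys_filter_3(anaphor_index, clauses_inside_out):
    for clause in clauses_inside_out:
        if clause["subject"] or clause["tense"]:
            return anaphor_index in clause["binders"]
    return False

# (4a iii) Harry1 suspected [himself1 to be insulting Celia]: the complement
# supplies no distinct subject and no tense, so the matrix domain is checked.
good = obeys_filter_3(1, [
    {"subject": False, "tense": False, "binders": set()},  # infinitival complement
    {"subject": True, "tense": True, "binders": {1}},      # matrix: Harry1 binds
])
# (4a i) *Harry1 suspected [Celia to be insulting himself1]: Celia makes the
# complement a subject domain in which himself1 is free -> SSC violation.
bad = obeys_filter_3(1, [
    {"subject": True, "tense": False, "binders": {2}},
    {"subject": True, "tense": True, "binders": {1}},
])
assert good and not bad
```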
The anaphors in the (i)-examples are free in the domain of the complement subject and are therefore prohibited by the SSC (3a). The anaphors in the (ii)-examples are free in the domain of the complement tense and are therefore prohibited by the PIC (3b). In contrast, the anaphors in the (iii)-examples are in the domain of a subject and tense with respect to the matrix S, but not the complement S. In the matrix domain the anaphors are bound and thus not prohibited by (3). This formulation of the binding principles generalizes automatically to anaphors in simple sentences, as shown in (5). (5)
a. Lexical Anaphors:
i. *Himself1 left
ii. *John1 insulted herself2
b. NP-trace:
i. *e1 left
ii. *John1 insulted e2
The (i)-examples fall under the PIC (3b); the (ii)-examples, under the PIC and the SSC (3a). Previous formulations of the SSC and PIC, with the exception of Chomsky (1980), applied only to complex sentences. As a result, an account of (5) required additional stipulations (cf. Freidin (1978, (8))). Given (3), these are unnecessary.
Ordinary pronouns also should fall within the domain of the SSC and PIC, as discussed in Chomsky (1973; 1976). In the case of pronouns, however, the relevant property is obligatory disjoint reference (Lasnik (1976), Chomsky (1976)) rather than obligatory coreference as in the case of anaphors. Because of this, the filtering formulation of the SSC and PIC in (3) cannot be extended to account for the facts of disjoint reference. Therefore, the SSC and PIC as stated in (3) must be reformulated. A reformulation is given below. In the formalism for indexing of section 1, obligatory disjoint reference is expressed in terms of an anaphoric index. Under the procedure for constructing anaphoric indices stated above, the pronoun cases analogous to (4) will be assigned the indexed representations (6a–c). (6)
a. Harry(1) suspected [S̄ Celia(2,{1}) to be insulting him(3,{1,2})]
b. Harry(1) suspected [S̄ that he(2,{1}) was insulting Celia(3,{1,2})]
c. Harry(1) suspected [S̄ him(2,{1}) to be insulting Celia(3,{1,2})]
The representations (6a) and (6b), in contrast to (6c), are incorrect because the pronouns him and he are not interpreted as obligatorily disjoint in reference from Harry in the corresponding sentences (7a) and (7b). (7)
a. Harry suspected Celia to be insulting him.
b. Harry suspected that he was insulting Celia.
The pronouns in (7a,b) are free in reference with respect to Harry—that is, coreference is possible though not necessary. In terms of indexing, a pronoun is free in reference with respect to an NP with referential index i when the anaphoric index of the pronoun does not contain the integer i. Given this, the correct representations for (7a,b) are (7′a,b), respectively. (7′)
a. Harry(1) suspected [S̄ Celia(2,{1}) to be insulting him(3,{2})]
b. Harry(1) suspected [S̄ that he(2) was insulting Celia(3,{1,2})]
In the framework of Chomsky (1980), (7′a,b) would be derived from (6a,b) by deleting the member 1 from the anaphoric index of each pronoun. In (6a), the member 1 is not bound (as defined for (3)) in the domain of a subject. In (6b), the member 1 is not bound in the domain of tense.5 Thus, the deletions occur when the member 1 fits the structural circumstance inherent in (3)—i.e., free (=not bound) in the domain of a subject or tense. (7′a,b) are derived from (6a,b) by reformulating the SSC and PIC as reindexing rules which carry out the required deletion under the stated structural circumstances. These rules can be formalized as follows:
(8) When a pronoun with anaphoric index j (j={a1,…, an}) is free(i) in the domain of subject or tense, j→j−{i}. (Cf. Chomsky (1980, appendix).)
A pronoun is free(i) in a domain when there is no c-commanding NP with referential index i in that domain. This type of analysis can be extended to the anaphor cases, as Chomsky shows. For anaphors, the SSC and PIC affect referential indices. For example, the SSC and PIC as reindexing rules should map (4ai) onto (9a) and (4aii) onto (9b). (9)
a. Harry(1) suspected [S̄ Celia to be insulting himself(0)]
b. Harry(1) suspected [S̄ that himself(0) was insulting Celia]
It must be assumed that an NP with a zero referential index is ill-formed—i.e., *NP(0). The appropriate reindexing rule can be formalized as follows: (10) When an anaphor with referential index i is free(i) in the domain of subject or tense, i→0 (i.e., i−i). (Cf. Chomsky (1980, appendix).)
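As an informal illustration, (8) and (10) can be written directly as functions. In this Python sketch (ours), the structural condition free(i) is supplied as a given fact about the structure rather than computed from a phrase marker.

```python
# Rules (8) and (10) as functions (our sketch).

def rule_8_pronoun(ana_index, i, free_i):
    # (8): a pronoun free(i) in the domain of subject or tense loses i
    return ana_index - {i} if free_i else ana_index

def rule_10_anaphor(ref_index, i, free_i):
    # (10): an anaphor free(i) in the domain of subject or tense is zeroed,
    # yielding the ill-formed *NP(0)
    return 0 if free_i and ref_index == i else ref_index

# (6a) -> (7'a): him(3,{1,2}) drops 1, since 1 is free past the complement
# subject's domain; obligatory disjointness from Celia (index 2) survives.
assert rule_8_pronoun({1, 2}, 1, free_i=True) == {2}
# (4a i): himself(1) is free(1) in the complement subject's domain -> *NP(0).
assert rule_10_anaphor(1, 1, free_i=True) == 0
```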
The definition of free(i) with respect to pronouns extends without modification to anaphors. An anaphor is free(i) in a domain when there is no c-commanding NP with referential index i in that domain. Given this system, we will now show that a wh-trace has an anaphoric index which is not subject to the reindexing rules, SSC and PIC. Thus, a wh-trace is distinct from an anaphor and also from a pronoun. The relevant evidence involves coreference possibilities between a wh-trace and a c-commanding pronoun. (11) gives the paradigm bearing on the SSC, and (12), the paradigm bearing on the PIC. (11)
a. Who does he want the woman to like?
b. Who wants the woman to like him?
(12)
a. Who does he think likes the woman?
b. Who thinks he likes the woman?
In the (b)-examples, the pronoun is free in reference with respect to the variable bound by who,6 whereas in the (a)-examples,7 the pronoun is construed as disjoint in reference from the variable bound by who. In the (a)-examples, the speaker must already know the referent of the pronoun; in the (b)-examples, he need not, but rather may be inviting the listener to provide a referent for the pronoun by answering the question. This difference is easily captured on the assumption that a wh-trace has an anaphoric index. Under trace theory, the derivations of (11a) and (11b) involve the following partial mappings from S-structure to LF. (13)
(for (11a)):
a. [S̄ [C who3] [S does he1 want [S̄ e3 [S the woman2 to like e3]]]]
b. (for which x3, x3 a person) [S he1 wants [S the woman2 to like x3]]
c. (for which x3, x3 a person) [S he(1) wants [S the woman(2,{1}) to like x(3,{1,2})]]
(14) (for (11b)):
a. [S̄ [C who1] [S e1 wants [S̄ [S the woman2 to like him3]]]]
b. (for which x1, x1 a person) [S x1 wants [S the woman2 to like him3]]
c. (for which x1, x1 a person) [S x(1) wants [S the woman(2,{1}) to like him(3,{1,2})]]
The S-structures (13a), (14a) are mapped onto (13b), (14b) by a rule which replaces a wh-phrase with its meaning (see Chomsky (1976; 1977a) for discussion). The structures (13b), (14b) are mapped onto (13c), (14c) by a rule which assigns anaphoric indices. Further, (14c) will be mapped onto (15) by the SSC, thus accounting for the fact that him is free in reference with respect to the variable bound by who. (15)
(for which x1, x1 a person) [S x(1) wants [S the woman(2,{1}) to like him(3,{2})]]
In contrast, the SSC should not apply to the anaphoric index of the variable in the complement of (13c); otherwise, we lose the relevant distinction between (11a) and (11b). We conclude then that the anaphoric index of a wh-trace is not subject to the SSC. The analysis given above is based on a discussion of (11) in Chomsky (1976). This analysis extends naturally to the PIC cases such as (12) which were not discussed there, and in Chomsky (1980) were assumed not to exist. Parallel to (13) and (14), the derivations of (12a) and (12b) involve the following partial mappings in (16) and (17). (16)
(for (12a)):
a. [S̄ [C who2] [S does he1 think [S̄ e2 [S e2 likes the woman3]]]]
b. (for which x2, x2 a person) [S he1 thinks [S x2 likes the woman3]]
c. (for which x2, x2 a person) [S he(1) thinks [S x(2,{1}) likes the woman(3,{1,2})]]
(17) (for (12b)):
a. [S̄ [C who1] [S e1 thinks [S̄ [S he2 likes the woman3]]]]
b. (for which x1, x1 a person) [S x1 thinks [S he2 likes the woman3]]
c. (for which x1, x1 a person) [S x(1) thinks [S he(2,{1}) likes the woman(3,{1,2})]]
The PIC maps (17c) onto (18). (18)
(for which x1, x1 a person) [S x(1) thinks [S he(2) likes the woman(3,{1,2})]]
It does not apply to the variable in the complement of (16c), thereby preserving the distinction between (12a) and (12b). From this we conclude that the anaphoric index of a wh-trace is not subject to the PIC. Thus far, we have established (a) that a wh-trace, which functions as a variable at LF, takes an anaphoric index, and (b) that this anaphoric index is not subject to either the SSC or the PIC. In this regard, a wh-trace patterns like a name (cf. Chomsky (1976, 335), where the parallelism between variables and names was first discussed). Below we will consider the question of whether the referential index of a wh-trace might be subject to the reindexing rules.
The correct representations for the "crossover cases" (11a) and (12a) follow from the rule that assigns anaphoric indices and the assumption that the trace of wh is not subject to the SSC or the PIC. In addition, a rather natural account of violations of the COMP-to-COMP condition on Wh Movement follows from this analysis without any further stipulations, as shown in May (1981). COMP-to-COMP violations involve movements of a wh-phrase from a COMP to an NP position. In terms of binding, the condition prohibits the binding of a trace in COMP from an NP position in S. For the sake of exposition, we will assume that once a wh-phrase is wh-moved, it must end up in a COMP position. Violations of the COMP-to-COMP condition will result in structures in which a wh-phrase binds two different NP-traces, one in a matrix S and the other in a complement S. There are two cases to consider: (i) where the complement NP-trace is in the predicate, and (ii) where the complement NP-trace is in subject position. An example of each case is given in (19). (19)
a. *Who believes George to have seen?
b. *Who said saw George? (cf. Who did Harry say saw George?)
The strings in (19) pair with the representations in (20). [(20a–b): bracketed representations with movement arrows, not reproduced here.]
The Roman numerals in (20) indicate the derivational history of the wh-phrase. In each example the second movement (II) violates the COMP-to-COMP condition. The representations in (20) map onto those of (21) via Wh Interpretation and the rule which assigns anaphoric indices as discussed above for (13)–(14) and (16)–(17). (21)
a. (for which x1, x1 a person) [S x(1) believes [S George(2,{1}) to have seen x(1,{1,2})]]
b. (for which x1, x1 a person) [S x(1) said [S x(1,{1}) saw George(2,{1})]]
In each case the index of the variable in the complement is contradictory—in effect indicating that the intended referent of the variable must be disjoint in reference from itself. These contradictory indices are unaffected by the reindexing rules (i.e., the SSC and PIC). Therefore, we can now account for the COMP-to-COMP violations in terms of indexing—i.e., with a prohibition against contradictory indices of the form (i,{i}). (This observation is due to Robert May—see May (1981) for additional discussion.)
The indexing analysis of the COMP-to-COMP violations allows us to dispense with an ad hoc condition on derivations in favor of a natural condition on representations; namely, the prohibition against contradictory indices of the form (i,{i}). See Freidin (1978; 1979) for more detailed discussion of the issue concerning conditions on derivations vs. conditions on representations.
3 Indexing and binding
Under the analysis presented above, the SSC and PIC are no longer interpreted as conditions on proper binding. Rather, they function as reindexing rules and therefore become part of the algorithms for assigning indices to representations in LF. Conditions on proper binding are given in terms of the conditions on well-formed indices in (22). (22)
a. *NP(0)
b. *NP(i, {…, i,…})
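Stated as code (our rendering), the two conditions are simple predicates over an NP's index pair.

```python
# The well-formedness conditions in (22) as direct checks.

def well_formed_index(ref, ana):
    if ref == 0:        # (22a) *NP(0): zeroed by reindexing rule (10)
        return False
    if ref in ana:      # (22b) *NP(i,{...,i,...}): contradictory, as in (21)
        return False
    return True

assert not well_formed_index(0, set())     # anaphor left unbound
assert not well_formed_index(1, {1, 2})    # COMP-to-COMP violation
assert well_formed_index(3, {1, 2})        # ordinary disjoint reference
```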
As demonstrated in the previous section, the SSC and PIC as reindexing rules apply to the referential index of an anaphor and the anaphoric index of a pronoun. They do not apply to the referential index of a pronoun, the anaphoric index of a wh-trace, or either index of a name. Whether they apply to the referential index of a wh-trace depends on the status of a wh-trace in LF—that is, is a wh-trace a special case of a name, or is it a fourth NP-type distinct from anaphors, pronouns, and names? In this section, we will show that the assumption that these rules do not apply to the referential index of a wh-trace leads to a simplification in their formulation, which in turn allows for a simplification of the algorithms for assigning indices proposed in Chomsky (1980).
In previous formulations of the SSC and PIC, S̄ rather than S was taken as the domain of proper binding for an anaphor. The ostensible reason for this choice was the assumption that a wh-trace had the same status as an NP-trace with respect to the conditions (cf. Chomsky (1973; 1976)). We know of no other reason for choosing S̄ over S. In contrast, there are several reasons for choosing S over S̄. S̄ must be stipulated in the structural description of the reindexing rules, whereas S need not. It follows from the definition of "in the domain of," in conjunction with the phrase structure rule S→NP AUX VP, that the domain of both the subject and the tensed element is S.8 In addition to allowing for a simplification of the reindexing rules, the choice of S provides an explanation for why non-wh-NPs in the domain of subject or a tensed element may not use COMP as an "escape hatch" (see Freidin (1978, 529) and the references cited there) with respect to the SSC and PIC—a possibility which was not explicitly discounted in previous formulations of these conditions. Another reason for choosing S rather than S̄ is that the ungrammaticality of NPs like (23a) follows immediately from reindexing (specifically the PIC) and the filter (22a).9
(23)
a. *the men who each other like
b. [NP the men [S̄ [C whoi] [S each otheri like ei]]]
If S is the relevant domain of application for the reindexing rules, then the referential index of each other in the representation (23b) will be deleted. If, however, S̄ is chosen, then each other would not be subject to the reindexing rules since it is bound in S̄. Thus, another means of accounting for (23a) would be required. An alternative method of accounting for (23a) would be to specify the algorithms for assigning referential indices in such a way that representations like (23b) would never be generated. One interpretation of the algorithms given in Chomsky (1980) has this effect. The algorithms and their ordering are given in (24).
(24)
a. Coindex by movement (see Chomsky (1980, (4))).
b. Index all unindexed nonanaphoric NPs in S from top to bottom with hitherto unused positive integers (henceforth "contraindexing").
c. Coindex the remaining anaphoric NPs with some c-commanding NP.10
(24a) is part of the syntax; (24b) and (24c) belong to LF, the latter constituting part of the construal rules. (24) will not generate the representation (23b) given the stipulation that whoi does not count as a c-commanding NP. This alternative seems needlessly complicated. Coindexing applies in two different components, before and after contraindexing. Thus, the indexing operation must be given as two distinct suboperations, coindexing and contraindexing. These complications can be eliminated by taking S to be the domain of application for the reindexing rules. Since the ill-formedness of (23b) will fall under (22a), it does not matter if the indexing rules generate such representations. Therefore, we might try to eliminate (24) in favor of the optimal rule of indexing (25).11 (25)
Index NP
Given (25), indexing applies freely, including as a subcase coindexing, at a single point in a derivation. There is no need to define the operation of contraindexing or to stipulate that it applies top-to-bottom as in (24b). Nor is there any need to stipulate that anaphoric NPs be coindexed with some c-commanding NP, excluding wh-phrase in COMP (24c). This follows automatically from the reindexing rules in conjunction with filter (22a). Having established that the choice of S rather than several desirable consequences, let us now return to the question of whether the referential index of a wh-trace is subject to the reindexing rules. If a wh-trace’s referential index is subject to reindexing, then must be taken as the domain of application. Otherwise, the system of rules and filters would predict incorrectly that all wh-questions in English are ill-formed. To maintain this position without giving up the advantages of choosing S would require the implausible analysis in which the domain of application for the reindexing rules is generally S, but for the referential index of a wh-trace. In view of this, we conclude that neither index of a
Disjoint reference and wh-trace
171
wh-trace is subject to the reindexing rules. Thus in LF, a wh-trace has the same status as a name. We turn next to a formulation of the rule for assigning anaphoric indices. Following Chomsky (1980), let us take the anaphoric index of an NPi to be the set {a1,…,an}, where each aj is the referential index of an NP which c-commands NPi. Recall that only nonanaphors are assigned anaphoric indices. Note too that it must be assumed that a whphrase in COMP or its trace in COMP is ignored by the algorithm which constructs anaphoric indices. Otherwise, the algorithm would produce representations like (26) which violate (22b). (26)
[ [C whoi] [S e(i,{i}) left]]
Note also that anaphoric indices must be assigned after Wh Interpretation (see section 2) in order to account for disjoint reference in another crossover case (27a), in contrast to its noncrossed counterpart (27b). (27)
a. b.
Whose book did he read? Who read his book?
In (27b), his is free in reference with respect to the variable bound by who. By Wh Interpretation, the trace theoretic representation of (27) is mapped onto (28) (see Chomsky (1977a)). (28)
a. b.
(for which x3, x3 a person) [S he1 read [NP2 x3’s book]] (for which x1, x1 a person) [S x1 read [NP2 his3 book]]
By the algorithm for constructing anaphoric indices, (28) is mapped onto (29). (29)
a. b.
(for which x3, x3 a person) [S he(1) read [NP(2{1}) x(3,{1})’s book]] (for which x1, x1 a person) [S x(1) read [NP(2{1}) his(3,{1}) book]]
Since his is interpreted as free in reference with respect to x1 in (29b), we assume that there is some principle operative which effects the deletion of the anaphoric index 1 on the pronoun.12 Given that the reindexing rules follow the assignment of anaphoric indices, which in turn follows Wh Interpretation, wh-trace and NP-trace are distinct at the point in a derivation where the reindexing rules apply. At that point, a wh-trace is represented as a variable, whereas an NP-trace is represented as an empty category.
4 Consequences
We have presented a number of related arguments that wh-trace is not subject to Opacity.13 Chomsky (1980) is largely compatible with this position, but has one argument to the contrary, to which we now turn. Note that here we follow Chomsky's presentation in treating Opacity as conditions on logical form rather than as reindexing rules. The argument could readily be restated in the latter format, that of the appendix to Chomsky (1980). Comparing (30) to (31), Chomsky notes that both violate Subjacency with respect to binding to the matrix COMP, yet (31) is somewhat more acceptable. (30)
[S̄ [C who1] [S did you wonder [S̄ [C what2] [S e1 saw e2]]]]
(31)
[S̄ [C what2] [S did you wonder [S̄ [C who1] [S e1 saw e2]]]]
If wh-trace were subject to the NIC (see footnote 5), then the contrast could be described as follows. While both (30) and (31) violate Subjacency, the complement subject in (30), but not in (31), would also be subject to the NIC. (Note that if the PIC is taken to be the relevant condition, (30) and (31) would not be distinguished in this way since the PIC would affect the complement object as well.) Thus, (30) violates two conditions, whereas (31) violates only one—granting that wh-trace is not subject to the SSC. The validity of this account requires that Opacity be split into two quite distinct constraints. The NIC must be taken "to be an 'inviolable' constraint, as compared with Subjacency (and [the SSC]…)," the latter two being "weaker" constraints with wh-trace as anaphor (Chomsky (1980, 37–38)). The small difference in acceptability between (30) and (31) seems rather slim evidence for such a weakening and complication of the theory. It should also be noted that there are examples quite parallel to (30) in acceptability, but identical to (31) in relevant derivational properties. Consider (32):
[S̄ [C what1] [S do you wonder [S̄ [C who2] [S Bill gave e1 to e2]]]]
While (32) is just as unacceptable as (30), like (31) it does not involve the NIC, regardless of how wh-trace is to be treated. (33) demonstrates that the unacceptability of (32) cannot be attributed to a crossed binding constraint of the type frequently discussed in the literature (cf. Fodor (1978)). (33)
[S̄ [C who2] [S do you wonder [S̄ [C what1] [S Bill gave e1 to e2]]]]
Though (32) is of the form x1 x2 y1 y2, while (33) is of the form x1 x2 y2 y1, they are equally unacceptable—some new constraint, which presumably would generalize to (30), is required.14
Thus, if there is in fact a linguistically relevant contrast between (30) and (31), the NIC does not seem to be responsible. Given that wh-trace is not subject to the NIC (PIC), it follows that the *[that–e] filter of Chomsky and Lasnik (1977) cannot be reduced to the NIC as proposed in several recent articles and unpublished papers.15 In fact, given our conclusions above, there is no overlap at all between the filter and the condition. Thus, one argument for eliminating the filter, namely its alleged partial redundancy with the condition, is without force. Another consideration that has come up in discussions of the filter is its complexity. The "unless"-condition particularly has been criticized.16
(34) *[that–e] Filter (2 formulations)
a. *[S̄ that [NP e]…], unless S̄ or its trace is in the context: [NP NP____…] (=C&L (68))
b. *[S̄ ±WH [NP e]…], unless S̄ or its trace is in the context: [NP NP____…] (=C&L (85))
If this complexity actually is a problem, it is interesting to note that it can be completely eliminated. The purpose of the entire “unless”-clause is distinguishing between (on the one hand) that verbal complements, which always lead to ungrammaticality when in violation ((35a)), and (on the other hand) relative clauses and clefts ((35b)), which never lead to ungrammaticality when in violation. (35)
a. *Who do you believe that t left?
b. i. The man that t was shouting left.
ii. The man left that t was shouting.
iii. It was this phenomenon that t caused difficulties.
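The (35) contrast invites a mechanical restatement, anticipated in the text just below. In this sketch (ours, with hypothetical lexical labels), that is split into two items so that the filter mentions only the verbal complementizer.

```python
# Sketch of the suggested simplification: two lexical items for 'that',
# so the *[that-e] filter needs no "unless"-clause. Names are hypothetical.

THAT_VERBAL = "that_v"    # complementizer of verbal complements, as in (35a)
THAT_REL = "that_rel"     # complementizer of relatives and clefts, as in (35b)

def violates_that_e(comp, subject_is_empty):
    """*[that_v [NP e] ...]: only the verbal complementizer is filtered."""
    return comp == THAT_VERBAL and subject_is_empty

assert violates_that_e(THAT_VERBAL, True)      # (35a) *...believe that t left
assert not violates_that_e(THAT_REL, True)     # (35b i) the man that t was shouting
```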
Thus, an alternative to the “unless”-clause is the reasonable separation of complementizer that into two lexical items. The filter would then be stated in terms of the verbal complementizer that and would need no “unless”-clause at all. The conclusion that a wh-trace is not subject to either the SSC or the PIC bears on the analysis of strict cyclicity whereby the empirical effects of the Strict Cycle Condition are derived from independently motivated conditions on representations. In Freidin (1978), two accounts of strict cycle violations involving Wh Movement as in (36) are proposed. (36)
a. [S̄ [C who1] [S did John know [S̄ [C what2] [S e1 saw e2]]]]
b. [S̄ [C who1] [S did John know [S̄ [C what2] [S PRO to give e1 e2]]]]
One account assumes that wh-trace is subject to the PIC and SSC. Thus, (36) is prohibited because e1 in (36a) is subject to the PIC and e1 in (36b) is subject to the SSC. Alternatively, the binding between e1 and who1 in both examples violates Subjacency. As argued above, the first alternative is not viable. This situation provides an argument that Subjacency is properly construed as a condition on representations, and not a condition on movement. Note that (36a) can be
derived in a way that does not violate Subjacency interpreted as a condition on movement, as in (37). (37)
a. [S̄ [C] [S John knows [S̄ [C] [S who1 saw what2]]]]
b. [S̄ [C] [S John knows [S̄ [C who1] [S e1 saw what2]]]]
c. [S̄ [C who1] [S John knows [S̄ [C e1] [S e1 saw what2]]]]
d. [S̄ [C who1] [S does John know [S̄ [C what2] [S e1 saw e2]]]]
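The contrast between the two construals of Subjacency can be made concrete with a schematic checker. This is our sketch, with purely illustrative counts of bounding nodes, not part of the original analysis.

    # Subjacency as a condition on movement vs. on representations (toy sketch).

    def subjacent(crossed_bounding_nodes: int) -> bool:
        # A dependency is subjacent if it crosses at most one bounding node (S, NP).
        return crossed_bounding_nodes <= 1

    # Checked step by step, every movement in derivation (37) passes:
    movement_steps = {"37b": 1, "37c": 1, "37d": 1}   # bounding nodes crossed per step
    print(all(subjacent(n) for n in movement_steps.values()))    # True

    # Checked on the final representation (37d), who1 binds e1 in the embedded
    # subject position across two S nodes, since the intermediate trace in the
    # embedded COMP is wiped out when what2 moves there in step (d):
    final_bindings = {"who1-e1": 2, "what2-e2": 1}
    print(all(subjacent(n) for n in final_bindings.values()))    # False

Only the representational construal flags the derivation, which is the sense in which it subsumes the work of the Strict Cycle Condition here.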
(A similar derivation can be provided for (36b).) Such derivations could be excluded given the Strict Cycle Condition (SCC) (see Chomsky (1973) and Freidin (1978) for details). However, the SCC is redundant for NP Movement, as shown in Freidin (1978), and unnecessary generally if Subjacency is taken to be a condition on representations rather than a condition on movements. It remains to be determined where in a derivation Subjacency applies.
5 Summary and conclusions
We have shown that within the framework of Chomsky (1980) the analysis of crossover phenomena and COMP-to-COMP violations supports the conclusion that wh-traces bear anaphoric indices which are not subject to the PIC and SSC.17 This conclusion leads to a simpler formulation of the binding principles (now reinterpreted as reindexing rules) and the algorithms for indexing. An argument results that the referential index of a wh-trace is also not subject to the reindexing rules. From this it follows that the *[that-e] filter of C&L (1977) cannot be reduced to some variant of the PIC and that Subjacency must be interpreted as a condition on representations if the Strict Cycle Condition is to be treated as a theorem of the theory of grammar.
Notes
* A version of this material was presented at the 1979 GLOW Colloquium, Pisa, Italy. The work of the first-named author was supported by a Postdoctoral Research Fellowship from the NIMH (grant #1F32 MH 05879–01) and also by NIH grant #HDO 5168–06AI.
1 For further discussion of disjoint reference, see for example Lasnik (1976), Chomsky (1980). On core grammar, see Chomsky and Lasnik (1977) (hereafter, C&L) and Lasnik and Kupin (1977).
2 See Freidin (1978; 1979), Chomsky (1980), and Rouveret and Vergnaud (1980) for discussion of conditions on representations.
3 S-structure is somewhat more abstract than the traditional notion of surface structure, including for example empty categories.
4 By “name” we mean a nonanaphoric nonpronominal NP, hence not existential there or expletive it. The fact that idiom chunks like bucket in the idiom kick the bucket do not function as names indicates that only an NP with intended referent qualifies.
5 Note that the analysis proposed here is slightly different from the one in Chomsky (1980), where the PIC is replaced by the Nominative Island Condition (NIC):
(i) A nominative anaphor cannot be free in S̄. (=(103))
We assume on the contrary that the relevant parameter for the finite clause condition should remain “domain of tense” rather than “nominative case.” See Lasnik and Freidin (1981) for discussion of this point. Further, as discussed below, we take the domain of application to be S rather than S̄.
6 We assume that wh is a quantifier and as such does not refer. See Chomsky (1977b).
7 The (a)-examples are often referred to in the literature as “crossover phenomena” on the grounds that Wh Movement over a pronoun affects coreference possibilities (see Postal (1971) and Wasow (1972)).
8 The slight difference here from the formulation given in (3) is motivated by the need for tense to be introduced by Aux rather than by S. See, for example, Lasnik (1981a), for discussion.
9 We are indebted to Dominique Sportiche for bringing this example to our attention.
10 See Chomsky (1980, fn. 46) for discussion of how indexing might be done when a lexical anaphor receives an index in the syntax via movement.
11 Recall that this is the rule for assigning referential indices. Whether coindexing by movement is still necessary remains to be determined. If the indices assigned via movement are referential indices, then the pied piping cases, or PP movements in general, might provide some problems for interpretation.
12 This principle might involve deletion of the anaphoric index of a genitive pronoun. An alternative approach, suggested in Anderson (1979), would represent the pronoun in (28b), for example, as a bound anaphor, possibly a suppletive form of the nonoccurring himself’s. Under this interpretation, his would receive no anaphoric index at all. Thus, his, her, their, etc., would be systematically ambiguous between bound anaphors, on the one hand, and regular pronouns, on the other.
13 Following the terminology of the first section of Chomsky (1980), we use this term to designate both the SSC and the finite clause condition, the PIC.
14 The slightly more acceptable examples in the set—such as (31) or (i) below—can possibly be thought of as derivatively generated. (i)
What did you wonder how well he did?
In particular, the wh-phrase introducing the embedded S might not have undergone movement at all in (31) and might have been preposed by adverb movement in (i). Thus, violation of Subjacency would be avoided. The examples would still not be fully grammatical, however, since they relax the requirement that wonder and know (in the relevant sense) require wh-phrases in their embedded COMPs, to the less stringent requirement that the wh-phrase simply be adjacent to wonder or know. In this way, we would not be forced to say that Subjacency is a weaker constraint in these cases. Note that no such “derivative” derivation is available for (30), (32), or (33). It should be noted, however, that the above remarks cannot be extended to the distinction between (ii) and (iii),
(ii) Which book do you wonder how well the students read?
(iii) Which students do you wonder how well read the book?
where (iii) seems less acceptable than (ii).
15 See especially Kayne (1981; 1980), Taraldsen (1979), and Pesetsky (1982a).
16 Note though that complexity creates no learnability problem for a universal rule.
17 Although the examples we presented all involve wh-questions, the same result obtains for relative clauses. The following are parallel to (11a,b) and (12a,b): (i)
a. The man who he wants the woman to like…
b. The man who wants the woman to like him…
(ii)
a. The man who he thinks likes the woman…
b. The man who thinks he likes the woman…
Rizzi (1977) provides independent evidence based on wh-extraction facts that Wh Movement in Italian relative clauses is not subject to the PIC and SSC.
9 On the fine structure of the binding theory
Principle A and reciprocals
with Wayne Harbert
Within the Extended Standard Theory it is assumed that the distribution of lexical bound anaphors—e.g., reciprocals and reflexive pronouns—is accounted for by a general principle of the theory of binding, which is given schematically in (1).
(1)
An anaphor must be bound in domain X.
An anaphor is considered bound where it is coindexed with a c-commanding NP, and bound in domain X where this c-commanding NP occurs in domain X. Following Chomsky (1981) (henceforth LGB) we will refer to (1) as Principle A of the binding theory. Further elaboration of Principle A raises two fundamental questions: i) what constitutes an anaphor? and ii) how is domain X to be characterized? The answers to (i) and (ii) are interdependent, as will be discussed below. The main focus of this paper is on the characterization of domain X. We will demonstrate that a careful analysis of anaphor binding across pleonastic subjects (e.g., nonreferential it) raises empirical problems for current versions of Principle A (cf. LGB, chapter 3). We propose a modification of the notion “accessible subject” which eliminates the empirical problems. A brief discussion of some consequences of this modification follows. As a point of departure, we begin with Chomsky’s formulation of Principle A as given in LGB, p. 220.
(2) An anaphor must be bound in its binding category, where binding category is defined in (3) (=LGB, 3.2.3:(100)).
(3) β is a binding category for α if and only if β is the minimal category containing α and a SUBJECT accessible to α.
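As a rough illustration of (2)-(3) (ours, with an invented flat encoding rather than anything from LGB), the binding category can be computed by searching outward from the anaphor for the smallest category containing an accessible SUBJECT:

    # Minimal sketch of (2)-(3); the encoding is invented for illustration.

    def binding_category(containing_categories):
        """containing_categories: (label, has_accessible_SUBJECT) pairs for the
        categories containing the anaphor, innermost first."""
        for label, has_accessible_subject in containing_categories:
            if has_accessible_subject:
                return label
        return None

    def principle_A_ok(containing_categories, bound_within):
        """bound_within: the containing categories in which the anaphor is
        coindexed with a c-commanding NP."""
        bc = binding_category(containing_categories)
        return bc is not None and bc in bound_within

    # (4b) *John sold [NP Mary's pictures of himself]: the NP contains the
    # subject Mary, so NP is the binding category; himself is bound only in S.
    print(principle_A_ok([("NP", True), ("S", True)], {"S"}))   # False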
“The notion SUBJECT accords with the idea that the subject is the ‘most prominent nominal element’ in some sense, taking INFL to be the head of S” (LGB, p. 209). Thus SUBJECT translates as the agreement element in INFL (AGR either containing an empty N or being represented by it) where present, or the syntactic subject [NP, S] or [NP, NP]. Presumably, where both AGR and a syntactic subject are present, AGR is considered the more prominent nominal element and therefore takes precedence. This interpretation is not crucial for the analysis given below; however it is important to note that under the
LGB formulation AGR and syntactic subject have equal force with respect to binding. A SUBJECT is accessible to an anaphor where it c-commands the anaphor.0 (Chomsky posits an additional condition on accessibility (the i-within-i condition) to which we return below.) Under this formulation of Principle A, the examples in (4) are excluded because the anaphor himself is free in its binding category—S in (4a) and NP in (4b), where Mary is an accessible subject. (4)
a. *John expected [S Mary to help himself]
b. *John sold [NP Mary’s pictures of himself]
In (5) the anaphor is also free in its binding category; but here the analysis is somewhat more complicated. (5)
a. *John said that [S′ himselfi AGRi was happy]
b. *John said that [S′ Maryi AGRi helped himself]
In (5a) AGR is SUBJECT and also coindexed with the anaphor himself. However himself is not properly bound in S′, its binding category, since AGR is not a possible binder. Thus (5a) constitutes a violation of Principle A. Similarly (5b) violates Principle A because himself is free in its binding category S′—which is established by the presence of AGR. (5a) differs from (5b) in that (5b) contains a possible binder for the anaphor—i.e., the syntactic subject Mary, though presumably both are illformed due to the effect of AGR. Given the distinction between accessible SUBJECT and possible binder, where syntactic subjects are included under both notions and AGR is excluded under the latter, the term “bound” in the formulation of Principle A (1) above should be interpreted as “bound to a possible binder.” Putting aside the question of AGR as an accessible SUBJECT for the moment, let us consider binding with respect to syntactic subjects—and in particular [NP, S]. In the LGB formulation, all syntactic subjects are equally accessible subjects and also possible binders—with one exception concerning coindexing of pleonastic subjects and the i-within-i condition (cf. LGB, p. 215) to which we return below. Thus there should be no difference in the grammaticality of the following pair of examples:
(6)
a. *They expected [S Johni to {seem/be reported} to each other [S ei to be crazy]]
b. They expected [S it to {seem/be reported} to each other [S that John is crazy]]
However there is a contrast between the two examples where (6a) is definitely illformed while (6b) is relatively wellformed (as indicated by the lack of a star). The examples differ solely with respect to the nature of the subject across which the anaphor each other is bound. In (6a) binding occurs across a lexical subject which bears a θ-role; whereas in (6b) binding is across a pleonastic subject which bears no θ-role. In the discussion which follows we designate lexical subjects bearing a θ-role as θ-subjects in contrast to pleonastic subjects—i.e., non-θ subjects. Note that in both (6a) and
(6b) binding is across a subject in a non-θ position.1 The question arises as to whether the weak binding effect of it (if it exists) is due to its non-θ status or its occurrence in a non-θ position. These factors are separated in the following examples: (7)
a. They expect [S it to seem to each other that Mary is lying]
b. They expect [S it to annoy each other that Mary is lying]
In (7a) it is in a non-θ position (subject of seem), whereas in (7b) it is actually in a θ-position as illustrated in (8). (8)
The report annoyed us.
Given the θ-criterion, it follows that the report is in a θ-position. If the subject of annoy is a θ-position, then it in (7b) occurs in a θ-position. Presumably the θ-role of this subject position is transferred to the complement of annoy since nonreferential it may not bear a θ-role and the S̄ must. The examples in (7) seem substantially better than the ill-formed (6a)—indicating that non-θ subjects in general have a weaker opacity inducing effect than θ-subjects, whether or not they occur in θ-positions. Yet (6b) and (7a) seem to us somewhat better than (7b)—though the judgments are admittedly subtle, indicating that occurrence in a θ-position also contributes (weakly) to opacity effects. We conclude from this evidence that with respect to Principle A, θ-subjects are significantly stronger than non-θ subjects. Given this qualitative difference between θ vs. non-θ subjects regarding opacity effects, we can now proceed to test the strength of AGR as an accessible subject for binding. If AGR is either a stronger or equally strong opacity inducing factor, then the weak effect of pleonastic it subjects should disappear when AGR also occurs in the clause containing the anaphor. Consider the examples in (9), which are analogous to (6) except that the embedded clause containing the anaphor each other contains AGR as well.
(9)
a. *They expected [S Johni AGR would {seem/be reported} to each other [S ei to be crazy]]
b. They expected [S it would {seem/be reported} to each other [S that John is crazy]]
If AGR is a strong opacity inducing factor then both (9a) and (9b) should be equally illformed. This is not the case however. Roughly the same difference in relative grammaticality holds for (9a–b) as for (6a–b). (9a) is definitely illformed, whereas (9b) is relatively wellformed in comparison (hence the lack of a star). In (9b) AGR fails to induce the strong opacity effect it should according to the LGB theory. Thus we are led to conclude that the strong opacity effect in (9a) is due to the θ-subject effect as in (6a).2 Nonetheless, there is one context where AGR appears to have a strong binding effect, as in (10).
(10)
a. *They expected [S that each other AGR were crazy]
b. *They expected [S it to be reported [S that each other AGR were crazy]]
In neither example in (10) is binding across a θ-subject. In (10a) binding does not cross the domain of any syntactic subject, while in (10b) it occurs across the domain of a non-θ subject, nonreferential it. Yet both examples are as strongly illformed as cases where binding occurs across the domain of a θ-subject (cf. (6a)). In both examples in (10) AGR is local with respect to the anaphor each other. Yet locality by itself is not sufficient to account for the strong binding effect in (10) since AGR is also local with respect to the anaphor in (9b) where there is no strong binding effect. That is, the local effect of AGR holds for subjects, and apparently not for non-subjects. We will refer to this local strong binding effect of AGR as the NIC effect.3 To summarize, AGR has a strong binding effect which holds only for its associated subject position. For local non-subject positions as in (9b), the opacity inducing effect of AGR is comparatively weak (if it exists at all). The opacity of non-subject positions is determined primarily by the thematic property of c-commanding subject NPs. We conclude therefore that the binding effects of AGR and syntactic subject cannot be collapsed under the notion SUBJECT as discussed above (cf. (3)). Granting this conclusion, it is still possible to avoid having to stipulate two apparently unrelated opaque domains for binding—the c-command domain of a θ-subject and the subject of a finite clause (or more generally, a subject marked for agreement with the verb of its clause (see George & Kornfilt (1981) for discussion))4, provided the NIC effect can be reduced to the θ-subject effect. If such a reduction is possible, then the NIC cases in (10) must be interpreted so that the anaphor is free in the domain of a θ-subject. Since the only θ-subject that might qualify is the anaphor itself, it should be the case that the anaphor is free in its own domain—i.e., the domain of a θ-subject. However, this account requires that the illformed cases (e.g., (11a)) be distinguished from the wellformed cases (e.g., (11b)) which differ only in the absence of AGR. (11)
a. *We expect [S* each other AGR would win]
b. We expect [S* each other to win]
In other words, S* in (11a) must be designated as the domain of a θ-subject with respect to the anaphor, while S* in (11b) must not. Although AGR is easily identified as the salient feature which distinguishes the two S*s, exactly how the distinction is to be given formally is far from obvious. For now this remains a non-trivial desideratum in our analysis. Thus far we have argued that the notion “accessible SUBJECT” as it applies to Principle A with respect to the binding possibilities for reciprocals should be limited to θ-subjects—in effect, syntactic subjects. Note that this notion of θ-subject must extend to traces as well as lexical subjects as illustrated in (12). (12)
*Johni seemed to them [S ei to annoy each other]
The trace in (12) should be analyzed as a θ-subject even though it does not bear a θ-role strictly speaking. Thus we might sharpen the notion “θ-subject” as “any subject in a chain which bears a θ-role.” For example, Johni and ei in (12) form a chain bearing the θ-role for the subject of annoy. In the LGB theory, the notion “accessible” is constrained by the i-within-i condition on wellformedness of indexing structures, given in (13). (13)
*[γ…δ…], where γ and δ bear the same index. (LGB, p.212)
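(13) can be pictured as a wellformedness check on indexed bracketings. The nested-tuple encoding below is our own, purely for illustration.

    # Toy check for the i-within-i condition (13):
    # *[gamma ... delta ...], where gamma and delta bear the same index.
    # A constituent is (index, [children]); index is None if unindexed.

    def i_within_i_violation(node, outer_indices=()):
        index, children = node
        if index is not None and index in outer_indices:
            return True
        inner = outer_indices + ((index,) if index is not None else ())
        return any(i_within_i_violation(child, inner) for child in children)

    # (14): coindexing each other with AGRj would place index j inside NPj:
    np_j = ("j", [("j", [])])
    print(i_within_i_violation(np_j))   # True: excluded by (13)

    ok = ("j", [("i", [])])             # distinct indices: wellformed
    print(i_within_i_violation(ok))     # False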
The incorporation of (13) into Principle A accounts for the apparent wellformedness of (14), under the assumption that subject NPs are coindexed with the AGR of their clause.
(14) Theyi expected [S̄ that [S [NPj pictures of each otheri] AGRj would appear in the local newspapers]]
Since NPj must be coindexed with AGR, neither NPj nor AGR may also be coindexed with each other given (13). AGR is therefore not an accessible SUBJECT for each other, which consequently need not be bound in the complement clause. Note that the i-within-i condition is necessary for the binding theory if we assume that AGR functions as an accessible SUBJECT. If not, as we have argued, then it is not clear that (13) is relevant for Principle A with respect to (14) since NPj dominates each other and thus does not c-command the anaphor, and consequently is not accessible to it given the basic definition of “accessible” (see note 0).5 However, (14) may be relevant to the proper reduction of the NIC effect to a θ-subject effect. In LGB, Chomsky extends the i-within-i analysis to examples like (15).
(15)
a. Theyi believe [S itj would annoy John [S̄ that [NP each other’si pictures] were being overrated]]
b. Theyi believe [S itj would annoy John [S̄ for [NP each other’si pictures] to win prizes]]
Coindexing of pleonastic it with the complement of annoy renders it non-accessible to the anaphor in virtue of (13). This analysis encounters two difficulties—one conceptual and the other empirical. Conceptually, where both γ and δ in (13) are NPs, as in (14) and (i) or (ii-b) of note 5, the i-within-i condition appears to be a natural condition on coreference. In constructions like (15), however, where γ is an S̄, the issue of coreference disappears. Unlike the indices of NPs, the index of S̄ does not seem to be a referential index. Thus the generalization of the i-within-i condition to such cases is not a natural one. Empirically this analysis appears to make incorrect predictions about the wellformedness of sentences like (16).
(16)
a. Theyi believe [S it is likely [S̄ that [NP each other’si pictures] are being overrated]]
b. Theyi believe [S it is likely [S̄ for [NP each other’si pictures] to win prizes]]
In (15) the pleonastic subject occurs in a θ-position and is assumed to be coindexed with the postverbal S̄ in order for the latter to acquire a θ-role, thereby satisfying the θ-criterion. This provides the coindexing necessary for the extension of the i-within-i analysis. In (16) the pleonastic subject occupies a non-θ position. Thus there is no motivation for assuming coindexing between it and the complement of likely. Given this analysis, there ought to be a significant difference in wellformedness between the examples in (15) and (16) under the LGB theory, where (16) is illformed and (15) (because of the i-within-i condition) is wellformed. However, these examples do not manifest the predicted difference. They seem to us to be essentially wellformed, on a par with examples like (6b). If there is a difference, it appears to be in favor of (16) over (15) (contrary to the LGB analysis), where pleonastic subjects in θ-positions induce a slight opacity effect as mentioned above. Under our analysis of Principle A, the extension of (13) to account for the relative wellformedness of (15), with its attendant problems, is unnecessary. The relative wellformedness of both (15) and (16) follows from the weak effect of non-θ subjects for binding. In this respect these examples fall together with (6b), (7), and (9b). Our analysis suggests further that the difference in wellformedness between (15) and (16)—if it exists—is just another instance of the difference between (7b) and (7a) respectively. That is, condition (13) is not at issue. In the preceding discussion we have attempted to illustrate that Principle A has a fine structure whereby θ-subjects have strong opacity inducing effects in contrast to non-θ subjects and AGR which have relatively weak effects, with the exception of the NIC effect for AGR as noted above. We noted that pleonastic subjects may manifest a slightly stronger effect when they occur in a θ-position or are locally bound to AGR. Still to be determined is the exact hierarchy of strength for these weak effects. Our analysis has dealt almost exclusively with reciprocals because they provide the clearest cases. We turn now to the question of whether this analysis generalizes to the other type of lexical bound anaphora, reflexive pronouns. We believe that it does—again on the basis of the relative grammaticality judgments. Thus consider the following pair of examples: (17)
a. *Tom expects [S Maryi to {seem/be reported} to himself [S ei to be lying]]
b. ?Tom expects [S it to {seem/be reported} to himself [S̄ that Mary is lying]]
Although there is a certain awkwardness in using a reflexive pronoun in (17b) where a non-reflexive pronoun would serve for the intended reading, we find that there is a qualitative difference between (17a) where binding occurs across a θ-subject and (17b) where binding occurs across a non-θ subject.6 Note that NP-trace binding is not sensitive to the distinction θ- vs. non-θ subject. (18)
*Hei was expected [S̄ for [S it to be hurt ei]]
cf. It was expected for him to be hurt.
Thus if Principle A accounts for (18), then it will be necessary to distinguish between lexical anaphors and NP-e.
It should also be noted that the interpretation of pronouns with respect to disjoint reference is not sensitive to the distinction between θ and non-θ subjects. Thus the pronoun he in (19) may be interpreted as coreferential with John where the pronoun is free in the domain of a non-θ subject. (19)
John expected it to annoy him that Mary was arriving late.
In this way our analysis provides additional evidence for the asymmetry between the domain in which an anaphor must be bound and the domain in which a pronoun must be free. That is, the domain X for Principle A will not be the same as the domain statement for Principle B:
(20) Principle B: a pronoun must be free in domain Y.
See Huang (1982) for discussion.
Our analysis suggests that the notion θ-subject is crucial for binding of lexical anaphors. This seems natural in that anaphors, having no inherent reference, must get their reference from some antecedent which does. Since non-θ subjects do not qualify as possible antecedents, it seems natural that they would not define strong opaque domains. However, pronouns need not have an antecedent in the sentence in which they occur. Thus it seems reasonable that opaque domains for pronouns are determined differently than for anaphors.
Notes
0 The definition of accessible is as follows: α is accessible to β if and only if β is in the c-command domain of α. (Cf. Chomsky (1981) (henceforth LGB); p. 212).
1 That is, in a non-argument position in the terminology of Freidin (1978) (not to be confused with the notion Ā-position in LGB). Recall that it follows from the θ-criterion (in particular, Functional Uniqueness in its strong form—see Freidin (1978, footnote 25)) that movement can only occur into non-θ positions.
2 Comparing (6b) and (9b), it may be that the latter is very slightly worse (though again judgments here are rather subtle). If so, then AGR might provide a slight increment to opacity effects. In LGB (p. 214) an example involving binding across a pleonastic subject is marked ungrammatical: (i)
*They think it bothered each other that S.
This example incorporates two factors which may have slight opacity effects: AGR in the embedded clause and the presence of it in a θ-position. However, compared with (ii) where binding occurs across a θ-subject, (i) is relatively wellformed.
(ii) *They think Mary’s report bothered each other.
For some reason (i) seems better if the embedded predicate takes an auxiliary—e.g., will bother or has bothered.
3 The Nominative Island Condition (Chomsky (1980)) prohibits the occurrence of nominative anaphors—in effect, anaphor subjects of finite clauses. Thus the NIC covers the one case of strong binding effects which is not accounted for by θ-subjects.
4 Recall that such a stipulation has been considered a defect in previous formulations of the binding theory—see LGB, p. 158.
5 Notice that independent motivation for (13) is given in LGB as in (i) which involves the indexing of a pronoun rather than an anaphor. (i)
*[NPi the owner of [NPi his] boat]
Yet the following example involving an anaphor seems relatively wellformed when compared with its pronominal counterpart.
(ii)
a. Every owner of a picture of himself is conceited to some degree.
b. Every owner of a picture of him is conceited to some degree.
(ii-b) is impossible on the coreferent reading, where him and owner corefer.
6 (17b) seems better with seem than with be reported, though why this should be is unclear.
10 Fundamental issues in the theory of binding*
Within the current framework of the Extended Standard Theory of generative grammar, the Government-Binding theory (henceforth GB) of Chomsky 1981, two lexical NPs are interpreted as having the same intended reference if they bear the same index. Under an optimally simple formulation, coindexing is achieved by a rule “index NP.” Thus there is no special rule of coindexing or any special prohibition against coindexing by the basic indexing rule (see Freidin and Lasnik 1981 for discussion). This rule generates representations of sentences where lexical NPs are interpreted as coreferential as in (1–3).
(1)
How many people that Johni knows does Johni really like?
(2)
Maryi knows that shei is clever.
(3)
Theyi often annoy each otheri.
It is possible to generate different grammatical sentences by eliminating the coindexing in (1) and (2), but the elimination of coindexing in (3) yields an illformed representation. The expression each other in (3) is a lexical bound anaphor which has no intrinsic reference of its own and must get its reference from some NP antecedent in the sentence in which it occurs. The pronoun she in (2) also has no intrinsic reference; but unlike the bound anaphor, a pronoun can get its reference outside the context of the sentence in which it occurs. The proper name John in (1) presumably has an intrinsic reference—in which case coindexing in (1) does not fix the reference of the second occurrence of John, but rather establishes that both names designate the same individual. In what follows, bound anaphors and pronouns will be referred to as anaphoric elements and the NP with which each is coindexed in a sentence will be designated its antecedent. The rule of indexing as formulated generates representations of anaphoric relations which do not occur—e.g., (4–6). (4)
*Maryi knows that Maryi is clever.
(5)
*Johni likes himi
(6)
*Theyi think that each otheri are crazy.
The examples in (1–6) show that coindexed NPs have a restricted distribution independent of their linear order (since this order is identical for each anaphoric
element/antecedent pair). Moreover, the distribution of coindexed NPs differs according to NP-type. Coindexed names do not occur in the same contexts as coindexed pronouns (compare (4) and (2)), and coindexed pronouns do not occur in the same contexts as bound anaphors (compare (5) and (3)). Within GB the limited distribution of coindexed NPs is accounted for by three general principles—one for each NP-type mentioned above. These principles, commonly referred to as Principles A, B, and C, are given schematically in (7). (7)
A: an anaphor must be bound in domain D.
B: a pronominal must be free in domain D’.
C: a name must be free.1
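Schematically, and abstracting away from the domain definitions that occupy the rest of the paper, the three principles sort NP-types as follows (our sketch; the boolean flags are invented stand-ins for the real structural computations):

    # Schematic statement of Principles A-C in (7); domains D and D' are inputs.

    def binding_ok(np_type, bound_in_D, bound_in_D_prime, bound_anywhere):
        if np_type == "anaphor":      # Principle A: bound in domain D
            return bound_in_D
        if np_type == "pronominal":   # Principle B: free in domain D'
            return not bound_in_D_prime
        if np_type == "name":         # Principle C: free everywhere
            return not bound_anywhere
        raise ValueError(np_type)

    # (3)  They_i often annoy each other_i     -- anaphor bound in D
    print(binding_ok("anaphor", True, False, True))     # True: wellformed
    # (5)  *John_i likes him_i                 -- pronoun bound in D'
    print(binding_ok("pronominal", True, True, True))   # False
    # (4)  *Mary_i knows that Mary_i is clever -- name bound
    print(binding_ok("name", False, False, True))       # False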
These principles constitute a theory of binding. They are interpreted as conditions on syntactic representations at some particular level or levels in a derivation, rather than as conditions on the application of the indexing rule (i.e. on the indexing derivation itself). Under this interpretation the principles in (7) define wellformedness of binding representations.2 An explication of the binding theory as instantiated in Principles A–C must address the following fundamental issues concerning the formulation and application of these principles.
(8)
i. What is the proper definition of ‘bound’ for Principle A?
ii. Is the relation ‘free’ for Principles B and C defined as ‘not bound’?
iii. How is domain D characterized for Principle A?
iv. Is domain D’ of Principle B identical to the domain D of Principle A; and if not, how is it different?
v. To what extent does the binding theory account for the distribution and typology of empty categories?
vi. At what level(s) of representation do the various binding principles hold?
The following discussion is meant to be illustrative rather than definitive. It is an attempt to elucidate current work on the binding theory from the perspective of the fundamental issues raised. The remainder of this paper is organized as follows. Questions (i) and (ii), which deal with the structural relation between anaphoric elements and their antecedents, are discussed in section 1 in terms of a single hierarchical relation, c-command. Section 2 covers the analysis of domain D of Principle A. The proper characterization of domain D′ of Principle B as differentiated from domain D of Principle A is the focus of section 3. Section 4 deals with the application of binding theory to empty categories and the related issue of the typology of empty categories. Section 5 contains a preliminary discussion of question (vi). The determination of what levels of representation the various binding principles apply to turns out to be a complicated matter. The final section provides a brief summary of the major points in the preceding sections.
1 On the hierarchical relation between anaphoric elements and their antecedents
In GB the relation ‘bound’ is defined in terms of the structural relation ‘c-command’. An NP is bound if it is coindexed with a c-commanding NP, where c-command is defined as in (9).
(9) A node α c-commands a node β if neither node dominates the other and the first branching node which dominates α also dominates β (cf. Reinhart 1976, 1981).3
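Definition (9) translates directly into a small tree procedure. The node class and the example tree below are invented for illustration; only the definition itself is from the text.

    # c-command per (9): neither node dominates the other, and the first
    # branching node dominating alpha also dominates beta.

    class N:
        def __init__(self, label, *children):
            self.label, self.children, self.parent = label, list(children), None
            for c in self.children:
                c.parent = self

    def dominates(a, b):
        return any(c is b or dominates(c, b) for c in a.children)

    def c_commands(a, b):
        if dominates(a, b) or dominates(b, a):
            return False
        node = a.parent
        while node is not None and len(node.children) < 2:   # first branching node
            node = node.parent
        return node is not None and dominates(node, b)

    # (2) Mary_i knows that she_i is clever: the matrix subject c-commands
    # the embedded pronoun, but not conversely.
    she, mary = N("NP-she"), N("NP-Mary")
    s = N("S", mary, N("VP", N("V-knows"), N("S'", N("S", she, N("VP-is-clever")))))
    print(c_commands(mary, she))   # True
    print(c_commands(she, mary))   # False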
This relation holds between the coindexed NPs in (2–6), but crucially not in (1). By definition, the two instances of the name John are not bound; hence they are free even though they are coindexed. This definition of ‘free’ as ‘not bound’ also holds with respect to Principle C in the following structures, where c-command domains of the coindexed names are indicated by brackets. (10)
a. [John’si mother] thinks that [Johni is unhappy]
b. [Mary’si brother] admires [Mary’si boyfriend]
c. Bill returned [Sam’si bicycle] to [Sam’si house]
For Principle C, the definition of ‘free’ as ‘not bound’ appears to be sufficient. For Principle A, the definition of ‘bound’ based on c-command holds for the basic cases. However there are certain constructions in which an anaphor occurs in a position which is not c-commanded by its antecedent. (11)
a. pictures of himselfi [embarrass Johni]
b. Johni is embarrassed by pictures of himselfi
In contrast to (11b), the anaphor in (11a) is not c-commanded by its antecedent. There are several reasons for not modifying the definition of ‘bound’ (or alternatively ‘c-command’) to accommodate (11a) under Principle A. The definition of ‘bound’ as it stands correctly predicts that if the anaphor in (11a) is replaced by a name, then coindexing is allowed since neither coindexed NP c-commands the other. (12)
[pictures of Johni] [embarrass Johni]
If the definition of ‘bound’ is changed for anaphors, then either (12) should be illformed on a par with (4)—which is false, or we must abandon the common thread between Principles A-C, the structural relation of c-command. Secondly, if object NP can bind NPs in embedded subjects, then there is no way to rule out binding of subjects as in (13) without ad hoc complications of the structural definitions involved.
(13)
*Each otheri annoyed the meni
While it could be argued that (13) is ruled out by Principle C since the men is not free, an analysis where an object NP may bind a subject NP would predict incorrectly that (14) is also illformed by Principle C.4 (14)
The meni annoyed each otheri
Next, while the so-called ‘picture NP reflexive’ cases (e.g., (11)) are acceptable, an anaphor in the subject position of an NP whose antecedent is the object of the matrix verb seems significantly less acceptable. (15)
*Each other’si pictures annoyed the womeni
Therefore it is reasonable to consider cases like (11a) as outside the range of core phenomena. Additional evidence for this conclusion comes from the fact that “picture NP reflexives,” unlike other reflexives, can have split antecedents as in (16a)—in contrast to (16b). (16)
a.
i. John talked to Mary about pictures of themselves.
ii. John gave Mary pictures of themselves.
b. *John talked to Mary about themselves.
Note further that this property is limited to reflexives and therefore should not be considered as a general property of lexical bound anaphors, as illustrated in (17). (17)
a. *John talked to Mary about pictures of each other.
b. *John gave Mary pictures of each other.
(See Bouchard 1982 and Lebeaux to appear for further discussion.) Another construction in which an anaphor does not occur in a c-command relation with its antecedent is given in (18). (18)
John talked to Maryi about herselfi
If the anaphor is replaced with a coindexed copy of the name Mary, the result is illformed (excluding emphatic stress on the second instance of the name).
(19)
*John talked to Maryi about Maryi
According to the simplest version of the binding theory, the object of to should c-command the object of about. This is also necessary if Principle B is to account for the illformedness of (20). (20)
*John talked to Maryi about heri
In short, the binding theory makes the correct predictions for all NP-types in this construction if the object of to c-commands the object of about. This c-command analysis could be realized in at least two ways. If the verb talked and the preposition to are reanalyzed as a V, as in (21), then c-command will hold between the relevant NPs in the appropriate manner.5 (21)
[VP [V [V talked] to] NP [PP about NP]]
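In the same toy terms as the sketch in section 1 (helpers repeated here so the snippet runs on its own; the bracketings are our guesses at the intended structures), reanalysis changes the first branching node above the object of to:

    # Effect of the reanalysis in (21) on c-command; invented encoding.

    class N:
        def __init__(self, label, *children):
            self.label, self.children, self.parent = label, list(children), None
            for c in self.children:
                c.parent = self

    def dominates(a, b):
        return any(c is b or dominates(c, b) for c in a.children)

    def c_commands(a, b):
        if dominates(a, b) or dominates(b, a):
            return False
        node = a.parent
        while node is not None and len(node.children) < 2:
            node = node.parent
        return node is not None and dominates(node, b)

    # Without reanalysis: [VP [V talked] [PP to NP1] [PP about NP2]]
    np1, np2 = N("NP1"), N("NP2")
    N("VP", N("V-talked"), N("PP", N("P-to"), np1), N("PP", N("P-about"), np2))
    print(c_commands(np1, np2))    # False: PP is the first branching node above NP1

    # With reanalysis (21): [VP [V [V talked] to] NP1 [PP about NP2]]
    np1r, np2r = N("NP1"), N("NP2")
    N("VP", N("V", N("V-talked"), N("P-to")), np1r, N("PP", N("P-about"), np2r))
    print(c_commands(np1r, np2r))  # True: the first branching node is now VP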
Alternatively, to might be analyzed as a case-marker adjoined to NP (cf. of in nominals like the destruction of the city) rather than a prepositional head which projects its own phrasal category distinct from NP and thereby blocks c-command between the two NPs. Clearly some special analysis of (18–20) is required in order to maintain the generalization captured by the binding theory as it applies to these constructions.6 For the remainder of the discussion the c-command formulation of ‘bound’ will be assumed as the core notion.
2 Domain D of principle A
As illustrated in (6), the condition that an anaphor must be c-commanded by its antecedent is not sufficient to determine the distribution of anaphors. Therefore the basic c-command requirement must be supplemented with a domain statement which specifies the subdomain(s) of a sentence in which an anaphor cannot be free (even when it is bound in the sentence). We say that this subdomain is opaque to binding with respect to an antecedent outside the subdomain. Ungrammaticality due to binding across an opaque domain will be referred to as an opacity effect. The following paradigms give the basic cases to be accounted for. (22)
simple sentences a.
anaphor in subject position: i.
b.
*Each otheri left.
anaphor in object position:
(23)
(23)
i.
Wei admire each otheri
ii.
*Johni admires each otheri
190
complex sentences a.
b.
anaphor in subject position: i.
*Wei expect [(that) each otheri will win]
ii.
Wei expect [each otheri to win]
anaphor in object position: i.
*Wei expect [(that) Mary will admire each otheri]
ii.
*Wei expect [Mary to admire each otheri]
The paradigm shows that an anaphor must be bound in the minimal finite S containing it. This accounts for the illformed examples in (22) as well as in (23.a.i) and (23.b.i), but does not account for (23.b.ii) where the anaphor is bound in the minimal finite S containing it (=the matrix S). Therefore it is necessary to specify a second opaque domain for anaphors—namely the domain a subject. This is independently motivated for binding into NP, as will be discussed below. The specification of two domains for Principle A (i.e. the domains of subject and finite S) creates an apparent problem with respect to overlap. Both domains pick out (22.b.ii) and (23.b.i). This overlap is problematic if, as in previous work (cf. Chomsky 1973, 1976), the two domains are taken as defining two distinct conditions on the distribution of anaphors (e.g. the Tensed S Condition (TSC) and the Specified Subject Condition (SSC)) and violations of more than one condition are assumed to increase ungrammatically. This second assumption is not borne out for the anaphora paradigms above since (22.b.ii) and (23.b.i) are not significantly more ungrammatical than other illformed examples in the paradigm. Nonetheless, the overlapping specifications have been considered a defect in the formulation of conditions on the distribution of anaphors (see Chomsky 1977b, 1980, and 1981). Two solutions for this problem have been proposed. Chomsky 1980 sought to characterize two non-overlapping conditions, the SSC plus the Nominative Island Condition (NIC), where the domain of tense is dropped in favor of a prohibition against anaphors bearing nominative case. The NIC proposal has the empirical advantage of accounting for the grammaticality of (24), in contrast to the TSC and its variants (e.g. the Prepositional Island Condition (PIC) of Chomsky 1977). (24)
In (24) the anaphors are free in the domain of tense (in the embedded S) but are not in the nominative case. Another advantage of the NIC is that it accounts for opacity effects in infinitival complements in Portuguese (see Rouveret 1980) and Turkish (see George and Kornfilt 1981) where infinitival subjects may be marked for agreement with the
complement verb. Such subjects are marked in the nominative case. As predicted by the NIC, anaphor subjects are prohibited in these constructions.7 Nonetheless, there are several problems with the NIC proposal. While examples like (24) are acceptable in English, they tend to be illformed in other languages. In Dutch, for example, comparable examples with reflexives are ungrammatical (see Lasnik and Freidin 1981 for further discussion). Secondly, the NIC does not account for opacity effects in Turkish gerunds where the subject is marked genitive when the subject agrees with the verbal form, and genitive anaphor subjects are prohibited as illustrated in (25).8 (25)
*yazar-lari [birbir-lerini-in viski-yi iç-tik-lerin]-i san-iyor-lar
author-PL each other-their-GEN whisky-ACC drink-GER-3PL-AGG believe-PRES-3PL
“The authors believe that each other drank the whisky.”
This example suggests that it is not simply a prohibition against anaphors with certain morphological properties which accounts for opacity in these examples, but rather that opacity is determined in terms of syntactic domains. The NIC does not define a syntactic domain, and therefore is not an island condition in the usual sense. For example, although the bracketed NP in (24) is marked nominative, it is not an opaque domain for binding. Icelandic presents yet another empirical problem for the NIC analysis. In Icelandic, finite indicative clauses are opaque to binding, whereas infinitival clauses are not—even when binding occurs across a subject as in (26).9 (26)
hanni telur [migj hafa séð sigi]
‘he believes me to have seen himself’
Icelandic does not appear to observe the SSC. If it does not, then the opacity effects for non-subjects in finite clauses cannot be due to the SSC (see Harbert 1982b for further discussion). For Icelandic the notion “opaque domain” for binding must be defined in terms of some characterization of “finite clause”. If we reject the NIC, then the opacity effects for subjects of infinitivals in Portuguese and Turkish should also fall under our characterization of “finite clause”. (Note that this characterization does not use the term “finite” in its time reference sense.) Since the agreement element (henceforth AGR) occurs in all the relevant examples under discussion, “finite clause” can be characterized as the domain of AGR—assuming that AGR is a grammatical formative in INFL (=inflection) which is an immediate constituent of S and hence c-commands both VP and the syntactic subject in S. This analysis accounts for (25), which is not covered by the NIC. Because for English the SSC is needed independently of a finite clause condition on binding (see (23.b.ii)), the problem of overlapping domains arises again since the domains of subject and AGR overlap for non-subject NPs. A second solution to this problem is to give a single specification for domain D of Principle A so that the domains of subject and AGR follow from this specification. Initially this was accomplished in the GB framework by attempting to integrate the binding theory with notions of case as follows. Considering
the core cases (22–23), we note that an anaphor must be bound in the minimal S-domain in which it receives case.10 Under this unitary characterization (23.a.i) and (23.b) are illformed because the anaphors in them are assigned case in the embedded S-domain and are free in that domain. In contrast, the anaphor in (23.a.ii) is assigned case by the matrix verb and is bound in the matrix S, which is the minimal S-domain in which the anaphor receives case. In GB theory this analysis is formalized in terms of the notion of government, since case is assigned to NPs under government by a case-assigning element (e.g. V or P). Thus domain D of Principle A is designated as the governing category of the anaphor, where governing category is defined as the minimal S or NP containing the anaphor and a governor of the anaphor. In S, the governors will be V, A, P, and AGR in INFL—in effect, the heads of maximal phrasal projections in (see Freidin 1983) with the additional stipulation that INFL qualifies only when it contains AGR.11 While this characterization accounts for the S-paradigm as given in (22–23), it does not generalize to the paradigm for binding in NP given in (27–28). (27)
simple sentences a.
subject position: i.
b.
Theyi read [NP each other’si books]
object position: i.
Theyi read [NP books about each otheri]
ii.
*Theyi read [NP John’s books about each otheri]
iii.
John read [NP theiri books about each otheri]
(28) complex sentences a. subject position: i.
Theyi expect [ that [S [NP each other’si books] will be favorably reviewed]]
ii. Theyi expect [S [NP each other’si books] to be favorably reviewed] iii. *Theyi expect [ that [S John will favorably review [NP each other’si books]]] iv. *Theyi expect [S John to favorably review [NP each other’si books]] b. object position: i.
Theyi expect [ that [ [S [NP books about each otheri] will be well received]]
ii. Theyi expect [S [NP books about each otheri] to be well received] iii. *Theyi expect [ that [S [NP John’s books about each otheri] will be well received]] iv. *Theyi expect [S [NP John’s books about each otheri] to be well received] v.
*Theyi expect [ that [S John will read [NP books about each otheri.]]]
vi. *Theyi expect [S John to read [NP books about each otheri]]
Fundamental issues in the theory of binding
193
The paradigm illustrates that the subject position in NP is always transparent to binding by a c-commanding NP ((27.a) and (28.a.i–ii)) unless a c-commanding subject intervenes ((27.b.ii) and (28.a.iii–iv)). Thus the paradigm for NP diverges from that of S in two ways: 1) the subject of NP is always transparent, in contrast to the subject of S, which is transparent only when AGR is absent; and 2) the object position in NP is transparent when there is no subject ([NP, NP]), whereas the object position is almost always opaque to binding from outside S (but cf. (40) below). The theory in which domain D of Principle A is specified as the governing category of the anaphor (as defined above) can only account for opacity effects on binding across subjects of NP. The transparency of NP with respect to binding cannot be accounted for by a mechanism similar to (as in the S-paradigm—see note 11) since there is no exceptional case marking into NP to motivate such a proposal. For NP, the relevant factor which determines its transparency for binding is the presence or absence of a syntactic subject. The SSC is the only relevant opacity condition for NP. Even a c-commanding AGR does not induce opacity in NP, as illustrated in (28.a.i) and (28.b.i). With respect to binding then, there is a basic asymmetry between S and NP. Therefore the governing category analysis of the S-paradigm, which collapses the SSC and the relevant variant of the TSC under a single specification of opaque domain, does not generalize to NP.12 Given that the SSC must be included in the domain statement for Principle A, we once again confront the problem of potential overlap with the other part of the domain statement which accounts for opacity of subjects in the S-paradigm. As discussed above, Turkish and Portuguese suggest that AGR rather than tense is the relevant factor for opacity. Since AGR is always linked to a syntactic subject, there is some motivation for considering the agreement element as a kind of (shadow) subject. Using this observation, we can unite AGR with syntactic subjects (i.e. [NP, S] and [NP, NP]) under the designation ‘SUBJECT’ (read “capital subject”)—see Chomsky 1981:209ff. Under this analysis the domain statement for Principle A reduces to “domain of SUBJECT”.13 This formulation of Principle A accounts for both the S- and NP-paradigms, with the exceptions of (28.a.i) and (28.b.i). In these latter cases the anaphor is free in the domain of a SUBJECT (= AGR of the sentential complement). This analysis is corrected by modifying the definition of opaque domain so that an AGR element does not function as a SUBJECT for anaphors contained in an NP which is coindexed with it. Note that in the cases where AGR functions as a SUBJECT with respect to an anaphor (i.e. the NIC cases), the anaphor is coindexed with the AGR element. Coindexing of the anaphor in (28.a.i) with AGR in the embedded S gives the representation (29). (29) They expect [ that [S [NPi each other’si books] AGRi will be favorably reviewed]]
This yields an indexed representation of the sentential complement subject where the referential index of the NP is also carried by a subpart of the NP itself. The interpretation of the coindexing in (29) would be that each other is coreferential with the NP each other’s books—which is of course incorrect. As proposed in Chomsky 1981, this indexing can be excluded by a general prohibition against NPs of the form (30), where i is a referential index.
Generative grammar (30)
194
*[NPi…NPi…]
(30) is empirically motivated by the impossibility of coindexing a pronoun contained in a NP with the NP itself, as in (31).14 (31)
*[NPi the owner of hisi boat] goes sailing on weekends.
If indexing applies freely as assumed, then some constraint is needed to rule out (31), since it does not fall under Principle B. (See Chomsky 1981:212f. for further discussion). Following Chomsky 1981, (30) can be incorporated into the formulation of the domain statement for Principle A by modifying the specification “SUBJECT”. Since AGR in (28.a.i) and (28.b.i) should not count as a SUBJECT for anaphor binding, it can be considered ‘inaccessible’ to the anaphors by virtue of (30). Thus the domain statement of Principle A is further modified to (32). (32) Principle A: an anaphor must be bound in the domain of an accessible SUBJECT.
Accessibility is defined in terms of (30) and “in the domain of” requires that the SUBJECT c-command the anaphor.15 The grammaticality of (28.a.i) and (28.b.i) constitutes evidence for this formulation of Principle A with respect to AGR. A similar argument holds for the lexical subjects of infinitival equatives.16 (33)
Mary believes those to be pictures of herself.
(33)
has the indexed representation (34).
(34)
Maryi believes [S thosej to be [NPi pictures of herselfi]]
The infinitival subject must be coindexed with the complement of equative be and hence is not an accessible SUBJECT for the anaphor, given (30). There is a certain naturalness in formulating the domain statement of Principle A in terms of accessible SUBJECT. Since anaphors must get their reference from an antecedent in the sentence in which they occur, it is appropriate to formulate the domain in which they must be bound in terms of a potential antecedent (i.e., syntactic subject or the agreement element which entails the existence of a subject). The notion ‘subject’ is crucial here because the simpler structural notion of c-commanding NP is not sufficient. Thus in (35) John c-commands each other but does not block proper binding of the anaphor by the subject we.17 (35) Wei told John lies about each otheri
Fundamental issues in the theory of binding
195
Why “subject” rather than “c-commanding NP” is relevant for binding remains to be explained. As noted by Wayne Harbert, this question is related to the question of why in many languages (e.g., Romance and Germanic (excepting English and Icelandic), and Japanese) the only possible antecedents for anaphors are subjects. As formulated in (32), Principle A still does not cover all the core cases. Consider (36) for example. (36)
*[NPi pictures of himself] AGRi are always on display.
AGR cannot be considered an accessible SUBJECT for the anaphor himself because coindexing of the anaphor with AGR would yield a violation of (30). Thus the anaphor does not occur in the domain of an accessible subject; and therefore (36) is not excluded by Principle A as formulated above. In Chomsky 1981 this problem is resolved by stipulating that an anaphor must be bound in root S. This stipulation is in addition to the domain statement in (32), and in essence constitutes a default condition with respect to (32). The default condition is unnecessary if we assume that, by definition, an anaphor must be bound to an antecedent (see Freidin 1978:fn.7). Under this assumption, Principle A is a condition on proper binding, where it is given that the anaphor is in fact antecedentbound. We then reformulate (32) as (37): (37)
Principle A: an anaphor cannot be free in the domain of an accessible SUBJECT.
(37) defines the notion ‘opaque domain’ for anaphor binding. Where an anaphor occurs unbound in a sentence (e.g. (22.a.i), (22.b.ii), and (36)), the sentence is illformed because it does not satisfy the basic requirement for anaphors—that they be bound to some antecedent. The requirement is needed independently of (36) to exclude (22.a.i). In the latter example, the anaphor is bound to an accessible SUBJECT—i.e., AGR. The example is illformed nonetheless because AGR is not a possible antecedent since it has no intrinsic reference. This suggests that ‘bound’ should be interpreted as ‘antecedentbound’ rather than ‘coindexed with a c-commanding element’. So far we have been assuming that the definition of SUBJECT covers all syntactic subjects. This holds for ‘referential’ lexical subjects and their empty category counterparts (i.e., trace), and in addition PRO. (38)
a.
NP trace: *Johni seems to usj [S ei to like each otherj]
b.
WH trace: * Which boyi do theyj believe [S ei to like each otherj]
c.
PRO: *Wei persuaded Johnj [ PROj to like each otheri]
In (38a–b) the anaphor is free in the domain of a trace subject and in (38c) it is free in the domain of a PRO subject. In comparison with referential NP, however, nonreferential (or
Generative grammar
196
pleonastic) it does not appear to have a strong opacity inducing effect, as discussed in Freidin and Harbert 1983. For example, (39b) is relatively wellformed in comparison to (39a), which is ungr ammatical. (39)
a. b.
*Theyi expect [S Johni to seem to each otheri [S ej to be crazy]] Theyi expect [S it to seem to each otheri [ that John is crazy]]
As illustrated in (40), wellformedness in these constructions is not significantly affected by the addition of AGR to the complement of expect. (40) a. *Theyi expected [ that Johnj AGR would seem to each otheri [S ej to be crazy]] b. Theyi expected [ that it AGR would seem to each otheri [ that John is crazy]]
(40b) is on a par with (39b)—i.e., it is relatively wellformed in comparison with the ungrammatical (40a). Given this statement of the facts,18 a distinction between “referential” and “nonreferential” subjects is necessary for the formulation of Principle A. Such a distinction can be given in terms of thematic relations (or θ-roles). Referential subjects bear θ-roles and thus constitute θ-subjects, in contrast to non-θ-subjects like nonreferential it. Only θ-subjects qualify as accessible subjects. This account can be extended to the NIC effect if the θ-status of the syntactic subject is somehow transferred to AGR.19 Thus where AGR is linked to a θ-subject, the INFL projection (=S, etc.) constitutes the domain of a θ-SUBJECT. Where S contains a syntactic θ-subject and no AGR, the domain of the θ-SUBJECT is the c-command domain of the syntactic subject. This analysis has a certain naturalness in that non-θ-subjects are not possible antecedents for anaphors since they are inherently nonreferential. (For further details see Freidin and Harbert 1983). To summarize, lexical anaphors are subject to two conditions: (41)
a. an anaphor must be bound to an antecedent. b. a bound anaphor may not be free in the domain of an accessible θ-SUBJECT.
(41a) is part of the definition of “anaphor”; whereas (41b) constitutes a condition on proper binding which we call Principle A of the binding theory. 3 Domain D’ of principle B As formulated in (7) above, Principle B requires that a pronoun be antecedent-free in a domain D’. The interpretation of “free” here is not, strictly speaking, the converse of the interpretation of “bound” as required for anaphors. Binding for anaphors entails coreference; whereas freedom for pronouns entails the stronger requirement of disjoint
Fundamental issues in the theory of binding
197
reference—see Lasnik 1976 and Higginbotham 1980a:fn.1).20 In the following discussion the examples cited do not distinguish between disjoint reference and noncoreference. The S-paradigm for bound pronouns shows a systematic correspondence with the analogous paradigm for anaphors. (42)
simple sentences *Johni admires himi
a. (43)
complex sentences a.
pronoun in subject position:
b.
i.
Johni expects [ that hei will win]
ii.
*Johni expects [S himi to win]
pronoun in object position: i.
Johni expects [ that Mary will admire himi]
ii.
Johni expects [S Mary to admire himi]
The correspondence between (42–43) and the analogous paradigm for anaphors (22–23) is that bound pronouns cannot occur in positions where anaphors are properly bound, and conversely, anaphors cannot be properly bound in positions where bound pronouns may occur. This correspondence motivated earlier analyses (e.g., Chomsky 1981) in which the domain statements of Principles A and B were taken to be identical. Although this analysis holds as a first approximation, it fails for the larger class of cases. Thus with the NP-paradigm for bound pronouns (44–45) the generalization fails almost in its entirety.

(44) simple sentences
     a. pronoun in subject position:
        i. Johni read [NP hisi book]
     b. pronoun in object position:21
        i. Johni doesn’t read [NP books about himi]
        ii. Johni doesn’t read [NP Mary’s books about himi]
        iii. *Mary doesn’t read [NP John’s books about himi]

(45) complex sentences
     a. pronoun in subject position:
        i. Johni expects [S̄ that [S [NP hisi book] will be favorably reviewed]]
        ii. Johni expects [S [NP hisi book] to be favorably reviewed]
     b. pronoun in object position:
        i. Johni expects [S̄ that [S [NP books about himi] AGR will be well received]]
        ii. Johni expects [S [NP books about himi] to be well received]
        iii. Johni expects [S [NP Mary’s book about himi] to be well received]
        iv. Johni expects [S̄ that [S [NP Mary’s book about himi] AGR will be well received]]
        v. Johni expects [S̄ that [S Mary AGR will read [NP books about himi]]]
        vi. Johni expects [S Mary to read [NP books about himi]]
With the exception of (44.b.iii) where a pronoun in NP must be free with respect to the subject of that NP (cf. the corresponding case for anaphors (27.b.iii)), NP does not have the same opacity effects for pronouns as it does for anaphors. NP is an opaque domain with respect to disjoint reference whether or not the NP contains a syntactic subject.22 This suggests that the notion ‘accessible SUBJECT’ is not the correct notion for formulating the domain statement of Principle B. Since a pronoun—in contrast to an anaphor—does not require that an antecedent be present in S, the notion ‘accessible subject’ is unmotivated with respect to disjoint reference (see Huang 1983 for further discussion). Nonetheless, formulating Principle B in terms of accessible SUBJECT does account for the S-paradigm. An alternative formulation must therefore account for the opacity of finite S and the c-command domain of infinitival subject as well as the opacity of NP. The opacity effects for NP with respect to disjoint reference follow without further stipulation if domain D’ for Principle B is formulated in terms of ‘governing category’, defined in the literature as follows.

(46) α is a governing category for β where α is the minimal maximal phrasal projection (i.e., of a lexical head) containing β.
Reference to a governor is superfluous since reference to a maximal phrasal projection entails the existence of a lexical head which governs the phrasal constituents of its projection. Unfortunately there are two defects with this simple definition: 1) it fails to account for the S-paradigm; and 2) it incorrectly predicts that AP, PP, and VP will also be opaque with respect to disjoint reference, as indicated in (42) for VP and (47) for AP.23

(47) *Johni is [AP proud of himi]
Huang 1983 proposes that the domain statement of Principle B be formulated in terms of government and the presence of a SUBJECT. Thus Principle B states that “a pronoun must be free in its governing category,” where governing category is redefined as in (48) (cf. Huang 1983: (14)).

(48) α is a governing category for β iff α is the minimal category containing β, a governor of β, and a SUBJECT.
Huang assumes that the lexical head of NP is a SUBJECT of NP (analogous to AGR which is considered the nominal head of the INFL projection (see Chomsky 1981)). Therefore NP is always a governing category for pronouns since the head N is both governor and SUBJECT with respect to pronominal NP constituents of the NP. This proposal also encounters problems with the S-paradigm. In (49) VP is the minimal category (=maximal phrasal projection of a lexical head) containing a governor of the pronoun her (=V) and a SUBJECT (=her).

(49) *Maryi [VP believes [S heri to be clever]]
Since, as Huang notes, the accessibility of SUBJECT to the pronoun is not conceptually relevant to pronoun binding, VP will fit the definition of governing category for the pronoun. To define the matrix S as the governing category for (49), we could require that the SUBJECT be distinct from the pronoun. This comes dangerously close to reintroducing the notion of accessibility into the definition of governing category for pronouns. For (50) however we must do just that if (48) is to be maintained in some form.

(50) *Maryi [VP seems [PP to heri] [S ei to be unhappy]]
VP is again the minimal maximal projection containing the pronoun her, a governor of the pronoun (=the preposition to), and a SUBJECT (=ei). Requiring that the SUBJECT c-command the pronoun reinstates the notion of accessibility. The problem with formulating domain D’ for Principle B is to characterize what is common to NP and S—excluding VP and AP—and in addition to allow for the transparency of subjects of infinitivals without resorting to the notion of accessible SUBJECT. Only in this way can we avoid having to stipulate NP and S as binding categories for pronouns. One common feature of S and NP—in addition to allowing syntactic subjects—is that both are domains in which θ-roles are assigned by predicates (i.e., verbs, predicate adjectives, and certain nominals), as illustrated by the sentence/nominal pair in (51).

(51) a. The mayor criticized the city council.
     b. The mayor’s criticism of the city council.
Since V and A may assign a θ-role to an NP outside of their maximal projections (VP and AP)—e.g., to a subject of S—these maximal projections do not count as “θ-domains.”

(52) The θ-domain of α, a predicate (=V, A, or N), is the minimal domain in which α assigns its θ-roles to arguments.24

A θ-domain constitutes a complete functional construct. Note that this is a type (as opposed to a token) distinction since S functions as a θ-domain even when the sentential subject is a non-θ-position, as in (50).
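To make the notion in (52) concrete, the following sketch computes a θ-domain as the smallest constituent containing a predicate together with every position it θ-marks. It is a minimal illustration in Python over hypothetical toy structures—not part of the original formulation—and it simply reproduces the observation above: a verb that θ-marks the subject of S has S, not VP, as its θ-domain.

    # Sketch (not from the original text): constituents are nested tuples
    # (label, child, ...); leaves are plain strings. Positions are paths,
    # i.e., tuples of child indices from the root.

    def constituents(tree, path=()):
        """Yield (path, subtree) for every constituent in the tree."""
        yield path, tree
        if isinstance(tree, tuple):
            for i, child in enumerate(tree[1:], start=1):
                yield from constituents(child, path + (i,))

    def dominates(path, other):
        """A constituent dominates every position extending its path."""
        return other[:len(path)] == path

    def theta_domain(tree, predicate, arguments):
        """Smallest constituent containing the predicate position and all
        the positions it theta-marks (cf. (52))."""
        best = None
        for path, sub in constituents(tree):
            if isinstance(sub, tuple) and all(
                    dominates(path, p) for p in [predicate] + arguments):
                if best is None or len(path) > len(best[0]):
                    best = (path, sub)
        return best[1][0]  # label of the minimal dominating constituent

    # (51a): [S [NP the mayor] [VP criticized [NP the city council]]]
    s = ('S', ('NP', 'the mayor'),
              ('VP', 'criticized', ('NP', 'the city council')))
    # 'criticized' theta-marks its object (2,2) and the subject of S (1,),
    # so its theta-domain is S rather than VP:
    print(theta_domain(s, (2, 1), [(1,), (2, 2)]))   # -> S
    # A predicate that theta-marked only its object would have VP:
    print(theta_domain(s, (2, 1), [(2, 2)]))         # -> VP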
In order to account for the transparency of infinitival subjects in the S-paradigm for pronouns, it is necessary to incorporate the notions of government and θ-domain in the formulation of the domain statement for Principle B.

(53) Principle B: a pronoun must be free in the θ-domain of its governor.
In (49), the binding domain for the pronoun her will be the θ-domain of believes—i.e., the matrix S. In (50), the preposition to which governs the pronoun her does not assign the pronoun a θ-role independently of the matrix verb seems. Therefore the θ-domain of the preposition will be identical to that of the verb—i.e., the matrix S. In the case of AGR, however, it is unnatural to talk about its θ-domain since AGR does not assign a θ-role to the subject it is coindexed with. This problem can be avoided if we mention instead the θ-domain of the predicate which is constructed with AGR. Thus the domain statement of Principle B might be more appropriately formulated as “the minimal θ-domain containing a governor of the pronoun.” It is worth noting in conclusion that the notions “θ-domain” and “domain of subject” are conceptually connected in that a subject marks the periphery of a θ-domain.

4 Binding theory and empty categories

In GB it is assumed that the binding theory applies to empty categories as well as to lexical NP-types, and therefore provides a typology of empty categories similar to that of lexical NPs (see Chomsky 1981, 1982a). A critical examination of this assumption follows. To a large extent this view is based on the observation of earlier analyses (e.g., Chomsky 1976) that the opacity effects for lexical anaphors in the S-paradigm extend to NP-trace and PRO.25 It was assumed that NP-trace and bound PRO could be considered as empty category analogues to lexical anaphors and that their distribution would then follow from the same principles that determined the distribution of lexical anaphors (i.e. the SSC and TSC of the earliest accounts and their descendents). This analysis is not supported by the opacity effects in the NP-paradigm since neither NP-trace nor PRO may occur as arguments of nominals.

(54) a. *Johni read [NP PROi books]
     b. *Johni was reviewed [NP ei book]
(54a) cannot be interpreted as ‘John read his own books’ and (54b) is not a wellformed variant of John’s book was reviewed. (54b) is independently excluded by the Case Filter (Chomsky 1980, 1981) since the NP containing the lexical item book is not marked for case by the passive predicate reviewed. To exclude PRO from a variety of positions (e.g., in (54a)), Chomsky (1981) analyzes PRO as a pronominal anaphor and therefore subject to both Principle A and Principle B. In this analysis the binding domains for both principles are stated in terms of governing category, where specification of a governor is crucial. Thus PRO as a pronominal anaphor must be both bound and free in its governing category if it has one. The conclusion then is that PRO can only occur in ungoverned positions—i.e., subject of gerunds and infinitivals where S̄-deletion does not apply (cf. Aoun and Sportiche 1983). Given that the binding domains for Principles A and B are distinct with respect to the NP-paradigm as discussed above, it no longer follows as a theorem of the binding theory that PRO must be ungoverned. Specifically, the nonoccurrence of PRO in NP is not explained by the binding theory.26 Furthermore, given the formulations of Principles A and B above, the binding theory now accounts for the relevant cases of PRO binding, given in (55), as follows.

(55) a. *Johni saw PROi
     b. *Johni expects [S̄ that PROi AGRi will win]
     c. *Johni expects [S Mary to like PROi]
     d. Johni expects [S̄ PROi to like Mary]
(55a) is excluded because PRO-binding violates Principle B (although it satisfies Principle A). (55b) is illformed because PRO is free in the domain of an accessible SUBJECT (=AGRi) and therefore violates Principle A, though it satisfies Principle B. (55c) is illformed for the same reason, where the accessible SUBJECT here is the syntactic subject Mary. This analysis of PRO as a pronominal anaphor excludes the illformed cases, but is only partially relevant for the wellformed (55d). With respect to Principle A, the matrix S is the binding domain for PRO. Thus PRO satisfies Principle A in (55d). Since S̄ is a barrier to government, PRO is ungoverned in (55d) and therefore neither satisfies nor violates Principle B. If PRO were governed (i.e. if S̄-deletion had occurred), then PRO-binding in (55d) would violate Principle B. If a lexical pronoun himi were substituted for PROi in (55d), then to avoid violating the Case Filter the lexical pronoun must be marked for case, hence governed and thus in violation of Principle B. For (55d) then it is irrelevant that PRO is analyzed as a pronominal, in contrast to (55a) where it is crucial.

This leaves the problem of the nonoccurrence of PRO in NP. If we take it as an axiom of the theory that PRO cannot occur in a governed position, then the full distribution of PRO (including non-control PRO, which doesn’t behave like an anaphor in any event) is accounted for. Of course this eliminates the motivation for analyzing control PRO as a pronominal anaphor. Its distribution would be accounted for by the above-mentioned PRO-axiom and the lexical property of particular verbs which allow (or require) control structure complements (e.g., persuade (obligatory control) vs. expect (optional control)—see Jackendoff 1972).27

The distribution of NP-trace patterns exactly like that of lexical anaphors in the S-paradigm—compare (22–23) with (56–57).

(56) simple sentences
     Johni was awarded [NPi e] $1,000.
(57) complex sentences
     a. trace in subject position:
        i. *Johni was expected [S̄ that [S [NPi e] AGRi will win]]
        ii. Johni was expected [S [NPi e] to win]
     b. trace in object position:
        i. *Johni was expected [S̄ that [S Mary would admire [NPi e]]]
        ii. *Johni was expected [S Mary to admire [NPi e]]
As with lexical anaphors, the subject of an infinitival complement is transparent to binding from the matrix clause. In the NP-paradigm, however, NP-trace is generally disallowed, as illustrated by a comparison of illformed examples like (58) and (59) with the corresponding wellformed examples (60) and (61).

(58) *Johni was discovered [NP pictures (of) ei]

(59) *Johni discovered [NP pictures (of) ei]

(60) [NPi pictures of John] were discovered [NPi e]

(61) Johni discovered [NP pictures of himselfi]
The ungrammaticality of (58) is predicted within case theory since the NP pictures of e, being governed by the passive participle, will not be assigned case—in violation of the Case Filter, which requires that a lexical NP be case-marked. The ungrammaticality of (59) follows from the theory of predicate/argument structure, which prohibits a single NP from being assigned more than one argument function in a sentence.28 In (59), John is assigned the argument function of subject of discovered and in addition the argument function of object of the nominal pictures via trace-binding, in violation of the condition on the uniqueness of argument assignment. Because case theory and the theory of predicate/argument structure independently predict the nonoccurrence of NP-trace in NPs, the NP-paradigm for NP-trace provides no motivation one way or the other for considering NP-trace as an anaphor.

Another instance where NP-trace does not pattern like a lexical anaphor involves binding across non-θ-subjects. While it appears that an anaphor may be free in the domain of an accessible non-θ-subject (see (39b) and (40b) above), an NP-trace may not (as noted in Freidin and Harbert 1983).

(62) *Hei was expected [S̄ for [S it to be insulted [NPi e]]]
     cf. It was expected for himi to be insulted [NPi e]
(62) cannot be explained as a violation of the Case Filter or conditions on predicate/argument structure. It could be excluded under Principle A if NP-trace is analyzed as an anaphor with the proviso that nonlexical anaphors are not sensitive to the distinction between θ-subjects and non-θ-subjects.

NP-trace is unlike a lexical anaphor in yet another way. It does not bear a θ-role independently of its binder. While a lexical anaphor receives its reference from its binder, the NP-trace transmits a θ-role to its binder. Reference is not at issue. Suppose that for NP-trace the domain of accessible SUBJECT does not involve the θ/non-θ distinction as indicated in (62). Then the binding domain for anaphors with respect to the S-paradigm could be reformulated in terms of θ-domain since both formulations are equivalent with respect to the S-paradigm. (Recall that the NP-paradigm is not relevant for the formulation of binding domains with respect to NP-trace.) Thus a θ-role can only be transmitted within a θ-domain, just as reference can only be transmitted within a θ-domain (defined in terms of accessible θ-SUBJECT). NP-trace, like a lexical anaphor, may not be free in its binding domain.

The distinction between NP-trace and WH-trace is based on the different properties of these traces in wellformed structures. NP-trace is always caseless, whereas a corresponding WH-trace (i.e. in an NP-position) is always case-marked (see Lasnik and Freidin 1981). There is also a distinction between the binders of the two types of trace. NP-trace is always bound by an NP in a grammatical function (GF) position (e.g. subject), while WH-trace is always bound by a phrase in COMP, a non-grammatical-function position. In the GB framework, a GF position is designated as an A-position, and a COMP position as an Ā-position (where “A” stands for “argument”). NP-trace is properly bound when it is A-bound in its binding domain. WH-trace is properly bound when it is Ā-bound subject to certain locality requirements (e.g., Subjacency—see Chomsky 1977b, 1981). Given that PRO is ungoverned, then it too, like NP-trace, is a caseless empty category. The case-marked/caseless distinction is therefore sufficient to differentiate empty categories that function like anaphors from WH-trace. This functional interpretation of empty categories contrasts with a derivational interpretation in which an NP-trace is a trace resulting from movement between A-positions, as opposed to a WH-trace, which results from moving a WH-phrase to an Ā-position (COMP).

Before exploring the functional analysis of empty categories, let us first determine the status of WH-trace with respect to the binding theory. The binding theory as presented above is a theory of A-binding. NP types are determined on the basis of behavior with respect to Principles A, B, and C as applied to A-binding. To determine the status of WH-trace, we consider its behavior with respect to A-binding. The simplest case is given in (63) where the WH-trace (ei) is A-bound by the pronoun he.

(63) *[S̄ whoi [S did hei see ei]]
The coindexing in (63) expresses the interpretation “which person is such that that person saw himself?”. Because the question who did he see? has no such interpretation, this representation must be excluded. The question cannot be asked with intended coreference between the subject and object of see. To express this intended coreference, the question must be formulated as who saw himself?. Therefore a WH-trace does not behave like a lexical anaphor. The example does not distinguish between the two remaining possibilities for the status of WH-trace—as a name or a pronoun—since both must be A-free in S. The two possibilities can be distinguished in the subject position of a finite sentential complement, where a pronoun may be bound but a name must be free (cf. (43.a.i) vs. (4)). The construction which distinguishes between the binding of pronouns and names shows that WH-trace behaves like a name rather than a pronoun. Consider the following paradigm.
(64) a. i. Johni said [S̄ hei won]
        ii. *Hei said [S̄ Johni won]
     b. i. [S̄ whoi [S ei said [S̄ hei won]]]
        ii. *[S̄ whoi [S did hei say [S̄ ei won]]]
In (64.a.i) and (64.b.i) the pronoun is free in its binding domain, thereby satisfying Principle B. In (64.a.ii) the name John is bound, and therefore this representation violates Principle C. Similarly, intended coreference between the subjects of say and won in (64.b.ii) is not possible, in contrast to (64.b.i). Assuming that the WH-trace has the status of a name, (64.b.ii) is excluded under Principle C as well.29

So far, we have empty categories that behave like anaphors (NP-trace and control PRO) and names (WH-trace) with respect to the binding theory. Whether there are empty categories that behave like pronouns remains to be determined. If there are, then the typology of empty categories would seem to mirror the typology of lexical NPs (with perhaps the exception of pleonastic elements—though see Chomsky 1982a). Evidence for empty categories behaving like lexical pronouns is open to interpretation. Putting aside the question of whether they exist in English (see again Chomsky 1982a), let us consider some evidence from Spanish. In Spanish, empty categories occurring in the subject of finite clauses must be differentiated.30 The empty categories in (65) illustrate the standard paradigm for NP-trace with raising verbs.

(65) a. Ellosi parecen [S ei haber ganado]
        They seem to have won
     b. *Ellosi parecen [S̄ que [S ei han ganado]]
        They seem that have won
In (66) however the empty categories cannot be analyzed as NP-traces (or as PRO, since they are in governed positions).

(66) a. Dejé a Juani [S̄ que [S ei/*j se fuera]]
        I let Juan go
     b. Juani quiere [S̄ que [S e*i/j se vaya]]
        Juan wants *him/you to go
     c. Juani no cree [S̄ que [S ei/j se vaya]]
        Juan doesn’t believe that he/you is/are going
In (66a) the empty category must be coindexed with an antecedent, whereas in (66b) it cannot, and in (66c) coindexing is optional. Thus only in (66c) does the empty category exhibit the same behavior as a lexical pronoun. The obligatory coreference in (66a) and the obligatory noncoreference in (66b) might be ascribed to lexical properties of the matrix verbs dejar and querer, just as verbs of obligatory control in English impose (as a lexical property) coreference between their subject (or object) and the subject of an infinitival complement. In any event, the empty categories in (66) are distinct from NP-trace because they do not transmit a θ-role to their binder. As noted above, they are also different from PRO in that they occur in a governed position. Following standard practice, we will refer to such empty categories as pro (read “small pro”). Under this analysis, empty categories can be distinguished in terms of the properties of their binder and the structural position they occupy.
(67) A. Properties of binders:
        1. in A vs. Ā position:
           a. A position: NP-trace, PRO, pro
           b. Ā position: WH-trace
        2. with vs. without an independent θ-role:
           a. with independent θ-role: PRO, pro
           b. without independent θ-role: NP-trace, WH-trace
     B. Structural properties of the empty category:
        1. governed vs. ungoverned:
           a. governed: NP-trace, WH-trace, pro
           b. ungoverned: PRO
        2. case-marked vs. caseless:
           a. case-marked: WH-trace, pro
           b. caseless: NP-trace, PRO
(67) translates into the following feature matrix.

(68)
              A-position   independent θ-role   governed   case-marked
    WH-trace      −               −                +           +
    NP-trace      +               −                +           −
    PRO           +               +                −           −
    pro           +               +                +           +
Given this feature analysis of empty categories, one question that arises is why there are only four empty category types when sixteen are possible.31 For example, is there an empty category like WH-trace with respect to its binder but ungoverned and caseless? Presumably the answer depends on what empty category types are compatible with the theory of grammar. Thus an empty category which is case-marked must be governed since case-marking occurs only under government. This rules out the possibility of an empty category which is case-marked but ungoverned. In the current GB framework then, some properties take precedence over others. The problem is to determine which properties of empty categories are basic and which can be predicted by the various principles of grammar. It is by no means obvious how this is to be established. Any one feature in (68) could be eliminated and the remaining three would still distinguish between the four empty category types. If two features are eliminated, then the four empty categories could be distinguished only with the two features in (69).32
independent θ-role
case-marked
WH-trace
−
+
NP-trace
−
−
PRO
+
−
pro
+
+
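Before drawing the moral, it may help to verify the combinatorics mechanically. The following sketch (Python, with a hypothetical encoding; not part of the original text) encodes matrix (68) and confirms both claims made above: any three of the four features still distinguish the four empty category types, and among the six possible feature pairs only the pair in (69) does, since each of the other features makes a 3-to-1 split (cf. note 32).

    # Sketch (not from the original text): matrix (68), with + as True.
    from itertools import combinations

    FEATURES = ('A-position', 'independent theta-role', 'governed', 'case-marked')
    MATRIX = {
        'WH-trace': (False, False, True, True),
        'NP-trace': (True, False, True, False),
        'PRO':      (True, True, False, False),
        'pro':      (True, True, True, True),
    }

    def distinguishes(chosen):
        """True if the chosen features give each type a distinct profile."""
        idx = [FEATURES.index(f) for f in chosen]
        profiles = {tuple(row[i] for i in idx) for row in MATRIX.values()}
        return len(profiles) == len(MATRIX)

    # Any one feature can be eliminated: every triple still distinguishes.
    assert all(distinguishes(t) for t in combinations(FEATURES, 3))

    # Among pairs, only the two features of (69) work:
    print([p for p in combinations(FEATURES, 2) if distinguishes(p)])
    # -> [('independent theta-role', 'case-marked')]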
This suggests that a functional determination of empty categories minimally requires reference to at least one property of the binder and one property of the empty category itself.33

The functional determination of empty categories has some intriguing consequences for the analysis of representations containing empty categories. Consider (70) for example.

(70) *Johni likes ei
There are two possible derivations of (70), either by movement—in which case ei is an NP-trace—or by base generation—in which case the empty category is either PRO or pro. Functionally, ei is governed, case-marked, and A-bound by an antecedent with an independent θ-role. Therefore ei does not function as a trace (see (68)) regardless of how it is derived. Rather, it is functionally pro and hence a pronominal with respect to the binding theory. Therefore (70) violates Principle B since pro is not free in its binding category. The alternative derivational analysis of the empty category in (70) requires two different explanations for the ungrammaticality of the example. Trace-binding here would be ruled out by the prohibition against nonunique θ-role assignment, assuming that John would be assigned a second θ-role via trace-binding (see p. 171). PRO-binding would be excluded because PRO would be governed in violation of the PRO-axiom; and pro-binding would violate Principle B as discussed above.

It might appear that this analysis allows us to derive the condition on uniqueness of θ-role assignment via the functional determination of empty categories. The full derivation does not go through, however. In (71), the empty category is functionally determined as pro, but would not violate Principle B.
(71) *Johni believed [S Mary to like ei]
Since the empty category is both governed and case-marked, it cannot be analyzed as PRO; nor can it be analyzed as NP-trace because it is case-marked and has a binder with an independent θ-role. Functionally this empty category is pro. However, in English the distribution of pro is limited to parasitic gap constructions, as in (72) where ei is a variable (WH-trace) which licenses the parasitic gap (=pro) (see Chomsky 1982a, for discussion).

(72) Which lettersi did you file ei [without PRO reading e]

Note that ei does not bind the parasitic gap because it does not c-command it. We might therefore assume that whatever accounts for the limited distribution of pro in English will also account for both (70) and (71) above, without recourse to Principle B, and independently of the functional determination of empty categories. Nonetheless, the functional determination of empty categories is available within the GB framework for the analysis of derived structure. Whether it is empirically or conceptually better motivated than the derivational account of empty categories remains to be determined.34 What is clear is that, under the derivational determination, the binding theory can be used to identify empty category types (e.g., NP-trace as an anaphor), whereas binding theory cannot be used in this way under the functional determination.

5 Binding theory and levels of representation

Under the assumption that the binding theory consists of a set of filters which determine the wellformedness of binding for some level (or levels) of representation, an empirical issue concerning which levels are relevant arises. In most discussions it has been assumed that Principles A, B, and C apply as a unit. This may not be correct as will be discussed below. For the purposes of discussion, we will assume the following levels of representation.

(73) [diagram of the four levels: D-structure → S-structure → LF → LF’]
The mapping from S-structure to LF establishes quantifier/variable structures via a rule of Quantifier Raising (see May 1977) and perhaps some mechanism for reconstructing phrase-markers (see Chomsky 1981) as will be discussed below. The mapping from LF to LF’ (where “LF” stands for “logical form”) yields predication-structures via coindexing (as discussed in Chomsky 1982a: fn. 11). Given these four levels, we may now consider what evidence might bear on the application of each of the three binding principles at each level. If a principle can apply at more than one level (as suggested in Chomsky 1981, 1982a), then the number of potential binding theories is greatly increased, as is the difficulty of identifying the correct theory. Moreover, some of the evidence that bears on the issue seems to lead to paradoxes. The following discussion is offered as an illustration.

Chomsky 1981 addresses the question of levels with respect to Principle C. He notes that the rules of Quantifier Raising and scope assignment to wh-phrases in situ will map the S-structures of the sentences in (74) onto the corresponding LF structures in (75) (irrelevant details omitted).35

(74) a. He liked every book that John read.
     b. I don’t remember who thinks that he read which book that John likes.

(75) a. (for every book x that John read) [he likes x]
     b. I don’t remember (for which person y and which book x that John likes) [y thinks that he read x]
he c-commands John in (74), but not in (75). Since the pronoun and name are obligatorily disjoint in reference—which would be predicted if Principle C applies to (74) (i.e., at S-structure rather than LF)—examples like (74) provide evidence that Principle C holds at S-structure. There is also evidence that Principle C does not hold at S-structure. Consider the pair of sentences in (76).

(76) a. Which report that John revised did he submit?
     b. Which report that John was incompetent did he submit?
While John and he may be construed as coreferential in (76a), they must be construed as disjoint in reference in (76b). This difference in interpretation appears to be systematic, based on the distinction between the relative clause in (76a) vs. the sentential complement in (76b). Given that the sentential complement, but not the relative clause, subcategorizes the noun report, it may be that the subcategorization domain is accessed in the interpretation of the variable bound by the WH-phrase. Presumably this access does not extend to relative clauses, which fall outside the subcategorization domain of the nouns they modify.

The interpretation of (76a) follows from the assumption that Principle C holds at S-structure. Unfortunately, this assumption gives the wrong result for the interpretation of (76b), where obligatory disjoint reference holds between John and he as if the pronoun c-commanded the name. At S-structure, the pronoun does not c-command the name in (76b), and therefore Principle C will not account for the only possible interpretation. One way to reinstate the appropriate c-command relationship between the name and the pronoun is to assume that some sort of reconstruction applies between S-structure and LF. That is, in LF the sentential complement of the noun (but, crucially, not the relative clause) would occur in the object position of submit.36 Some support for a process of reconstruction as sketched comes from other cases of WH-movement where Principles A and B are involved.

(77) a. *How angry at Johni was hei?
     b. *How angry at himi was Johni?
     c. *How angry at himi did Mary say Johni was?
     d. How angry at himi did Johni say Mary was?
     e. *How angry at Johni did hei say Mary was?

(78) a. How angry at each otheri were theyi?
     b. How angry at each otheri did Mary say theyi were?
     c. *How angry at each otheri did theyi say Mary was?
(77a–b) show that disjoint reference holds between the pronoun and the name in spite of WH-movement, which destroys the c-command relation that held between them at D-structure. (77a) is presumably a violation of Principle C, whereas (77b) violates Principle B. However (77d), in contrast to (77e), shows SSC (or TSC) effects—i.e., disjoint reference between him and John is blocked by the presence of the intervening subject Mary. (78c) indicates the same effects with respect to bound anaphora, where proper binding of each other by they is blocked by the intervening subject Mary. In other words, Mary functions as an accessible subject for the anaphor in spite of the fact that c-command between Mary and each other does not hold at S-structure. These paradigms would be accounted for under the assumption that Principles A and B apply at some level of representation in which reconstruction has applied to S-structure.

It should be noted at this juncture that the notion of reconstruction is at best problematic, as discussed in Higginbotham 1980: section 6 and Higginbotham 1983: section 3. The major difficulty involves the analysis for the LF-representation of sentences like (79).

(79) Which book about which pianist did she read?
The pronoun she cannot be coindexed with the variable bound by which pianist in LF. This would follow under Principle C given that variables (including WH-trace) have the status of names (see section 4 above) and that the pronoun c-commands the relevant variable in LF. The idea is to assimilate structures like (79) to those of (80), which contain names instead of variables.

(80) She read a book about Ingrid Haebler.

To do this, (79) would have to have an LF representation like (81).

(81) (for which pianist y) (for which book x) [she read [x about y]]
Yet the status of the term [x about y] in (81) is questionable on both syntactic and semantic grounds.37

The above discussion seems to suggest that the notion of c-command may not be sufficient for binding theory when extended to Ā-binding as in the case of WH-movement and Quantifier Raising. As we have seen, the issues surrounding this question are complex and will require substantial clarification. The question of levels of application for the binding theory depends in part on the prior question of the sufficiency of c-command.

Before concluding this section, let us examine some additional evidence, which does not involve the question of the sufficiency of c-command, but instead bears on the question of whether Principles A, B, and C apply as a module at the same level(s) of representation. The evidence concerns equative constructions as discussed in section 2 with respect to the notion ‘accessible SUBJECT’. A relevant example was given as (33) above, with an indexed representation (34).

(34) Maryi believes [S thosej to be [NPj pictures of herselfi]]
In order to determine that those is not an accessible SUBJECT for the anaphor herself, those must be coindexed with NPj. Yet given this coindexing, we might ask why (34) is not in violation of Principle C. One possible answer would be that Principle C applies at a level of representation prior to the coindexing of those and NPj. Suppose that this coindexing results from a rule of predication which maps LF onto LF’. Then Principle A must apply at a different level of representation than Principle C. Given (34) we might assume that Principle A applies at LF’, whereas Principle C applies at LF or S-structure, depending on the sufficiency of c-command for Ā-binding. In this way there may be some motivation for distinguishing Principle C from Principles A and B. So far, however, there appears to be no motivation for separating Principles A and B in terms of the level(s) of representation to which they apply.

6 Summation

The theory of binding investigated here consists of three principles which determine the distribution of bound elements—anaphors, pronouns, and names. Section 1 demonstrates that the relation between an antecedent and a bound element relevant for the operation of these principles is that of c-command. Section 2 addresses the issues involved in specifying opaque domains for anaphor binding, concluding that the notion of accessible θ-SUBJECT gives the closest fit to the facts. In section 3 it is noted that while the S-paradigms for pronoun and anaphor binding are analogous, the NP-paradigms are not. To account for these facts of pronoun binding, it is necessary to formulate the notion of opaque domain differently from that of Principle A. This is achieved by utilizing the notions of θ-domain and government, with the result that Principles A and B overlap with respect to opaque domains for the S-paradigm, but diverge in the appropriate way for the NP-paradigms. The extension of binding theory to empty categories is considered in section 4. While there is some motivation for concluding that WH-trace is subject to Principle C, it is questionable whether PRO is subject to the binding theory in any way. Furthermore, there is some evidence that NP-trace should be treated as an empty category analogue of a lexical anaphor and hence subject to Principle A. In this way, the binding theory contributes to a typology of empty categories, under which empty categories may be identified functionally or derivationally. The question of which levels of representation are relevant for the three principles of the binding theory is raised in section 5. Discussion establishes that the c-command relation of section 1 may be insufficient for cases involving Ā-binding. This suggests that S-structure by itself is not the appropriate level for any of the principles. Equative constructions provide evidence that Principle C should apply to a different level than Principles A and B.

This article has attempted to provide an overview of some of the fundamental issues confronting a theory of binding. At various points, alternative proposals have been considered. At the level of grammatical analysis it is often difficult to distinguish between alternatives which cover the same range of data. Yet these alternatives may not be equivalent in terms of what they suggest for processing or acquisition of language. Consider for example the issue of defining opaque domains for Principles A and B. If the definition is identical for both principles, then we might expect that a child who has acquired the S-paradigm for anaphors will also have the S-paradigm for pronouns given that Principle B is operative in his/her grammar. If, however, the definitions are different as proposed above, we might expect to find differences in the acquisition of the S-paradigms for pronouns and anaphors. For one study that argues in favor of the latter case, see Solan to appear.

Notes
* This paper was intended as a critical review/summary of current work on binding theory circa 1983, though at some points it goes beyond this. I am indebted to Wayne Harbert for our many discussions on the binding theory. I would also like to thank Noam Chomsky, Lori Davis, Howard Lasnik, and Barbara Lust for comments on an earlier draft.

1 The term “name” covers common as well as proper nouns, excluding bound anaphors (i.e., reflexive pronouns and reciprocals), pronouns, nonreferential it and existential there. There is some evidence that variables—in particular, the empty category bound by a WH-quantifier (see section 4 below)—have the status of names. See Freidin and Lasnik 1981 for an extensive discussion of this point. Chomsky 1981 uses the term “R-expression” (for “referential expression”) rather than “name” in the formulation of Principle C.

2 See Freidin 1978 for discussion of conditions on rules vs. conditions on representations.

3 For a somewhat different definition of c-command, see Chomsky 1981: chapter 3.

4 This point is due to Wayne Harbert.

5 See Chomsky 1981, Stowell 1981, and Manzini 1983 for discussion of the mechanism of reanalysis.

6 It should be noted that these constructions are also problematic in another way. The to-phrase and the about-phrase may be freely ordered, as in (i).

(i) a. John talked to Bill about Mary.
    b. John talked about Mary to Bill.
The ordering affects the distribution of anaphors so that neither PP may contain an anaphor when the about-phrase precedes the to-phrase (cf. (18)).
(ii) a. John talked to Mary about herself.
     b. *John talked about herself to Mary.
     c. *John talked about Mary to herself.
If (18–20) are subject to reanalysis as discussed above, then it must be prevented from applying to constructions like (i.b). Then (ii) falls under Principle A as expected. This analysis makes a prediction with respect to Principles B and C. It should be the case that (iii) and (iv) are wellformed, since the object of about does not c-command the object of to.

(iii) *John talked about heri to Maryi.

(iv) ?John talked about Maryi to Maryi.
The prediction is false for (iii) and unclear for (iv). If “bound” and “free” are to be defined in terms of c-command, as the evidence bearing on Principles A and C suggests, then (iii) will have to be accounted for on other grounds. The alternative of complicating the binding theory by defining “free” for Principle B differently than for Principle C is not well-motivated. It may be that some condition on linear order beyond what follows from c-command is necessary for pronouns. For further discussion of problems with pronouns and the c-command condition, see Reinhart 1981: section 5.

7 See Rizzi 1981 for further arguments in favor of the NIC over the TSC/PIC.

8 This data is from George and Kornfilt 1981—their (37).

9 This example is cited in Harbert 1982 and credited to Avery Andrews.

10 The first proposal of this sort comes from Rouveret and Vergnaud 1980.

11 The problem with this is that INFL structurally governs the subject NP of S whether or not it assigns case to the subject. The analysis could be maintained if we require that for government to hold there must be both a head and its maximal projection. Since maximal projections—especially S̄—are considered to be barriers to government and hence to case assignment, it could be claimed that exceptional case-marking of lexical subjects in infinitival complements lacking a for complementizer requires S̄-deletion (see Chomsky 1981). Under the stricter requirement that government entails a maximal projection as well as a head, INFL in exceptional case-marking constructions would not govern the infinitival subject—rather the matrix V would.

12 See Fiengo and Higginbotham 1981 for further discussion of opacity in NP.

13 As Chomsky notes (1981:210), the analysis provides a principled answer to the question of why S and NP should be the two governing categories under the earlier analysis. They are the two categories with subjects.

14 That is, (31) cannot be construed as “the owner of his own boat…”. Note however that (30) can be violated under certain conditions, as in (i).

(i) Johni is [NPi the sole distributor of hisi records]
Thus the constraint prohibiting (31) cannot be the simple formal statement given in (30). What is at issue here is that a pronoun cannot get its reference from an NP that contains it. In (i), however, his is bound to John rather than the NP that contains it.

15 In Chomsky 1981 the definition of ‘accessible’ redundantly specifies c-command.

16 This example is due to Kevin Kearney. I am indebted to Lori Davis for calling it to my attention.

17 It is assumed that both objects subcategorize the verb tell and therefore are constituents of the subcategorization domain of V. Thus c-command follows.

18 Chomsky 1981:214f. gives a different analysis on the basis of the following data.

(i) *They think [it bothered each other that S] (=(84i))

(ii) *He thinks [it bothered himself that S] (=(84ii))
Freidin and Harbert 1983 disagrees with the grammaticality judgment on (i)—although it could be a matter of sequences of tenses rather than a binding violation. Speakers who find (i) unacceptable nevertheless accept (iii).

(iii) They think [it would bother each other that S]
The unacceptability of (ii) might be related to the fact that there is a grammatical alternative to the reflexive—i.e., the proximate pronoun, though how this relates to some principled account remains to be determined.

19 Suppose this involves the actual transfer of the θ-property rather than a sharing of it. Then we have an account for why AGR takes precedence over the syntactic subject. Though this approach might raise further problems concerning conditions on θ-role assignments. See note 28.

20 For discussion of an indexing mechanism for expressing disjoint reference, see Chomsky 1980, 1981; Freidin and Lasnik 1981; Lasnik 1981; and Higginbotham 1983.

21 Speaker judgments on these examples tend to vary for reasons that are unclear. For example, some speakers do not accept (i), but find that acceptability improves in the negative counterpart.

(i) *Johni reads [books about himi]
Similar judgments hold for the pair like vs. dislike. For a somewhat different analysis of (i) and related material, see Chomsky 1982a: fn. 24.

22 Regarding the application of disjoint reference into NP, as mentioned in the previous footnote, note that examples like (45.b.i–ii) appear to be the standard case across languages (see Lasnik and Freidin 1981 for discussion of this with respect to the NIC). Speakers who tend towards a disjoint reference reading in the simple sentence cases find the coreferent reading in complex sentences unexceptional. The disjoint reading in simple sentences may be due to lexical properties of the matrix verb—i.e., where there is a connection via θ-marking between the verb and its object NP. In complex sentence cases no such connection exists; hence there is no apparent way to contravene the opacity of the complement object NP.

23 See Harbert 1983 for discussion of PP. Harbert notes that some PPs appear to block disjoint reference as in (i).

(i) Johni put the book [PP beside himi]

As (ii) illustrates, this is not generally the case.

(ii) *Johni mailed the book [PP to himi]
24 Whether or not PP is a θ-domain is left open at this point. (50) above and (ii) of the previous footnote argue against this; (i) would conform to Principle B if PP is a θ-domain.

25 Control PRO must have an antecedent in S, in contrast to non-control PRO (i.e., PROarb) as in (i).

(i) It is unclear [S̄ whati [S PROarb to do ei]]

Where PRO need not be bound by an antecedent, it does not function as an anaphor. If PROarb is never bound to an antecedent at the level of representation where binding principles apply, then binding theory will not account for the distribution of PROarb. This will be assumed in what follows. Whatever accounts for the fact that PROarb occurs only in ungoverned positions might also account for the distribution of control PRO in only ungoverned positions.

26 Huang claims that PRO has only one binding domain—i.e., the one provided by the most restrictive definition. In the case under discussion this would be the domain of an accessible SUBJECT, and thus the larger of the two possibilities. In this domain PRO must be both bound and free.

27 This is necessary in any case so that PROarb is excluded as a possible subject in the infinitival complement of these verbs. That is, John wants to leave is never interpreted as “John wants someone to leave”.

28 This will occur when an NP in a position which is assigned an argument function (θ-role) binds a trace in a position which is also assigned an argument function. The basic idea is that movement operations are restricted to moving constituents into non-argument positions only (i.e. positions which are not assigned an argument function). In terms of trace binding, the antecedent of a trace cannot occur in an argument position (θ-position in the terminology of Chomsky 1981). See Freidin 1978, Borer 1979, and Chomsky 1981 for further discussion. See also note 33.

29 See Freidin and Lasnik 1981 for a detailed discussion of these cases, and Chomsky 1982a for a different analysis involving a functional definition of empty categories in terms of the local binder of the empty category. Basically, Chomsky suggests that because the empty categories in (63) and (64.b.i) are locally A-bound by antecedents in θ-positions, they function as PRO. Therefore the examples are illformed because they contain governed PRO, which is prohibited. For some important criticism of this analysis, see Brody 1984.

30 I am indebted to Carlos Piera for the following examples.

31 See Bouchard 1982 and Sportiche 1983 for further discussion.

32 Note that the other features make a 3-to-1 distinction in contrast to the 2-to-2 distinction of the features in (69). That is why they don’t work here.

33 It is claimed in Chomsky 1981 and 1982a that one argument for the functional determination of empty categories is that the existing types virtually partition the distribution of NPs. This would follow if there were only one (structural) type of empty category, where the distinctions among the four subtypes are made in terms of their function. This argument is unconvincing because the {PRO, pro}/trace distinction is crucial for both the theory of predicate/argument structure (see footnote 28) and the Empty Category Principle (ECP) of Chomsky 1981 which requires that an empty category be ‘properly governed’ [the exact definition of proper government need not concern us here]. In contrast to trace-binding, PRO-binding is immune to the prohibition against binding between argument positions since PRO and its binder always have independent θ-roles. Furthermore, PRO, unlike trace, is immune to the ECP since it is never governed. Similarly for pro. If {PRO, pro} were structurally distinct from trace—i.e., [NP [N e]] for {PRO, pro} vs. [NP e] for trace—then there might be some principled explanation for the immunity of {PRO, pro} based on this structural difference. The alternative of appealing to the independent-θ-role-of-the-binder feature of {PRO, pro} to distinguish them from trace will not account for their immunity to the ECP in non-control constructions.

34 See Chomsky 1982a: sections 3 and 5 for further discussion based on somewhat different assumptions about the analysis of empty categories.

35 These examples are from Chomsky 1981:197. The grammaticality of (74b) seems dubious at best.

36 Exactly how this is to be implemented is far from clear, and raises several complicated issues involving the specification of quantifier/variable structures in LF. van Riemsdijk and Williams 1981 propose an intermediate level of structure between D- and S-structure, designated NP-structure, which indicates only the results of NP-movements (excluding, crucially, WH-movements). If Principle C applies at NP-structure, then the interpretation of (76b) is accounted for without recourse to reconstruction at LF. However, the interpretation of (76a) cannot be accounted for under this proposal. Note further that the correct interpretation of (74) follows from the NP-structure proposal or the reconstruction proposal, given that relative clauses are not affected by QR (contrary to Chomsky’s analysis cited above).

37 (81) might be derived by replacing the trace of the wh-phrase in (79) with the term [x about y]. The problem with this is that it presupposes an analysis of quantifier/variable structure which will lose May’s explanation for linked quantification (see May 1977). That is, in NPs containing two quantified expressions, the more deeply embedded quantifier always has wide scope. This does not follow given the analysis under discussion.
11 Binding theory on minimalist assumptions

The Minimalist Program proposed in Chomsky (1995d) radically alters the foundations of syntactic theory by reformulating several fundamental theoretical constructs (e.g., involving phrase structure and transformations) as well as placing severe methodological restrictions on what tools and mechanisms might be employed in the construction of syntactic analyses. To a large extent, the reformulation of constructs is driven by methodological requirements—especially the assumption that all constructs must meet a criterion of conceptual necessity, the Ockham’s razor of the Minimalist Program. This paper attempts to sketch the ramifications of this and other assumptions of the Minimalist Program as they apply to a theory of binding. In particular, the discussion will focus on the effects of minimalist assumptions on the standard version of Binding Theory within the Principles and Parameters framework (Chomsky 1981; Chomsky & Lasnik 1993; Freidin 1994a). As in other areas of syntactic investigation, minimalist assumptions lead to a radical revision of the standard theory that has been in use for over a decade.

To begin, let us briefly review the standard theory in broad outline. It consists of the following three principles.1

(1) a. An anaphor must be bound within a local domain.
    b. A pronoun cannot be bound within a local domain.
    c. An r-expression cannot be bound.
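The principles in (1) all invoke the relation bound, which the next paragraph defines in terms of c-command plus coindexation. As a concrete illustration of that definition—a minimal Python sketch over hypothetical toy structures, not part of the original text—the relation can be computed directly from an indexed tree (assuming, for simplicity, that every non-terminal node branches):

    # Sketch (not from the original text): internal nodes are tuples
    # (label, child, ...); leaves are dicts {'word': ..., 'index': ...}.
    # Positions are paths: tuples of child indices from the root.

    def c_commands(a, b):
        """a c-commands b iff neither dominates the other and the node
        immediately above a dominates b (all nodes assumed branching)."""
        if b[:len(a)] == a or a[:len(b)] == b:
            return False
        return b[:len(a) - 1] == a[:len(a) - 1]

    def node_at(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def binds(tree, a_path, b_path):
        """One expression binds another if it c-commands the other and
        carries the same index (coindexation)."""
        a, b = node_at(tree, a_path), node_at(tree, b_path)
        return (c_commands(a_path, b_path)
                and a['index'] is not None
                and a['index'] == b['index'])

    # '*John_i admires him_i': John binds him within the local domain,
    # so Principle B (1b) is violated.
    s = ('S', {'word': 'John', 'index': 'i'},
              ('VP', {'word': 'admires', 'index': None},
                     {'word': 'him', 'index': 'i'}))
    print(binds(s, (1,), (2, 2)))   # -> True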
The relation bound is defined in terms of c-command and coindexation: one expression binds another if it c-commands the other and carries the same index. As formulated, the binding principles operate as output conditions (i.e., conditions on representations). Thus binding theory must specify to which level(s) of representation the binding principles apply. It must also define “local domain” for Principles A and B. A subsidiary question arises as to whether this definition is the same for both principles (cf. Chomsky & Lasnik 1993) or different (cf. Freidin 1986). Furthermore, binding theory must account for the fact that the three principles appear to be instantiated somewhat differently crosslinguistically. In the case of Principles A and B this may be due to parametric variation affecting the definition of local domain (see Yang 1983; Freidin 1992). Even for Principle C there appears to be some crosslinguistic variation which involves distinguishing domains in which the principle applies from those in which it does not (see Lasnik 1991).

The minimalist assumption that theoretical constructs must meet a criterion of conceptual necessity has a profound effect on the binding theory sketched above. The prime example discussed in Chomsky (1993) concerns levels of representation. The conceptual argument is crystal clear. The interface levels of Phonetic Form (PF) and Logical Form (LF) are required to account for how the computational system of human language CHL connects with other systems of the mind/brain involved in the production and perception of the physical signals of speech (including sign language) and in the translation of thought to language as well as language to thought. However, there is no such motivation for levels of D-structure and S-structure as characterized in previous work. Therefore, the postulation of such levels of representation is, under the Minimalist Program, illegitimate.2 This creates an immediate problem for any version of binding theory that proposes the application of binding principles at either level. Consider for example the empirical argument discussed in Chomsky (1993) that Principle C applies at S-structure. The argument is based on the following evidence [Chomsky’s (23a–c)].

(2) a. You said he liked [the pictures that John took].
    b. [How many pictures that John took] did you say he liked t.
    c. Who [t said he liked [α how many pictures that John took]].
In (2a), because he c-commands John, the pronoun cannot take the name as its antecedent. In contrast, the pronoun in (2b) does not c-command the name and therefore the name may be interpreted as its antecedent. While the interpretation of (2c) is straightforward (the pronoun cannot take the name as its antecedent), its analysis is not. (2c) contains two wh-phrases, only one of which has moved from its grammatical function position to create the required quantifier/variable structure. Given the prohibition against vacuous quantification, the second wh-phrase must also move covertly at LF to create a quantifier/variable structure. If the entire phrase α adjoins to who, then in the resulting structure the name and pronoun will be in the same relation as in (2b). Since the name in (2c) cannot be interpreted as the antecedent of the pronoun as it can in (2b), this demonstrates that Principle C cannot apply at LF. However, there is another LF analysis for (2c) that doesn’t lead to this conclusion—namely, that only the quantifier how many gets fronted at LF. In this case the structural relation between the pronoun and the name remains at LF as we see it in (2c); hence there is no need to postulate a special level of S-structure at which Principle C can apply. In the development of the Minimalist Program in chapter 4 of Chomsky (1995d), this line of analysis is motivated on economy grounds. Basically, since what drives movement is feature checking, it is assumed that what is being moved is a set of formal features. Overt movement involves pied piping of categories, for reasons that are not entirely clear. However, if economy requires that operations be minimal, then covert movement should apply only to features. Thus the covert quantificational movement of how many at LF would involve only the features on the quantifier, and not the remainder of the phrase α. The criterion of conceptual necessity argues in favor of this analysis over the alternative that requires the postulation of S-structure. In pursuing it, we discover that the alternative involved an unmotivated assumption—namely, that covert movement must involve categories. The elimination of D-structure and S-structure as levels of representation has a salutary effect for a theory of binding. Without these levels, there is no possibility that either the three binding principles could apply at different levels within a single language or one or more principles could apply at different levels in different languages. Neither possibility was precluded in earlier versions of binding theory. Therefore the fact that
they do not arise would have to be established via empirical argument, which as we have seen may be subject to unwarranted assumptions. The optimal situation given the minimalist perspective is when empirical arguments support conceptual arguments. Limiting the application of binding principles to LF representations appears to require the adoption of the copy theory of movement transformations. In (3), for example, the pronoun cannot take the name as antecedent even though it does not c-command the name. (3)
[How many pictures of Alice] did she really like t.
Assuming that Principle C is operative in such constructions, so that the antecedent relation in (3) is blocked for the same reason that it is blocked in (4), we are led to postulate an LF representation of (3) in which the pronoun c-commands the name. (4)
She really liked four pictures of Alice.
The copying theory of movement would give us (5), which presumably would be translated into an LF representation along the lines of (6).3 (5)
[How many pictures of Alice] did she really like [how many pictures of Alice].
(6)
[How many x] did she really like [x pictures of Alice].
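As a purely expository aside (nothing like it appears in the original discussion), the copy-and-delete mapping from (5) to (6) can be sketched procedurally; the list encoding and function names below are hypothetical conveniences, and the sketch ignores everything about the syntax except the two copies.

```python
# Toy model of the copy theory of movement: wh-movement leaves a full copy
# of the moved phrase in its base position, and the mapping to LF deletes
# complementary parts of the two copies, yielding an operator/variable
# structure as in (6). Encoding and names are illustrative only.

def to_lf(wh_phrase):
    """Split a copied wh-phrase into a higher operator and a lower restriction:
    [how many pictures of Alice] ... [how many pictures of Alice]
    -> [how many x] ... [x pictures of Alice]"""
    quantifier, restriction = wh_phrase[0], wh_phrase[1:]
    higher_copy = [quantifier, "x"]       # delete the restriction up top
    lower_copy = ["x"] + restriction      # delete the operator down below
    return higher_copy, lower_copy

print(to_lf(["how many", "pictures", "of", "Alice"]))
# (['how many', 'x'], ['x', 'pictures', 'of', 'Alice'])
```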
Without this kind of LF derivation, (3) might easily be construed as evidence that Principle C applies at a level of D-structure. It is worth noting here that given the copy theory of movement operations in conjunction with the kind of deletion required to derive (6) from (5), one could still maintain that Move Category (i.e., Move α) applies covertly in (2c) but that the required deletion provides the same result as the Move Feature analysis. Therefore examples like (2c) do not provide empirical evidence for the Move Feature analysis as we might have otherwise expected. Although the copying analysis is required if binding principles apply only at LF, it raises a potentially difficult problem for examples like (2b) where the moved wh-phrase includes a relative clause. Thus compare (7) to (3). (7)
[How many pictures that Alice bought] did she really like t.
In (7) the name Alice may be construed as the antecedent of the pronoun, in contrast to (3) where it cannot. This requires that in the LF representation of (7), the relative clause does not show up in the position of the trace. Exactly how this is to be achieved is not clear, nor is it clear exactly what the LF representation of (7) would be. Under the copying analysis (7) could involve (8).
(8) [How many pictures that Alice bought] did she really like [how many pictures that Alice bought].
The derivation of the LF representation for (7) would involve some deletions—presumably pictures in the moved phrase and the quantifier how many in the copy. If we treat (7) on a par with (3) then the relative clause would be deleted in the moved phrase as well—yielding the wrong structure since Alice may be interpreted as the antecedent of the pronoun. This shows that there is an apparent asymmetry in the behavior of relative clauses and complements with respect to binding principles (cf. Freidin 1986, 1992, 1994a; Lebeaux 1988). How this is to be captured in an LF representation of (7) seems problematic. Taking (6) as a model, (7) would presumably appear at LF as (9).

(9)
[How many x that Alice bought] did she really like [x pictures].
The problem with (9) is that x is a variable ranging over integers whereas the relative clause modifies pictures, not an integer.4 Rather than pursue this analysis further, let us consider a related set of facts that suggest that a Principle C analysis of these constructions is perhaps on the wrong track. If we substitute a copy of the name in (3) and (7) for the pronoun, we should presumably get the same results with respect to Principle C since it is the binding of the name that is at issue. Perhaps surprisingly, this turns out not to be the case. (10)
[How many pictures of Alice] did Alice really like t.
(11)
[How many pictures that Alice bought] did Alice really like t.
In both (10) and (11) it is possible to interpret the two instances of the name Alice as referring to the same person. This is expected for (11) given that its LF representation is like that of (7) where the relative clause is not reconstructed in object position. The coreferential interpretation of (10) is, however, completely unexpected given that its LF representation would be parallel to that of (3)—i.e., (6), hence (12). (12)
[How many x] did Alice really like [x pictures of Alice].
Under the standard theory the two names on the coreferential reading are in a binding relation which should be prohibited by Principle C. In (13) where no overt movement is involved this binding relation is prohibited. (13)
Alice really liked four pictures of Alice.
The natural interpretation of (13) requires that there be two people named Alice. The contrast between (3) and (10) is unexplained and apparently unexplainable under the standard theory. Moreover, the standard theory makes the wrong prediction for the interpretation of (10). Separating the pronoun/name case from the name/name case along the lines of Lasnik (1991), where Principle C is split into several conditions depending on the status of the binder, one of which states that an r-expression is pronoun-free (i.e., cannot be bound by a pronoun), eliminates the problem of having a principle apply in one case but fail to apply in a structurally identical case. However, we are still left with a serious problem for the residue of Principle C since it predicts the wrong interpretation for (10).5
So far we have been discussing the application of the standard binding theory, specifically Principle C, at LF because that is where it would have to apply given the minimalist assumption that the only two levels of representation are the interface levels PF and LF. It can’t apply at PF given the further assumption that PF contains no structural information. “PF is a representation in universal phonetics, with no indication of syntactic elements or relations among them (X-bar structure, binding, government, etc.)” (Chomsky (1993:194)). Therefore, binding principles can only apply to LF representations.
However, it is not clear that under minimalist assumptions the formulation of the binding principles in the standard theory is in fact conceptually motivated. Consider first the fact that under the standard theory the definition of “bound” involves two nominal expressions in a c-command relation that are coindexed. Under minimalist assumptions, however, indices and similar devices are not available. Chomsky takes it as a natural condition “that outputs consist of nothing beyond properties of items of the lexicon (lexical features); in other words, that the interface levels consist of nothing more than arrangements of lexical features” (Chomsky 1995d:225), thereby meeting a condition of inclusiveness.6 Furthermore, he claims that:

A theoretical apparatus that takes indices seriously as entities, allowing them to figure in operations (percolation, matching, etc.), is questionable on more general grounds. Indices are basically the expression of a relationship, not entities in their own right. They should be replaceable without loss by a structural account of the relation they annotate. (Chomsky 1993:fn.53)

Obviously if we eliminate indices as a grammatical device, then binding theory must be recast in some other way since the standard theory is to a large extent a theory about the assignment of indices.7 The alternative proposed in Chomsky & Lasnik (1993) (and adopted in Chomsky (1993)) involves replacing indexing procedures with interpretive procedures. As Chomsky & Lasnik note, the indexing procedures of the standard theory require interpretive procedures as well. By recasting the binding principles as interpretive procedures it is possible to dispense with the indexing procedures. Thus the binding principles in (1) become the interpretive procedures of (14), where D stands for the relevant local domain in which the procedure applies.
(14)
a. If α is an anaphor, interpret it as coreferential with a c-commanding phrase in D.
b. If α is a pronoun, interpret it as disjoint from every c-commanding phrase in D.
c. If α is an r-expression, interpret it as disjoint from every c-commanding phrase.
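To make the derivational character of (14) concrete, here is a minimal sketch (my own illustration, not an implementation proposed in the text) of the three procedures over a toy constituent structure; the tree encoding, the anaphor/pronoun diagnostics, and the flattened treatment of the local domain D are all simplifying assumptions.

```python
class Node:
    """A toy phrase-structure node; labels stand in for lexical items."""
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        return other is self or any(c.dominates(other) for c in self.children)

def all_nodes(tree):
    yield tree
    for child in tree.children:
        yield from all_nodes(child)

def c_commands(a, b):
    # simplified: a c-commands b iff neither dominates the other and
    # a's mother dominates b (branchingness is ignored here)
    if a.dominates(b) or b.dominates(a):
        return False
    return a.parent is not None and a.parent.dominates(b)

def interpret(nominal, tree, domain_d):
    """(14a-c) as interpretive procedures rather than filters on indexings."""
    if nominal.label.endswith("self"):                      # (14a): anaphor
        options = [n for n in domain_d if c_commands(n, nominal)]
        return ("coreferential with one of", options)
    if nominal.label in {"he", "she", "him", "her"}:        # (14b): pronoun
        blocked = [n for n in domain_d if c_commands(n, nominal)]
        return ("disjoint from", blocked)
    blocked = [n for n in all_nodes(tree) if c_commands(n, nominal)]
    return ("disjoint from", blocked)                       # (14c): r-expression

# "He likes Max": the pronoun c-commands the name, so (14c) in its
# standard-theory guise marks Max as disjoint from he.
he, max_ = Node("he"), Node("Max")
sentence = Node("S", [he, Node("VP", [Node("likes"), max_])])
relation, nodes = interpret(max_, sentence, [he, max_])
print(relation, [n.label for n in nodes])   # disjoint from ['he', 'likes']
```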
Under this proposal, the principles of binding are not conditions on representations. Rather, they assign certain interpretive relations among nominal expressions, and are thereby derivational in nature. Thus (14a) as an interpretive procedure does not account for cases where the interpretation cannot apply, e.g., (15).

(15)
a. *Herself is clever.
b. *Mary thinks that herself is clever.
c. *Mary expects that John will like herself.
d. *Mary expects John to like herself.
That is, we need a further statement (16) to account for (15). (16)
An anaphor must be interpreted as coreferential with an appropriate antecedent.
If (14a) is the only interpretive rule for anaphors, then the only possible antecedent will be a c-commanding phrase in D.8 In the case of (14b), we need no further condition to process the disjoint reference interpretation for pronouns. However, if (14b) is the only rule of pronoun interpretation in binding theory, then CHL does not account for the fact that sentences like (17) are ambiguous. (17)
Mary thinks that she is clever.
The pronoun and the name will not be interpreted as disjoint in reference by (14b), but that does not say whether they are coreferential or not. Thus we have returned in essence to Lasnik’s 1976 theory of pronominal coreference where the coreference possibility in (17) is not given by any rule of grammar. Principle C of the interpretive theory (14c) does not fare any better than the standard theory version (1c) with respect to (10). It makes the same wrong prediction with respect to (12), the putative LF representation of (10). Moreover, the existence of such a rule of interpretation (or alternatively a condition on representations like (1c)) ought to be suspect on conceptual grounds. While both anaphors and pronouns act as anaphoric expressions—i.e., they stand in for some other nominal expression—r-expressions do not. Thus it seems inappropriate for that reason to treat them as if they could behave as anaphoric expressions and therefore must be interpreted as disjoint from c-commanding nominals.9 If we restrict our attention to the issue of antecedents for anaphoric expressions, then Principle C would be restricted to covering just examples like (18).
(18)
a. He likes Max.
b. He thinks that Max is clever.
If so, then Principle C might be reformulated as the interpretive rule (19).

(19) If α is a pronoun, then interpret it as disjoint from every r-expression it c-commands.
(19) is equivalent to the Lasnik (1991) principle that r-expressions be pronoun-free, but without reference to “bound r-expressions,” which I am suggesting should be illegitimate on conceptual grounds. With the elimination of indexing, the issue of coreference between r-expressions should disappear. Presumably there is no need for a special grammatical mechanism to check pairs of r-expressions to determine whether they corefer or not. At this point we still have no account for the fact that when two r-expressions are phonetically identical, there exists an interpretive option to treat them as having the same reference—which does not entail that one is anaphoric on the other.10 In some cases this option is realized (e.g., (10) and (11) above), in others it is prohibited (e.g., (13)). The fact that the difference depends on whether a c-command relation holds between the two r-expressions suggests that binding theory is somehow involved, even though it is unclear how this could be achieved if binding theory is limited solely to anaphoric relations within sentences, where one expression stands in for another (its antecedent).11 The same-reference option for r-expressions is an entirely different type of phenomenon, involving the assignment of reference to r-expressions, which is presumably not part of CHL. Anaphoric relations, in contrast, concern the assignment of antecedents to anaphoric expressions (bound anaphors, pronouns, and pronominal epithets), a purely grammatical phenomenon. R-expressions involve word/world relations, while anaphoric expressions involve word/word relations. Thus on conceptual grounds alone it seems natural to separate the two cases.
The picture of binding theory on minimalist assumptions that is beginning to emerge seems very different from the standard theory. Instead of three conditions on indexing representations involving anaphors, pronouns, and r-expressions, we have three rules of interpretation—one for anaphors and two for pronouns—that involve the relations between anaphoric expressions and their antecedents. Furthermore, the rule for an anaphor specifies when a nominal can be interpreted as its antecedent, while the rules for a pronoun specify when a nominal expression cannot be its antecedent. The rule for anaphors can fail to apply or apply improperly, yielding deviant structures. The rules for pronouns cannot, because the only relation CHL specifies for a pronoun is disjoint reference and a pronoun, unlike an anaphor, does not require an antecedent in the sentence in which it occurs.
Recasting the principles of binding theory as rules of interpretation instead of conditions on representations avoids a potentially serious problem with respect to the minimalist assumption that all output conditions are external interface conditions (bare output conditions). First, unless parametric variation extends to bare output conditions (a totally unmotivated assumption at this point), it would be difficult to explain crosslinguistic variation for binding configurations documented in the literature (cf.
chapters 7 and 8 of Freidin 1992 and Yang 1983). Moreover, it is far from clear how standard violations of Principles B and C could be construed in any real sense as violations of Full Interpretation (FI), the only candidate we presently have for a bare output condition (see Freidin 1997). With bound anaphors, however, it is possible to construe the failure of the interpretive rule (e.g., (15a–b)) as resulting in a violation of FI, if we can assume that an anaphor without an antecedent is assigned no referential interpretation. If the reference of nominal expressions is assigned outside of CHL, then FI will apply externally to CHL as well. This indicates that FI must operate as a bare output condition, since the failure to assign a referential interpretation occurs outside CHL.
At the conclusion of his survey of the history of modern binding theory (1989), Howard Lasnik writes:

…the developments explored here can best be seen not as a series of revolutionary upheavals in the study of anaphora, but rather the successive refinement of one basic approach, and one that has proven remarkably resilient. Given that BT has become the subject of intensive investigation, with new phenomena in previously unexplored languages being constantly brought to bear, and all this while old problems from familiar languages remain, further refinements, or even revolutionary upheavals, are inevitable. (Lasnik 1989:34)

From the preceding discussion of binding theory on minimalist assumptions it would seem that although some revolutionary upheaval may still be off in the future, the ground is certainly shifting so that our perspective appears to be undergoing a significant change.

Notes
1 The usual formulation says that a pronoun must be free in a local domain and an r-expression must be free. Nothing of substance changes with the formulation given in (1) given that free means “not bound.” Following standard practice, we will refer to principles (1a–c) as Principles A, B, and C respectively.
2 Although Chomsky allows that the empirical properties of language might force the postulation of constructs that depart from the criterion of conceptual necessity (1995d:318, fn.7), the preference for conceptual arguments over empirical arguments, which lies at the heart of the Minimalist Program, renders this option extremely unlikely (see Freidin 1997a).
3 This follows the analysis of Chomsky & Lasnik (1993). Cf. their (105).
4 Note that this problem arises even if we adopt a Lebeaux-style analysis in which the relative clause is adjoined to the wh-phrase after it is moved so that the wrong binding configuration between the name and the pronoun never occurs.
5 Another condition would prohibit an r-expression bound by another r-expression. As Lasnik shows, this condition is subject to parametric variation whereas the other is not. Note that the interpretation of (10) could be accounted for if the condition on pairs of r-expressions held at S-structure rather than LF. On minimalist assumptions we would want to avoid this conclusion if possible. See below for discussion on excluding the condition on pairs of r-expressions from binding theory altogether.
6 Although the footnote to this passage allows that violations of inclusiveness might be forced by empirical considerations, no indication is given as to what might actually count in that direction.
Note that considerations of this nature can be invoked only within a fairly disciplined minimalist approach. Thus with sufficiently rich formal devices (say, set theory), counterparts to any object (nodes, bars, indices, etc.) can readily be constructed from features. There is no essential difference, then, between admitting new kinds of objects and allowing richer use of formal devices; we assume that these (basically equivalent) options are permitted only when forced by empirical properties of language. (1995d:fn.7)
7 See for example the discussion of the standard theory in §1.4.2 of Chomsky & Lasnik (1993).
8 While (16) is required to account for (15a–b), there is a way of interpreting (14a) so that (16) would not be required to account for (15c–d). If (14a) interprets the anaphor herself as coreferential with John, then the deviance of (15c–d) would come from the failure of agreement of gender features which the coreference relation would surely entail. In other words, (14a) simply marks some c-commanding nominal expression in D as the antecedent of the anaphor (in D). When (14a) fails to apply, as in (15a–b), the result is excluded by (16). This account treats both (15a) and (15b) in the same way. Cf. Chomsky & Lasnik (1993) where (15a) requires a different analysis from (15b) with respect to their interpretation of (14a).
9 Higginbotham (1983) proposes an alternative to standard binding theory which also precludes treating r-expressions as anaphoric in nature. His analysis is based on the claim that “the interpretation of an expression can be given in one and only one way” (his (26)), which is characterized as an informal condition. The basic idea is that an r-expression gets its interpretation from its lexical content and therefore cannot get an interpretation in another way—i.e., by Linking to an expression construed as an antecedent. As Lasnik & Uriagereka (1988) note, there are serious problems with this approach involving the interpretation of pronominal epithets, which have their own lexical interpretation internally and yet can be externally linked to an antecedent. Furthermore, as they also note, even anaphoric expressions (pronouns, reflexives, and reciprocals) have specific interpretations based on inherent lexical features, and therefore legitimate linking configurations would apparently constitute a violation of this informal condition.
Reinhart (1986:146) proposes to replace Principle C with a pair of pragmatic strategies, one for speakers and another for hearers. The speaker’s strategy is to employ bound anaphora when the structure being used allows it and coreference is intended, unless the speaker has some reason to avoid bound anaphora. The hearer’s strategy is to assume that if the speaker avoids the bound anaphora option available for the structure, then coreference was not intended, unless the hearer knows the speaker had reasons for avoiding bound anaphora. Thus the structure (i.a) satisfies the pragmatic conditions, whereas (i.b–c) both fail. (Coindexing indicates bound anaphora, italics indicate intended coreference, and # marks pragmatic inappropriateness.)

(i)
a. Johni read hisi book.
b. #He read John’s book.
c. #John read John’s book.
One problem with this alternative is that it does not distinguish between (i.b) and (i.c), and therefore cannot account for the fact that while intended coreference between the pronoun and the name is precluded in (3), intended coreference is possible when the name is substituted for the pronoun as in (10). For further critical comments on pragmatic accounts of binding, see Lasnik & Uriagereka (1988:166, fn.12).
10 Note too that we have no account of the facts in Lasnik (1991) which suggest that a condition on the coreference possibility for pairs of r-expressions is subject to parametric variation.
11 The interpretive rule (14c) which would accomplish this seems in this regard a holdover from the standard theory in which coindexing allowed us to treat r-expressions as if they could have an anaphoric interpretation.
Part II History
12 The analysis of passives

1 Introduction

Within the framework of transformational grammar, it is generally accepted that phrase-structure rules alone cannot account for the relationship between active sentences and their passive counterparts; therefore the derivation of passives necessarily involves a transformation, henceforth referred to as PASSIVE.1 This position is based on the assumption that the active-passive relation is structural in nature and therefore best expressed by a transformation. In what follows I would like to suggest that both this assumption and the position built on it are debatable—(1) because they lead to certain fundamental theoretical problems, and (2) because it is possible to account for the active-passive relation without a transformation, given different assumptions about the nature of that relation.
In early transformational theory, the active-passive relation was accounted for by an optional transformation which applied to an underlying structure common to both actives and passives. The relation was considered a derivational relationship—i.e., at some level of representation, actives and their corresponding passives were identical. Thus in a Syntactic structures model, a common underlying structure could be mapped onto two distinct surface structures—one passive and the other active. Figure 1 illustrates how PASSIVE accounted for the active-passive relation.
Figure 1

This situation changed with the publication of Katz & Postal 1964 and Chomsky 1965. In both works it was argued that active and passive sentences must be derived from different underlying structures, and that PASSIVE must be an obligatory rule. For example, in the Aspects model, active and passive sentences had the underlying representations of Figures 2 and 3 respectively.
Figure 3

Under this analysis, PASSIVE does not relate the derivations of an active sentence and its corresponding passive. Rather, the derivations proceed as in Figures 4 and 5. The derivations for actives and passives are distinct; at no level of representation are they identical. (The same point was noted in Katz & Postal, 118, with respect to all relations between sentence types.) In fact, there is no syntactic rule which explicitly expresses the active-passive relation in the syntactic component. This is also true of other more recent proposals for a transformational analysis of passives (see R. Lakoff 1971 for details).2 Thus an argument that a syntactic rule PASSIVE is desirable in a grammar, because it reflects speakers’ intuitions about the active-passive relation, cannot be supported.
Figure 4
Figure 5

Given that the underlying syntactic representations of actives and passives are not identical, the active-passive relation must be established by stating an equivalence between structures at some level of representation in Figures 4 and 5. Yet no matter what level is chosen to state this equivalence, it will be the statement of equivalence, and not PASSIVE, that will express the active-passive relation. Thus we could state the equivalence in terms of deep structures—following Katz & Postal’s discussion of “similarity” (118–20), assuming that by-∆ is semantically empty. This would result in actives and passives being assigned the same semantic reading, but would still not relate them syntactically. Alternatively, the equivalence could be stated in terms of semantic readings or cognitive (i.e., truth-value) synonymy as noted in Chomsky (1965:22)—generics, quantifiers, and negation aside. In either case, the active-passive relation would be essentially semantic, and certainly not transformational. Thus a rule of interpretation would be required to account for this relation. An explicit formulation of such a rule will be discussed below.
Given the discussion above, it is an open question whether a transformation PASSIVE is necessary in an analysis which relates actives and passives by a rule of interpretation. If we accept the traditional assumption that rules of semantic interpretation operate on
grammatical relations such as “subject” and “object”, which are defined on deep-structure tree configurations, then PASSIVE is a necessary rule of grammar for obvious reasons. If, however, we do not make this assumption, then the conclusion does not automatically follow. Instead, we might assume that rules of semantic interpretation operate on semantic functions which are defined independently of tree configurations, and are therefore unaffected by transformational operations which alter tree configurations in the course of a derivation. It should be possible then to determine semantic functions at a level of surface structure or near it. If, in addition, the surface structures of passives are generatable in the base—a possibility which follows from the notion of PASSIVE as a structure-preserving rule (see Emonds 1970)3—then a rule PASSIVE would be unnecessary. It is this position that I will explore below.

2 Some problems with passive

Aside from the major problem discussed above, there are two fundamental weaknesses in the transformational analysis of passives. One involves the derivation and interpretation of truncated passives (i.e., without by-phrases); the other, the specification of a lexical feature on a predicate which distinguishes verbs that undergo passivization from those that do not. Truncated passives can be generated transformationally either by an ellipsis rule which deletes a by-phrase, or by a rule which obligatorily preposes the underlying object—thus passivizing the verb—into underlying subject position when the latter is lexically and semantically empty (see Emonds for a detailed discussion of this analysis). The ellipsis analysis derives truncated passives in 1 from the full passives in 2 by means of optional rules which delete the phrases by someone and by something:

(1)
John was hit.
Bill was questioned.
Alice was bitten.

(2)
John was hit by someone.
Bill was questioned by someone.
Alice was bitten by something.
The deletion of by someone or by something would not violate the recoverability condition on deletions, since the deleted elements can be considered pronominal representatives of the general categories HUMAN and NON-HUMAN noun (cf. Chomsky 1964:71). The ellipsis analysis predicts that both 1 and 2 are well-formed. Yet it does not account for the more plausible sources of 3a–f, namely 4a–f (3a–c are from Emonds):
(3)
a. Germany was defeated.
b. John wants to be left alone in his room.
c. He was never physically harmed by his father, but he was often threatened.
d. Marsha was arrested.
e. Jane was elected president of the club.
f. The compound was oxidized.

(4)
a. Germany was defeated by her enemies.
b. John wants to be left alone in his room by everyone.
c. He was never physically harmed by his father, but he was often threatened by him.
d. Marsha was arrested by an officer of the law.
e. Jane was elected president of the club by a majority of the members.
f. The compound was oxidized by air.

Figure 6
Here 3a–f cannot be derived from 4a–f because this would violate the recoverability condition on deletions: the by-phrases in 4 do not contain pronominal representatives of general categories of nouns. Nor is it possible to derive 3a–f from full passives containing by-phrases with a pronoun someone or something, since the full passives, if they are not interpreted as anomalous, receive different interpretations from the truncated versions. Thus 3e is not synonymous with ?Jane was elected president of the club by someone. Hence such examples argue against the ellipsis analysis of truncated passives. Alternatively, truncated passives could be derived from underlying structures like Figure 6 which contain empty subject nodes (henceforth the empty-node analysis).4 The empty node in Figure 6 is considered to be lexically empty and therefore not subject to semantic interpretation (cf. Emonds 1970). The empty subject triggers obligatory preposing of the object into subject position and passivization of the verb. Given this analysis, the problem of 3a–f for the ellipsis analysis never arises. However, an objection can be made concerning the use of empty nodes to account for deleted elements. The problem may be posed in the following way. Even though we can
account for the absence of underlying subjects in truncated passives by employing empty nodes, it is still necessary to use transformations to handle other deletion processes. For example, a rule of Complement Subject Deletion is necessary in order to distinguish the derivations of 5–6: (5)
Harry expects to be nominated.
(6)
Harry is expected to be nominated.
Figure 7

If sentences like 5–6 were to be handled entirely in terms of empty nodes, we would have to suppose that the underlying structure of both was as shown in Figure 7. Sentence 6 would be derived by NP-Preposing Harry twice, once into the subject position of S2 and again into the subject position of S1. The derivation of 5 is somewhat more problematic, since it would involve an unmotivated preposing rule which did not result in passivization of the matrix verb. In any case, such hypothetical analyses are impossible, since 5–6 are not synonymous, and thus should not have the same underlying structure. So it appears that the use of empty nodes to handle deletions is somewhat limited. In the case under discussion, they are motivated mainly by the hypothesis that all sentences have underlying SVO order. In other words, the use of the empty node in the derivation of truncated passives is not motivated empirically but rather is a way of retaining a transformational analysis of passives.
Besides the difficulties involved with the underlying structures of passives, a rule PASSIVE also creates problems with respect to the lexical entries of verbs. Most
transformationalists agree that the ability to passivize is a lexical property of verbs—disregarding for the moment the question of derived nominals (but see Chomsky 1970). Yet there is serious disagreement as to what sort of lexical feature marks this ability. The basic problem revolves around the method for excluding verbs like resemble, fit, involve, and weigh from the operation of PASSIVE, even though they take what might loosely be described as NP objects:5

(7)
a. Max resembles Harry.
   *Harry is resembled by Max.
b. The kimono fits Dotty.
   *Dotty is fit by the kimono.
c. The project involves five days of hard work.
   *Five days of hard work is involved by the project.
d. That picnic basket weighs a ton.
   *A ton is weighed by that picnic basket.
Two solutions have been proposed: one employs strict-subcategorization features (Chomsky 1965); the other, rule features (G. Lakoff 1970). The strict-subcategorization solution is based on the observation that verbs like those in 7 do not take manner adverbials freely whereas verbs that allow passivization do. In this way the strict-subcategorization feature [+____ NP ADV-MANNER] uniquely marks verbs which undergo PASSIVE. The node ADV-MANNER would be directly dominated by VP, in keeping with the constraint that strict-subcategorization rules be limited to strictly local transformations.6 Under this analysis, one expansion of the ADV-MANNER node would be the string by-∆. Yet if ADV-MANNER occurs only once as the rightmost daughter of VP, then the analysis does not account for passives containing manner adverbials other than by-∆; e.g.,

(8)
Archie was applauded with great enthusiasm (by the crowd).
This poem has been analysed brilliantly (by Brech).
The suspects’ stories were checked carefully (by the police).
In fact, the manner-adverbial analysis in its present form predicts that manner adverbials do not occur in passive sentences. As this is clearly false, it would be necessary to allow ADV-MANNER at least two consecutive expansions. This would necessarily entail the rather ad-hoc condition that if both expansions are chosen, then one and only one must be the string by-∆. The condition rules out the possibility of sentences with more than one lexical manner adverb per VP. Yet it also suggests that the string by-∆ should not be considered a manner adverb—to the extent that this string, unlike other manner adverbs, can co-occur with another manner adverb within the same VP. Even though an agentive by-phrase or a manner adverbial does not co-occur with verbs like those in 7, it remains to be demonstrated that both manifest similar syntactic or semantic behavior. As they are not in syntactic complementary distribution, there is no clear syntactic motivation for assuming they are both generated under the same deep-structure node (see also n. 6).
Contrary to the manner-adverbial analysis, some evidence suggests that the by-phrase cannot be generated only under ADV-MANNER because verbs like know, believe, think, see, consider, and hear can passivize although they cannot take manner adverbials freely (this observation is from G. Lakoff):

(9)
Lovecraft heard the music of Eric Zahn.
The music of Eric Zahn was heard by Lovecraft.
*Lovecraft heard the music of Eric Zahn with great delight.
Shirley saw the neighborhood vampire last night.
*Shirley saw the neighborhood vampire carefully last night.
As Lakoff notes, in order to handle 9 and still treat by-∆ as an adverbial under VP, it would be necessary to find another node which could expand as the by-phrase. The problem with this is that the verbs in both 7 and 9 are stative. To make the analysis work, we must distinguish between the two sets of stative verbs in terms of some adverbial which cannot freely co-occur with the former. However, even if this can be accomplished, the analysis will involve the above-mentioned ad-hoc condition on the double expansion of this second adverbial node under VP. In any case, this analysis is undesirable because of its lack of generality.
The rule-feature solution is also undesirable. According to this proposal, the lexical entries of the verbs in 7 will be marked with a rule feature [- PASSIVE]. If PASSIVE applies to a phrase marker which contains one of these verbs, the derivational history of the string will conflict with the feature in the lexical entry, and the derivation will be rejected. The weakness of this solution is that rule features are little more than ad-hoc devices for blocking ungrammatical strings which the transformational component would otherwise generate. The rule-feature analysis is no more revealing than a list of verbs which are exceptions to the passive rule. Neither explains why such verbs are exceptions.7
Given the unacceptability of both the rule-feature and strict-subcategorization proposals, there seems to be no non-ad-hoc way for PASSIVE to filter out the ungrammatical strings in 7. Assuming the filtering function of transformations (see Chomsky 1965), PASSIVE, in its various formulations, cannot be considered a totally satisfactory rule because it is not an adequate filter. This is not to suggest, however, that because a rule does not adequately filter out all ungrammatical strings, it is in principle not viable. For example, there is no justification for discarding PASSIVE because it allows the ungrammatical string John was behaved by himself, since the non-well-formedness of this string can be accounted for by an independently motivated constraint on pronominalization. Yet when a transformation yields ungrammatical output which cannot be filtered by independently motivated constraints, and when other alternatives which do not include such a transformation are possible, it seems reasonable to work on the assumption that the transformation may be unnecessary.
3 A lexical-interpretive approach to passives

Since the synonymy of active-passive pairs is not explicitly accounted for by a rule in the syntactic or semantic component, a grammar that captures the active-passive relation must include a rule that does so. In the theory to be sketched here, the synonymy of actives and passives can be specified in terms of the semantic equivalence of predicates and the semantic functions associated with them. Without going into detail, we may consider that any predicate (i.e., verb, adjective, gerundive nominal—and with exceptions, derived nominal) governs a particular set of semantic functions which is determined according to the conceptual structure of the semantic class of the predicate.8 Thus a predicate of motion carry, as in John carried the chair from the study to the kitchen, governs the semantic functions “agent” (the agency responsible for the motion), “theme” (the object moved), “source” (the location the theme is moved from), and “goal” (the location the theme is moved to). In the example, John is agent; chair is theme; study is source; and kitchen is goal. In the passive version of the example, The chair was carried from the study to the kitchen by John, the NP’s perform the same semantic functions, even though the form of the predicate has changed from active to passive and the syntactic positions of the NP’s have been re-ordered.9 From this discussion it should be clear that the two sentences can be interpreted as synonymous if the two predicates are semantically equivalent. This poses no problem for active-passive pairs because the equivalence of active and passive predicates has always been assumed. Thus a rule of semantic interpretation which marks active and passive sentences as synonymous could be stated as:

(10) Si is synonymous with Sj if (a) the predicates of Si and Sj are semantically equivalent—in which case it follows that the predicates govern the same semantic functions—and (b) each semantic function is filled by the same lexical material for either predicate.
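Rule (10) lends itself to a schematic statement; the following sketch (an illustration of mine, with an invented data format and a stipulated equivalence table) simply checks the two clauses of the rule against function-to-filler assignments like those just given for carry.

```python
# Schematic rendering of rule (10): Si and Sj are synonymous if their
# predicates are semantically equivalent and each semantic function is
# filled by the same lexical material. Representations are invented.

EQUIVALENT_PREDICATES = {frozenset({"carry", "carried"})}  # active/passive pair

def semantically_equivalent(p1, p2):
    return p1 == p2 or frozenset({p1, p2}) in EQUIVALENT_PREDICATES

def synonymous(s1, s2):
    return (semantically_equivalent(s1["pred"], s2["pred"])   # clause (a)
            and s1["functions"] == s2["functions"])           # clause (b)

active = {"pred": "carry",
          "functions": {"agent": "John", "theme": "chair",
                        "source": "study", "goal": "kitchen"}}
passive = {"pred": "carried",
           "functions": {"agent": "John", "theme": "chair",
                         "source": "study", "goal": "kitchen"}}
print(synonymous(active, passive))   # True
```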
This rule would have to apply after those interpreting quantifiers, negation, and coreference. Using rule 10 for determining the semantic equivalence of sentences, it becomes possible to approach the near-paraphrase relationship between the following: (11)
a. John bought the book from Jack.
b. Jack sold the book to John.
Examples of this sort were mentioned in Chomsky 1965 as sentences which are obviously related in meaning but which could not be transformationally related. We already know that buy and sell are predicates of motion, and that the NP’s in each sentence have the same semantic function—i.e., John is goal; Jack is source; the book is theme. Therefore the meaning difference between the two sentences is determined by the semantic fine structure of the two verbs. The predicates buy and sell, aside from belonging to the same semantic class, refer to the same event, a financial transaction. They differ in terms of the interpretation of agency, which seems to be a matter of focus.
In 11a, John is agent; in 11b, Jack. Yet each predicate entails the other. Since such entailments are part of the semantic representation of these predicates, this information is available to rules of semantic interpretation like 10. If the definition of semantic equivalence of predicates accounts for such entailment relations, then 10 should be able to account for the near-paraphrase relation between 11a–b without resorting to unmotivated syntactic analyses.
Since the theory being developed here does not include a single phrase marker from which passive and active sentences are derived, the relationship between active and passive predicates must be captured in the lexicon. For this reason, it is necessary to make a few comments on the structure of the lexicon and derivational morphology.

3.1 Lexical structure and derivational morphology

Chomsky 1970 presents a case for handling derivational morphology in the lexicon rather than in the transformational component. The analysis concerns only derived nominals and their relation to verbs. Yet it is possible to extend the analysis to cover all predicates—in particular, passive ones. In English, morphological processes are extremely productive, as shown by the following small sample:
(12)
VERB      ADJECTIVE                   NOUN
permit    permitted, permissible      permit (concrete), permission (non-concrete)
lock      locked                      lock
give      given                       gift
close     closed                      closure
paint     painted                     paint
cover     covered                     cover
present   presented, presentable      present (concrete), presentation (non-concrete)
murder    murdered                    murder
amuse     amused                      amusement
resent    resented                    resentment
foresee   (un)foreseen, foreseeable   foresight
No doubt this list could be greatly expanded if we began to consider agent nominals like murderer, the ing-class of adjectives like amusing, and other derived forms. In order to handle these morphological relationships in the lexicon, certain machinery must be built into it. Thus the general form for the lexical entries of related forms would be:10
(13)
All morphologically related forms would be listed in the same entry. The semantic representation indicates the semantic class of the item as well as the conceptual material shared by individual items. As an illustration, consider that the difference between the pairs give/loan (V), given/loaned (A), and gift/loan (N) is in each case the difference between the transferring of a physical object from one party to another, but only in the case of loan is the return of the object implied. This difference is independent of the syntactic form of these pairs. Listing each syntactic form separately in the lexicon is uneconomical, and fails to capture the fact that related lexical items of different syntactic categories may contain the same semantic material.11 In addition to 13, the lexicon includes two kinds of rules (following Jackendoff 1969) to account for the syntactic and semantic properties of related forms. Aside from strict-subcategorization features, the syntactic features include morphological rules which specify the root and affix of the derived form and its syntactic category. For example, the lexicon would include a morphological rule for deriving gerundive nominals:

(14)
Mgn: @+ing is analysed as an N (where @ signifies the root).12
Semantic derivation rules specify the range of interpretation of the derived form. Thus the semantic derivation rule corresponding to Mgn appears as: (15)
Sgn: N=“act of V-ing.”
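For concreteness only (the representation below is invented, not the paper’s), rules like (14) and (15) can be thought of as paired operations over a root: one builds the derived form and its category, the other fixes its range of interpretation.

```python
# A possible encoding of the lexicon machinery: the morphological rule Mgn
# builds the derived form and its category from the root (@), while the
# paired semantic derivation rule Sgn supplies its range of interpretation.
# Orthographic adjustment (give+ing -> giving) is deliberately ignored.

def m_gn(root):                              # Mgn: @+ing is analysed as an N
    return {"form": root + "ing", "category": "N"}

def s_gn(root):                              # Sgn: N = "act of V-ing"
    return f"act of {root}ing"

entry = {"root": "give", "derivations": [(m_gn, s_gn)]}

for morph_rule, sem_rule in entry["derivations"]:
    derived = morph_rule(entry["root"])
    derived["interpretation"] = sem_rule(entry["root"])
    print(derived)
# {'form': 'giveing', 'category': 'N', 'interpretation': 'act of giveing'}
```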
Such rules would be part of the semantic interpretation of a particular item. As an example, consider the lexical entry for the items give (V), given (A), and gift (N):13
(16)
This entry shows that these items share the entailment of motion and the conceptual structure which refers to moving a physical object to a particular location. They all govern the same semantic functions. They differ only in syntactic function and range of interpretation, as the semantic derivation rule Snom shows. According to the lexical entry the verb and the adjective have the same semantic representation and therefore the same range of interpretation, because no semantic derivation rule is associated with either form. In short, their semantic interpretation is non-distinct from that of the lexical entry itself. They differ in that the adjective entails the morphological rule “Ma: @+ed is analysed as an A”, whereas the verb entails none. The phonological representation of the noun gift is suppletive and must therefore be listed. Specifying the semantic functions governed by these items is superfluous, since this is already accounted for by designating the entry as MOTIONAL.

3.2 The active/passive relation

The lexical machinery discussed above is certainly suitable for expressing the semantic equivalence of passive and active predicates. Such pairs differ according to syntactic category, strict subcategorization (which follows from the first distinction), and selectional restriction. Following 13, the lexical entry for an active/passive predicate pair appears as:

(17)
Here Mpass signifies the morphological rule which generates passive forms. No semantic derivation rule distinguishes one form from the other. Their semantic representations are identical. Neither 16 nor 17 as presently formulated solves the problem of redundant mirror-image selectional restrictions. Without a passive transformation, it seems that the selectional restrictions for V-ACTIVE and V-PASSIVE, i.e., [NPx ___ NPy] and
[NPy ___ by NPx] respectively, must both be listed in the lexical entry. However, this need not be the case if a redundancy rule is used to handle mirror-image selectional restrictions. To a certain extent, a passive transformation is one way of stating such a redundancy. It is suspect, however, for the reasons discussed above. We cannot specify the redundancy by stating that a verb which has a selectional restriction [NPx ___ NPy] entails the existence of another verb form which has a selectional restriction [NPy ___ by NPx], since it is not true (cf. 7). Yet the redundancy is statable in the following way:

(18)
This correctly predicts that if there is a passive verb in the lexicon, then there is also an active counterpart. This resolves the problem of 7. Moreover, 18 can be restated as a redundancy condition:

(19) Mpass entails the existence of an active verb where [V-ACTIVE NPx ___ NPy] if [V-PASSIVE NPy ___ by NPx], and V-ACTIVE and V-PASSIVE are semantically equivalent.14
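Stated this way, the condition is directional, and that directionality is easy to see in a sketch (again my own, with invented frame notation): the presence of a passive entry licenses the prediction of an active counterpart with the mirror-image frame, but nothing runs the other way.

```python
# Sketch of redundancy condition (19): a passive form in the lexicon
# entails an active counterpart whose selectional frame is the mirror
# image of the passive one. The tuple notation for frames is invented.

def active_frame(passive_frame):
    """[NPy ___ by NPx] -> [NPx ___ NPy]"""
    np_y, gap, _by, np_x = passive_frame
    return (np_x, gap, np_y)

lexicon = {"hit-PASSIVE": ("NPy", "___", "by", "NPx")}
for verb, frame in list(lexicon.items()):
    if verb.endswith("-PASSIVE"):            # (19) applies only in this direction
        lexicon[verb.replace("-PASSIVE", "-ACTIVE")] = active_frame(frame)

print(lexicon["hit-ACTIVE"])                 # ('NPx', '___', 'NPy')
```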
Given 19 in the lexicon, one argument for a transformational treatment of passives over a non-transformational account is neutralized. Note that 19 is not a notational variant of a transformational analysis of passives, because it does not admit exceptions (cf. n. 7). Furthermore, it should be clear that Ma and Mpass are in effect the same rule (see §3.3 below). If 10, 13, and 19 taken together are to account for the cognitive synonymy of active-passive sentence pairs, then the selectional restrictions given in 19 must be stated in terms of semantic functions rather than syntactic categories (NP’s, etc.). That is, the selectional restrictions of 19 should be given as:

(20)
V-ACTIVE: [X ___ Y]
V-PASSIVE: [Y ___ X]
Here X and Y represent the semantic functions of the constituents immediately preceding and following the verb. The values of X and Y are determined by the semantic class of the verb; thus the verb send is marked [Source Theme]. Condition 19 predicts the selectional frame [Theme Source] for the passive form sent.15 Such specifications must be available because the semantic function of a constituent is not predictable from a syntactic tree configuration—i.e., semantic functions cannot be interpreted in terms of grammatical relations as defined in Chomsky 1965. Consider the distribution of semantic functions in subject position for verbs of motion: (21)
Source: give, sell, loan, send, mail, ship, hand (over)…
Goal: grab, receive, buy, inherit, take, steal, seize, accept, get, obtain
Theme: (all intransitive) roll, slide, fly, drive, bounce, fall, jump, swim…
There is also a subclass of motional verbs including move, push, force, remove, shove, and throw, whose subjects are interpreted as agents but not necessarily as having any other semantic function associated with motional predicates. For this reason, the linear order of semantic functions must be specified as a selectional frame.
Returning briefly to the question of semantic representation in the framework being developed here, it should be noted that every set of cognitively equivalent strings will have one abstract semantic representation. From this the various surface structures will be derivable by a process of lexicalization.16 Thus, given the abstract lexical entry for carry (V) and carried (A), we may lexicalize in one of two directions. If the adjective is chosen, a passive sentence will be derived; if the verb is chosen, an active sentence will result. Clearly there is no reason to assume that lexicalization will define mapping relations between phrase markers—i.e., transformations. (But transformations may be necessary if their output cannot be filtered by the phrase-structure rules of the base, as in the case of questions, imperatives, etc.) More explicit formulations are necessary, but their value and feasibility are sufficiently clear.
From the above discussion, it should be apparent that the relationship between active and passive sentences can be handled interpretively and that the proposal presented here for such an account captures other generalizations as well. The remainder of this paper will be devoted to showing how passives may be generated by the phrase-structure rules of the base, and how certain problems for a transformational account disappear under this analysis.

3.3 The surface structure of passives

In general, transformational accounts of passive constructions analyse the passive predicate as a verb which is preceded by an auxiliary be and followed by a prepositional phrase by NP. This is by no means a necessary conclusion. Instead, we could analyse the passive predicate as an AP—in which case what we have been calling the passive verb would be labeled an adjective. The claim that passive predicates are adjectives is supported in part by the considerable overlap in distribution of adjectives and passive predicates. Consider the following parallelism of structure:
(22)
[NP DET-A-N NP]
  PASSIVE PREDICATE: the locked door; the acclaimed speaker
  ADJECTIVE: the unintelligible solution; the enthusiastic linguist
[S NP-be [AP A-PP AP] S]
  PASSIVE PREDICATE: The door was locked by Sam. The speaker was acclaimed by the senator.
  ADJECTIVE: The solution was unintelligible to Max. The linguist was enthusiastic about lexicalism.
[NP DET-N-AP NP]
  PASSIVE PREDICATE: the door locked by Sam; the speaker acclaimed by the senator
  ADJECTIVE: the solution unintelligible to Max; the linguist enthusiastic about lexicalism
Under an analysis which distinguishes passive predicates from adjectives, it is an accident that both appear in the same surface positions—and furthermore are moved there by the same transformation, Modifier Shift.17 It is worth noting that, by analysing passive predicates as adjectives, we can avoid ad-hoc explanations of the special status of “passive verbs” as opposed to active verbs. That is, for verbs like hit and mention, both a lexical subject and a lexical object are obligatory unless they are passivized:

(23)
John hit Sid.
*John hit (in the park).
*John was hitting.
cf. Sid was hit (during the demonstration).
Someone mentioned your name (to me).
*Someone mentioned (to me).
*Someone was mentioning (to me).
cf. Your name was mentioned (at the convention).
The passive predicate, like most adjectives, directly governs only one surface argument—other surface arguments being directly governed by prepositions—whereas the active predicate usually governs two (usually obligatorily). Thus, in terms of semantic function, passive predicates function like adjectives rather than verbs. Analysing passive predicates as adjectives accounts for this fact—since, in the phrase-structure rule that expands AP, PP complements of adjectives are optional. In this way, the problem of the unrecoverability of the by-phrase deletion rule never arises. Additional support for this analysis comes from examples where there is a clear morphological distinction between adjectives and passive predicates:
(24)
ADJECTIVE   PASSIVE PREDICATE
open        opened
empty       emptied
If both forms could not occur in the same syntactic contexts, then there would be some justification for assuming a categorial distinction—i.e., labeling one an adjective and the other a verb. This is clearly not the case. Compare 22 with the following:
(25)
a. The door open at 5:00 was closed at 6:00.
   The door opened by Jim was closed by Paul.
b. The punch bowl emptied during the concert was filled again with orange juice.
   The punch bowl empty at midnight was used as an ashtray.
c. The door was opened by Jack.
   The door was open at 5:00.
   The bottle was emptied by Al.
   The bottle was empty at midnight.
Even when there is a clear morphological distinction between adjectives and passive predicates, there is no syntactic motivation for making the distinction a categorial one. It could nonetheless be argued that passive predicates are not adjectives because they do not occur in constructions where we usually find adjectives; e.g.,

(26)
a. Harriet seems intelligent.
b. That boy is very lucky.
c. The sweepstakes winner was extremely happy.
While it is true that passive predicates usually cannot occur as the complement of seem, or take degree modifiers like very and extremely, it is also a fact that certain adjectives may not occur in these constructions either: (27)
a. *The theory seems unpublished.
   *The child seems helped.
b. *The lizard was very dead.
   *Sam was very arrested.
c. *You are extremely next.
   *The chef was extremely congratulated for his fine performance.
The constructions NP seem A, NP be very A, and NP be extremely A which analyse 27a–c are syntactically well-formed. Thus it seems appropriate to rule out 27 on the basis of the internal semantic structure of lexical items rather than their external syntactic form. In other words, certain adjectives may not co-occur with seem or degree modifiers because
the conjunction of the two results in semantic anomaly. Given the present state of our knowledge about the world, it is inappropriate to talk about the degree to which one congratulates someone for something. Since events are not conceived of in terms of degrees, predicates referring to events will not take degree modifiers—although they can take frequency modifiers:

(28) The radical was … arrested.
It is also inappropriate to speak of the degree of a temporal sequence (e.g., next, former, previous) or of an absolute condition (e.g., dead). An explanation of the anomaly of 27a is somewhat more elusive. It appears that the adjectival complement of seem must describe a quality or condition which is accessible to direct observation. This follows from the fact that, given the sentences John seems sick and John seems to be sick, the former is inappropriate in a context where the speaker is reporting something that he has not directly observed. Thus 27a is anomalous because we cannot tell that a theory is unpublished, or that a child is helped, from direct observation of the theory or the child. This stands in sharp contrast with non-anomalous cases like The theory seems consistent and The child seems overdressed, where the quality of consistency and the condition of being overdressed are accessible to direct observation. Semantic distinctions also account for cases like 29, where the two distinct morphological forms of 24 do not occur in the same syntactic frames: (29)
a. The bottle was being {emptied/*empty} when I arrived.
   (cf. The bottle was {emptied/empty} when I arrived.)
b. The carefully {emptied/*empty} bottle had a small crack in it.
c. The bottle was {emptied/*empty} by the waiter.
Here 29a shows that the morphological distinction between the two forms correlates with the semantic distinction stative/non-stative to the extent that only one form can co-occur with progressive aspect. It follows naturally that the stative adjectives empty and open do not take manner adverbs like carefully (29b), or agentive by-phrases (29c), whereas the non-stative forms emptied and opened can. These data show that the morphological difference between the pairs of 24 reflects a semantic distinction, but not necessarily a categorial one. Moreover, there are adjectives like foolish and noisy which can function as either statives (30b) or non-statives (30c):

(30)
a. The {foolish/noisy} child insisted on buying a drum.
b. The child seems {foolish/noisy}.
c. The child is being {foolish/noisy} to get attention.
Passive predicates also demonstrate this sort of ambiguity; e.g., (31)
a. The chicken was cooked. Harry was annoyed. Some linguists are inspired.
b. The chicken was being cooked. Harry was being annoyed (by Marsha). Some linguists are being inspired (by logicians).
c. The chicken seems cooked, but it might still be raw near the bone. Harry seems annoyed at Marsha. Some linguists seem inspired.
Thus there is no justification for claiming a categorial distinction for 24, but a semantic distinction for the adjectives in 27.18 It is worth noting that, in German, this distinction between stative and non-stative is clearly reflected in the use of the auxiliaries werden and sein. For German, the following paradigm holds: (32)
Der Brief ist geschrieben. (stative)
Der Brief wird geschrieben. (non-stative)
Der Brief ist geschrieben worden. (non-stative)
Der Brief wird geschrieben von Max.
*Der Brief ist geschrieben von Max.
Thus for German there is little reason to analyse geschrieben as a verb when it occurs with werden, but as an adjective when it occurs with sein. The adjectival character of geschrieben is readily apparent in such constructions as der geschriebene Brief and der von Max geschriebene Brief. The adjectival character of passives in French, Spanish, Rumanian, and Russian manifests itself in the fact that, like adjectives in those languages, passives are inflected to agree in number and gender with their surface subjects. The decide on class of verbs poses another problem for the analysis that claims passive predicates are adjectives, because the great majority of English adjectives are not associated with specific prepositions. Even so, the adjectival analysis can be argued on the grounds that some English adjectives do take specific prepositional adjuncts; e.g., (33)
McCawley is fond of Chinese food.
Benzer is content with his flies.
Furthermore, adjectives with prepositional adjuncts do occur prenominally, as in the uncalled for reprisals, the most talked about film of the year, or the agreed upon sum. Those instances where adjectives with prepositional adjuncts do not occur in prenominal position are part of the more general problem posed in n. 18. The most serious drawback to analysing passive predicates as adjectives involves double-object verbs such as give. For these verbs, passives may be followed directly by NP; e.g., (34)
The turtle was given an ear of corn.
If given is an adjective here, then the phrase-structure rule that expands AP must allow for an optional NP complement. The rule would look something like (35)
AP→A (NP) (PP)…
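What it means for a string to be licensed by this expansion can be made concrete with a small sketch. The following is my own illustration, not part of the original argument; it simply checks whether a sequence of daughter categories under AP matches an expansion of 35:

import re

# Rule (35): AP expands to A with an optional NP and an optional PP.
AP_RULE = re.compile(r"^A( NP)?( PP)?$")

def licensed_by_35(daughters):
    """True if the AP's daughter sequence is generable by rule (35)."""
    return bool(AP_RULE.match(" ".join(daughters)))

print(licensed_by_35(["A"]))              # True: e.g. "intelligent"
print(licensed_by_35(["A", "NP"]))        # True: e.g. "given an ear of corn" (34)
print(licensed_by_35(["A", "NP", "PP"]))  # True: NP and PP complements together
print(licensed_by_35(["PP", "A"]))        # False: not an expansion of (35)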
Even though this might seem unusual, there are other instances in English which suggest that this expansion is necessary. In addition to 33 above, consider the following:19 (36) a. This is too difficult a problem to give to a beginner. *This is a too difficult problem to give to a beginner.
b. Harold is so obnoxious a person that not even his analyst can stand listening to him. *Harold is a so obnoxious person that not even his analyst can stand listening to him.
Assuming that fond of and content with are directly dominated by the node A, then the examples in both 33 and 36 contain surface structures which can be filtered by 35—in which case they could conceivably be generated by the base rules. (Note that only predicate adjectives take NP complements.) Thus, with a few minor adjustments—which seem to have some independent motivation—the analysis of passive predicates as adjectives creates no major problems. One of the strongest arguments against directly generating passive sentences with the phrase-structure rules of the base involves the restrictions on the selection of the passive auxiliary be+en. The argument (cf. Chomsky 1957) is based on the assumption that a phrase-structure grammar contains the following rules: (37)
VP→VERB+NP
VERB→AUX+V
AUX→C (M) (have+en) (be+ing) (be+en)
Here C is expanded as context-sensitive verb agreement morphemes, and M as the various modal auxiliaries. Given 37, the choice of be+en in the AUX—which must result in the generation of a passive sentence—is very heavily restricted, and is therefore unique among the AUX-phrase elements. The selection of be+en must be subject to the three ad-hoc restrictions in 38 if the grammar is to account for 39: (38)
a. Can be selected only if it is followed by a transitive V.
b. Cannot be selected if NP follows V, even if V is transitive.
c. Must be selected if a transitive V is followed by a PP of the form by-NP.
(39)
a. *The explosion was occurred at noon.
b. *Honesty is praised Shoplifters Anonymous.
c. *Max writes by books.
Since mirror-image selectional restrictions are handled by a redundancy rule in the lexicon (see above), the additional argument that the choice of be+en in a phrase-structure grammar requires stating selectional restrictions twice does not apply. The argument concerning the passive auxiliary is, of course, viable only if be+en is generated as an element of the AUX phrase. Given that passives are adjectives, however, the auxiliary be can only be analysed as the copula, because all predicate adjectives take a copular verb. Under this analysis, the passive be would be considered a main verb rather than an element of the AUX. Thus the need for the ad-hoc restrictions of 38 disappears: 39a is ruled out because occurred as an adjective does not exist in English; 39b is blocked because the adjective praised is not marked as taking NP complements; and 39c is starred because no verb in English takes two non-conjoined agents.20
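To make the point concrete, the following sketch (my own illustration, not from the original text) enumerates the expansions of AUX licensed by the rule in 37. Because each optional element is chosen freely, nothing in the rule itself conditions the choice of be+en on the transitivity of the following V or on the presence of a by-NP; the restrictions in 38 must therefore be stipulated separately:

from itertools import product

# Optional elements of AUX in (37): AUX→C (M) (have+en) (be+ing) (be+en)
OPTIONAL = [("M",), ("have", "en"), ("be", "ing"), ("be", "en")]

def aux_expansions():
    """Enumerate all 16 expansions of AUX licensed by rule (37)."""
    for choices in product([False, True], repeat=len(OPTIONAL)):
        seq = ["C"]
        for chosen, elems in zip(choices, OPTIONAL):
            if chosen:
                seq.extend(elems)
        yield "+".join(seq)

for expansion in aux_expansions():
    print(expansion)  # C, C+M, ..., C+M+have+en+be+ing+be+en

# Every expansion containing be+en is generated regardless of the V that
# follows, so (38a-c) have to be added as ad-hoc conditions to block (39a-c).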
3.4 Arguments for a passive transformation
Aside from the passive auxiliary argument, there are two other standard arguments for PASSIVE—one concerning its cyclic interaction with a rule of Raising, the other concerning PASSIVE and idiom chunks. The former involves sentences like (40)
The bagel was believed by Oscar to have been eaten by Max.
It is argued that, with respect to the derivation of 40, if Raising is a rule, then PASSIVE must be too. The motivation for Raising is based on the assumption that the operation of PASSIVE is subject to a clause-mate condition. This assumption is unnecessary, given Chomsky’s tensed-S condition (1973). Alternatively, the underlying structure of 40 could be Figure 8—in which case the derivation would not require the cyclic interaction of PASSIVE and Raising (cf. Kimball 1972).
Figure 8
Instead, Figure 8 yields 40 from the application of Subject Raising and VP-Extraposition—i.e., the same operations that apply to infinitival subject complements of seem. The idiom-chunk argument is based on the claim that idiom chunks like close tabs and headway do not occur freely as NP's, but nonetheless show up as surface subjects in passives. A rule PASSIVE would account for such restrictions on occurrence. However, it is not clear that the occurrence of idiom chunks in NP's is so severely restricted. Consider the following: (41)
a. Our lack of headway on the project displeased John.
b. John was pleased by the headway that we had made on the project.
c. John approved of the close tabs that the police kept on the radicals.
In each example, the idiom chunk occurs as the lexical head of its NP. Given the standard analysis of relative clauses, the idiom chunk would have to be inserted in the base as an NP—contrary to what is assumed for the passive cases. It may be that the restriction on idiom chunks is semantic rather than syntactic—i.e., that they must be interpreted as arguments of particular predicates, not that they must occur in a particular position in underlying structure. In any case, the idiom-chunk argument is a weak one because of the complicated behavior of idioms in general. That is, some idiom chunks occur as surface subjects of passives, while others—like bucket in kick the bucket—do not; and even the former may not be moved by Tough-Movement—e.g., *Advantage is easy to take of Max. Note too that, in 41c, close tabs takes a determiner the, but this is impossible in passives—*The close tabs were kept on the radicals. In general, it does not seem advisable to try to justify the existence of a rule on the basis of constructions whose behavior is so idiosyncratic (Chomsky 1970 makes exactly the same point with respect to derived nominals).
4 Conclusion
The argument against a passive transformation presented above has been motivated by two independent hypotheses: (1) Emonds' idea that PASSIVE is a structure-preserving rule—that its output can be filtered by the phrase-structure rules of the base; and (2) Jackendoff's suggestion that semantic functions do not change under transformation. This entails that the specification of semantic functions (unlike grammatical relations) must be independent of tree configurations, which do change—often quite radically—under transformations. This means that semantic functions should be interpretable from surface structures. If so, then structure-preserving transformations like PASSIVE are unnecessary, because the rules of semantic interpretation should be applicable to the output of the phrase-structure rules—this output being identical to that of the structure-preserving transformations. In the preceding pages, I have tried to show how this argument works, and how it undermines the arguments against generating passives in the base. Clearly the non-transformational account of passives eliminates the theoretical problems posed by a transformational account. Moreover, it provides some motivation for stronger constraints on the notion "grammatical transformation"—which is one of the goals of linguistic theory. If the argument against a passive transformation is viable, it should be possible to make similar cases against all structure-preserving rules. Thus only non-structure-preserving (root) transformations like Topicalization, Subject-Verb Inversion, and the like would qualify as grammatical transformations. At present, this is merely a conjecture; yet it has the potential of becoming a strong working hypothesis.
Notes
* I am indebted to Charles Bird, Fred Householder, Roger Lass, Jim McCawley, Terence Moore, and Robert Stockwell for their comments on various versions of this work. I have also profited from discussions with Edward Klima and Tim Shopen. None of the abovementioned necessarily agree with the views expressed here, nor are they responsible for errors of interpretation.
1 This position is discussed in detail below. See Chomsky 1957 and Postal 1964 for further discussion. 2 However, this is not true for case grammars where active and passive sentences are derived from a common abstract structure. It is not clear that case-grammar analyses avoid the problems encountered by other transformational approaches to passives (see below and n. 5). 3 This assumes that the passive auxiliary is not inserted by PASSIVE, but such an assumption is not necessary. Under the two-rule analysis of PASSIVE, no new NP node structure is transformationally introduced. In this way PASSIVE is "structure-preserving" in that its output can be filtered by the PS rules of the base (see Emonds for an insightful discussion). A more detailed presentation of this analysis is given below. 4 Whether or not the passive auxiliary be is present in underlying structure is not crucial to the following discussion. 5 Given a case grammar, these examples can be ruled out by ad-hoc rule features; or by assigning special cases to the NP's, and then claiming that only the rule that fronts the subject (whatever its case) can apply. 6 Chomsky (1965:105) has suggested that this is motivated by the non-ambiguity of The boat was decided on, which has the sense "Someone chose the boat" but not "Someone decided while on the boat." He also suggests that this constraint accounts for the pair:
(a) The job is being worked at quite seriously.
(b) *The office is being worked at quite seriously.
Here at the job is directly dominated by VP, but at the office is directly dominated by S. Therefore only the former will be subject to PASSIVE, because of the strictly local nature of strict-subcategorization rules. Unfortunately this analysis does not work, because the PP at the office will be dominated by ADV-PLACE. As such it could be generated as a sister node of VP or a sister node of V, according to Chomsky's base rules (1965:106–7). In the latter case it would be subject to PASSIVE; in the former it would not. Thus the strict-subcategorization analysis of PASSIVE requires additional ad-hoc restrictions on ADV-PLACE in order to block sentence (b), in the event that at the office is generated under VP rather than S. Furthermore, G. Lakoff has demonstrated that, in some sentences, manner adverbs can follow locative adverbs or directional adverbs; e.g.,
(c) John remained in England patiently.
(d) Fran dashed into the room enthusiastically.
The manner-adverbial analysis of PASSIVE predicts incorrectly that (e) and (f) are grammatical passives in English:
(e) *England was remained in patiently by John.
(f) *The room was dashed into enthusiastically by Fran.
7 The use of rule features creates problems for determining the status of counterexamples. Given such features, it is easy to treat counter-examples to a particular analysis as exceptions, and difficult to distinguish between the two in a number of cases. In any event, a rule-feature analysis makes the weakest claims about syntax, and for that reason is least
preferable. Such statements raise issues which would be best pursued elsewhere; for additional arguments against rule features, see Kayne 1969, Bresnan 1972. 8 The analysis of semantic functions adopted here is that of Gruber 1965—although there are serious problems with that analysis (see Freidin 1975a for discussion). 9 This strongly suggests that semantic functions (as opposed to grammatical relations, as defined in Chomsky 1965) are not affected by transformations. 10 This scheme for a lexical entry is similar to that set forth by Stockwell, Schachter & Partee (1968:937–9). 11 This fact, and the fact that each predicate regardless of syntactic form governs the same semantic functions, suggest a way to investigate the possibility that "the categories N, V, A are the reflection of a deeper feature structure, each being a combination of features of a more abstract sort" (Chomsky 1970:199). 12 At this point, there is no evidence for or against rules like (a) or (b) to account for cases like promise (V) and promise (N):
(a) Mv: @+ø is analysed as a V.
(b) Mn: @+ø is analysed as a N.
For some evidence that gerundive nominals are nouns, see Emonds. 13 Obviously the semantics of give is more complicated than the discussion suggests. Consider
Harry gave his wife {a beating/a cupcake/a kiss/a gift}.
It is unclear whether the set of NP's in braces have the same semantic function, or whether give in each case is the same verb. Only with cupcake—and, marginally, kiss—is Dative Shift acceptable for me. Only the former could occur as surface subject in a passive. Furthermore, additional semantic information must be provided for gift in order to characterize its meaning adequately. Where the semantic relatedness between lexical items cannot be predicted by a general rule of interpretation, such information will be simply listed in the lexicon. 14 It is assumed that any predicate which can co-occur with a by-phrase (excluding time and place adverbials like by 5 o'clock and (near) by the tree) is a passive predicate. This excludes cases like born, realized, and reincarnated, as in
(a) Max was born on November 15.
(b) /p/ is realized as f preconsonantally.
(c) Moses was reincarnated as a butterfly.
(The latter two examples were called to my attention by Jim McCawley, personal communication.) Such predicates must be categorized as predicate adjectives if ad-hoc rule features are to be avoided. 15 It may not be necessary to include the other semantic functions—i.e., those governed directly by prepositions in PP's, since these can be interpreted in terms of the semantic content of the preposition and the range of interpretation it takes when governed by the main predicate. The linear ordering of PP's in most instances seems to be a matter of stylistics. See Freidin 1970 for discussion. 16 See Bartsch & Vennemann 1972 for a discussion of comparative adjectives along similar lines. 17 Present participles (V-ing) have a similar, but more restricted, distribution. Because the evidence is not as straightforward as in the case of passives, I take no stand on the proposition that such participles are adjectives. 18 By claiming that passive predicates are adjectives, we confront the problem of adjectives that cannot occur in prenominal position—of which the following is a representative sample: *the chased criminal… *the yelled at boy… *the owned mansion… *the told story… *the killed man… *the followed speech… *the preceded event…
However, this sample is no more a problem for the framework being developed here than it is for one that includes rules like PASSIVE and Modifier Shift. Regardless of whether adjectives are derived from reduced relative clauses, there is no a-priori reason why these examples should be ruled out on syntactic grounds. Whatever the solution to this problem, it will have to account for the difference between the murdered man and *the killed man. It seems unlikely that such distinctions can be made in terms of general syntactic features. 19 These examples are discussed in Bowers 1968. 20 The constraint is probably even more general—e.g., that no verb can govern two identical non-conjoined semantic functions. This rules out *John gave Mary a book to Bill, which is syntactically well-formed (NP V NP NP PP, cf. John gave Mary a book for her birthday), but where both Mary and Bill will be interpreted as goals.
13 Review of Ideology and Linguistic Theory: Noam Chomsky and the deep structure debates by Geoffrey J. Huck and John A. Goldsmith
Huck and Goldsmith's book, Ideology and Linguistic Theory (henceforth ILT), presents a new perspective on the generative semantics distraction in the history of modern generative grammar (roughly 1967–75). During this period, a group of transformational grammarians (principally George Lakoff, James McCawley, John Robert Ross, and Paul Postal) attempted to assume leadership of the field by developing a program for semantic research (called "generative semantics," henceforth GS) based on a critique and revision of the theory of syntax in Chomsky (1965, henceforth Aspects), thus presumably in opposition to Chomsky's own research program. At the outset, GS generated a great deal of interest and excitement, but by the mid-1970s it had more or less disappeared from sight. ILT, like other historical accounts of the period (Newmeyer 1980, 1986; Harris 1993), repeats the familiar tale of an intense and acrimonious debate between proponents of GS and Chomsky over the existence of a linguistic level of deep structure. However, unlike Newmeyer and Harris, Huck and Goldsmith argue that the debate was inconclusive and therefore cannot account for the demise of GS. Instead, they argue that external factors, and not the disconfirmation of its central claims, were responsible for the failure of the GS research program. Nonetheless, the current consensus among theoretical linguists, they claim, is that GS "was falsified," in spite of the fact that its central claims are "by and large still accepted" (p. 92).
…significant chunks of what were evidently standard Generative Semantics analyses began to reappear in the syntax literature shortly after the movement's demise and are now often regarded as constituting preferred solutions to contemporary problems. Indeed, the picture of grammar presented in much contemporary work, including, for example, Chomsky's Lectures on Government and Binding (1981) and Knowledge of Language: Its Nature, Origin, and Use (1986), is similar enough in certain crucial respects to that painted by Generative Semantics in the late 1960s that one who accepts the standard story must be prepared to explain why criticisms of the latter do not also apply to the former. [p. 2]
This apparent paradox provides the basis for their conclusion that contemporary linguistic theory is governed by ideology. If their view is accurate, then ILT provides a case study
of irrationality in what is supposed to be a field of rational inquiry (cf. Kuhn 1962; Feyerabend 1975) and therefore is of interest beyond the history of generative grammar. While some of the material in ILT does recreate the flavor of the period—in particular, the bits of correspondence between Chomsky and others, and the interviews with some of the most prominent researchers of the period (Jackendoff, Lakoff, Ross, and Postal) included in an appendix that makes up half of the book—the major claim of this study seems easily refutable, even if we grant the assumption that the central claims of GS were not disconfirmed (but see Newmeyer [1996] and the references cited by him for the contrary view). The basis for this claim is weak at best, including unmotivated interpretations and a general lack of solid evidence. GS arose as a particular articulation of the Aspects model, which was the first of Chomsky's works to discuss the interface between syntax and semantics in any detail. "The earliest recognizably GS works were attempts to fill gaps and rectify defects in the Aspects conception of syntax while preserving most of its framework" (McCawley 1995:311). Yet ILT discusses Huck and Goldsmith's view of the 1967 GS program five pages before the Aspects program, giving a rather misleading impression of how the two research programs were related. In chapter 2, the discussion of competing research programs is organized around sketches that are intended to identify the core propositions of the program and its basic orientation. For example, Chomsky's initial research program, presented in The Logical Structure of Linguistic Theory (1955a, henceforth LSLT), is characterized as being "distributional" in orientation with the following core propositions:
1. The distribution of linguistic elements is determined by structural properties of the grammatical strings made up of those elements.
2. The formal (phonological, syntactic) part of the grammar is autonomous from the semantic part.
3. Structural differences among the grammatical sentences of a language can be accounted for by a set of phrase structure rules and the transformations that apply to them. [p. 14]
Huck and Goldsmith imply that LSLT is primarily concerned with the distribution of linguistic elements to the exclusion of other concerns (e.g., sound-meaning correspondences, which they designate as a "mediational" orientation). However, as Chomsky notes in his introduction to the 1975 edition of LSLT, the point of view assumed throughout was that "the grammar is to provide a means for semantic description, and should fall in place in a broader semiotic theory in which this promissory note is made good" (Chomsky 1975a:19). Moreover, "semantic criteria are essential for determining the soundness of the general approach, since the grammar is to provide the means to account for the use and understanding of language" (Chomsky 1975a:19). In LSLT, Chomsky distinguishes between the appeal to meaning and the study of meaning, the latter being "an essential task for linguistics" (p. 97). Thus it is rather misleading to characterize the LSLT program as completely unconcerned with the semantics of natural language. (See also Chomsky [1955b] for additional discussion of semantics.) The claim that the LSLT program was unconcerned with semantics appears to be reinforced by what Huck and Goldsmith list as its second core proposition (p. 14), often referred to as "the thesis of autonomy of formal grammar." Such appearances are
misleading. Chomsky’s discussions of autonomy have focused on a much narrower issue—namely, whether core semantic notions are required for the proper formulation of syntactic rules. His answer has been consistently negative, but this does not mean that semantic considerations are irrelevant to the formulation of syntactic analyses. By listing autonomy as a core proposition, Huck and Goldsmith imply that it is somehow intrinsic to the program. Yet, as Chomsky’s discussion of such hypotheses in his June 1974 lecture at the Linguistic Society of America’s Golden Anniversary Symposium (published as Chomsky 1975b) shows, a notion regarding the autonomy of formal grammar is simply an empirical issue to be resolved, not something on which the viability of the theory depends. It is also worth noting that this purported core proposition, which Huck and Goldsmith claim is carried over to the Aspects program (p. 26), was barely mentioned in the GS literature (with the exception of Lakoff’s [1972] argument concerning preposing of adverbs) and certainly was not a central part of the alleged debate between Chomsky and proponents of GS, as indicated by the fact that the discussion of the autonomy of formal grammar in Chomsky (1975b) contains no mention of arguments by generative semanticists other than that of Lakoff, which Chomsky judged as having no bearing on the autonomy thesis. Thus it is rather odd that a “core proposition” on which GS and Chomsky disagreed should not be the subject of extensive debate. It is also distinctly odd that Huck and Goldsmith do not tie the existence of deep structure to a core proposition of the Aspects program and the interpretivist research program that replaced it—if presumably this was a key issue around which the debates raged. Whether there actually were any “debates” between Chomsky and the proponents of GS is controversial. Chomsky has denied he took part in such debates, with the exception of a paper given at a 1969 Texas conference (Chomsky 1972a; see Barsky 1997:151). In chapter 3, “Rhetorical Strategies and Linguistic Argumentation,” Huck and Goldsmith cite three case studies. The first involves McCawley’s complicated analysis of respectively constructions (McCawley 1968). Chomsky commented on the manuscript McCawley sent him in a letter dated 20 December 1967, to which McCawley replied on 17 January 1968. Then, on 10 December 1968, McCawley wrote again to Chomsky, complaining that Chomsky had misrepresented his argument in his class lectures. McCawley writes again to Chomsky on 8 January 1969 after seeing a draft of the paper eventually published as Chomsky (1971) which contains a critique of McCawley’s analysis. On 12 February 1969, Chomsky responds, saying essentially that he is not going to revise his manuscript because he does not feel that he has misrepresented McCawley’s argument. In his review of Chomsky (1972b), McCawley (1975) admits to bungling his argument concerning respectively and tries to restate the argument more clearly. Hardly a debate, even a private one. The next example is weaker still. Lakoff (1972) contains an argument against deep structure based on the analysis of adverbs. Chomsky (1975b) comments on this two years later in the Linguistic Society of America Golden Anniversary Symposium paper mentioned above. The third example concerns Postal’s 1969 paper on the verb remind (Postal 1970). Chomsky comments on this analysis in his 1969 Texas conference paper (Chomsky 1972a). 
The manuscript generates an exchange of letters with Ross and with Postal. If the purpose of these case studies is to paint a portrait of Chomsky as an ideologue who defends his pet ideas (e.g., deep structure) by overwhelming his critics with powerful rhetoric, then Huck and Goldsmith are surely on the wrong track. Chomsky has
always maintained that the existence of deep structure is an empirical issue. If deep structure is part of Chomsky's linguistic ideology, then how do we explain his willingness to give it up in recent work (e.g., in Chomsky [1993], which Huck and Goldsmith cite in another context, but not to point out this highly relevant fact)? Turning to the published literature, we again find little evidence to support the view that Chomsky was engaged in an intense debate with proponents of GS. Huck and Goldsmith cite Chomsky's (1970) "Remarks on Nominalization" (written in 1967) as "what was perceived to be Chomsky's first published response to GS" (p. 28). This perception was off the mark, however, because that paper was written in response to the complicated transformational analysis in Robert Lees's (1960) dissertation on nominalizations. Lakoff's (1965) dissertation, which it has been claimed the paper was intended to refute, is only mentioned in a couple of brief footnotes. Chomsky (1971; written in 1968) is another paper that might be interpreted as an attack on GS. In fact, it does contain critiques of two arguments against deep structure, McCawley's (1968) respectively analysis and Lakoff's (1968) analysis of instrumental adverbs. In addition, it contains a general argument against "any variety of semantically-based grammar (what is sometimes called 'generative semantics')" (Chomsky 1971:198). However, the main thrust of the paper is to reevaluate "the relation of syntactic structure to semantic representation in generative grammar" (Chomsky 1971:183). Chomsky's concern here is to develop a theory of grammar that departs from a model where underlying structure is the sole interface for semantic interpretation—a characterization that applies equally to Aspects and GS. "Specifically, these modifications have to do with some possible contributions of surface structure to delimiting the meaning of a linguistic expression" (Chomsky 1971:183). The discussion of McCawley's and Lakoff's arguments, which he characterizes as "proposals of a more 'semantically-based' grammar that have been developed in important work of the past few years" (Chomsky 1971:183), arises quite naturally in the context of the paper, which attempts to sort out the contributions of underlying and surface structures to semantic interpretation. Note that almost from the outset, Chomsky was aware that surface structure played a role in semantic interpretation (see Chomsky [1965:224 n. 9], which cites examples in which the order of quantifiers at surface structure affects semantic interpretation, examples that were also mentioned in Chomsky [1957:100–101]). Although Huck and Goldsmith's evidence for a debate on deep structure between Chomsky and proponents of GS is unconvincing, it does give a clear impression that the two sides appear to have been talking past one another. In retrospect, it seems obvious that this would happen. Chomsky was committed to the psychological interpretation of grammar as outlined in the first chapter of Aspects (written in 1958–59). Chomsky comments in the introduction:
In LSLT the "psychological analogue" to the methodological problem of constructing a linguistic theory is not discussed, but it lay in the immediate background of my own thinking. To raise this issue seemed to me, at the time, too audacious. It was, however, raised in the review article [of Syntactic Structures] by Lees [1957]…. In rewriting LSLT for publication in 1958–9 I presented this alternative psychological
interpretation as the framework for the entire study, and in later work have continued to do so. [1975a:35–36]
Thus, from the late 1950s on, Chomsky was increasingly concerned with the central problem of language acquisition—of how an individual acquires rich grammatical knowledge on the basis of impoverished information. His tentative solution was that a child must be born with sufficient mental linguistic structure. It did not seem that the analyses of GS and the powerful descriptive devices they employed would lead to the kind of theory that could address the problem of language acquisition. Note also that this was probably Chomsky's motivation for attempting to provide, seven years after the fact, an alternative to Lees's (1960) transformational analysis of nominalizations. The rich transformational apparatus Lees employed was not compatible with the goal Chomsky had set for linguistic theory. GS, on the other hand, was primarily concerned with constructing a theory that could account for semantic interpretation broadly construed (including language use). Moreover, GS, for the most part, rejected psychological interpretations of grammar. Having disposed of the apparent myth of the deep structure debates between Chomsky and the proponents of GS, let us turn now to the purported paradox that motivates the title of this study. Here, too, there is a distinct lack of evidence. For there to exist a paradox of the sort proposed by Huck and Goldsmith, it should be the case that some linguist has claimed that GS was disconfirmed and that that linguist has also proposed a theory that adheres to the central claims of GS. Clearly, Chomsky does not fit the bill since he has never claimed that GS was disconfirmed. Nor are Huck and Goldsmith helpful in identifying any other linguist to whom the paradox applies. This still leaves open the question of whether the central claims of GS have been incorporated into current work on generative grammar. Huck and Goldsmith identify three core propositions supposedly unique to GS:
1. Syntactic and semantic representations are of the same formal nature, i.e., they are labeled trees.
2. Semantic representations resemble the constructions of formal logic, with quantifier scope and anaphor-antecedent relations explicitly expressed.
3. Syntactic and semantic representations are related via transformations. [p. 20]
The first and second propositions given here are identical to propositions that Huck and Goldsmith give in characterizing the "core" of the LSLT program (p. 14). (Newmeyer [1996] points out that Lakoff and McCawley rejected the first proposition a few years after the advent of GS and that the second proposition was rejected when global rules were introduced.) So, if we eliminate the autonomy thesis from the list of core propositions, GS circa 1967 appears to be an extension of the LSLT program. Note that Chomsky's assessment of the situation in 1969 was that the differences between GS and the Extended Standard Theory (EST) he was developing were not as radical as had been claimed:
There is an appearance of a considerable diversity of points of view—and to some extent the appearance is correct. However, I think that the dust is
beginning to settle, and that it is now perhaps possible to identify a number of real, empirically significant theoretical questions that have been raised, if not settled, in this work. I also think much of the apparent controversy is notational and terminological—including many issues that appear to be fundamental and have been the subject of heated, even acrimonious dispute. [Chomsky 1972a:63–64]
Chomsky sharpens this assessment by stating "…when some vagueness and faulty formulations are eliminated, I think we can show that generative semantics converges with EST in most respects" (Chomsky 1972a:74). The question to be raised about Huck and Goldsmith's three core propositions of GS is whether they were disputed by Chomsky and other linguists working within the EST framework. If not, then the fact that contemporary syntactic analysis is compatible with them tells us nothing about ideology in linguistic theory. Even a cursory reading of Chomsky (1972a) reveals that these alleged core propositions were not the subject of dispute between GS and EST. For example, consider the first and second propositions in light of the following:
More generally virtually any proposal that has been made concerning semantic representation can, in equally uninteresting ways, be reformulated so as to use phrase markers for semantic representation. In particular, this is surely true of Katz's [1967] semantic representations. It is difficult to imagine any coherent characterization of semantic content that cannot be translated into some "canonical notation" modeled on familiar logics, which can in turn be represented with properly bracketed expressions where the brackets are labeled (phrase-markers). The whole question is a pseudo-issue. [Chomsky 1972a:75]
Given that Chomsky was willing to grant that semantic representations could be given as annotated phrase markers resembling constructions of formal logic and that it was uncontroversial that phrase markers could be related by transformation (virtually the definition of transformational rule), then the third proposition would pretty much follow as a consequence. (See, for example, Chomsky's [1972a:182] analysis of quantifiers as well as the more developed analyses in Chomsky [1976], which are clearly precursors of May's [1977] quantifier raising analysis.) Thus, these core propositions are not incompatible with EST, in particular Huck and Goldsmith's characterization of the core propositions of EST given on page 43. Of course, Chomsky did not accept the specific analyses of GS that attempted to spell out these core propositions, but his objections were based on empirical considerations, not on a rejection of basic assumptions. If according to Huck and Goldsmith's account of the core propositions of GS there is no difference between GS and EST, then there should be no principled reason why GS analyses should not show up in contemporary work. However, it is far from obvious that specific GS analyses are incorporated into contemporary syntactic theory. Although Huck and Goldsmith claim at the outset that GS analyses are adopted in Chomsky (1981) and
(1986), no specific examples are given. Instead, they offer some general comments about Chomsky's proposals for a level of Logical Form (LF), which they note conforms to the core propositions of GS (pp. 44–45)—unsurprisingly, given the previous quote from Chomsky (1972a). The only specific example cited in ILT is May's (1977) analysis of quantifier raising, which is claimed to be conceptually similar to the GS analysis of quantifier lowering. What Huck and Goldsmith fail to point out is that the GS analysis, unlike May's, involves the claim that quantifiers are treated as higher predicates that must be lowered into their surface structure positions. So the claim is simply false at the level of details. Moreover, GS analyses are based on the fundamental assumption that underlying structures are semantic representations, an assumption that was rejected in EST and the work that developed from it. In EST and its descendants, underlying representations are not LF representations, and the latter are not semantic representations. At least since Chomsky (1975c), it has been assumed that LF representations are mapped onto meanings by additional semantic rules "interacting with other cognitive structures" (Chomsky 1975c:105), giving fuller representations of meaning. Given that current work makes almost no use of "semantic representations" as conceived under GS, it is difficult to maintain that such work has incorporated GS analyses of any sort. The main point of ILT is to show how linguistic theory is ideologically motivated. Since Huck and Goldsmith's case collapses so completely, primarily because their interpretation of history turns out to be so at variance with the facts, one can only wonder whether their charge is more applicable to their own revisionist history than to the history they seek to revise.
14 Linguistic theory and language acquisition
A note on structure-dependence
This commentary concerns the relation between linguistic theory and psycholinguistic studies of child language acquisition. Crain's elegant experiments ably demonstrate how linguistic theory provides concepts, categories, and data that can be fruitfully explored in language acquisition studies. The question that remains to be answered is what these studies tell us about linguistic theory—that is, about the nature of the language faculty. I will try to formulate a tentative answer on the basis of a more detailed discussion of "the parade case" for postulating innate linguistic constraints: the structure-dependence of grammatical rules. The discussion of the structure-dependence of rules concerns a speaker's knowledge of the status of the following paradigm (from the original presentation in Chomsky 1971b, pp. 25–28): (1)
a. The dog in the corner is hungry.
b. Is the dog in the corner _ hungry?
(2) a. The dog that is in the corner is hungry.
b. Is the dog that is in the corner _ hungry?
c. *Is the dog that _ in the corner is hungry?
(In the question examples the italicized copula is interpreted in terms of the underlined empty position in the sentence.) Assuming that the question sentences are derived by a transformational movement of the copula from the empty underlined positions to the sentence-initial position, the argument for the structure-dependence of rules is given as follows. A child exposed to data in (1) has at least two options for postulating a transformation to derive (1b): (3)
a. Move the first (or left-most) tensed copula to the sentence-initial position, or
b. Move the tensed copula immediately adjacent to the subject NP to the sentence-initial position.1
Option (3b) is a structure-dependent formulation because it requires a hierarchical analysis of the sentence to identify the subject NP adjacent to the copula. In contrast, (3a) is structure-independent because it relies solely on the analysis of the sentence as a linear string, that is, it makes no reference to hierarchical structure.
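The contrast between the two options can be made concrete with a small sketch (my own illustration, not from the original text; the linear word strings and the depth annotation standing in for a full hierarchical parse are expository assumptions). The two rules agree on the simple sentence in (1) but diverge on (2):

def move_first_copula(words):
    """Structure-independent rule (3a): front the left-most tensed copula."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]

def move_main_clause_copula(words, depth):
    """Structure-dependent rule (3b): front the copula immediately following
    the whole subject NP, i.e. the first copula not embedded inside it."""
    i = next(k for k, w in enumerate(words) if w == "is" and depth[k] == 0)
    return ["is"] + words[:i] + words[i + 1:]

simple = "the dog in the corner is hungry".split()
complex_subject = "the dog that is in the corner is hungry".split()
# depth = 1 for words inside the relative clause "that is in the corner"
depth = [0, 0, 1, 1, 1, 1, 1, 0, 0]

print(move_first_copula(simple))           # (1b): is the dog in the corner hungry
print(move_first_copula(complex_subject))  # deviant (2c): is the dog that in the corner is hungry
print(move_main_clause_copula(complex_subject, depth))  # (2b): is the dog that is in the corner hungry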
As (2) illustrates, more complex constructions (in this instance a subject that contains a finite relative clause) show that the structure-independent rule makes the wrong predictions. Taking (2a) as the underlying structure for (2b,c), the rule (3a) would generate the deviant (2c) and not the well-formed (2b). A child who formulates (3a) should produce sentences like (2c), presumably until explicitly corrected, or presented with sentences like (2b), which falsify the hypothesized rule. Because children do not make these errors, an innate constraint on structure-dependence must be guiding their acquisition of the question formation rule. This account is based on an empirical assumption that children encounter (or pay attention to) simple sentences prior to those with subjects containing finite relative clauses. Whether this is actually the case has not been established. If children hear and process sentences like (2b) along with the simple cases before they formulate the question formation rule, then they have been presented with positive evidence against the structure-independent rule (3a). Notice that there is yet another rule of question formation a child could hypothesize, namely (4): (4)
Move any tensed copula to sentence-initial position.
This rule differs from the structure-dependent rule to the extent that it does not directly involve an analysis of the sentence in terms of its hierarchical structure as (3b) does with its explicit mention of subject NP. It is also different from the structure-independent rule (3a) because it does not involve a counting predicate like "first" or "left-most." This counting property of the structure-independent rule explicitly ignores hierarchical structure. Sharpening the distinction concerning structure-dependence, we can take counting to be the salient property of structure-independent rules. Given this, (4) would be classified as a structure-dependent rule, because it is compatible with a hierarchical analysis of the sentence even though it does not directly refer to this analysis. Rule (4), in contrast to rule (3b), will generate the deviant (2c), so the reason children do not produce such sentences cannot be explained by a constraint on the structure-dependence of rules, as Crain's discussion suggests. The explanation might hold if there were some principle of grammar that prohibited rules of the form (4). This is unlikely, since (4) is simply a more general version of (3b), without the contextual restriction (adjacent to a subject NP). A transformational movement rule consists of an elementary operation (substitution or adjunction) that can be further specified in terms of a structural description and a structural change. The structural description indicates the special properties of a phrase-marker to which the elementary operation applies, and the structural change indicates how the elementary operation modifies the phrase-marker. The elementary operation is part of universal grammar (UG). Any language whose grammar involves movement operations involves one of these elementary operations. The structural description of a transformation is a language-specific condition on the application of the elementary operation. Current syntactic theory has dropped the statement of these language-specific structural descriptions from the formulation of transformational rules because it has been shown that the behavior of transformations can be accounted for in terms of more general
conditions on the output and application of elementary operations—conditions that are also part of UG.2 From this perspective, rule (4) is just an instance of the general rule (5): (5)
Move α,
where α stands for any syntactic category (e.g., tensed copula) and "move" is an abbreviation for the elementary operations of substitution and adjunction. More precisely, substitute α for β and adjoin α to β, where α and β are syntactic categories. In a theory in which rule (5) accounts for question formation in English, a structure-dependence constraint on the form of transformations does not explain the deviance of example (2c). Rather, there is a general principle of UG that accounts for the deviance of such constructions. Example (2c) violates the principle known as the Head Movement Constraint (Travis 1984), which restricts the movement of a lexical head to a head position that governs the phrasal projection of the head. In the structure underlying (2b, c), the head position to which the tensed copula moves governs the main clause copula but not the relative clause copula. The Head Movement Constraint also accounts for the deviance of questions in which an auxiliary verb has been moved into sentence-initial position across another auxiliary, as in (6a), in contrast to the well-formed question. (6)
a. *Have John should _ left you a copy?
b. Should John _ have left you a copy?
Since this analysis is posited at the level of UG, it is preferable to one that involves a language-specific rule like (3b), all other considerations being equal. Note that the issue of structure-dependence is still relevant, since it is important to distinguish between rule (5) and structure-independent rules like (3a) that do not occur in the grammars of human languages as far as we know. Given our characterization of the structure-independent rules above, we can eliminate the possibility of such rules by a constraint on the form of transformations that prohibits the counting property. This is achieved by restricting the structural analyses that define transformations to Boolean conditions on analyzability, a constraint that was proposed in some of the earliest work on transformational grammar.3 This restriction excludes quantificational statements that are needed to instantiate the counting property. Thus, we eliminate structure-independent rules by prohibiting quantificational statements in the formulation of transformational rules. In this way the structure-dependence of rules follows from a more specific constraint on the form of rules.4 The analysis of structure-dependence given above should be instructive concerning the relation between linguistic theory and psycholinguistic studies of language acquisition. It demonstrates why it is crucial for psycholinguistic research to be informed by current developments in linguistic theory.5 It also shows that the kinds of studies reported in Crain's target article do not tell us much about syntactic theory. For example, they do not distinguish between the analysis of question formation as given in rule (3b) as opposed to rule (5) plus the Head Movement Constraint. The relationship between linguistic theory and the kinds of language acquisition studies discussed by Crain therefore seems to be one-sided, in answer to the question raised at the outset. What these studies do show, however, is that quite young children demonstrate the kind of linguistic behavior we
would expect if our theory of the language faculty (i.e., an innate UG with very specific properties) is on the right track. In this way they contribute to our understanding of human language and for this reason they merit our careful attention.
Acknowledgment
I wish to thank Maggie Browning and Carlos Otero for helpful discussion of this material.
Notes
1 Obviously the rule of question formation applies more generally to tensed auxiliary verbs. How to specify the full range of elements that are moved need not concern us here. 2 For a further discussion, see Freidin (1978, pp. 541f.). 3 See Chomsky (1965) for discussion. See also Lasnik and Kupin (1977) for a discussion of an even more restrictive theory of transformations that eliminates Boolean conditions. 4 Note further that structure-dependence is also relevant to the formulation of general principles of UG. We do not expect to find principles of grammar that depend on counting elements in a linear string. Thus, principles at the level of UG must also be structure-dependent in the relevant sense. Because this constraint defines the general character of the language faculty, it should presumably be construed as a fact of nature rather than as an axiom of the theory. 5 This is not meant as a general criticism of Crain’s work, which is clearly informed by current work in syntactic theory.
15 Conceptual shifts in the science of grammar 1951–92*
Chomsky's work in linguistics over the past forty-one years has resulted in a number of fundamental changes in our conception of grammatical theory, its scope and practice. The way in which Chomsky's early work diverged from that of his teacher and mentor, Zellig Harris, provides one interesting and important case study. Another concerns the shift in focus from systems of rules to systems of grammatical principles, and the related shift of emphasis from derivations to representations, both of which occurred during the 1970s and crystallized in Chomsky 1981. Most recently, Chomsky has attempted to give a more general interpretation to the interaction of rules and principles in Universal Grammar (UG) in terms of language design, and has demonstrated how this interpretation constitutes a minimalist programme for linguistic theory. In addition to opening new lines of investigation, this work promises to reshape our understanding of the language faculty in essential ways.
1 The advent of transformational generative grammar
Chomsky's formal introduction to linguistics began in 1947, when he read the proofs of Harris's Methods of Structural Linguistics (1951; henceforth Methods). As Harris described the book in the introduction:
This volume presents methods of research used in descriptive, or more exactly, structural, linguistics. It is thus a discussion of the operations which the linguist may carry out in the course of his investigations, rather than a theory of the structural analyses which result from these investigations. The research methods are arranged here in the form of the successive procedures of analysis imposed by the working linguist upon his data.
The operations that Harris details in Methods are essentially taxonomic procedures of segmentation and classification.
The basic operations are those of segmentation and classification. Segmentation is carried out at limits determined by the independence of the resulting segments in terms of some particular criterion. If X has a limited distribution in respect to Y, or if the occurrence of X depends
upon (correlates completely with) the occurrence of a particular environment Z, we may therefore not have to recognize X as an independent segment at the level under discussion [footnote omitted]. Classification is used to group together elements which substitute for or are complementary to one another [footnote omitted]. (p. 367)
Harris goes on to say in a footnote that 'the class of elements then becomes a new element of our description on the next higher level of inclusive representation'. These operations yield a grammar of lists: 'In one of its simplest forms of presentation, a synchronic description of a language can consist essentially of a number of lists' (p. 376). These include a segment-phoneme list, a phoneme distribution list, several morphophonemic lists, lists dealing with type and sequences of morphemes, a component and construction list and a sentence list—the list of utterance structures.1 The purpose of this grammar of lists is to state all the regularities in a language, derived from an exhaustive analysis of a presumably representative corpus of utterances. In striking contrast, Chomsky's first major work in linguistics, his master's thesis 'The morphophonemics of Modern Hebrew' (henceforth MMH),2 virtually bypasses these taxonomic procedures and focuses instead on the construction of a grammar that describes grammatical sentences in a language, of which the linguist's analysed corpus is merely a special subclass. In this work Chomsky equates the linguistic analysis of a language with 'the process of converting an open set of sentences—the linguist's incomplete and in general expandable corpus—into a closed set—the set of grammatical sentences—and of characterizing this latter set in some interesting way'.3 He goes on to distinguish linguistic method (e.g., as discussed in Harris's book) from linguistic description.
Accordingly we might distinguish and consider separately two aspects of the linguistic analysis of a language, a process of 'discovery' consisting of the application of the mixture of formal and experimental procedures constituting linguistic method, and a process of 'description' consisting of the construction of a grammar describing the sentences which we know from step one to be grammatical, and framed in accordance with the criteria related to its special purposes. (p. 2)
In MMH Chomsky assumes that the two processes are sequential steps in linguistic analysis. Yet his work on Hebrew concerns only the second step. 'The outline of Modern Hebrew grammar given below is an example of the second step in linguistic analysis, artificially isolated' (p. 3). In attempting to motivate the distinction between discovery and description, Chomsky notes that for a linguistic grammar 'considerations of elegance' are operative in both processes.
However it will still be useful to consider the processes of discovery and description separately. For the most reasonable way to approach the
investigation and analysis of the notions of simplicity in terms of which ‘grammatical in L’ is defined (i.e., those notions of elegance that are relevant to the very formulation of the procedures of linguistics) seems to be to assume, for some language, that the grammatical sentences are fixed (i.e., that the process of discovery has been completed) and to determine the effect on grammar-formulation of explicit considerations of simplicity imposed on the grammatical statement [footnote omitted]. (p. 3) In effect, these remarks indicate how Chomsky’s initial work in linguistics is already moving in the direction of a theory of structural analyses (presumably, but not necessarily, derived from the application of linguistic method)—in contrast to Harris’s Methods, where the distinction between method and theory is made (see the first quote from Methods on page 653) but the issue of theory is not pursued. Another point where Chomsky’s early practice diverges from Harris’s method concerns a distinction which Harris sets up in section 20.21 of Methods. Harris notes that an investigator in linguistics faces a choice of purposes: to state ‘all the regularities which can be found in any stretch of speech, so as to show their interdependences (e.g., in order to predict successfully features of the language as a whole)’, or to synthesize ‘utterances in the language such as those constructed by native speakers’ on the basis of some minimal information (p. 365). The procedures presented in Methods are geared most naturally to the former purpose. Moreover, as Chomsky notes in the introduction to The Logical Structure of Linguistic Theory (1955a, published 1975; henceforth LSLT), ‘Harris did not elaborate on the suggestion that a grammar can be regarded as a device for “synthesizing utterances,” an idea that does not, strictly speaking, seem compatible with the general approach of Methods’ (p. 50: n. 45). In MMH Chomsky gives an explicit interpretation to Harris’s notion of ‘synthesizing utterances’ in terms of a generative grammar. Chomsky describes the origin of this work as follows: Harris suggested that I undertake a systematic structural grammar of some language. I chose Hebrew, which I knew fairly well. For a time I worked with an informant and applied methods of structural linguistics as I was then coming to understand them. The results, however, seemed to me rather dull and unsatisfying. Having no very clear idea as to how to proceed further, I abandoned these efforts and did what seemed natural; namely, I tried to construct a system of rules for generating the phonetic forms of sentences, that is, what is now called a generative grammar. I thought it might be possible to devise a system of recursive rules to describe the form and structure of sentences, recasting the devices in Harris’s methods for this purpose, [footnote omitted] and thus perhaps to achieve the kind of explanatory force that I recalled from historical grammar [footnote omitted]. (LSLT, p. 25)
Although the introductory section of MMH suggests that ‘the application of the mixture of formal and experimental procedures constituting linguistic method’ (e.g., Harris’s) constitutes step one in linguistic analysis, Chomsky’s actual practice demonstrated that step one could be bypassed.4 It is the implementation of ‘step two’ in linguistic analysis that leads Chomsky to the formulation of a theory of structural analyses in LSLT. Note that Chomsky’s notion of ‘synthesizing utterances’ differs significantly from Harris’s. In section 20.3 (‘Description of the language structure’), Harris characterizes linguistic analysis as follows:

The work of analysis leads right up to the statements which enable anyone to synthesize or predict utterances in a language. These statements form a deductive system with axiomatically defined initial elements and with theorems concerning the relations among them. The final theorems would indicate the structure of the utterances of the language in terms of the preceding parts of the system. (pp. 372–3)

Although the mention of ‘deductive system’ has a distinctly contemporary ring—recall Chomsky’s many discussions of the deductive structure of the current principles and parameters theory of UG5—Harris’s notion of synthesizing utterances involves an actual synthetic process, starting with phonetic units and proceeding from them to the construction of phonemes, morphemes and so on to the construction of utterances. For Chomsky, however, the notion ‘synthesizing utterances’ is interpreted analytically. Chomsky is quite explicit about this in the introductory section of MMH, in the version submitted to the University of Pennsylvania:6

the theoretical linguistic system constructed is available for descriptions of small parts of the realm of experience constituted by the totality of its individuals. In one of the most interesting cases, the part to be described is an individual language. As above, the system can be applied in two ways. One can begin with the elementary phonetic units, construct phonemes, morphemes, syntactic classes, etc., proceeding synthetically. Or one can state the most general unit (i.e., the sentence) in terms of its constituents (e.g., the particular phrases of the language), further analyze these into their constituents, etc., until finally every possible sentence is represented in terms of phonetic units, thus proceeding analytically. Again, the choice will depend on considerations of elegance and adequacy. (p. 3)

For Chomsky MMH is ‘an attempt to carry out an analytic statement of Modern Hebrew grammar’. In context, it constitutes a radical departure from the kind of linguistics Chomsky had been learning as a graduate student. We have some indication of how radical a departure this was from the fact that Chomsky himself considered the work he was doing in MMH ‘more or less as a private hobby, having no relationship to “real linguistics”’ (LSLT, p. 29).7
Adopting the analytic approach to grammar construction led Chomsky to postulate abstract underlying morphological forms, which were simply outside the range of structures adducible via Harris’s methods of segmentation and classification. Note that many of the intermediate derived forms are also beyond the range of Harris’s methods. Consider, for example, the sample derivation given in section 6, A.2, where line 0 gives the underlying morphophonemic form and each subsequent line results from applying the morphophonemic rule cited beside it (the Hebrew forms at each line are not reproduced here):

0. (underlying morphophonemic form)
1. [MR 2]
2. [MR 3]
3. [MR 8]
4. [MR 9]
5. [MR 13.1]
6. [MR 13.4]
7. [MR 16]
8. [MR 22]
9. [MR 35]
10. [MR 37]
11. [MR 44]
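As the following discussion notes, each line results from applying one rule to the previous line’s output. A toy sketch of that architecture, with invented rules and an invented underlying form (MMH’s actual MR rules and Hebrew forms are not reproduced above), might look as follows:

```python
import re

# Toy ordered rewrite rules in the spirit of MMH's morphophonemics: each
# rule applies to the output of the previous one, so rule order matters.
# The rules and the underlying form are invented for illustration; they
# are not Chomsky's actual MR rules.
RULES = [
    ('MR-x', r'\+h$', ''),   # drop stem-final h after a morpheme boundary
    ('MR-y', r'\+', ''),     # erase the remaining morpheme boundary
    ('MR-z', r'a$', 'á'),    # mark stress on the final vowel
]

def derive(underlying: str) -> str:
    """Apply the rules in order, printing each line of the derivation."""
    form = underlying
    print(f'0. {form}  (underlying morphophonemic form)')
    for step, (name, pattern, repl) in enumerate(RULES, start=1):
        form = re.sub(pattern, repl, form)
        print(f'{step}. {form}  [{name}]')
    return form

derive('kam+a+h')  # hypothetical underlying form; yields 'kamá'
```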
The derivation of kamá under this analysis involves the sequential application of a set of morphophonemic rules where the output of one rule serves as the input of another. The derivations proceed analytically from a morphophonemic representation to a phonemic representation—in contrast to Harris’s synthetic procedure, which begins with phonetic analysis and progresses to the phonemic and then the morphophonemic levels of representation, presumably without intermediate representations between the phonetic/phonemic or phonemic/morphophonemic levels. Recall that Chomsky’s morphophonemic analysis is itself embedded in a broader syntactic analysis of Hebrew sentences, sketched in the introductory sections of MMH though not elaborated there. Chomsky’s use of abstract underlying representations (in contrast to Harris) is also a distinguishing feature of his syntax.

By 1952, I was working on generative grammar of English, and shortly obtained results that I found quite exciting, though they were entirely divorced from the systems of procedures of analysis on which I was working at the same time; in particular, results on the system of auxiliary verbs in simple declaratives, interrogatives, negatives, passives, and on complex verb constructions such as ‘want (NP) to (VP)’, ‘consider NP (to be) Predicate’ [footnote omitted]. As in the case of my earlier work on morphophonemics of Hebrew, it was possible, so it became clear, to discover systems of rules that made sense of the distribution of forms,
principles that served to explain the collection of superficially chaotic and anomalous facts. In this case, too, investigation led to more abstract underlying structures that were far removed from anything that might be obtained by systematic application of procedures of analysis of the sort that I was investigating. (LSLT, pp. 30–1)

The systems of rules Chomsky is referring to here are of course those of phrase structure and transformational structure as he was then developing them.8 It is well known that Chomsky’s theory of transformations is very different from Harris’s.9 One of the most salient differences between the Harris and Chomsky theories is that grammatical transformations for Harris were essentially equivalence relations among sentences, whereas for Chomsky they were rules of grammar. Thus Harris’s first published discussion of grammatical transformations in section 2.33 of Harris 1952 uses the term in the title of the section but not in the text of that section. Instead, he discusses the procedures for establishing equivalences, providing several examples:

But what is ‘equivalence’? Two ELEMENTS are equivalent if they occur in the same environment within the sentence. Two SENTENCES in a text are equivalent simply if they both occur in the text (unless we discover structural details fine enough to show that two sentences are equivalent only if they occur in similar structural positions in the text). Similarly, two sentences in a language are equivalent if they both occur in the language. In particular, we will say that sentences of the form A are equivalent to sentences of the form B, if for each sentence A we can find a sentence B containing the same morphemes except for differences due to the difference in form between A and B. For example, N1 V N2 is equivalent to N2 is V-en by N1 because for any sentence like Casals plays the cello we can find a sentence The cello is played by Casals.10

In Harris 1957 he gives a more formal definition of what constitutes a transformational relationship:

If two or more constructions (or sequences of constructions) which contain the same n classes (whatever else they may contain) occur with the same n-tuples of members of these classes in the same sentence environment (see below), we say that the constructions are transforms of each other, and that each may be derived from any other of them by a particular transformation. (p. 147)

Moreover, this relationship is stated in terms of strings rather than constituent structure, as in Harris 1965, where he explicitly rejects constituent structure analysis in favour of string analysis. Thus for Harris, a transformation is a relation that holds between pairs of sentences.
In striking contrast, Chomsky defines a transformation as a mapping of a string with its phrase structure interpretation (i.e., a phrase-marker) onto another phrase-marker. Such mappings may iterate and thus cannot be simply relations between sentences. Furthermore, they crucially rely on constituent structure analysis, unlike Harrisian transformations. Whereas Harris uses transformations as a means for describing the syntax of texts, Chomsky employs transformational analysis to get at a more explanatory analysis of the structure of language. As he notes in Syntactic Structures, for example, ‘transformational analysis brings out the fact that negatives and interrogatives have fundamentally the same “structure,” and it can make use of this fact to simplify the description of English syntax’ (pp. 64–5). This kind of insight, which involves positing abstract underlying structures and intermediate stages of derivation, is simply beyond the power of Harris’s transformational analysis, which is restricted to relations between pairs of surface strings. Perhaps the greatest difference between Harris and Chomsky concerns their interpretation of linguistic analysis. From the very outset, Chomsky adopts a realist interpretation of grammar. Language is somehow a real phenomenon of the physical world and as such has a particular structure—i.e., it is structured one way and not just any way. Thus there is a fact of the matter for all grammatical descriptions because they make empirical claims. Harris, in contrast, does not adopt the realist position.

The methods described here do not eliminate non-uniqueness in linguistic descriptions. It is possible for different linguists, working on the same material, to set up different phonemic and morphemic elements, to break phonemes into simultaneous components or not to do so, to equate two sequences of morphemes as being mutually substitutable or not to do so. The only result of such differences will be a correlative difference in the final statement as to what the utterances consist of. The use of these procedures is merely to make explicit what choices each linguist makes, so that if two analysts come out with different phoneme lists for a given language we should have exact statements of what positional variants were assigned by each to what phonemes and wherein lay their differences of assignment. The methods presented here are consistent, but not the only possible ones of arranging linguistic description. (p. 2)11

It is obvious from this passage that for Harris there is no fact-of-the-matter in linguistic descriptions. He goes on to say that ‘the particular way of arranging the facts about a language which is offered here will undoubtedly prove more convenient for some languages than for others’. Thus they avoid the undesirable effect of forcing all languages to fit a single Procrustean bed, and of hiding their differences by imposing on all of them alike a single set of logical categories. If such categories were applied, especially to the meanings of forms in various languages, it would be easy to extract parallel results from no matter how divergent forms of speech.
Harris’s procedures ‘are merely ways of arranging the original data; and since they go only by formal distinctions there is no opportunity for uncontrolled interpreting of the data or for forcing of the meaning’. As a result, ‘the data, when arranged according to these procedures, will show different structures for different languages’. Thus Harris seems to deny the generality of his methods of analysis as part of a theory of linguistic structure. His focus here appears to be on the apparent diversity of linguistic forms rather than on some underlying unity. In rather striking contrast, consider the opening paragraph of Chapter 2, ‘The nature of linguistic theory’, of LSLT:

Descriptive linguistics is concerned with three fundamental problems. On the one hand, the descriptive linguist is interested in constructing grammars for particular languages. At the same time, he is interested in giving a general theory of linguistic structure of which each of these grammars is an exemplification. Finally, he must be concerned with the problems of justifying and validating the results of his inquiries, and demonstrating that the grammars that he constructs are in some sense the correct ones. All three of these problems will occupy us in this investigation of linguistic structure. (p. 77)

The agenda that Chomsky maps out here is clearly more ambitious than Harris’s. In addition to providing linguistic descriptions of particular languages, linguistic theory under Chomsky’s conception must provide a general theory of linguistic structure—something which Harris says his procedures will not provide. And furthermore linguistic theory must deal with the problems of justification and validation under the realist interpretation (which follows from the assumption that a grammar can be correct or incorrect). Chomsky’s realist interpretation of linguistic theory is grounded in the psychological interpretation of grammar: a grammar exists as part of the real world in the mind/brain of a speaker and constitutes that speaker’s knowledge of his/her language. Although this psychological interpretation is not discussed in Chomsky’s published writings until 1965 (in the first chapter of Aspects, written in 1958–9),12 Chomsky has noted that during the period in which he wrote LSLT (that is, circa 1955), ‘it lay in the immediate background of my own thinking’.13 However, he refrained from raising the issue at that time because it seemed to him ‘too audacious’. The earliest reference to the psychological interpretation of generative grammar occurs in the last section of Robert Lees’s review of Syntactic Structures in Language (1957). The final section (titled ‘Learning theory’) begins: ‘Perhaps the most baffling and certainly in the long run by far the most interesting implications of Chomsky’s theories will be found in their cohesions with the field of human psychology.’ Lees goes on to discuss Chomsky’s model of grammar as a scientific theory of human linguistic behaviour. He concludes the section and the review by raising the fundamental issue of language acquisition.

We come now to the point of this lengthy discussion of induction versus theory construction. Though it is possible, it is certainly not an easy task
for the psychologist to explain the mechanism by means of which a child, confronted with a vast and perplexing array of different stimuli, manages to learn certain things which can be generalized by induction from repeated occurrences. We would not ordinarily suppose that young children are capable of constructing scientific theories. Yet in the case of this typically human and culturally universal phenomenon of speech, the simplest model that we can construct to account for it reveals that a grammar is of the same order as a predictive theory. If we are to account adequately for the indubitable fact that a child by the age of five or six has somehow reconstructed for himself the theory of his language, it would seem that our notions of human learning are due for some considerable sophistication.

To Chomsky it seemed obvious that language acquisition is based on the child’s discovery of what from a formal point of view is a deep and abstract theory—a generative grammar of his language—many of the concepts and principles of which are only remotely related to experience by long and intricate chains of unconscious quasi-inferential steps.14 Furthermore, Chomsky assumed that much of this theory must be innate in the mind/brain of the language learner.

A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation, and emotional state, over wide ranges of variation, leave little hope that much of the structure of language can be learned by an organism initially uninformed as to its general character.15

The psychological interpretation of grammar leads to a clarification of the notion ‘language’—though it should be noted that this clarification does not appear in print until Chomsky 1986a.16 The psychological interpretation concerns the internalized grammar of a speaker, a structure of the mind/brain—what Chomsky calls an I-language, where ‘I’ is intended to suggest internal, individual and intensional. In contrast, structural and descriptive linguistics is concerned with a rather different notion of language, ‘as a collection of actions, or utterances, or linguistic forms (words, sentences) paired with meanings, or as a system of linguistic forms or events’17—what Chomsky calls an E-language, where ‘E’ is intended to suggest external and extensional. The latter notion has no status under the psychological interpretation.18 In Chomsky’s earliest work on generative grammar, before he explicitly adopted the psychological interpretation, the discussion of the concept ‘language’ tends in the direction of E-language—not surprising for someone trained in the tradition of structural linguistics. Consider, for example, the way Chomsky defines language and grammar in LSLT.
We define a language to be a set (in general, infinite) of strings in a finite alphabet, each string being of finite length. We define a grammar of the language L as a finite device which generates all and only the strings of L in a determinate way. (p. 71)19

Given such definitions, a primary goal of linguistic analysis is to construct a system of rules from which the language can be derived. This assumes that a language is in some sense a well-defined object—a false assumption simply because there is no way independent of the grammar postulated by the linguist to determine exactly all and only the grammatical sentences of a language. Under the psychological interpretation, grammar construction is subject to a further constraint. In addition to accounting for linguistic phenomena, a grammar must meet a criterion of learnability. Thus the system of rules in a grammar must plausibly be learnable given the primary language data available to a child learning a first language. Given that the theory of grammar under the psychological interpretation constitutes a theory of the language faculty, the innate mental structure a child employs in grammar construction, the more linguistic phenomena that can be derived from the theory of grammar, the less that needs to be accounted for by constructing language-specific rules. Therefore, the psychological interpretation leads to a shift in focus from rule systems for particular languages to the more general properties of the language faculty. What happened in actual practice in the development of generative grammar was that investigations of general constraints on rule systems led to a significant simplification of the rule systems themselves, as will be discussed in the following section.

2 From rules to principles

In LSLT Chomsky has two related goals: to develop a general theory of structural analyses and to show how this theory applies to natural languages. The initial work on generative grammar is therefore focused on rule systems for the description of language phenomena. Thus LSLT is organized with a chapter on phrase structure analysis followed by one on the phrase structure of English (Chapters 7 and 8 respectively) and a chapter on transformational analysis followed by one on the transformational analysis of English (Chapters 9 and 10 respectively). The focus on rule systems predominates in the early work on generative grammar into the early 1960s (especially in Syntactic Structures and Chomsky’s contribution to the 1958 Texas Conference on Problems of Linguistic Analysis in English, published in 1962).

2.1 A brief history of the passive transformation

Consider, for example, the analysis of passive sentences and the passive transformation, perhaps the most central transformation within early transformational analysis. In SS we find the following formulation:
(1) Passive—optional:
Structural analysis: NP—Aux—V—NP
Structural change: X1—X2—X3—X4 → X4—X2+be+en—X3—by+X1
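Read procedurally, (1) factors the terminal string into four terms and rearranges them, as the following toy rendering makes explicit (plain strings standing in for constituents; a sketch, not Chomsky’s formalism):

```python
# A toy rendering of the passive transformation (1): the structural
# analysis factors the string into terms X1-X4 (NP-Aux-V-NP), and the
# structural change maps them to X4-X2+be+en-X3-by+X1. Illustrative only.
def passive(x1: str, x2: str, x3: str, x4: str) -> str:
    return f"{x4} {x2}+be+en {x3} by+{x1}"

# 'Bernie praised Adam' analyzed as NP-Aux(past)-V-NP:
print(passive("Bernie", "past", "praise", "Adam"))
# -> 'Adam past+be+en praise by+Bernie'
# (spelled out after affix placement: 'Adam was praised by Bernie')
```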
The passive transformation as formulated in (1) performs several distinct operations. It permutes an NP subject in front of an auxiliary phrase20 with an NP following the verb (usually construed to be the object of the verb). In addition it adjoins the passive auxiliary be+en to the right of the auxiliary phrase, and also adjoins the grammatical formative by to the left of the permuted subject NP. The rule is stipulated to be optional, indicating that it need not apply even if its structural description is met in the course of a derivation.21 Given this formulation, an active sentence and its corresponding passive counterpart would have the same underlying deep structure. Thus the transformational analysis expressed the relation between active and passive sentential structures in terms of a common deep structure. In Aspects Chomsky tries to improve on the analysis of the passive transformation by proposing that the passive by-phrase constituted an instance of a manner adverbial. The motivation comes from an observation that passivization seemed possible only for verbs that could occur with manner adverbs.22

These observations suggest that the Manner Adverbial should have as one of its realizations a ‘dummy element’ signifying that the passive transformation must obligatorily apply. That is, we may have the rule (55) as a rewriting rule of the base and may formulate the passive transformation so as to apply to strings of the form (56), with an elementary transformation that substitutes the first NP for the dummy element passive and places the second NP in the position of the first NP:
(55) Manner → by passive
(56) NP—Aux—V—…—NP—…—by passive—…
(where the leftmost … [a string variable] in (56) requires further specification—e.g., it cannot contain an NP). (pp. 103–4)

Thus the by-phrase would exist in underlying structure, thereby eliminating the need to insert a language-specific grammatical element (by) using the passive transformation. The analysis also solved the problem of the derived constituent structure of the by-phrase, which in SS had to be specified by an ad hoc rule of derived constituent structure.23 It assumed that markers such as passive were ‘drawn from a fixed, universal, language-independent set’ (p. 223). The analysis of the passive auxiliary be+en is not discussed explicitly in Aspects. Note that the description of the passive transformation quoted above does not mention the insertion of the auxiliary as part of the transformation. This leaves open the possibility
that the passive auxiliary is introduced into the derivation via a phrase structure rule. Yet it is not clear from the Aspects phrase structure analysis of Aux that the passive auxiliary could be introduced via a phrase structure rule. The rewrite rule for Aux given in (57.xvi) on page 107 is essentially the same as the one given in SS on page 111 (i.e. Aux→C(M) (have+en) (be+en)) except that C is replaced by T (for tense) and the two aspectual auxiliaries are covered by a single designation ‘Aspect’. However it is unlikely that the passive auxiliary was assumed to be introduced via a phrase structure rule given the discussion in SS concerning the problems created by such an analysis.24 In ‘Remarks on nominalization’ (written in 1967, published in 1970; henceforth ‘Remarks’), the analysis of passives and the passive transformation undergoes an even more radical revision. In ‘Remarks’ Chomsky considers the correspondence between sentences and nominals, as illustrated in (2)–(3).

(2) a. The enemy destroyed the city
    b. The city was destroyed by the enemy

(3) a. the enemy’s destruction of the city
    b. the city’s destruction by the enemy
    c. the destruction of the city by the enemy
Earlier work on nominalizations (i.e., Lees 1960) assumed that the nominals (3a,b) were derived via a nominalization transformation which applied to the sentential forms (2a, b) respectively. Chomsky rejects the transformational analysis of nominals on the grounds that such transformations do not have the general character of other standard transformational rules. They are not productive—that is, not all verbs have a corresponding derived nominal. Furthermore, the relation between a derived nominal and its associated verb is idiosyncratic. Consider, for example, such nominals as laughter, marriage, construction, actions, activities, revolution, belief, doubt, conversion, permutation, trial, residence, qualifications, specifications and so on, with their individual ranges of meaning and varied semantic relations to the base forms. There are a few subregularities that have frequently been noted, but the range of variation and its rather accidental character are typical of lexical structure. To accommodate these facts within the transformational approach (assuming, as above, that it is the grammatical relations in the deep structure that determine meaning) it is necessary to resort to the artifice of assigning a range of meanings to the base form, stipulating that with certain semantic features the form must nominalize and with others it cannot. Furthermore, the appeal to this highly unsatisfactory device, which reduces the hypothesis that transformations do not have semantic content to near vacuity, would have to be quite extensive. (p. 19) In addition, nominalization transformations, unlike other transformations, require the ability to change categorial analysis—for example, changing a verb into a noun and correspondingly a sentential structure into an NP. However, given the theory of syntactic
features developed in Aspects, Chomsky is able to formulate the lexicalist hypothesis which can accommodate the analysis of derived nominals in the lexicon and thereby achieve a simplification of the transformational component. The lexicalist hypothesis prohibits feature changing by transformation as well as transformations whose application depends on idiosyncratic features of lexical items. It assumes that idiosyncratic properties of individual lexical items are captured in the lexicon. Though the nominal (3b) is not transformationally derived from the sentence (2b), it is transformationally related to the nominal (3a) in much the same way that the passive construction in (2b) is transformationally related to the corresponding active construction (2a). The semantic relation between city and destruction, as well as enemy and destruction, is identical in both nominals even though their surface syntax is different. Chomsky analyses the nominal (3b) as derived from an underlying structure resembling (2a).

(4) [NP [Det [NP the enemy]] [destroy,+N] [NP the city] by ∆]
∆ indicates a base-generated empty category (NP). Thus in (4) the city bears the relation object of the nominal and the enemy bears the relation subject of the nominal in both (3a) and (3b). These grammatical relations are identical to those in the corresponding sentential forms (2a,b). Putting aside the insertions of ’s in front of the nominal and of of after the nominal, (3b) is derived from (4) via transformational operations that substitute the NP the enemy for ∆ and fill the empty subject position with the city (via another substitution operation). Chomsky establishes the independence of the two operations by pointing out that Agent postposing may apply without the application of NP preposing (even when it could apply as in (3c)), and that more generally Agent postposing does not require the presence of an object, as illustrated by the derivation of (5a) from (5b).

(5) a. [NP [Det the] [offer] by John]
    b. [NP [Det [NP John]] [offer,+N] by ∆]
If the by-phrase in (5b) were missing, then Agent postposing could not apply and John’s offer would be derived. Furthermore, if the absence of the by-phrase in derived nominals is treated as the absence of the phrase in underlying structure, given its optionality in phrase structure, then NP preposing could apply in the absence of Agent postposing, as illustrated in (6a) (derived from (6b)).

(6) a. [NP [Det [NP John’s]] [appointment] to the committee]
    b. [NP [Det [NP ∆]] [appoint,+N] [NP John] to the committee]
If NP preposing does not apply to (6b), then the appointment of John to the committee is derived. Chomsky characterizes the two rules that apply to nominal constructions as follows: ‘Agent-postposing is simply a generalization of one of the components of the passive
transformation. NP-preposing is similar to, and may fall under a generalization of, the other component’ (1970:43). Note that he still assumes the existence of a passive transformation even though he has just demonstrated that the two movement components of the rule that applies to sentential structures apply independently in nominal constructions. Presumably this is because he still assumes that the passive auxiliary must be inserted via the transformation given his arguments in SS against base-generating the passive auxiliary. Thus the analysis in ‘Remarks’ identifies two subparts of the passive transformation as grammatical transformations with respect to nominals and as ‘elementary’ transformations (in the sense of LSLT) with respect to sentential structures.25 A similar analysis occurs in Hasegawa 1968 which identifies the postposing of agents (i.e., Agent-postposing) in gerundive nominals (e.g., the polishing of the lenses by the workers) as an independent grammatical transformation.26 With hindsight, we now know that the analysis in ‘Remarks’ constitutes the first important step towards a more general theory of transformations that does not allow the formulation of construction-specific transformations like Tpassive. It establishes that the movement component of the passive transformation is essentially a pair of substitution operations in which a lexical NP is substituted for an empty NP. The next step occurs with the realization that the rule of NP-preposing is actually an instance of a more general rule ‘Move NP’—or more accurately, Substitute NP. Thus much of the work of the early 1970s was directed towards an analysis of NP movement which generalized the NP-preposing component of the passive transformation to instances of subject-to-subject raising, as illustrated in (7), where ti indicates the underlying position of the co-indexed matrix subject and [α indicates a clause boundary.

(7) a. Johni is likely [α ti to win the election]
    b. Billi seems [α ti to be ahead in the polls]
    c. Maryi is expected [α ti to defeat the incumbent]
(Note that the matrix predicate in each case is distinct—a predicative adjective in (7a), a verb in (7b) and a passive predicate in (7c).) Given a rule Move NP, the structural description for the transformation can be reduced to the bare essentials.

(8) (vbl, NP, vbl, NP, vbl) (Chomsky 1976:7)
Only the movement site and the landing site are given as constant terms. The syntactic distance between the two positions must be mediated by other conditions which are not stated as part of the transformation—for example, the framework of conditions developed in Chomsky 1973, 1976. As it turned out, the development of a system of conditions on rules of grammar made it possible to formulate the more general theory of transformations, under the assumption that these conditions constituted general principles (i.e., of UG).
2.2 From conditions to Move α

In discussing the early work on the system of conditions—in particular, the Tensed-S, Specified Subject and Subjacency conditions—Chomsky makes the following important observations:

If these principles can be substantiated or improved, the class of potential grammars is vastly reduced. Many potential rules are eliminated outright. Furthermore, by limiting the possible application of rules, principles of the sort discussed make it unnecessary to make available in the theory of transformations as rich an apparatus as would otherwise be needed to delimit the application of particular rules. Thus, the principles constrain the variety of grammars by reducing the ‘expressive power’ of grammatical rules. We might even set ourselves the goal, still distant but perhaps attainable, of so restricting the apparatus of the theory of transformations that rules can only be given in the form ‘move NP’, with other conditions on their application expressed either as general conditions on rules or as properties of initial phrase markers, or as properties of surface structures. (1975c:111–12)

Work on the system of grammatical principles progressed so rapidly that in Chomsky 1976 the rule Move NP is taken as an established result.27 In particular, work on the syntax of Romance languages (e.g., French and Portuguese) demonstrated that not only did Chomsky’s Tensed-S and Specified Subject conditions apply crosslinguistically but that they applied to constructions not found in English—e.g., clitic constructions (see Kayne 1975; Quicoli 1976a,b). To see how the Tensed-S Condition (henceforth TSC) and the Specified Subject Condition (henceforth SSC) mediate the syntactic distance between the movement site and the landing site with respect to Move NP, consider the following derived structures.
(9) a. finite clause complements:
       i. *Johni was reported [α ti had recommended Mary]
       ii. *Maryi was reported [α John had recommended ti]
    b. infinitival complements:
       i. Johni was reported [α ti to have recommended Mary]
       ii. *Maryi was reported [α John to have recommended ti]
In essence, the TSC prohibits the application of a rule that links two positions across a finite clause boundary. Thus the application of Move NP in (9a.i—ii) violates this condition. The SSC prohibits the application of a rule that links two positions across a syntactic subject—hence the application of Move NP in (9a.ii) and (9b.ii) is excluded by the condition. In contrast, the application of Move NP in (9b.i) violates neither condition (or any other principle of grammar) and therefore is allowed.
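Stated schematically, the two conditions are predicates on a movement link. The sketch below, which deliberately reduces the complement clause to two features, reproduces the pattern of judgments in (9):

```python
# A schematic check of the TSC and SSC against the paradigm in (9). The
# complement clause is reduced to two features: whether it is finite and
# whether the moved NP is its subject. A simplification for illustration,
# not a full implementation of the conditions.
def move_np_allowed(complement_is_finite: bool, source_is_subject: bool) -> bool:
    if complement_is_finite:      # TSC: no link across a finite clause boundary
        return False
    if not source_is_subject:     # SSC: a specified subject intervenes
        return False
    return True

print(move_np_allowed(True, True))    # (9a.i)  False: TSC violation
print(move_np_allowed(True, False))   # (9a.ii) False: TSC and SSC violations
print(move_np_allowed(False, True))   # (9b.i)  True: licit raising
print(move_np_allowed(False, False))  # (9b.ii) False: SSC violation
```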
Under the trace theory of movement rules,28 where ti constitutes an empty category left behind by the movement operation (e.g. substitution),29 it became possible to relate NP movement phenomena to anaphor binding by treating trace-binding as another instance of bound anaphora. Under this analysis it was possible to explain why NP-preposing in passive sentences was obligatory. As noted by Fiengo (1974, 1977), the trace binding relation in structures like (10) violates an independently motivated constraint on anaphor binding which accounts for (11).

(10) *ti was [VP criticized George by the committeei]
(11) *himselfi [VP criticized Georgei]
Given that this constraint on anaphor binding is part of UG, we have derived a piece of the behaviour of NP-preposing from a general principle of grammar and therefore need not stipulate it in the formulation of the transformation. Moreover, as demonstrated in Chomsky 1976, the TSC and SSC apply to anaphor binding as well as NP movement.

(12) a. finite clause complements:
        i. *Johni believes [α himselfi admired Mary]
        ii. *Johni believes [α Mary admired himselfi]
     b. infinitival complements:
        i. Johni believes [α himself to admire Mary]
        ii. *Johni believes [α Mary to admire himselfi]
In (12a.i–ii) anaphor binding violates the TSC; whereas in (12a.ii) and (12b.ii) the binding relation violates the SSC. In this way the paradigms in (9) and (12) are analogues. As a consequence, it appears that much, if not all, of the behaviour of NP movement can be derived from general principles of UG, and therefore that the movement transformation may be formulated simply as Move NP. In Chomsky 1976 the TSC and SSC are treated as conditions on the application of rules. Thus in (9) they block the application of Move NP, whereas in (12) they block a rule of interpretation that links the anaphor himself to the antecedent John. Another way to interpret these conditions is as conditions on representations—or, more specifically, as conditions on binding, assuming that an NP trace is the empty category analogue of a lexical anaphor.

Restricting conditions (4) and (5) [the Propositional Island Condition (PIC)30 and the SSC respectively], now, to rules of construal, we interpret them as applying to transformational rules as filters, in effect; the result of applying a transformational movement rule may or may not yield an appropriate case of ‘bound anaphora’. (Chomsky 1977b:77)
Under this interpretation Move NP (or the rule that links an anaphor to an antecedent) applies freely. If the result is deviant, then the derived representation will violate some principle of grammar.31 The latter interpretation captures a significant generalization and is in addition motivated by a redundancy argument, as demonstrated in Freidin 1978. If we interpret these principles as conditions on the application of rules, then a further condition on the application of rules, the Strict Cycle Condition (Chomsky 1973), is required to rule out one derivation of such deviant sentences as (13).

(13) *John was reported that book to have been given.
The Strict Cycle Condition (henceforth SCC) prohibits the application of a rule within a cyclic subdomain of a current cycle. The derivation of (13) which violates the SCC is given in (14), where np is construed as a base-generated empty NP.

(14) a. [α np was reported [α′ np to have been given John that book]]
     b. [α np was reported [α′ Johni to have been given ei that book]]
     c. [α Johni was reported [α′ ei to have been given ei that book]]
     d. [α Johni was reported [α′ that bookj to have been given ei ej]]
On the first transformational cycle, α′, John is preposed to the complement subject position. Then on the second transformational cycle, α, John is raised to matrix subject position. If the derivation stops here, we have a well-formed sentence John was reported to have been given that book. However, if Move NP applies again within α′, now a cyclic sub-domain of α, and substitutes that book for the trace of John in complement subject position, the resulting derivation violates the SCC. If, on the other hand, the SSC is interpreted as a condition on representations, then (14d) violates the SSC because John now binds its only trace across a subject even though no application of Move NP violated the SSC.32 Under this interpretation we derive the empirical effect of the SCC. If all the empirical effects of the SCC can be derived in this way, as argued in Freidin 1978, then the SCC is redundant with respect to those other principles of grammar from which its effects are derived. This result, as well as the binding theoretic interpretation of the TSC and SSC, suggests a shift in focus from conditions on rule application to conditions on syntactic representations, thereby reinforcing the shift from rules to principles.33 If this reduction of the ‘passive transformation’ to the rule Move NP is to succeed fully, then some account of the ‘special’ behaviour of the passive auxiliary be+en is required. Recall that in Chomsky 1976 the analysis in which the passive auxiliary is introduced via the rule of NP-preposing is explicitly rejected. Instead it is assumed that this auxiliary ‘derives from an independent source’ (p. 173). If this independent source is a phrase structure rule, then the argument in SS against base generating the passive auxiliary must be addressed. As Chomsky noted, the occurrence of the passive auxiliary is, unlike other auxiliary verbs, subject to a number of ad hoc restrictions:
(15) a. It can be selected only if it is followed by a transitive V.
     b. It cannot be selected if NP follows V, even if V is transitive.
     c. It must be selected if a transitive V is followed by an agentive by-phrase.
Examples (15a–c) rule out sentential constructions like (16a–c) respectively.

(16) a. *The explosion was occurred at noon
     b. *Adam was praised Bernie
     c. *The candidate praised by the committee
Under the earlier transformational analysis of passives, these restrictions were accounted for by inserting the auxiliary as part of the passive transformation. However, if the passive auxiliary is base generated, then these restrictions must be handled in other ways—optimally, as the effects of UG principles. Such an account can be derived from a theory of predicate/argument relations developed during the 1970s. The theory concerns the assignment of semantic functions (or thematic relations (henceforth θ-roles)) to arguments of predicates. In active sentences the assignment of θ-roles to arguments is straightforward.

(17) Bernie praised Adam
In (17) the predicate praised assigns two θ-roles, one to the subject (θS) and one to the object (θO). In (18), the passive counterpart to (17), θ-role assignment works somewhat differently.

(18) Adami was praised ei by Bernie
The syntactic subject Adam is assigned θO by virtue of the trace it binds.34 The object of the by-phrase, Bernie, is assigned θS by the preposition by.35 It is standardly assumed that the passive morphology on the verb praised prevents it from assigning θS to the syntactic subject position as in the active construction. Furthermore it is generally assumed that any licit application of NP-preposing moves an NP to a nonthematic position—i.e., one that is not directly assigned a θ-role.36 The possibility of NP movement between two thematic positions, as in (19), is prohibited by a general principle (20) governing the distribution of semantic functions.

(19) *Adami praised ei
(20) Functional Uniqueness: an argument may bear only one semantic function.
Given the analysis of the active/passive pair (17–18), Adam would be assigned θS by virtue of occupying the subject position and would also be assigned θO via its trace in object position. Because the sentence Adam praised cannot be interpreted as Adam praised himself, the structure in (19) is not licit—as predicted by the Functional Uniqueness principle. A further condition (21) rules out the possibility of a construction in which an argument bears no θ-role, as would occur if every syntactic position in a passive construction were filled with a lexical NP, as in (22).

(21) Functional Relatedness: each argument must bear a semantic function.

(22) *Adam was praised Barbie by Bernie
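Functional Uniqueness and Functional Relatedness jointly amount to the requirement that every argument bear exactly one θ-role. A minimal sketch (the role labels are informal stand-ins) reproduces the judgments on (18), (19) and (22):

```python
# A sketch of Functional Uniqueness (20) and Functional Relatedness (21):
# every argument must bear exactly one theta-role. Illustration only.
def theta_ok(assignments: dict) -> bool:
    """assignments maps each argument to the list of theta-roles it bears."""
    for roles in assignments.values():
        if len(roles) > 1:    # Functional Uniqueness (20) violated
            return False
        if len(roles) == 0:   # Functional Relatedness (21) violated
            return False
    return True

# (18) Adami was praised ei by Bernie: one role per argument
print(theta_ok({'Adam': ['theta-O'], 'Bernie': ['theta-S']}))   # True
# (19) *Adami praised ei: Adam would bear both roles
print(theta_ok({'Adam': ['theta-S', 'theta-O']}))               # False
# (22) *Adam was praised Barbie by Bernie: Adam bears no role
print(theta_ok({'Adam': [], 'Barbie': ['theta-O'],
                'Bernie': ['theta-S']}))                        # False
```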
In (22) θS is assigned to Bernie and θO is assigned to Barbie. Adam would receive no θ-role assignment, in violation of (21). Thus Functional Relatedness accounts for restriction (15b) on the distribution of the passive auxiliary. Notice that it would also account for restriction (15a) under the assumption that passive morphology blocks θ-role assignment to the syntactic subject position. The final restriction, (15c), falls under another principle of predicate/argument structure—namely that each semantic function of a predicate must be uniquely assigned (that is, assigned only once)—henceforth the principle of Unique Assignment. Given that θS is assigned to the syntactic subject position by a verb lacking passive morphology or by the preposition by, it would seem that in (16c) θS is assigned twice to two distinct arguments in violation of the Unique Assignment principle.37 Thus, given the principles of Functional Relatedness and Unique Assignment, the apparently ad hoc restrictions on the distribution of the passive auxiliary follow from the theory of UG. These two principles plus Functional Uniqueness form a subcomponent of UG referred to as θ-theory.38 Given the principle of Unique Assignment, it would be possible to base-generate a by-phrase containing a lexical object rather than one with an empty object position which is filled by a rule of Agent postposing—that is, an instance of Move NP. One argument for base-generating the object of the by-phrase in place concerns the derivation of nominals like (23).

(23) the criticism of the chairman by the committee
Under trace theory, the Agent-postposing analysis leaves behind an NP trace in prenominal position. The derivation of (23) would require the substitution of that trace with the determiner the, in violation of the Nondistinctness Condition on substitutions. Note that this analysis presupposes that there is only one prenominal position, to be occupied by either a determiner or a possessive NP, a standard assumption during the 1970s.39 Granting basic assumptions about the analysis of categories, a noun (and hence the NP it projects) is distinct from a determiner. If we eliminate the Agent-postposing analysis in nominals, then there is little reason to retain it for sentential constructions.40 In eliminating the application of Move NP that postposes the underlying subject from the derivation of sentential passive constructions, we can no longer rely on binding theory
to explain the obligatory nature of NP-preposing as discussed above. Instead, the obligatory character of NP-preposing is subsumed under Case theory, another subcomponent of UG which was developed during the late 1970s. The basic idea behind Case theory, due to Vergnaud (1977), is that there are two kinds of NP positions, those that are marked for Case and those that are not, and only Case-marked NPs can be lexically realized. Chomsky (1980) formulates this idea in terms of a filter (24), generally known as the Case Filter:

(24) *N, where N has no Case. (p. 25)
Given the Case Filter, a lexical NP which occurs in an underlying Caseless position must move to a Case-marked position. This accounts for the syntax of predicate adjectives which take sentential complements.

(25) a. *It is likely [α the situation to improve]
     b. The situationi is likely [α ei to improve]
     c. It is likely [α that the situation will improve]
In (25a) the complement subject the situation occurs in a Caseless position, in contrast to the finite clause complement subject in (25c). However, in (25b) the complement subject has been moved to the subject position of a finite clause (cf. (25c)) where it can receive Case. Thus (25a) violates the Case Filter, whereas (25b,c) do not. The NP-preposing that occurs in the derivation of (25b) is therefore required to satisfy the Case Filter. This Case-theoretic analysis generalizes to NP-preposing in sentential constructions containing predicates with passive morphology, as illustrated by the raising paradigm given in (26).

(26) a. We expect [α more women to win elections]
     b. More womeni are expected [α ei to win elections]
     c. *It is expected [α more women to win elections]
     d. It is expected [α that more women will win elections]
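The Case Filter itself reduces to a one-line condition once each position is classified as Case-marked or Caseless; in the sketch below that classification is simply stipulated, following the text, rather than derived:

```python
# A minimal sketch of the Case Filter (24): a lexically realized NP must
# occupy a Case-marked position (traces, being empty categories, are not
# subject to the filter). Which positions count as Case-marked is an
# assumption of the illustration, following the surrounding discussion.
def case_filter_ok(np_is_lexical: bool, position_is_case_marked: bool) -> bool:
    return position_is_case_marked or not np_is_lexical

# (25a) *It is likely [the situation to improve]:
print(case_filter_ok(True, False))   # False: lexical NP in a Caseless position
# (25b) The situation is likely [e to improve]: the trace needs no Case
print(case_filter_ok(False, False))  # True
# (26a) We expect [more women to win]: 'expect' Case-marks the subject
print(case_filter_ok(True, True))    # True
# (26c) *It is expected [more women to win]: passive 'expected' no longer
# Case-marks the complement subject
print(case_filter_ok(True, False))   # False
```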
Example (26a) shows that the subject of the infinitival complement of the verb expect is in a Case-marked position—presumably because of its syntactic relationship to the matrix verb since the subject of an infinitival is not itself a Case-marked position, as illustrated in (25a). However, the addition of passive morphology to the matrix verb renders the complement subject position Caseless, hence (26c) violates the Case Filter. Thus the complement subject more women must move to a Case-marked position to satisfy the Case Filter. As in (25b), movement to the subject position of a finite clause places the complement subject in a Case-marked position, thereby satisfying the Case Filter. The comparison of the two paradigms (25) and (26) suggests that passive predicates, in contrast to their nonpassive counterparts, share the same property as predicate adjectives—namely an inability to Case-mark an NP.41 ‘Suppose that the unique property
of passive morphology is that it in effect “absorbs” Case: one NP in the VP with the passive verb as head is not assigned Case under government by this verb’ (Chomsky 1981:124).42 The analysis generalizes without qualification to simplex passive constructions. Therefore, the obligatory character of NP-preposing follows from a principle of UG, the Case Filter. Given the Case Filter, the Functional Uniqueness principle of the θ-Criterion, the TSC and the SSC, UG severely limits the syntactic behaviour of the rule Move NP. NP movement is obligatory when an NP occurs in an underlying Caseless position. It must wind up in a Case-marked position at S-structure. NP movement cannot occur between two θ-marked positions, nor can it cross a finite clause boundary or a c-commanding subject. The framework of principles developed over the 1970s demonstrated that the behaviour of transformations which had been stipulated in the structural descriptions and structural changes of more elaborate rules could be derived from general principles of UG.43 As a result, it was possible to dispense with structural descriptions for transformations altogether. Under this framework, a transformation could be given as a structural change which indicated the category (or categories—e.g., in the case of adjunction) affected and the elementary operation involved. In later work, circa 1980, Chomsky generalizes transformations still further by replacing specific reference to categories (e.g., NP or wh-phrase) with a variable α and characterizing the two distinct elementary operations which can perform movements (i.e., substitution and adjunction) under the designation ‘move’—hence the rule ‘Move α’.44 This change of view concerning transformations is discussed in Chomsky’s Lectures on Government and Binding (1981; henceforth LGB).

In early work in generative grammar it was assumed, as in traditional grammar, that there are rules such as ‘passive’, ‘relativization’, ‘question-formation’, etc. These rules were considered to be decomposable into more fundamental elements: elementary transformations that can compound in various ways, and structural conditions (in the technical sense of transformational grammar) that are themselves formed from more elementary constituents. In subsequent work, in accordance with the sound methodological principle of reducing the range and variety of grammars to the minimum, these possibilities of compounding were gradually reduced, approaching the rule Move-α as a limit. But the idea of decomposing rules such as ‘passive’, etc., remained, though now interpreted in a rather different way. These ‘rules’ are decomposed into the more fundamental elements of the subsystems of rules and principles (1) and (2). (p. 7)

In LGB the system of rules has the following subcomponents:
(27) (=(1))
     (i) lexicon
     (ii) syntax
          (a) categorial component
          (b) transformational component
     (iii) PF-component
     (iv) LF-component

and the system of principles includes the following:

(28) (=(2))
     (i) bounding theory
     (ii) government theory
     (iii) θ-theory
     (iv) binding theory
     (v) Case theory
     (vi) control theory
Given this view, descriptive categories like ‘passive’ have no status under the theory of grammar. Virtually all the properties of the ‘passive transformation’ have been derived from general principles of UG, including the movement operation itself, which is neither a language-particular nor construction-specific rule.45

2.3 Towards a theory of language design

More recently Chomsky (1991) has suggested that the theory of grammar developed over the past two decades can be interpreted in terms of more general guidelines for language design. In essence, these guidelines legislate against ‘superfluous steps’ in derivations and ‘superfluous elements’ in representations. For example, the principle of Full Interpretation (Chomsky 1986) requires that every element in Phonetic Form (PF) and Logical Form (LF), the representations interfacing with systems of language use, must receive an appropriate interpretation. From Full Interpretation it follows that LF representations for natural languages may not contain vacuous quantifiers or arguments that are not functionally related to some predicate in the representation (as in (22)). Therefore the functional relatedness requirement of the θ-Criterion is subsumed under the principle of Full Interpretation, a more general requirement that representations be minimal in some sense. In this way Full Interpretation functions as a principle which determines the economy of representations. With respect to the economy of derivations, Chomsky (1991) proposes a ‘last resort’ condition on movement operations to the effect that movement cannot apply to a constituent unless the nonapplication results in the violation of some principle of
grammar. For example, in the derivation of sentential passive constructions the last resort condition predicts that the object of the by-phrase is exempt from movement since it is both Case-marked and θ-marked in its underlying position, in contrast to the object of the passive predicate which is Caseless in its underlying position. Notice that the addition of this condition to UG makes a new prediction—namely that there are no languages in which a θ-marked and Case-marked NP can move to a Caseless position which receives no θ-marking. Without the last resort condition such movements are possible under the principles of UG. The last resort condition on movements comes under the more general heading of a ‘least effort’ guideline for derivations.46 The last resort condition on movements also subsumes some of the empirical effects of other UG conditions—in particular, the functional uniqueness requirement of the θ-Criterion and the TSC and SSC. Consider the standard case of a Functional Uniqueness violation ((19) above, repeated here):

(19) *Adami praised ei
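In this setting the last resort condition can be schematized as a single precondition on Move NP; the sketch below assumes, with the text, that the Case Filter is the only driving force relevant to these examples:

```python
# A schematic 'last resort' check: Move NP is licensed only if leaving the
# NP in place would violate some principle. Following the text, only the
# Case Filter is considered here, which suffices for these examples.
def movement_licensed(source_is_case_marked: bool) -> bool:
    return not source_is_case_marked  # move only to rescue a Caseless NP

# (18) Adam was praised e by Bernie: the object of a passive verb is Caseless
print(movement_licensed(False))  # True: movement is forced, hence licit
# (19) *Adam praised e: the object position is Case-marked (and theta-marked)
print(movement_licensed(True))   # False: movement is superfluous, hence barred
```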
Example (19) violates the last resort condition on movements because the underlying object Adam is both Case-marked and θ-marked in that position and therefore the movement is presumably not required by any principle of UG.47 The same sort of analysis applies in the standard cases of TSC and SSC violations (e.g., (9a.i–ii) and (9b.ii) above). In each case the last resort condition on movement is violated because the NP moved is both Case-marked and θ-marked in its underlying position and therefore is not required to move by any UG condition. Although it remains to be demonstrated, it seems highly probable that the last resort condition on movements may subsume virtually all the empirical effects of the TSC and SSC as well as Functional Uniqueness. In his most recent work (1993) Chomsky develops the idea that linguistic derivations and representations should be minimal in a certain sense into a more ambitious programme for linguistic theory. In this paper he proposes a minimalist programme for linguistic theory based on a number of plausible, though at this point speculative, assumptions. The primary assumption that motivates much of this programme is that only PF and LF count as levels of representation with respect to UG. UG must determine the class of possible languages. It must specify the properties of SDs and of the symbolic representations that enter into them. In particular, it must specify the interface levels (A-P [articulatoryperceptual], C-I [conceptual-intentional]), the elements that constitute these levels, and the computations by which they are constructed. A particularly simple design for language would take the (conceptually necessary) interface levels to be the only levels. That assumption will be part of the ‘minimalist’ program I would like to explore here. (p. 3) The assumption is motivated by the fact that most principles of UG apply to either LF or PF representations. Part of the minimalist programme involves a demonstration that virtually all principles of UG apply at PF or LF and that apparent counterexamples of principles that hold at S-structure or D-structure can be reanalysed so that only PF and LF
are the relevant levels. ‘Conditions on representations—those of Binding Theory, Case Theory, Theta Theory, etc.—hold only at the interface, and are motivated by properties of the interface, perhaps properly understood as modes of interpretation by performance systems’ (Chomsky 1993:6). To the extent that conditions on the economy of representations and derivations restrict the kinds of representations that occur at the interface levels, these conditions are crucial to carrying out the minimalist programme. The discussion of the role of economy in grammatical description dates back to the advent of generative grammar—i.e., Chomsky’s MMH. There Chomsky identifies two kinds of criteria of adequacy for grammars—one concerning the correct description of the structure of the language under analysis, and the other concerning requirements imposed by its special purposes, ‘or, in the case of a linguistic grammar having no such special purposes, requirements of simplicity, economy, compactness, etc.’ (p. 1). In a footnote, Chomsky adds the following clarification: Such considerations are in general not trivial or ‘merely esthetic’. It has been recognized of philosophical systems, and it is, I think, no less true of grammatical systems, that the motives behind the demand for economy are in many ways the same as those behind the demand that there be a system at all. Cf. Goodman (1943). In other words, a grammar is not merely a description of a language; it is moreover an explanatory theory about the structure of a language—i.e., why a language has the properties it does. It is in this context that considerations of economy, etc. come into play.48 In the earliest work on generative grammar, notions of economy and simplicity were considered as part of an evaluation measure, itself part of the theory of grammar, which ranks the potential grammars available (assuming, of course, that the theory allows for more than one grammar that is consistent with the linguistic data). Under the psychological interpretation of grammar, where linguistic theory provides an explanation for language acquisition, an evaluation procedure was thought to be an indispensable element of the theory. To acquire language, a child must devise a hypothesis compatible with presented data—he must select from the store of potential grammars a specific one that is appropriate to the data available to him. It is logically possible that the data might be sufficiently rich and the class of potential grammars sufficiently limited so that no more than a single permitted grammar will be compatible with the available data at the moment of successful language acquisition, in our idealized ‘instantaneous’ model (cf. notes 19 and 20). In this case, no evaluation procedure will be necessary as a part of linguistic theory—that is, as an innate property of an organism or a device capable of language acquisition. It is rather difficult to imagine how in detail this logical possibility might be realized, and all concrete attempts to formulate an empirically adequate linguistic theory certainly leave ample room for mutually inconsistent grammars, all compatible with primary data of any conceivable sort. All such theories
therefore require supplementation by an evaluation measure if language acquisition is to be accounted for and selection of specific grammars is to be justified; and I shall continue to assume tentatively, as heretofore, that this is an empirical fact about the innate human faculté de langage and consequently about general linguistic theory as well. (Aspects, pp. 36–7)

From this perspective the relationship between linguistic theory and the particular grammars it provides is that of an evaluation procedure. Given a corpus (e.g., the primary linguistic data a child is exposed to) and a set of possible grammars, linguistic theory ranks the grammars and selects the most highly valued one. As noted in SS, ‘the strongest requirement that could be placed on the relation between a theory of linguistic structure and particular grammars is that the theory must provide a practical and mechanical method for actually constructing the grammar, given a corpus of utterances’ (pp. 50–1). A theory of this sort constitutes a discovery procedure for grammars. Early work on generative grammar assumed that linguistic theory would not be able to meet this requirement, and that only the weaker requirement of providing an evaluation procedure for grammars was attainable.49 However, with the development of the principles and parameters framework during the past decade and a half50 an evaluation measure for grammars has become essentially superfluous. With the reduction of the transformational component to a set of elementary operations that are part of UG and with the elimination of phrase structure rules (see note 45), it appears that linguistic theory allows only a severely limited number of grammars—perhaps only one for each given language. Under the minimalist programme of Chomsky (1993) an even stronger assumption is proposed: ‘there is only one computational system and one lexicon’, apart from a limited kind of variety found in the PF component and in the lexicon (including association of concepts with phonological matrices (what Chomsky refers to as Saussurean arbitrariness), properties of grammatical formatives like inflection, and detectable lexical properties like the linear orientation of heads and complements (the head parameter)). If this is correct, then current linguistic theory now provides a discovery procedure for grammars and has thereby achieved, in Chomsky’s own words, ‘a scientific advance of the highest importance’.51
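The relationship just described, under which linguistic theory evaluates rather than constructs grammars, can be rendered schematically. The following toy sketch is purely illustrative (the grammar encoding and the numerical values are invented, not drawn from any analysis); it ranks candidate grammars compatible with the primary data by the MMH simplicity criteria quoted in note 48:

# Toy sketch of an evaluation procedure (an expository construction,
# not a proposal from the literature). Candidate grammars compatible
# with the primary linguistic data are ranked, not constructed.

# Hypothetical encoding: each grammar has a length (number of symbols
# in its statements) and an average derivation length for sentences.
candidates = [
    {"name": "G1", "length": 120, "avg_derivation": 7.0},
    {"name": "G2", "length": 95,  "avg_derivation": 9.5},
    {"name": "G3", "length": 95,  "avg_derivation": 6.2},
]

def evaluate(grammars):
    # MMH-style simplicity (see note 48): the shorter grammar is the
    # simpler; among equally short grammars, prefer the one with the
    # least average length of derivation.
    return min(grammars, key=lambda g: (g["length"], g["avg_derivation"]))

print(evaluate(candidates)["name"])   # G3

A discovery procedure, by contrast, would have to construct the winning grammar from the corpus itself rather than merely select among candidates supplied in advance.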
Notes
* I am indebted to Carlos Otero for discussion of the material covered here and also for comments on an earlier draft, and to Larry Solan for his comments on the paper. For a discussion of Chomsky’s theory of grammar in the more general context of the cognitive revolution, see Otero (1992). See also Newmeyer (1986) for further discussion of the history of generative grammar. 1 See Harris (1951, Appendix to §20.3) for details. 2 This work, submitted in 1951, was based on his undergraduate thesis of 1949. A version of this work was published by Garland in 1979. In particular, the introductions of the 1951 and 1979 versions differ substantially. See below for further discussion, and also Otero (in preparation) for more extensive discussion of the differences between the two versions. 3 MMH (p. 1). In a footnote to the quotation he points out that the closed set is not necessarily finite. ‘Thus the resulting grammar will contain a recursive specification of a denumerable set of sentences’ (MMH: 67).
4 In spite of what he had achieved, Chomsky continued to believe in the importance of taxonomic discovery procedures for linguistic analysis for another two years.
It was not until 1953 that he abandoned any hope of formulating taxonomic ‘discovery procedures’ and turned [his] attention entirely to the problems of generative grammar, in theory and in application. It was at that point that [he] began writing LSLT, bringing together and extending the work [he] had begun on various aspects of generative grammar, but now with conviction as well as enthusiasm. (LSLT, p. 33)
5 See Chomsky and Lasnik (1993) and Freidin (1992) for discussion. 6 It should be noted that the 1979 version published by Garland Press differs from the University of Pennsylvania manuscript. For example, the introductory section of the manuscript version raises issues concerning the relation between logic and linguistics which have been deleted from the published version. See Otero (forthcoming) for a more detailed discussion of the differences between the two versions. 7 In the 1975 introduction to LSLT Chomsky cites Bloomfield’s ‘Menomini morphophonemics’ (1939) as a potential precursor of generative grammar (one that he was unaware of when he wrote MMH)—presumably because the notion of rule ordering is implicit in the work. A comparison of the MMH and Bloomfield’s paper is instructive. The latter is essentially an informal sketch in contrast to the formal rigour of Chomsky’s work, where rule ordering is an explicit central topic. 8 In the case of phrase structure, Chomsky’s rules are modelled on Harris’s morpheme-to-utterance formulas (see Harris 1946, and Chapter 16 of ‘Methods’). 9 However, a detailed account of the early history of transformational theory remains to be written. But see Chomsky (1979, Chapter 5) for some helpful discussion. 10 It is perhaps worth noting here that Chomsky also contributed to this initial work on transformations, as indicated in Harris’s first footnote: ‘It is a pleasure to acknowledge here the cooperation of three men who have collaborated with me in developing the method and in analyzing various texts: Fred Lukoff, Noam Chomsky, and A.F. Brown.’ And in the first footnote in the published version (1957) of his 1955 Presidential address to the Linguistic Society of America, Harris makes the following comment:
From a time when this work was still at an early stage, Noam Chomsky has been carrying out partly related studies of transformations and their position in linguistic analysis: see his dissertation, Transformational Analysis (University of Pennsylvania, 1955); and his paper ‘Three Models for the Description of Language’, IRE Transactions on Information Theory, IT-2, No. 3 (1956), 113–24; now also his book Syntactic Structures, The Hague 1957. My many conversations with Chomsky have sharpened the work presented here, in addition to being a great pleasure in themselves. The notion of transformation as equivalence relation explains the otherwise curious usage of the term in MMH, where phrase structure rules are referred to as ‘transformational statements’—see p. 6.
11 See Kuroda (1988) for an attempt to interpret Harris’s transformational theory as a species of realism. However, unless Harris’s thinking underwent some radical change between the late 1940s and 1965, it seems unlikely that he adopted a realist interpretation of linguistic theory. He clearly never accepted the psychological interpretation of grammar that Chomsky put forth in Aspects. Note also that the comparison of Harris and Chomsky in Itkonen (1978:§3.5) fails precisely because the author has missed the fundamental difference vis-à-vis realism. 12 See Huybregts and van Riemsdijk (1982:62). 13 LSLT, p. 35. 14 Chomsky (1965, p. 58). 15 Ibid. 16 See also Chomsky and Lasnik (1992) for additional discussion. 17 Chomsky (1986a, p. 19). 18 It is worth noting here that the rejection of the psychological interpretation of grammar by the structuralists was based on a perception of the fundamental diversity of human languages—a perception which seems to be wrong, given the current direction of contemporary generative grammar. Thus for example, Bloomfield criticizes Herman Paul’s Prinzipien der Sprachgeschichte (1880; 5th edn, 1920) for its ‘insistence upon “psychological” interpretation’. He claims that Paul’s discussion of language in terms of mental processes ‘add nothing to the discussion, but only obscure it’.
Paul and most of his contemporaries dealt only with Indo-European languages, and, what with their neglect of descriptive problems, refused to work with languages whose history was unknown. This limitation cut them off from a knowledge of foreign types of grammatical structure, which would have opened their eyes to the fact that even the fundamental features of Indo-European grammar, such as, especially, the part-of-speech system, are by no means universal in human speech. Believing these features to be universal, they resorted, whenever they dealt with fundamentals, to philosophical and psychological pseudo-explanations. (1933:17) These criticisms do not carry over to contemporary generative grammar. 19 In Syntactic Structures Chomsky defines language as ‘a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements’ (p. 13). It is this characterization that allows Chomsky to equate human languages with formalized systems of mathematics. Thus, he continues:
All natural languages in their spoken or written form are languages in this sense, since each natural language has a finite number of phonemes (or letters in its alphabet) and each sentence is representable as a finite sequence of these phonemes (or letters), though there are infinitely many sentences. Similarly, the set of ‘sentences’ of some formalized system of mathematics can be considered a language. Under the psychological interpretation, there is no similarity between a natural language and a formalized system of mathematics.
20 The Auxiliary Phrase need not contain an actual auxiliary verb under Chomsky’s analysis. At deep structure Aux will be at least a tense morpheme or its equivalent (e.g., in imperatives). 21 The stipulation was necessary under the SS analysis since there were other rules like Affix Hopping (called the Auxiliary transformation) which had to be designated as obligatory. See below for further discussion of the optional/obligatory distinction and how it was eliminated from the formulation of transformations. 22 However, this analysis turned out to be untenable because there were some verbs (e.g., know and think) which did not allow manner adverbs freely and yet had passive counterparts. Furthermore, analysing the passive wh-phrase as a manner adverb appeared to make a false prediction—that passive predicates could not take manner adverbs. To account for the fact that passives could take manner adverbs it would be necessary to have two manner adverb phrases in underlying structure. But that would require the addition of an ad hoc restriction against neither of them being realized as the passive wh-phrase. See Freidin (1975b) for details.
Note also that Chomsky’s analysis here is an adaptation of the one proposed in Katz and Postal 1964 which was the first to suggest that abstract markers in underlying structure distinguished one sentence type from another (e.g., question vs. imperative vs. passive). 23 See Aspects, p. 104. In SS Chomsky addresses the issues of derived constituent structure as follows:
We have not discussed the manner in which transformations impose constituent structure, although we have indicated that this is necessary; in particular, so that transformations can be compounded. One of the general conditions on derived constituent structure will be the following:
(77) If X is a Z in the phrase structure grammar, and a string Y formed by a transformation is of the same structural form as X, then Y is also a Z.
In particular, even when passives are deleted from the kernel we will want to say that the by-phrase (as in ‘the food was eaten—by the man’) is a prepositional phrase (PP) in the passive sentence. (77) permits this, since we know from the kernel grammar that by+NP is a PP. (77) is not stated with sufficient accuracy, but it can be elaborated as one of a set of conditions on derived constituent structure. (pp. 73–4)
24 For example, the phrase structure analysis would require that if the passive auxiliary occurred then the passive marker must also occur. Given the Aspects analysis of passives, the requirement could not be stated in a non ad hoc fashion. For further discussion of the analysis of the passive auxiliary, see Freidin (1975b). 25 It is worth noting here that in LSLT the passive transformation is also decomposed into its component parts.
The passive transformation Tp is based on an underlying elementary transformation tp which is the product of a permutation and a deformation. That is, we form a passive by interchanging the subject
and object (by a permutation) and adding be+en between the auxiliary and the verb, and by after the verb. Tp will be applied to any string of the form NP1—VPA—VT—NP2 and will convert it into a string of the form NP2—VPA—be+en—VT—by+NP1. (p. 449)
In LSLT (p. 450) the permutation underlying the passive transformation is given as:
πp: Y1−Y2−Y3−Y4 → Y4−Y2−Y3−Y1
and the underlying deformation as:
δp: Y1−Y2−Y3−Y4 → Y1−Y2−be+en+Y3−by+Y4
The Remarks analysis shows that the permutation itself can be broken down into two separate components, thereby suggesting that such permutations are not elementary operations, but rather the result of two elementary operations.
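To make the factorization concrete, the product of the two elementary operations can be rendered schematically as follows (a toy sketch; the list encoding and the function names are expository assumptions, not LSLT’s formalism):

# Illustrative sketch: the LSLT passive as the product of a
# permutation and a deformation over a four-term factorization
# Y1-Y2-Y3-Y4. The Python list encoding is an expository assumption.

def permute(y):
    # pi_p: Y1-Y2-Y3-Y4 -> Y4-Y2-Y3-Y1 (interchange subject and object)
    y1, y2, y3, y4 = y
    return [y4, y2, y3, y1]

def deform(y):
    # delta_p: Y1-Y2-Y3-Y4 -> Y1-Y2-be+en+Y3-by+Y4
    y1, y2, y3, y4 = y
    return [y1, y2, "be+en+" + y3, "by+" + y4]

def passive(y):
    # T_p as the product of the two elementary operations
    return deform(permute(y))

# NP1-VPA-VT-NP2 -> NP2-VPA-be+en+VT-by+NP1
print(passive(["NP1", "VPA", "VT", "NP2"]))
# ['NP2', 'VPA', 'be+en+VT', 'by+NP1']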
26 However, Hasegawa’s interpretation of this result is less conservative than Chomsky’s:

Thus, wherever we can extract such cross-transformational elementary operations, it would be much simpler to reinterpret them as transformations (mapping rules) than to set up a far greater number of ‘transformations’ consisting of various combinations of these elementary operations (e.g., if there are three such elementary operations [that is, substitution, adjunction and deletion], seven different ‘transformations’ are theoretically possible). (p. 235)

Moreover, he goes on to propose that the structural change of a transformation be limited to a single elementary operation as a means of limiting the expressive power of transformations (see §5). This proposal was, as we now know, ahead of its time. Hasegawa offers it as a conjecture. His analysis of passive constructions in itself provides very little motivation for the conjecture and does not discuss the consequences of adopting it other than to point out that the expressive power of transformations would be substantially reduced. 27 Note that in Chomsky 1976 the rule Move NP is taken to be ‘a natural consequence of assuming the condition of minimal factorization’—which prohibits any transformation whose structural description mentions ‘two successive categorial terms unless one is satisfied by a factor changed by the rule’ (pp. 172–3). However, without general conditions which mediate the behaviour of Move NP, it would not be possible to maintain as strong a condition as minimal factorization.
28 See Chomsky (1973: fn. 49, 1975c, 1975d) for the initial discussions. See also Selkirk (1972), Wasow (1972, 1979) and Fiengo (1974, 1977) for some early extensions and applications of trace theory. 29 In Chomsky 1980 it is noted that traces in syntactic representations result if transformations may not compound elementary operations.
Movement of the category α is assumed to ‘leave behind’ the category [α e], in accordance with trace theory. This assumption was implicit in earlier versions of transformational grammar, and becomes explicit when compounding of elementary transformations is forbidden. (p. 4) Note that trace theory now motivates Hasegawa’s conjecture as discussed in note 26. 30 The PIC is a refinement on the TSC—see Chomsky (1977b) for details. 31 Chomsky goes on to suggest that ‘it might be appropriate to give a similar interpretation to the subjacency condition for movement rules’. See Freidin (1978) and Browning (1991) for further discussion. 32 Of course there is a derivation of (12) from (13a) in which the application of Move NP violates the SSC—namely, the one where that book is moved into complement subject position first and then John is moved long distance into matrix subject position. 33 Note that the discussion of filtering in generative grammar goes back to Aspects, where the ‘filtering function’ of transformations is discussed. The first discussion of filters distinct from transformations occurs in David Perlmutter’s 1968 doctoral dissertation, published as Perlmutter 1971. See also Chomsky and Lasnik (1977) for further discussion of filters and their transformational character.
Regarding the redundancy argument concerning the SCC, a caveat is in order. The analysis given here and also in Freidin 1978 assumes that any lexical NP may substitute for a trace. This may be a dubious assumption, depending on how nondistinctness of categories is determined. If having a different index contributes to the determination of distinctness of categories, then two NPs with different indices will be distinct even if all other features are identical. If so, then the Nondistinctness Condition on substitutions will automatically prohibit the substitution of a lexical NP for a trace which bears a different index. (This might provide an explanation for the trace erasure principle of Dresher and Hornstein 1979 (that is, only designated elements (essentially nonreferential it and existential there) can erase traces) provided such pleonastic elements are nondistinct from an NP trace.) Under this analysis the empirical effects of the SCC follow from the Nondistinctness Condition on substitutions, a basic condition on an elementary transformational operation. (Notice that we obtain the same result with the copying theory of movement operations, whereby a moved phrase leaves behind a copy of itself (which is later deleted in PF)—see Chomsky 1992.) While this interpretation still supports the shift from rules to principles, it does not suggest any shift of focus from derivations to representations since the Nondistinctness Condition is fundamentally a condition on the application of an elementary operation. Furthermore, the analysis demonstrates that strict cyclicity follows from one of the most fundamental properties of movement transformations—hence is as deeply
embedded in the theory of transformations as it is possible to be. As for the argument that the TSC and SSC are properly interpreted as conditions on representations rather than conditions on derivations, the redundancy of the SCC no longer applies. Instead, we are left with the argument that trace-binding is just another instance of anaphor binding and that the binding theory applies to binding relations irrespective of how they are formed. 34 Alternatively, it could have been assigned θ0 in underlying structure in the object position of the verb. 35 See Lasnik (1988) for further discussion. 36 See Freidin (1978) for discussion. 37 The alternative is to assume that once a θ-role has been assigned it is no longer available for other assignments—that is, Unique Assignment is just a consequence of θ-role assignment. In this case (15c) constitutes yet another violation of the Functional Relatedness principle since one or the other position would fail to be assigned a θ-role. 38 Functional Uniqueness and Functional Relatedness were originally proposed in Freidin 1978. The principle of Unique Assignment was originally proposed in Freidin (1975b: fn. 20) (though without that designation). The three principles are combined with a fourth (requiring every θ-role of a predicate to be assigned) as the θ-Criterion of Chomsky (1981:36).
A reasonable criterion of adequacy for LF is (4) [footnote suppressed]:
(4) Each argument bears one and only one θ-role, and each θ-role is assigned to one and only one argument.
I will refer to (4) as the ‘θ-criterion’. An argument is assigned a θ-role by virtue of the θ-position that it or its trace occupies at LF.
For variant formulations see Chomsky (1981:335) and Chomsky (1986a). 39 The disjoint distribution of the determiner and possessional NP was expressed in phrase structure grammar using the curly brace notation. Under contemporary analysis, in which there are no phrase structure rules, this disjoint distribution remains to be explained. Ideally there is some explanation at the level of UG for the nonoccurrence of constructions as in (i). (i)
a. [NP [Det the] [NP the boy’s] book] b. [NP [Det the] [NP Sam’s] book] 40 See Bresnan (1972), Emonds (1976) and Hornstein (1977) for analyses which eliminate Agent-postposing in NPs. For example, Hornstein (1977:144f.) offers the following empirical argument for base-generating the wh-phrase with its lexical object in nominal constructions like (i). (i)
John’s photograph of Mary by Warhol
Given that the possessional NP John’s must be base-generated in prenominal position, this position cannot be the source for the object of the wh-phrase. Notice that this argument is not entirely compelling since the underlying source for the possessional NP John’s could be a postnominal position, as in (ii).
(ii)
the photograph of Mary by Warhol of John’s
Although this construction may be somewhat awkward, there are certainly other perfectly acceptable nominals in which a possessional NP occurs postnominally—e.g., that photograph by Warhol of John’s or that photograph of John’s. Under this analysis it would be possible for the NP Warhol to occur underlyingly in prenominal position and undergo Agent-postposing. However, the movement of the possessional NP to prenominal position would involve the substitution of a trace—which would be prohibited by the Nondistinctness Condition on substitutions (as mentioned in note 33). Therefore even if the possessional NP is moved to prenominal position, there is no motivation for (in fact, motivation against) postulating a movement analysis for the derivation of the wh-phrase. This argument generalizes to sentential forms as well. 41 See Chomsky (1955) and Freidin (1975b) for some discussion of the adjectival character of passive predicates. 42 For further discussion of Case absorption see Jaeggli (1986). See also Babby (1989) for a critical discussion of Case absorption with respect to Slavic languages and a proposal that deals with the crosslinguistic variation in terms of an optional subject parameter. 43 A similar analysis was developed for wh-movement in Chomsky 1973, 1975c, 1976, 1977b. See Freidin (1978:542) for further discussion. 44 Lasnik and Saito (1984) take the formulation one step further as ‘Affect α’, which includes deletions. See also Lasnik and Saito (1992). 45 A similar sort of reduction has been proposed for phrase structure rules. Because phrase structure rules tend to be language-specific stipulations, whereas the concept of phrasal projection and the constituent schema for projections are part of UG, it is generally assumed that phrase structure rules can be eliminated entirely. Instead the principles of X-bar theory and other principles of UG—in particular, Case theory and θ-theory—predict the details of phrase structure. The general idea is that whatever is stipulated in phrase structure rules is redundant because this follows from various parts of UG in conjunction with the lexical properties of words in a language. See Stowell (1981) and Speas (1990), among others. 46 See Chomsky (1991) for a detailed discussion. 47 However, a more bizarre construction like (i) might provide independent motivation for a functional uniqueness principle. (i)
a. np believed [α it to have been promoted Henry]
b. *Henryi believed [α it to have been promoted ei]
Given (i.a) as the underlying structure of (i.b), where np indicates an empty subject position, the last resort condition does not block the movement from complement object position to matrix subject position. The underlying complement object
Henry is Caseless and therefore must move to a Case-marked position. Since the complement subject position is filled with pleonastic it, the NP Henry must move into the empty matrix subject position to receive Case. In this position it also receives an additional θ-role, in violation of Functional Uniqueness. The validity of this analysis depends on the presence of pleonastic elements in underlying structure, which is not obvious. In fact given the principle of Full Interpretation, pleonastic elements like nonreferential it should not occur in LF representations since they receive no interpretation at LF. We might therefore assume that such elements are inserted at PF after movement operations (governed by the last resort condition) have applied. Under this analysis, (i) does not arise. 48 It is worth pointing out here that in MMH Chomsky’s notion of simplicity bears some similarity to the more current discussions of economy.
For the formulation of any relatively precise notion of simplicity, it is necessary that the general structure of the grammar be more or less fixed, as well as the notations by means of which it is constructed. We want the notion of simplicity to be broad enough to comprehend all those aspects of simplicity of grammar which enter into consideration when linguistic elements are set up. Thus we want the reduction of the number of elements and statements, any generalizations, and, to generalize the notion of generalization itself, any similarity in the form of non-identical statements, to increase the total simplicity of the grammar. As a first approximation to the notion of simplicity, we will here consider shortness of grammar as a measure of simplicity, and will use such notations as will permit similar statements to be coalesced. (p. 5)

To avoid circularity, the notation must be fixed in advance and neutral to any particular grammar. Given the fixed notation, the criteria of simplicity governing the ordering of statements are as follows: that the shorter grammar is the simpler, and that among equally short grammars, the simplest is that in which the average length of derivation of sentences is least. (p. 6)

In current work, the ‘shortness’ of grammars and of derivations is driven by substantive principles of UG. 49 See Otero (1992: fn. 83) for some additional illuminating discussion. 50 See Chomsky and Lasnik (1991) and Freidin (1992) for discussion. 51 Cited in Otero (1992: fn. 83). It is worth noting in this regard that Chomsky considers the discussion of discovery procedures to be the major contribution of structural linguistics, among its other significant contributions.
Structural linguistics has enormously broadened the scope of information available to us and has extended immeasurably the reliability of such data. It has shown that there are structural relations in language that can be studied abstractly. It has raised the precision of discourse about language to entirely new levels. But I think that its major contribution may prove to be one for which, paradoxically, it has been very severely criticized. I refer to the careful and serious attempt to construct ‘discovery procedures’, those techniques of segmentation and classification to which Saussure referred. This attempt was a failure—I think that is now generally understood. It was a failure because such techniques are at best limited to the phenomena of surface structure and cannot, therefore, reveal the mechanisms that underlie the creative aspect of language use and the expression of semantic content. But what remains of fundamental importance is that this attempt was directed to the basic question in the study of language, which was for the first time formulated in a clear and intelligible way. (Language and Mind, p. 22)
16 Review of The Minimalist Program* by Noam Chomsky
This collection of Chomsky’s recent technical papers (henceforth MP) documents what may well herald a third major breakthrough in the study of language and mind. At the very least, it provides a radical reformulation of certain parts of linguistic theory based on some broad speculations about language design and how it might accommodate the architecture of the mind/brain. The first three chapters of MP are reprinted with minor revisions from previous publications. Ch. 4 ‘Categories and transformations’ is a substantial elaboration of Chomsky 1994, which was published under a different title. The eleven-page introduction, most of it taken from §1 of Chomsky 1994, puts the following four chapters in historical/conceptual perspective.1 Ch. 1 ‘The theory of principles and parameters’ (coauthored with Howard Lasnik) presents a general overview of the PRINCIPLES AND PARAMETERS framework circa 1991, one which is transitional between a government-based and an economy-based theory because it incorporates notions of economy discussed in the second chapter (written almost three years before, in September 1988). Ch. 2, ‘Some notes on economy of derivation and representation’, elucidates proposals for economy conditions (e.g., FULL INTERPRETATION (Chomsky 1986a, henceforth FI) and LAST RESORT), which Chomsky cautions should be viewed as guidelines rather than principles of Universal Grammar (UG) because their formulations are too vague. These guidelines are intended to ensure that representations contain no superfluous elements and derivations no superfluous steps. The last two chapters deal specifically with a minimalist view of grammatical theory. The third chapter, ‘A minimalist program for linguistic theory’, sketches goals and directions for research under minimalist assumptions—in particular, the elimination of D-structure and S-structure as levels of linguistic representation—and further develops the economy-based view by adding new conditions, PROCRASTINATE and GREED. Ch. 4 subjects the basic framework for a theory of UG under the principles and parameters model to a searching critical analysis ‘in an effort to approach as closely as possible the goals of the Minimalist Program outlined in the introduction’ (219). It presents a radical reformulation of phrase structure theory and a significant revision of the theory of transformations. As Chomsky notes, ‘the end result is a substantially different conception of the mechanisms of language’ (219). MP is unique among Chomsky’s published works because of the theoretical distance it covers between the first chapter and the last. For example, while Ch. 1 contains a substantial section on government theory and Ch. 2 refers to the EMPTY CATEGORY PRINCIPLE (ECP) throughout, Ch. 3 suggests that government is not a legitimate grammatical concept (its effects derivable from other more basic notions), and Ch. 4 essentially assumes that the notion is not available for grammatical description. Not only does the ECP disappear from the landscape by the time we reach Ch. 4, but so do the
Case filter, the projection principle, the head movement constraint, the subjacency condition,2 most of X-bar theory, and parts (potentially all) of the θ-criterion. This alone makes MP a highly challenging book. Moreover, central concepts and mechanisms undergo quite significant reformulation as the discussion proceeds (e.g., the formulations of checking domain in Chs. 3 and 4, a minor readjustment compared to the discussions of phrase structure and transformations in Ch. 4). Without the fourth chapter, MP would still be Chomsky’s richest and most complex collection of essays. Ch. 4, arguably his most difficult paper to date, raises MP to another level entirely. At 175 pages, this chapter makes up almost half of MP and is just about equal to the length of Chomsky’s two Linguistic Inquiry monographs combined. Unlike the first three chapters or the monographs, Ch. 4 focuses on very broad and abstract conjectures about language design and the theoretical constructs that account for it. Where most of the numbered examples in Chs. 1–3 contain linguistic expressions from some particular language, those of Ch. 4 deal mostly with definitions, questions, conjectures, assumptions, abstract structures, lists of relevant topics or properties of constructions, and of course, various principles, conditions, guidelines, and just plain stipulations—less than half cite linguistic expressions. As a result, the empirical basis for much of the discussion is either assumed or passed over. Chomsky acknowledges that the ‘empirical questions that arise are varied and complex, and it is easy enough to come up with apparent counterevidence’, but states clearly that he is putting these problems aside and assuming ‘the best outcome, namely, that UG settles the matter’… ‘hardly an innocuous step’ he cautions (266). Thus Ch. 4, unlike much of Chomsky’s previous writing, focuses on conceptual as opposed to empirical considerations and eschews detailed analyses of linguistic data. From the way Chomsky talks about conceptual arguments in contrast to empirical arguments it is pretty clear that he prefers the former, though ideally conceptual and empirical considerations should eventually converge (as they did in much of the work that developed the principles and parameters framework, where the replacement of language-specific grammatical rules by mechanisms and principles of UG led to a new model of comparative syntax). With the exception of Ch. 2, the book contains many references to ‘conceptual necessity’, ‘conceptual grounds’, ‘conceptual arguments’, and ‘conceptual naturalness’ (most frequently in Ch. 4). Consider for example the following comment about a model of grammar that includes levels of D-structure and S-structure: The empirical justification for this approach, with its departures from conceptual necessity, is substantial. Nevertheless, we may ask whether the evidence will bear the weight, or whether it is possible to move towards a minimalist program. (187) As part of the minimalist program, Chomsky wants to eliminate the levels of D-structure and S-structure simply because they fail to meet the criterion of conceptual necessity in spite of substantial empirical justification. This methodology is central to the minimalist program, which seeks to limit the theory of grammar to what is conceptually necessary. In the case of levels of representation, this includes the interface levels of PF and LF and excludes all others. Empirical motivation
for such a move may be unknown or even nonexistent. If so, then the conceptual argument suffices. The empirical motivation for the contrary conclusion has to be reanalyzed. This is of course quite reasonable, given that empirical arguments themselves are often based on assumptions whose justification is less than solid. Conceptual advances often result in the reevaluation of empirical evidence and arguments. Thus the criterion of conceptual necessity operates in the minimalist program as a type of Ockham’s razor. Exactly what constitutes conceptual necessity in the case of levels of representation is particularly clear, but in other domains may be open to interpretation.3 This style of reasoning, which has to some extent always been in the background of Chomsky’s work, is likely to strike most readers as rather alien and difficult to deal with, primarily because before the advent of the minimalist program most research in the field has been data-driven. This does not mean that Chomsky is no longer concerned with constructing an explanatory account of the properties of natural language. Rather, it seems that he has become increasingly skeptical that such an account can be constructed primarily on empirical arguments—which perhaps indicates an intensification of his critique of empiricism, now applied to the methodology of generative grammar. The result is that Chomsky has moved the principles and parameters model to unfamiliar ground. Not only is it difficult to evaluate conceptual arguments, but it is also unclear exactly how to proceed, if for the present empirical arguments are not strong enough to move the program forward—in marked contrast to the situation when Lectures on government and binding appeared in 1981. Not surprisingly, the motivation for proposing a minimalist program is essentially conceptual, though of course based on considerable empirical research within the principles and parameters framework. It comes from two related conjectures, stated in MP as questions.
a. To what extent is language a ‘perfect system’?
b. Is the computational system for human language (henceforth CHL) optimal (‘in some interesting sense’) and therefore unique among cognitive systems?
Putting aside expected ‘imperfections’ in morphology and phonology Chomsky is optimistic that both questions will receive positive answers. He mentions two criteria for establishing the perfection of the system. One is that language should meet a condition of INCLUSIVENESS: the computation that generates structures (especially PF and LF) applies only to the elements of lexical items initially selected for the computation (the NUMERATION in the new terminology), and is restricted to rearrangements of lexical properties—thus the computation cannot create new objects (e.g., indices or bar-levels).4 The other criterion concerns the extent to which the language faculty is determined by general conditions, that is, without postulating additional special structure.5 Such conditions in turn depend on (1) how the language faculty interfaces with other cognitive systems and (2) ‘general considerations of conceptual naturalness that have some independent plausibility: simplicity, economy, symmetry, non-redundancy’, and so forth. At the moment we know almost nothing about (1) and considerably more about (2), which provides some basis for believing that (a) can be answered in the affirmative.
The optimality of CHL will also be determined by the extent to which it conforms to some of the same considerations of conceptual naturalness, especially economy. With these
considerations in mind, let us turn to the set of assumptions that constitute Chomsky’s minimalist program. The minimalist program shares at least three assumptions with its predecessors. First and foremost, there exists a language faculty in the mind/brain, a component that interacts with other cognitive systems. Next, the cognitive system of language connects, via levels of linguistic representation, with performance systems. Finally, performance systems do not differ across languages. Even at this fundamental level Chomsky remains cautious, noting that these assumptions are ‘not at all obvious’. As the minimalist program is grounded in the principles and parameters model,6 the two share further specific assumptions [see 17]:
i. regarding CHL, the initial state S0 contains invariant principles and parameters (i.e., options restricted to functional elements)7
ii. selection Σ of parameters determines a language
iii. a language determines an infinite set of linguistic expressions (SDs), each pair (π, λ) obtained from the interface levels, PF and LF
iv. language acquisition involves fixing Σ
v. the grammar of a language states just Σ, lexical arbitrariness and the PF component aside
Because these assumptions have been standard for over a decade, we can dispense with further discussion here. There is one further assumption, articulated for the first time in Ch. 3, which could have been considered prior to the minimalist program. Chomsky puts forward as a narrow conjecture the claim that there is no variation in the overt syntax or the LF component. Ignoring PF options and lexical arbitrariness, Chomsky wants to show that ‘variation is limited to nonsubstantive parts of the lexicon and general properties of lexical items’ (170) (but see n. 7 in this review). If this is correct, then ‘there is only one computational system and one lexicon’, apart from the variation mentioned. Whether this can be convincingly demonstrated may be a distant prospect, but it comes close to saying that human languages share a central core so that at the appropriate level of abstraction they are the same language—a bold and fascinating conjecture. We come at last to those assumptions that are unique to the minimalist program. The fundamental assumption from which the rest more or less follow is that the theory of grammar must meet a criterion of conceptual necessity (as mentioned above). In fact it is so basic that it is never explicitly mentioned as an assumption. It leads directly to the assumption that the interface levels LF and PF are the only relevant linguistic levels, in spite of apparent empirical evidence to the contrary (169). Chomsky further assumes that (i) all conditions are interface conditions and (ii) a linguistic expression is the optimal realization of these conditions (194). The discussion of interface conditions in MP is rather vague. Only the Case filter, now interpreted as a general condition on morphological feature checking, is explicitly mentioned (197). On p. 189 Chomsky refers to ‘(external) interface conditions’, presumably to leave open the possibility that there are internal interface conditions as well (i.e., output conditions that belong to CHL—for example, the various conditions of the binding theory perhaps). In Ch. 4, however, he discounts this possibility: ‘I will continue to assume that the computational system CHL is strictly derivational and that the only output conditions are the bare output conditions determined “from the outside,” at
the interface’ (224), specifically the perceptual-articulatory interface for PF8 and the conceptual-intentional interface for LF. Thus CHL contains only conditions on the application of rules, whereas cognitive systems that use information from CHL impose conditions on representations (filters), BARE OUTPUT CONDITIONS (henceforth BOCs) in the new terminology.9 (For further elaboration of the derivational nature of CHL, see Epstein 1994, which proposes that ‘the only relations that exist for CHL are those established by the derivational process itself’ (254).) A related assumption of the minimalist program is that BOCs determine which items are ‘visible’ for computations (242). The example cited is that only maximal and minimal projections are accessible to CHL given BOCs. Although Chomsky offers no specific proposals for BOCs, the principle of FULL INTERPRETATION (FI) might be considered a candidate because it operates as an output condition on PF and LF which is imposed by the cognitive systems that use interface representations. From the generally informal discussion of this principle in MP, we might construe FI to require that every element in the interface representation must be interpretable to the cognitive system that uses it. This assumes that the external cognitive system will mark an interface representation as deviant if it contains an uninterpretable element, which requires that external cognitive systems can recognize uninterpretable elements even though they cannot assign them a legitimate interpretation. Under this conceptualization, the notion of interpretability plays a crucial role. This is developed in Ch. 4 in terms of a binary distinction ±interpretable for features. Categorial features and φ-features of nouns are +interpretable, whereas Case features generally and φ-features of verbs are -interpretable. When a [-interpretable] feature occurs at LF, for example, FI is violated and the derivation is said to CRASH. If only [+interpretable] features occur at LF, the derivation CONVERGES.10 A convergent derivation must converge at both interface levels and thus must eliminate any [-interpretable] feature introduced via the selection of lexical items.11 Consider how this works for the following examples. (1)
*It is likely John to be here.
(2)
It is likely that John is here.
The Case feature of John is licensed in 2 but not in 1. In MP licensing is achieved via a checking procedure (see §4.5.2 for details) which results in the elimination of the -interpretable feature. Thus the derivation of 1 crashes because the uninterpretable Case feature remains at LF, violating FI, whereas the derivation of 2 converges.12 The effects of the Case filter for lexical NPs are subsumed under FI, now interpreted as a BOC, so the former condition can be dropped from the inventory of UG. In this way, BOCs do seem to determine the structure and function of CHL itself, as Chomsky suggests.13
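The checking logic can be rendered schematically. In the toy sketch below, the feature inventory and the check routine are expository assumptions rather than MP’s formalism; the point is only that convergence at LF requires every -interpretable feature to have been checked off, which is why the derivation of 1 crashes while that of 2 converges:

# Illustrative sketch of feature checking and convergence (not MP's
# actual formalism): a derivation converges at LF only if no
# -interpretable feature survives to the interface.

# Hypothetical encoding: each feature is a pair (name, interpretable?).
john_features = [("D", True), ("phi", True), ("Case", False)]

def check(features, checked_names):
    # Checking deletes a -interpretable feature, making it invisible
    # to further computation; +interpretable features survive to LF.
    return [(f, interp) for (f, interp) in features
            if interp or f not in checked_names]

def converges_at_LF(features):
    # FI as a bare output condition: any residual -interpretable
    # feature at LF causes the derivation to crash.
    return all(interp for (_, interp) in features)

# (2) 'It is likely that John is here': finite Infl checks John's Case.
print(converges_at_LF(check(john_features, {"Case"})))   # True: converges
# (1) '*It is likely John to be here': Case is never checked.
print(converges_at_LF(check(john_features, set())))      # False: crashes

On this rendering the Case filter does no independent work: convergence at LF, enforced by FI, decides both cases.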
Another fundamental assumption of the minimalist program is that phrase structure representation is bare—meaning that it is composed of lexical features and objects constructed from them. Thus phrase structure is assumed to meet the condition of inclusiveness mentioned above. As a result, a large part of X-bar theory is discarded, in particular, bar levels and rule schema.14 It is worth recalling that as the principles and parameters framework developed, it became increasingly clear that phrase structure rules were largely redundant language-particular stipulations that basically followed from principles of UG in conjunction with parameter settings. However, for well over a decade Chomsky made no specific proposal for how phrase structure was to be generated if there were no phrase structure rules. The proposal in Ch. 4 is that phrase structure is constructed via a transformational operation MERGE, which concatenates two elements (lexical items and/or concatenations of lexical items)15 and projects the category label of one on the newly formed constituent—basically classical adjunction (which does not produce a two-segment category, as in the analysis of Chomsky 1986b, which adapts a proposal in May 1985). Interestingly, this analysis establishes transformations as fundamental (and phrase structure rules as unnecessary) in contrast to earlier attempts to show the converse (e.g., Harman 1963 and Freidin 1970). Merge interfaces with the lexicon via a numeration N, a set of lexical items (including functional elements) selected from the lexicon for a derivation. Merge must apply to each item in N. If some item is selected more than once (e.g., he in he thinks he is clever), then merge must apply to that item the same number of times it was selected. If N contains any unprocessed elements, what has happened does not count as a derivation. Merge constructs phrase structure bottom-up, in contrast to the top-down nature of derivations under some versions of phrase structure grammar. Merge cannot construct a VP containing a verb and a sentential complement until the sentential complement has been constructed. Furthermore, merge cannot construct classical D-structures where all lexical elements are inserted into phrase structure prior to the application of movement transformations. For example, the classical D-structure representation of we expect John to be arrested soon would have John in complement object position and we in matrix subject position. With bare phrase structure, although John is inserted into complement object position by merge, it must be moved to complement subject position by an operation MOVE to construct the sentential complement before the matrix predicate can be formed.16 The analysis rests on an assumption that the lexicon does not contain empty categories that have no interpretation (in contrast to PRO). Thus there is no strict ordering between lexical insertion and movement operations, as there is in models which include D-structure. While classical D-structure disappears under the new analysis of phrase structure, its central property is maintained, namely, that arguments are inserted by merge into their canonical D-structure positions (i.e., where they are assigned θ-roles). Presumably this is forced by selectional properties of predicates—otherwise it would be mysterious under minimalist assumptions why the position of arguments would be relevant anywhere except at LF. Bare phrase structure theory raises the question as to which category label of the two elements concatenated by merge or move projects. Section 4.4.2 (projection of target) of Ch. 4 is devoted to showing how for movement it is always the target that projects. In essence the minimalist program raises such questions as why a VP is a ‘VP’ and not something else. To account for the order of constituents within phrases, another basic issue, Chomsky adapts the LINEAR CORRESPONDENCE AXIOM (Kayne 1994), which is based on the idea that ‘order reflects structural hierarchy universally’ (335).
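To fix ideas, here is a toy rendering of merge driven by a numeration (the string encoding of lexical items and the caller-specified projector are expository assumptions, not the definitions in MP):

# Illustrative sketch of merge over a numeration (an expository
# simplification). A syntactic object is either a lexical item
# (a string) or a labeled pair of syntactic objects.

def label_of(obj):
    return obj[0] if isinstance(obj, tuple) else obj

def merge(alpha, beta, projector):
    # Merge concatenates two objects and projects the label of one of
    # them (here stipulated by the caller) onto the new constituent.
    assert projector in (alpha, beta)
    return (label_of(projector), alpha, beta)

# The numeration: items selected from the lexicon, with multiplicity.
# Every item must be merged exactly as many times as it was selected,
# or the computation does not count as a derivation.
numeration = ["the", "boy", "saw", "the", "dog"]

# Bottom-up construction: a phrase cannot be built until its parts are.
subj = merge("the", "boy", "the")
obj = merge("the", "dog", "the")
vp = merge("saw", obj, "saw")      # V projects over its complement
clause = merge(subj, vp, vp)       # the verbal projection projects again
print(clause)
# ('saw', ('the', 'the', 'boy'), ('saw', 'saw', ('the', 'the', 'dog')))

Note that the bottom-up order is forced: the object phrase must exist before the VP containing it can be built, mirroring the point in the text about sentential complements.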
These are basic questions whose answers were simply assumed as stipulated in earlier analyses. In raising such questions about fundamental points of analysis, the minimalist
program focuses on the root and beginning of syntactic inquiry, not the ‘end of syntax’ as has been claimed (e.g., Marantz 1995). Given BOCs, perhaps just FI, there must be a point in the derivation where one part splits off in the direction of PF and the rest proceeds to LF. Chomsky assumes that the presence of phonetic features at LF will cause a derivation to crash. This entails that an operation of spell-out, which splits a derivation into the two parts, extracts those aspects of SDs relevant to PF (e.g., phonetic features). Given FI, no element of a numeration having phonetic features can be merged after spell-out. Similarly, an element bearing semantic features cannot be merged after spell-out in the derivation to PF. Presumably the point at which spell-out applies in a derivation is determined by BOCs, so no stipulations are necessary. This accords with what Chomsky calls a ‘guiding intuition’ of the minimalist program: ‘that operations apply anywhere, without special stipulation, the derivation crashing if a “wrong choice” is made’ (231). Operations that apply pre-spell-out are overt because their consequences show up in PF as well as LF—thus post-spell-out operations are covert, their consequences showing up only in LF. In this way CHL can be divided into an overt and a covert component. The distinction would be a potential problem for a minimalist approach if a difference between the two components had to be stipulated, which would thus specify when spell-out applied. Instead, Chomsky takes the uniformity of CHL from the numeration to LF to be an assumption of the minimalist program—in other words, the same kind of operations apply in both the overt and covert components. If this is correct, then any distinction between pre- and post-spell-out should be a reflection of other factors. There are two ways of meeting the uniformity assumption. Either the operation move applies to categories in both components (the analysis of Ch. 3) or move applies only to features (the analysis of Ch. 4). Since only feature movement is required for checking in the covert component, the minimal operation would be MOVE FEATURE, not MOVE CATEGORY.17 Chomsky notes that the latter is ‘an unnatural interpretation of the operation’ on general minimalist assumptions (262). The move-feature analysis creates a major problem for an account of overt movement, which involves categories, not just features. If only features are required to move in order to check and thereby eliminate -interpretable features from a derivation, then why does overt movement involve more than just features? Moreover, why does it involve single constituents (rather than strings of categories that do not form constituents)? The move-category analysis answers such questions by mere stipulation. Chomsky’s analysis in Ch. 4 is, however, somewhat less than satisfying. He proposes ‘a natural economy condition’: a feature which moves ‘carries along just enough material for convergence’, ‘a kind of “generalized pied-piping”’, determined by BOCs.18 This displacement property of overt movement, the fact that constituents in sentences can occur in a position different from the one in which they are interpreted at LF, might actually constitute ‘a striking departure from minimalist assumptions in language design’, Chomsky cautions at the outset of Ch. 4 (221). If so, then one of the most interesting properties of human language falls outside the domain of the core theory.
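The overt/covert division can also be rendered schematically. In the toy sketch below (my encoding, not MP’s), operations applying before spell-out have consequences at both interfaces, while those applying after affect only LF:

# Illustrative sketch of the overt/covert split at spell-out:
# pre-spell-out operations feed both PF and LF; post-spell-out
# (covert) operations feed only LF.

derivation = [("merge", "overt"), ("move", "overt"),
              ("spell-out", None), ("move-feature", "covert")]

pf_ops, lf_ops, covert = [], [], False
for op, _ in derivation:
    if op == "spell-out":
        covert = True            # phonetic material splits off here
        continue
    if not covert:
        pf_ops.append(op)        # overt: consequences visible at PF...
    lf_ops.append(op)            # ...and every operation feeds LF

print(pf_ops)   # ['merge', 'move']
print(lf_ops)   # ['merge', 'move', 'move-feature']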
In spite of this apparent problem with overt movement, the morphologically driven feature-movement analysis is conceptually attractive, in part because it raises fundamental questions that were previously buried under a stipulation. Furthermore, it suggests a natural explanation for the economy condition PROCRASTINATE, which prefers covert movement to overt movement. Since overt
movement involves an additional operation (pied piping), pure feature movement in the covert component is always preferable on general economy grounds. Given this incomplete sketch of Chomsky’s minimalist proposals,19 let us consider the analysis of some familiar examples of NP movement. (3)
John is likely [to be here]
(4)
*John is likely [is here]
In 3 the Case feature of John will raise to the matrix Infl to be checked and to check the Case feature on Infl itself. In addition, the NP John will be pied-piped to Spec IP for reasons of convergence that remain to be specified. The analysis is basically straightforward. In contrast, the analysis of 4 is open to a couple of interpretations. We know that the Case feature of John can only be checked once, since, being -interpretable, it becomes invisible to CHL (deletes) after it is checked. If John checks the Case feature of the complement Infl, the last resort condition, which allows a feature to move only if it enters into a checking relation with its target, might block any further movement.20 However, given the VP-internal subject hypothesis, John would have been inserted in the complement VP and therefore could have moved directly to the matrix position. In this case John will be able to check the Case feature of the matrix Infl; the movement, however, would violate the minimal link condition by skipping the complement Infl. But then both the D-feature and the Case feature of the complement Infl will not be checked. In either case, we are left with -interpretable features at LF, in violation of FI. The analysis of 4 shows that both last resort and the minimal link condition appear to overlap with FI.21 If elimination of redundancy is a methodological prerequisite of the minimalist program as Chomsky suggests, then it seems that we still have a fair amount of sorting out to do. The overall character of the minimalist program is highly speculative, as Chomsky notes throughout MP. In a recent paper (Chomsky 1998) he is virtually categorical on this point: ‘There are minimalist questions, but no specific minimalist answers’ (recall n. 6). Whatever answers can be discovered will result from following the research agenda of the program. Unfortunately, how this is to be done is rather unclear as compared to the research agenda outlined in Chomsky 1981. With this in mind, consider the issue raised at the outset of this review: does the minimalist program as outlined in MP constitute a major breakthrough in the study of language and mind? As we have seen, the minimalist program raises fundamental new questions that did not arise in previous work. It also leads to a radical reformulation of syntactic theory within the principles and parameters framework. The phrase structure component is reduced to a fundamental part of the transformational component. Furthermore, the modular subcomponents of UG in standard GB theory are essentially eliminated or substantially reduced in favor of economy conditions and BOCs. The subtheory of government is disallowed for conceptual reasons. Case assignment and licensing, and also Case conflicts are handled indirectly by FI and other conditions on economy, so that there seems to be no need for a special Case theory module. The empirical effects of the θ-criterion are subsumed under FI (the fact that an argument must be assigned a θ-role (but see n. 10)) and last resort, which generally prevents overt movement between two θ-positions, with perhaps the exception of overt control
structures (see Hornstein 1996), so there may be no special θ-theory apart from some mechanism for assigning θ-roles to arguments.22 Under minimalist assumptions, binding theory too must undergo some major reformulation (as suggested in Chs. 1 and 3)—especially if the use of indices is excluded under an inclusiveness condition (see Freidin 1997b for discussion). While these are major changes, the extent to which they will lead to new and deeper insights into the nature of language and mind largely remains to be determined. In terms of empirical payoff, the minimalist program in its present state does not yet constitute a breakthrough of the same sort as the discovery of transformational generative grammar or the major conceptual shifts of the principles and parameters framework (see Freidin 1994b). Nonetheless, there is much about the minimalist program that is promising. Bare phrase structure theory in and of itself is intuitively very appealing. So too is the general idea that the nature of CHL, its function and output, is partly determined by the cognitive systems that access its interface representations. This new perspective requires a broader and more integrated approach to the cognitive system of language by raising as a central question how it interacts with other cognitive systems. Furthermore, the minimalist program has a special therapeutic value beyond the question of its validity in that by focusing on conceptual justification of assumptions it discourages ‘the temptation to offer a purported explanation of some phenomenon on the basis of assumptions that are of roughly the order of complexity of what is to be explained’ (233). To be sure, this is just good scientific practice, but this is the first time Chomsky has made it a serious issue, indicating perhaps a belief that linguistic theory might have developed to the point where it makes sense to invoke such rigorous strictures. So even though it may take a while before we have a strong sense that it is on the right track, the minimalist program seems worth pursuing.23 In the first paragraph of the first chapter of Syntactic structures, Chomsky enunciated one of the goals of the study of syntax as follows: ‘The ultimate outcome of these investigations should be a theory of linguistic structure in which the descriptive devices utilized in particular grammars are presented and studied abstractly, with no specific reference to particular languages.’ In MP, especially the fourth chapter, Chomsky has taken a giant step forward towards realizing this goal. From this perspective, the minimalist program constitutes a major breakthrough. In raising theoretical discussion of the principles and parameters model to a new level of abstraction, it offers us a glimpse of a potentially simpler and deeper theory of human language (with of course the important caveat that it is actually on the right track). However, in doing this, Chomsky has introduced a kind of discontinuity between our previous understanding and the new one he is trying to achieve. The situation is somewhat reminiscent of the one in mathematics noted in Courant 1937.

The point of view of school mathematics tempts one to linger over details and to lose one’s grasp of general relationships and systematic methods.
On the other hand, in the ‘higher’ point of view there lurks the opposite danger of getting out of touch with concrete details, so that one is left helpless when faced with the simplest cases of individual difficulty, because in the world of general ideas one has forgotten how to come to grips with the concrete. The reader must find his own way of meeting this
dilemma. In this he can only succeed by repeatedly thinking out particular cases for himself and acquiring a firm grasp of the application of general principles in particular cases; herein lies the chief task of anyone who wishes to pursue the study of Science.

Notes
* I am indebted to Howard Lasnik and Carlos Otero for their comments on an earlier draft of this review. I would also like to thank Mark Aronoff and two anonymous referees for their suggestions.
1 As Carlos Otero notes (p.c.), ‘a close comparison of the two versions, separated by a little over a year, is not without interest’. Specifically, the formulations of issues in the latter are both expanded and sharper than in the first version.
2 While subjacency is not explicitly excluded, it is not mentioned. Without government, the barriers analysis won’t be available and the mere stipulation of bounding categories goes against the spirit of the minimalist program. Moreover, there is significant overlap between subjacency and a shortest movement/minimal link analysis. If a ‘shortest move’ economy condition is conceptually preferable, then it appears that the subjacency analysis will have to be reconstructed in other terms.
3 Consider for example the question of the notion GOVERNMENT. Chomsky asserts that the minimalist program requires us to dispense with the notion of head-government on conceptual grounds. In Ch. 3 an argument is made that Case and agreement can be unified as Spec-head relations, thereby significantly reducing the need for head-government. In Ch. 4 however, Case and agreement features are checked by adjoining directly to a head, which does not form a Spec-head relation. Thus we are back to a direct relation between a head and the features it checks, which is reminiscent of head-government. Still, given the analysis of checking in Ch. 4, there seems to be no need to reconstitute the notion of government. This however would be an empirical argument against the notion, not a conceptual one. As we will see below, an analysis that relies on economy conditions overlaps with government conditions, making the latter superfluous. As for antecedent-government, Chomsky suggests in Ch. 3 that it is a property of chains ‘expressible in terms of c-command and barriers’ (176). This is puzzling since the notion of barrier is based on a notion of government. Regrettably, nothing more is said about this in the remainder of MP.
4 The condition of inclusiveness provides the foundation for the bare phrase structure theory adopted under the minimalist program. Specifically ‘categories are elementary constructions from properties of lexical items, satisfying the inclusiveness condition’ (249). Thus categories cannot be constructed in terms of bar levels and the distinction between lexical item and head collapses (cf. Freidin 1992). One interesting consequence is that a head can also be a maximal projection.
5 Chomsky does not elucidate what he means by ‘special structure’. If this includes indices and bar levels, then the two criteria are partially (or perhaps completely) mutually entailing.
6 It is important to note that the minimalist program is not a theory, nor even a framework or model.
7 Chomsky also mentions here ‘or general lexical properties’—presumably referring to the head parameter. However, as Carlos Otero pointed out to me, the analysis of a universal underlying word order (tentatively adopted in Chomsky 1994 and also appearing in Ch. 4 of MP; see also Kayne 1994, §5.1) precludes the possibility of a head parameter, hence the need to postulate general properties of lexical items, an obscure notion at best. If this is on the right track, then parametric variation appears to be restricted to nonsubstantive elements of the lexicon. For additional discussion see Strozer 1994, §6.2.1.
8 More generally, the sensory-motor interface, if we include sign language.
9 Note that BOCs are not to be confused with filters that have been posited as part of CHL (e.g., the Chomsky/Lasnik variety or the Rizzi chain formation algorithms).
10 Something more needs to be said about arguments that cannot be assigned a θ-role—e.g., John in *John seems that Mary is happy. Presumably such arguments violate FI, but don’t involve a formal feature [−interpretable]. The issue is discussed in §4.6 of Ch. 4 and again on p. 347, where Chomsky concludes that an argument with no θ-role violates FI and therefore causes the derivation to crash. Alternatively, as suggested in Lasnik 1995b, it could be that such examples result in semantic unintelligibility, not a crashed derivation. On the contrast between examples like this and those like 1, see Epstein 1990.
11 In Ch. 4 FI is the sole condition that determines the set of convergent derivations. From this set, the various economy conditions determine the subset of admissible derivations, that is, the most economical (hence optimal) derivation from a given numeration. Notice, however, that while convergent derivations are compared only if they come from the same numeration, this may not be a sufficient condition for determining the comparison set. Presumably ‘some adults believe that children are irrational’ and ‘children believe that some adults are irrational’ share the same numeration but do not compete in terms of economy. Notice that we cannot rely on Case to distinguish the examples. Chomsky (p.c.) suggests a solution whereby ‘numerations are constructed derivationally’, along the lines suggested in his fall 1995 lectures. Thus the numeration of the embedded clause is constructed and undergoes merge and then further elements are selected. So even though the two sentences contain exactly the same lexical items, the initial pieces of the numeration for each are distinct.
Although this idea that economy conditions choose among convergent derivations is enunciated at the beginning of the chapter, it does not seem to play a substantial role in much of the analysis that follows. Chomsky cites the following example (his 66).
(i) did John give which book to Mary
If we assume that adjunction of did to I checks the strong feature Q of I, then covert movement of the WH-feature is blocked by economy conditions, assuming that the derivation of (i) converges ‘with whatever interpretation it has—perhaps gibberish’. The distinction between a crashed derivation and one that converges as gibberish is presumably empirical. The issue of a choice between convergent derivations is also raised in §4.9 (expletives and economy) concerning the analysis of multiple subject constructions in Icelandic, which is too complicated to discuss in this brief review.
12 It is worth mentioning that FI raises problems for the analysis of expletives as in 2 given that expletives have no interpretation at LF. See Chs. 2 and 4 for extensive discussion.
13 Chomsky notes that BOCs ‘might turn out to be critical factors in determining the inner nature of CHL in some deep sense, or they might turn out to be “extraneous” to it, inducing departures from “perfection” that are satisfied in an optimal way’ (221). He also notes the speculative character of the proposal for BOCs:
we do not know enough about the ‘external’ systems at the interface to draw firm conclusions about conditions they impose, so the distinction between bare output conditions and others remains speculative in part. The problems are nevertheless empirical, and we can hope to resolve them by learning more about the language faculty and the systems with
which it interacts. We proceed in the only possible way: by making tentative assumptions about the external systems and proceeding from there. (222–23)
14 Whether phrase structure theory can be ‘eliminated entirely’ as Chomsky suggests (249) depends on what we take to be ‘phrase structure theory’. Presumably the local relations of head-complement and head-specifier, which are part of X-bar theory, remain—as does the fundamental concept of phrasal projection. Moreover, as Chomsky notes, ‘we still have no good phrase structure theory of such simple matters as attributive adjectives, relative clauses, and adjuncts of many different types’ (n. 22). Although we have moved beyond making stipulations in terms of phrase structure rules, it is not yet obvious that we will be able to account for all aspects of phrase structure without recourse to what might be legitimately called phrase structure theory. Thus the elimination of phrase structure theory remains a desirable prospect. For further discussion of phrase structure analysis under the minimalist program, see Atkinson 1996.
15 While this gives us essentially binary branching, it does so by stipulation. Notice also that the merge analysis is essentially that of generalized transformations in Chomsky 1975a and therefore constitutes a rejection of the argument for deep structure (hence against generalized transformations) given in Chomsky 1965.
16 This eliminates an apparent redundancy of earlier analyses where the target of movement exists as an empty category prior to the movement of the category which actually instantiates the position. There is no need to postulate such empty categories under the new analysis because the position is transformationally created by move.
17 In §5.6 (attract/move) Chomsky reinterprets the movement operation as ATTRACT FEATURE to achieve a more natural formulation of the MINIMAL LINK CONDITION (a reformulation of the shortest move requirement in Ch. 3), which is now incorporated as part of the movement operation itself, hence no longer part of economy considerations.
18 In class lectures during the fall of 1995 Chomsky tried to develop a more satisfying account of overt movement under a move-feature analysis. So it is likely that the weak account in Ch. 4 will soon be replaced.
19 Because of the limited scope of this review it was not possible to address a number of important topics, including the minimalist critique of functional categories in §10 (functional categories and formal features), which leads to multiple spec analysis over Agr-based syntax; the analysis of feature strength; the role of the economy condition procrastinate; and the analysis of expletive elements (which become a central focus because of FI, as noted in n. 11).
20 I say ‘might’ because under the feature analysis of the EXTENDED PROJECTION PRINCIPLE, John might raise to the matrix to check the D-feature of Infl. Since a D-feature on a noun is +interpretable, it won’t delete when it checks Infl.
21 For an extensive critique of last resort, see Lasnik 1995b.
22 The overlap between economy conditions and the θ-criterion is also noted in Freidin 1994a. We still need to account for the fact that a predicate must discharge all of its θ-roles and that each θ-role of a predicate can be discharged only once. A feature story involving FI is easy to imagine, but depends on the assumption that θ-roles are formal features.
23 Note that this situation is no different than in the mature sciences.
Witness Steven Weinberg’s recent comment on string theory: ‘It is because we expect that string theory will be testable—if not directly, by observing the string vibrations, then indirectly, by calculating whether string theory correctly accounts for all of the currently mysterious features of the standard model of elementary particles and general relativity. If it were not for this expectation, string theory would not be worth bothering with.’ (New York Review of Books, October 3, 1996, p. 56.)
17 Exquisite connections
Some remarks on the evolution of linguistic theory*
with Jean-Roger Vergnaud
Lappin, Levine and Johnson’s critique of the minimalist program (Natural Language and Linguistic Theory 18:665–671; henceforth LLJ) raises three important issues: the relation between the MP and its predecessors (GB theory in particular), the empirical and conceptual motivation for the MP, and the relation (if any) of theoretical linguistics to the natural sciences. Sadly, LLJ’s critique contributes virtually nothing to our understanding of these topics, as the following discussion will demonstrate.

1 MP vs. GB

As Chomsky and others have pointed out (see Chomsky 1995d; Freidin, 1997a; Lasnik, 1999), the MP is grounded in the principles and parameters framework, just like the GB theory it has to a large extent superseded. It shares at least three basic assumptions with virtually all of its predecessors within modern generative grammar. First and foremost, the mind/brain contains a language faculty, a component that interacts with other cognitive systems. Next, the cognitive system of language connects with performance systems via levels of linguistic representation, perhaps limited to only two external systems, one involving articulation and perception of the physical linguistic signal and the other involving the encoding or decoding of the meaning of the signal. Under this limitation there are only two interface levels, phonetic form (PF) and logical form (LF). Finally, performance systems do not differ across languages. Even at this fundamental level Chomsky takes nothing for granted, noting that these assumptions are ‘not at all obvious’ (1995d: 3).

As both the GB theory and the MP are grounded in the principles and parameters model, they share further specific assumptions [cf. Chomsky, 1995d: 170]:
(i) regarding the computational system for human language, CHL, the initial state S0 contains invariant principles and parameters (i.e., options restricted to functional elements)
(ii) a selection Σ of parameters determines a language
(iii) a language determines an infinite set of linguistic expressions (SDs), each a pair (π, λ) obtained from the interface levels, PF and LF
(iv) language acquisition involves fixing Σ
(v) the grammar of a language states just Σ, lexical arbitrariness and the PF component aside

These shared assumptions have been standard for nearly two decades. There is one further assumption articulated for the first time in Chomsky (1993), which could easily have been proposed prior to the MP. Chomsky offers as a narrow conjecture the suggestion that there is no variation in the overt syntax or the LF
component. Ignoring PF options and lexical arbitrariness, Chomsky suggests that ‘variation is limited to nonsubstantive parts of the lexicon and general properties of lexical items’, in which case ‘there is only one computational system and one lexicon’, apart from the variation mentioned (1995d: 170).1 Thus viewed at the appropriate level of abstraction, there is only one human language. Of course the truth or falsity of this bold conjecture remains to be established.

We come at last to the three assumptions that are unique to the MP: (1) that the interface levels LF and PF are the only relevant linguistic levels, in spite of apparent empirical evidence to the contrary (p. 169), (2) that all conditions are interface conditions (p. 194), and (3) that a linguistic expression is the optimal realization of these conditions (p. 194). These three assumptions constitute what Chomsky (2001) calls the strong minimalist thesis. In contrast to the six basic assumptions that the MP shares with GB theory, the three assumptions unique to the MP do not add up to ‘a major paradigm change in the theory of grammar’, as will become clear from the following discussion.

For LLJ the contrast between GB theory and the MP is that ‘the MP adds economy principles’ ‘in addition to local constraints on operations and the structures they produce.’ While it is correct to highlight the role of economy in the development from GB to the MP, it is inaccurate to claim that economy conditions are an innovation unique to the MP. The discussion of the role of economy in grammatical analysis begins with the advent of modern generative grammar—i.e., Chomsky’s MMH (1951). There Chomsky identifies two kinds of criteria of adequacy for grammars—one for the correct description of the structure of the language under analysis, and the other for requirements imposed by its special purposes, ‘or, in the case of a linguistic grammar having no such special purposes, requirements of simplicity, economy, compactness, etc.’ (1951:1). In a footnote, Chomsky supplies the following clarification:

Such considerations are in general not trivial or ‘merely esthetic’. It has been recognized of philosophical systems, and it is, I think, no less true of grammatical systems, that the motives behind the demand for economy are in many ways the same as those behind the demand that there be a system at all. Cf. Goodman (1943).

In other words, a grammar is not merely a description of a language; it is moreover an explanatory theory about the structure of a language—i.e., why a language has the properties it does rather than other conceivable properties. It is in this context that considerations of economy, etc. first came into play.2

Applying economy conditions to the selection of derivations within each grammar represents a significant leap from applying them to the selection of grammars. Although economy conditions like Full Interpretation (FI) and Last Resort (LR) were not articulated in the earliest discussions, both were introduced as part of GB, the former in Chomsky (1986a) and the latter in Chomsky 1991 (which first appeared in MITWPL #10 in 1989, three years prior to the advent of the MP). What happened in the 1990s, in a nutshell, was that certain economy conditions (e.g., FI) were interpreted in a way that made it natural for them to supplant a significant portion of the older GB principles.

Consider for example the Case Filter. It prohibits any phonetically realized nominal expression that is not marked for Case.
Within GB, why this should be is just a matter of
stipulation that appears to accord with the facts. From the point of view of economy, Case features are extraneous to interpretation at the LF interface at least and therefore should be eliminated from the derivation before LF. Otherwise, these uninterpretable features at the LF interface will violate FI, causing the derivation to crash. Thus legibility conditions imposed by the cognitive system that interfaces with CHL determine how derivations proceed with respect to Case features. In this way FI and the Case Filter overlap in such a way that FI subsumes the empirical effects of the Case Filter, but not conversely.3 The preference for FI over the Case Filter is just the standard preference for the more general constraint, all things being equal. In short, the heuristic of eliminating overlapping conditions, which has resulted in much fruitful research over several decades, is one of the central motivations for switching from the analysis of GB to that of the MP.4

The same logic provides one of the strongest motivations for eliminating the relation of government from current discussion. Consider the standard case of an ECP violation given in (1), where T stands for the functional category Tense and t is the trace of John.
(1)
*John T is believed [that t T is happy]
Under MP analysis, (1) can be interpreted as a violation of FI in the following way. Suppose that the nominative Case feature of John is checked in the embedded clause by T, as is the nominative Case feature of T. When John moves to the subject position of the matrix clause, Spec-T, only the D-feature of the nominal is available to check the D-feature of the matrix T. The nominative Case feature of John has been checked and thereby eliminated, so that the nominative Case feature of the matrix T remains, causing the derivation to crash at LF because of FI.5 Thus the phenomena that fall separately under the Case Filter and the ECP within GB are captured under a single principle within the MP analysis—moreover, one that functions as a natural legibility condition that regulates the interface between CHL and other cognitive systems.6 Given the greater generality of FI, we would prima facie want to eliminate the ECP in favor of the more general principle.7 Furthermore, this analysis, which explains deviance on the basis of legibility conditions imposed by cognitive systems that interface with CHL, strikes us as a more promising explanatory account than the postulation of various constraints internal to CHL that basically reflect the complexity of the phenomena in an essentially descriptive fashion.8 Note incidentally that this approach is highly reminiscent of the one followed in Chomsky and Lasnik (1977). However, even if we discount impressions of what might be a more promising explanatory account, the methodological requirement to eliminate overlapping conditions whenever possible motivates the abandonment of much of the machinery of GB in favor of the MP analysis. Thus the change from GB to the MP is motivated by the same methodology that has always motivated changes in syntactic theory.

Another motivation for exploring the MP rather than continuing with GB concerns the central topic of phrase structure. The MP introduces bare phrase structure theory, which eliminates the ambivalent top-down and bottom-up view of phrase structure that has been characteristic of the field since the earliest formulations of X-bar theory in the late 1960s. With bare phrase structure there is only bottom-up analysis of a specific kind.9 The theory of bare phrase structure provides a derivational mechanism for syntactic representations,
which has been missing from GB since the elimination of phrase structure rules (circa 1980) on the grounds that they are redundant given X-bar theory, the general principles of GB, and the specific properties of lexical items. Furthermore, bare phrase structure theory as incorporated within the MP cannot produce canonical D-structures, hence the elimination of D-structure as a level of representation follows from the nature of CHL rather than a methodological stipulation. Bare phrase structure eliminates in principle categorial distinctions for levels of phrasal projection, thereby conforming to the Inclusiveness Condition, which restricts computations solely to the elements (features) contained in lexical items. Given this condition, computations cannot introduce new elements such as bar levels, indices, or syntactic categories that are not already part of the lexical items computed. Chomsky (1995d) is very clear that the Inclusiveness Condition provides one criterion for the ‘perfection’ of CHL.

Although precisely what Chomsky intends by talking about language as a perfect system may not be totally clear in Chomsky (1995d), this discussion is clarified considerably in Chomsky (2000), written in 1998. Here the issue is represented in terms of the language faculty (FL) as a solution to legibility conditions imposed by the cognitive systems that interface with it. An optimal solution would encompass a CHL restricted to just the properties of lexical items involved in computations plus just those few operations required for derivations that connect LF with PF (perhaps just the elementary operations of adjunction/concatenation (for Merge and Move) and deletion (for feature checking)). If the behavior of derivations is controlled solely by legibility conditions imposed by other cognitive systems at the interfaces, then CHL can be reduced to these bare necessities, excluding additional internal machinery like the Case Filter and the ECP.10 Thus in terms of simplicity, economy, and non-redundancy the MP is clearly preferable to GB.

2 On conceptual naturalness

Appeal to general considerations of conceptual naturalness such as simplicity, economy, or non-redundancy is not unique to generative grammar.11 It has been employed fruitfully in the more developed natural sciences—in particular, theoretical physics. The discussion of physics that follows attempts to elucidate this notion in a way that, ultimately, should illuminate its role in contemporary theoretical linguistics.

Consider, for example, Einstein’s principle that all physical laws must be Lorentz-invariant. As Putnam (1962) notes: ‘This is a rather vague principle, since it involves the general notion of a physical law. Yet in spite of its vagueness, or perhaps because of its vagueness, scientists have found it an extremely useful leading principle.’ This is because they have ‘no difficulty in recognizing laws’: a law of nature will be an equation relating ‘real magnitudes’ that has ‘certain characteristics of simplicity and plausibility’. In other words, determining whether Einstein’s principle may be applied to any particular case will involve ‘general considerations of conceptual naturalness’. In a different area, Bohr’s quantum mechanical ‘Correspondence Principle’ (circa 1913) is arguably rooted in such considerations. It states that, in the classical limit, the results obtained from quantum mechanics should converge with those obtained from classical mechanics.12 According to some physicists, the research work carried out during
the years 1919–1925 that finally led to quantum mechanics may be described as systematic guessing guided by the Correspondence Principle. This is then a case where considerations of conceptual naturalness appear to have played a direct role in the progress of science.

The appeal to conceptual naturalness manifests itself also in the quest for mathematical beauty, which motivates many a theoretical physicist, as Dirac13 notes:

Theoretical physicists accept the need for mathematical beauty as an act of faith. There is no compelling reason for it, but it has proved a very profitable objective in the past. For example, the main reason why the theory of relativity is so universally accepted is its mathematical beauty. (Dirac, 1968)

In the natural sciences, while hypothesis formation may be guided by appeals to conceptual naturalness, any given hypothesis will carry weight only to the extent that it can be subjected to the inexorable test of experiment. This is the essence of the scientific method, which governs physics and linguistics alike. But there is no chosen method for elaborating the scientific hypotheses themselves. The scientific method is not concerned with that, nor could it be, for it is not possible to set up explicit rules or criteria in this area. This does not mean that ‘anything goes’. But it does mean that there is a lot of diversity in the ways scientists deal with problems and arrive at solutions. Dirac discusses this diversity of methods in theoretical physics:

One can distinguish between two main procedures for a theoretical physicist. One of them is to work from the experimental basis. For this, one must keep in close touch with the experimental physicists. One reads about all the results they obtain and tries to fit them into a comprehensive and satisfying scheme. The other procedure is to work from the mathematical basis. One examines and criticizes the existing theory. One tries to pinpoint the faults in it and then tries to remove them. The difficulty here is to remove the faults without destroying the very great successes of the existing theory. There are these two general procedures, but of course the distinction between them is not hard-and-fast. There are all grades of procedures between the extremes. (Dirac, 1968)

Dirac designates the two types of procedures as ‘experimental’ and ‘mathematical’, respectively. He then proceeds to give several examples of the mathematical procedure:

Maxwell’s investigation of an inconsistency in the electromagnetic equations of his time led to his introducing the displacement current, which led to the theory of electromagnetic waves. Einstein noticed a difficulty in the theory of an atom in equilibrium in blackbody radiation and was led to introduce stimulated emission, which has led to the modern lasers [this is Einstein, 1917; RF&JRV]. But the supreme example is
Einstein’s discovery of his law of gravitation, which came from the need to reconcile Newtonian gravitation with special relativity. (Dirac, 1968)14

Dirac’s notions also apply to the founding work in quantum mechanics between 1913 and 1925. The following description is striking:

Whether one follows the experimental or the mathematical procedure depends largely on the subject of study, but not entirely so. It also depends on the man. This is illustrated by the discovery of quantum mechanics. Two men are involved, Heisenberg and Schrödinger. Heisenberg was working from the experimental basis, using the results of spectroscopy, which by 1925 had accumulated an enormous amount of data. Much of this was not useful, but some was, for example the relative intensities of the lines of a multiplet. It was Heisenberg’s genius that he was able to pick out the important things from the great wealth of information and arrange them in a natural scheme. He was thus led to matrices. Schrödinger’s approach was quite different. He worked from the mathematical basis. He was not well informed about the latest spectroscopic results, like Heisenberg was, but had the idea at the back of his mind that spectral frequencies should be fixed by eigenvalue equations, something like those that fix the frequencies of systems of vibrating springs. He had this idea for a long time, and was eventually able to find the right equation, in an indirect way. (Dirac, 1968)

The ‘mathematical procedure’ typically arises in what Husserl has called the ‘Galilean style of science’, in recognition of its origins in the work of Galileo. Weinberg (1976) characterizes this style as follows:

…we have all been making abstract mathematical models of the universe to which at least the physicists give a higher degree of reality than they accord the ordinary world of sensation.

More generally, one can define Galilean science as the search for mathematical patterns in nature.15 As Chomsky notes, implementing the Galilean style entails a ‘readiness to tolerate unexplained phenomena or even as yet unexplained counterevidence to theoretical constructions that have achieved a certain degree of explanatory depth in some limited domain, much as Galileo did not abandon his enterprise because he was unable to give a coherent explanation for the fact that objects do not fly off the earth’s surface’ (1980, 9–10).

A significant feature of the Generative Revolution in linguistics has been the development of a Galilean style in that field. And, to a great extent, the recent developments within MP must be viewed in this light—specifically, as Dirac’s mathematical procedure (method) at work within linguistics. Dirac has identified two main methods within the mathematical procedure itself: one is to remove inconsistencies,
the other, to unite theories that were previously disjoint (see Dirac, 1968). In linguistics, the inconsistencies primarily concern overlapping grammatical conditions, as discussed earlier, which conflict with the basic assumption that CHL has an optimal design. Note further that this assumption itself relates directly to the quest for mathematical beauty, which informs the Galilean style.

One aspect of Dirac’s mathematical procedure as applied in linguistics involves the effort to extend and deepen the mathematical formalism used to express syntactic concepts and syntactic principles. We will refer to this facet of the Minimalist endeavor as the ‘Generative Program’ for the study of language (GP) because it originates in Chomsky’s foundational work in the fifties and sixties and has been essential to the development of the Galilean style in linguistics.16 However, it should be obvious that linguistics and physics are at very different stages of mathematical maturation. From this perspective, it is useful to distinguish the ‘Galilean character’ of an area, i.e., how much of the subject matter can be analyzed mathematically, from what one could call its ‘Pythagorean character’, how much of mathematics is put to use in the Galilean treatment. Linguistics and physics have the same Galilean character, although they obviously differ in Pythagorean character.17

The difference in mathematical status between physics and linguistics partly reflects the more general difference between physics and biology—especially from the perspective that generative grammar is ultimately a branch of theoretical biology, more specifically, of theoretical developmental biology. In biology, the genetic code rather than mathematics has been the tool of choice for explaining life. This, however, appears to be a historical accident, not the result of some principled difference between biology and the physical sciences. Mathematics has a central explanatory role to play in biology, as discussed in Stewart (1998), whose title, Life’s other secret, is intended as contrapuntal to ‘life’s first secret’, which is the genetic code:

The mathematical control of the growing organism is the other secret—the second secret, if you will—of life. Without it, we will never solve the deeper mysteries of the living world—for life is a partnership between genes and mathematics, and we must take proper account of the role of both partners. This cognizance of both secrets has run like a shining thread through the history of the biological sciences—but it has attracted the mavericks, not the mainstream scientist. Instead of thinking the way most biologists think, these mavericks have been taking a much different approach to biology by thinking the way most physical scientists and mathematicians think. This difference in working philosophy is the main reason why understanding of the deeper aspects of life has been left to the mavericks. (Stewart, 1998: xi)

The main message of d’Arcy Thompson, one of the great mavericks in biology, is that ‘life is founded on mathematical patterns of the physical world.’18 Thus one role of theoretical biology is to identify such mathematical patterns and elucidate the way they function in organisms:
The role of mathematics [in biology] is to analyze the implications of models—not ‘nature red in truth and complexity’, as Tennyson did not quite say, but nature stripped down to its essence. Mathematics pursues the necessary consequences of certain structural features. If a planet can be considered a uniform sphere, what would its gravitational attraction be like?… If the movement of cells in some circumstances is controlled by physical forces and does not greatly depend on complicated internal features such as mitochondria, what will the cells do? From this point of view, the role of mathematics is not to explain biology in detail, but to help us separate out which properties of life are consequences of the deep mathematical patterns of the inorganic universe, and which are the result of more or less arbitrary initial conditions programmed into lengthy sequences of DNA code. (Stewart, 1998:243–244)

It is worth noting at this point that Chomsky was aware that both approaches, separately or jointly, might account for the human language faculty. In criticizing the empiricist view of language acquisition in the first chapter of Chomsky 1965 (written in 1958–1959, as mentioned in Huybregts and van Riemsdijk 1982), he notes:

…there is surely no reason today for taking seriously the position that attributes a complex human achievement entirely to months (or at most years) of experience, rather than to millions of years of evolution or to principles of neural organization that may be even more deeply grounded in physical law… (p. 59)

However, twenty years later, Chomsky is openly skeptical of a purely genetic approach to evolution.

It does seem very hard to believe that the specific character of organisms can be accounted for purely in terms of random mutation and selectional controls. I would imagine that biology of 100 years from now is going to deal with evolution of organisms the way it now deals with evolution of amino acids, assuming that there is just a fairly small space of physically possible systems that can realize complicated structures. (Huybregts and van Riemsdijk, 1982:23)

From this point of view, the more promising approach is ‘d’Arcy Thompson’s attempt to show that many properties of organisms, like symmetry, for example, do not really have anything to do with a specific selection but just with the ways in which things can exist in the physical world’ (Huybregts and van Riemsdijk, 1982:23).19 The mathematical perspective informs the Generative Program (GP), in effect, ‘the study of language’s other secret’. Thus Chomsky’s mathematical work defines a central facet of GP, beginning with his construction of the foundations of modern generative grammar in Chomsky (1951) and (1975a).20
Because the MP is a particular implementation of GP, the notion of ‘perfection’ often invoked within MP is ultimately a mathematical notion, calling for a higher level of mathematical formalization in syntax.21 The Minimalist conjecture that CHL is a ‘perfect system’ is a tentative claim about the form and the complexity of each computation. The claim is (i) that each computation can be represented as an abstract mathematical structure completely defined by interface (output) conditions and (ii) that this structure is an extremum in some mathematical space. A natural metric for the comparison of computations is their complexity as measured by their length. Note that, if the only constraints on CHL are those that follow from legibility conditions at the interfaces, then it is unavoidable that some notion of computational cost should be part of the definition of ‘effective’ computations, since, within such a system, it is always possible to combine a computation with a ‘vacuous one’ (i.e., one that has a null effect). The unidirectionality of movement (if it is a fact) would then be a particular design feature aimed at reducing the likelihood of vacuous steps.

Considerations of economy have a long-standing legitimacy in the physical sciences. It was in physics that an economy principle of any depth was first advanced.22 This was the principle of least time, discovered by Fermat circa 1650.23 That principle states that, out of all possible paths that it might take to get from one point to another, light takes the path which requires the shortest time.24 Fermat’s principle is a particular instance of the general physical principle of ‘least action’. Another important economy principle of physics is ‘the idea that the inorganic world is fundamentally lazy: it generally behaves in whatever manner requires the least energy’ (Stewart, 1998:16). That idea was for Thompson (1942) a central principle underpinning the mathematics of growth and form found in living organisms.

Comparing Fermat’s principle with Snell’s theory of light,25 Feynman notes that such economy principles have a special philosophical character distinct from causal explanations of phenomena.

With Snell’s theory we can ‘understand’ light. Light goes along, it sees a surface, it bends because it does something at the surface. The idea of causality, that it goes from one point to another, and another, and so on, is easy to understand. But the principle of least time is a completely different philosophical principle about the way nature works. Instead of saying it is a causal thing, that when we do one thing, something else happens, and so on, it says this: we set up the situation, and light decides which is the shortest time, or the extreme one, and chooses that path. (Feynman et al., 1963:26–7)

Feynman’s observation extends to all economy considerations developed in the natural sciences. Economy principles fall under what 17th- and 18th-century philosophers called ‘final causes’, as opposed to ‘efficient causes’.26 Efficient causes are essentially mechanistic in nature like those invoked in a Newtonian account of the dynamics of a point particle, for example, or Snell’s account of refraction as described by Feynman above. Final causes involve a deeper level of understanding, as Feynman notes:
Now in the further development of science, we want more than just a formula. First we have an observation, then we have numbers that we measure, then we have a law which summarizes all the numbers. But the real glory of science is that we can find a way of thinking such that the law is evident. (Feynman et al., 1963:26–3)

Thus, the distinction between efficient and final causes is locally one of levels of analysis and globally one of levels of explanation. The notion ‘level’ (of analysis, of explanation) is evidently crucial. The natural sciences provide instances where successful explanatory theories that had been developed at a certain level were later unified with theories at some other level. This is the case for classical thermodynamics, which is deducible from statistical mechanics (hence a reduction). Also the unification of structural chemistry with physics was made possible by the development of quantum mechanics, which provided a common foundation (see Chomsky 1995a, and Smith, 1999 for discussion). However, the explanatory import of a theoretical principle at some given level L is in general relatively independent of the possibility of unifying L with other levels. A case in point is that of the ‘principle of least action’ mentioned above (the general principle subsuming Fermat’s principle of least time), which is reducible to other principles in every area where it applies (see Jourdain, 1913 and Lanczos, 1970 for discussion). Thus, it applies in classical mechanics, where it is known as ‘Hamilton’s principle’. And, indeed, Hamilton’s principle is an alternative formulation of classical mechanics, equivalent to the Newtonian formulation. As it turns out, though, the Hamiltonian formulation has desirable features not found within the Newtonian formulation. For example, the Hamiltonian formalism can be generalized to all types of coordinates and, furthermore, is more convenient than Newton’s equations when the system is complex. But the real importance of the Hamiltonian formalism arises from the fact both that it can be generalized to classical electricity and magnetism (with an appropriate Lagrangian) and that it constitutes the point of departure for the quantization of physical systems (see the discussion in Cohen-Tannoudji et al., 1996:1476–1491, for example). There may be deep reasons for this remarkable generality. The following excerpt from Toffoli (1999) is intriguing in that respect:

We are taught to regard with awe the variational principles of mechanics [such as Hamilton’s principle RF-JRV]. There is something miraculous about them, and something timeless too: the storms of relativity and quantum mechanics have come and gone, but Hamilton’s principle of least action still shines among our most precious jewels. But perhaps the reason that these principles have survived such physical upheavals is that after all they are not strictly physical principles! To me, they appear to be the expression, in a physical context, of general facts about computation, much as the second law of thermodynamics is the expression, in the same context, of general facts about information. More specifically, just as entropy measures, on a log scale, the number of possible microscopic states consistent with a given macroscopic
description, so I argue that action measures, again on a log scale, the number of possible microscopic laws consistent with a given macroscopic behavior. If entropy measures in how many different states you could be in detail and still be substantially the same, then action measures how many different recipes you could follow in detail and still behave substantially the same. (Toffoli, 1999:349–350)

If this is on the right track, the computational significance of the Hamiltonian formalism supersedes any deduction of it in any particular subdomain.27 The computational nature of economy considerations provides a link between physics and linguistics, at least metaphorically. Whether it is stronger than that will have to be determined by a future neuroscience that can validate the physical approach to complex mental structures as suggested by Chomsky extending the views of d’Arcy Thompson. In any event, economy considerations contribute substantially to what constitutes the ‘perfection’ of the computational system in both domains. Whether these considerations for each domain turn out to be related or the same remains an empirical question for the future.

In linguistics, there are several ways the ‘perfection’ of CHL could be manifested in terms of economy conditions. Shortness of derivation is only one symptom of perfection. Another manifestation, possibly equivalent in some cases, would be the existence of symmetries across levels of analysis, given that such symmetries enhance the economy of computations. To illustrate, consider the following well-known contrast in anaphoric interpretation for the paradigm in (2):28

(2) a. Mary thinks she solved the problem.
b. She thinks Mary solved the problem.
While Mary in (2a) may be construed as anaphoric with she, this is not a possible construal for (2b). Exactly how we account for this depends crucially on what representations are available. Prior to the Minimalist Program these anaphoric representations would be given in terms of co-indexing generated by a rule of Index NP (see Freidin and Lasnik, 1981 for discussion). Thus the construals under discussion would be given as (3a) and (3b) respectively, where the viable construal of (2b) is given as (3c).

(3) a. Maryi thinks shei solved the problem.
b. *Shei thinks Maryi solved the problem.
c. Shej thinks Maryi solved the problem.
However, given the Inclusiveness Condition (4), which we take to be central to the Minimalist Program, indices are not legitimate elements of representations.
(4) Inclusiveness Condition ‘Outputs consist of nothing beyond properties of the lexicon (lexical features)’ (Chomsky 1995d: 225).
Therefore the construals of (2) indicated in (3) will have to be represented another way. We shall assume that a definite pronoun is a definite description with a silent NP component (cf. Postal, 1966 and Brody 1982). Specifically, we posit the following underlying representation for a pronoun:

(5) [DP [+def] Φ NP], with Φ the agreement features of the nominal expression and NP the silent NP component.29
For example, the pronoun she has the representation in (6):

(6) [DP [+def] [3rd person, singular, feminine] NP]
The form she is taken to be the PF realization of the morphosyntactic representation in (7):

(7) [DP [+def] [3rd person, singular, feminine]]
The NP component of the pronoun determines its interpretation: two different interpretations of a pronoun reflect two distinct underlying representations of that pronoun. For example, the sentence in (2a) is represented as in (8) when she is construed as anaphoric with Mary, but as in (9) when she is construed as referring to Clea30:

(8) Mary thinks [[+def] [3rd pers., sg., fem.] Mary] solved the problem.
(9) Mary thinks [[+def] [3rd pers., sg., fem.] Clea] solved the problem.
We propose to relate the interpretive contrast in (2) to symmetries in the representations of the structures involved. The defining property of a pronominal element like she in (2) is that its PF representation is invariant under substitution of its NP component. Call this the pronominal symmetry:

(10) Pronominal Symmetry
Let pro be some singular pronoun analyzed as [DP [+def] Φ NP]. The PF representation of pro is invariant under the substitution in (i):
(i) NP → NP′
No matter what representation is assigned to NP, the PF representation of pro remains constant. We formalize this as in (11):

(11) Let pro be some occurrence of a pronominal item in a structure Σ:
(i) pro = [DP [+def] [nth person, α number, β gender] NP].
Define pro(NP′) to be the pronominal element obtained by substituting NP′ for NP in (i). Define the range of pro, denoted by <pro>, to be the set of all pronouns pro(NP′) for which NP′ has the same agreement feature specifications as NP. Note that <pro> includes pro itself. It also includes such descriptions as pro(scientist that discovered radioactivity), pro(Clea), pro(trigger-happy officer), etc.31
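To make the bookkeeping in (5)–(11) concrete, here is a minimal computational sketch, purely our own illustrative encoding and not part of the original proposal (the names Pro, pf, and pro_range are invented for the example). A pronoun pairs agreement features with a silent NP component, and its PF depends only on the agreement features, so the pronominal symmetry in (10) holds by construction:

```python
# A toy encoding of (5)-(11); all names here (Pro, pf, pro_range) are
# illustrative assumptions, not part of the original proposal.
from dataclasses import dataclass

@dataclass(frozen=True)
class Pro:
    phi: str  # agreement features, e.g. "3rd person, singular, feminine"
    np: str   # the silent NP component; it never surfaces at PF

    def pf(self) -> str:
        # The PF realization is fixed by the phi-features alone, as in (6)-(7)
        return {"3rd person, singular, feminine": "she"}[self.phi]

def pro_range(pro: Pro, nps: list[str]) -> set[Pro]:
    """<pro>, as in (11): all pronouns pro(NP') sharing pro's phi-features."""
    return {Pro(pro.phi, np2) for np2 in nps} | {pro}

she = Pro("3rd person, singular, feminine", "Mary")
# Every element of the range shares one PF: the pronominal symmetry (10).
assert {p.pf() for p in pro_range(she, ["Clea", "Susan"])} == {"she"}
```

Substituting any NP′ with the same agreement features leaves the audible form untouched, which is precisely what (10) asserts.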
Thus all elements in the range of pro share the same PF representation. Now, there is a general principle in grammar that items in a structure are not interpreted in isolation, but always with respect to some larger domain. Technically, grammar constructs an interpretation of the head and of the specifier of x only at the level of some constituent properly containing x. Call this the Generalized Phase Conjecture, in reference to the analysis proposed in Chomsky (1999):

(12) Generalized Phase Conjecture (GPC)
Given some constituent C, the head of C and its specifier are interpreted at the level of a constituent [P…C…], where P properly contains C. P is called the phase for C.
Chomsky (1999) considers a notion of phase that seems appropriate for the interpretation of expressions involving displacements. We conjecture that a different notion of phase exists for the assessment of anaphoric relations. Specifically, the phase for a pronominal expression pro is its c-command domain.32 Considering now the paradigm in (2), let us call Pshe the phase for the pronoun she. For the form in (2a), Pshe is the embedded TP [she solved the problem]. The pronominal symmetry associated with she carries over to Pshe: the PF representation of the phase of she is invariant under substitution of NP in the representation of she, quite obviously. We assume this to be a general requirement for phases, stated as (13):

(13) The Principle of Phasal Coherence (PPC)
Given some structure Σ containing constituent x, let Px be the phase for x (i.e., the minimal phase containing x). Then, every interpretive symmetry of x must also be a symmetry of Px.
Given the PPC, the PF invariance of a pronoun pro, which constitutes the pronominal symmetry, must in general carry over to the phase of pro, Ppro. We evaluate satisfaction of the PPC by extending the notion of ‘range’ to Ppro:
(14) In a structure Σ, let pro(NP) be some occurrence of a pronominal item and let Ppro be the phase of pro(NP) in Σ. Denote by Ppro(NP′) the constituent obtained from Ppro by substituting NP′ for NP in pro(NP). Define the range of Ppro relative to pro, denoted by <Ppro>, to be the set of all constituents Ppro(NP′) for which NP′ has the same phi-feature specifications as NP. Note that <Ppro> includes Ppro itself.
Accordingly the range of pro in (15a) establishes the set of parallel structures in (15b):

(15) a. <pro> = {[DP [+def] Φ NP′] : NP′ has the same phi-feature specifications as NP}
b. <Ppro> = {Ppro([DP [+def] Φ NP′]) : NP′ has the same phi-feature specifications as NP}
Then:

(16) The pair (pro, Ppro) satisfies the PPC only if all structures in <Ppro> share the same PF.
In this way (2a) satisfies the PPC.33 Consider next whether (2b) satisfies the PPC. In that structure, the phase Pshe is the matrix TP containing in addition to the pronoun a second DP which could, but need not, relate to the interpretation of the pronoun. In the case where she in (2b) is interpreted as Mary, the corresponding representation is that in (17) (with Φ the phi-features [3rd person, singular, feminine]):

(17) [[[+def] Φ Mary] thinks Mary solved the problem]
The structure in (17) contains the accidental chain in (18):

(18) (Mary, Mary)
The set of parallel structures established by the range of pro in this case includes one structure in which a pair of expressions are anaphorically linked, to wit the structure in (17). This conformation is subject to the Parallelism Principle34 as formulated in (19):

(19) Parallelism Principle for anaphora
Let S be a set of parallel structures and let (N, pro) be a pair of nominal expressions with pro a pronoun such that N and pro are anaphorically linked in a structure Σp in S. Then, either (i) the anaphoric link remains constant across all structures in S (the case of ‘sloppy identity’) or (ii) the value of pro remains constant across all structures in S.
Given the definition of the range of pro, case (ii) doesn’t apply to the set of parallel structures <Ppro>. The application of case (i) amounts to revising the definition of the range as follows:

(20) In a structure Σ, let pro(NP) be some occurrence of a pronominal item and let Ppro be the phase of pro(NP) in Σ. Denote by Ppro(NP′[i]) the constituent obtained from Ppro by substituting NP′ for some occurrence NP[i] of NP in Ppro. The range of Ppro relative to pro, denoted by <Ppro>, is defined as the maximal set of structures Ppro(NP′[i]) that verifies (i)–(iii):
(i) NP′ has the same phi-feature specifications as NP
(ii) <Ppro> includes Ppro
(iii) <Ppro> obeys the Parallelism Principle.
If we apply this definition to the structure in (17), then the range of Pshe relative to she includes such structures as those in (21):

(21) <Pshe>:
(i) [[[+def] Φ Clea] thinks Clea solved the problem]
(ii) [[[+def] Φ Susan] thinks Susan solved the problem]
(iii) Etc.
Because the set in (21) is not PF invariant, the PPC is violated.35 In this way, the construal (3b) of (2b) is excluded. Note that, in the case of the construal in (3c), the range of Pshe may not include the structure in (17)—the structure where she is anaphoric with Mary—by (ii) and (iii) of (20). It is easy to check that (2b), under construal (3c), satisfies the PPC: no matter the value of pro within the admissible range, the pronoun and its phase will both remain PF invariant.

In essence, Principle C reflects a conflict between Parallelism and Phasal Coherence: in the case of a structure such as (17), there is no coherent definition of ‘range of a phase’ that can satisfy both principles.36 To summarize, Principle C follows from the interaction of the Principle of Phasal Coherence, related to QR, with the Parallelism Principle.37 This account immediately extends to the contrast in (22) if the chunk him in himself is treated as a pronoun falling under the analysis above:

(22) a. Clea believes Luci to have introduced himself to Mary.
b. Clea believes himself to have introduced Luci to Mary.
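The interaction just summarized can be made explicit with a second sketch, again our own hypothetical encoding rather than anything in the original proposal (it builds on the Pro class from the earlier sketch; phase_range and satisfies_ppc are invented names). It computes the range of a phase under Parallelism case (i), where a linked overt occurrence must covary, and then checks the PF invariance demanded by (16):

```python
# Toy check of (16)-(21): PF invariance over the range of a phase.
# Builds on the Pro class from the earlier sketch; all names illustrative.

def pf_of(phase: list) -> str:
    """PF of a phase: pronouns surface via phi-features; silent NPs are unheard."""
    return " ".join(w.pf() if isinstance(w, Pro) else w for w in phase)

def phase_range(phase: list, anchor: str, nps: list[str], linked: bool) -> list:
    """<Ppro>: substitute NP' inside pro; if the overt anchor is anaphorically
    linked to pro, Parallelism case (i) makes it covary (cf. (19)-(20))."""
    variants = [phase]
    for np2 in nps:
        variants.append([
            Pro(w.phi, np2) if isinstance(w, Pro)
            else (np2 if linked and w == anchor else w)
            for w in phase
        ])
    return variants

def satisfies_ppc(phase: list, anchor: str, nps: list[str], linked: bool) -> bool:
    # (16): the pair (pro, Ppro) satisfies the PPC only if all structures
    # in the range share the same PF.
    return len({pf_of(v) for v in phase_range(phase, anchor, nps, linked)}) == 1

she_mary = Pro("3rd person, singular, feminine", "Mary")

# (2a): the phase of 'she' is the embedded TP, which contains no linked
# overt 'Mary'; only the silent NP varies, so PF stays constant.
print(satisfies_ppc([she_mary, "solved", "the", "problem"],
                    "Mary", ["Clea", "Susan"], linked=True))   # True

# (2b), construal (3b): the matrix-TP phase contains the linked 'Mary';
# Parallelism forces it to covary, PF varies, and the PPC is violated,
# i.e. the Principle C effect.
print(satisfies_ppc([she_mary, "thinks", "Mary", "solved", "the", "problem"],
                    "Mary", ["Clea", "Susan"], linked=True))   # False

# (2b), construal (3c): 'she' is not linked to 'Mary', so only the silent
# NP varies and the construal is admitted.
she_clea = Pro("3rd person, singular, feminine", "Clea")
print(satisfies_ppc([she_clea, "thinks", "Mary", "solved", "the", "problem"],
                    "Mary", ["Susan"], linked=False))          # True
```

Nothing here adds to the theory; it merely makes the bookkeeping of ranges and PF invariance explicit for the paradigm in (2).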
The above account also extends to the following paradigm from French: (23)
a. Le juristei sait très bien qu’ili est en difficulté.
‘The juristi knows very well that hei has a problem.’
b. *Ili sait très bien que le juristei est en difficulté.
‘Hei knows very well that the juristi has a problem.’
c. Ili sait très bien que le juriste qu’ili est est en difficulté.
‘Hei knows very well that the jurist that hei is has a problem.’
d. Pierrei sait très bien que le juriste qu’ili est est en difficulté.
‘Peteri knows very well that the jurist that hei is has a problem.’
e. *Ili sait très bien que le juriste que Pierrei est est en difficulté.
‘Hei knows very well that the jurist that Peteri is has a problem.’
f. *Le juriste qu’ili est sait très bien que Pierrei est en difficulté.
‘The jurist that hei is knows very well that Peteri has a problem.’
g. Le juriste que Pierrei est sait très bien qu’ili est en difficulté.
‘The jurist that Peteri is knows very well that hei has a problem.’
h. Le juriste qu’ili a nommé sait très bien que Pierrei est en difficulté.
‘The jurist that hei appointed knows very well that Peteri has a problem.’
(23a–b) show the standard contrast for disjoint reference under c-command as in (2). In surprising contrast, (23c) allows the coreferential interpretation between the pronominal matrix subject and the complement subject via the pronoun in the relative clause. The same anaphoric behavior obtains when the pronominal matrix subject is replaced by an R-expression, as in (23d). However, disjoint reference obtains again if the pronoun in the relative clause in (23c) is replaced by an R-expression, as illustrated in (23e). Note that (23f) results from transposing the matrix and complement subjects in (23d) and thus this pair is exactly parallel to (23a–b). This analysis extends to the pair (23e,g). The example in (23h) is grammatical as expected, in contrast to (23f).

The paradigm in (23) shows that the constituent [le juriste que DP est] ‘the jurist that DP is’ has the same anaphoric behavior as the DP in it. For the purpose of applying Principle C, it behaves as a pronoun when DP is a pronoun and as a name otherwise.38 DP in turn behaves as if it occupied the position of the head modified by the relative clause. Noting that the predicate juriste and its subject DP within the restrictive relative clause construction [le juriste que DP est] share the same set of phi-features, we shall assume that the notion of symmetry extends to such pairs of constituents:

(24) Let (C, C′) be a pair of constituents in Σ that share the same phi-features. Let S be some interpretive symmetry that holds of the pair (C, C′). The PPC is extended to such a case, requiring that S also hold of the minimal phase for (C, C′).
Consider in this light the form [le juriste que pro(NP) est], with pro(NP) a pronoun. The PF of the pair (juriste, pro(NP)) remains invariant under the substitution of a different NP for NP within that pair. By the above extension, pro(NP) establishes a range not only for its own phase, but also for the phase of the raised predicate le juriste, since pro(NP) and juriste share the same phi-features. The PPC gives rise to the contrast between (23g) and (23h). Note that (24) entails that, in a similar fashion, the notion of symmetry may be extended to the pair (Mary, she) in the structure in (3a), since the DP Mary and the pronoun she share the same phi-features. However, in that case, if an NP different from Mary is substituted for Mary within the pair (Mary, she), the PF of the pair is altered (we assume that the substitution takes place across the board). No PF invariance obtains and the PPC is then not relevant to the pair (Mary, she). To the extent that the kind of analysis proposed above is viable, it provides a modest illustration of what is being referred to as the 'perfection' of the grammatical system. The possibility then arises that the abstract analytical principles involved in the formal definition of a computation turn out to have exactly the right empirical consequences.39 This is an exciting prospect, which, combined with that of potentially rich mathematical developments, is stirring imaginations. The authors of this note understand the excitement, and share in it. Uriagereka's Rhyme and reason is a particular expression of that entirely natural and legitimate reaction. In essential respects, linguistics is no different from other fields in natural sciences at comparable stages of development.40

3 Methodological issues

But apart from prospects for any line of research, there is the more concrete methodological question of how to proceed. LLJ propose as a model Arthur Holly Compton's Nobel Prize-winning work on the quantum theory of the scattering of X-rays and γ-rays by light elements. Compton discovered that, when X-rays of a given frequency are scattered from (essentially) free electrons at rest, the frequency of the scattered X-rays is not unaltered, as the classical theory would predict, but decreases with increasing scattering angle. He described this effect by treating the rays as relativistic particles of energy hν and momentum hν/c, and by applying the usual energy and momentum conservation laws to the collision. One could characterize Compton's contribution as an instance of Dirac's 'experimental procedure', working from the empirical data to arrive at the theoretical conclusions.41 This is in contrast with the general style of the MP, which tends to operate in the reverse direction using the mathematical procedure. The actual history of the quantum theory of radiation-matter interaction provides a more complicated picture, though. It is not just a story about the triumph of the experimental procedure, but one that involves the triumph of the mathematical procedure as well. In fact, Compton (1923) is preceded by an article by Einstein (Einstein (1917), cited in Dirac (1968) as an example of the 'mathematical procedure'; see the quote in section 2 above). This article indeed 'addresses questions of principle without offering any new experimental conclusion or prediction' (Pais, 1982: Ch. 21).
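The kinematics behind Compton's result can be stated compactly. The derivation sketched below is standard textbook physics supplied for the reader's convenience; it is not a quotation from Compton (1923). Treating the X-ray as a particle with energy and momentum

$$E_\gamma = h\nu, \qquad p_\gamma = \frac{h\nu}{c},$$

conservation of relativistic energy and momentum in the collision with an electron of mass $m_e$ initially at rest yields

$$\lambda' - \lambda = \frac{h}{m_e c}\,(1 - \cos\theta),$$

so the scattered frequency $\nu' = c/\lambda'$ decreases as the scattering angle $\theta$ increases, which is precisely the departure from the classical prediction described above.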
By general admission, this article was a fundamental contribution, which has served as a basis for subsequent research on absorption, emission and dispersion of radiation (with the notable
exception of Compton).42 One of its central results concerned the exchange of momentum in radiation-matter interactions. This result can be stated as follows:
(25) In conditions of thermal equilibrium, an exchange of energy hν between radiation and matter that occurs by a transition process (between two stationary states of an atomic system) is accompanied by an exchange of momentum of the amount hν/c, just as would be the case if the transition were accompanied by the starting or stopping of a small entity moving with the velocity of light c and containing the energy hν. (adapted from Bohr et al., 1924)
This conclusion constituted a fundamental innovation, a conceptual ‘mutation’, since, by associating momentum quanta with energy quanta, it amounted to defining light-quanta as particles, on a par with, e.g., electrons: light-quanta, like electrons, were taken to be entities endowed both with energy and momentum. Previous discussions of the interaction between radiation and matter had solely been concerned with the exchange of energy (cf., for example, Einstein, 1905).43 This central result in Einstein (1917) anticipates Compton’s account. Conversely Compton’s discovery helped clinch the argument in Einstein’s article for ascribing a certain physical reality to the theory of light-quanta.44 The mathematical and experimental procedures are, in the best circumstances, mutually reinforcing. Thus there are several complementary facets to the story of the quantum theory of radiation-matter interaction. (To tell the story properly, one would need the combined talents of the author of The Alexandria Quartet and of the author of Blackbody Theory and the Quantum Discontinuity, 1894–1912.) Theoretical strands and contributions are intertwined in a complex pattern of interactions that are hard to disentangle. What is sure is that a conceptual ‘mutation’ happened, by which the light quantum postulated by Planck in 1900 and in Einstein (1905) was granted the fundamental attributes of particles, namely energy and momentum. Several strands of work contributed to that mutation. It would be nonsensical to single out any of them as the most representative. What we really have here is a kind of ‘ecological system’. The development of scientific hypotheses can actually be advantageously compared to the evolution of species. New hypotheses are put forward and concepts created, which are then subjected to selection. Normally only the fittest survives. Selectional pressure, e.g., in the form of such an experiment as Compton’s, is then crucial to the development of successful hypotheses and theories. At the same time, selection is not the ‘exclusive means of modification’.45 Quite obviously, there must exist a source of new hypotheses (i.e., scientific ‘mutations’). So we are led to distinguish two types of scientific events: ‘selectional events’ (often, but not exclusively, experiments) and ‘mutational events’.46 Analogously, we can distinguish between ‘selectional contributions’ and ‘mutational contributions’. If we follow Dirac (1968), ‘mutational contributions’ in turn would be of two types, ‘experimental’ (i.e. like Heisenberg’s, which was based on the experimental data) or ‘mathematical’. Note that the distinction between ‘selectional contributions’ and ‘mutational contributions’ is one of roles, not essences, relative to scientific situations. The same contribution might count as selectional or mutational depending on the set of hypotheses considered.47
Compton's Nobel Prize-winning contribution was both a selectional and a mutational one.48 However, it was obviously quite different in character and in scope (see note 41) from that of the fundamental 'mutational contributors' to quantum mechanics, including Bohr, Born, Dirac, Einstein, Heisenberg, Jordan, Planck, Pauli, and Schrödinger. While Compton was initially busy defending classical physics, they were building an alternative framework, often with few certainties at hand. From the evolutionary point of view above, both approaches were necessary to ensure the required level of 'ecological diversity'. One cannot emphasize this point too much. Science needs a diversity of styles for its continued progress. This applies as much to linguistics as to physics, perhaps more so given the relative immaturity of the field. Chomsky's work on the MP has been from the outset grounded in the mathematical procedure he employed so successfully to launch modern generative grammar in the early 1950s. In essence it constitutes a distillation of the mathematical procedure applied to linguistic theory that Chomsky envisioned in the opening paragraph of Syntactic structures: 'The ultimate outcome of these investigations should be a theory of linguistic structure in which the descriptive devices utilized in particular grammars are presented and studied abstractly, with no specific reference to particular languages'. From one perspective, the MP offers the most promising approach to this goal, far off though it still remains. Nonetheless, based on the considerable empirical successes of its predecessor, which to a large extent are still incorporated under current proposals, there is good reason to continue exploring the MP to discover whatever insights it can provide as well as whatever its actual limitations may be.

Notes
* Authors’ note: We would like to thank Noam Chomsky, Carlos Otero, Maria Rita Manzini, Martin Prinzhorn, Johan Rooryck, Alain Rouveret, Andrew Simpson, Neil Smith and Maria Luisa Zubizarreta for discussions. 1 References are to the version of Chomsky (1993) reprinted as chapter 3 of Chomsky (1995b). 2 It is worth pointing out here that in MMH Chomsky’s notion of simplicity bears some general similarity to the more current discussions of economy.
For the formulation of any relatively precise notion of simplicity, it is necessary that the general structure of the grammar be more or less fixed, as well as the notations by means of which it is constructed. We want the notion of simplicity to be broad enough to comprehend all those aspects of simplicity of grammar which enter into consideration when linguistic elements are set up. Thus we want the reduction of the number of elements and statements, any generalizations, and, to generalize the notion of generalization itself, any similarity in the form of non-identical statements, to increase the total simplicity of the grammar. As a first approximation to the notion of simplicity, we will here consider shortness of grammar as a measure of simplicity, and will use such notations as will permit similar statements to be coalesced. (Chomsky, 1951:5)
To avoid circularity, the notation must be fixed in advance and neutral to any particular grammar. Given the fixed notation, the criteria of simplicity governing the ordering of statements are as follows: that the shorter grammar is the simpler, and that among equally short grammars, the simplest is that in which the average length of derivation of sentences is least. (Chomsky, 1951:6) In current work, the 'shortness' of grammars and of derivations is driven by substantive principles of UG. A notion of economy is also mentioned in the literature of structuralist linguistics. Hymes and Fought (1981) cite the following from Hockett (1954): A model must be efficient: its application to any given language should achieve the necessary results with a minimum of machinery. (Hockett, 1954:233) However, it is doubtful that Chomsky's notion of economy was influenced by structuralist linguistics in any significant way. MMH predates Hockett's formulation by at least three years. Moreover, Chomsky's discussion cites the work of philosopher Nelson Goodman as the source of his ideas, not work in structuralist linguistics. 3 FI prohibits superfluous symbols in general, ruling out vacuous quantification, for example, where Case is not an issue. 4 For a detailed discussion of the rationale that has guided the evolution of generative syntactic theory, see Freidin (1994b). 5 Under the system of analysis in Chomsky (1999), the φ-features of the matrix T in (1) would not be checked because the N John would be frozen in place in the complement clause (see p. 5). Therefore the φ-features of the matrix T would violate FI at LF, rather than a Case feature of that T, which may be an unnecessary postulation. Thus (1) is prohibited because it would involve a single DP entering into two independent Spec-head agreement relations. We assume that such configurations are generally excluded. Whether John could actually move to the matrix subject position of (1) is a separate issue. Given that matrix T has a D-feature (EPP) that normally drives movement, John would move to matrix Spec-TP. If it does, then somehow its agreement features must be unavailable for checking the matching features of matrix T, even though they are interpretable at LF and therefore not erased. For evidence that NP-movement may be motivated solely by a D-feature of T as well as further discussion of this issue, see Lavine and Freidin (2002). 6 The example in (1) is also ruled out by the Case uniqueness requirement of the Chain Condition. Therefore, we might also want to investigate whether the empirical coverage of the Case uniqueness requirement can be subsumed under FI as well. Notice also that certain violations of the θ-uniqueness of chains also fall out from FI with respect to unchecked Case features. For example, in the simplest case (i) the Case feature of T will not be checked.
(i) *Bill T mentioned t. Because the Case feature of Bill is checked in VP, there is no Case feature to check the nominative Case feature of T. 7 Whether this is feasible depends on a demonstration that other cases of ECP violations can be subsumed in a similar fashion under FI or some other general condition. However, it is a reasonable and promising line of investigation. 8 The fact that the level of complexity of the analysis mirrors the complexity of the data constitutes yet another argument against the ECP analysis. This analysis functions more like a technical description of the data than an explanatory account of the phenomena under analysis. 9 The top-down analysis instantiated via phrase structure rules actually became suspect within GB when it was realized that phrase structure rules function as language-specific stipulations of properties that were already accounted for by general principles in conjunction with the specific properties of lexical items. Therefore, given the redundant character of phrase structure rules, it was assumed that they existed neither in individual grammars nor in UG. However, without phrase structure rules in GB, there appears to be no explicit way to derive phrase structure representations, though there were explicit conditions on the form of such representations (i.e., X-bar theory). Thus bare phrase structure answers the crucial question: if not via phrase structure rules, then via what? For this reason alone, the MP constitutes an important alternative to GB. 10 At present this bare necessities view does not account for locality conditions on movement, which appear to be conditions internal to CHL itself rather than the effects of legibility conditions imposed on CHL by other systems that interface with it. 11 Chomsky (1995d: 1) observes that one question that has motivated the work in generative grammar is that of the conditions that are imposed on the language faculty by virtue of 'general considerations of conceptual naturalness that have some independent plausibility, namely, simplicity, economy, symmetry, non-redundancy and the like.' 12 More precisely, the principle states that the classical theory is in agreement with the experiments for processes which depend on the statistical behaviour of a large number of atoms and which involve states where the difference between neighbouring states is comparatively little.
A more specific formulation is found in Bohr (1913) for the emission and absorption of spectral lines. There, the principle is taken to postulate a general conjugation of each of the various possible transitions between stationary states with one of the harmonic oscillation components in which the electrical moment of the atom can be resolved. The Correspondence Principle has been a staple of textbooks on quantum mechanics. However, recent experiments have shown it to be incorrect. 13 P.A.M. Dirac shared the 1933 Nobel Prize for physics with E. Schrödinger. 14 One could add to this short list Dirac's own discovery of the correct laws of relativity quantum mechanics, which was arrived at simply by guessing the equation (see Feynman, 1965:57). 15 Of course, there are many different types of mathematical patterns: algebraic, geometrical, analytical, topological, etc. 16 See Chomsky, 1975a, 1957, 1965. Following standard practice, we view language as one component of the human mind. Thus the study of language via GP concerns human cognition, and human biology more broadly.
17 Considering that modern mathematics with its dazzling complexity evolved in great part from the study of numbers, it is to be expected that a science that is concerned with quantitative relations like physics will tend to make maximal use of the structures found in mathematics. This is all the more so since there exist many connections between the different parts of mathematics. 18 See Stewart, 1998:243. Stewart’s book, which features a quotation from Thompson (1942) at the beginning of each chapter, constitutes a contemporary commentary on Thompson’s seminal ideas. 19 See Jenkins (2000) for further discussion. 20 In the early 1950s Chomsky had developed a mathematical understanding of natural language, which he then brought to bear on current issues in automata theory—in particular, demonstrating the inadequacy of finite state automata as a model of natural language (Chomsky, 1956, 1959) and investigating more broadly the relation between automata and grammars. The famous ‘Chomsky Hierarchy’ of formal grammars (and corresponding formal languages) is due to him (Chomsky, 1959), and so are the proofs of associated central theorems about regular grammars (Chomsky and Miller, 1958) and context-free grammars (Chomsky, 1962b), all results that have ever since been a staple of textbooks in computer science. Chomsky (1962b), for example, establishes the equivalence of context-free languages and pushdown automata (which was proved independently by M.P. Schützenberger and by J. Evey). For additional clarification concerning the context of Chomsky’s contributions to computer science, see Otero (1994). 21 We stress ‘ultimately’ because the MP is a research program based on specific conjectures, not a theory or even a framework. As Chomsky has noted (1998), ‘there are minimalist questions, but no specific minimalist answers.’ It should go without saying that whatever minimalist answers we might discover will only be found by actively pursuing the questions posed by the MP. Furthermore, it should be noted that Chomsky has been quite clear about the provisional nature of the MP, saying explicitly that it could turn out to be wrong, or equally problematic, premature (i.e. in much the same way that Einstein’s search for a unified field theory was premature, though not wrong if developments in string theory succeed (see Greene, 1999)). 22 The first ‘economy principle’ acknowledged within the Western intellectual tradition actually is the maxim known as Ockham’s razor. Entia non sunt multiplicanda praeter necessitatem. The prominent 14th century philosopher and logician William of Ockham (c. 1295–1349) has traditionally been credited with this principle (hence the name). However, the historical record suggests otherwise. We quote from Kneale and Kneale (1962):
No doubt this [the maxim—RF & JRV] represents correctly the general tendency of his philosophy, but it has not so far been found in any of his writings. His nearest pronouncement seems to be Numquam ponenda est pluralitas sine necessitate, which occurs in his theological work on the Sentences of Peter Lombard. (Kneale and Kneale, 1962:243) See also Boehner, 1958. 23 Fermat had been preceded by Hero of Alexandria, who had stated that light travels in such a way that it goes from a point to a mirror and then to another point in the shortest possible distance. 24 Actually, as Feynman points out, the principle as stated is incorrect, since it would predict that light emanating from a point in front of a mirror should avoid the mirror! There is a more exact formulation that avoids this problem and coincides with Fermat's original
formulation in the case of refraction of light. See Feynman R.P., R.B. Leighton and M. Sands, 1963, Chapter 26. 25 Willebrord Snell, a Dutch mathematician, found the formula describing the change of angle of a ray of light that goes from one medium into another. 26 See the discussion in Thompson (1942): Chapter 1. 27 In the light of this discussion, the following statement (LLJ: 666) appears to be profoundly in error:
Finally, one may suggest that the notion of perfection that Chomsky has in mind is based upon an analogy with the minima and maxima principles of physics. So, for example, air pressure in a soap bubble produces a spherical shape as the optimal geometric design for distributing this pressure. Similarly, light reflecting off a mirror takes the path of least time between two points. If this is, in fact, the sort of optimality that Chomsky has in mind, then it has no place in the theory of grammar. Minimization/maximization principles are derived from deeper physical properties of the particles (waves, vectors, etc.) which satisfy them. They follow from the subatomic structure and attributes of these particles, and are not themselves basic elements of the theory. Hence they have no independent explanatory status within physics, but are reducible to other principles. By contrast, the MP takes economy conditions to be essential elements of the grammar and the optimality which they encode to be one of its defining properties. LLJ claim that because the empirical content of a principle X is deducible from other more elementary considerations, X has 'no independent explanatory status.' They suggest that this applies to linguistics as well as physics and therefore that the economy principles discussed in linguistics cannot legitimately be considered part of the theory of language. In the case of linguistics, this suggestion is moot because as yet no deductive relation with more elementary considerations has been established. Therefore it is both natural and rational to consider economy conditions as fundamental. In the case of physics, the point appears to be mistaken as the text above indicates. 28 The discussion of Principle C follows Vergnaud (1998), a handout for a lecture given at UCLA that explores various notions of 'multiplicity of representations'. 29 The DP constitutes a definite description where the head D is indicated by the feature [+def]. At this point whether the agreement features are associated with D or N is left open. 30 Questions arise in the case of such structures as that in (i) (discussed in Jacobson, 1977):
(i) [the man who loved her2]1 kissed [his1 wife]2
We have the following descriptions for the constituents his wife and her in (i):
(ii) a. [his1 wife]2 = [[+def] wife], with the agreement features for man and wife, respectively
b. We assume that:
(iii) The DP 'the man's wife' is [+def] for the same reason that 'a man's wife' is [−def] (technical details aside).
(iv) The structure in (iib) is ambiguously realized as his or as her.
(v) In (iv) above, his is the PF realization of [+def] and her, of [+def].
Independent principles (having to do with contrast at surface structure) determine which one of the alternative forms in (v) is realized. An important fact discovered by P. Jacobson is that her in (i) must be analyzed as a copy of the whole antecedent his wife, and not merely as a copy of [+def] wife. In other words, her in (i) must be described as in (iv). Call this the Principle of Anaphora Interpretation:
(vi) Definite anaphora only holds between complete DPs.
An analogous principle was postulated in Vergnaud 1974. The principle in (vi) entails the ungrammaticality of (vii) on the indicated reading (see Brody 1982):
(vii) [his2 employer]1 respects [her1 secretary]2
The constituents his employer and her secretary in (vii) are described as in (viii):
(viii) a. [his2 employer]1 =
b. [her1 secretary]2 =
Note that the above assumptions require that the head of the relative clause construction in (i) (the man) be analyzed as a DP. The impossibility of (ix) is presumably related to that of (x):
(ix) *[The President]i said that [he]i that had been elected could not resign.
(x) *[A man]i came in. [The man]i that looked tired sat down.
The example (x) contrasts with (xi):
(xi) [A man]i came in. [The man]i sat down.
31 The range of an expression is more than a mere list. To wit, it is a semi-lattice structure (see Vergnaud and Zubizarreta, 2000). Note that the range of an expression may also include descriptions outside the linguistic modality e.g., visual descriptions (see Vergnaud and Zubizarreta, 2000). 32 For the purpose of that discussion, we assume that c-command is reflexive, so that the c-command domain of x contains x. 33 Observe that, in a different domain, the rule of Quantifier Raising (QR) can be taken as a manifestation of the PPC. QR ensures that the inherent variable interpretation associated with a quantified expression QNP is carried over to the phase of QNP. The existence of such symmetries among linguistic representations might suggest an approach to the study of the underlying neural system very much in the spirit of that pioneered by Stewart (1998, 1999) for the study of animal locomotion (see Stewart, 1998: Chapter 9, for example). 34 Introduced by Chomsky in class lectures during the mid-seventies. 35 By contrast, no violation of the PPC occurs in the case of (2a) because Pshe is the embedded TP and therefore does not contain an accidental chain, even though the matrix TP does. The notion of ‘range’ introduced in the text analysis could be developed so as to provide an account of the anaphoric interpretation of pronouns in the spirit of that of Lasnik (1976), where there is no specific rule of grammar that establishes an anaphoric reading between a pronoun and an antecedent.
36 A fundamental aspect of the account of Principle C proposed in the text is that it centrally relies on the PF distinction between the full-fledged nominal expression and its pronominal counterpart. The prediction is then that Principle C is inoperative in elided constituents such as deleted VPs. The prediction appears to be correct, as shown by the grammaticality of the construal in which he is anaphoric with John for the form in (i) (see Fiengo and May, 1994:220):
(i) Mary loves John, and he thinks that Sally does, too.
Indeed, the representation of pronouns proposed in the text would obviate the need for any 'vehicle change' in the sense of Fiengo and May (1994) (see also Lappin, 1991 and Lappin and McCord, 1990). Note however the contrast between (ii) and (iii):
(ii) Mary believes that John is eligible and Sally claims he does, too.
(iii) Mary believes John to be eligible and Sally claims he does, too.
The pronoun he may be construed as anaphoric with John in (ii), but not in (iii). This suggests that an independent LF constraint is at work in the case of (iii), presumably the Principle of Disjoint Reference (see Chomsky, 1976). If this is on the right track, then the behavior of Principle C with respect to reconstruction phenomena needs to be reconsidered.
Note that the Parallelism Principle itself can be described as a principle of ‘symmetry preservation’. The relevant symmetry in that case is the invariance of LF interpretation under the permutation of anaphorically linked expressions. 38 Technically the generalization encompasses other cases—e.g., (i).
(i) *Pierrei sait très bien que le juriste que Pierrei est est en difficulté.
'Peteri knows very well that the jurist that Peteri is has a problem.'
(i) has the same anaphoric behavior as (ii).
(ii) *Pierrei sait très bien que Pierrei est en difficulté.
'Peter knows very well that Peter has a problem.'
Surprisingly, (iii) is not as deviant as (ii) or (i).
(iii) ?Le juriste que le Pierrei est sait très bien que le Pierrei est en difficulté.
'The jurist that the Peteri is knows very well that the Peteri has a problem.'
(iii) may constitute an exception to this generalization. However, the coreferential interpretation of a pair of R-expressions may not be within the purview of Principle C under the appropriate formulation. See Lasnik (1991) and Freidin (1997b) for some discussion. 39 Some intriguing proposals in this area are put forth in Jenkins, 2000 (see pp. 151–170, for example). See also Fukui (1996) for another line of investigation into these topics. 40 Perhaps linguistics is, in this regard, roughly comparable in character to structural chemistry in the years preceding its unification with physics (see Chomsky (1995a) for important
remarks on structural chemistry in connection with the issue of ‘reductionism’), or with the initial development of quantum physics prior to the Heisenberg/Schrödinger formulations. 41 In that sense, both Heisenberg’s research and Compton’s Nobel prize winning work belong in the same ‘Diracian category’, i.e., the ‘experimental procedure’. However, one cannot stress enough the difference in scope between the contributions of the two scientists. Compton was concerned with the interpretation of a particular experiment. Heisenberg was a theoretician trying to construct a general account of all the spectroscopic data available at the time in order to get a better understanding of the laws of nature. Heisenberg ended up constructing a new physics. 42 It seems that Compton was unaware of Einstein’s results at the time he was developing his theory. His only reference to Einstein in connection to the question of the interaction between radiation and matter is to Einstein (1905). Compton presented his theory at the 1 December 1922 meeting of the American Physical Society held at the University of Chicago. It is somewhat surprising that he would not have known of Einstein’s (1917) article at the time, since there was really a free flow of information and ideas in physics between Europe and the US at the beginning of the 1920s. Thus, Debye could learn quickly about Compton’s experimental results, as did Einstein. There were also common channels of publication (e.g., The Philosophical Magazine, in which A. Compton had previously published, as had many major contributors to quantum mechanics). More surprisingly, nowhere in his whole work does he refer to Einstein’s 1917 paper.
One may surmise that Compton arrived at his result by a different route, namely, from classical electrodynamics (as it turns out, in conjunction with O.W. Richardson, Compton's first research advisor at Princeton University). Within classical electrodynamics, an electromagnetic wave carries momentum, giving rise to 'radiation pressure'. This is how it works. Suppose an electromagnetic wave is acting on an electric charge. The electric component in the wave makes the charge oscillate. This oscillation in turn interacts with the magnetic component of the wave, creating a force in the direction of the propagation of the wave. The value of the induced momentum is equal to the energy absorbed by the charge divided by the speed of light c. The division by c merely reflects the fact that the strength of the magnetic field associated with the wave is that of the electric field divided by c (see, for example, Feynman et al., 1963:34–10&11). A particular instance of radiation pressure is that of an atom emitting an energy W in some direction. Then, according to classical electrodynamics, there is a recoil momentum p = W/c. There is a big leap between the classical theory and the quantum hypothesis put forth in Einstein (1917), in Compton (1923), and in Debye (1923), though. The classical relation is a statistical relation, defined over averages of fluctuating quantities. More seriously, it only applies to directional waves. In the case of a spherical wave, there should be no recoil. Einstein's fundamental insight was to consider that every exchange of energy was accompanied by an exchange of momentum. Correlatively, he was led to assume that the radiation was always directional, even in, e.g., the case of spontaneous emission, which was classically described as a spherical wave. Einstein assumed that that was a case where direction was only determined by 'chance'.
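The classical relation invoked in this note can be written in one line; the restatement below is supplied for clarity and is not part of the original note:

$$\frac{dp}{dt} = \frac{1}{c}\,\frac{dW}{dt} \quad\Longrightarrow\quad p = \frac{W}{c},$$

that is, the momentum delivered to (or recoiling from) the charge equals the energy exchanged divided by c, formally parallel to the quantum relation $p = h\nu/c$ for a light quantum of energy $h\nu$.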
44 Debye, who knew about Compton’s evidence against the classical theory, independently derived the equations describing the scattering of a photon off an electron at rest (Debye, 1923). We note that, in his article, Debye expressed his indebtedness to Einstein’s 1917 theory. 45 Cf. the introduction to Darwin (1859), where the last sentence reads as follows:
(i) 'Furthermore, I am convinced that Natural Selection has been the main but not exclusive means of modification.'
46 One issue of importance is how one recognizes a 'mutational event'. For example, was the atomism of Leucippus and Democritus a 'mutational' scientific hypothesis? Of course, a preliminary question is whether it was a scientific hypothesis at all. Given that one could not really conceive of any serious experimental testing of the (rather vague) philosophy of the Greek atomistic school, it would not qualify as a scientific hypothesis. In general, there are various criteria by which one may evaluate and compare mutational hypotheses. One factor is the 'likelihood' of an hypothesis, i.e., the degree of expectation of the hypothesis given the state of the knowledge at the time. We could call this the 'originality' of the hypothesis. Thus, when the Danish astronomer Roemer proposed that the apparent discrepancies between the measured movement of the moons of Jupiter and Newton's Law were merely an observational illusion due to the noninstantaneous character of light propagation, this was an original hypothesis at the time. Another criterion in evaluating an hypothesis is its complexity, i.e., how much mathematical or conceptual elaboration it involves, and how much revision of the existing system of knowledge it requires. 47 It should be clear that such notions as 'scientific mutation' or 'scientific selection' are intended to apply to all sciences, not only to Galilean sciences. Thus, Darwin's hypothesis concerning natural selection and evolution qualifies as a 'mutational' hypothesis. 48 It was a selectional contribution in two distinct ways. It established the reality of the light quantum as a true particle. But also, because it could not account for the angular dependence of the scattered X-ray intensities, the total scattering cross-section as a function of energy, or the exact state of polarization of the Compton scattered X-rays, it emphasized the need for a more basic quantum mechanics, soon to be developed by Born, de Broglie, Dirac, Heisenberg and Schrödinger.
18 Syntactic Structures Redux*

1 Rereading Syntactic Structures

As the first published explication of modern generative grammar, Syntactic Structures (henceforth SS) raised many issues in grammatical theory that have remained lively research topics to this day. For this reason alone (re)reading SS is highly rewarding for the perspective it provides on how much has changed over the past 45 years and how much remains essentially the same. Howard Lasnik's brilliant commentary Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational Theory (henceforth SSR) provides us with just such a rereading. SSR elucidates SS from three perspectives. Chapter 1, "Structure and Infinity of Human Language," discusses initial criteria for a theory of language: a notion of structure and a (finite) way to represent infinity, criteria that have remained unchanged. This first chapter also contains an introductory discussion of English verbal morphology, including some descriptive generalizations that a theory of English grammar ought to capture. The second chapter, "Transformational Grammar," develops transformational analysis as first presented in SS. It considers the properties of transformations (sect. 2.3) and how the rules formulated in SS apply. It goes on to detail some serious theoretical problems for this theory of transformations (sect. 2.6) and additional problems the theory encounters from the perspective of learnability and language acquisition (sect. 2.7), as well as to propose some solutions. The third and final chapter begins with some further problems with the SS analysis (sect. 3.1) and then moves rapidly to succeeding developments in the theory of grammar that address these problems, such as X-bar theory (sect. 3.2) and the notion of the lexicon including subcategorization and selection (sect. 3.3), which led to the more modern analysis of clause structure and thus the verbal system presented in section 3.4. This latter section covers inflection as a syntactic entity, negation, the Head Movement Constraint, and the distinction between main and auxiliary verbs (which is developed further in the brief sect. 3.5), among other topics. The last three sections of this chapter discuss three different analyses of verb movement: Chomsky's 1991 economy analysis (sect. 3.6), the minimalist approach in Chomsky 1993 (sect. 3.7), and finally Lasnik's 1995 hybrid analysis (sect. 3.8). The clarity of the presentation, by now a well-known characteristic of Lasnik's work, makes this book a pleasure to read.1 Fundamental assumptions and the formal analyses based on them are carefully explicated by application to compelling empirical evidence. The discussion throughout is illuminating, fleshing out background assumptions and details of analyses, and often raising new points. SSR presents the analysis of the verbal morphology system in historical perspective in a way that allows us to appreciate the genius of the SS analysis while at the same time understanding its imperfections and moreover the rationale for the changes embodied in more modern analyses.
At the conclusion of the brief introduction, Lasnik states his belief "that many of the analyses of the 1950s are actually correct, fundamentally and (sometimes) even in detail" (p. 4). In what follows, I would like to consider the contrary view that very little, and perhaps almost nothing at all, survives of the original technical machinery proposed in SS to give an account of the English verbal morphology system. This, of course, requires a rather different analysis of the system than the one Lasnik is proposing. Given his assumptions and analysis, his belief is justified. Furthermore, his analysis is not obviously incorrect. It stands as testimony to the strength of the analytic approach in SS that has dominated our thinking on this topic for almost half a century. Yet it would be essentially miraculous that anyone would get the analysis of so intricate a piece of syntax as the verbal system right on virtually the first attempt, and not at all surprising if a deeper understanding of syntactic theory provided by four decades of research leads to a different and potentially more principled analysis. This review article will be concerned with three closely interconnected topics (sects. 2–4, respectively): (1) the formal analysis of the English auxiliary system in SS, including Lasnik's critique of it; (2) Lasnik's revision of the SS analysis in light of contemporary syntactic theory; and (3) a radical alternative proposal that eliminates the central technical machinery of SS (and Lasnik's hybrid theory)—namely, the rules of affix hopping and do-support. This latter section is offered as an addendum inspired by the discussion in SSR.

2 The formal analysis in SS

The SS analysis of the English verbal system begins with various assumptions about the underlying phrase structure for English sentences, embodied in the phrase structure rules of SS. Thus consider the subset in (1) that directly involves verbal elements. (The numbering follows Appendix 2 of SS.)
(1)
1. Sentence→NP+VP
2. VP→Verb+NP
8. Verb→Aux+V
10. Aux→C (M) (have+en) (be+ing)
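To make the generative content of (1) concrete, here is a minimal sketch of how these rules expand. It is our illustration rather than anything found in SS or SSR, and the function name and list encoding are invented for exposition.

import itertools

# The optional elements of rule 10: Aux -> C (M) (have+en) (be+ing).
OPTIONS = [["M"], ["have", "en"], ["be", "ing"]]

def aux_expansions():
    # Enumerate the eight expansions of Aux: C plus any subset of the options.
    for choice in itertools.product([False, True], repeat=3):
        expansion = ["C"]
        for on, opt in zip(choice, OPTIONS):
            if on:
                expansion += opt
        yield expansion

# Rules 1, 2 and 8: Sentence -> NP+VP, VP -> Verb+NP, Verb -> Aux+V.
for aux in aux_expansions():
    print(["NP"] + aux + ["V", "NP"])

The fullest expansion, ['NP', 'C', 'M', 'have', 'en', 'be', 'ing', 'V', 'NP'], is the underlying string for a sentence like John may have been reading the book, with each affix (C, en, ing) still sitting to the left of the stem it will eventually attach to.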
The lexical category V, which in the SS analysis is rewritten as a lexical verb, is dominated by two phrasal categories, Verb and VP.2 The rule for Aux is based on the assumption that the aspectual auxiliaries are associated underlyingly with affixes that they never take in Phonetic Form (PF). Furthermore, they occur as bare formatives without any category designation.3 This turns out to be a crucial aspect of the SS analysis, as shall become clear—one that has a particular relevance with respect to Lasnik’s reformulation. And most important, C, which comes to represent the finite tense and agreement affix, is introduced into the derivation as an independent element. Given these rules, the verbal affixes en and ing, and those represented by C4 start out as independent from and to the left of the verbal stems they will be affixed to in PF. In
the case of the aspectual affixes, the analysis assumes an underlying dependency that becomes discontinuous at PF. Note that the exact nature of this underlying dependency is not clear (see n. 3). To account for these discontinuous dependencies as well as the underlying independent tense affix that cannot remain independent in PF, SS postulates a transformational rule named Auxiliary Transformation (but commonly referred to as affix hopping).
(2) Auxiliary Transformation—obligatory:
SA: X—Af—v—Y (where Af is any C or is en or ing, v is any M or V, or have or be)
SC: X1—X2—X3—X4 → X1—X3—X2#—X4
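Read procedurally, (2) scans the terminal string for an affix immediately followed by a verbal stem, inverts the pair, and attaches the word boundary symbol #. The toy rendering below is ours, not a formulation from SS; the two class definitions simply transcribe the parenthetical in (2).

AFFIXES = {"C", "en", "ing"}         # Af: any C, or en, or ing
STEMS = {"M", "V", "have", "be"}     # v: any M or V, or have or be

def affix_hop(tokens):
    # X - Af - v - Y  ->  X - v - Af# - Y, applied left to right.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] in AFFIXES and tokens[i + 1] in STEMS:
            out += [tokens[i + 1], tokens[i] + "#"]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

print(affix_hop(["NP", "C", "have", "en", "be", "ing", "V", "NP"]))
# -> ['NP', 'have', 'C#', 'be', 'en#', 'V', 'ing#', 'NP']

Each affix ends up suffixed to the stem on its right (have+C, be+en, V+ing), which the morphophonemics later spells out as, for example, has been reading.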
The structural change (SC) not only inverts the order of an affix and the adjacent verbal stem to the right but also inserts a word boundary symbol # to the right of the inverted affix.5 Although the formulation in SS is not explicit about the nature of the elementary operation the transformation performs, Lasnik makes the natural assumption that affix hopping involves adjunction of the affix to the verbal stem rather than a permutation of the two elements, noting that this is explicit in Chomsky 1975a. About the inexplicitness of the formulations in SS, Lasnik makes an important point. This actually indicates a virtue of the SS analysis, given that Chomsky was assuming that the analytic details not specified in the rules would be provided by general principles of syntactic theory rather than by stipulation in the formulation of the rules. One serious problem with this formulation of affix hopping, which Lasnik points out, is that it introduces symbols, Af and v, which are not, strictly speaking, symbols of the grammar. They are neither terminal nor nonterminal symbols introduced via phrase structure rules, nor are they variables. Rather, they are abbreviations for a Boolean combination of symbols, one of the possibilities for specifying a term in a transformation. In this case, the terms are Boolean disjunctions: {s, Ø, past, en, ing} and {M, have, be, V}. Lasnik adapts Ross's critique of the use of Boolean disjunction in the formulation of transformations (Ross 1969).6 He points out that under the SS theory the elements grouped in the disjunctions do not constitute a natural class even though they behave similarly. Furthermore, the evaluation metric makes no distinction between the disjunction of these elements and another disjunction of random elements. Ross's solution was to designate modals and the aspectual auxiliaries as the same category, [+V, +Aux], thereby making use of a feature theory which wasn't at the time available to the SS theory. To reformulate affix hopping in this way, we would have to substitute [+V, −N] for v ([−N] to exclude adjectives, and no designation for the feature [Aux] to allow either + or −). Finding a non-ad hoc feature designation for the set of verbal affixes, however, might still be a problem. I won't pursue this issue further because in current analyses the role of affix hopping is either eliminated or limited to moving only the tense affix, as in Lasnik's revised theory. The affix-hopping analysis in SS is crucial to an account of the distribution of auxiliary do. In yes/no-questions, for example, a verbal element must occur in front of the
subject. Because the main verb in modern English cannot be so displaced, some verbal auxiliary must occur in the clause-initial position. If the derivation contains neither a modal nor an aspectual auxiliary, then periphrastic do occurs, carrying the tense and agreement. SS accounted for this by postulating a question transformation under which the tense and agreement affix C would front just in case it was adjacent to the main verb V. The question transformation is given in SS as (3), where the ellipsis symbol stands for a variable.
(3)
Tq—optional:
SA: NP—C—V…
    NP—C+M—…
    NP—C+have—…
    NP—C+be—…
SC: X1—X2—X3→X2—X1—X3
Using Boolean disjunction, the SA can be abbreviated as (4), thereby eliminating eight symbols.
(4) SA: NP—C—V…
        NP—C+{M, have, be}—…
The formulation in (4) highlights the fact that there seem to be two distinct cases of question formation: one that affects a single constituent C (now realized as some particular tense and agreement affix) and the other that affects a nonconstituent, C and the following verbal auxiliary (which have yet to be conjoined via affix hopping). Lasnik criticizes this second case for violating the structure dependence of transformations. He equates the statement that transformations are structure dependent with "transformations always affect constituents" (p. 76).7 Under this interpretation of structure dependence, affix hopping would have to apply prior to Tq so that Tq would be moving a constituent. This would, of course, create a problem for the derivation of interrogatives with periphrastic do. If the tense and agreement affix concatenates with the main verb before Tq moves it in front of the subject, then do-support can never apply—never, because in SS affix hopping is obligatory. One solution, essentially the analysis of Lasnik 1981, is to make affix hopping an optional rule and deal with the unwanted consequences of this formulation with a stranded affix filter. In effect, affix hopping must apply unless do-support applies, thereby eliminating the stranded affix. Lasnik raises other theoretical problems for the transformations formulated in SS, almost all of them involving unnecessary descriptive power in formulating rules, which
creates problems for learnability. For example, given the formulation in (3), a term in the SA of a transformation can be any of the following:8
(5)
a. a single nonterminal symbol (NP)
b. a variable (…)
c. a sequence of nonterminal symbols (C+M)
d. a sequence of nonterminal and terminal symbols (C+have)
e. a sequence of a nonterminal and a variable (V…)
Lasnik points out that the two sequences (5c) and (5d) are problematic because they do not constitute single constituents. As formulated, the rule violates the tacitly assumed principle that transformations can only affect (i.e., move or delete) constituents (i.e., not a string that does not form a single constituent)9 and therefore allows for a wider class of rules than seems necessary. Moreover, just being able to specify terms of a transformation in so many different ways allows for excessive descriptive power. Along these lines, Lasnik also criticizes the general use of Boolean and quantificational conditions in the SAs of transformations in SS. He also discusses problems for learnability posed by postulating an obligatory/optional distinction for transformations, extrinsic rule ordering, and the placement of variables in SAs.10 This material is quite intricate and worth careful study. However, instead of reviewing Lasnik’s account, I would like to approach the solution to these problems from a somewhat different perspective. One reason early transformational accounts were so complicated in their formulations of rules was that the emphasis was on generating all and only the grammatical sentences of the language under analysis using only phrase structure and transformational rules. If the rules generated a deviant expression, then this was a serious problem, construed as a direct refutation of that formulation of the rules. It was not until well after the advent of general principles like the A/A Condition (Chomsky 1964) that it became possible to allow rule mechanisms to misgenerate structures and then deal with the problem in terms of general conditions on rule application and/or representations. And it was not until the development of the conditions framework began to accelerate in the early 1970s that it started to become clear that the detailed articulation of a system of general principles/conditions had a salutary effect on the formulation of rules—namely that the formulations could become optimally simple.11 For transformations, this culminates in the hypothesis that a grammatical transformation cannot compound elementary operations. That is, each grammatical transformation consists of a single elementary operation.12 From this perspective, two questions arise: what are the elementary transformational operations? and what further specifications must be added to formulate the grammatical transformations in various languages? From the perspective that has developed over the past three decades, the answer to the second question seems clear: ideally none. To see how this works, consider the elementary operation of adjunction, which minimally and maximally affects two elements: the element that is to be adjoined and the element to which it is adjoined.13 If we assume that the nature of what can be adjoined to what is fully determined by general principles (e.g., that only single constituents are affected), and further that the syntactic distance between the two (in the case of
movement by adjunction) is also determined by general principles, then there is no need to stipulate the behavior of the rule in terms of a complicated SA and/or SC that contains more constant terms than minimally required by the elementary operation itself. Furthermore, there is no opportunity to utilize Boolean or quantificational conditions to specify SAs of transformations. Given this, the placement problem for variables (see n. 10) disappears. If the elementary operation involves two constants, then there are going to be exactly three variables—two cosmetic end variables and the one intrinsic variable that separates them.14 Furthermore, the optional/obligatory distinction becomes superfluous. If the operation is "obligatory" and it fails to apply in the derivation, then the result will be ruled out by some general principle on representations. Thus, for example, the obligatory character of do-support in SS can be handled by Lasnik's Stranded Affix Filter, a UG condition on representations. And finally, given the simplicity of elementary operations, it becomes virtually impossible to specify intrinsic or extrinsic ordering relations. Here again, if operations apply in an order that yields a deviant result, general conditions—virtually always on representations—will rule out the derivation.15 Although this perspective on transformations rules out the specific formulations of transformations in SS, it is not obviously incompatible with a rule of affix hopping or do-support. And of course Tq remains in essence, though now generalized as head movement (T-to-C and V-to-T). Tnot, another transformation intimately connected to the verbal morphology system, has been abandoned. As Lasnik points out, the SS system does not generate simple negative sentences with not. Tnot inserts only the negative affix n't into the verbal string. The separation of C from the main V by n't (or by the subject NP (with Tq) or by the emphasis marker A [with TA]) blocks affix hopping of C onto V and triggers do-support.16 This account of the distribution of periphrastic do remains the major insight of the SS transformational analysis. It is preserved in Lasnik's revision, to which I now turn.

3 Syntactic structures revised

This section (and the following) will focus on three issues in the analysis of the verbal morphology system from our current perspective:
(6)
a. the linear order of verbal elements
b. the inability of main V in English to undergo head movement
c. the distribution of periphrastic do
Let us take this perspective to include the following assumptions:
(7)
a. There are no phrase structure rules; phrase structure is generated transformationally via the adjunction operation Merge.17
b. Grammatical transformations consist of a single elementary operation.
c. SAs do not contain constant terms other than those directly affected by the elementary transformation.
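Assumption (7a) can be given a compact rendering: Merge combines two syntactic objects into a new object whose label projects from one of them. The sketch below is only our illustration of the idea; the dictionary encoding and the head-projects labeling default are expository choices, and linear order is abstracted away from.

def lex(item):
    # A lexical item as a trivial syntactic object.
    return {"label": item, "parts": None}

def merge(alpha, beta, head=None):
    # Combine two syntactic objects; the label projects from the head,
    # defaulting here to the first argument.
    head = head or alpha
    return {"label": head["label"], "parts": (alpha, beta)}

# Bottom-up structure building with no phrase structure rules:
dp = merge(lex("the"), lex("book"))   # D projects: the phrase is labeled 'the'
vp = merge(lex("read"), dp)           # V projects: the phrase is labeled 'read'
print(vp["label"])                    # -> read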
Given (7a), the SS analysis of (6a), where the order of verbal elements is stipulated in the phrase structure rule, must be replaced. Given (7c), the SS analysis of (6b), where this fact is stipulated in the complicated SA of Tq, is not available, so (6b) must be captured in some other way. As noted in the previous section, it is not obvious that current assumptions preclude an account of (6c) that utilizes transformational rules of affix hopping and do-support. As we know from Emonds (1978) (see also Pollock 1989), the inability of a main V in English to undergo head movement is a parametric distinction (main V can move in other languages, such as French), and therefore this property should fall under a theory of parameters that relates directly to UG. Pollock's account of this parametric variation, which Lasnik discusses, is based ultimately on a theory of θ-role assignment. His account starts from the natural assumption that main Vs have θ-roles to assign but auxiliaries do not. The inflectional element Agr is morphologically rich in French but poor in English. Pollock assumes that morphologically rich Agr is transparent to θ-role assignment, whereas morphologically poor Agr is opaque. The effect of this is that main V can raise through Agr to T in French but cannot in English. When English main V raises, it cannot assign θ-roles to its complements, thereby leading to a violation of the θ-Criterion. At this point in Lasnik's presentation a student asks a penetrating question: how do we know that main V in English moves before assigning its θ-roles?18 Lasnik responds with a discussion of where interpretation occurs. He then notes that Pollock's account appears to be based on two assumptions: (1) that θ-role assignment is not a property of D-structure, and (2) that either movement does not leave traces or traces cannot be used to assign θ-roles. He notes that these assumptions will cause problems for standard XP movement (e.g., in John was arrested).19 Lasnik's proposal for handling the V-raising parameter, in contrast, has nothing to do with θ-roles. It is based instead on two distinctions: (1) between verbal elements that enter a derivation fully inflected (these he calls "lexicalist"—henceforth V*) and those that do not—what he calls "bare V" (henceforth V+); and (2) concerning the nature of the inflectional element I—either it is "featural" (the quotation marks are Lasnik's—RF) (henceforth If) or affixal (henceforth Ia). It is a hybrid approach that mixes the purely affixal analysis of SS with the lexicalist/feature analysis of more recent minimalist analyses (e.g., Chomsky 1993, which Lasnik discusses in detail). Lasnik proposes that all French verbal elements enter the derivation fully inflected, hence are featural. Consider the case where the main verb VM is the only verbal element (8a), a configuration that could be instantiated in four different ways, as given in (8b).20
(8)
a. I VM
b. i. If V*
   ii. If V+
   iii. Ia V*
   iv. Ia V+
If has strong V-features (in both French and English), so V must raise to I to check those features, as will occur in (8bi).21 If I in (8a) were affixal, as in (8biii), the derivation would crash because V would not be compelled to move, thereby leaving affixal I stranded—a violation of the Stranded Affix Condition (or possibly Full Interpretation at PF), and even if V moved it could not take on the affixal I because it is already fully inflected. Given that all Vs in French enter the derivation fully inflected, I could never be affixal in French, thereby ruling out (8biii) and (8biv). Lasnik makes the simplifying assumption that I can be either affixal or featural universally; no special stipulations are required. UG excludes the combination of Ia with V*. This applies to English as well, where auxiliary Vs are lexicalist. Therefore, I must be featural when the adjacent verbal element is an auxiliary, the configuration in (8bi). Furthermore, auxiliary Vs must raise to I to check the V-features of I. Assuming that featural I is universally available creates a problem for the analysis of English. How do we exclude the configuration (8a) in English where I is featural? Notice that if I is not featural, then V will not be able to raise to I, which is exactly what we want. But to account for this, Lasnik has to make a second stipulation: main V in English is bare, hence not featural. From this it follows that I cannot be featural when it occurs with only a main V.22 Furthermore, given the Stranded Affix Condition, affixal I will have to join with the main V—presumably via affix hopping because there is no motivation for V to raise to I. To account for negative sentences, Lasnik resurrects the adjacency analysis in SS. Affix hopping is subject to a strict linear adjacency requirement. If affixal I and the bare main V are separated by some syntactic element (e.g., not), then affix hopping is blocked. At this point do-support can apply as a last resort operation to save the derivation.23 It’s classic SS analysis. In particular, it does not rely on the Head Movement Constraint (HMC) to block the movement of the affix.24 In fact, it need not rely on the HMC at all. We could assume that in (9) has has moved over not to I (as an instance of V-to-I raising) to check I’s features or to check its own agreement features against the subject John via spec-head agreement. (9)
John has not left yet.
If only the finite form of the auxiliary can check the V-features of I, a natural assumption, then there is no motivation to move any nonfinite verbal element to I. Thus, HMC violations are blocked by Last Resort.
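To make the combinatorics of (8b) concrete, the convergence pattern just described can be summarized mechanically. The following is a minimal sketch, not part of Lasnik's presentation: the string encoding and the function are my own, and the verdicts simply restate the judgments reported above rather than deriving them from anything deeper.

```python
# A minimal sketch of the typology in (8b), assuming Lasnik's hybrid
# analysis as summarized above. The encoding is illustrative only.

def converges(i_type: str, v_type: str) -> bool:
    """Convergence for a clause whose only verbal element is a main V."""
    if i_type == "featural":
        # Featural I has strong V-features; only a fully inflected
        # (lexicalist) V can raise to I and check them, as in (8bi).
        return v_type == "lexicalist"
    # Affixal I: a lexicalist V cannot take on the affix (it is already
    # fully inflected), so the affix is stranded, as in (8biii); a bare V
    # can host it via affix hopping, as in (8biv).
    return v_type == "bare"

for i_type in ("featural", "affixal"):
    for v_type in ("lexicalist", "bare"):
        verdict = "converges" if converges(i_type, v_type) else "crashes"
        print(f"I = {i_type:8}  V = {v_type:10} -> {verdict}")
```

Running the loop reproduces the pattern in which only (8bi) and (8biv) survive—the French case and the English main-verb case, respectively.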
As smoothly as Lasnik’s analysis seems to work, there is a problem that remains to be solved, as he notes. It involves the situation in (10), where I is affixal and V is a main verb and therefore bare. (10)
I not V+
Example (10) will lead to deviant constructions like (11).

(11) a. *It doesn't be raining.
     b. *This doesn't be a problem.
     c. *John didn't have left on time.
Lasnik cites only the uncontracted version of (11a), where be is the progressive auxiliary (this does not be a problem). His analysis generalizes to copular be (11b) as well as the perfective auxiliary have in (11c). He suggests that the lexicalist option for the auxiliary (is) with V-to-I raising should block constructions like (11a) but says he does not see how to make the analysis precise.25 Notice that getting (12), the grammatical counterpart of (11b), requires a further stipulation: that copular be can be lexicalist.

(12)
This isn’t a problem.
If we can strengthen "can" to "must," then the problem of (11b) disappears. Whether this is possible under Lasnik's theory is not obvious, as I will show shortly. It is worth noting at this point that the problem of (11b), but not (11a) or (11c), could have also occurred in the SS analysis.26 Given the SA of Tq, which is the same for Tnot, copular be can be ambiguously analyzed as either a main verb (the first line) or the phonological form be (the fourth line).27 If it is analyzed as V rather than be, then Tnot will insert n't between C and V, thereby blocking affix hopping and triggering do-support. This derivation yields the deviant (11b). Only when the copula is analyzed in terms of its phonological shape rather than its syntactic category does the derivation yield the grammatical (12), where n't has been inserted to the right of be.28 Thus, in both the SS analysis and Lasnik's revision there is no motivation for analyzing the copula as a main V, though how this is to be captured remains to be determined.

In addition to the problem posed by (11), there are two further potential worries about Lasnik's hybrid analysis. For one, the distinction between affixal and featural I is not completely transparent. The distinction between tense as a feature versus tense as an affix seems quite subtle. Moreover, featural verbs in French can be analyzed as having affixes, in which case the distinction between features and affixes becomes hard to grasp. And in the case of English, we would still have to assume that affixal I must agree with the φ-features of the subject, in which case affixal I would, as noted by Matthew Wagers (p.c.), appear to be featural in the relevant sense. Perhaps the distinction can be made in
terms of strong versus weak tense, leaving the φ-features out of the equation—though this is tantamount to stipulating the facts.

The second worry concerns the notion of bareness for verbs. Lasnik's analysis conflates the bareness of underlying main V in English with the bare verbal elements, both auxiliaries and main verbs, that occur following a modal or infinitival to. They are, however, different. The main verb that is underlyingly bare—that is, apparently affixless—may end up at PF with an affix, whereas the bare form following a modal or infinitival to remains bare at PF. Furthermore, it is plausible that the bare forms at PF occur with a phonetically null (zero) affix, in contrast to the underlying bare main V, which by hypothesis has no affix at all.29 The plausibility of distinguishing the two kinds of bare verbal form increases when we consider the problem of main verbs in English that are not inflected for tense.

The discussion in SSR does not cover the hybrid analysis of main V when it occurs with an inflected auxiliary, so it is not clear how participial main Vs should be handled. One possibility is that a main V that is conjugated with a fully inflected aspectual auxiliary enters the derivation fully inflected (i.e., with an -ing or -en affix). If so, then the stipulation about main Vs entering the derivation bare will not be a general statement about main Vs but instead will merely describe the complexity of the data. In Lasnik 1995d, where the hybrid analysis was first explicated, another possible analysis is mentioned but not developed systematically—namely, the general statement about the underlying bareness of main V is maintained. This requires that aspectual auxiliaries can enter the derivation fully inflected along with their associated affix, which would then hop onto the bare main V. Alternatively, these auxiliaries can enter the derivation without the associated affix when the following verbal form is another inflected auxiliary. The problem then arises that aspectual have must be entered in the lexicon in two forms:

(13)
a. has/have/had
b. has/have/had+en
(13a) is needed when the progressive auxiliary is present, and (13b) when the perfective auxiliary is immediately followed by a main V. It is not a problem if (13b) is selected to precede the progressive auxiliary because the affix -en would be stranded in violation of the Stranded Affix Condition. However, it would be a problem if (13a) is inserted adjacent to a bare main V. By hypothesis, there would be no features of V+ to check because V+ is not featural; so why would bare V be ruled out in this construction? One possible answer is that bare V+ is truly affixless, whereas the other bare forms are not, and there is a general prohibition against affixless stems. Of course, the most plausible solution is that the perfective auxiliary selects the past participial form of the following verb (either auxiliary+-en or V+-en). But then it would be more economical to have main V enter the derivation with its participial inflection, thereby avoiding the problem created by the dual lexical entry (13). But then we are back to complex stipulations about how main Vs enter a derivation.
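The dual-entry problem lends itself to the same kind of mechanical summary. In the sketch below—my own encoding, offered purely for illustration—the entries in (13) are paired with the two relevant following elements; the outcomes merely restate the reasoning above. The point is that one bad cell is self-filtering via the Stranded Affix Condition, while nothing filters the problematic cell.

```python
# A minimal sketch of the combinations generated by the dual entry (13),
# assuming the outcomes argued for in the text. Purely illustrative.

ENTRIES = ("have", "have+en")            # (13a) and (13b)
FOLLOWERS = ("inflected progressive aux", "bare main V")

def outcome(entry: str, follower: str) -> str:
    if entry == "have+en":
        if follower == "bare main V":
            return "OK: -en hops onto the bare main V"
        # The following element is already inflected, so -en cannot hop.
        return "crash: -en stranded (Stranded Affix Condition)"
    if follower == "inflected progressive aux":
        return "OK: the inflected auxiliary needs no affix"
    # Nothing checks or excludes this combination -- the problem case.
    return "PROBLEM: nothing rules out plain have before a bare main V"

for entry in ENTRIES:
    for follower in FOLLOWERS:
        print(f"{entry:7} + {follower:26} -> {outcome(entry, follower)}")
```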
4 Syntactic Structures reformed

This section explores an alternative to Lasnik's hybrid analysis. In contrast to the hybrid analysis, in which an inflectional element can be introduced into a derivation (as an affix or a set of features) independent of the lexical form it inflects, this alternative is based on the assumption that lexical items enter the derivation fully inflected; hence affixes will not be disassociated from the stems that they attach to at PF. Furthermore, there will be no putative distinction between an inflectional affix and corresponding inflectional features. One immediate consequence of this lexicalist analysis is that there is no basis for positing rules of affix hopping and do-support.

Let us set aside the issue of the form in which main V enters the derivation (which was considered at the end of the previous section) and turn instead to the question of the linear order of verbal elements, which has a direct bearing on it. If the linear order of verbal elements cannot be stipulated via phrase structure rules, then one possibility is that the order follows from lexical properties of the verbal elements themselves. Selection is an obvious candidate (see Freidin 1992 for a detailed analysis).30 Modals and infinitival to select a bare V (auxiliary or main); perfective have selects the past participial form and progressive be, the present participial form. Selection will account for the distribution of affixes without recourse to affix hopping. In this way it also accounts for the order of auxiliaries as they occur. And it may also provide the full account of the order of auxiliaries—that is, why the verbal elements can only occur in the order in which we find them. For example, the order perfective+modal would be prohibited because modals only occur as tensed forms and therefore cannot satisfy the selectional requirement of perfective have.31

That modals and infinitival to both select a bare form demonstrates a common property. Let us assume that this is actually a categorial property: both are instances of T.32 Thus when a modal enters a derivation it is, like infinitival to, merged as T (not V as in Lasnik's analysis). The fact that modals and infinitival to are in complementary distribution lends further support to this analysis. Whether we assume bare phrase structure theory or not, we need to account for this complementarity. Periphrastic do also has the property of selecting the bare form. Moreover, it too is in complementary distribution with modals and infinitival to. Let us also consider it to be an instance of T—perhaps uncontroversially, because in other analyses it always adjoins to the tense inflection T. However, controversially, let us assume that periphrastic do exists in the lexicon in its various finite forms do/does/did33 and is merged fully inflected as T. This creates several problems, some of which I attempt to solve in what follows.

First, there is the question of why, if it exists as an item in the lexicon, periphrastic do does not occur more generally. For example, why isn't (14b) allowed alongside (14a), where both have the same interpretation?34

(14)
a. John left.
b. *John did leave.
One possible answer is that this is a case of blocking, where left blocks did leave. However, it isn't exactly like the standard cases of morphological blocking, where one form in the lexicon blocks the existence or use of another—for example, as went blocks *goed. Rather, this has the flavor of an economy constraint, which for lack of a better term I will call Morphological Economy (henceforth ME). ME requires that the fewest grammatical formatives be used to express a grammatical concept. In this case, ME prefers the one formative left to the two in did leave. Thus, (14b) is only permitted when did receives emphatic stress, thereby distinguishing did leave from left. ME has some further appealing consequences. It accounts for the nonoccurrence of periphrastic do with aspectual auxiliaries.

(15)
a. Mary is singing.
b. *Mary does be singing.
(16)
a. Mary has left.
b. *Mary does have left.
ME works as well for the cases with emphatic stress, where the comparison will be between the stressed auxiliary alone and the stressed periphrastic plus the aspectual auxiliary. Furthermore, it accounts for the nonoccurrence of periphrastic do (stressed or unstressed) with the copula, as in (17). (17)
a. Adam is happy.
b. *Adam does be happy.
Additionally, ME accounts for the nonoccurrence of unstressed periphrastic do in wh-questions where the wh-phrase is in subject position.

(18)
a. Who left?
b. *Who did leave?
If we assume that the derivation of (18b) involves both head movement from T to C and wh-movement to Spec,CP, then while there will be traces in both T and Spec,IP (italicized as in (19)), its PF representation would presumably have did in C adjacent to leave in VP because the traces have been eliminated. (19)
[CP who [C did] [TP who [T [T did] leave]]]
Therefore ME is to be interpreted as a PF economy principle. ME appears also to eliminate one form of Lasnik's (and SS's) problem construction—that is, where the negative is contracted onto the auxiliary; (11b) and (12) are repeated here.
(11b) *This doesn't be a problem.

(12) This isn't a problem.
Compared to the well-formed (12), (11b) uses two formatives (doesn’t be) in contrast to a single formative (isn’t). Although this seems like a natural application of ME, we encounter a potentially serious problem when we try to extend the analysis to the following pair. (20)
a. *This does not be a problem.
b. This is not a problem.
If we allow ME to compare the morphological strings does not be and is not, ruling out (20a) as a violation, then ME should in principle prefer the deviant (21b) to the well-formed (21a), all things being equal.

(21)
a. John did not leave.
b. *John not left.
However, (20) and (21) are not equivalent. We assume that (21b) is deviant for some reason not connected to ME, whereas the only explanation for the deviance of (20a) is ME. Thus, ME compares morphological strings that are otherwise allowed by the grammar. It therefore could not compare (21a) to (21b). Without ME, examples like (21b) demonstrate the appeal of the SS analysis over alternatives. Under the Chomsky/Lasnik analysis, (21b) cannot be generated at all because not will block affix hopping. If (21b) is generated by merging the inflected verb with not as a first step, then some other explanation of its deviance is required. Let us assume that (21b) is a TP and therefore contains a covert T. The uninterpretable φ-features of leave are on the verb itself, so that in order to be checked, they must be moved to T, where spec-head agreement with the subject will eliminate them. The deviance of (21b) could therefore be a result of the failure of these features to be checked. Presumably, the presence of the negative will block covert movement from V to T. Notice that this could not be a consequence of the Head Movement Constraint, taking not to be a head. That is because in (21a) did selects the bare form leave. If not were a head, then this selection would be blocked.
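The restricted construal of ME can be stated compactly: the comparison set contains only strings that the grammar independently generates, and among those ME selects the one with the fewest formatives. The following minimal sketch encodes (20) and (21) in just this way; the well-formedness flags are stipulated inputs standing in for whatever independently rules out (21b)—nothing here derives them.

```python
# A minimal sketch of ME as construed above: it compares only candidates
# that are otherwise allowed by the grammar and prefers fewer formatives.
# The flags below simply record the judgments discussed in the text.

def me_winner(candidates):
    """candidates: list of (string, independently_generable) pairs."""
    viable = [s for s, ok in candidates if ok]
    return min(viable, key=lambda s: len(s.split()))

# (20): both strings are independently generable, so ME adjudicates
# and prefers the string with fewer formatives.
print(me_winner([("does not be", True), ("is not", True)]))       # is not

# (21): "not left" is excluded for independent reasons (its features
# go unchecked), so it never enters the comparison at all.
print(me_winner([("did not leave", True), ("not left", False)]))  # did not leave
```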
Before we turn to the issue of V-to-T movement for main verbs, it will be useful to consider how bare phrase structure theory might affect the analysis of auxiliary verbs. Given that a finite auxiliary verb will always wind up in T position, is there any reason to merge it as a V first and then raise it to T? If not, then on simple economy of derivation considerations there would be no V-to-T raising for finite auxiliaries. As far as I can tell, the raising analysis is just an artifact of the old top-down perspective of phrase structure rules. That is, given that aspectual auxiliaries sometimes co-occur with an independent inflectional element (e.g., a modal), they can occur in a non-T position. Phrase structure rules force this to be the general and underlying case. Building phrase structure bottom up—hence without phrase structure rules—there may be no reason not to merge a finite aspectual auxiliary as T.35

In contrast, there is a reason to merge a main verb as a V and then move it to T if that is allowed—namely, V constructs with its (internal) arguments to form a predicate in which θ-roles are assigned. If this perspective is correct, then the only difference between languages like English and French is that main V can raise to T in the latter but not the former. It is not the case that the process of V-raising is more general in French than in English, because there is no raising of finite auxiliaries in either language.

This analysis suggests that predicate adjectives and nominals in finite sentences are just that—that is, an AP or NP complement of T, not a complement of a V. Thus, (22) would have the structural analysis (23).

(22)
Mary is a doctor.
(23) [TP Mary [T [T is] [NP a doctor]]]
As noted by Len Babby (p.c.), this structure would explain why predicate nominals and adjectives do not occur in the structural accusative Case in languages with rich overt morphological Case. On the potentially questionable side, it may require that the corresponding infinitival forms of predicate NPs and APs have a different structure from their finite forms. Consider, for example, (24). (24)
a. It is unlikely for Mary to be content.
b. It is unlikely that Mary is content.
In (24a), given that T is filled by to, be will have to be the head of an independent projection, in contrast to (24b), where the finite form is is merged as T. Whether this lack of parallelism is a problem remains to be determined.36

So far, the basic assumption of this alternative lexicalist account—that all verbal elements come from the lexicon fully inflected—leads to no serious empirical problems or obvious conceptual defects. We have added to it the further assumption that the "tensed" verbal element, with the exception of main verbs, enters the derivation as T.
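Before returning to the residual problems, the selectional account just summarized can be made concrete with a toy fragment. Everything in the sketch—the form labels, the tables, the particular items—is my own encoding of the selectional statements above; it is meant only to show how selection, together with the tensed-element-as-T assumption, derives the attested orders and excludes, for example, perfective+modal.

```python
# A minimal sketch of selection-driven ordering, under the assumptions in
# the text: tokens are (stem, form) pairs, and SELECTS records what
# grammatical form each selecting element requires of the next element.

SELECTS = {
    "will": "bare",       # modals (merged as T) select a bare V
    "to":   "bare",       # infinitival to likewise
    "have": "past-part",  # perfective have selects the past participle
    "be":   "pres-part",  # progressive be selects the present participle
}

# Modals exist only as tensed forms; the other items have more forms.
FORMS = {
    "will":  {"tensed"},
    "have":  {"tensed", "bare", "past-part"},
    "be":    {"tensed", "bare", "past-part", "pres-part"},
    "leave": {"tensed", "bare", "past-part", "pres-part"},
}

def licensed(seq):
    """Check that each form exists and each selection is satisfied."""
    for stem, form in seq:
        if form not in FORMS[stem]:
            return False                      # no such form in the lexicon
    for (stem, _), (_, next_form) in zip(seq, seq[1:]):
        if stem in SELECTS and SELECTS[stem] != next_form:
            return False                      # selection not satisfied
    return True

# will have been leaving: modal > perfective > progressive > main V
print(licensed([("will", "tensed"), ("have", "bare"),
                ("be", "past-part"), ("leave", "pres-part")]))    # True

# *has will leave: have selects a past participle, and modals have none
print(licensed([("have", "tensed"), ("will", "past-part"),
                ("leave", "bare")]))                              # False
```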
Under this analysis, the order of verbal elements is determined by natural selectional properties of the verbal elements. Furthermore, the distribution of periphrastic do is governed by a principle of morphological economy. Of the three empirical issues for the analysis of the verbal system (6a–c), what remains is an account of the one true case of V-raising, the movement of a main verb to T and the parametric variation associated with it. Given that the uninterpretable φ-features on English finite main verbs must be checked before the LF interface, one plausible way to eliminate them is via covert raising of V to T (e.g., Chomsky 1993). English and French finite clauses would then have the same structure at LF. The difference between the two languages is limited to the movement of finite main verbs to T: overt in French and covert in English. From the perspective of bare phrase structure theory as developed above, it is a toss-up which one is the special case. Looking at this difference in terms of the contrasting metaphors of "procrastinate" versus "earliness" does not add anything useful. We would still have to account for why the other language is able to escape the general strategy. Thus, it seems that we have not progressed much beyond the stipulation in SS via transformational rules that English finite main verbs cannot occur to the left of either negation or a subject NP. A principled account of the difference in PF positioning between finite auxiliary and main verbs in English remains a desideratum.

Nonetheless, we can sharpen our focus on the residual problems a bit more by scrutinizing an implication of the analysis based on ME. Given that ME operates on the PF side of the derivation, whatever rules out (21b) should also apply on the PF side. Otherwise, the application of ME would require that information from the LF side be available to the PF side. Let us continue to maintain what has been more or less tacitly assumed since Chomsky and Lasnik 1977 first proposed the derivational split for PF and LF—namely, that when the derivation splits into two separate tracks, neither one has access to information from the other. Given this, the explanation for the deviance of (21b) cannot be that negation blocks covert V-to-I raising, because ME at PF would not have access to that information.37 Instead, the deviance of (21b) must follow from some problem on the PF side.

There are prima facie two ways to account for the deviance of (21b) on the PF side: either the derivation is blocked by some condition on the application of rules, or the structure that is derived violates some general output condition (filter) on representations at PF. To determine which of these options is correct (possibly both), we need a comparison case whose derivation yields a well-formed result. Therefore, a prerequisite to an account of (21b) is an understanding of the derivation of the well-formed (25).

(25)
John left.
Given that the clause (25) is structurally a TP, the derivation of such clauses, where T is covert because V must not raise to T overtly, raises several fundamental questions. Perhaps the most fundamental question for this analysis concerns the origin of T. If the V left comes from the lexicon fully inflected, then what could T be? Minimally it will at some point in the derivation contain the tense and agreement features of V so that the agreement relation with the subject in Spec,IP can be checked. However, there is no need to postulate in the lexicon the independent existence of a T with tense and agreement features, because this element duplicates information already specified in the lexical
entries of the various verbal elements when they enter the derivation. In the case of finite auxiliaries, the TP projection is created by merging the finite form with its VP complement. The question, then, is what happens when the finite form is the main verb. Given that the verb is prevented from moving to the T position—that is, from projecting T (unlike auxiliary elements)—but nonetheless T is required and must project to form a clause structure, a minimal solution would be to extract just the tense feature of the verb to merge with the maximal projection of the verb while projecting T.38 Thus, the structure of (25) would be (26), where left also contains the same set of features.

(26) [TP John [T [T [+past]] [VP left[+past]]]]
Note that we are assuming that the φ-features of left must still be moved to T covertly at LF to be checked. Given the representation (26) for (25), the corresponding representation for (21b) would be (27).

(27) [TP John [T [T [+past]] [not [VP left[+past]]]]]
Both (26) and (27) contain a chain, the tense feature [+past] in both V and T. Given that a chain is always an illegitimate object at PF (and in morphological structure, MS), we must assume that the chain in (26) is eliminated at Spell-Out. Presumably, the head of the chain, the tense feature in T, remains, whereas its trace in V is deleted. At MS/PF, where
the hierarchical structure has been eliminated as well, the feature set will be adjacent to the V that now lacks this feature. It is plausible that MS simply reads the adjacent tense feature as part of V. In contrast, the elimination of the chain in (27) creates a representation at MS/PF where the tense feature that belongs with the V is separated by a phonetically realized morphological unit, not. The presence of not blocks the interpretation of [+past] as part of V. The fact that [+past] in this configuration cannot be interpreted as part of an adjacent lexical item renders the representation deviant—presumably a violation of Full Interpretation. This analysis is analogous to Lasnik's Stranded Affix Filter analysis but not identical, because there is no need to identify the tense feature as an affix or postulate such a filter. Alternatively, it could be that V, which now lacks tense, constitutes the deviant element.
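The adjacency reading at MS/PF can be rendered as a toy procedure over flattened strings. In the sketch below—my own flat-list encoding, offered only for illustration—chain reduction has already applied (only the feature in T survives), and MS attempts to read the tense feature into an adjacent verb.

```python
# A minimal sketch of the MS/PF step described above: after chain
# reduction the covert T is just a tense feature, and MS reads it as part
# of an adjacent V unless a phonetically realized formative intervenes.

def ms_interpret(formatives):
    """formatives: left-to-right list, with covert T written as '[+past]'."""
    output = []
    for i, item in enumerate(formatives):
        if item == "[+past]":
            following = formatives[i + 1] if i + 1 < len(formatives) else None
            if following == "not":
                # [+past] cannot be interpreted as part of an adjacent
                # lexical item: a Full Interpretation violation.
                return "*deviant: stranded tense feature"
            continue  # the adjacent V absorbs the feature silently
        output.append(item)
    return " ".join(output)

print(ms_interpret(["John", "[+past]", "left"]))         # John left
print(ms_interpret(["John", "[+past]", "not", "left"]))  # *deviant: ...
```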
The alternative analysis presented here is obviously a radical departure from that of SS and its successors.39 Some of this analysis is motivated by relatively recent developments in syntactic theory (e.g., bare phrase structure theory) and the rest is informed by analytic concepts developed over the past almost four decades that have deepened our understanding of the structure of human language. It is offered as another way to think about the verbal morphology system,40 following Feynman's observation that having multiple theories to compare gives us a better chance at guessing at the truth. The case of the verbal morphology system remains central to a theory of syntax because it is primarily through the analysis of this system that we derive an explicit theory of the structure of the clause.

Notes

* I would like to thank Howard Lasnik, Carlos Otero, Ian Roberts, and Jean-Roger Vergnaud for useful discussion of the material covered here, and also Tim Stowell and two anonymous reviewers for helpful comments.

1 SSR started out as a transcript of Lasnik's course lectures, transcribed and initially edited by Marcela Depiante and further edited by Arthur Stepanov, both University of Connecticut graduate students. Therefore, some credit for the readability of the book belongs to them as well.

2 The question of how these categories are related leads directly to a notion of phrasal projection, though it obviously was not an obvious question at the time.

3 The pluses in Chomsky's phrase structure rule (28iii) are interpreted differently than those in the structural changes of certain transformations, where + indicates adjunction. Thus have and -en, for example, are not technically associated by adjunction. For further comment, see note 7.

4 Recall that via the SS Number Transformation, C becomes present tense singular -s in the context of a singular subject NP, "Ø in other contexts" (i.e., plural subject NP), or "past in any context"—in effect, a single affix representing tense and agreement. Thus C in (1.10) is essentially a placeholder. It is a curious rule, acting more like a context-sensitive phrase structure rule than a transformation. Notice also that the rule does not replace C with these morphemes because C must be available to satisfy the structural analysis (SA) of the rule that forms yes/no-questions, which applies after the Number Transformation.

5 There is another rule in SS that introduces word boundaries—namely, the obligatory Word Boundary Transformation. Lasnik points out that the SA of this rule contains two conditions that utilize quantificational negation. Given the argument against using quantifiers in the SAs of transformations (see Chomsky 1965), the rule is not viable.
6 Ross’s critique concerns the same implicit disjunctive term in Tq, Tnot and Tso—implicit because SS does not explicitly express this disjunction (cf. (3) and (4)). Ross does not discuss affix hopping, where auxiliaries and main verbs manifest the same behavior. In the rules he discusses, the main verb V has to be excluded from the Boolean disjunction. Thus the focus of the initial argument of his paper, the common SA of these three transformations, would seem to argue for distinguishing auxiliaries from main verbs. 7 This interpretation of the notion of structure dependence occurs in Chomsky 1965 (p. 55): “Specifically, grammatical transformations are necessarily ‘structure-dependent’ in that they manipulate substrings only in terms of their assignments to categories.” It differs significantly from Chomsky’s 1975 discussion, which concerns the proper formulation for a rule that derives yes/no-questions (e.g., (ib)) from underlying syntactic forms (in this case, (ia)). (i) a. Bill is sleeping b. Is Bill sleeping?
As we know, a rule that says simply "invert the first two words" is hopelessly inadequate, whereas a rule that says "move the finite auxiliary that is adjacent to the subject NP to the front of the clause" is not obviously incorrect. The second rule depends on an analysis of subject NP, which involves a notion of hierarchical structure. To the extent that hierarchical structure (hence constituency) is involved in the formulation of the rule, that rule will be structure dependent. Given that the notion constituent entails a notion of hierarchical structure, it might seem that Lasnik's formulation is virtually equivalent to Chomsky's. It isn't, because they have different empirical consequences. Thus consider the kind of paradigm examples Chomsky used to demonstrate the difference between a structure-dependent formulation of Tq and one that was not structure dependent.
(ii) a. Can the man who is coming to dinner swim?
     b. *Is the man who coming to dinner can swim?
The rule that identifies the subject NP of the main clause as the left context of the auxiliary that moves will not misgenerate (iib), whereas the rule that simply says move a finite auxiliary to the front of the sentence will. Notice that Lasnik's formulation of structure dependence does not by itself rule out (iib), given that the auxiliary that moves constitutes a single constituent (unlike the formulation in SS). Now if we take the nongeneration of (iib) as a criterion for structure dependence, then the notion "structure dependence" concerns the hierarchical analysis of the context in which a transformation applies and not merely the constituent analysis of the element affected. If so, then the Move α formulation of transformations (and subsequent formulations) will not be structure dependent in the relevant sense simply because they specify no analysis of the context in which they apply. Thus the structure dependence property discussed by Chomsky (1971b, 1975c), which distinguishes (iia) from the deviant (iib), must now be captured by general conditions on rules or representations. (See Freidin 1991 for detailed discussion.)
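The contrast can be simulated directly. In the sketch below—entirely my own toy encoding, with the segmentation of the subject NP stipulated rather than computed—the linear rule fronts the first auxiliary in the word string, while the structure-dependent rule fronts the auxiliary adjacent to the bracketed subject NP; only the latter avoids (iib).

```python
# A minimal sketch of the structure-dependence contrast in (ii).
# The toy segmentation of the subject NP is stipulated, not computed.

AUX = {"is", "can"}

def linear_rule(words):
    """Front the first auxiliary in the string -- derives the deviant (iib)."""
    i = next(k for k, w in enumerate(words) if w in AUX)
    return [words[i]] + words[:i] + words[i + 1:]

def structure_dependent_rule(subject_np, predicate):
    """Front the auxiliary adjacent to the subject NP -- derives (iia)."""
    return [predicate[0]] + subject_np + predicate[1:]

subject = ["the", "man", "who", "is", "coming", "to", "dinner"]
predicate = ["can", "swim"]

print(" ".join(linear_rule(subject + predicate)) + "?")
# -> is the man who coming to dinner can swim?   (= (iib))
print(" ".join(structure_dependent_rule(subject, predicate)) + "?")
# -> can the man who is coming to dinner swim?   (= (iia))
```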
Note that these conditions invariably concern the hierarchical structure of the context to which the transformations apply. In this way, the grammar operates in a structure-dependent manner, even if the transformational component itself contains no structure-dependent formulations of rules.

Note that, as a further consequence of Lasnik's interpretation, aspectual auxiliaries represented in the SS phrase structure rule would have to be analyzed as two separate constituents because the affix is moved by a transformation. If so, then the aspectual stem and its associated affix should presumably form a single constituent in underlying structure—in which case aspectual auxiliaries would require a label (which they do not have in the SS analysis).

8 Lasnik lists the further option of a single terminal, assuming that C is a terminal symbol. Technically it could be, because the phrase structure rules do not rewrite it as some other terminal symbol. However, the Number Transformation in SS turns it into another symbol (S, Ø, or past). Given that the Number Transformation must apply before Tq and the formulation of Tq requires that C remain as a symbol of the derived phrase marker, the only way this could happen is if the Number Transformation creates a new hierarchical relation between C and these morphological elements. Therefore any instance of C in Tq can only be a nonterminal symbol, because the Number Transformation must have applied to change it from a terminal to a nonterminal symbol.

9 If affix hopping applied prior to the application of Tq, then at that point in the derivation the nonconstituent strings that Tq moves would in fact form a single constituent. However, in the case where there is only a main verb, Tq would never apply under that ordering. One solution would be to assume that the two rules can apply in either order. If affix hopping creates an adjunction structure, then Tq could move a single constituent consisting of an auxiliary and its affix. The A/A Principle would prevent the inversion of the auxiliary without its affix, as Lasnik points out. When Aux consists of only C, then Tq could prepose only C. In this way, Tq would affect only single constituents.

10 As Lasnik spells out, a rule that mentions three constant terms has four positions in which a variable may or may not occur, yielding 2⁴ = 16 possible formulations of the SA. Note further that the SS system places no explicit limits on the number of constant terms mentioned in the SA (henceforth the constant term problem), which also increases the descriptive power of the system.

11 It is worth noting in this regard that Ross's 1967 MIT dissertation articulates several general constraints on wh-movement but never provides a formulation of a wh-movement rule. Presumably the formulation of such a rule would be so simple as to be superfluous—if it could be formulated at all within the descriptive practice at the time. Notice that Chomsky's first conditions paper (1973, first draft 1970) can be read in part as an attempt to generalize the conditions approach to (non-wh) NP movement.

12 Noncompounding of elementary operations follows from the definition of transformations given in Lasnik and Kupin 1977: ex. (19) and is stated in Chomsky 1980. This immediately rules out the SS formulation of the passive transformation, another transformation in SS that plays a central role in the analysis of the verbal morphology system. For a detailed discussion of the evolution of the passive transformation, see Freidin 1994b.
13 In his discussion of the formalization of SCs (sect. 3.2.2) Lasnik lists the following four elementary operations:
i. adjunction of term A to term B
ii. deletion (of a term or sequence of terms)
iii. adjunction of an element not in the phrase marker to an element that is—basically an insertion operation
iv. permutation ("changing the order of two items without attaching either one to the other")

Affix hopping is an example of (i); do-support and Tnot are examples of (iii). Tq is presumably an example of (iv). In Lasnik's illustration of the effect of Tq (p. 76) the subject NP does not move; just the sequence of C+auxiliary is displaced. Yet as we know, changing the order of the subject and the auxiliary can also be done via adjunction. So, on simple economy grounds the need for a separate elementary operation of permutation seems suspect. Note that SS does not postulate an elementary operation of substitution. Substitution first appears in Chomsky 1965 and seems to have been discarded with the advent of Bare Phrase Structure Theory. But see Freidin 1999a for an analysis of head movement that utilizes a substitution operation.

14 If this intrinsic variable analyzes the null string, then the two terms will be adjacent. Even though we apparently cannot stipulate adjacency of transformational terms, we cannot stipulate the converse either.

15 Within the SS system of rules the intrinsic orderings (e.g., Number Transformation > Tq) involve at least one obligatory rule. If we invert the ordering of the pair, then the resulting derivation bleeds the application of the obligatory rule. In this case, the extrinsic ordering problem reduces to the obligatory rule problem. See Lasnik's landmark 1981 paper on the analysis of auxiliaries, where this approach to the verbal system was first developed.

16 In the case of Tso, V is deleted under the operation that replaces VP with so and moves so to the front of the conjoined clause.

17 Lasnik does not discuss derivations in terms of Merge, nor does he mention Bare Phrase Structure Theory (BPST). I will assume for the time being that his analysis is compatible. BPST developed as part of the Minimalist Program of the early 1990s. However, the view that there are no phrase structure rules in grammar was first articulated in the late 1970s (e.g., Chomsky, class lecture, September 1979). From 1979 to 1994, there was no explicit device in grammar for generating phrase structure representations, though it was generally assumed that X-bar theory somehow accounted for phrase structure. The methodological imperative for eliminating redundancies in grammatical description motivated the elimination of PS rules, because they stipulated information that followed independently from general principles of grammar in conjunction with specific lexical properties that had to be specified in the lexicon. See Stowell 1981 for an extensive discussion of this approach. BPST replaces the PSR mechanism with the transformational operation Merge—essentially the adjunction elementary operation, which concatenates two syntactic elements.
In this context it is worth noting what Chomsky has to say in SS about the relation between phrase structure and transformations: "I think it is fair to say that a significant number of the basic criteria for determining constituent structure are actually transformational" (p. 83). From our current perspective, these basic criteria are totally transformational.

18 There are several excellent student questions interspersed throughout the lectures, to which Lasnik fully responds.

19 Lasnik goes on to criticize Pollock's account for making an incorrect prediction—namely, that auxiliary verbs can always raise in any type of sentence, even when T is morphologically impoverished. He cites the following examples:
(i) a. Do not be foolish!
    b. *Be not foolish!
Although it is not obvious that be in this construction is an auxiliary, it seems reasonable to assume that it has the auxiliary property of not assigning θ-roles. Thus the argument against Pollock’s account holds. He also claims that Pollock’s account incorrectly predicts that auxiliary verbs can raise in English infinitival constructions, citing (ii).
(ii) a. I believe John not to be foolish.
     b. I believe John to not be foolish.
     c. *I believe John to be not foolish.
If the infinitival marker to occupies T, as is usually assumed, and it is not an affix that requires a stem to adjoin to it, then be would have no head position to raise to in (iic). If so, then (iic) does not constitute a problem for Pollock's account.

20 Lasnik's discussion in section 3.8 is presented in terms of a unitary Infl category, rather than the split-Infl analysis of Pollock 1989 and Chomsky 1991, which he develops in the preceding sections (3.5–3.7). The point of Lasnik's analysis is clear, although how this analysis applies to the split-Infl architecture is not obvious. See note 32 for further discussion.

21 Under a theory where features can be checked long distance, this account does not explain why V has to raise to I. Let's put this aside, because obviously Lasnik is assuming that there is no long-distance feature checking.

22 Lasnik interprets "nonfeatural" as "uninflected." English main Vs enter the derivation bare (i.e., without a verbal affix).

23 The same situation arises in negative imperatives (e.g., Don't be foolish! vs. *Be not foolish!). Lasnik assumes that imperatives are like indicatives with just I + main V, where I is affixal and V is bare.
There is one potential problem for the adjacency analysis—namely, adverbs that occur between Spec,IP and V, as in (i).

(i) John quickly left the room.

If the adverb occurs between I and V, then we expect affix hopping to be blocked and do-support to apply, yielding the deviant (ii), where periphrastic do is unstressed.
(ii) *John did quickly leave the room.

However, (i)–(ii) would follow if the adverb actually occurred in a position to the left of I. Lasnik (2003) cites the following example to support this analysis.

(iii) John partially lost his mind, and Bill completely did.
As Lasnik notes, “what makes this possible is, at present, mysterious” (p. 16). I myself find (iii) distinctly marginal and (iv) simply deviant.
(iv) *John partially lost his mind, and so did Bill completely.

In (iv) the VP is represented by an anaphor so, which presumably moves to Spec,CP. The periphrastic did occupies C via head movement from I to C. Note that grammatical sentences of the form in (iv) (e.g., (v), where the adverb completely has been deleted) provide evidence that all IPs are actually CPs with covert C, under the assumption that coordination can only apply to phrases with the same structure. See the discussion of coordination in SS.

(v) John partially lost his mind, and so did Bill.

Note further that the VP anaphor so in (v) is interpreted as partially lost his mind—that is, with the adverb as part of VP.

24 Lasnik gives a detailed critique of the HMC analysis of the verbal morphology system in Chomsky's 1991 economy analysis. Note that his discussion concerns using the HMC to block raising, not affix hopping.

25 Such an analysis would, of course, generalize to (11b) and (11c).

26 Chomsky avoids the problem by claiming: "We have not shown this, but it is in fact true that in the simplest phrase structure grammar of English there is never any reason for incorporating "be" into the class of verbs; i.e., it will not follow from this grammar that be is a V" (p. 67). Instead, Chomsky proposes that be+predicate is one of the forms of VP. In Chomsky 1965 the phrase structure rule for VP on page 107 distinguishes between the categories V and Copula, thus apparently eliminating the problem. Given X-bar theory, the phrasal projection of the copula would be categorially distinct from the phrase projected by a verb, hence never a VP. Whether an analysis that categorially distinguishes the copula from the class of verbs creates empirical problems is not obvious.

27 Notice that this analysis bears on the interpretation of what has been called the thesis of the autonomy of formal grammar (a.k.a. "the autonomy of syntax" [see Chomsky 1976:166]). The original autonomy thesis presented in chapter 9 of SS hypothesized that the rules of formal grammar could be formulated without recourse to semantic notions (e.g., synonymy, contradiction, and reference). Various misinterpretations of what this said about the syntax/semantics interface resulted (see Chomsky 1975b for discussion). For SS, formal grammar must include phonology, since the transformations postulated make specific reference to phonological forms. Under current assumptions about the formulation of rules, this no longer seems possible.

28 In SS (pp. 66ff.) Chomsky discusses the ambiguous transformational analysis as a virtue of his system of rules. He cites the pair in (i), his example (54), claiming that both are well formed.

(i)
a. Does John have a chance to live?
b. Has John a chance to live?
In (ia) have is analyzed by (3) as main V, hence only C would be inverted with the subject NP, thereby triggering do-support. In (ib), the main verb have would be analyzed in terms of its phonological shape, hence like auxiliary have,
and therefore it would be preposed along with C to clause-initial position. There is, however, an alternative analysis for (ib), where it could be taken as a truncated version of (ii).

(ii)
Has John got a chance to live?
If this is correct, then has in (ib) is actually an auxiliary, not a main verb that has been fronted because of phonological identity with the perfective auxiliary. Chomsky does not comment on the problem this analysis raises in the case of be, presumably because he proposes that be can never be analyzed as V (see n. 26).

29 Such an analysis is explored in Freidin 1992: chap. 4.

30 Lasnik discusses X-bar theory and selection at the beginning of the third chapter, and on pages 140 and 141 there are tree diagrams showing the progressive -ing affix as the head of its own phrase, where progressive be selects the head -ing, which in turn selects an uninflected main verb (sing). Thus under Lasnik's analysis at this point, the affixes -ing and -en both select an uninflected V. In the case of -en the V could be either the progressive auxiliary be or a main verb, whereas in the case of -ing, V can only be the main verb. Under Lasnik's hybrid analysis, aspectual auxiliaries enter the derivation fully inflected whereas main verbs enter the derivation bare. Therefore, these selectional relations will occur in certain cases—for example, aspectual have will select -en when the following verbal element is the main verb but will select progressive been when the following element is the aspectual auxiliary. As noted in the previous section, this constitutes an undesirable complication.

31 Granted that this itself is a kind of stipulation; however, the overall analysis provides a rationale for the order of verbal elements in a way that a phrase structure rule, which merely stipulates the order, does not. Given bare phrase structure theory, there is no option to merely stipulate the order of verbal elements. Thus the deviant orderings of verbal elements via Merge become a problem to solve, one way or another.

32 The remainder of the discussion will assume that T is the functional head of a clause, taking a subject as its specifier and a predicate as its complement. One reason for doing this concerns the syntactic structure of infinitivals. The infinitival to is clearly an instance of T and furthermore unrelated to any type of agreement. Given that verbal agreement features are uninterpretable at LF, it would appear to be a mistake to posit agreement projections for LF structures. If so, then neither the unitary Infl nor the split-Infl analyses are on the right track. Tense, in contrast, is obviously relevant to LF interpretation.

33 Note that there is also the imperative form of do, as in (i).

(i)
Do be careful!
Not surprisingly, imperative do also selects the bare form of V. Given its properties, it is plausible to analyze imperative do as another instance of nonfinite T. When do does not occur in an imperative, as in (ii), we might assume that T contains a phonetically null imperative element.

(ii)
Be careful!
As Lasnik notes, the bare form be cannot raise from V to I (i.e., T) when the negative is present, as illustrated in (iiia).
(iii) a. *Be not careless!
      b. Do not be careless!
Lasnik accounts for this by postulating that the imperative element in I (i.e., T) is only affixal and therefore must either attach to V or be supported by do. Under this analysis, (i) would be the result of some grammatical formative separating the imperative affix from V, thereby preventing affix hopping and triggering do-support. The obvious candidate would be the SS emphasis marker; however, it is not clear that the difference between (i) and (ii) is one of emphasis. Nonetheless, there does seem to be a difference between the two, although it is not easy to articulate.

Assuming that periphrastic do is no different from other lexical items—that is, that it exists as an independent entry in the lexicon—has a certain plausibility from the perspective of language acquisition. No special assumptions are necessary regarding its acquisition. In contrast, under the do-support analysis, it must be assumed that the child acquiring English somehow knows that periphrastic do must not be acquired like other lexical items and instead must be associated only with a language-specific transformation. Under the alternative, the child must figure out the distribution of this particular lexical item, just as she must figure out the distribution of other lexical items.

34 Even if we now stress left in (14a), the verb becomes contrastive with some other verb. It does not express the same concept as did leave, where did receives emphatic stress. This analysis is further supported by some facts from dialect variation. In the southwest of England (in Somerset and Dorset) we find constructions where no verbal element is stressed. (The following data are from Trudgill 1999:102–103.)

(i) a. I sees.
    b. I do see.
(ii) a. I seen.
     b. I did see.
(iii) a. I sees the dentist tomorrow.
      b. I seen the dentist last week.
(iv) a. I do see the doctor every day.
     b. I did see the doctor every day.
Trudgill comments that periphrastic do is used to express "actions or events that are repeated or habitual" (p. 103), in contrast with single events or actions. The same distinction occurs in the history of English syntax (Ian Roberts, p.c.). For further discussion of periphrastic do in the history of English, see Roberts 1992.

35 This analysis suggests that all verbal auxiliaries are functional rather than lexical categories. Their functional nature seems to derive from the fact that they select a head with a particular grammatical form, as suggested by Jean-Roger Vergnaud (p.c.). Thus modals select bare verbal heads, perfective have selects the past participle form, and so forth. In contrast, the
selectional properties of lexical categories are determined by the θ-roles they assign. Thus a verb of motion like give cannot take a CP complement because a CP cannot be interpreted as a goal or a theme. There is no reason to state redundantly that give selects N rather than C. See Freidin 1983.

36 Note that the lack of parallel behavior of verbal elements in finite and infinitival clauses also occurs in French. In finite clauses all verbal elements can (and presumably must) occur in T. Not so in infinitivals, as Pollock (1989) describes. Presumably infinitival auxiliaries plus copular être 'to be' and the main verb avoir 'to have' need not occur in T to the left of the negative element pas. The following French examples are from Pollock 1989: sect. 2.2.

(i) a. Ne pas être heureux est une condition pour écrire des romans.
       not to-be happy is a prerequisite for to-write novels
    b. Ne pas avoir eu d'enfance heureuse est une condition pour écrire des romans.
       not to-have had a childhood happy is a prerequisite for to-write novels
    c. Ne pas avoir de voiture en banlieue rend la vie difficile.
       not to-have a car in the suburbs makes life difficult
However, it is possible for the infinitive to occur in T, which by hypothesis is the first head position to the left of the negative. Pollock cites the following corresponding examples, where the infinitive form occurs to the left of the negative pas.

(ii) a. N'être pas heureux est une condition pour écrire des romans.
        to-be not happy is a prerequisite for to-write novels
     b. N'avoir pas eu d'enfance heureuse est une condition pour écrire des romans.
        to-have not had a childhood happy is a prerequisite for to-write novels
     c. N'avoir pas de voiture en banlieue rend la vie difficile.
        to-have not a car in the suburbs makes life difficult
Pollock comments that these examples "are usually considered somewhat literary and recherché…but are perfectly fine" (pp. 373–374). The comment suggests that these constructions are fine for some but maybe not for others. In contrast, the construction illustrated in (i), where the infinitive does not raise to T, requires no qualification. Given (ii) along with (i), movement of the infinitive to T appears to be optional. If so, then the movement to T does not seem to be driven by any particular grammatical requirement—which is problematic under the current theoretical assumption that all movement is motivated by the need to satisfy such requirements. Whereas such optional movement is possible for copular être, the perfective auxiliary, and the homophonous main verb avoir, the movement of all other infinitival main verbs to T is prohibited.
(iii) a. Ne pas posséder de voiture en banlieue rend la vie difficile.
         not to-own a car in the suburbs makes life difficult
      b. *Ne posséder pas de voiture en banlieue rend la vie difficile.
         to-own not a car in the suburbs makes life difficult
The contrast between (iic) and (iiib) is striking, given that the two sentences are virtually synonymous. Another possible exception is the class of "modal-like verbs" (vouloir, devoir, and pouvoir), which only marginally allow the raised alternative.

(iv)
?Je pensais ne pouvoir pas dormir dans cette chambre.
I thought to-'can' not to-sleep in this room
'I thought I would not be able to sleep in this room.'
Again, Pollock notes that such examples "have a very literary ring to them" (p. 375).

37 Note further that the claim that not blocks covert V-to-T movement was basically unmotivated.

38 As proposed in Freidin 1999—q.v. for further discussion. Notice that this looks like affix hopping in reverse, though crucially the assumption is that a feature is moving, not an affix. This operation is motivated by a fundamental structural requirement, rather than a morphological requirement. Furthermore, the analysis is consistent with the Inclusiveness Condition (see Chomsky 1995d:228). Note also that this proposal does not violate Chomsky's prohibition against "self-attachment" (cf. Chomsky 1995b:72). That prohibition rules out the case where α is extracted from K and adjoined to K, projecting L—crucially when K is a projection of α and L is also a projection of α. In the analysis proposed here, K is a projection of V and L is a projection of tense; therefore, the operation creating TP does not meet the criteria for "self-attachment."

39 Radical because it eliminates the transformational machinery of SS's and Lasnik's analyses, while nonetheless preserving the fundamental insight in SS, reiterated in SSR, that the deviance of constructions like (21b) concerns a failure of adjacency between an abstract inflectional element and the main verb.

40 In this regard see also Sprouse 2003.
References

Akmajian, A. (1975). More evidence for an NP Cycle. Linguistic Inquiry 6:115–129.
Anderson, M. (1979). Noun phrase structure, unpublished PhD dissertation, University of Connecticut.
Andrews, A. (1976). The VP complement analysis in modern Icelandic. Proceedings of the Sixth Annual Meeting, NELS, GLSA, University of Massachusetts, Amherst.
Aoun, J. (1979). On government, case-marking, and clitic placement, unpublished manuscript.
Aoun, J. and D.Sportiche (1983). On the formal theory of government. The Linguistic Review 2:211–236.
Aoun, J., N.Hornstein, and D.Sportiche (1980). Some aspects of wide scope quantification. Journal of Linguistic Research 1:69–95.
Atkinson, M. (1996). The minimalist program. In K.Brown and J.Miller (eds) Concise Encyclopedia of Syntactic Theories, Edinburgh: Pergamon Press.
Babby, L. (1980a). Existential Sentences and Negation in Russian, Ann Arbor, MI: Karoma Publishers.
Babby, L. (1980b). The syntax of surface case marking. In Cornell Working Papers in Linguistics 1. Department of Modern Languages and Linguistics, Cornell University.
Babby, L. (1987). Case, prequantifiers, and discontinuous agreement in Russian. Natural Language and Linguistic Theory 5:91–138.
Babby, L. (1990). Noncanonical configurational case strategies. In Cornell Working Papers in Linguistics 10. Department of Modern Languages and Linguistics, Cornell University.
Babby, L. (1994). A theta-theoretic analysis of adversity impersonal sentences in Russian. In Sergey Avrutin et al. (eds) FASL 2: The MIT Meeting, Ann Arbor, MI: Michigan Slavic Publications, pp. 25–67.
Babby, L. (1998). Voice and diathesis in Slavic. Position paper presented at the Workshop on Comparative Slavic Morphosyntax, Spencer, Indiana. Available from: http://www.indiana.edu/slavconf/linguistics/babby.pdf.
Babby, L. (2000). The genitive of negation: a unified analysis. In Steven Franks et al. (eds) FASL 9: The Bloomington Meeting, Ann Arbor, MI: Slavica Publications, 39–55.
Babyonyshev, M. (1996). Structural connections in syntax and processing: studies in Russian and Japanese, MIT doctoral dissertation.
Bailyn, J. (1998). The status of optionality in analyses of Slavic syntax. Paper presented at FASL 7: The Seattle Meeting, Seattle, Washington.
Bailyn, J. (2004). Generalized inversion. Natural Language and Linguistic Theory 22:1–49.
Baltin, M. (1978). PP as a bounding node. In M.J.Stein (ed.) Papers from the Eighth Annual Meeting of the North Eastern Linguistic Society, University of Massachusetts at Amherst, MA.
Barsky, R. (1997). Noam Chomsky: A Life of Dissent, Cambridge, MA: MIT Press.
Bartsch, R. and T.Vennemann (1972). Relative adjectives and comparison, unpublished manuscript, UCLA.
Belletti, A. (1988). The case of unaccusatives. Linguistic Inquiry 19:1–35.
Belletti, A., L.Brandi, and L.Rizzi (eds) (1981). Theory of Markedness in Generative Grammar, Pisa: Scuola Normale Superiore.
Billings, L. and J.Maling (1995). Accusative-assigning participial -no/-to constructions in Ukrainian, Polish, and neighboring languages: an annotated bibliography. Journal of Slavic Linguistics 3(1): 177–217; (2): 396–430.
Bobaljik, J. and S.Brown (1997). Interarboreal operations: head movement and the extension requirement. Linguistic Inquiry 28:345–356.
Boeckx, C. (2000a). EPP. Manuscript, University of Connecticut.
Boeckx, C. (2000b). Quirky agreement. Studia Linguistica 54(3): 354–380.
Boehner, P. (1958). The realistic conception of William Ockham. In E.M.Buytaert (ed.) Collected Articles on Ockham, St Bonaventure, NY: Franciscan Institute.
Bohr, N. (1913). On the constitution of atoms and molecules. Philosophical Magazine 26:1–25, 476–502, 857–875.
Bohr, N., H.A.Kramers, and J.E.Slater (1924). The quantum theory of radiation. Philosophical Magazine 47:785–802. [A German version appeared in Zeitschrift für Physik 24:69–87 (1924).]
Borer, H. (1979). Empty subjects in modern Hebrew and constraints on thematic relations. In J.Jensen (ed.) Proceedings of the Tenth Annual Meeting of the North Eastern Linguistic Society, University of Ottawa, Ottawa, 25–37.
Boskovic, Z. and D.Takahashi (1998). Scrambling and last resort. Linguistic Inquiry 29:347–366.
Bouchard, D. (1982). On the content of empty categories, MIT doctoral dissertation.
Bowers, J. (1968). Adjectives and adverbs in English, unpublished manuscript, MIT.
Bresnan, J. (1971). Sentence stress and syntactic transformations. Language 47:257–281.
Bresnan, J. (1972). The theory of complementation in English syntax, MIT doctoral dissertation. [Published by Garland Press (1979).]
Bresnan, J. (1976a). On the form and functioning of transformations. Linguistic Inquiry 7:3–40.
Bresnan, J. (1976b). Nonarguments for raising. Linguistic Inquiry 7:485–501.
Bresnan, J. and J.Grimshaw (1978). The syntax of free relatives in English. Linguistic Inquiry 9:331–391.
Brody, M. (1982). On circular readings. In N.V.Smith (ed.) Mutual Knowledge, New York: Academic Press, 133–146.
Brody, M. (1984). On contextual definitions and the role of chains. Linguistic Inquiry 15:355–380.
Burzio, L. (1986). Italian Syntax, Dordrecht: Reidel.
Campos, H. and P.Kempchinsky (eds) (1995). Evolution and Revolution in Linguistic Theory: Essays in Honor of Carlos Otero, Washington, DC: Georgetown University Press.
Cheng, L. and H.Demirdash (1990). Superiority violations. MIT Working Papers in Linguistics 13:27–46.
Chomsky, N. (1951). Morphophonemics of Modern Hebrew. [Published New York: Garland (1979).]
Chomsky, N. (1955a). The logical structure of linguistic theory. Mimeographed unpublished manuscript, Massachusetts Institute of Technology Library, Cambridge, MA. [See Chomsky (1975a).]
Chomsky, N. (1955b). Semantic considerations in grammar. In R.H.Weinstein (ed.) Report of the Sixth Annual Round Table Meeting on Linguistics and Language Teaching, Washington, DC: Georgetown University Press, pp. 141–150.
Chomsky, N. (1956). Three models for the description of languages. IRE Transactions on Information Theory 2:113–124.
Chomsky, N. (1957). Syntactic Structures, The Hague: Mouton.
Chomsky, N. (1959). On certain formal properties of grammars. Information and Control 2:137–167.
Chomsky, N. (1962a). A transformational approach to syntax. In A.A.Hill (ed.) Proceedings of the Third Texas Conference on Problems of Linguistic Analysis in English, 1958, Austin: University of Texas. [Reprinted in J.Fodor and J.Katz (eds) The Structure of Language: Readings in the Philosophy of Language, Englewood Cliffs, NJ: Prentice-Hall, 1964.]
Chomsky, N. (1962b). Context-free grammar and pushdown storage. Quarterly Progress Report 65:187–194, Cambridge, MA: Research Laboratory in Electronics.
Chomsky, N. (1962c). The logical basis of linguistic theory. In H.Lunt (ed.) Proceedings of the Ninth International Congress of Linguists, The Hague: Mouton, pp. 914–978.
Chomsky, N. (1964). Current issues in linguistic theory. In J.A.Fodor and J.J.Katz (eds) The Structure of Language, Englewood Cliffs, NJ: Prentice-Hall, pp. 50–118.
Chomsky, N. (1965). Aspects of the Theory of Syntax, Cambridge, MA: MIT Press.
Chomsky, N. (1966). Topics in the Theory of Generative Grammar, The Hague: Mouton.
Chomsky, N. (1970). Remarks on nominalization. In R.Jacobs and P.Rosenbaum (eds) Readings in English Transformational Grammar, Waltham, MA: Ginn and Co., pp. 184–221. [Reprinted in N.Chomsky (1972b).]
Chomsky, N. (1971a). Deep structure, surface structure, and semantic interpretation. In D.Steinberg and L.Jakobovits (eds) Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology, London: Cambridge University Press, pp. 183–216. [Reprinted in N.Chomsky (1972b).]
Chomsky, N. (1971b). Problems of Knowledge and Freedom, New York: Random House.
Chomsky, N. (1972a). Some empirical issues in the theory of transformational grammar. In S.Peters (ed.) Goals of Linguistic Theory, Englewood Cliffs, NJ: Prentice-Hall, pp. 63–130. [Reprinted in N.Chomsky (1972b).]
Chomsky, N. (1972b). Studies on Semantics in Generative Grammar, The Hague: Mouton.
Chomsky, N. (1973). Conditions on transformations. In S.R.Anderson and P.Kiparsky (eds) A Festschrift for Morris Halle, New York: Holt, Rinehart and Winston, pp. 232–286. [Reprinted in N.Chomsky (1977a).]
Chomsky, N. (1975a). The Logical Structure of Linguistic Theory, Chicago, IL: University of Chicago Press.
Chomsky, N. (1975b). Questions of form and interpretation. Linguistic Analysis 1:75–109. [Reprinted in N.Chomsky (1977a).]
Chomsky, N. (1975c). Reflections on Language, New York: Pantheon.
Chomsky, N. (1975d). The Amherst lectures. Paris: Département de Recherches Linguistiques, Université de Paris VII. Unpublished transcription of 1974 lecture notes.
Chomsky, N. (1976). Conditions on rules of grammar. Linguistic Analysis 2:303–351. [Reprinted in N.Chomsky (1977a).]
Chomsky, N. (1977a). Essays on Form and Interpretation, New York: North-Holland.
Chomsky, N. (1977b). On wh-movement. In P.W.Culicover, T.Wasow, and A.Akmajian (eds) Formal Syntax, New York: Academic Press, pp. 71–132.
Chomsky, N. (1980). On binding. Linguistic Inquiry 11:1–46.
Chomsky, N. (1981). Lectures on Government and Binding, Dordrecht: Foris.
Chomsky, N. (1982a). Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, MA: MIT Press.
Chomsky, N. (1982b). The Generative Enterprise: A Discussion with Riny Huybregts and Henk van Riemsdijk [1979–80], Dordrecht: Foris.
Chomsky, N. (1986a). Knowledge of Language: Its Nature, Origin, and Use, New York: Praeger.
Chomsky, N. (1986b). Barriers, Cambridge, MA: MIT Press.
Chomsky, N. (1991). Some notes on the economy of derivation and representation. In R.Freidin (ed.) Principles and Parameters in Comparative Grammar, Cambridge, MA: MIT Press, pp. 417–454. [An early version appeared in MIT Working Papers in Linguistics 10 (1989); reprinted in Chomsky (1995d).]
Chomsky, N. (1993). A minimalist program for linguistic theory. In K.Hale and S.J.Keyser (eds) The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, Cambridge, MA: MIT Press, pp. 1–52. [Reprinted in Chomsky (1995d).]
Chomsky, N. (1995a). Language and nature. Mind 104:1–61.
Chomsky, N. (1995b). Bare phrase structure. In H.Campos and P.Kempchinsky (eds) Evolution and Revolution in Linguistic Theory: Studies in Honor of Carlos P. Otero, Washington, DC: Georgetown University Press, pp. 51–109.
Chomsky, N. (1995c). Categories and transformations. In N.Chomsky, The Minimalist Program, Cambridge, MA: MIT Press, pp. 219–394.
Chomsky, N. (1995d). The Minimalist Program, Cambridge, MA: MIT Press.
Chomsky, N. (1998). Some observations on economy in generative grammar. In P.Barbosa et al. (eds) Is the Best Good Enough? Optimality and Competition in Syntax, Cambridge, MA: MIT Press, pp. 115–127.
Chomsky, N. (2000a). New Horizons in the Study of Language and Mind, New York: Cambridge University Press.
Chomsky, N. (2000b). Minimalist inquiries: the framework. In R.Martin, D.Michaels, and J.Uriagereka (eds) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, Cambridge, MA: MIT Press, pp. 89–155. [An early version appeared as MIT Occasional Papers in Linguistics 15 (1998).]
Chomsky, N. (2001). Derivation by phase. In M.Kenstowicz (ed.) Ken Hale: A Life in Language, Cambridge, MA: MIT Press, pp. 1–52. [Published earlier as MIT Occasional Papers in Linguistics 18.]
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry 36:1–22.
Chomsky, N. and H.Lasnik (1977). Filters and control. Linguistic Inquiry 8:425–504.
Chomsky, N. and H.Lasnik (1993). The theory of principles and parameters. In J.Jacobs, A.von Stechow, W.Sternefeld, and T.Vennemann (eds) Syntax: An International Handbook of Contemporary Research, vol. 1, Berlin: Walter de Gruyter, pp. 506–596. [Reprinted in Chomsky (1995d).]
Chomsky, N. and G.A.Miller (1958). Finite-state languages. Information and Control 1:91–112.
Cinque, G. (1993). A null theory of phrase and compound stress. Linguistic Inquiry 24:239–298.
Cohen-Tannoudji, C., B.Diu, and F.Laloë (1996). Mécanique quantique, Paris: Hermann.
Cole, P., W.Harbert, G.Hermon, and S.N.Sridhar (1980). On the acquisition of subjecthood. Language 56:719–743.
Collins, C. (1994). Economy of derivation and the generalized proper binding condition. Linguistic Inquiry 25:45–61.
Collins, C. (1995). Toward a theory of optimal derivations. In R.Pensalfini and H.Ura (eds) Papers on Minimalist Syntax (MIT Working Papers in Linguistics 27), Cambridge, MA: MITWPL, Department of Linguistics and Philosophy, MIT, pp. 65–103.
Collins, C. (1997). Local Economy, Cambridge, MA: MIT Press.
Compton, A.H. (1923). A quantum theory of the scattering of x-rays by light elements. The Physical Review 21:483–502.
Courant, R. (1937). Differential and Integral Calculus, vol. 1, London: Blackie & Son.
Crain, S. (1991). Language acquisition in the absence of experience. Behavioral and Brain Sciences 14:597–650.
Crain, S. and M.Nakayama (1987). Structure dependence in grammar formation. Language 63:522–543.
Culicover, P.W., T.Wasow, and A.Akmajian (eds) (1977). Formal Syntax, New York: Academic Press.
Darwin, C. (1859). On the Origin of Species by Means of Natural Selection, London: John Murray.
Debye, P. (1923). Zerstreuung von Röntgenstrahlen und Quantentheorie. Physikalische Zeitschrift 24:161–166.
Dirac, P.A.M. (1968). Methods in theoretical physics. In From a Life of Physics: Evening Lectures at the International Center for Theoretical Physics, Trieste, Italy. A special supplement of the International Atomic Energy Agency Bulletin, Austria.
Dresher, E. and N.Hornstein (1979). Trace theory and NP movement rules. Linguistic Inquiry 10:65–82.
Einstein, A. (1905). Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt. Annalen der Physik 17:132–148.
Einstein, A. (1917). Zur Quantentheorie der Strahlung. Physikalische Zeitschrift 18:121–128. [First printed in 1916 in Mitteilungen der Physikalischen Gesellschaft Zürich 16:47–62.]
Emonds, J. (1970). Root and structure-preserving transformations, MIT doctoral dissertation.
Emonds, J. (1976). A Transformational Approach to English Syntax: Root, Structure-Preserving, and Local Transformations, New York: Academic Press.
Emonds, J. (1978). The verbal complex V’-V in French. Linguistic Inquiry 9:151–175.
Epstein, S.D. (1990). Differentiation and reduction in syntactic theory: a case study. Natural Language and Linguistic Theory 8:313–323.
Epstein, S.D. (1993). Superiority. Manuscript, Harvard University.
Epstein, S.D. (1999). Un-principled syntax and the derivation of syntactic relations. In Epstein and Hornstein (1999), pp. 317–345.
Epstein, S.D. and N.Hornstein (eds) (1999). Working Minimalism, Cambridge, MA: MIT Press.
Epstein, S.D., E.Groat, R.Kawashima, and H.Kitahara (1998). A Derivational Approach to Syntactic Relations, New York: Oxford University Press.
Feyerabend, P. (1975). Against Method, London: Verso.
Feynman, R. (1971). The Character of Physical Law, Cambridge, MA: MIT Press.
Feynman, R., R.B.Leighton, and M.Sands (1963). The Feynman Lectures on Physics, vol. 1, Reading, MA: Addison-Wesley.
Fiengo, R. (1974). Semantic conditions on surface structure, unpublished doctoral dissertation, MIT.
Fiengo, R. (1977). On trace theory. Linguistic Inquiry 8:35–61.
Fiengo, R. (1980). Surface Structure: The Interface of Autonomous Components, Cambridge, MA: Harvard University Press.
Fiengo, R. and J.Higginbotham (1981). Opacity in NP. Linguistic Analysis 7:395–421.
Fillmore, C. (1963). The position of embedding transformations in a grammar. Word 19:208–231.
Fodor, J.D. (1978). Parsing strategies and constraints on transformations. Linguistic Inquiry 9:427–473.
Freidin, R. (1970). Interpretive semantics and the syntax of English complement constructions, unpublished doctoral dissertation, Indiana University.
Freidin, R. (1974). A note on wh-movement. Purdue University Contributed Paper in Speech, Hearing, and Language 3:55–60.
Freidin, R. (1975a). Review of R.Jackendoff, Semantic Interpretation in Generative Grammar, Cambridge, MA: MIT Press. Language 51:189–205.
Freidin, R. (1975b). The analysis of passives. Language 51:384–405. [Chapter 12, this volume.]
Freidin, R. (1976). The Syntactic Cycle: Proposals and Alternatives, Bloomington, IN: Indiana University Linguistics Club.
Freidin, R. (1977). Review of Emonds (1976). Language 54:407–416.
Freidin, R. (1978). Cyclicity and the theory of grammar. Linguistic Inquiry 9:519–549. [Chapter 2, this volume.]
Freidin, R. (1979). Misgeneration: conditions on derivations vs. conditions on representations. Manuscript.
Freidin, R. (1983). X-bar theory and the analysis of English infinitivals. Linguistic Inquiry 14:713–722.
Freidin, R. (1986). Fundamental issues in the theory of binding. In B.Lust (ed.) Studies in the Acquisition of Anaphora, vol. I, Dordrecht: Reidel, pp. 151–188. [Chapter 10, this volume.]
Freidin, R. (ed.) (1991a). Principles and Parameters in Comparative Grammar, Cambridge, MA: MIT Press.
Freidin, R. (1991b). Linguistic theory and language acquisition: a note on structure dependence. Behavioral and Brain Sciences 14:618–619. [Chapter 14, this volume.]
Freidin, R. (1992). Foundations of Generative Syntax, Cambridge, MA: MIT Press.
Freidin, R. (1994a). Generative grammar: principles and parameters framework. In R.E.Asher (ed.) The Encyclopedia of Language and Linguistics, vol. 3, London: Pergamon Press, pp. 1370–1385.
Freidin, R. (1994b). Conceptual shifts in the science of grammar: 1951–1992. In C.P.Otero (ed.) Noam Chomsky: Critical Assessments, London: Routledge, pp. 653–690. [Chapter 15, this volume.]
Freidin, R. (1994c). Superiority and extraposition. Manuscript, Princeton University.
Freidin, R. (1997a). Review article: N.Chomsky, The Minimalist Program, Cambridge, MA: MIT Press (1995). Language 73(3): 571–582. [Chapter 16, this volume.]
Freidin, R. (1997b). Binding theory on minimalist assumptions. In H.Bennis, P.Pica, and J.Rooryck (eds) Atomism and Binding, Dordrecht: Foris, pp. 141–153. [Chapter 11, this volume.]
Freidin, R. (1999a). Elementary operations on minimalist assumptions: some speculations. In Linguistic Society of Korea (ed.) Linguistics in the Morning Calm, vol. 4, Seoul: Hanshin, pp. 1–19.
Freidin, R. (1999b). Cyclicity and minimalism. In S.Epstein and N.Hornstein (eds) Working Minimalism, Cambridge, MA: MIT Press.
Freidin, R. and L.Babby (1984). On the interaction of lexical and syntactic properties: case structure in Russian. Cornell Working Papers in Linguistics 6:71–103.
Freidin, R. and W.Harbert (1983). On the fine structure of the binding theory: principle A and reciprocals. In P.Sells and C.Jones (eds) Proceedings of the Thirteenth Annual Meeting of the North Eastern Linguistic Society, Cambridge, MA: MIT, pp. 63–72. [Chapter 9, this volume.]
Freidin, R. and H.Lasnik (1981). Disjoint reference and wh-trace. Linguistic Inquiry 12:39–53. [Reprinted in Lasnik (1989).] [Chapter 8, this volume.]
Freidin, R. and R.Sprouse (1991). Lexical case phenomena. In Freidin (1991a), pp. 392–416. [Chapter 6, this volume.]
Freidin, R. and J.-R.Vergnaud (2001). Exquisite connections: some remarks on the evolution of linguistic theory. Lingua 111:639–666. [Chapter 17, this volume.]
Fukui, N. (1996). On the nature of economy in language. Cognitive Studies 3:51–71.
George, L. (1980). Analogical generalization in natural language syntax, MIT doctoral dissertation.
George, L. and J.Kornfilt (1981). Finiteness and boundedness in Turkish. In Heny (1980), pp. 105–127.
Goodman, N. (1943). On the simplicity of ideas. Journal of Symbolic Logic 8(4): 107–121.
Greene, B. (1999). The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory, New York: Norton.
Gruber, J. (1965). Studies in lexical relations, MIT doctoral dissertation.
Harbert, W. (1982a). In defense of tense. Linguistic Analysis 9:1–18.
Harbert, W. (1982b). Should binding refer to subject? In J.Pustejovsky and P.Sells (eds) Proceedings of the Twelfth Annual Meeting of the North Eastern Linguistic Society, Cambridge, MA: MIT Press, pp. 116–131.
Harbert, W. (1983). On the definition of binding domains. In D.Flickinger (ed.) Proceedings of the West Coast Conference on Formal Linguistics II, Stanford, CA: Stanford University.
Harman, G. (1963). Generative grammars without transformation rules. Language 39:597–616.
Harris, R. (1993). The Linguistic Wars, New York: Oxford University Press.
Harris, Z. (1946). From morpheme to utterance. Language 22:161–183. [Reprinted in Z.Harris (1981).]
Harris, Z. (1951). Methods in Structural Linguistics, Chicago, IL: University of Chicago Press.
Harris, Z. (1952a). Co-occurrence and transformation in linguistic structure. Language 33:283–340. [Reprinted in Z.Harris (1981).]
Harris, Z. (1952b). Discourse analysis. Language 28:1–30. [Reprinted in Z.Harris (1981).]
Harris, Z. (1965). Transformational theory. Language 41:363–401. [Reprinted in Z.Harris (1981).]
Harris, Z. (1981). Papers on Syntax, Dordrecht: D. Reidel.
Harves, S. (2002). Unaccusative syntax in Russian, Princeton University doctoral dissertation.
Hasegawa, K. (1968). The passive construction in English. Language 44:230–243.
Hendrick, R. and M.Rochemont (1988). Complementation, multiple wh and echo questions. Toronto Working Papers in Linguistics 9.
Heny, F. (ed.) (1980). Binding and Filtering, Cambridge, MA: MIT Press.
Higginbotham, J. (1980a). Anaphora and GB: some preliminary remarks. In J.Jensen (ed.) Proceedings of the Tenth Annual Meeting of the North Eastern Linguistic Society 9:223–236, Ottawa, Canada.
Higginbotham, J. (1980b). Pronouns and bound variables. Linguistic Inquiry 11:679–708.
Higginbotham, J. (1983). Logical form, binding, and nominals. Linguistic Inquiry 14:395–420.
Higginbotham, J. and R.May (1981). Questions, quantifiers, and crossing. The Linguistic Review 1:41–79.
Hockett, C. (1954). Two models of grammatical description. Word 10:210–234.
Hornstein, N. (1977). S and the X’ Convention. Linguistic Analysis 3:137–176.
Hornstein, N. (1995). Logical Form: From GB to Minimalism, Oxford: Blackwell.
Hornstein, N. (1996). Minimizing control: a modest proposal. Manuscript, University of Maryland, College Park.
Huang, C.-T.J. (1982). Logical relations in Chinese and the theory of grammar, MIT doctoral dissertation.
Huang, C.-T.J. (1983). A note on the binding theory. Linguistic Inquiry 14:554–561.
Hymes, D. and J.Fought (1981). American Structuralism, The Hague: Mouton.
Iatridou, S. (1993). On nominative case assignment and a few related things. Papers on Case and Agreement II: MIT Working Papers in Linguistics 19:175–196.
Ionin, T. (2000). Scrambling in Russian: evidence for an A/A-bar distinction. Handout from class presentation, MIT.
Ionin, T. (2001). Quantifier scope in Russian. Class presentation, MIT.
Itkonen, E. (1978). Grammatical Theory and Metascience: A Critical Investigation into the Methodological and Philosophical Foundations of ‘Autonomous’ Linguistics, Amsterdam: John Benjamins.
Jackendoff, R. (1969). Some rules of semantic interpretation for English, MIT doctoral dissertation.
Jackendoff, R. (1972). Semantic Interpretation in Generative Grammar, Cambridge, MA: MIT Press.
Jacobson, P. (1977). The syntax of crossing coreference sentences, University of California at Berkeley doctoral dissertation.
Jaeggli, O. (1982). Topics in Romance Syntax, Dordrecht: Foris.
Jenkins, L. (1977). On base generating trace. Paper presented at the Eighth Annual Meeting of the North Eastern Linguistic Society, University of Massachusetts, Amherst, MA.
Jenkins, L. (2000). Biolinguistics, Cambridge, England: Cambridge University Press.
Jourdain, P.E.B. (1913). The Principle of Least Action, Chicago, IL: Open Court Publishing Company.
Junghanns, U. and G.Zybatow (1997). Syntax and information structure of Russian clauses. In W.Browne et al. (eds) FASL 4: The Cornell Meeting, Ann Arbor, MI: Michigan Slavic Publications, pp. 289–319.
Katz, J. (1967). Recent issues in semantic theory. Foundations of Language 3:124–194.
Katz, J. and J.Fodor (1963). The structure of a semantic theory. Language 39:170–210.
Katz, J. and P.Postal (1964). An Integrated Theory of Linguistic Descriptions, Cambridge, MA: MIT Press.
Kawashima, R. and H.Kitahara (1996). Strict cyclicity, linear ordering, and derivational c-command. In J.Camacho, L.Choueiri, and M.Watanabe (eds) Proceedings of the Fourteenth West Coast Conference on Formal Linguistics, Stanford, CA: CSLI Publications, pp. 255–269.
Kayne, R. (1969). On the inappropriateness of rule features. Quarterly Progress Report of the Research Laboratory of Electronics, MIT, No. 95:85–93.
Kayne, R. (1975). French Syntax: The Transformational Cycle, Cambridge, MA: MIT Press.
Kayne, R. (1980). Extensions of binding and case-marking. Linguistic Inquiry 11:75–96.
Kayne, R. (1981). Two notes on the NIC. In A.Belletti, L.Brandi, and L.Rizzi (eds) Theory of Markedness in Generative Grammar, Pisa: Scuola Normale Superiore, pp. 317–346.
Kayne, R. (1983). Connectedness. Linguistic Inquiry 14:223–250.
Kayne, R. (1994). The Antisymmetry of Syntax, Cambridge, MA: MIT Press.
Kean, M.-L. (1974). The strict cycle in phonology. Linguistic Inquiry 5:179–203.
Kenstowicz, M. (ed.) (2001). Ken Hale: A Life in Language, Cambridge, MA: MIT Press.
Keyser, S.J. (ed.) (1978). Recent Transformational Studies in European Languages, Cambridge, MA: MIT Press.
Kimball, J. (1972). Cyclic and linear grammars. In J.Kimball (ed.) Syntax and Semantics, vol. 1, New York: Seminar Press, pp. 63–80.
King, T.H. (1995). Configuring Topic and Focus in Russian, Stanford, CA: CSLI.
Kitahara, H. (1993). Deducing ‘superiority’ effects from the shortest chain requirement. Harvard Working Papers in Linguistics.
Kitahara, H. (1995). Target α: deducing strict cyclicity from principles of economy. Linguistic Inquiry 26:47–77.
Kitahara, H. (1996). Raising quantifiers without quantifier raising. In W.Abraham, S.D.Epstein, H.Thráinsson, and C.J.W.Zwart (eds) Minimal Ideas: Syntactic Studies in the Minimalist Framework, Amsterdam: John Benjamins, pp. 189–198.
Kitahara, H. (1997). Elementary Operations and Optimal Derivations, Cambridge, MA: MIT Press.
Klima, E. (1964). Negation in English. In J.Fodor and J.Katz (eds) (1964), pp. 246–323.
Kneale, W. and M.Kneale (1962). The Development of Logic, Oxford: Oxford University Press.
Kondrashova, N. (1996). The syntax of existential quantification, doctoral dissertation, University of Wisconsin, Madison.
Koster, J. (1977). Remarks on wh-movement and the Locality Principle, xeroxed paper, University of Amsterdam, Amsterdam.
Koster, J. (1978a). Why subject sentences don’t exist. In Keyser (1978), pp. 53–64.
Koster, J. (1978b). Conditions, empty nodes, and markedness. Linguistic Inquiry 9:551–593.
Koster, J. (1978c). Locality Principles in Syntax, Dordrecht: Foris.
Koster, J. (1986). Domains and Dynasties: The Radical Autonomy of Syntax, Dordrecht: Foris.
Kovtunova, I.I. (1980). Porjadok slov. In N.Ju.Svedova et al. (eds) Russkaja grammatika (=Russian Academy Grammar), vol. 2, Moskva: Nauka.
Kuhn, T. (1962). The Structure of Scientific Revolutions, Chicago, IL: University of Chicago Press.
Kuroda, Y. (1988). Harris and the reality of language. Center for Research in Language Newsletter 3:4–12.
Lakoff, G. (1965). On the nature of syntactic regularity, Indiana University doctoral dissertation.
Lakoff, G. (1968). On instrumental adverbs and the concept of deep structure. Foundations of Language 4:4–29.
Lakoff, G. (1970). Irregularity in Syntax, New York: Holt.
Lakoff, G. (1972). Linguistics and natural logic. In G.Harman and D.Davidson (eds) Semantics of Natural Language, Dordrecht: Reidel, pp. 545–665.
Lakoff, R. (1971). Passive resistance. Papers from the Seventh Regional Meeting of the Chicago Linguistic Society, 149–163.
Lanczos, C. (1970). The Variational Principles of Mechanics, 4th edn, Toronto: University of Toronto Press.
Langacker, R. (1969). Pronominalization and the chain of command. In D.Reibel and S.Schane (eds) Modern Studies in English, Englewood Cliffs, NJ: Prentice-Hall.
Lappin, S. (1991). Concepts of logical form in linguistics and philosophy. In A.Kasher (ed.) The Chomskyan Turn, Oxford: Blackwell, pp. 300–333.
Lappin, S. and M.McCord (1990). Anaphora resolution in slot grammar. Computational Linguistics 16:197–212.
Lappin, S., R.Levine, and D.Johnson (2000). The structure of unscientific revolutions. Natural Language and Linguistic Theory 18:665–671.
Larson, R. (1988). On the double object construction. Linguistic Inquiry 19:335–391.
Lasnik, H. (1976). Remarks on coreference. Linguistic Analysis 2:1–22. [Reprinted in Lasnik (1989).]
Lasnik, H. (1981a). Restricting the theory of transformations: a case study. In N.Hornstein and D.Lightfoot (eds) Explanation in Linguistics, London: Longman, pp. 152–173.
Lasnik, H. (1981b). On two recent treatments of disjoint reference. Journal of Linguistic Research 1:48–58.
Lasnik, H. (1989). Essays on Anaphora, Dordrecht: Kluwer.
Lasnik, H. (1991). On the necessity of binding conditions. In Freidin (1991a), pp. 7–28. [Reprinted in Lasnik (1989).]
Lasnik, H. (1995a). Case and expletives revisited: on greed and other human failings. Linguistic Inquiry 26:615–633.
Lasnik, H. (1995b). Last resort. In S.Haraguchi and M.Funaki (eds) Minimalism and Linguistic Theory, Tokyo: Hituzi Syobo Publishing, pp. 1–32.
Lasnik, H. (1995c). A note on pseudogapping. In Pensalfini and Ura (1995), pp. 143–163.
Lasnik, H. (1995d). Verbal morphology: Syntactic Structures meets the Minimalist Program. In Campos and Kempchinsky (1995), pp. 251–275.
Lasnik, H. (1999). Minimalist Analysis, Malden, MA: Blackwell.
Lasnik, H. (2000). Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational Theory, Cambridge, MA: MIT Press.
Lasnik, H. (2001). A note on the EPP. Linguistic Inquiry 32:356–362.
Lasnik, H. (2003). Patterns of verb raising with auxiliary be. In H.Lasnik, Minimalist Investigations in Linguistic Theory, London: Routledge.
Lasnik, H. and R.Freidin (1981). Core grammar, case theory, and markedness. In Belletti, Brandi, and Rizzi (eds) (1981), pp. 407–421. [Chapter 7, this volume.]
Lasnik, H. and J.J.Kupin (1977). A restrictive theory of transformational grammar. Theoretical Linguistics 4:173–196.
Lasnik, H. and M.Saito (1984). On the nature of proper government. Linguistic Inquiry 15:235–290.
Lasnik, H. and M.Saito (1992). Move α: Conditions on Its Application and Output, Cambridge, MA: MIT Press.
Lasnik, H. and J.Uriagereka (1988). A Course in GB Syntax, Cambridge, MA: MIT Press.
Lasnik, H. and J.Uriagereka (2002). On the poverty of the challenge. The Linguistic Review 19:147–150.
Lavine, J. (1998). Null expletives and the EPP in Slavic: a minimalist analysis. In Z.Bošković et al. (eds) FASL 6: The UConn Meeting, Ann Arbor, MI: Michigan Slavic Publications, pp. 212–230.
Lavine, J. (2000). Topics in the syntax of nonagreeing predicates in Slavic, Princeton University doctoral dissertation.
Lavine, J. and R.Freidin (2002). The subject of defective T(ense) in Slavic. Journal of Slavic Linguistics 10(1–2): 251–287.
Lebeaux, D. (1985). Locality and anaphoric binding. The Linguistic Review 4:343–363.
Lebeaux, D. (1988). Language acquisition and the form of grammar, doctoral dissertation, University of Massachusetts, Amherst.
Lees, R. (1957). Review of Syntactic Structures, by Noam Chomsky. Language 33:375–408.
Lees, R. (1960). The Grammar of English Nominalizations, The Hague: Mouton.
Lightfoot, D. (1976a). Trace theory and twice-moved NPs. Linguistic Inquiry 7:559–582.
Lightfoot, D. (1976b). Theoretical implications of subject raising. Foundations of Language 14:257–285.
McCawley, J.D. (1968). The role of semantics in a grammar. In E.Bach and R.Harms (eds) Universals in Linguistic Theory, New York: Holt, Rinehart, and Winston, pp. 124–169.
McCawley, J.D. (1975). Review of The Logical Structure of Linguistic Theory, by Noam Chomsky. Studies in English Linguistics 3:209–311. [Reprinted in McCawley (1982).]
McCawley, J.D. (1982). Thirty Million Theories of Grammar, Chicago, IL: University of Chicago Press.
McCawley, J.D. (1995). Generative semantics. In J.Verschueren, J.-O.Östman, and J.Blommaert (eds) Handbook of Pragmatics: Manual, Amsterdam: John Benjamins, pp. 311–319.
Mahajan, A. (1990). The A/A-bar distinction and movement theory, MIT doctoral dissertation.
Manzini, M.R. (1983). Restructuring and reanalysis, MIT doctoral dissertation.
Marantz, A. (1984). On the Nature of Grammatical Relations, Cambridge, MA: MIT Press.
Marantz, A. (1991). Case and licensing. Proceedings of ESCOL ’91, 234–253.
Marantz, A. (1995). The minimalist program. In G.Webelhuth (ed.) Government and Binding Theory and the Minimalist Program, Oxford: Blackwell, pp. 349–382.
Martin, R. (1996). A minimalist theory of PRO and control, University of Connecticut doctoral dissertation.
Martin, R. (1999). Case, the extended projection principle, and minimalism. In Epstein and Hornstein (1999), pp. 1–25.
Mascaro, J. (1976). Catalan phonology and the phonological cycle, MIT doctoral dissertation.
May, R. (1977). The grammar of quantification, MIT doctoral dissertation.
May, R. (1979). Must comp-to-comp movement be stipulated? Linguistic Inquiry 10:719–725.
May, R. (1981). Movement and binding. Linguistic Inquiry 12:215–243.
May, R. (1985). Logical Form: Its Structure and Derivation, Cambridge, MA: MIT Press.
Milsark, G. (1974). Existential sentences in English, MIT doctoral dissertation.
Newmeyer, F.J. (1980). Linguistic Theory in America, New York: Academic Press.
Newmeyer, F.J. (1986). Linguistic Theory in America, 2nd edn, New York: Academic Press.
Newmeyer, F.J. (1996). Generative Linguistics: A Historical Perspective, London: Routledge.
Nunes, J. (1995). The copy theory of movement and linearization of chains in the Minimalist Program, doctoral dissertation, University of Maryland, College Park.
Otero, C.P. (1984). La revolución de Chomsky: Ciencia y sociedad, Madrid: Tecnos.
Otero, C.P. (1991). The cognitive revolution and the study of language: looking back to see ahead. In H.Campos and F.Martínez-Gil (eds) Current Studies in Spanish Linguistics, Washington, DC: Georgetown University Press.
Otero, C.P. (1994a). Chomsky and the cognitive revolution of the 1950s: the emergence of transformational generative grammar. In C.P.Otero (1994b), vol. I, pp. 1–36.
Otero, C.P. (ed.) (1994b). Noam Chomsky: Critical Assessments, 4 vols, London: Routledge.
Otero, C.P. (in preparation). Chomsky’s Revolution: Cognitivism and Anarchism, Oxford: Blackwell.
Pais, A. (1982). Subtle is the Lord…: The Science and Life of Albert Einstein, Oxford: Oxford University Press.
Pensalfini, R. and H.Ura (eds) (1995). Papers on Minimalist Syntax, MITWPL 27, Cambridge, MA.
Pesetsky, D. (1982a). Complementizer-trace phenomena and the nominative island condition. The Linguistic Review 1:297–343.
Pesetsky, D. (1982b). Paths and categories, MIT doctoral dissertation.
Pesetsky, D. (1987). Wh-in-situ: movement and unselective binding. In E.Reuland and A.G.ter Meulen (eds) The Representation of (In)definiteness, Cambridge, MA: MIT Press, pp. 80–129.
Pesetsky, D. and E.Torrego (2001). T to C movement: causes and consequences. In Kenstowicz (2001), pp. 355–426.
Pollock, J.-Y. (1978). Trace theory and French syntax. In Keyser (1978), pp. 65–112.
Pollock, J.-Y. (1989). Verb movement, universal grammar, and the structure of IP. Linguistic Inquiry 20:365–424.
Postal, P. (1964). Constituent Structure, Bloomington: Indiana University.
Postal, P. (1966). On so-called ‘pronouns’ in English. In F.Dinneen (ed.) Report of the Seventeenth Annual Round Table Meeting on Linguistics and Language Studies, Washington, DC: Georgetown University Press, pp. 177–206.
Postal, P. (1970). On the surface verb ‘remind’. Linguistic Inquiry 1:37–120.
Postal, P. (1970). On coreferential complement subject deletion. Linguistic Inquiry 1:439–500.
Postal, P. (1971). Cross-over Phenomena, New York: Holt, Rinehart and Winston.
Postal, P. (1974). On Raising, Cambridge, MA: MIT Press.
Postal, P. (1977). About a ‘nonargument’ for raising. Linguistic Inquiry 8:141–154.
Putnam, H. (1962). The analytic and the synthetic. In H.Feigl and G.Maxwell (eds) Minnesota Studies in the Philosophy of Science, III, Minneapolis, MN: University of Minnesota Press, pp. 358–397.
Quicoli, A.C. (1976a). Conditions on clitic-movement in Portuguese. Linguistic Analysis 2:199–223.
Quicoli, A.C. (1976b). Conditions on quantifier movement in French. Linguistic Inquiry 7:583–607.
Quicoli, A.C. (1979). Clitic movement in French causatives. Linguistic Analysis 6:131–186.
Rappaport, G. (1986). On anaphor binding in Russian. Natural Language and Linguistic Theory 4:97–120.
Reinhart, T. (1976). The syntactic domain of anaphora, MIT doctoral dissertation.
Reinhart, T. (1981). Definite NP anaphora and c-command domains. Linguistic Inquiry 12:605–636.
Reinhart, T. (1986). Center and periphery in the grammar of anaphora. In B.Lust (ed.) Studies in the Acquisition of Anaphora, vol. 1, Dordrecht: Reidel, pp. 123–150.
Reinhart, T. (1995). Interface Strategies, OTS, University of Utrecht.
Riemsdijk, H. van (1978a). On the diagnosis of wh movement. In Keyser (1978).
Riemsdijk, H. van (1978b). A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases, Lisse: Peter de Ridder Press.
Riemsdijk, H. van and E.Williams (1981). NP structure. The Linguistic Review 1:171–217.
Rizzi, L. (1980). Violations of the wh-island constraint in Italian and the Subjacency Condition. Journal of Italian Linguistics 5:157–195.
Rizzi, L. (1981). Nominative marking in Italian infinitives and the nominative island constraint. In Heny (1980), pp. 129–157.
Roberts, I. (1992). Verbs and Diachronic Syntax: A Comparative Study of English and French, Dordrecht: Kluwer.
Ross, J.R. (1967). Constraints on variables in syntax, MIT doctoral dissertation.
Ross, J.R. (1969). Auxiliaries as main verbs. Studies in Philosophical Linguistics 1:77–102.
Ross, J.R. (1976). On the cyclic nature of English pronominalization. In To Honor Roman Jakobson, The Hague: Mouton, pp. 1669–1682.
Ross, J.R. (1986). Infinite Syntax!, Norwood, NJ: Ablex.
Rothstein, S. (1983). The syntactic forms of predication, MIT doctoral dissertation.
Rouveret, A. (1980). Sur la notion de proposition finie. Langages 60:75–107.
Rouveret, A. and J.-R.Vergnaud (1980). Specifying reference to the subject: French causatives and conditions on representations. Linguistic Inquiry 11:97–202.
Schütze, C. (1993). Towards a minimalist account of quirky case and licensing in Icelandic. Papers on Case and Agreement II: MIT Working Papers in Linguistics 19:321–375.
Selkirk, E. (1995). Sentence prosody: intonation, stress, and phrasing. In J.Goldsmith (ed.) The Handbook of Phonological Theory, Oxford: Blackwell, pp. 550–569.
Shevelov, G. (1969). The vicissitudes of a syntactic construction in Eastern Slavic. Scando-Slavica 15:171–186.
Sigurðsson, H. (1989). Verbal syntax and case in Icelandic, doctoral dissertation, University of Lund.
Sigurðsson, H. (1992). The case of quirky subjects. Working Papers in Scandinavian Syntax 49:1–26.
Smith, N. (1999). Chomsky: Ideas and Ideals, Cambridge, England: Cambridge University Press.
Solan, L. (1987). Parameter setting and the development of pronouns and reflexives. In T.Roeper and E.Williams (eds) Parameter Setting, Boston, MA: Reidel, pp. 189–210.
Sportiche, D. (1981). On bounding nodes in French. The Linguistic Review 1:219–246.
Sportiche, D. (1983). Structural invariance and symmetry in syntax, MIT doctoral dissertation.
Sprouse, J. (2003). Toward an optimal system: a minimal analysis of English auxiliaries, unpublished senior thesis, Princeton University.
Sprouse, R.A. (1989). On the syntax of the double object construction in selected Germanic languages, Princeton University doctoral dissertation.
Stewart, I. (1998). Life’s Other Secret, New York: Wiley.
Stewart, I. (1999). Designer differential equations for animal locomotion. Complexity 5:12–22.
Stockwell, R.P., P.Schachter, and B.H.Partee (1968). Integration of Transformational Theories on English Syntax, Los Angeles, CA: Linguistics Department, UCLA.
Stowell, T. (1981). Origins of phrase structure, MIT doctoral dissertation.
Strozer, J. (1994). Language Acquisition after Puberty, Washington, DC: Georgetown University Press.
Takahashi, D. (1994). Minimality of movement, University of Connecticut doctoral dissertation.
Taraldsen, K.T. (1979). On the NIC, Vacuous Application and That-trace Filter, Bloomington, IN: Indiana University Linguistics Club Publications.
Thompson, D.W. (1942). On Growth and Form: A New Edition, Cambridge, England: Cambridge University Press.
Thráinsson, H. (1979). On complementation in Icelandic, Harvard University doctoral dissertation. [Published by Garland Press, New York.]
Toffoli, T. (1999). Action, or the fungibility of computation. In A.J.G.Hey (ed.) Feynman and Computation, Reading, MA: Perseus Books, pp. 349–392.
Travis, L. (1984). Parameters and the effects of word order variation, MIT doctoral dissertation.
Trudgill, P. (1999). The Dialects of England, 2nd edn, London: Blackwell.
Ura, H. (2000). Checking Theory and Grammatical Functions in Universal Grammar, New York: Oxford University Press.
Vergnaud, J.-R. (1974). French relative clauses, MIT doctoral dissertation.
Vergnaud, J.-R. (1977). Letter to Chomsky and Lasnik, 17 April. [Published in R.Freidin, M.L.Zubizarreta, and C.P.Otero (eds) Foundational Issues in Linguistic Theory, Cambridge, MA: MIT Press.]
Vergnaud, J.-R. (1985). Dépendances et niveaux de représentation en syntaxe, Amsterdam: John Benjamins.
Vergnaud, J.-R. (1998). On two notions of representations in syntax. Manuscript.
Vergnaud, J.-R. and M.L.Zubizarreta (2000). Intervention effects in the French wh-in-situ construction: syntax or interpretation? Manuscript.
Wasow, T. (1972). Anaphoric relations in English, MIT doctoral dissertation.
Wasow, T. (1977). Transformations and the lexicon. In Culicover, Wasow, and Akmajian (1977), pp. 327–360.
Watanabe, A. (1992). Subjacency and S-structure movement of wh-in-situ. Journal of East Asian Linguistics 1:255–291.
Watanabe, A. (1995). The conceptual basis of cyclicity. In Pensalfini and Ura (1995), pp. 269–291.
Weinberg, A. (1977). On the structure of gerundive complements in English. Xeroxed paper, MIT.
Weinberg, S. (1976). The forces of nature. Bulletin of the American Academy of Arts and Sciences 29(4): 13–29.
Wieczorek, D. (1994). Ukrainskij pierfiekt na -NO, -TO na fonie polskogo pierfiekta, Wrocław: Wydawnictwo Uniwersytetu Wrocławskiego.
Williams, E. (1975). Small clauses in English. In J.Kimball (ed.) Syntax and Semantics, vol. 4, New York: Academic Press.
Williams, E. (1977). Discourse and logical form. Linguistic Inquiry 8:101–139.
Williams, E. (1980). Predication. Linguistic Inquiry 11:203–238.
Yang, D.-W. (1983). The extended binding theory of anaphors. Language Research 19:169–192.
Yip, M., J.Maling, and R.Jackendoff (1987). Case in tiers. Language 63:217–250.
Zaenen, A., J.Maling, and H.Thráinsson (1985). Case and grammatical functions: the Icelandic passive. Natural Language and Linguistic Theory 3:441–483.
Zubizarreta, M.L. (1998). Prosody, Focus, and Word Order, Cambridge, MA: MIT Press.
Zwart, J.W. (1993). Dutch syntax: a minimalist approach, University of Groningen doctoral dissertation.
Name index
Akmajian, A. 43, 45 n. 1
Anderson, M. 102, 111 n. *, 176 n. 12
Andrews, A. 132, 213 n. 9
Aoun, J. 69 n. 8, 202
Atkinson, M. 311 n. 14
Babby, L. 5, 68 n. *, 71 n. 24, 117, 131 n. 6, n. 7, 132, 138, 143, 155 n. *, 156 n. 7, n. 10, n. 11, 157 n. 13, n. 17, n. 20, 158 n. 25, 297 n. 42, 353
Babyonyshev, M. 155, 156 n. 9
Bailyn, J. 158 n. 24, n. 25, 159 n. 30
Baltin, M. 45 n. 1, 48 n. 24
Barsky, R. 256
Bartsch, R. 252 n. 16
Belletti, A. 136
Billings, L. 156 n. 8
Bobaljik, J. 98 n. 19
Boeckx, C. 141, 142, 155 n. 1
Boehner, P. 335 n. 22
Bohr, N. 317, 331, 332, 334 n. 12, 339 n. 43
Borer, H. 49 n. 26, 215 n. 28
Bošković, Ž. 97 n. 7
Bouchard, D. 189, 215 n. 31
Bowers, J. 253 n. 19
Bresnan, J. 11, 40, 49 n. 26, 72 n. 25, 251 n. 7
Brody, M. 215 n. 29, 325, 336
Brown, S. 98 n. 19
Burzio, L. 114, 121, 131 n. 4, 132 n. 12, 138
Cheng, L. 70
Chomsky, N. 1–7, 9, 11–13, 14 n. 1, 15 n. 2, n. 4, n. 5, 16 n. 8, 16, 21–24, 26–28, 30, 33, 38, 39, 41–44, 45 n. *, n. 1, n. 4, 46 n. 9, 47 n. 14–17, n. 20, n. 21, 48 n. 23–25, 49 n. 27, 50 n. 28, n. 32–33, n. 35, 51–53, 55–57, 60–63, 65, 67, 68 n. 2, n. 3, 69 n. 6, n. 8–9, 70 n. 11, n. 13, 71 n. 18–20, n. 24–25, 72 n. 28–30, 73 n. 32, n. 35, n. 40, 75–78, 80–83, 85–87, 89, 93–94, 96 n. 1–2, n. 4–6, 97 n. 11–13, 98 n. 14, 101–102, 104–110, 111 n. 1, n. 3, n. 5–6, n. 8, 112 n. 11–13, 113, 116, 131 n. 5, 135, 138, 140, 149, 155 n. 1, n. 3, n. 5, 156 n. 6, n. 10, 157 n. 15, 159 n. 35, 163–175, 176 n. 1–2, n. 5–6, n. 10, n. 13, 178, 185 n. 0, n. 3, 191, 194–196, 200–202, 204, 205, 208–209, 213 n. 1, n. 3, n. 5, 214 n. 13, n. 15, n. 18, n. 21, 215 n. 28–29, n. 33, 216 n. 34–36, 217–218, 221–222, 225 n. 2, n. 3, n. 7–8, 231, 232, 234–237, 247, 250 n. 1, 251 n. 6, n. 9, n. 11, 254, 291 n. *, n. 4, 292 n. 5, n. 7–11, 293 n. 14, n. 16–17, n. 19–20, n. 22, 294 n. 23, n. 26, 295 n. 27–31, n. 33, 296 n. 38–39, 297 n. 41, n. 43, n. 46, n. 48, 298 n. 50–51, 299, 309 n. 3, 310 n. 5, n. 7, n. 9–11, 311 n. 13–15, n. 17–18, 313–316, 319–321, 323–326, 332 n. *, n. 1–2, 333 n. 5, 334 n. 11, n. 16, n. 20, 335 n. 21, n. 27,
337 n. 34, n. 36, 338 n. 40, 340, 342, 344, 347, 354, 357 n. 3, n. 5, n. 7, 359 n. 11–13, n. 17, 360 n. 20, 361 n. 24, n. 26–28, 364 n. 38
Cinque, G. 145
Cohen-Tannoudji, C. 323
Cole, P. 131 n. 10, 132 n. 13
Collins, C. 4, 68 n. 3, 92, 93, 98 n. 21
Compton, A.H. 330, 331–332, 338 n. 41–42, 339 n. 44, n. 48
Courant, R. 309
Crain, S. 12, 262, 263, 265 n. 5
Darwin, C. 339 n. 45, n. 47
Debye, P. 338 n. 42, 339 n. 44
Demirdash, H. 70
Dirac, P.A.M. 317, 318, 319, 330, 331, 334 n. 13–14, 338 n. 41, 339 n. 48
Diu, B. 323
Dresher, E. 42, 46 n. 9, 49 n. 27, 295 n. 33
Einstein, A. 317, 318, 330, 331, 338 n. 42, 339 n. 43–44
Emonds, J. 16 n. 12, 24, 46 n. 9, 50 n. 31, 94, 232, 233, 250 n. 3, 296 n. 40, 346
Epstein, S.D. 15 n. 6, 68 n. 5, 88, 92, 303, 310 n. 10
Feyerabend, P. 255
Feynman, R. 322, 323, 334 n. 14, 335 n. 24, 339 n. 42
Fiengo, R. 23, 24, 45 n. 4, 46 n. 9, 48 n. 23, 69 n. 8, 214 n. 12, 281, 295 n. 28, 337 n. 36
Fillmore, C. 75, 76
Fodor, J.D. 41, 75, 174
Fought, J. 333 n. 2
Freidin, R. 5, 15 n. 2, n. 7, 16 n. 15, 24, 51, 54, 62, 68 n. 2, n. 5, 71 n. 18, n. 24, 73 n. 40, 74, 77, 78, 82, 88, 90, 92, 96 n. 3, 104, 110, 111 n. 2, n. 4, 117, 130 n. 1, 131 n. 6–7, 143, 154, 156 n. 11, 157 n. 17, 159 n. 36, 163, 164, 165, 166, 170, 171, 175, 176 n. 2, n. 5, 185 n. 1, 186, 191, 193, 196, 197, 203, 204, 213 n. 1–2, 214 n. 18, n. 20, n. 22, 215 n. 28–29, 217–218, 220, 224, 225 n. 2, 251 n. 8, 252 n. 15, 282, 283, 292 n. 5, 293 n. 22, 294 n. 24, 295 n. 31, n. 33, 296 n. 38, 297 n. 41, n. 43, 305, 308, 310 n. 4, 312 n. 22, 313, 324, 333 n. 4–5, 338 n. 38, 358 n. 7, 359 n. 12–13, 362 n. 29, 363 n. 35, 364 n. 38
Fukui, N. 338 n. 39
George, L. 71 n. 19, 155 n. 3, 181, 191, 213 n. 8, 254
Goodman, N. 290, 314, 333 n. 2
Greene, B. 335 n. 21
Grimshaw, J. 72 n. 25
Groat, E. 92
Gruber, J. 251 n. 8
Harbert, W. 131 n. 10, 132 n. 13, 178, 192, 196, 197, 203, 213 n. 4, n. 9, 214 n. 18, 215 n. 23
Harman, G. 305
Harris, R. 254
Harris, Z. 266–273, 291 n. 1, 292 n. 8, n. 10–11
Harves, S. 155 n. *, 157 n. 12
Hasegawa, K. 279, 294–295 n. 26, 295 n. 29
Hendrick, R. 70 n. 10
Hermon, G. 131 n. 10, 132 n. 13
Higginbotham, J. 70 n. 13, 198, 210, 214 n. 12, n. 20, 225 n. 9
Hockett, C. 333 n. 2
Hornstein, N. 42, 45 n. *, 46 n. 9, 49 n. 27, 69 n. 8, 95, 295 n. 33, 296 n. 40, 308
Huang, C.-T.J. 4, 73 n. 32, 184, 199–200, 215 n. 26
Hymes, D. 333 n. 2
Iatridou, S. 155 n. 3
Ionin, T. 155 n. *, 158 n. 24, 159 n. 32, n. 34
Itkonen, E. 292 n. 11
Jackendoff, R. 132 n. 15, 203, 239, 250
Jacobson, P. 336 n. 30
Jaeggli, O. 54, 69 n. 8, 297 n. 42
Jenkins, L. 50 n. 30, 334 n. 19, 338 n. 39
Johnson, D. 313
Jourdain, P.E.B. 323
Junghanns, U. 145, 158 n. 24
Katz, J. 41, 75, 229, 230, 231, 260, 293–294 n. 22
Kawashima, R. 92
Kayne, R. 49 n. 26, 67, 71 n. 18, 92, 177 n. 15, 251 n. 7, 281, 305, 310 n. 7
Kean, M.-L. 45 n. 2
Kimball, J. 77, 248
King, T.H. 158 n. 29
Kitahara, H. 91–92, 95, 98 n. 15, n. 16
Klima, E. 46 n. 7, 250 n. *
Kneale, M. 335 n. 22
Kneale, W. 335 n. 22
Kondrashova, N. 145
Kornfilt, J. 155 n. 3, 181, 191, 213 n. 8
Koster, J. 45 n. *, 46 n. 9, 47 n. 15, 50 n. 32, n. 34, 67, 69 n. 8, 101
Kovtunova, I.I. 137, 158 n. 22
Kramers, H.A. 331
Kuhn, T. 255
Kupin, J.J. 22, 111 n. 1, 163, 176 n. 1, 265 n. 3, 359 n. 12
Kuroda, Y. 292 n. 11
Lakoff, G. 235, 251 n. 6, 256, 257
Lakoff, R. 231
Laloë, F. 323
Lanczos, C. 323
Langacker, R. 46 n. 7
Lappin, S. 313, 337 n. 36
Larson, R. 71 n. 15
Lasnik, H. 7, 10, 14 n. 1, 15 n. 7, 16 n. 8, n. 14, 22, 24, 33, 38, 39, 44, 46 n. 7, n. 9, 47 n. 16, n. 21, 49 n. 27, 50 n. 35, 53, 54, 68 n. *, n. 5, 71 n. 19, 72 n. 27, n. 31, 73 n. 32, 95, 97 n. 7, 101, 111 n. *, n. 1, n. 2, 130–131 n. 1, 136, 154, 156 n. 5, 163, 176 n. 1, n. 5, n. 8, 186, 191, 198, 204, 213 n. 1, 214 n. 20, n. 22, 215 n. 29, 217, 218, 221–224, 225 n. 3, n. 5, n. 7, n. 8, 226 n. 9, n. 10, 265 n. 3, 292 n. 5, 293 n. 16, 295 n. 33, 296 n. 35, 297 n. 44, 298 n. 50, 309 n. *, 310 n. 9, n. 10,
312 n. 21, 313, 316, 324, 337 n. 35, 338 n. 38, 340–350, 352, 354, 356, 357 n. *, n. 1, n. 5, 358 n. 7, n. 8, n. 10, 359 n. 12, n. 13, n. 15, n. 17, 360 n. 18–23, 361 n. 24, 362 n. 30, n. 33
Lavine, J. 134, 156 n. 8, 158 n. 23, n. 25, 333 n. 5
Lebeaux, D. 189, 220, 225 n. 4
Lees, R. 257, 258, 274, 277
Leighton, R.B. 322, 323, 335 n. 24, 339 n. 42
Levine, R. 313
Lightfoot, D. 45 n. 4, 49 n. 25
McCawley, J.D. 255, 256, 257
McCord, M. 337 n. 36
Mahajan, A. 147
Maling, J. 124, 132 n. 13, n. 15, n. 17, 133 n. 18, 156 n. 8
Manzini, M.R. 213 n. 5
Marantz, A. 138, 305
Martin, R. 16 n. 8, 155 n. 1
Mascaro, J. 45 n. 2
May, R. 7, 54, 69 n. 8, n. 9, 70 n. 13, 112 n. 12, 163, 169, 170, 209, 216 n. 37, 304, 337 n. 36
Miller, G.A. 335 n. 20
Milsark, G. 42, 48 n. 24, 50 n. 31
Nakayama, M. 12
Newmeyer, F.J. 254, 255, 259, 291 n. *
Nunes, J. 82, 98 n. 19
Otero, C.P. 51, 68 n. 1, 291 n. *, 292 n. 6, 298 n. 49, n. 51, 335 n. 20
Pais, A. 330
Partee, B.H. 251 n. 10
Pesetsky, D. 53, 54, 70 n. 10, n. 12, 73 n. 41, 154, 155 n. 1, n. 3, 156 n. 9, 177 n. 15
Pollock, J.-Y. 40, 346, 360 n. 20, 363–364 n. 36
Postal, P. 11, 49 n. 25, 72 n. 25, 75, 176 n. 7, 229, 230, 231, 250 n. 1, 257, 293–294 n. 22, 325
Putnam, H. 317
Quicoli, A.C. 49 n. 26, 281
Rappaport, G. 159 n. 33
Reinhart, T. 46 n. 7, 47 n. 16, 145, 188, 213 n. 6, 226 n. 9
Riemsdijk, H. van 45 n. 1, 101, 112 n. 14, 216 n. 36, 292 n. 12, 321
Rizzi, L. 71 n. 17, 177 n. 17, 213 n. 7
Roberts, I. 363 n. 34
Rochemont, M. 70 n. 10
Ross, J.R. 43, 51–52, 342
Rothstein, S. 156 n. 7
Rouveret, A. 104, 176 n. 2, 191, 213 n. 10
Saito, M. 53, 68 n. 5, 72 n. 27, n. 31, 73 n. 32, 297 n. 44
Sands, M. 322, 323, 335 n. 24, 339 n. 42
Schachter, P. 251 n. 10
Schütze, C. 155
Selkirk, E. 145, 295 n. 28
Shevelov, G. 142
Sigurðsson, H. 132 n. 16, 133 n. 19, 141, 143–144
Slater, J.C. 331
Smith, N. 323
Solan, L. 212
Sportiche, D. 69 n. 8, 71 n. 17, 202, 215 n. 31
Sprouse, J. 364 n. 40
Sprouse, R.A. 113, 132 n. 14, n. 17
Sridhar, S.N. 131 n. 10, 132 n. 13
Stewart, I. 320–321, 322, 334 n. 18, 337 n. 33
Stockwell, R.P. 251 n. 10
Stowell, T. 3, 213 n. 5, 297 n. 45, 360 n. 17
Strozer, J. 310 n. 7
Takahashi, D. 89, 97 n. 7
Taraldsen, K.T. 177 n. 15
Thompson, D.W. 322, 334 n. 18, 335 n. 26
Thráinsson, H. 124, 132 n. 13, n. 17, 133 n. 18
Toffoli, T. 323–324
Torrego, E. 154, 155 n. 1, n. 3
Travis, L. 3, 264
Trudgill, P. 363 n. 34
Ura, H. 155 n. 3
Uriagereka, J. 16 n. 14, 226 n. 9
Vennemann, T. 252 n. 16
Vergnaud, J.-R. 104, 131 n. 5, 154, 176 n. 2, 213 n. 10, 285, 313, 336 n. 28, n. 30, 337 n. 31
Wasow, T. 16 n. 13, 48 n. 23, 176 n. 7, 295 n. 28
Watanabe, A. 71 n. 21, 98 n. 22
Weinberg, A. 50 n. 35
Weinberg, S. 319
Wieczorek, D. 137
Williams, E. 45 n. 1, n. 5, 46 n. 10, 104, 216 n. 36
Yang, D.-W. 217–218, 224
Yip, M. 132 n. 15
Zaenen, A. 124, 132 n. 13, n. 17, 133 n. 18
Zubizarreta, M.L. 145, 337 n. 31
Zwart, J.W. 158 n. 25
Zybatow, G. 145, 158 n. 24
Subject index
Accusative unaccusative constructions 137, 140, 149, 152, 155, 157 n. 12; derivation of 162
Adjunction 90, 95–97, 324, 360 n. 17
Anaphoric index 164, 166–170, 172–173, 176 n. 12
Anaphoric relations 10, 148, 186, 223, 326
Anaphors 8–10, 24–25, 27–29, 47 n. 17, 109–110, 164–167, 170, 176 n. 12, 178, 184, 185 n. 3, 186–192, 194–195, 197–199, 201, 203–205, 212, 213 n. 1, n. 6, 222–224, 226 n. 9
Bare output conditions (BOC) 304
Binary operation GT 81
Binding theory 4, 6, 8–10, 16 n. 17, 149, 159 n. 33, 178–179, 182–182, 185 n. 4, 186–187, 189–190, 192, 197, 201–202, 204–205, 207–208, 211–212, 213 n. 6, 215 n. 25, 217–219, 221–224, 225 n. 5, n. 9, 285, 289, 296 n. 33, 303, 308; Binding principles 6, 8, 9, 164–165, 187–188, 209, 215 n. 25, 217–222; Proper Binding (PB) 25, 27–28, 30, 33–35, 38, 42, 68 n. 3, 88, 152, 170–171, 196–197, 210
Biolinguistic approach 1
Burzio’s generalization 138
Case filter 3–5, 14 n. 1, 15 n. 8, 78, 96 n. 7, 113, 114–116, 124, 128–129, 130 n. 1, 131 n. 1, 133 n. 24, 202–204, 285–286, 300, 303–304, 315, 317
Case phenomena 122, 124, 130, 139; Accusative case 5, 6, 115–118, 121, 123, 132, 134, 139, 143, 353; Case assignment 4, 5, 53, 102–109, 111 n. 5, 15 n. 8, 112 n. 10, n. 11, n. 13, 114–116, 126, 128–130, 131 n. 2, 133 n. 23, n. 25, 213 n. 11, 308, 353; Case-licensing 5, 124–125, 137–138, 142 n. 24, n. 25, 144; Configurational Case 5, 115–119, 121, 123–124, 129–130, 131 n. 6, n. 7; Exceptional Case marking (ECM) 126–127; Generic-case proposal 142; Genitive of negation (GenNeg) 118–119, 131 n. 7, 142, 149, 151; Lexical case index 126; Lexical Case phenomena 5, 113, 115–117, 119–121, 123, 125, 127, 129–130; Nominative case 5, 102, 114–116, 120–122, 125, 131 n. 10, 133 n. 22, 176 n. 5, 191, 336, 315, 333 n. 6
Chain 2, 29–30, 53–60, 62–67, 71 n. 25, 72 n. 25, n. 26, n. 28, 73 n. 37, 82–83, 85, 87–89, 98 n. 23, 142, 182, 310 n. 9, 327, 333 n. 6, 337 n. 35, n. 37, 356; Chain reduction via trace deletion 89
Cognitive synonymy 241
COMP-to-COMP 7, 163, 169–170, 175
Complement Subject Deletion 233
Complementizer 79, 94–95, 105–109, 111 n. 5, 186, 212 n. 11
Complex NP Constraint 43, 52
Computational ambiguity 55
Constraint on Extraction Domains (CED) 4, 15 n. 6, 86–87
Convergent derivation 58, 66, 84, 97 n. 10, 135, 140, 304
Copy theory of movement 82, 87–88, 219
Core grammar (CG) 4, 101, 163, 176 n. 1
Correspondence Principle 317, 334 n. 12
Counter-cyclic derivation 4, 62, 65–66, 86–96, 98 n. 21, n. 22; Counter-cyclic merge 4, 92–93, 96; Counter-cyclic Move 2, 4, 66, 91–94, 96
CP complement 54, 65, 363 n. 35
cyclic derivation 3, 4, 23, 25 n. 8, 60, 65–67, 81 n. 40, 87–89, 91; syntactic cycle 1, 2, 21, 41, 74, 76, 79, 96 n. 1, 283
cyclic domain 3, 22, 23, 26, 45 n. 1, 90; cyclic subdomain 22, 31, 77, 210
cyclic principle 3, 4, 15 n. 6, 22, 41, 74, 77–78, 82–83, 86, 97 n. 11; Extension Condition 3, 15 n. 6, 81–83, 85–87, 89–90, 97 n. 11, n. 13; Generalized cyclicity 95, 96; Strict Cycle Condition (SCC) 1, 21, 51, 62, 65, 67, 78, 97 n. 23, 175, 282
D-Structure 9, 113–114, 121, 124, 131 n. 10, 133 n. 20, n. 25, 209–210, 218–219, 289–299, 301, 305, 316, 346; Elimination of D-Structure and S-Structure 9, 219, 299
Defective category 146, 150
Derivational morphology 238
Double object verb 32
Elementary (transformational) operations 83, 89, 91–92, 96, 98 n. 14, 264, 287, 291, 294 n. 25, n. 26, 295 n. 29, 313 n. 33, 317, 345, 359 n. 12, n. 13
Ellipsis analysis 232–233
Empirical and conceptual/methodological issues 13, 74, 76, 313
Empty Category Principle (ECP) 52–53, 81, 215 n. 33, 300, 315
EQUI analysis 105–107
expressive power 41, 76, 280, 295 n. 26
Extended Projection Principle (EPP) 6, 136, 138, 141, 144, 145, 147, 154–155, 155 n. 5, 156 n. 5, 312 n. 20
Extended Standard Theory (EST) 12, 23, 163, 178, 186, 259
Extension Condition, see cyclic principle
Form Chain analysis 2, 60, 62, 64, 66, 72 n. 28
Full Interpretation (FI) 5, 10, 15 n. 9, 16 n. 9, 57–58, 61, 88–90, 98 n. 16, 224, 299, 303–308, 310 n. 10, n. 11, 311 n. 12, 312 n. 19, n. 22, 315–316, 333 n. 3, n. 5, n. 6, n. 7
Functional category T(ense) 144
Functional Relatedness, see θ-Criterion
Functional Uniqueness, see θ-Criterion
GB framework 53, 55, 186, 205, 217, 220–221
Generalized transformation, see Transformational analysis
Generative semantics (GS) program 11, 254, 257, 259
German 120, 121, 123–124, 129, 131 n. 9, n. 10, 246
Greed 56–59, 64, 67, 71, 73 n. 37, 299
Head Movement Constraint (HMC) 3, 12, 15 n. 6, 85, 89, 348, 361 n. 24
Icelandic 5, 121–123, 126–130, 141–142, 144, 158 n. 25, 192, 196, 311 n. 11
Inclusiveness Condition 9
Indexing and binding 163, 170; Contradictory index 7; Elimination of indexing 186, 223
Last resort (LR) 51, 80–81, 288–289, 297 n. 47, 299, 307–308, 312 n. 21, 315, 348
Least effort 60, 65, 67, 288
Lexical anaphors 28–29, 110, 165, 178, 184, 189, 197, 201, 203
Lexical insertion 35, 45 n. 6, 83, 85, 90, 305
Lexicalist hypothesis 278
Linear Correspondence Axiom (LCA) 4, 92–95, 98 n. 20, n. 21
Logical Form (LF) 10, 35–36, 41 n. 9, 49 n. 25, 52–54, 63, 69 n. 8, 71 n. 20, 101, 134, 164, 173, 209, 218, 219, 221, 260–261, 288, 297 n. 47, 313; see also θ-Criterion
Logical Structure of Linguistic Theory (LSLT) 1, 12, 255–256, 258–259, 268, 270–271, 273–276, 279, 292 n. 4, n. 7, n. 13, n. 25, 294 n. 25
Maximal projection 82, 90, 94, 153, 200, 213 n. 11, 214 n. 11, 310 n. 4, 355
Merge 304, 360 n. 17
Minimal Link Condition (MLC) 80, 84, 86, 89–90, 97 n. 9
Minimalist Program (MP) 9, 12–13, 74, 79, 97, 134–135, 217, 225 n. 2, 289, 299, 301–308, 309 n. 2, n. 3, 310 n. 4, n. 6, 311 n. 14, 313, 324–325, 359 n. 17
Morphophonemics of Modern Hebrew (MMH) 267–270, 289, 291 n. 3, 300 n. 7, n. 10, 297 n. 48, 314, 332 n. 2, 333 n. 2
Movement transformations 12, 40, 114, 131 n. 2, 219, 296 n. 33, 305; Move category 52 n. 28, 234, 326; Move feature 234–235, 326, 331 n. 18; NP-movement 2, 5, 65, 69 n. 6, 78, 86–87, 104, 333 n. 5; NP Preposing 23–24, 41–42, 234, 279–281, 283, 285–286; wh-movement 2, 5, 7, 14 n. 1, 15 n. 3, 57–58, 60, 66–67, 69 n. 6, 72 n. 25, 79, 86–87, 104, 106, 112 n. 11, 147, 158 n. 29, 210–211, 297 n. 43, 352, 359 n. 11
Nested Dependency Condition (NDC) 53–54, 70 n. 13, n. 14, 72 n. 27
Nominative Island Condition (NIC) 6–8, 10, 102–105, 109–110, 111 n. 2, 173–174, 181–183, 185 n. 3, 191, 192, 194, 197, 213 n. 7, 214 n. 22
Opacity effects for NP 180, 185 n. 2, 191–192, 194, 199, 201
Opacity Principle 29–33, 39, 47 n. 15, n. 17, n. 21, 49 n. 26
OVO structures 146
Passive construction 5, 10, 11, 13, 16 n. 12, n. 13, 35, 40, 48 n. 25, 49 n. 25, 65, 75, 94, 112 n. 10, 114–115, 119–121, 123–130, 131 n. 10, 132 n. 16, 134, 136–137, 141–142, 144, 156 n. 10, 157 n. 14, 159 n. 31, 202–203, 209, 229–249, 250 n. 3, 251 n. 6, 252 n. 13, 276–281, 283–288, 293 n. 22, 294 n. 22, n. 23, n. 24, n. 25, 295 n. 26, 297 n. 41, 359 n. 12
Passive transformation 11, 13, 49 n. 25, 75, 240, 248, 250, 276–277, 279–280, 283–284, 288, 294 n. 25, 359 n. 12
Phonetic Form (PF) 57, 134, 218, 288, 313, 325, 342, 352
Pleonastic subjects 8, 178–180, 183
Principle of Lexical Satisfaction (PLS) 5, 126, 127
Principles and Parameters framework 6, 12, 13, 77, 134, 217, 291, 299–301, 304, 313
Principles of grammar 2–4, 12–13, 67, 74, 90, 115, 155, 207, 265 n. 4, 283, 360 n. 17
PRO-Theorem 9
Procrastinate 56–57, 59, 299, 307, 312 n. 19, 354
Proper Binding (PB), see Binding theory
Propositional Island Condition (PIC) 102, 104, 110, 111 n. 2, 163, 171, 173–175, 176 n. 5, n. 12, 177 n. 17, 191, 213 n. 7, 282, 295 n. 30
Q-features 79
Quantifier Raising (QR) 95, 148–149, 216 n. 36, 328, 337 n. 33
Referential index 164–167, 169, 170–172, 175, 183, 194–195
Relativized Minimality (RM) 63, 81–82, 84, 89
Rule schema 101, 304
Russian 5, 6, 115–118, 120, 123, 130, 131 n. 8, 132 n. 12, 134, 136, 137, 139, 140, 142–146, 148–152, 156 n. 9, 157 n. 13, n. 14, 158 n. 22, n. 25, n. 28, n. 29, 159 n. 31, n. 32, n. 33, 246
Segmentation and classification 266, 270, 298 n. 51
Semantic functions 231, 237, 240–241, 250, 251 n. 8, n. 9, n. 11, 252 n. 15, 253 n. 20, 284
Semantic representation 46 n. 9, 101, 164, 238–239, 240–241, 257, 260
Shortest Derivation Condition 91
Shortest Movement Condition 63, 67
Singulary transformation, see Transformational analysis
Spec-CP 2, 4, 14 n. 1, 152, 158 n. 25
Spec-TP 6, 134–136, 139, 141, 147–149, 151, 154, 155 n. 2, 333 n. 5
Specified Subject Condition (SSC) 24, 28, 37, 69 n. 8, 104, 110, 191, 281
Strict Cycle Condition (SCC), see cyclic principle
structural ambiguity 55, 71 n. 23
Structure dependent formulation 262
Subjacency Condition 2, 43, 47 n. 20, 50 n. 34, 52, 78, 163, 295 n. 31, 300; Subjacency Filter 43–44; Subjacency violation 54, 56, 60, 62, 64, 66, 67, 71 n. 19
Superiority Condition 2, 52; Superiority phenomena 69; Superiority violation 53–54, 56, 62, 64, 68 n. 5, 69 n. 6, 70 n. 15, n. 18, n. 19, 72 n. 28, 73 n. 38
Syntactic cycle, see cyclic principle
Tensed-S Condition (TSC) 24, 26, 37, 248, 281
There-Insertion transformation 77
θ-Criterion: Functional Relatedness 16 n. 9, 36–37, 38, 88, 284–285, 288, 296 n. 37, n. 38; Functional Uniqueness 38–39, 49 n. 25, 185 n. 1, 284–286, 288–289, 296 n. 38, 297 n. 47
Trace binding 35–36
Trace deletion 82, 88–89
Trace erasure 25–26, 34, 38, 40 n. 9, 80, 82, 88–89, 303 n. 33
Trace Erasure Prohibition (TEP) 26, 38
Trace theory 1, 2, 9, 23–26, 35, 37, 38, 45 n. 4, n. 5, 47 n. 15, 50 n. 31, 79, 163, 168, 281, 285, 295 n. 28, n. 29; NP-trace 24–25, 47 n. 18, 103, 163, 165, 169, 171, 173, 184, 196, 203–208, 212, 282, 285, 295 n. 33; wh-trace 4, 5, 7, 16 n. 10, 53, 107, 130 n. 1, 131 n. 1, 163, 167–175, 177, 204–208, 211–212
Transformational analysis 11, 14, 94, 232, 234, 241, 257–258, 272, 276, 284, 292 n. 10, 340, 346, 361 n. 28; Generalized transformation 3, 15 n. 4, 75–77, 81, 90, 96 n. 5; Singulary transformation 75–77, 90; see also Merge; Movement transformations
Transformational component 21, 101, 164, 236, 238, 278, 287, 291, 308, 358 n. 7
Ukrainian 6, 134, 136–137, 139, 142, 144, 146, 150, 152, 156 n. 8, n. 10, 157 n. 14, n. 18, n. 19, 158 n. 25, 159 n. 31
Universal Grammar (UG) 2, 34, 47 n. 12, 78, 116, 264, 266, 29
Verb movement parameter 14
Weak Crossover (WCO) 154, 157–159, 168 n. 29, 169 n. 31
wh-feature 57–59, 62, 65, 71 n. 22, 155 n. 4, 311 n. 11
wh-infinitive construction 33, 38, 39
wh-island violation 1, 86
wh-movement, see Movement transformations