Language Acquisition: Knowledge Representation and Processing
Related journals
English for Specific Purposes (Editors: T. Dudley-Evans, L. Hamp-Lyons, P. Master)
Journal of Pragmatics (Editor: J.L. Mey)
Language & Communication (Editors: Roy Harris and Talbot Taylor)
Language Sciences (Editor: Nigel Love)
Lingua (Editors: John Anderson and Neil Smith)
Neuropsychologia (Editor: S.D. Iversen)
Speech Communication (Editor-in-Chief: C. Sorin)
System (Editor: Norman Davies)
Trends in Cognitive Sciences (Editor: Peter Collins)
Free specimen copies of journals available on request.

Related book series
Language & Communication Library (Series editor: Roy Harris)
Current Research in the Semantics/Pragmatics Interface (Series editors: K.M. Jaszczolt and R. Turner)
Advances in Psychology (Series editor: G.E. Stelmach)

For more information on all titles in linguistics and related fields, go to the Elsevier Science website at: http://www.elsevier.nl or http://www.elsevier.com
Language Acquisition: Knowledge Representation and Processing
Antonella Sorace, Caroline Heycock, Richard Shillcock (Editors), University of Edinburgh
1999 North-Holland Amsterdam - Lausanne - New York - Oxford - Shannon - Singapore - Tokyo
ELSEVIER SCIENCE Ltd
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

© 1999 Elsevier Science Ltd. All rights reserved. This work and the individual contributions contained in it are protected under copyright by Elsevier Science Ltd, and the following terms and conditions apply to its use:
Permissions may be sought directly from Elsevier Science Rights & Permissions Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Rights & Permissions directly through Elsevier's home page (http://www.elsevier.nl), selecting first 'Customer Support', then 'General Information', then 'Permissions Query Form'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (978) 750 8400, fax: (978) 750 4744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 171 436 5931; fax: (+44) 171 436 3986. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Subscribers may reproduce tables of contents for internal circulation within their institutions. Permission of the publisher is required for resale or distribution of such material outside the institution. Permission of the publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Contact the publisher at the address indicated. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the publisher. Address permissions requests to: Elsevier Science Rights & Permissions Department, at the mail, fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 1999

British Library Cataloguing in Publication Data
A catalogue record from the British Library has been applied for.

Library of Congress Cataloging-in-Publication Data
Language acquisition: knowledge representation and processing / edited by Antonella Sorace, Caroline Heycock, Richard Shillcock. p. cm. Papers presented at a conference held in Edinburgh, 1997. Includes bibliographical references. ISBN 0-08-043370-7 hc 1. Language acquisition--Congresses. I. Sorace, Antonella. II. Heycock, Caroline B., 1960-. III. Shillcock, Richard. P118.L2539 1999 401'.93--dc21 98-51947
CIP
ISBN: 0-08-043370-7

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
For
Teun Hoekstra (1953-1998)
in memory
Contents

A. Sorace, C. Heycock, R. Shillcock, Introduction: Trends and convergences in language acquisition research  1
K. Wexler, Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage  23
T. Hoekstra, N. Hyams, Aspects of root infinitives  81
A. Radford, Genitive subjects in child English  113
B.D. Schwartz, The second language instinct  133
B.B. Tesar, P. Smolensky, Learning Optimality-Theoretic grammars  161
P.W. Jusczyk, Constraining the search for structure in the input  197
S. Pinker, Words and rules  219
Author index  243
Subject index  253
Lingua 106 (1998) 1-21
Introduction: Trends and convergences in language acquisition research*

Antonella Sorace (a,*), Caroline Heycock (b), Richard Shillcock (c)

(a) Department of Applied Linguistics, University of Edinburgh, 14 Buccleuch Place, Edinburgh EH8 9LN, UK
(b) Department of Linguistics, University of Edinburgh, Adam Ferguson Building, George Square, Edinburgh EH8 9LL, UK
(c) School of Cognitive Science, Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK
Language acquisition has a special place in cognitive science, and has become a fertile ground for interdisciplinary research: this volume presents some current cutting-edge research which illustrates this fruitful convergence of endeavours. The contributions to the volume originated as invited talks presented at the GALA '97 Conference entitled 'Language Acquisition: Knowledge Representation and Processing', held in Edinburgh in April 1997.(1) The aim of the conference was to promote cross-fertilisation across the different disciplines involved in research into language acquisition, and the collection presented here reflects this aim very directly.

* We would like to thank the authors contributing to this volume for the dedication and thoroughness that they demonstrated in the preparation, revision, and submission of their manuscripts. We are also grateful to the reviewers, whose insightful critiques and suggestions - often produced under severe time pressure - were invaluable.
* Corresponding author. Phone: +44 131 650 3961; fax: +44 131 650 3962; e-mail: [email protected]
(1) The proceedings of the conference, containing all other talks and poster presentations, have been published as A. Sorace, C. Heycock and R. Shillcock (eds.), 1997. Proceedings of the GALA '97 Conference on Language Acquisition. Edinburgh: University of Edinburgh.

0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0024-3841(98)00028-X

1. Background

The last decade has witnessed a significant shift in the cognitive paradigm applied to language acquisition. Until perhaps forty years ago, the study of language acquisition was largely restricted to the observation of individual case studies of child development (first language or L1 acquisition), and to the improvement of teaching methods for foreign languages (second language or L2 acquisition). Theoretical orientations began to emerge largely as a result of the influence of generative linguistics, which firmly places language acquisition at the core of linguistic research. From this perspective, addressing the 'logical problem of language acquisition', that is, the ease and uniformity with which children acquire the ambient language in spite of the fact that they are exposed to qualitatively and quantitatively uneven input, motivated the research programme focused on the child's innate language-specific predispositions, or Universal Grammar (UG). However, from a different, more psychological perspective, tackling the problem of how language is acquired meant accounting for learning processes, that is, how the child's developing general cognitive abilities interact with the properties of the ambient input, and how both assist the child in the acquisition task. Similar orthogonal perspectives can still be found in the field of theoretical second language development, which is approached, on the one hand, by linguistically-oriented researchers with the intent of discovering whether and how Universal Grammar constrains second language grammars, and on the other hand by psychologically-oriented researchers with the aim of explaining second language learning as the result of general cognitive strategies and social interaction (see e.g. Bhatia and Ritchie, 1996).

A different but related dichotomy that until recently characterised language acquisition research was the split between two different objects of inquiry: knowledge representation and processing mechanisms. Theoretical linguistic research into knowledge representation was addressed to the 'mental grammars' that learners construct on the basis of the universal notions and principles of Universal Grammar on the one hand, and the actual sentences they hear in their everyday experience on the other.
Psycholinguists, in contrast, were primarily concerned with the processing mechanisms involved in actually using language, i.e. understanding and speaking, and how these mediate the acquisition process. This research ranges from speech perception (how listeners distinguish the sounds of their language) to parsing (how listeners group the words of a sentence into syntactic units) to constructing models of discourse (how listeners process the meaning of sentences in particular contexts). Language acquisition research on these topics focuses on the development of these abilities in the child, and how they interact with the developing knowledge of grammar. Developments over the last decade have changed the picture in profound ways, blurring the early dichotomies and bringing about an increasing diversification of perspectives and approaches. As Bloom (1994b: 1) puts it, '... there is no single process that one could call 'language acquisition'; instead, we are left with the task of explaining different acquisitions'. While this might look like fragmentation to the casual observer, we regard it as an unambiguously positive sign of vitality. Because of such diversification, current research exhibits a much higher degree of theoretical and methodological sophistication than was common even in the recent past. In so far as learnability constraints inform theories of grammar, developments in formal linguistics and language acquisition research go hand in hand; at the same time it has become clear that explanations of how the parser operates are central to the developmental problem: positive evidence can be informative only if the learner is able to assign it a structure (see Frazier and de Villiers, 1990). Moreover, experimental psychology and computer science have made available new techniques and concepts for
the study of both young infants and children (see e.g. Jusczyk, 1997), and adult informants and learners (e.g. Bard et al., 1996). Powerful, fast computers are increasingly being used in concert with very large corpora of language - both text and transcribed speech - to derive generalisations about language that were not visible to smaller-scale analyses. The establishment of the CHILDES database (MacWhinney, 1995) has provided access to data from the acquisition of many languages, as well as procedures for analysis. A related development has been the growth of computational modelling as a field (e.g. Broeder and Murre, forthcoming), whose purpose is to implement theoretical constructs, thus forcing a very explicit formulation of assumptions. Connectionism, as one of the main computational paradigms of the last decade, has raised new questions concerning our understanding of learning and how it might occur in the brain. Because of their sensitivity to input, connectionist models are often seen in diametrical opposition to symbolic-deductive theories of language and language acquisition; however, this is not necessarily a valid opposition, since the initial definition of the architecture and learning algorithms of connectionist systems is important in understanding the emergence of linguistic structure in the model (see Plunkett, 1995). Connectionist models have become particularly well-known in the context of the acquisition of English verb morphology (Rumelhart and McClelland, 1986; Marcus, 1995; Hahn, Nakisa and Plunkett, 1997, among others), and have often been presented as alternatives to rule-based explanations of the process, giving rise to a debate which finds an echo in the paper by Pinker (this volume). The earlier polarisation between a focus on data and a focus on theory (e.g. 'Child Language' vs.
'Language Acquisition' in Ingram, 1987) also seems to have been largely abandoned, as shown by the recent publication of books on experimental methods for eliciting syntactic data from children (e.g. McDaniel et al., 1996; Crain and Thornton, 1998) - a visible sign of the much enhanced methodological standards that now characterise 'developmental linguistics'. The cross-disciplinary fertilisation of ideas is due to the fact that researchers of different orientations are now, to a much greater extent than before, actively acquainted with each other's work. The contributions to the present volume illustrate the convergence of distinct kinds of theories, tested against distinct kinds of data. Indeed, they reveal at least the basis of a common agenda in current language acquisition research. The papers on L1 acquisition (Hoekstra and Hyams, Jusczyk, Pinker, Radford, Tesar and Smolensky, and Wexler) exhibit a consensus on the following issues:
• Language acquisition is a tightly constrained process that is biologically predisposed to follow certain paths; it is, in fact, even more constrained than was previously thought.
• Basic knowledge of language is acquired very early, in the first two years of life, much of it probably before the emergence of production.
• Much acquisition is perceptual, and not dependent on direct negative evidence.
Schwartz's paper, the only one that deals with theoretical L2 acquisition, points to another generalisation on which there is much agreement in this field:
• Non-native grammars may be non-convergent with respect to the target grammar, but are UG-constrained.
Taken together, the papers in this volume offer an explanation of constraints on language development, for both L1 and L2 acquisition, that crucially hinges on an analysis of early transitional grammars, and particularly on a precise definition of what constitutes the acquirer's 'initial state'. A theory of L2 acquisition, in addition, has the responsibility of accounting for final states, which are not deterministically and uniformly attained, as in L1 acquisition, but often differ from the target and from each other. Before introducing the content of the individual papers, we will provide some background to the discussion, focused around these points of convergence. Our review of the main issues will necessarily be brief and restricted to questions of relevance to the papers included in this volume (for more details, and different points of view, see Bloom, 1994a; Fletcher and MacWhinney, 1995), but will touch on some of the central concerns in the debate, including controversial questions that have not yet received an answer.

2. Language acquisition is a tightly constrained process

The existence of constraints on acquisition makes life easier for the learner, facilitating fast convergence on the target. However, it raises the opposite question: why do children not converge on the target immediately? In other words, why is acquisition not instantaneous, or why is there a 'developmental problem' as well as a 'logical problem'? Consequently, even given a shared assumption of universal constraints on acquisition, it is still an open question as to how much of UG characterises the initial stage: are all the principles of UG there from the start, or are at least some of them genetically scheduled to mature at a later stage? There is a lively debate in the literature between proponents of the Continuity Hypothesis (e.g.
Hyams, 1986, 1996; Clahsen, 1992; Penner and Weissenborn, 1996, among others) and supporters of the Maturationist Hypothesis (e.g. Borer and Wexler, 1987; Radford, 1990; Felix, 1992): this debate is also reflected in the present volume. Simplifying somewhat (see Meisel, 1995 for fuller details), the Continuity Hypothesis holds that UG in its entirety constrains child grammars at all stages of development: delays and child-adult differences at any stage are due not to divergences in syntactic representation, but rather to lack of knowledge in domains other than UG. This position is the most parsimonious in that it requires the fewest developmental changes (Pinker, 1984; Crain, 1991, among others); it is illustrated in this volume by Hoekstra and Hyams' contribution. By contrast, the Maturationist Hypothesis maintains that not all of UG is present in early child grammars. There are two kinds of maturationist position. In one, often called the 'Structure-building' account, it is argued that functional categories (e.g. INFL and CP) are absent in children's earliest production because they mature in the third year of life (e.g. Radford, 1990, 1996; Vainikka, 1993/94; Lebeaux, 1988;
Powers and Lebeaux, 1998). The other position - often labelled 'UG-constrained maturation' - holds that principles of UG are present from the start, but specific elements of linguistic knowledge are genetically scheduled to become operative at a later stage. For example, Borer and Wexler (1987, 1992) propose that A-chains, which are involved in the derivation of passive verbal constructions and other instances of NP-movement, are initially unavailable to children. Early child syntax is more restrictive than adult syntax because it is affected by developmental learning constraints ('proto-principles', in Borer and Wexler's (1992) definition) that uniquely characterise child grammars and generate a subset of the representations that are possible in adult grammars. This position is argued for in Wexler's paper (this volume), which proposes that one of these early constraints is responsible for a variety of features of early child grammars, such as non-finite forms in matrix clauses, oblique case assignment, and null subjects. Thus, both maturationist positions regard change as a series of grammar-internal restructurings, due either to the maturation of late-appearing UG principles or to the withering of developmental constraints, leading to the consequent reinterpretation of the input data. A different explanation of what constrains child grammars, which is however consistent with many of the assumptions of the Continuity position, is offered by Optimality Theory (OT) (Prince and Smolensky, 1993). OT is a formal theory of UG that assumes that (a) universal constraints are ranked in language-specific ways, and (b) constraints can be violated: more specifically, lower-ranked constraints may be violated in order to satisfy higher-ranked constraints. 
Within this approach, the child is assumed to be equipped with knowledge of the abstract form of an OT grammar, as well as with an interpretive parser, which is essentially the same mechanism used by the competent adult speaker in assigning a structural description to an overt form in the input (see Smolensky, 1996; Tesar and Smolensky, 1998), and a constraint demotion mechanism, which is a procedure for learning constraint rankings. Learners use this knowledge of the abstract form of an OT grammar from the start to refine their ranking hypotheses and converge on the ranking required by the target language. In so doing, they rely on an iterative algorithm, which involves repeated alternations between structure-assignment (by means of interpretive parsing) and grammar learning (by means of constraint demotion). Tesar and Smolensky (this volume) show that the iterative algorithm can be proved to converge very fast on the basis of limited evidence, hence confirming the effectiveness of the constraints imposed by OT on the acquisition space.

Whereas speech perception researchers have traditionally been less uniformly committed to the idea of domain-specific constraints, they are in agreement that children's sensitivity to certain properties of the input develops at a very early stage in infancy. Research on language perception has increasingly revealed its scope for illuminating language acquisition. As in the rest of language acquisition research, a dominant question has been the issue of what representations or processes are innate and specific to language, as opposed to being learned and grounded in general, even non-species-specific, perceptual and cognitive constraints. The field has relied on the development of methods to study infant speech perception; techniques involving non-nutritive sucking (in the youngest infants) and head-turning behaviour (in older
infants) have led to a clearer understanding of the discriminations that infants can make between different speech stimuli, but the issue remains of whether the discriminations are being made at a linguistic or non-linguistic level. Nonetheless, researchers agree on the fact that children's sensitivity to potentially linguistic properties of the input develops very early in the first year, in response to the speech input, and is shadowed by a decline in sensitivity to distinctions not frequently found in the input (see Werker, 1993, for a review). Although there have been demonstrations that other species are capable of making some of the same distinctions (see, e.g., Kuhl and Miller, 1975), it is still possible that the speed of learning the discrimination, or the initial tuning to it, reflects some distinctly human capability. Although research has strengthened the claim that at least some parts of language acquisition, such as the role of perceptual constancy for instance, reflect general cognitive and perceptual abilities, such research cannot rule out the possibility that differences between human and non-human performance reflect the use of innate, language-specific components to acquisition, or at least the very early recruitment of general cognitive and perceptual processes to an emerging language module.

Computational perspectives have become increasingly influential, allowing the implementation and testing of particular theories (see, in particular, the debate concerning learning of the past tense, below) and the statistical analysis of large corpora of speech to children (see, e.g., Cartwright and Brent, 1994; Cairns et al., 1997). Such studies may reveal a more sophisticated picture of the information that is in the speech stream, offsetting somewhat the traditional notion that the speech input is irredeemably noisy and impoverished.
In general, though, such studies can only show that the information is potentially available, and they are not able to say that a particular generalisation is actually used, or that a particular source of information is more salient than another for the infant. Also from this computational perspective (Elman, 1993; but see also Newport, 1990), it has been proposed that an important general constraint that mediates language acquisition is the size of the processing window onto the speech input granted by the infant's developing memory and attention: a very small window may be advantageous in allowing the infant initially to notice and encode only very local relationships within the speech, and thereby prioritising local over long-distance dependencies - 'less is more'. Indeed, Jusczyk (this volume) reports experiments on the effect of larger or smaller amounts of intervening material in processing syntactic discrepancies in speech heard by infants, demonstrating the potential relevance of this perspective. In general, the increase in knowledge about the infant's sensitivities to the information in the speech input has lent credence to the tradition of 'bootstrapping' theories, in which one type of information - syntactic, semantic or prosodic, in various approaches - provides evidence for the establishment of structured relationships at another level. Thus, for instance, in prosodic bootstrapping, prosodic (and other) information in the speech signal is taken to provide evidence for syntactic structure (Gleitman and Wanner, 1982; Morgan and Demuth, 1996). Overall, then, research on the infant's developing sensitivities to different aspects of the speech signal has provoked models of how the infant develops competence at the
lexical and the syntactic levels, and has provided a richer context for the ongoing debate about the potential role of innate, specifically linguistic capacities in language acquisition.
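The kind of strictly local, distributional information that a small processing window makes available can be illustrated with a minimal sketch. The syllable stream below and the 'words' in it are invented for illustration (in the spirit of statistical-learning work on segmentation), and do not come from any of the studies cited above:

```python
from collections import Counter

def transition_probs(syllables):
    """Forward transitional probability P(next | current) computed over
    adjacent syllable pairs only -- a statistic available even to a
    learner who can encode nothing beyond very local relationships."""
    pair = Counter(zip(syllables, syllables[1:]))
    first = Counter(syllables[:-1])
    return {(a, b): n / first[a] for (a, b), n in pair.items()}

# Toy 'speech stream': the invented words pre.tty and ba.by repeated
# in varying order, with no pauses marking the word boundaries.
stream = 'pre tty ba by ba by pre tty ba by pre tty'.split()
tp = transition_probs(stream)

# Within-word transitions are perfectly predictable...
print(tp[('pre', 'tty')], tp[('ba', 'by')])              # 1.0 1.0
# ...while transitions out of 'by' (a word-final syllable) are not.
# (With only two word types, some boundary transitions can still be
# deterministic by accident, e.g. tty -> ba here.)
print(round(tp[('by', 'pre')], 2), round(tp[('by', 'ba')], 2))  # 0.67 0.33
```

Dips in transitional probability of this kind are one concrete example of information that is demonstrably present in the stream, while leaving open the question, noted above, of whether the infant actually uses it.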
3. Basic knowledge of language is acquired very early

The early appearance of linguistic knowledge has long been regarded as among the best evidence for UG constraints. There is now a substantial body of evidence that suggests that much knowledge of complex syntactic properties is acquired fast and is in place shortly after the child's second birthday (see Hoekstra and Schwartz, 1994; Clahsen, 1996; and references therein; for a note of caution, see Atkinson, 1996): this knowledge includes basic word-order parameters and the properties of inflection. It is still controversial, however, whether such evidence can be said to characterise the initial stage of syntactic development, or whether it belongs to a stage subsequent to 'First Syntax' (see Paradis and Genesee, 1997, for comments). Whereas it is possible to argue in favour of one or the other possibility on theoretical grounds, only extensive research on pre-linguistic and one-word stages can provide empirical evidence relevant to this question. This is an area where speech perception and linguistic research have a lot to offer each other; as Jusczyk's paper (this volume) suggests, there is a promising line of investigation that explores the existence of universal linguistic constraints in very young infants, using experimental techniques, such as the Headturn Preference Procedure, suited to the pre-linguistic stage.

What about the other side of the conundrum, that is, the obvious and protracted divergences between child and adult grammars? Many current accounts within the UG framework favour an explanation of non-adult forms produced by children in terms of underspecification of functional categories, which is held to be responsible for a range of morphosyntactic reflexes. However, the term 'underspecification' has not received a uniform interpretation in the literature.
For Wexler (1994; this volume), it means the absence of a particular functional head (Tense or Agr), which may itself be caused by the presence of a developmental constraint in child grammars (in a similar vein, Rizzi's (1994) 'Truncation Hypothesis' assumes that child grammars may lack the principle 'CP = root'). Hyams (1996), on the other hand, argues that functional nodes, both in the clausal and nominal domains, may be underspecified in the sense of 'unindexed' - not part of syntactic chains that anchor the event or the referent. Hoekstra and Hyams (this volume) develop this analysis further, arguing that this lack of anchoring can be traced back specifically to the underspecification of the Number feature. Yet another definition of underspecification is provided by Clahsen (1996; see also Clahsen et al., 1994) within the Lexical Learning Hypothesis. The assumption of this theory is that UG principles are available in early child grammars, but language-specific functional heads and their feature specification are learned gradually on the basis of exposure to lexical and morphological items in the ambient input. Early child grammars are characterised by underspecified functional projections that have fewer features than
the corresponding positions in adult grammars; other functional projections, as well as the syntactic features required by the target language, are gradually added as a result of lexical learning. All these theories adopt underspecification as an explanatory mechanism that is meant to account for non-adult forms and developmental delays in child grammars under a UG-constrained scenario. The introduction of underspecification into child grammar has the advantage of making the acquisition process more strongly deductive than it was before. Particular syntactic phenomena that were thought to be independent 'parameters' are now seen as derivative. The Agr/Tense underspecification account, for example, allows us to see how phenomena such as absence of finite morphology, null subjects, lack of determiners and oblique case-marking are related.

However, an underspecified grammar may exhibit certain theoretically problematic features. The most obvious of these is optionality, which is limited in adult grammars but quite pervasive in child grammars. Optionality is difficult to account for in current formal theories of grammar: this is true of both Minimalism and Optimality Theory. What does it mean to say that the child can, for example, optionally project either Agr or Tense (Wexler, this volume), or optionally choose CP as the root node (Rizzi, 1994)? The favoured solution (and the only one permissible in current theoretical terms; see Fukui, 1993; Anttila, in press) assumes that both a derivation with e.g. Tense and one without are equal in terms of economy: this assumption is exemplified in Wexler's paper (this volume), and by the notion of a 'tie' between constraints in Tesar and Smolensky's paper (this volume). However, this type of solution leaves much to be explained: in particular, it provides no insight into the relative frequency of the variants, far less the observed systematic changes in these frequencies over the course of development.
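The way a 'tie' between constraints generates optionality can be made concrete with a small sketch. The evaluation procedure below is a generic OT-style comparison, and the constraint names (PARSE-TENSE, *INFL) and violation counts are invented for illustration; they are not the formulations used by any of the papers in this volume:

```python
# A minimal sketch of OT evaluation with ranking strata: candidates are
# filtered stratum by stratum, and constraints in the same stratum are
# tied, pooling their violations.

def optimal(candidates, strata):
    """Return every candidate that survives filtering by each stratum.

    candidates: dict mapping candidate name -> dict of constraint violations
    strata: list of sets of constraint names, highest-ranked stratum first.
    """
    survivors = list(candidates)
    for stratum in strata:
        cost = {c: sum(candidates[c].get(k, 0) for k in stratum)
                for c in survivors}
        best = min(cost.values())
        survivors = [c for c in survivors if cost[c] == best]
    return survivors

# Hypothetical child grammar: a constraint requiring overt tense marking
# is tied with an economy constraint against projecting inflection, so a
# finite and a non-finite form of the same clause are both optimal.
candidates = {
    'he goes': {'*INFL': 1},        # violates economy of projection
    'he go':   {'PARSE-TENSE': 1},  # violates tense realisation
}
child_ranking = [{'PARSE-TENSE', '*INFL'}]    # one stratum: tied
adult_ranking = [{'PARSE-TENSE'}, {'*INFL'}]  # tense outranks economy

print(sorted(optimal(candidates, child_ranking)))  # ['he go', 'he goes']
print(optimal(candidates, adult_ranking))          # ['he goes']
```

Note that a tie of this kind generates the two variants but says nothing about how often each occurs.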
A. Sorace et al. / Lingua 106 (1998) 1-21

One possible way out of this problem would be to introduce probabilities into the grammar, which would obviously affect some of the current basic foundations of formal grammatical theories: recently, some modifications of Optimality Theory have been proposed which essentially adopt this solution (Hayes, in press; Boersma, 1997). Another solution to the optionality generated by underspecification, consistent with current Minimalist assumptions about the nature of grammar, is to exclude it from the domain of syntax proper and to ascribe it not to deficits in syntactic representations, but rather to non-target knowledge in other domains, such as pragmatics or semantics. Hoekstra and Hyams (this volume) thus attribute children's freedom to leave Number underspecified to a lack of knowledge of interface conditions. Wexler (this volume) allows for derivations in child grammars which omit Tense and yet converge, because Tense is required not by the computational system but only by a principle of the interface with the conceptual system. Future research will be needed to explain how exactly underspecified grammars become specified, and how exactly knowledge of 'interface' conditions develops and interacts with the grammar, since there is every reason to believe that this development is as strictly constrained as that of syntactic knowledge. The more child grammars are constrained (particularly under the maturationist scenario), the more the acquisition paths for different languages should resemble each other. Wexler (this volume), for example, argues that what look like language-specific differences in the distribution of optional infinitives reduce to the application of a single developmental constraint, plus knowledge of the default Case and of the specific value of the Null Subject parameter. An obvious testing ground for these theories is bilingual (L1) acquisition, in which two languages develop in the same brain, and thus the effect of individual variation is eliminated. Although the evidence from studies of bilinguals is still limited, it suggests that functional categories may develop at different times in different languages acquired simultaneously (see Paradis and Genesee, 1996, 1997), thus providing an interesting challenge to the maturationist account.
4. Much acquisition is perceptual

Early models of language acquisition within the Principles and Parameters framework emphasised the distinction between UG principles, assumed to be known innately, and language-specific parameters, which had to be learned on the basis of exposure to a particular language. These models therefore allowed for the possibility that children could explore the range of parameterised options permitted by UG, and sometimes mis-set parameters (e.g. Hyams, 1986; Jaeggli and Safir, 1989; Roeper and Williams, 1987); they proposed solutions to the problem of how children retreat from such errors in the absence of negative evidence (cf. Berwick, 1985; Matthews and Demopoulos, 1989). Recent research, by contrast, suggests that many syntactic and perceptual parameters are set at a very early stage. Indeed, parameters may be set before production abilities emerge (thus showing the implausibility of theories that assume that the child learns from his or her own production; e.g. Elbers, 1995, forthcoming); this is definitely the case for perceptual parameters, but has not been conclusively demonstrated for syntactic parameters. One consequence of early parameter setting is that learning from negative evidence is an impossibility, since negative evidence - at least in the narrow sense of explicit correction - presupposes the presence of incorrect forms in children's production. Considerably less clear is the role of indirect negative evidence: if learners expect to find a form in the input which does not occur, it is not obvious how long they have to wait before concluding that it will never occur (see Valian, 1990). The role of indirect negative evidence within Optimality Theory is much more clearly defined, as discussed by Tesar and Smolensky (this volume).
Since learners are assumed to have access to the universal Gen function, they know what the competitors are for each piece of positive evidence to which they assign a structure; given the theoretical requirement that there be only one optimal candidate within each set, learners will indirectly know that the competitors are sub-optimal. The other important consequence of children's learning without access to direct negative evidence is that a new burden is placed on 'triggering' as an explanation of what causes developmental changes. Even a short review of the conceptual and empirical problems raised by the notion of triggering is a task that cannot be accomplished here (see Atkinson, 1992 for discussion). Suffice it to say that triggering, as opposed to learning, involves less exposure to the input and a
much narrower margin for retreating from errors: external triggers must therefore have precise structural properties that allow the child to converge fast on the target. Formal approaches to learnability have addressed these questions (Gibson and Wexler, 1994; Clark, 1992; Clark and Roberts, 1993). Triggers, as already mentioned, may also be grammar-internal: under the maturationist scenario, what causes a shift from one stage to the next are changes internal to the acquirer due to her reaching particular developmental stages. The attainment of these developmental milestones confers on the child the ability to access properties of the input that have hitherto been filtered out.
5. Non-native grammars may be non-convergent but are UG-constrained

The acquisition of a second language in adulthood is evidently different from the acquisition of a native language, because the adult learner already knows (at least) one other language: the initial states of the child and of the adult are not the same. It is also clear that the attainment of native-like competence in a second language is the exception rather than the rule. Unlike children, who reach perfect mastery of whatever language they are exposed to, many adults display varying degrees of 'imperfection' in second languages, even after long periods of exposure; and even those who are capable of native-like performance often have knowledge representations that differ considerably from those of native speakers (Sorace, 1993). So not only the initial states but also the final states of the child and of the adult learner are different. Yet there are important, albeit less obvious, similarities between the two processes, as much recent theoretical research on second language development has demonstrated. The central concern for a theory of L2 acquisition is to determine whether, and to what extent, UG still constrains the process: given that adults can rely on general cognitive abilities, it is at least conceivable that they may use them, instead of UG, in the task of learning a second language, particularly if UG, for neurological reasons, ceases to operate after a certain age. What emerges from recent research, however, is a picture in which UG plays a fundamental role in shaping non-native grammars at all stages of development. Whereas theories that posit an essential identity between the processes of L1 and L2 acquisition seem both intuitively and empirically implausible (see the discussion in Epstein et al., 1996), there is now a consensus that UG and the learner's L1 are the two forces that interact to shape L2 development (see White, 1989, 1996; Schwartz and Sprouse, 1994).
Precisely how much of the L1 characterises the initial state, however, is an open question, as Schwartz's paper (this volume) illustrates very well. The position Schwartz holds is that the L1 grammar in its totality represents the L2 initial state; the learner then progressively restructures this system on the basis of exposure to the input and of a failure-driven mechanism which causes the reanalysis of input that cannot be accommodated by the current grammar. This line of research requires going beyond the 'identity assumption' (Sorace, 1996), namely the belief that the only proof of the availability of UG is the correspondence/identity between the learner's grammar and the L2 grammar, in terms of either interim stages
or final states: non-native grammars may diverge from the L2, but they are still 'natural' grammars, instantiating options that are within the UG-constrained range (Finer and Broselow, 1986; Thomas, 1991; White, 1996). As Schwartz (this volume) suggests, such divergence is to be expected precisely because the adult learner's initial state is different from the child's: in progressively restructuring the grammar upon exposure to the input, adult learners necessarily resort to different analyses from those entertained in L1 acquisition, and to a multiplicity of UG mechanisms that are not necessarily needed by the L1 acquirer at corresponding stages of development. Also to be expected is the fact that L2 grammars are generally characterised by protracted optionality, in contrast with child grammars, for which optionality is a transient phenomenon. L2 learners may not receive sufficiently robust evidence to be able to expunge non-target optional variants, regardless of whether these are derived from the L1 or not. Optionality, however, may not necessarily be the hallmark of a 'rogue' grammar (contrary to what is suggested in e.g. Towell and Hawkins, 1994): as discussed in Schwartz's paper (this volume), it may be a property of a natural language grammar (see also Robertson and Sorace, in press; Sorace, 1998).
6. Overview of papers

The phenomenon on which Wexler's chapter is based is the Optional Infinitive (OI) stage, also analysed (with a different focus and different conclusions) in the chapter by Hoekstra and Hyams. Crucially, in some languages children go through a stage of alternating between finite and nonfinite verb forms in root clauses; that is, they (optionally) produce Root Infinitives (RIs). Other constraints on finite and nonfinite forms found in the adult language are, however, already mastered. Wexler offers a radically new analysis of the OI stage, which is also presented as a demonstration of two larger claims: one about the course of language acquisition, and one about the relation between developmental psycholinguistics and linguistic theory. With respect to the course of language acquisition, Wexler argues that, in contrast to what he takes to be generally accepted assumptions, 'basic parameters' are set correctly, and the properties of inflectional elements learned, very early - at the latest by the time there is any production evidence for them at the two-word stage - and that, by contrast, some aspects of UG have to mature over time. As for the relation between developmental psycholinguistics and linguistic theory, Wexler makes a case for results from the study of child language being crucial to linguistics, and in particular being in some cases more informative about the nature of UG than results from adult language. Wexler's starting point for his reanalysis of the OI stage is the consensus in the literature that the OI stage is principle-based and not construction-specific. Children who alternate between finite and nonfinite forms in various languages show knowledge of the basic parameters of verb movement at the earliest observed stage.
This is one instance of Wexler's general hypothesis that parameters are set correctly at the latest by the earliest observable stage of production; a further instance is his differentiated analysis of missing subjects in the language of children exposed to pro-drop
and non-pro-drop languages. Wexler then considers the Agr/Tense omission model developed by Schütze and Wexler (1996), which accounts for the Nom/Acc case alternation on the subjects of root infinitives. One aspect of this analysis is the availability of default case for the subject when Agr is absent. This availability raises the question of what forces leftward movement of the subject from its assumed base position in the Verb Phrase in these sentences, since the original proponents of the VP-internal subject hypothesis argued that this movement (in adult languages) was driven by the need for the subject to be assigned nominative Case. More recently, however, Chomsky (1995) has argued on the basis of quite different data that the necessity for a subject to appear in this external position is due to a feature of the functional heads requiring a DP, rather than to the assignment of Case. For adult grammars the argument is somewhat indirect and depends on assumptions about intermediate stages of a derivation; this is the immediate basis for Wexler's contention that child language can in some cases provide a clearer view of UG than adult grammars (see Déprez and Pierce, 1993 for similar arguments). In the second part of the chapter, Wexler addresses the question of why children acquiring null-subject languages do not go through an OI stage. He explains the existence of OIs in non-null-subject languages in terms of a developmental constraint, the 'Unique Checking Constraint' (UCC), which differentiates the child grammar from the target adult grammar. By assumption, this constraint is present in all children, but given additional assumptions about the nature of Infl in null-subject languages it operates vacuously in these languages; hence the children's output is identical in the relevant respects to the adults'.
The presence of the UCC is an example of UG-constrained maturation, which makes child syntax more restrictive than adult syntax: derivations that are allowed in adult grammars are excluded by child grammars. However, in one sense it seems that the children's syntax is (optionally) less restrictive: after all, root infinitives are (generally) not grammatical in the relevant adult grammars. Wexler proposes that representations with OIs are convergent (a narrower notion than 'grammatical') because they only violate interface conditions which require the presence of Agr/Tense (i.e. this requirement is an interpretive/conceptual property, rather than a property of the computational syntax). One might then suppose that children are not initially aware of the relevant interface condition, or that there is a preference for a convergent derivation even if an interface condition is violated. In either case, however, children should never produce adult-like finite sentences that violate the UCC; yet they do - some of the time. Thus at this point we face the problem of optionality. Wexler offers two potential solutions: one is to suppose that the UCC itself applies only optionally. The other is to adopt a position more familiar within Optimality Theory: the interface condition and the UCC are equally ranked constraints, and optionality is the result of a tie between two derivations, each of which violates one of these constraints. As Wexler himself notes, neither of these solutions seems wholly satisfactory; the question of optionality remains essentially open. Like Wexler's, Hoekstra and Hyams' paper focuses on the properties of RIs in early child grammars. In common with Wexler, Hoekstra and Hyams assume early convergence of morphosyntactic properties: delays and non-adult forms are due to
lack of interpretive knowledge that does not belong to the core of the computational system. Also in common with Wexler, it is assumed that some functional head is lacking - or at least underspecified - in the child grammar. For Wexler, either Tense or Agr may be absent; for Hoekstra and Hyams, it is the functional head Number that may be underspecified. This is crucial to the phenomenon of RIs in their theory, since they propose that number morphology is one of the possible ways in which finiteness may be expressed. Thus if Number is underspecified in a language like Dutch, where finiteness is expressed through number morphology, the result will be a nonfinite clause. In a language like Japanese, on the other hand, where finiteness is expressed through tense morphology, the same underspecification will not result in an infinitive. Thus Hoekstra and Hyams, like Wexler, propose an explanation of the cross-linguistic distribution of RIs that relies on the interaction of a non-language-specific developmental phenomenon with independently motivated cross-linguistic differences, although they differ in their proposals for both ingredients of the explanation (for Wexler, the crucial cross-linguistic difference is whether or not Infl may itself be nominal in some sense, the empirical evidence for which is the existence of Infl-licensed pro-drop in the language). It might be thought that these different ingredients would yield a very different division of languages, and hence an easy empirical metric for comparing the two analyses; but given the (still contentious) correlation between rich inflection (which presumably entails encoding more than just a number distinction) and subject pro-drop, the predictions actually converge to a very large extent.
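The conditional logic of Hoekstra and Hyams' prediction can be reduced to a toy lookup. The language-to-carrier mapping below is a deliberate simplification of their proposal, used only to make the structure of the prediction explicit:

```python
# Which morphological category carries finiteness in each language
# (an illustrative two-way table; the chapter's own classification is richer).
FINITENESS_CARRIER = {"Dutch": "Number", "Japanese": "Tense"}

def surface_form(language, underspecified="Number"):
    """If the underspecified head is the one that carries finiteness,
    the clause surfaces as a root infinitive; otherwise it stays finite."""
    if FINITENESS_CARRIER[language] == underspecified:
        return "root infinitive"
    return "finite clause"

print(surface_form("Dutch"))     # root infinitive
print(surface_form("Japanese"))  # finite clause
```

The same underspecification of Number thus yields visibly nonfinite forms in Dutch but leaves Japanese output unaffected, which is the asymmetry the proposal is designed to capture.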
Whether or not they are empirically equivalent on this point may depend on whether a principled distinction can be made between the kind of pro-drop found in the southern Romance languages ('Infl-licensed' pro-drop) and the kind of pro-drop found in Japanese and Korean. The work of Wexler on the one hand and of Hoekstra and Hyams on the other does differ considerably in focus: Hoekstra and Hyams include as evidence for their analysis extensive discussion of the interpretive properties of RIs. Most centrally, they take up earlier observations that RIs in all languages where they have been noted, with the exception of English, occur only with eventive verbs; these observations are synthesised into their Eventivity Constraint. They argue that this difference in interpretation between English and the other languages that allow RIs correlates with other differences, both interpretive and syntactic. RIs in English differ from those in other languages in not having exclusively modal interpretations; in addition, they occur with an overall higher frequency, and allow a full range of subject types (whereas in a language like Dutch RIs occur almost exclusively with null subjects or noun phrases with no overt marking for number). These differences Hoekstra and Hyams trace back to a peculiarity of the English infinitival form in adult grammar: unlike in all the other languages that allow RIs, the English infinitive is morphologically unmarked. This variation in infinitival morphology is crucial to their analysis of the differences between RIs in English and the other languages in which they occur. On the one hand, they argue that the modal interpretation of RIs in children is a direct reflection of the [-realised] feature of infinitival morphology; following Giorgi and Pianesi (1997), they claim that this interpretive difference can be detected in the different interpretations of infinitive forms in the adult grammars also. On the other hand, they propose that the lack of overt morphology means that English RIs are actually ambiguous between an infinitive and a default finite form. This, then, is the basis for the explanation of various distributional differences, including the greater overall frequency of (apparent) RIs in English, and their comparative freedom to occur with subject noun phrases that are specified for number. Hoekstra and Hyams' overall conclusion concerning the difference between adult and child language is quite similar to Wexler's, to the extent that they propose that a principal source of difference between child and adult grammars is a difference in the interface between syntactic and extra-syntactic systems. However, their account does not rely on any syntactic constraint in child grammar that is absent from adult grammar; thus their approach is consistent with a stronger version of the continuity hypothesis discussed earlier. Both the chapters by Wexler and by Hoekstra and Hyams include analyses of accusative subjects in root clauses in child grammars. Radford's chapter is devoted to another case of non-nominative subjects in root clauses in English: this time apparently in the genitive case. The qualification is important, however: Radford's final conclusion is that there are no genuine genitive subjects in child English; rather, the forms that seem to correspond to genitives in adult grammars have a different status (or rather, different statuses) in child grammars. As Radford points out, analyses that have taken these subjects to be genitive and that have aimed to explain why such subjects occur in child language, despite their ungrammaticality in adult grammars, have taken the phenomenon as crucial evidence for (differing) conclusions about the nature of child grammars.
Thus, under some analyses, genitive subjects are possible in child grammars because these allow nominalisations as root clauses; under others, because they allow root clauses with either an underspecified Infl or no Infl at all. Radford presents specific arguments against these analyses: if these 'genitives' are the subjects of nominalisations - or of clauses with an underspecified or absent Infl - why is the verb sometimes inflected for agreement or tense? And how can interrogatives (with Subject-Aux inversion) also occur with 'genitive subjects'? If these subjects are always in nominalisations that involve a determiner with a VP complement, why can modals appear as well (since modals do not occur within VP)? If nonfinite clauses license genitive case on their subjects in child grammar, why is this not true of adult grammars, and how did the children acquire this pattern at all? Conversely, why are genitive subjects not more common in child grammars even where they are possible? Given these and other arguments against the specific analyses of 'genitive' subjects in child language, Radford concludes that they can be rejected essentially on empirical grounds. This is an important conclusion, since he also argues that all of these analyses would entail that child grammars are significantly different from adult grammars. In contrast, the proposal that he puts forward entails a much higher degree of congruence between child and adult grammar: at least with respect to the phenomenon (or rather, in his view, phenomena) in question, the observed differences are due to changes in the acquirers' morphological and syntactic analysis of the pronouns themselves, rather than to differences in the constraints on clause structure.
Most of the contributors to this book focus on the acquisition of a first language by children, although conclusions about maturational changes such as those drawn in Wexler's work clearly have implications for language learning by adults. Schwartz's topic, in contrast, is the acquisition of a second language. In particular, she addresses directly the question of whether L2 acquisition is different from L1 acquisition, and if so, how the differences are to be accounted for. Schwartz adopts the position, common to most contributors to this volume, that L1 acquisition is very heavily constrained by Universal Grammar. Since she also rejects the idea that L2 acquisition replicates exactly normal L1 acquisition, and acknowledges results indicating that even the most successful adult L2 acquirers do not reach the same final states as speakers who acquired the language as children, a possible conclusion - and one that has frequently been drawn - is that adult learners do not have (full) access to UG, that their 'language instinct' has been 'dismantled'. However, Schwartz argues strongly against this conclusion. Instead she argues that the only difference that it is necessary to posit between L1 and L2 acquirers is that their 'initial state' is different. In fact, Schwartz argues for the maximal difference between the initial states of L1 and L2 acquirers. By assumption, for L1 acquirers parameters of UG may be set only to default values, if at all. In contrast, Schwartz argues that L2 acquirers transfer their knowledge of their first language in toto as their initial state for L2 acquisition. Further, Schwartz argues, this 'Full Transfer' is not determined by the age of the acquirer: even child L2 learners show evidence of this process. In reaching this conclusion, Schwartz reviews two other models of transfer and L2 acquisition and presents data arguing against them.
The first model she discusses is the 'Minimal Trees' hypothesis, according to which lexical projections and their associated linear order transfer into the initial state of the L2, but functional categories do not; instead, these are added progressively by lexical learning, in an order determined by their hierarchical position. Schwartz introduces data that support three arguments against this model: cases in which there is evidence for L1 functional structure even in the earliest stages of L2 acquisition; cases where this structure persists at intermediate stages; and differences between L1 and L2 acquirers that are unexpected under this model, since the progressive addition of functional structure should be the same for both. The second model reviewed is the 'Weak Transfer' or 'Valueless Features' hypothesis, according to which both lexical and functional projections from the L1 transfer, but the 'strength' of the features of functional heads - the engine of movement in the theory assumed - does not. Schwartz's argument against this model derives from cases in which there is transfer of L1 orders that are assumed to derive from movement (and ultimately from feature strength); these orders are not shown by L2 acquirers whose L1 lacks the relevant movement. Finally, Schwartz explores in some detail the support for the 'Full Transfer/Full Access' model provided by Hulk's (1991) study of Dutch L2 acquirers of French. Although in other work Schwartz has argued for 'Full Access (to UG)' (e.g. Schwartz and Sprouse, 1994), the other part of the model she espouses, in this chapter she concentrates most heavily on the argument for 'Full Transfer'; the argument stated
here for 'Full Access' is essentially one of economy: given the demonstration that the initial states of L1 and L2 acquirers are radically different, the onus is on those who propose a further difference between the two sets of acquirers (differential access to UG) to demonstrate the necessity of this additional distinction. The paper by Tesar and Smolensky deals with the problem of acquisition as it arises in Optimality Theory (OT). In OT, language variation is captured in the variety of priority rankings that can be given to a universal constraint set. Tesar and Smolensky argue that this structure, unlike, for example, the binary switches of Principles and Parameters theory, aids the language learner in finding the correct grammar: the restricted linguistic evidence available to a child can rapidly distinguish the target grammar from the enormous number of alternatives. The authors also point out how this approach to learning escapes the traditional bugbear of language acquisition, Gold's theorem (see Gold, 1967), by making use of implicit negative evidence. Tesar and Smolensky decompose the learnability problem into two components, Robust Interpretive Parsing and Grammar Learning, and discuss how both are involved in their iterative model of language acquisition. The first step in this model consists of the computation of the hidden structure of overt input data. This computation is done by the parser, which is 'robust' because it has the capacity to use the learner's current grammar to assign a structural representation to forms that are deviant with respect to that grammar. As Tesar and Smolensky note, the requirement that such ungrammatical forms be parsed seems 'paradoxical'; input data count as information only if the current grammar (incorrectly) analyses them as ungrammatical.
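The grammar-revision step that follows such a parse can be sketched in miniature. The fragment below implements a single, simplified demotion step of the kind Tesar and Smolensky's Constraint Demotion algorithm iterates; the constraint names are invented, and the full algorithm, with mark cancellation over whole sets of winner/loser pairs, is considerably more careful than this:

```python
def demote(strata, winner_marks, loser_marks):
    """One simplified Constraint Demotion step.  `strata` is a list of sets
    of constraint names, highest-ranked first.  Constraints violated by the
    observed winner are demoted until each is dominated by some constraint
    violated by the losing competitor."""
    w = set(winner_marks) - set(loser_marks)   # uncancelled winner marks
    l = set(loser_marks) - set(winner_marks)   # uncancelled loser marks
    # highest stratum containing a constraint violated by the loser
    # (assumes the winner/loser pair is informative)
    target = min(i for i, s in enumerate(strata) if s & l)
    for i, s in enumerate(strata):
        if i <= target:
            moving = s & w
            s -= moving
            if moving:
                while len(strata) <= target + 1:
                    strata.append(set())
                strata[target + 1] |= moving
    return [s for s in strata if s]   # drop any emptied strata

# Start with all constraints tied in a single stratum; one informative
# winner/loser pair forces 'Faith' below 'Parse' and '*Struc':
print(demote([{"Parse", "*Struc", "Faith"}], ["Faith"], ["Parse"]))
```

Because each step only ever demotes and never promotes, the total number of demotions can be bounded, which is the source of the complexity result Tesar and Smolensky prove for the full algorithm.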
Unlike in other models, in which ungrammatical forms are unparsable, within OT input can be given an interpretive parse despite its ungrammaticality, and can thus be used to revise the grammar. This revision is the second step in the model, consisting of the gradual approximation to an optimal grammar (i.e. a language-specific ranking of constraints). Learning an optimal grammar crucially entails applying the grammatical constraints of OT, which are by hypothesis part of the learner's initial state, in order to deduce a harmonic ordering of structural descriptions. On the basis of such an ordering, the learner iteratively revises the current constraint ranking so that constraints violated by the optimal candidate are demoted until they are dominated by constraints violated by sub-optimal candidates: this is the Constraint Demotion algorithm. One particularly attractive aspect of OT is that it is amenable to exacting formal specification, and researchers can develop explicit algorithms within the framework. This paper is no exception: the Constraint Demotion algorithm is the basis for the authors' results on the complexity of acquisition, which show that the target ranking can provably be learned from a number of informative examples that is at most quadratic in the number of constraints. Jusczyk presents a cogent overview of some of the last decade's research into infant speech processing, demonstrating that methods such as the headturn technique can reveal the developmental course of sensitivities to different aspects of the speech stimulus before the infant produces any actual speech. This research underlines the
importance of a developmental perspective on processing, as is clear from his discussion of research on speech segmentation. The speech signal is substantially redundant - as is appropriate for vulnerable information being transmitted over a noisy medium. It is often difficult if not impossible for studies using adult subjects, or for computational studies of the information structure of large language corpora, to say anything about the relative priorities of the different types of intercorrelated information - prosodic, phonotactic and lexical - that provide adults with good cues as to where words might start and end in continuous speech. Some of the studies reported by Jusczyk show that sensitivity to these different segmentation cues develops at different points in infancy, and allow us to appreciate their interrelationships more fully. He suggests that prosodic information may be fundamental in establishing speech segmentation, and possibly also in the structuring of sequential information. In the course of his discussion, a fuller picture emerges of the temporal profile in which the developing infant establishes internal representations of segmental information, phonotactics, prosodic regularities, lexical entries, and even the distributional properties of function words. Further, experimentation using infants exposed to different ambient languages allows us to approach the issue of exactly what is universal in language acquisition. For instance, research by Cutler and her colleagues (see, e.g., Cutler and Norris, 1988; Otake et al., 1993) has revealed that the languages of the world seem to choose from only three different prosodic criteria for speech segmentation - moraic, syllabic and metrical - but crosslinguistic studies of development are necessary to establish the nature of the universal sensitivity to prosody that gives rise to one of these mutually exclusive alternatives.
Overall, it is clear from the review and discussion in Jusczyk's chapter that the developing infant is exquisitely sensitive to the distributional structure of the ambient speech at various levels, but that some of these levels are particularly salient to the infant and seem to have developmental priority. The developmental profile may owe something to general cognitive development, such as the growth of resources for memory and attention, and to some extent specifically linguistic factors may be implicated. Outlining an approach to investigating the latter, Jusczyk ends with a description of current work aimed at testing particular implications of Optimality Theory for language development, presaging further interpenetration of two of the strands of acquisition research represented in this volume. Pinker's paper addresses a high-profile debate that has crossed over into all of the constituent disciplines in the field of language acquisition; it concerns the principles governing the acquisition and maintenance of regular and irregular lexical items. Rumelhart and McClelland (1986) made strong claims that Parallel Distributed Processing (PDP) models are capable of exhibiting behaviour - specifically, involving the formation of the past-tense forms of verbs - that had traditionally been interpreted in terms of the learning and deployment of linguistic rules. Since then, connectionists have seen pronunciation behaviour as a richer testbed for the same claim regarding symbolic versus subsymbolic processing, but the controversy regarding the modelling of the past tense has continued, on the one hand with further investigation of the human data concerning morphological processing and with criticisms of two-layer network approaches (e.g., Pinker and Prince, 1988; Prasada and Pinker, 1993)
A. Sorace et al. / Lingua 106 (1998) 1-21
and, on the other hand, with demonstrations of the abilities of more sophisticated connectionist and statistical approaches (e.g., Hahn et al., 1997; Hare et al., 1995). A new dimension to this argument is provided by Westermann (1997), who demonstrates some of the capabilities of connectionist networks that are allowed to grow new structure in the course of training, and are therefore distinguished from the more traditional networks in which the form of the architecture and of the inputs and outputs is given from the beginning. Pinker's paper marshals the evidence for the view that children develop both an associationistic and a rule-driven approach to past-tense formation, applying them respectively to irregular and regular inflection. Twelve sets of linguistic and psychological data are presented, some of them subcases of more general arguments. Pinker claims that these arguments all reinforce the distinction he makes between processing that draws on memory for irregular forms, and processing that utilises rules to produce regular inflections. As an example, compound forms may involve irregular but not regular plurals: 'mice-infested' is permissible, but not 'rats-infested', only 'rat-infested'. This observation is interpreted as evidence for a processor in which memorised forms (such as 'rat', but also including irregular forms such as 'mice') are input to a mechanism of complex word formation, with regular inflection (of the compound) being applied at the end of the process; in this process there is no opportunity for regular inflection to apply to 'rat'. In addition to such linguistic data, Pinker also considers behaviour such as the well-known overregularisations of irregular verbs - 'breaked' for 'broke'. Such overgeneralisations, Pinker argues, are produced by children not because they are 'regularity-imposers', but because they have not had enough experience of irregular verbs to feed the pattern-associator. 
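The ordering argument behind the 'mice-infested'/'rat-infested' contrast can be made concrete with a toy sketch (my own illustration, not Pinker's formalism): stored forms, including memorised irregular plurals, are available as input to compound formation, whereas regular plural inflection applies only after the compound has been built, so a regular plural can never appear inside it.

```python
# Toy sketch (hypothetical illustration) of the word/rule ordering argument:
# memorised stems and stored irregular plurals can feed compounding; regular
# plural inflection applies only after compound formation.
IRREGULAR_PLURALS = {"mouse": "mice", "goose": "geese"}

def compound(noun, head, plural_inside=False):
    """Form NOUN-HEAD; only memorised (irregular) plurals may appear inside."""
    if plural_inside:
        if noun in IRREGULAR_PLURALS:
            stem = IRREGULAR_PLURALS[noun]  # 'mice' is stored, so it is available
        else:
            # 'rats' would require applying the regular rule too early
            raise ValueError(f"regular plural of '{noun}' cannot feed compounding")
    else:
        stem = noun
    return f"{stem}-{head}"

print(compound("mouse", "infested", plural_inside=True))  # mice-infested
print(compound("rat", "infested"))                        # rat-infested
# compound("rat", "infested", plural_inside=True) raises ValueError
```

The function names and dictionary are invented for illustration; the point is only that the permissible and impermissible compounds fall out of the order in which memory lookup and regular inflection apply.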
Pinker also discusses data from patients suffering from disorders that differentially affect only the memory system, or only the grammatical system, such as anomic aphasia, Alzheimer's disease and Parkinson's disease. These data show that memory impairment affects only the production of irregular verbs, whereas grammar impairment disrupts only the production of regular forms, thus providing further support for the word/rule model he advances. The conclusion is that regular inflection is the default operation, which applies when an irregular form cannot be retrieved from memory. This assumption also explains why frequency is correlated with irregularity: commonly heard forms are easier to memorise. If an irregular verb becomes less frequent, at some point in the course of time it will be converted into a regular verb to which the regular rule can apply. Pinker concludes with a consideration of German, in which the numerical basis of 'regularity' differs widely from that found in English, and he further sustains his case for a dual-route approach to inflection: regular inflection can be generalised independently of frequency. Overall, Pinker's arguments represent a challenge to proponents of connectionist, 'single-route' models of the processing of inflectional morphology. Teun Hoekstra died shortly before this volume went to press. Despite his illness, he contributed at all stages to the development of this volume and to the original conference. In great sadness, we dedicate this collection to his memory.
References
Anttila, A., in press. Deriving variation from the grammar. To appear in: F. Hinskens, R. van Hout and L. Wetzels (eds.), Variation, change and phonological theory. Amsterdam: Benjamins. [Also: Rutgers Optimality Archive, 63; http://ruccs.rutgers.edu/roa.html]
Atkinson, M., 1992. Children's syntax: An introduction to Principles and Parameters Theory. Oxford: Blackwell.
Atkinson, M., 1996. Now, hang on a minute: Some reflections on emerging orthodoxies. In: H. Clahsen (ed.), Generative perspectives on language acquisition, 451-485. Amsterdam: Benjamins.
Bard, E., D. Robertson and A. Sorace, 1996. Magnitude estimation of linguistic acceptability. Language 72, 32-68.
Berwick, R., 1985. The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Bhatia, T.K. and W. Ritchie (eds.), 1996. Handbook of second language acquisition. New York: Academic Press.
Bloom, P. (ed.), 1994a. Language acquisition: Core readings. Cambridge, MA: MIT Press.
Bloom, P., 1994b. Preface: Language acquisition. In: P. Bloom (ed.), 1-4.
Boersma, P., 1997. How we learn variation, optionality, and probability. Rutgers Optimality Archive, 221-1097; http://ruccs.rutgers.edu/roa.html.
Borer, H. and K. Wexler, 1987. The maturation of syntax. In: T. Roeper and E. Williams (eds.), 123-172.
Borer, H. and K. Wexler, 1992. Bi-unique relations and the maturation of grammatical principles. Natural Language and Linguistic Theory 10, 147-190.
Broeder, P. and J. Murre (eds.), forthcoming. Models of language acquisition: Inductive and deductive approaches.
Cairns, P., R.C. Shillcock, N. Chater and J. Levy, 1997. Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation. Cognitive Psychology 33, 111-153.
Cartwright, T.A. and M.R. Brent, 1994. Segmenting speech without a lexicon: Evidence for a bootstrapping model of lexical acquisition. In: Proceedings of the 16th Annual Conference of the Cognitive Science Society, 148-152. Hillsdale, NJ: Erlbaum.
Chomsky, N., 1995. The Minimalist Program. Cambridge, MA: MIT Press.
Clahsen, H., 1992. Learnability theory and the problem of development in language acquisition. In: J. Weissenborn, H. Goodluck and T. Roeper (eds.), Theoretical issues in language acquisition, 53-76. Hillsdale, NJ: Lawrence Erlbaum.
Clahsen, H. (ed.), 1996. Generative perspectives on language acquisition. Amsterdam: Benjamins.
Clahsen, H., S. Eisenbeiss and A. Vainikka, 1994. The seeds of structure: A syntactic analysis of the acquisition of Case marking. In: T. Hoekstra and B. Schwartz (eds.), 85-118.
Clark, R., 1992. The selection of syntactic knowledge. Language Acquisition 2, 83-149.
Clark, R. and I. Roberts, 1993. A computational model of language learnability and language change. Linguistic Inquiry 24, 299-345.
Crain, S., 1991. Language acquisition in the absence of experience. Behavioral and Brain Sciences 14, 597-650.
Crain, S. and R. Thornton, 1998. Investigations in Universal Grammar: A guide to experiments in the acquisition of syntax. Cambridge, MA: MIT Press.
Cutler, A. and D. Norris, 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14, 113-121.
Déprez, V. and A. Pierce, 1993. Negation and functional projections in early grammars. Linguistic Inquiry 24, 25-67.
Elbers, L., 1995. Production as a source of input for analysis: Evidence from the developmental course of a word-blend. Journal of Child Language 22, 47-71.
Elbers, L., forthcoming. An output-as-input hypothesis for language acquisition: Arguments, model, evidence. In: P. Broeder and J. Murre (eds.).
Elman, J.L., 1993. Learning and development in neural networks: The importance of starting small. Cognition 48, 71-99.
Epstein, S., S. Flynn and G. Martohardjono, 1996. Second language acquisition: Theoretical and experimental issues in contemporary research. Behavioral and Brain Sciences 19, 677-758.
Felix, S.W., 1992. Language acquisition as a maturational process. In: J. Weissenborn, H. Goodluck and T. Roeper (eds.), Theoretical issues in language acquisition, 25-52. Hillsdale, NJ: Erlbaum.
Finer, D. and E. Broselow, 1986. Second language acquisition of reflexive binding. In: Proceedings of NELS 16, 154-168. Amherst, MA: University of Massachusetts, Graduate Linguistics Student Association.
Fletcher, P. and B. MacWhinney (eds.), 1995. The handbook of child language. Oxford: Blackwell.
Frazier, L. and J. de Villiers (eds.), 1990. Language processing and language acquisition. Dordrecht: Kluwer.
Fukui, N., 1993. Parameters and optionality. Linguistic Inquiry 24, 399-420.
Gibson, E. and K. Wexler, 1994. Triggers. Linguistic Inquiry 25, 407-454.
Giorgi, A. and F. Pianesi, 1997. Tense and aspect: From semantics to morphosyntax. New York: Oxford University Press.
Gleitman, L. and E. Wanner, 1982. The state of the art. In: E. Wanner and L. Gleitman (eds.), Language acquisition: The state of the art. Cambridge: Cambridge University Press.
Gold, E.M., 1967. Language identification in the limit. Information and Control 10, 447-474.
Hahn, U., R.C. Nakisa and K. Plunkett, 1997. The dual-route model of the English past-tense: Another case where defaults don't help. In: A. Sorace, C. Heycock and R. Shillcock (eds.), Proceedings of the GALA '97 conference on language acquisition, 346-351. Edinburgh: University of Edinburgh.
Hare, M., J. Elman and K.G. Daugherty, 1995. Default generalisation in connectionist networks. Language and Cognitive Processes 10, 601-630.
Hayes, B., in press. Gradient well-formedness in Optimality Theory. To appear in: J. Dekkers, F. van der Leeuw and J. van der Weijer (eds.), Optimality theory: Phonology, syntax, and acquisition. Oxford: Oxford University Press.
Hoekstra, T. and B. Schwartz (eds.), 1994. Language acquisition studies in generative grammar. Amsterdam: Benjamins.
Hulk, A., 1991.
Parameter setting and the acquisition of word order in L2 French. Second Language Research 7, 1-34.
Hyams, N., 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel.
Hyams, N., 1996. The underspecification of functional categories in early grammar. In: H. Clahsen (ed.), 91-128.
Ingram, D., 1987. First language acquisition: Method, description and explanation. Cambridge: Cambridge University Press.
Jaeggli, O. and K. Safir, 1989. The null subject parameter. Dordrecht: Kluwer.
Jusczyk, P., 1997. The discovery of spoken language. Cambridge, MA: MIT Press.
Kuhl, P.K. and J.D. Miller, 1975. Speech perception by the chinchilla: Voiced-voiceless distinction in alveolar plosive consonants. Science 190, 69-72.
Lebeaux, D., 1988. Language acquisition and the form of the grammar. Ph.D. dissertation, University of Massachusetts.
Marcus, G.F., 1995. The acquisition of inflection in children and multilayered connectionist networks. Cognition 56, 271-279.
Matthews, R.J. and W. Demopoulos (eds.), 1989. Learnability and linguistic theory. Dordrecht: Kluwer.
McDaniel, D., C. McKee and H. Smith Cairns (eds.), 1996. Methods for assessing children's syntax. Cambridge, MA: MIT Press.
MacWhinney, B., 1995. The CHILDES project. Hillsdale, NJ: Erlbaum.
Meisel, J., 1995. Parameters in acquisition. In: P. Fletcher and B. MacWhinney (eds.), 10-35.
Morgan, J.L. and K. Demuth (eds.), 1996. Signal to syntax. Mahwah, NJ: Erlbaum.
Newport, E., 1990. Maturational constraints on language learning. Cognitive Science 14, 11-28.
Otake, T., G. Hatano, A. Cutler and J. Mehler, 1993. Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language 32, 258-278.
Paradis, J. and F. Genesee, 1996. Syntactic acquisition in bilingual children: Autonomous or interdependent? Studies in Second Language Acquisition 18, 1-25.
Paradis, J. and F. Genesee, 1997. On continuity and the emergence of functional categories in bilingual first language acquisition.
Language Acquisition 6, 91-124.
Penner, Z. and J. Weissenborn, 1996. Strong continuity, parameter setting and the trigger hierarchy: On the acquisition of the DP in Bernese Swiss German and High German. In: H. Clahsen (ed.), 161-200.
Pinker, S., 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S. and A. Prince, 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28, 73-193.
Plunkett, K., 1995. Connectionist approaches to language acquisition. In: P. Fletcher and B. MacWhinney (eds.), 36-72.
Powers, S. and D. Lebeaux, 1998. More data on DP acquisition. In: N. Dittmar and Z. Penner (eds.), Issues in the theory of language acquisition, 37-76. Bern: Lang.
Prince, A. and P. Smolensky, 1993. Optimality Theory: Constraint interaction in generative grammar. Technical Report RuCCS TR-2, Rutgers Center for Cognitive Science (to appear, MIT Press).
Prasada, S. and S. Pinker, 1993. Generalization of regular and irregular morphological patterns. Language and Cognitive Processes 8, 1-56.
Radford, A., 1990. Syntactic theory and the acquisition of English syntax. Oxford: Blackwell.
Radford, A., 1996. Towards a structure-building model of acquisition. In: H. Clahsen (ed.), 43-90.
Rizzi, L., 1994. Early null subjects and root null subjects. In: T. Hoekstra and B. Schwartz (eds.), 151-176.
Robertson, D. and A. Sorace, in press. Losing the V2 constraint. To appear in: E. Klein and G. Martohardjono (eds.), The development of second language grammars: A generative approach. Amsterdam: Benjamins.
Roeper, T. and E. Williams (eds.), 1987. Parameter setting. Dordrecht: Reidel.
Rumelhart, D.E. and J.L. McClelland, 1986. On learning the past tense of English verbs. In: J. McClelland, D.E. Rumelhart and the PDP Research Group (eds.), Parallel Distributed Processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models, 216-271. Cambridge, MA: MIT Press.
Schütze, C. and K. Wexler, 1996. Subject case licensing and English root infinitives. In: Proceedings of the 20th Boston University Conference on Language Development. Somerville, MA: Cascadilla.
Schwartz, B. and R. Sprouse, 1994. Word order and nominative case in non-native language acquisition: A longitudinal study of (L1 Turkish) German interlanguage. In: T. Hoekstra and B. Schwartz (eds.), 317-369.
Smolensky, P., 1996. On the comprehension/production dilemma in child language. Linguistic Inquiry 27, 720-731.
Sorace, A., 1993. Incomplete vs divergent representations of unaccusativity in non-native grammars of Italian. Second Language Research 9, 22-48.
Sorace, A., 1996. On gradience and optionality in non-native grammars. In: S. Epstein et al. (eds.), 741-742.
Sorace, A., 1998. Optionality in native and non-native grammars. Manuscript, University of Edinburgh.
Tesar, B. and P. Smolensky, 1998. Learnability in Optimality Theory. Linguistic Inquiry 29, 229-268.
Thomas, M., 1991. Universal grammar and the interpretation of reflexives in a second language. Language 67, 211-239.
Towell, R. and R. Hawkins, 1994. Approaches to second language acquisition. Clevedon: Multilingual Matters.
Vainikka, A., 1993/94. Case in the development of English syntax. Language Acquisition 3, 257-325.
Valian, V., 1990. Logical and psychological constraints on the acquisition of syntax. In: L. Frazier and J. de Villiers (eds.), 119-146.
Werker, J., 1993. Developmental changes in cross-language speech perception: Implications for cognitive models of speech processing. In: G.T.M. Altmann and R.C. Shillcock (eds.), Cognitive models of speech processing: The second Sperlonga meeting, 57-78. Hillsdale, NJ: Erlbaum.
Westermann, G., 1997. A constructivist neural network learns the past tense of English verbs. In: A. Sorace, C. Heycock and R. Shillcock (eds.), Proceedings of the GALA '97 conference on language acquisition, 393-398. Edinburgh: University of Edinburgh.
Wexler, K., 1994. Optional infinitives, head movement and the economy of derivations. In: D. Lightfoot and N. Hornstein (eds.), Verb movement, 305-350. Cambridge: Cambridge University Press.
White, L., 1989.
Universal grammar and second language acquisition. Amsterdam: Benjamins.
White, L., 1996. Universal grammar and second language acquisition: Current trends and new directions. In: T.K. Bhatia and W. Ritchie (eds.), 85-120.
Lingua 106 (1998) 23-79
Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage
Ken Wexler*
Department of Brain and Cognitive Science, Department of Linguistics and Philosophy, MIT E10-20, Cambridge, MA 02139, USA
Abstract
This paper argues that the traditional view of experience-dependent properties (learned properties) of language as developing late and non-experience-dependent properties as developing early is in fact often wrong. Parameters are set correctly very early (Very Early Parameter-Setting) and properties of inflectional items are also learned very early. On the other hand, some universal properties of language emerge later, presumably under a genetically-driven maturational program. The Optional Infinitive (OI) Stage (Wexler, 1990, 1992, 1994) of grammatical development is explained by the AGR/TNS Omission Model (ATOM) of Schütze and Wexler (1996). This paper derives this model via a new proposal for a developmental constraint: the Unique Checking Constraint (UCC), which prevents a D-feature on DP from checking more than one D-feature on functional categories, thus forcing either AGR or TNS to be omitted. The Minimalist framework of Chomsky (1995) is assumed - in particular the assumption that a D-feature and not a case feature is the driving force for the Extended Projection Principle. With AGR and TNS both having a D-feature, UCC predicts that finite sentences will not converge. The model also predicts that subjects of OI's will raise to a higher functional projection, even when case is not assigned by INFL, thus solving a traditional problem in the theory of OI's. With natural assumptions on the nature of null-subject languages, the Null-Subject/Optional Infinitive Correlation of Wexler (1996) is derived from the UCC - that OI's exist in early child language if and only if the adult grammar is not an INFL-licensed null-subject language. Thus the UCC is seen as a fundamental explanatory force for a range of phenomena in early child grammar.
Moreover the child data provide strong evidence for the claim that a D-feature motivates the raising of the subject in UG, thus unifying child and adult grammar and demonstrating the usefulness of the investigation of child grammar in the study of UG.
Key words: Optional infinitives; Syntactic development; Parameters; Minimalism; Agreement; Tense; Maturation of grammar
My thanks to Carson Schütze, whose extensive and insightful discussions have helped me a great deal, to two anonymous reviewers, who provided extremely detailed and useful critical commentary on relatively short notice, and to the editors. And to the community of OI researchers, who via their work and discussion have made this area such a pleasure to work in.
* Phone: +1 617 253 5797; Fax: +1 617 253 9767; E-mail:
[email protected]
0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved
PII: S0024-3841(98)00029-1
K. Wexler / Lingua 106 (1998) 23-79
1. Introduction: Against late learning early emergence

In this paper I would like to argue for a dramatically different view of linguistic development than is standard in almost every approach to language acquisition that I know. In particular, the standard view of grammatical development assumes, implicitly or explicitly, that the aspects of grammar that are learned are the ones to develop slowly, and that the aspects of grammar that are innate are present very early. Let us call this view the 'Late Learning Early Emergence (LLEE)' Hypothesis, the idea being that properties of grammar that just 'emerge', i.e. are encoded in the genetic system underlying language, emerge early and that properties of grammar that are experience-dependent emerge late. LLEE is rather natural and plausible; after all, 'learning' is the kind of thing that takes time, the experience must actually occur, usually after birth, and there is no necessary reason that emergence should take a long time - if the genetic system underlying brain development is responsible for a particular development, why can't the development, the emergence, take place by birth? The LLEE view is so strong that, it seems to me, it is advocated or assumed by approaches that are radically different in other ways, for example, on the question of how much and what is innate (genetically encoded) in linguistic development. Almost nobody is a radical behaviorist in today's environment in which generative grammar has shown the complexity of language. But there is still a very strong school (e.g.
Quartz and Sejnowski, 1997; Elman et al., 1996), carried over from former days, of those who believe that only 'general purpose' mechanisms are innate, whereas the view which attempts to encompass the results of linguistic theory and the true empirical situation of language development in human children tends to believe that particular aspects of grammar (UG) are genetically encoded, so as to explain why language is the way it is and why and how all normal children arrive at essentially the same grammar with experience that strongly underdetermines the particular nature of the grammar. Both 'general purpose' nativists (i.e. most 'psychological' theories) and 'specific domain' nativists believe in LLEE. That is, most approaches seem to believe that learning takes time and that aspects of language which are not learned (whether domain-specific or general) tend to emerge early. To take one important example, consider the question of the development of inflection. Very roughly speaking, inflection is the linguistic process whereby grammatical information is encoded in 'small' grammatical formatives and added to 'contentful' words. The standard view is that much of inflection appears late, and is often incorrect, and that the reason that it is late is because there is much to 'learn' about it. In fact, the standard view is that very young children 'talk funny' because they have not yet 'mastered' ('learned') inflection and its properties. Thus a child might say me going home instead of I am going home because the case 'inflection' on the subject pronoun has not been learned and the appropriate grammatical inflection called the 'auxiliary' has not been learned, and is therefore omitted. Syntactic parameters are another domain of variation where learning based on experience must operate. Thus on the LLEE view, we would expect parameters to be
set relatively late in development, only after sufficient experience had been accumulated by the child. Such an expectation is behind, for example, Hyams' (1986) famous hypothesis that English-speaking children had 'mis-set' the null-subject parameter, thereby producing null subjects well into the twos and even threes. Similarly, in a non-linguistic-theory framework, Grimm (1993) claims that German-speaking children at an early age do not know the correct word order for major constituents, constructions which, in a more linguistic-theoretic framework, would be related to syntactic parameters. Thus, whether from a linguistic theory or general psychology viewpoint, LLEE seems to have been the dominant assumption of students of language development. In this paper I will argue that LLEE is at best partially true and that it is often strongly wrong. In particular I will argue for the following two (roughly stated) hypotheses:

(1) Very Early Parameter-Setting (VEPS) (Wexler, 1996): Basic parameters are set correctly at the earliest observable stages, that is, at least from the time that the child enters the two-word stage around 18 months of age.

(2) Very Early Knowledge of Inflection (VEKI): At the earliest observable stage (from the time that the child enters the two-word stage around 18 months of age) the child knows the grammatical and phonological properties of many important inflectional elements in their language.

Notice that (1) and (2) come close to being equivalent. Borer (1984) argued that syntactic parameters are to be interpreted not as properties of the grammar as a whole, but as properties of morphological and inflectional elements in the lexicon (and see Manzini and Wexler, 1985; Wexler and Manzini, 1987 for the related 'lexical parameterization hypothesis') and Fukui (1988) further suggested that the parameters are related to the 'functional' elements in the lexicon.
On this view the properties of inflectional elements and properties of syntactic variation are essentially identical. For ease of exposition, we will maintain (1) and (2) as separate hypotheses, but it is worth remembering that they may turn out to be two slightly different ways of expressing the same hypothesis. Why do I mention the entry into the two-word stage as the point at which parameters and inflectional properties are known? Not because of any inherent connection between the two-word stage and parameters or inflection. Rather, because the evidence which shows us whether young children know parameters and/or inflection is only available when children begin to produce strings containing at least two words. Quite possibly - quite likely, in my estimation - children have set basic parameters (and learned inflectional properties) before the entry into the two-word stage - we just don't have any evidence at the current time which tells us whether this is true.1
1 Of course, further development of experimental techniques, for example the selective-looking paradigm, might provide evidence about this extremely early knowledge.
In this paper I will give some arguments for VEPS and VEKI. And I will argue against the universality of LLEE for inflectional development, that is, I will argue that certain aspects of morphosyntactic inflectional development emerge somewhat late, although they are not learned. Thus a certain amount of inflectional development unfolds over time according to a genetic blueprint. I will also demonstrate that the surface characteristics of this development can differ in different languages. This leads to what appears to be a conundrum: how can development be guided over time by a genetic blueprint and yet the effects of development look so different in different languages? The answer will be quite revealing, in my opinion: the genetic blueprint is the same for all children, across languages. But learning of inflectional material of course also has to take place, and the children learn this inflectional material very early, according to VEPS or VEKI. The learned parameters (inflectional properties) interacting with the innately unfolding aspects of inflection combine to create quite different surface effects in the development of different languages. Thus variation across time and across languages is neatly and precisely accounted for, as the precise interaction of genetically-specified grammatical growth and the effects of learning. Such a deep explanation of time and language variation is exactly what we should be hoping for in the study of grammatical development. There is another major theme of this paper. Namely, I want to illustrate the growing importance of the study of grammatical development in the theory of grammar. That is, I wish to show how the study of grammatical development leads to particular claims and evidence about the nature of Universal Grammar. I will concentrate on one important case - the syntactic motivation for the Extended Projection Principle, the fact that subjects raise out of the Verb Phrase.
I will argue that developmental research and adult (standard 'linguistic theory') research converge on the same somewhat unexpected conclusion about this motivation, and that the developmental results are perhaps even clearer than the adult results, despite the EPP being central to the theory of syntax. Other cases touched upon will include which functional properties (AGR?) assign NOMinative case, what default case looks like in a language, etc. The evidence from developmental work that I touch on in this paper is directly relevant to these issues, and syntactic theory that ignores this evidence fails to capture what I consider to be crucial, even central, phenomena. To ignore the developmental results in the construction and testing of linguistic theory, in my opinion, would be to make an arbitrary decision about which phenomena are relevant, the kind of arbitrary decision, say, that structural/behavioral linguistics made when it ignored judgments of grammaticality. The empirical and theoretical material that I discuss in this paper involves the Optional Infinitive (OI) stage of linguistic development, and a large number of cross-linguistic properties associated with it. In this paper I develop a new theoretical analysis of this stage which captures what have been difficult phenomena to capture, including why certain languages don't go through the OI stage; the analysis also integrates the theory of the stage with current syntactic theory, especially concerning the status of the Extended Projection Principle. Thus I believe the theory presented here to be more empirically adequate than previous theories concerning the OI stage with respect to linguistic development and also to be better integrated
with syntactic theory. The large amount of detailed empirical investigation of the OI stage cross-linguistically in the last decade has made far more specific theories possible, and current theoretical analysis shows the large boost to theory construction that has arisen from this empirical investigation.

1.1. Summary of paper

In the next section I briefly outline some of the central properties of the OI stage without providing any kind of detailed coverage. Section 3 discusses Very Early Parameter-Setting (VEPS) and some of the evidence for it. Section 4 draws some important consequences of VEPS for the theory of learning, in particular that parameter-setting must be some kind of perceptual learning. In Section 5 I discuss the evidence that the null-subject parameter is set correctly at the earliest observed stage. This is important for VEPS because the null-subject parameter has been a classic case of a mis-set parameter, contrary to VEPS. Section 6 briefly draws implications about the incorrectness of Late Learning Early Emergence from the results in the earlier sections. In Section 7 I apply the same ideas to the study of inflection and make the case for extremely young children as 'little inflection machines' as opposed to the traditional view that they are poor at inflectional learning. Section 8 briefly describes the Schütze and Wexler (1996a,b) AGR/TNS Omission Model (ATOM) of the OI stage, which is the starting point I take for the investigation. In Section 9 I discuss the central question of why subjects of OI's raise, given that they are often not assigned case by non-finite INFL, relating the developmental results to the current Minimalist view of the Extended Projection Principle. Section 10 summarizes briefly the literature on cross-linguistic variation in which languages undergo the OI stage, the generalization being the Null-Subject/Optional Infinitive Correlation of Wexler (1996).
In Section 11 I propose a new theory of the OI stage, which assumes that OI children are governed by the genetically-specified (and withering away in time) Unique Checking Constraint (UCC), which allows a D-feature on the subject to check only once. The UCC theory is integrated with the current view on the EPP, and provides a good deal of evidence for it. Section 12 shows how the central cross-linguistic variation phenomenon in the OI stage, the NS/OI, strictly follows from the UCC analysis of the previous section. In Section 13 I conclude, allowing myself some thoughts on the relation between linguistics and psycholinguistics, namely that it is now more clear in practice that it is important to integrate them, the developmental research being important in detail for the nature of human language.

2. The OI stage in grammatical development

The foundations of the evidence for VEPS and VEKI came with the discovery (Wexler, 1990, 1992, 1994)2 of the Optional Infinitive (OI) stage in grammatical development. Wexler (1994) characterizes this stage as in (3):
² From now on, reference to 'Wexler, 1990, 1992, 1994' will be designated as 'Wexler, 1990ff.'
K. Wexler / Lingua 106 (1998) 23-79
(3) The Optional Infinitive (OI) stage (Wexler, 1990ff.)
a. Root infinitives are possible grammatical sentences for children in this stage (around 2 years)
b. These infinitives coexist with finite forms
c. The children nevertheless know the relevant grammatical principles, e.g. head movement, checking, etc.

Since the existence of the OI stage with its general properties as described by Wexler in (3) is by now well-known and generally agreed to, I won't enter into detail about it. For illustration, let me just give one example that Wexler used to describe the OI stage, from Poeppel and Wexler (1993). (4) shows the data (number of utterances of a particular type) as analyzed in Poeppel and Wexler, from one child, Andreas, at 2;1 (original data from Wagner, 1985).

(4) German, Poeppel and Wexler (1993)³

                +Finite   -Finite
V2/not final       197        6
V final/not V2      11       37

German is an SOV/V2 language. SOV means that its underlying constituents, according to the standardly assumed mechanics for German, are in the order Subject Object Verb. That German is V2 (Verb Second) means that in root (matrix) clauses, the finite verb moves to second position (the head C of the Complementizer Phrase (CP)) while another constituent occupies first position (the Specifier of CP). Thus Wexler argued that it follows from the properties of the OI stage in (3) that German-speaking children in the OI stage should produce finite verbs in second position and non-finite verbs in final position. The data in (4) strongly conform to this prediction. Finite verbs (as determined by their visible inflectional morphology) occur in second position and non-finite root verbs⁴ (as determined by their visible inflectional morphology) occur in final position. Wexler also argued that the OI stage could be demonstrated in other languages when processes other than V2 (V to C movement) distinguished finite from non-finite verbs.
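As a quick sanity check on table (4), the counts can be tabulated to show how tightly finiteness and verb position covary in Andreas's data. The code below is purely illustrative: the counts are copied from (4), but the dictionary layout and function name are ours, not from the paper.

```python
# Illustrative tabulation of Andreas's utterances from table (4)
# (Poeppel and Wexler, 1993). Counts copied from the table; the
# data structure and function name are our own.
counts = {
    ("finite", "V2"): 197,       # finite verb in second position
    ("finite", "final"): 6,      # finite verb clause-finally
    ("nonfinite", "V2"): 11,     # non-finite verb in second position
    ("nonfinite", "final"): 37,  # non-finite verb clause-finally
}

def position_rate(finiteness, position):
    """Proportion of verbs of the given finiteness in the given position."""
    total = sum(n for (f, _), n in counts.items() if f == finiteness)
    return counts[(finiteness, position)] / total

# Finite verbs overwhelmingly surface in V2, non-finite verbs clause-finally,
# as the OI-stage properties in (3) predict for a V2/SOV language.
print(round(position_rate("finite", "V2"), 2))        # 197/203 = 0.97
print(round(position_rate("nonfinite", "final"), 2))  # 37/48 = 0.77
```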
For example, in languages (such as French) which have (finite) verb raising to INFL, the properties of the OI stage in (3) imply that the finite verb will precede certain adverbs or other elements (e.g. negation) whereas non-finite verbs (which don't raise) will follow these elements. Thus, on the basis of a variety of grammatical processes cross-linguistically, Wexler argued that the OI stage properties in (3) characterize child grammars across many languages. In light of previous research on child language, the demonstration that the OI stage existed across languages was a surprising result, since the non-finite stage posited was independent of the particular grammatical process which showed it (e.g.
³ See e.g. Mills, 1985; Weissenborn, 1991; Clahsen et al., 1994; Boser et al., 1992; Behrens, 1993 for data which fit the same analysis.
⁴ The root verbs are 'main' verbs, verbs that should be finite in the adult grammar.
V2 versus V to INFL) but seemed to have an independent existence. Of course, from the standpoint of explanatory adequacy in linguistic development (as explanatory adequacy is understood, for example, in principle-based syntactic theory) the result was a clear advance, since it seemed principle-based rather than construction-specific. By now, the existence of the OI stage in many different languages has been so well-documented that its existence and general properties are no longer controversial, though many important details remain to be explicated, and the theoretical basis for the OI stage is subject to active investigation. With this brief background, I will discuss the relevance of the OI stage to the question of VEPS and VEKI, and then go on to the study of cross-linguistic variation in the OI stage and a deeper understanding of the stage itself.

3. Very early parameter-setting

As early as Wexler (1990, 1994), it was argued that work on the OI stage had shown that children had set parameters correctly at an extremely early age. Wexler (1996) formalized this observation as VEPS, presented in (5):

(5) Very Early Parameter Setting (VEPS)
Basic parameters of verb movement, e.g. V to I, V to I to C, are correctly set at the earliest observed stages; thus in the OI stage they're correctly set. Parameters that are set at the earliest observed stage (i.e. at the beginning of production of multiple word combinations, around 1;6) include:
a. Word order, e.g. VO versus OV (e.g. Swedish vs. German)
b. V to I or not (e.g. French versus English)
c. V2 or not (e.g. German versus French or English)
d. Null subject or not (e.g. Italian versus English or French)

This is not the place to discuss the evidence for (5) in any detail. To briefly review, one can find evidence for (5a) by observing OI's in, say, Swedish and German. These root infinitives will appear in sentence-final position in German, but in Swedish they will appear after the subject and before, say, the object.
That is, the OI's, not undergoing V2, will appear in their base-generated position. Wexler (1990, 1994) documents these differences, based on data in Plunkett and Stromqvist (1990) and Platzack (1990). Or, one may observe Japanese in its early state, where there is no verb movement, and notice that the verb is in final position, after objects; that is, early Japanese shows SOV, not SVO. English, whose main verbs do not undergo verb movement, shows mostly SVO order, even for finite verbs. Thus the correct word order is displayed quite early, whether SVO or SOV. Similarly, the raising of finite verbs in French means that the finite verb will precede negative pas and the non-finite verb will follow pas, and this is demonstrated by Pierce (1989, 1992) for very young French children. English main verbs do not raise and thus finite verbs should follow negation not in young children, which they
mostly do - there is no evidence that sentences like she likes not ice cream are systematically created by young English-speaking children. Thus (5b) holds: the value of the verb-raising parameter is set extremely early, whether its value is raise or not raise. The verb-second parameter in (5c) is also set correctly at the earliest ages, as Wexler argued. The data from Poeppel and Wexler (1993) that we have already discussed show that German children have V2 for finite verbs at early ages - the finite verb appears in second position but the non-finite verb appears in final position. Similarly, as Wexler demonstrates, in the Mainland Scandinavian languages (Danish, Norwegian, Swedish), which are V2/SVO, the finite verb precedes negation (because the verb has raised over it, to C), whereas the root infinitival remains to the right of negation. Thus these languages show V2 at very young ages. On the other hand, negation precedes the verb in young English-speaking children, showing that they do not raise the verb, even the finite verb, to C. French children, on the other hand, since French is a verb-raising language, will have the finite verb precede negative pas. But we can see that these children are not speaking a V2 language because they will not in general produce constituents other than the subject (objects, adverbs, etc.) in first position, with the finite verb in second position. German-speaking children, on the other hand, will do just this (Poeppel and Wexler, 1993), thus showing that German, but not French, is a V2 language at very early ages.
The above arguments can be repeated for many different languages and many different construction types, but without going into detail here they should give the flavor of the arguments for VEPS; namely, for both values of these (binary) parameters, if a language exhibits a certain value of the parameter, then a child learning that language has set the parameter for that language from the earliest observations that we can make. In particular, since most of the arguments involve the correlation between word order and morphology on a verb, we need to observe at least one verb and one other constituent in a sentence in order to test whether a child has set the parameter correctly (for example, knows that the verb raises to INFL or not, or raises to C or not). Thus, minimally the child must be in what has been called the two word stage, since we must observe at least two words in an utterance, the verb and at least one word for another constituent. If all utterances contain only one word, no observations on parameter-setting can be made from natural production data. In English it is standardly said that the child enters the two word stage at about 18 months of age. As far as I can tell, the observations that I have made above concerning the parameters in (5) hold from the very beginning of the two word stage; that is the source of my estimate that the child has set the parameters correctly by approximately 18 months of age. Of course, we don't know how much before entering the two word stage the child has actually set the parameters correctly. This age is an upper bound on the age of correct parameter-setting; for all we know the child has set the parameters correctly by age 1;0 or even earlier. It will take the development of further experimental methods (e.g. selective looking paradigms or other methods for studying infant cognition) to determine more precisely the age at which children set basic parameters correctly.
Nevertheless the extremely early upper bound that we have determined is quite important. To the extent that VEPS is correct (and the overwhelming weight of evidence is in favor of VEPS, at least for basic parameters of clause structure and syntax), we know that the child has set parameters correctly before entering the two word stage; thus the child has set parameters correctly before using them.

4. On the theory of learning

This result (VEPS) thus has two extremely important consequences for the theory of learning. First, it makes even more strikingly clear the hypothesis (Wexler and Hamburger, 1973, and many references since) that children do not use what Wexler and Hamburger called negative evidence in language acquisition. In particular, children do not learn from corrections by adults of their utterances. Wexler and Hamburger drew this conclusion on the basis of evidence from Brown and Hanlon (1970) that showed that adults do not correct (or understand) ungrammatical utterances by young children more than they correct (or understand) grammatical utterances. Wexler and Hamburger concluded that the lack of negative evidence made even more necessary the conclusion that there must be innate specification of some linguistic structures, a conclusion since widely (but not universally) adopted. But VEPS provides an even stronger result. Namely, if VEPS is true, then children have learned the correct values of parameters before they illustrate these values in their productions. Thus it is logically impossible for them to have used negative information as part of the evidence for setting parameters. The only way for adults to provide negative information to the children would be for the adults to read the minds of the children.
On the assumption that adults can't read the minds of children to know what parameters the children have instantiated in the absence of productions by the children instantiating the parameters, we can conclude that children cannot have used negative information to set the parameters. Thus for one class of linguistic expertise which is uncontroversially 'learned' (the values of basic parameters of clause structure and inflection), we know that negative information is not used by the children. The second important consequence of VEPS for the theory of learning involves the nature of learning itself. Since children have set the parameters correctly before instantiating these values in productions (i.e. they have set the parameters by the end of the one word stage), the children have learned the parameter-settings without making responses which involve the parameters. We can say that the learning of parameters is a kind of perceptual learning, since it is done perceptually, rather than behaviorally. Perceptual learning is an old and important topic in psychology; there has been no doubt that it goes on, say, in visual perception, where the system under investigation is only an input system and not also an output system. Following VEPS, it is now clear that perceptual learning is also the basis for the setting of linguistic parameters.⁵ This is a nice generalization over cognitive domains, and shouldn't really be surprising, though it is perhaps not often stated as a general conclusion.
⁵ The kinds of learning theories that would thus be ruled out by VEPS (for basic parameter-setting at least) would include, for example, ideas based on Piaget's theories, which demand, for higher cognitive functions, that learning strongly depend on a motor response.
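To see that this picture is coherent, the perceptual, error-driven setting of parameters can be sketched in toy form. Everything below is invented for illustration: the two binary parameters, the generation function, and the input sample are ours, not a model from the paper. The sketch only shows the core idea, that a learner can change a parameter whenever her current grammar fails to generate an input sentence, with no behavioral response and no correction involved.

```python
def generable(sentence, params):
    """Can the current grammar (a vector of binary parameters) generate
    the sentence? A toy grammar: word order plus optional null subjects."""
    order = ["S", "V", "O"] if params["VO"] else ["S", "O", "V"]
    forms = [order]
    if params["null_subject"]:
        forms.append(order[1:])  # the subject may be dropped
    return sentence in forms

def learn(inputs, params):
    """Error-driven updating: a parameter changes only when the current
    grammar fails on an input sentence. (A deterministic variant for
    illustration; error-driven proposals typically flip a single
    randomly chosen parameter.)"""
    for s in inputs:
        if generable(s, params):
            continue                 # no error, no change: purely perceptual
        for p in params:             # try one flip at a time
            trial = dict(params, **{p: not params[p]})
            if generable(s, trial):  # keep the flip only if it helps
                params = trial
                break
    return params

sample = [["S", "V", "O"], ["V", "O"]]  # input from an SVO null-subject language
learned = learn(sample * 3, {"VO": False, "null_subject": False})
print(learned)  # {'VO': True, 'null_subject': True}
```

Note that the learner never produces anything and is never corrected: the update is driven entirely by whether the input can be (covertly) generated by the current grammar, which is the sense in which parameter-setting is perceptual learning.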
A natural question arises: is this empirical consequence of VEPS theoretically reasonable? That is, are there possible learning theories for parameters which don't involve correction or responses by children? Can perceptual learning actually work for parameter-setting? In fact, the most worked-out theories of parameter-setting turn out to be theories which treat parameter-setting as a kind of perceptual learning. Consider, for example, the 'learning on errors' approach to learning grammatical properties (Wexler and Culicover, 1980), specialized to parameter-setting in a Principles and Parameters framework in Gibson and Wexler (1993). The assumption is that the learner attempts to generate an input sentence with her current grammar (i.e. parameter-settings) and if she can't generate it, she changes the setting of a parameter. No behavioral response by the learner is assumed; it is a type of perceptual learning. The same can be said for other approaches to parameter-setting, such as Dresher and Kaye's (1990) 'cue-based' approach to phonological parameter-setting. Thus perceptual learning of parameters is in fact a potentially viable theory, and is implemented and worked out far more than an approach which assumes that the child must respond behaviorally in order to learn. Thus the theory of learning and the empirical properties of grammatical development as captured by VEPS seem to converge on perceptual learning as the correct model for grammatical development.⁶

5. The early setting of the null-subject parameter

Of the parameters used to illustrate VEPS, we have not yet discussed (5d), the null-subject parameter.
It may surprise the reader that I have claimed that the null-subject parameter is set correctly at the earliest observed ages, because there is a famous result in language acquisition due to Hyams (1986), which says that English-speaking children mis-set the null-subject parameter for a fairly long period, incorrectly setting the parameter as yes, that is, allowing null-subjects, as instantiated by sentences like baking cookies. In fact, there are a large number of languages which have been studied at early ages, and children speaking these languages all seem to allow null-subjects even if the adult language does not (e.g. French, as in Pierce, 1989). Given these empirical results, it is quite clear that children at young ages allow null-subjects. But in my original paper on the OI stage (Wexler, 1990ff.), I sug-
⁶ Of course there is a kind of 'implicit' response in the models of parameter-setting. For example, in Wexler and Culicover's (1980) or Gibson and Wexler's (1993) approach, the learner attempts to generate the input sentence (covertly) to see if it can be generated by the current grammar. But this covert response is quite different from what is meant by theories such as Piaget's, which assume that a behavioral response must be made. The parameter-setting theories involve cognitive, not behavioral, processes. In order for an adult to 'correct' (provide 'negative information' to) the language learner, the adult would have to be able to observe the cognitive processes, clearly an impossibility if one assumes that this type of rather interesting extrasensory perception does not exist. For a discussion of the need for extrasensory perception in earlier behavioral ('stimulus-response') models of language learning, see Batchelder and Wexler (1979).
gested the possibility that children produce these null-subjects as the result of the OI stage. The basic empirical result that I suggested might hold, and that has been documented in many places since,⁷ is that during the OI stage, Optional Infinitives (i.e. non-finite main verbs) have a far greater proportion of null-subjects than do finite main verbs. Basically, across (non-null-subject) languages, differences in the rate of null-subjects contingent on the verbal inflection often emerge on the order of 70-95% null subjects with OI's (non-finite main verbs) and 15-30% null subjects with finite verbs. Given this result, Wexler (1996) (see also Bromberg and Wexler, 1995) suggested that the null-subject parameter is not mis-set by young children; rather, the null-subjects of OI's are licensed in some way by the non-finite main verb, the null-subject perhaps being PRO, which is licensed in non-finite contexts (e.g. as the subject of control verbs: Mary tried [PRO to leave]). Schütze and Wexler (1996a,b) suggested in particular that -TNS (as opposed to -AGR) licenses PRO. There are many theories about how null-subjects are licensed by OI's, and they do not all assume that the subject of an OI is PRO. I will not review these theories here. But all these theories do have in common that null subjects are allowed because of the existence of the OI in the sentence. Thus the null-subject parameter has not been 'mis-set'. Rather the null subjects are licensed by the non-finite verb. However, the empirical results show that even finite verbs in non-null-subject languages allow null-subjects in young children, though at a greatly reduced rate. How can these be explained if it is OI's that license the null-subject? Don't we need to assume still that children have mis-set the null-subject parameter as yes? Again, there are a number of ideas concerning these null-subjects of finite verbs.
Let me, following Bromberg and Wexler (1995), propose that these null-subjects are a kind of topic-drop.⁸ That is, the null-subjects of finite verbs represent a kind of pragmatic error. These languages sometimes drop particular kinds of topics, as in (7) in English:

(7) a. Speaker one: John will never lose weight. He's eating too much.
    b. Speaker two: Yeah, isn't exercising enough either.

In (7b) the second speaker drops a very clear subject topic: John. Subjects that are such strong topics that they would be quite destressed in English are candidates for topic-drop. If speaker two had used a subject, it would have been a highly destressed he. What I am proposing is that children sometimes drop topics even in situations where they should not, and that the excess null-subjects of finite verbs in young children are the result of a kind of pragmatic error, one in which the children treat information that is not a very strong topic nevertheless as a strong topic, one strong
⁷ See Phillips (1995a,b) and Razzetti (1996) for a summary of data from many languages.
⁸ Hyams and Wexler (1993) suggested that children using null-subjects in non-null-subject languages might be doing a kind of topic drop, but they didn't suggest that this was particularly true for null-subjects of finite verbs.
enough to be omitted.⁹ Such a proposal is quite consistent with a large body of empirical evidence which suggests that children treat new information as old information, but not the other way around, what Thornton and Wexler (in press) call Asymmetry. For example, as Avrutin and Wexler (1992) and Avrutin (1995) have pointed out, Karmiloff-Smith's (1981) results, which show that young children use definite instead of indefinite articles (but not the other way around) and that they use pronouns when lexical NP's are needed, mean that they treat new information as old information. As Thornton and Wexler argue, following the approach of, but slightly modifying, Chien and Wexler (1990), this suggests that children assume that a discourse element is available to listeners even when it isn't. This is a kind of pragmatic error: children don't understand well enough what listeners know about a discourse, and assume that they know more than they in fact know. The converse doesn't seem to happen. In fact, it may turn out that topic-drop is not quite the right way to look at the phenomenon in adults; rather, it may be that the null subject in English is an optional spell-out for destressed subject pronouns.¹⁰ This would explain the possibility of dropping expletive subjects, which are typically destressed (see footnote 9). Referential subjects are destressed when they are very strong topics; as I pointed out, if a subject were used in (7b) it would be a destressed he. Very strong topics are destressed; thus if children mistakenly assume that certain referential subjects are very strong topics, they will consider them destressed and thus candidates for spell-
⁹ Haegeman (1995) has argued that the null-subjects of finite verbs cannot (all) be topic-drop, because non-topics are often dropped by children, e.g. expletive subjects, and she has provided empirical evidence of this dropping of expletive subjects. To the extent that this result is correct, I agree that topic-drop cannot be a full explanation. Rizzi (1994a,b) has proposed that the dropped subjects meet the condition of diary drop (Haegeman, 1990) and that these diary-drop null subjects result from a quite complicated syntax: they are a null constant, licensed by some complicated conditions so that only specifiers of roots may be null constants. However, so far as I can tell, Rizzi's theory does not predict the strong difference in rates of null-subjects for finite verbs and for OI's. Furthermore, it mispredicts the wh null-subject data that I will soon discuss. But it is fair to say that if expletive null-subjects of finite verbs are a real phenomenon, they can't be captured by the topic-drop theory. So far, however, expletive null-subjects have only been observed in very few languages, and not very extensively. Moreover, it is possible that in at least some (perhaps all) cases, the expletive subject is a quasi-argument (it rains), at least for the child, so that topic-drop can explain these null subjects (e rains, if it exists in some language). It's not clear how many non-quasi-argument null-subjects exist, forms equivalent to the English e seems that Mary has left. Hyams (1986) claims that in the null-subject stage expletives aren't used; could it be that non-null and null expletives alike aren't used?
This is possible, as young children seem to prefer verbs with argument subjects, perhaps as a result of a constraint against A-chains, or perhaps instead of a constraint against A-chains (Borer and Wexler, 1987, 1992; Babyonyshev et al., 1994, 1998). At any rate, the case of null and non-null subject expletives remains a major case for investigation in young children's grammars. It should also be mentioned that in more recent work Haegeman (p.c.) reports that null subjects are indeed possible in embedded clauses in the diary-drop register. Thus it appears that the specifier-of-root analysis in fact doesn't account for diary-drop, in the same way that truncation doesn't account for OI structures: embedded clauses are not oblivious to the process.
¹⁰ Only destressed subjects in first position can alternate with a null subject:
(i) Speaker 1: John is buying a book. Speaker 2: What is he/*e buying?/What's he/*e buying?
The subject can't be dropped by Speaker 2 because it is not in first position.
out with a null-subject. From now on I will simply talk about topic-drop, since we are dealing only with referential subjects, but the alternative possibility should be kept in mind. Thus if children assume that some subjects are very strong topics, strong enough to be dropped, this is congruent with the general point that children assume that the listener knows more about the subject than she (the listener) actually does. Let's say:

(8) In a language, Topic-drop applies to Very Strong (VS) Topics

where we will not specify the conditions which make a topic VS. We also assume:

(9) Children believe that some non-VS subjects are in fact VS (a pragmatic error).

From (8)-(9) it follows that children will drop subjects of finite verbs even when they shouldn't. Let me mention just one further piece of evidence for (9). In some V2 languages, for example Dutch, topics in Spec,C can often be dropped (the stronger they are as topics, the more readily they can be dropped; there is clearly a pragmatic effect). Children do this topic drop from a very early age (e.g. Van Kampen, 1997). Wexler et al. (1998), studying a large number of Dutch children, showed that the children dropped topics more often than adults would, in conformity with (9). Thus I assume two processes that allow null-subjects in children in the OI stage.

(10) In the OI stage, null-subjects are:
a. PRO or Topic-drop if the verb is non-finite (an OI)
b. Topic-drop if the verb is finite

That is, dropped topics aren't licensed by inflectional material on the verb, or by particular inflectional projections existing or missing. So they can exist with finite or non-finite (OI) main verbs. On the other hand, PRO is grammatically licensed by OI's. Thus the conditions under which OI's may have null subjects (topic-drop, or OI-licensed) properly include the conditions under which finite verbs may have null subjects (topic-drop).
Thus (10) predicts:

(11) The rate of null-subjects with OI's is larger than the rate of null-subjects with finite main verbs.

As we have already discussed, (11) is true, so the two-process theory in (10) predicts the empirical results across a wide variety of languages. Of course we even know that often the rate of null-subjects for OI's is much greater than the rate for finite verbs. This simply suggests that the pragmatic error (9) is not extremely common - just common enough to allow a certain rate of null-subjects of finite verbs. Moreover, it often happens that the rate of null-subjects for OI's in some languages approaches 100% (though not in all children in all languages). This large rate of null-subjects for OI's strongly suggests that the non-finite nature of the OI gram-
matically licenses the null-subject. This is a very different phenomenon from the low rate of pragmatic-error-induced null-subjects for finite verbs. The two-process theory of null-subjects allows us to explain the strongly differential rate of null-subjects in finite and OI sentences. However, it makes an even stronger prediction, as argued in Bromberg and Wexler (1995). Consider wh-sentences, where the wh is a non-subject, say an object or an adjunct. We can ask whether the subject can be null. Let's take as an example an adjunct question, with a locative question word, where in English. First, suppose that the verb is finite. This means that PRO is not licensed; thus structure (12b) is ungrammatical. That is, only (10b) Topic-drop is the potential source for a null subject. Let us designate a null topic as e(T), as in (12c). But we have already indicated (see footnote 10) that only topics in first position can be omitted, and children know this.¹¹ Thus (12c) is ungrammatical for children.

(12) a. where is she going?
b. * where is PRO going?
c. * where is e(T) going?
d. where she going?
e. where PRO going?
f. * where e(T) going?
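The two conditions in (10), restricted to wh-questions, can be stated as a small predicate. The encoding below is our own sketch (the function and its arguments are not from the paper); it simply reproduces the pattern just derived: with a wh-phrase filling first position (Spec,C), topic-drop is unavailable, so only a non-finite verb leaves a route (PRO) to a null subject.

```python
def null_subject_ok(finite, wh_fronted):
    """May the subject be null, on the two-process theory in (10)?
    A toy encoding: finiteness and wh-fronting are the only inputs."""
    pro_licensed = not finite       # (10a): only an OI licenses PRO
    topic_drop_ok = not wh_fronted  # topic-drop needs first position (Spec,C)
    return pro_licensed or topic_drop_ok

# The wh-question pattern in (12):
print(null_subject_ok(finite=True, wh_fronted=True))    # False: *where is e going?
print(null_subject_ok(finite=False, wh_fronted=True))   # True:  where e going?
print(null_subject_ok(finite=True, wh_fronted=False))   # True:  declarative topic-drop
```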
Now consider OI wh-questions, as in (12d), where the verb is non-finite. By (10a), the OI licenses PRO, so that (12e) is grammatical. As with the finite verb, a subject not in first position cannot be dropped as a topic (or, following Bromberg and Wexler, 1995, topic-drop applies only to topics in Spec,C, and there is already an element in Spec,C, the wh-element where). Thus (12f) is ungrammatical. The analyses in (12) yield predictions of the following sentence patterns in OI grammar.

(13) a. where is she going?
b. * where is e going?
c. where she going?
d. where e going?
The prediction from the two-process theory thus is that for wh-questions, the subject may be omitted for OI's, but not for finite verbs. The reason is that for finite verbs the subject can't be PRO, and for wh-questions topic-drop of the subject isn't allowed. Bromberg and Wexler (1995; cf. also Roeper and Rohrbacher, 1995) showed that in fact this empirical result strongly holds. The data in (14) are from
¹¹ Bromberg and Wexler (1995) argued that topic-drop couldn't take place in finite wh-sentences like (12c) because topics have to be in Spec,C and the wh constituent already fills Spec,C in (12c), so that the topic cannot fill Spec,C. This is quite possibly the correct explanation. I have talked about 'first position' only in order to generalize to adult forms in which expletives drop and also forms like (7b), where the dropped subject is probably not in Spec,C. If these forms are accounted for in some other way, then the Spec,C analysis will be preferred.
one child studied by Bromberg and Wexler (1995); the child is Adam, reported on in (Which publication??, 1973). The data was taken from the CHILDES data base.¹²

(14) Finiteness of null and pronominal subjects in Adam's wh-questions

             Finite   Nonfinite
Null             2        118
Pronominal     117        131

The result in (14) is extremely strong. In wh-questions, Adam has plenty of pronominal subjects in both finite (13a) and non-finite (OI) (13c) sentences, and plenty of null subjects in non-finite (OI) (13d) sentences, but essentially no null-subjects in finite sentences (13b). This is exactly the pattern that the two-process theory predicts in (13). Bromberg and Wexler studied other children and obtained the same result. They also obtained the same result considering all visible subjects, not just pronominal subjects; that is, in wh-questions, non-pronominal lexical subjects appear both with finite and non-finite verbs, although the null-subjects appear only with non-finite verbs. (They argued that pronominal subjects may be the best to compare to null subjects, because only a pronominal subject has the capacity to be topic-dropped, on pragmatic grounds.) The strikingly strong data in (14) (and for other children) is clear enough that it must be explained by any theory of the OI stage. As far as I know, other theories cannot explain this data. For example, Rizzi's (1994a,b) 'truncation' theory assumes that if a functional projection is present in the child's representation, then all projections underneath that projection are also present. Thus if there is a wh-question, CP must be present, and thus the empty subject, wherever it is, is in a projection under CP, and thus is not a specifier of a root. Thus, as Rizzi argues, the truncation theory predicts that in a wh-sentence, with the wh-phrase in root position, since the subject is a non-root, it cannot be null. Thus the 118 null subjects of non-finite verbs are impossible. Thus the truncation theory mispredicts the data.
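For concreteness, the null-subject rates behind table (14) can be computed directly. The tabulation below is illustrative (the labels and function name are ours; the counts are copied from (14)); it shows how lopsided the contingency is.

```python
# Adam's wh-question subjects, copied from table (14)
# (Bromberg and Wexler, 1995).
wh_subjects = {
    "finite":    {"null": 2,   "pronominal": 117},
    "nonfinite": {"null": 118, "pronominal": 131},
}

def null_rate(finiteness):
    """Proportion of null subjects among null + pronominal subjects."""
    row = wh_subjects[finiteness]
    return row["null"] / (row["null"] + row["pronominal"])

# Null subjects are robust with OI's but essentially absent with finite
# verbs, as the two-process theory predicts in (13).
print(round(null_rate("finite"), 3))     # 2/119 = 0.017
print(round(null_rate("nonfinite"), 3))  # 118/249 = 0.474
```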
Of course we already know that the truncation theory cannot explain the difference in null-subject rates between finite and non-finite (OI) main verbs. Furthermore, the truncation theory predicts that OI's will not exist in wh-questions, since if CP exists, TP (Tense Phrase) must exist, on the assumptions of the truncation theory. Thus the very existence of non-finite wh-questions is evidence against the truncation theory, as first pointed out by Roeper and Rohrbacher (1995) and confirmed by Bromberg and Wexler (1995). The interaction of finiteness and null-subjects thus gives three empirical arguments against the truncation theory. It is true that there are very few OI's in wh-questions in languages like German and Dutch, unlike English; in fact, as Poeppel and Wexler (1993) showed for German (see now Santelmann, 1995, for Swedish), there are extremely few OI's when there is any constituent in Spec,C. Following Poeppel and Wexler (1993), we can see
¹² All references to the CHILDES data base are to MacWhinney and Snow (1985).
K. Wexler / Lingua 106 (1998) 23-79
why this should hold. These are V2 languages. It is a property of V2 languages that if Spec,CP is filled by some constituent then C must be filled by a finite verb. Given VEPS, a child in the OI stage will have correctly set the V2 parameter (and Poeppel and Wexler produced extensive evidence for this in the German child they studied). On the assumption that a child who knows that a language is V2 knows the properties that UG says follow from the V2 nature of the language, however these are specified, we would expect that a child speaking a V2 language would insist that C be filled when wh or any other constituent filled Spec,C. But since it is also a property of such languages that only finite verbs can move to C, the child will fill C with a finite verb. Thus VEPS predicts that there are no wh-OI's in V2 languages. It is a virtue of an INFL-omission model such as Wexler (1990ff.) or Schütze and Wexler (1996a,b) that it predicts that wh-OI's will exist in languages like English but not in V2 languages like German, Dutch or Swedish. The truncation theory, on the other hand, wrongly predicts that wh-OI's are impossible in languages like English.13 The basic problem with truncation is that it necessarily omits far too many functional projections. Once a category is missing, all higher categories must be missing. This radical and obligatory omission of categories simply doesn't appear to be a property of the OI stage, as much empirical analysis has shown. Of course, one could modify truncation so that the omission of these categories wasn't obligatory. But then the model would lose its 'truncation' characteristics and become (almost?) identical to the INFL-omission model of Wexler for which it was supposed to provide an alternative. The truncation theory has proved to be very valuable in showing what one kind of precisely specified reasonable alternative to INFL-omission would look like, so that precise predictions of INFL-omission versus an alternative could be studied.
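The V2 reasoning above (a filled Spec,CP forces a finite verb into C, ruling out a nonfinite clause) can be summarized as a toy decision procedure. This is purely my illustration; the function and parameter names are invented:

```python
def wh_oi_possible(v2_language: bool) -> bool:
    """Toy summary: can a wh-question surface as an optional infinitive (OI)?

    In a wh-question the wh-phrase fills Spec,CP.  In a V2 language a filled
    Spec,CP requires C to be filled, and only finite verbs can move to C, so
    the clause must be finite.  In a non-V2 language nothing forces this, so
    INFL-omission can yield a nonfinite (OI) wh-question.
    """
    spec_cp_filled = True  # the wh-phrase occupies Spec,CP
    if v2_language and spec_cp_filled:
        return False  # C must host a finite verb: no OI possible
    return True

# English (non-V2): wh-OI's occur.  German/Dutch/Swedish (V2): essentially none.
assert wh_oi_possible(v2_language=False)
assert not wh_oi_possible(v2_language=True)
```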
It will be very useful to continue to have a variety of models which meet the central phenomena but which make divergent predictions concerning other phenomena. Furthermore, the data in (14) cannot be explained by any theory which assumes that subjects are dropped only for informational reasons, i.e. only as a kind of pragmatic error (e.g. Greenfield, 1991). For why should there be a difference in this kind of subject drop for finite and non-finite verbs? Moreover, the data in (14) cannot be explained by a 'memory load' type theory which assumes that children drop subjects when the rest of the sentence (the verb phrase) is too long (e.g. Bloom, 1990). For, as Bromberg and Wexler (1995) pointed out, the memory load theory predicts that finite sentences, having 'longer' verb phrases than non-finite sentences,14 should omit subjects more often than non-finite

13 Crisma (1997) argues that there are no wh-OI's in French, a non-V2 language, in conformity with truncation, and against the INFL-omission model. But Phillips (1995a,b) shows that almost all of the wh-sentences in Crisma's data involve a (finite) auxiliary and, noting that auxiliaries are never OI's (Wexler, 1990, 1994), argues that we would expect Crisma's results independently of truncation. Thus the empirical results to date are compatible with INFL-omission but not with truncation. Of course, it would be useful to have data on the question of whether wh-OI's exist in other non-V2 languages, besides English.

14 This is automatically true if one counts length by morphemes, since Tense/Inflection counts as a morpheme, and it is even true if one counts length by words, since finite auxiliaries are often missing in OI sentences.
sentences. This prediction should hold both for declarative sentences (where we already know it is a misprediction, since finite declarative sentences have in general far fewer null subjects than OI sentences) and for wh-questions (where we know it is a very strong misprediction, by data like (14)). Importantly, the two-process theory, so strongly upheld by (14), assumes that English-speaking children in the OI stage have correctly set the null-subject parameter, to the value no, null-subjects are not allowed. For if the children had set the parameter to yes, the theory would have made different predictions; namely, finite wh-questions would have easily allowed null-subjects (i.e. (13b) would be grammatical). Thus, as Bromberg and Wexler (1995) argued, (14) is quite strong evidence that English-speaking children have set the null-subject parameter correctly to no. Furthermore, the equivalent of (13b) in a real null-subject language (e.g. Italian) is perfectly grammatical, as expected. There is nothing about wh-questions which disallows null-subjects. Moreover, Rizzi (1994b) provides preliminary, suggestive evidence that young Italian-speaking children omit subjects of finite verbs in a wh-question (i.e. the equivalent of (13b)). If this result is indeed true, then we can conclude that young Italian-speaking children know that they are speaking a null-subject language.15 This rather precise and beautiful pattern of results, concerning the non-existence of null-subjects in finite object and adjunct wh-questions in a non-null-subject language (and their existence in a null-subject language), thus provides the strongest evidence, in my opinion, that the null-subject parameter is correctly set at the earliest observed ages: both for non-null-subject languages like English (where it is set to no) and for null-subject languages like Italian (where it is set to yes).
Thus we can conclude (as it seems to me by now the field already has, including Hyams (see Sano and Hyams, 1994)) that the null-subject parameter is set correctly from the earliest ages, and that it is correctly set for both plus and minus values of the V2 parameter, depending on the input language. Thus the null-subject parameter (5d) can be added to the list of parameters set correctly at the earliest observed stages, providing further support for VEPS. In fact, to my knowledge, while there is extensive evidence (some of which I have just briefly reviewed) demonstrating the correct setting from the earliest observed ages of many basic parameters of clausal syntax, verb movement and inflection, there is almost no evidence demonstrating the incorrect setting of any parameter. For example, I know of no evidence showing that a child has incorrectly set the V2 parameter in German as not V2 or in French as V2. The list of parameters and languages for which this conclusion holds is fairly large, and it grows every time somebody studies a new construction which shows parametric variation.16

15 Also suggestive in this regard are the results of Valian (1991), which suggest that there is a larger proportion of null-subjects in young Italian children than young English-speaking children, though the issue is somewhat complicated since we expect a good proportion of null-subjects in OI's, which English-speaking children have.

16 The strongest evidence of which I am aware which at first appears to show that a parameter has been mis-set is in Schoenenberger (1998), which offers quite convincing evidence that two young Swiss German-speaking children systematically produce embedded V2 constructions, while the adult language
6. Early learning, growing principles

That there is no evidence for the incorrect setting of any parameter, in other words, that VEPS is true, is a striking result. It is so striking not only because the evidence is so strong and so without exception, but also because the result itself is so surprising. When the Principles and Parameters theory was first created, the natural expectation in both linguistic theory and in the field of language acquisition was that children would sometimes mis-set parameters, and that this mis-setting would be the cause of children's 'funny' language productions. Thus Hyams' (1986) result which seemed to show that children in English mis-set the null-subject parameter was quite important, because it was the first discovered case which seemed to actually show in empirical language acquisition studies that parameters were mis-set, something which had been thought to be a natural result. Of course we now know, as demonstrated in this section, that the null-subject parameter is not mis-set. Why was it expected by the field as a whole that parameters would be mis-set? The reason, I believe, was the (sometimes implicit, sometimes explicit) assumption of LLEE (Section 0): Late Learning Early Emergence. In the Principles and Parameters theory, Principles were in general assumed to be innately specified in the genetic program; this was what is called Universal Grammar (UG). Parameters, on the other hand, represented what had to be 'learned' from experience. Given the assumption of LLEE, it was thus predicted that while principles should be known to the young child, parameter values would be set correctly at a later age, since learning was 'late'. Thus the demonstration that VEPS is true must be taken as strong evidence against LLEE.
This is an important foundational result for the field, because it means that the fundamental intuition of the field of first language acquisition, that non-adult grammatical productions in children were the result of faulty learning, cannot be easily maintained. Given the failure of faulty (i.e. late) learning as an explanation, it follows that a different explanation of children's non-adult grammatical productions

16 (cont.) is clearly not V2 in embedded clauses. The statistical result is extremely strong, and the effect in the children's productions is quite odd to a native speaker. Moreover, Schoenenberger (1998) has provided experimental evidence which confirms the evidence based on natural production data for a larger group of children. However, Schoenenberger (1998) argues on the basis of convincing distributional evidence that the children have not mis-set the embedded V2 parameter (or its equivalent in any syntactic theory of Swiss German). Rather, she argues, the children have mis-classified some complementizers (C heads) as full DP's, and thus place them in Spec,C rather than in C. Given the grammar of V2 languages, since Spec,C is filled (by the mis-analyzed complementizers), movement to C is necessary, so that C is not empty. Thus the finite verb moves to C. On Schoenenberger's analysis, the parameter is a unitary V2 parameter whose effects in embedded clauses differ from matrix clauses in the adult language only because C is filled in embedded clauses. Thus children have set the V2 parameter correctly. Two important questions remain. One, why do the children mis-classify the complementizers as DP's? (Schoenenberger provides some speculation on this question.) Second, what is the relation between a mis-classified complementizer and the mis-setting of a parameter? What leads to the mis-classification of a lexical item? Is this dangerously close to the mis-setting of a parameter for VEPS' comfort? It seems to me that these are important questions for further investigation. It would also be interesting to discover any further cases of systematic mis-classification of items with such grammatical content.
must be discovered, because children's early productions are quite often non-adult in certain ways. The natural assumption, of course, when learning fails, is that certain properties of grammar grow or mature, in the sense of Borer and Wexler (1987, 1992). For a current discussion of maturation see Wexler (in press). This is not the place to discuss the conceptual and foundational issues in any detail, but I will simply point out that Wexler argues in that paper that the only known alternative to maturation and learning - that 'performance' factors are the only or major cause of children's non-adult grammatical productions - cannot work as an explanation (see also Crain and Wexler (in press)). Thus we are left with maturation as the only viable explanation for a substantial number of properties of children's language which vary over time. Of course, such an explanation is the most natural one, given the assumption that grammar is instantiated in the human brain as a biological entity. Growth or maturation or development is perhaps the key feature of human biology. Thus the empirically-discovered need for maturation (given all the evidence produced on other topics, plus the new evidence from VEPS (that LLEE is false)) is actually a welcome result, since it shows that the most general biological model indeed is extendible to human language. Moreover, the assumption that language 'grows' in the mind/brain is the basic assumption of linguistic theory, so that the concept of language growth/maturation reintegrates the study of language acquisition with the study of linguistic theory. For further extensive discussion, comparing maturation with its opposite - 'rigidity' - see Wexler (in press).
7. Very early knowledge of inflection

As mentioned earlier, VEKI is very close to VEPS, since inflectional material and parameters are closely linked theoretically. Thus the evidence that we have discussed which supports VEPS also supports VEKI. But, more specifically, we can ask whether the grammatical and phonological properties of specific inflections are known, in addition to the properties which induce what have been called syntactic parameters such as V-raising. One important set of inflectional items is the set of agreement morphemes. Are these known to young children? With respect to verbal inflection, Wexler (1990, 1994) argued that much of it is known to very young children. For example, Poeppel and Wexler argued that young German-speaking children (in their case a child of 2;1, but they also discussed evidence from the data of Clahsen, 1991) do not make agreement mistakes; if they produce a verbal inflection, then the subject appropriately agrees with it. For example, if the child produced third person singular t on a verb, then the subject was third person singular more than 95% of the time. Likewise, if a subject has certain phi features, then an inflection on the verb will not have contradictory phi features. To take a well-documented example, the following table from Harris and Wexler (1996a,b) shows the frequency of English verb inflections with all first singular nominative pronoun subjects in a number of CHILDES corpora, over a number of children. See Harris and Wexler for the exact corpora.
(15) stem       1349
     irreg pst   325
     ed           47
     s             3
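As a quick arithmetic check on the counts in (15) (my illustration; the variable names are invented):

```python
# Counts from (15): verb forms produced with first singular subject "I".
forms = {"stem": 1349, "irreg past": 325, "ed": 47, "s": 3}

total = sum(forms.values())
error_rate = forms["s"] / total  # "I likes"-type agreement errors

print(total)                       # 1724 -- "more than 1,700" sentences
print(round(100 * error_rate, 2))  # 0.17 (percent)
```

An agreement-error rate under a fifth of one percent is what the text means by "vanishingly rare".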
The number of third singulars on verbs with I as subject is vanishingly rare: only 3 out of more than 1,700 sentences which have I as a subject. Children simply don't say I likes ice cream. Harris and Wexler (1996) also show that s occurs much more frequently when the subject is third singular - she likes ice cream. (It shouldn't always occur; the lack of s indicates an OI, as Wexler (1990ff.) argues.) The conclusion is that the correct agreement features on verbal inflectional morphemes are known. Moreover the general phonological properties of the agreement morphemes are known. Thus children know that the phoneme /s/ in English is specified as [+3rd, +singular, +present]. Similarly for German t, as argued above, etc. Of course, there may be some morphemes whose inflectional properties are not known at the earliest stages, since these must be learned. But in general, the morphemes of verbal agreement do seem to be known, in the sense that their phonology is in general known and their grammatical features are known. Some basic morphemes of tense are also known. Until recently it hadn't really been established, so far as I know, whether children knew the grammatical (tense) properties of the third singular present morpheme s or the past morpheme ed in English. This is because it isn't always easy to tell in natural production data what tense is intended by the children. However, there is now excellent evidence that young children know the grammatical properties of these morphemes. Rice et al. (1995) and Rice and Wexler (1996) did an elicitation experiment for present s and past ed morphemes, establishing the time context quite clearly. There were 40 children with mean age 3;0 in the Rice and Wexler study. The basic outline of the results is as follows. When a present context was used, children either used the correct inflectional present verbal form she paints pictures or an OI she paint pictures.
When a past context was used, children either used the correct past verbal form she painted pictures or an OI she paint pictures. Out of large numbers of elicited productions children almost never used a present form s in a past context or a past form ed in a present context. There can be no question that these children knew that ed is +past and that s is -past.17 The Lexical Learning Hypothesis (Wexler and Chien, 1985; see also Borer, 1984; Manzini and Wexler, 1987 for related ideas) suggests that lexical items (including inflectional material, as emphasized by Borer) that underlie parameters have to be learned, but the hypothesis does not say when this material is learned. VEKI suggests that in general the inflectional material is learned quite early, though there is no reason to think that all inflectional material is known by the time that children are producing language (1;0 on average). In fact, Schütze and Wexler (1996a,b) and Wexler et al. (in press) allow for the possibility and provide evidence that some case forms may not be known at a particular age, even relatively common forms. However, it is quite striking how much is known about abstract properties of verbal inflections (and of other types of inflection, such as NP inflections, including determiners; see e.g. Bohnacker, 1997).

17 In unpublished work Carson Schütze and I have extended this result to children in their 2's.
Again, a general view in the field of language acquisition was that children have a great deal of trouble learning inflection. This view really was widely held - see, for example, Gleitman and Wanner (1982). I believe, in fact, that the belief in late learning of inflectional material was a parallel belief to late learning of parameters - both attempting to explain why children's utterances were so often non-adult-like, even though they might know grammatical principles. The view was natural enough, again, given LLEE. Since the phonological and grammatical properties of particular inflections had to be learned, LLEE suggested that they would be learned late. In contrast, my view, which I think has been substantiated by now with a great deal of evidence in the Optional Infinitive stage, is that young children are extremely good at learning inflection; it is very difficult to find examples of central inflectional material that is difficult for young children to learn. In fact, I would say that an informal way to characterize some of very young children's abilities is that they are little inflection machines. Of course, it wouldn't be surprising if this is an over-statement - so many inflections remain to be investigated, and there could be all sorts of reasons that really do make certain inflections difficult to master. But I would like to propose this metaphor to counter the prevailing view that very young children are not good at inflections.

8. The AGR/TNS Omission Model

As background for the continuing discussion on variation and maturation, we need to have a model of the OI stage - what syntactic properties do children in this stage show? Basically I assume that children in the OI stage have set the relevant parameters correctly (VEPS) and that, moreover, they know the relevant properties of UG. However, there is one UG difference between these children and adults. What is that difference? In Wexler (1990ff.)
I assumed that children in the OI stage can optionally omit TNS from their syntactic representations. However, Schütze and Wexler (1996a,b) argued that children in the OI stage can omit AGReement or TNS (or both, although I won't discuss that case here). I think there is good evidence for this AGR/TNS Omission Model (ATOM), and will assume it here. Here I will just briefly mention the evidence that Schütze and Wexler (1996) used to argue for ATOM. Basically the evidence involved errors of subject case. English-speaking children often use non-NOM case in subject position (her going). Why? If TNS is omitted from the representation we can assume that a 'default' case (ACC in English) is assigned. This view predicts that ACC subjects will occur with OI's, since OI's are missing TNS. Such a view is confirmed by much data in English development. For example, here is the data from Nina, as reported in Schütze and Wexler (1996a,b).18
18 Schütze and Wexler provide file-by-file analyses showing the effect holds at different stages of a child's development; thus the effect cannot be due to both NOM and finiteness developing simultaneously. In other words, they do statistical analysis to show that there is a causal contingency between finiteness and NOM.
(16) Nina's 3rd singular subject pronouns: finiteness vs. case (Schütze and Wexler, 1996a,b)

                  Finite  Nonfinite
he + she            255      139
him + her            14      120
Percent non-NOM      5%      46%

In (16) it can be seen that with nonfinite (OI) matrix verbs, 46% of Nina's pronominal subjects are ACC rather than NOM. We would expect these if default ACC were used with nonfinite verbs. Furthermore, with finite verbs, only 5% of the subjects are non-NOM. However, as Schütze and Wexler point out, this analysis (of default ACC used when TNS is omitted) is not sufficient. The reason is that for nonfinite (OI) verbs, for which presumably TNS has been omitted, 54% of the subjects are NOM. That is, although finite verbs show almost 100% use of NOM subjects, nonfinite verbs show an alternation between NOM and non-NOM subjects. What licenses the NOM subject of OI verbs?19 Schütze and Wexler argue that AGR assigns NOM and that AGR may be present even when TNS is missing, resulting in an OI, but NOM case. Furthermore, Tense may be missing even when AGR is present; missing TNS results in a possibly PRO subject, thus empty subjects for the non-finite verbs, as discussed in Section 5.

(17) AGR/TNS Omission Model (ATOM) of OI Stage (Schütze and Wexler, 1996a,b):
a. AGR or TNS (or both) may be deleted
b. AGR assigns NOM; if no AGR, subject gets default case
c. Default case in English is ACC, in German/Dutch it is NOM
d. Lack of TNS licenses PRO
d'. Morphology inserted according to Elsewhere Principle, e.g. Distributed Morphology
e. Kid knows adult Syntax and Morphology (features, Elsewhere Principle, default forms)

(18) Features of English Verbal Inflections (Schütze and Wexler, 1996a,b)
/s/  [+3rd, +singular, +present]
/ed/ [+past]
/ø/  [ ]  (i.e. no features; the empty inflection is the default)

In English, the morphosyntactic features for the verbal inflections of agreement and finiteness are specified in (18).20 Since (17e) assumes that the children know the

19 Schütze and Wexler refine the predictions, arguing that past tense and modal verbs in English, which don't show agreement, should allow ACC subjects, and they present evidence in favor of this view. See also Wexler et al. (in press) for further evidence.

20 Notice that the model uses the Halle and Marantz (1993) assumption that in English the AGR and TNS nodes have fused, but this assumption isn't necessary for the analysis.
relevant inflections (in accordance with VEKI), (18) is known to the child in the OI stage. The principles of Distributed Morphology (DM) (including the Elsewhere Principle) say that, at a node, various morphemes 'compete', but that the maximally specified morpheme which doesn't include a feature which is not present on the node will be inserted. In other words, the morpheme which has the most features in the lexicon will be inserted, so long as this morpheme does not contain a feature which does not exist on the node. It is possible, of course, that a morpheme may not specify a feature which is present on the node. Consider third singular present. If both AGR and TNS are included in the representation, then these three features will be specified, and DM predicts that /s/ will be inserted since it is the maximally specified consistent morpheme. Suppose, however, that AGR is missing from the representation although TNS is present, so that only +present is in the syntactic representation. /s/ is now inconsistent with the representation, since it is specified +3rd, +singular, which are not in the representation. /ed/ is also not consistent, since +past is not in the representation. However, the phonetically empty morpheme /ø/, which has no features, is consistent with the representation, since it has no features which are not in the representation. Since no other morpheme is consistent with the representation, /ø/ is the maximally specified morpheme that is consistent with the representation, and it will be inserted. Thus with [-AGR, +TNS] in this (third person singular present) case, /ø/ will be inserted and we will get the verb like instead of the correct form likes. This is an example of an OI in English. Note that ATOM predicts that there will be two kinds of OI's in English.21 Either AGR will be present and TNS missing or TNS will be present and AGR missing. Either situation will result in an OI where third singular s should appear.
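The competition just described, in which the maximally specified morpheme consistent with the node's features wins, amounts to a feature-subset comparison. The sketch below is my own minimal illustration of this Elsewhere-style logic, assuming the vocabulary entries in (18); it is not an implementation of Distributed Morphology proper:

```python
# Vocabulary entries from (18): morpheme -> feature set.
VOCAB = {
    "s":  {"+3rd", "+singular", "+present"},
    "ed": {"+past"},
    "0":  set(),  # phonetically empty default inflection
}

def insert_morpheme(node_features):
    """Elsewhere-style competition: among morphemes whose features are all
    present on the node (subset check), the most fully specified one wins."""
    fitting = [m for m, feats in VOCAB.items() if feats <= node_features]
    return max(fitting, key=lambda m: len(VOCAB[m]))

# Full 3sg present node ([+AGR, +TNS]): /s/ wins.
assert insert_morpheme({"+3rd", "+singular", "+present"}) == "s"
# AGR omitted ([-AGR, +TNS]): only +present remains; /s/ and /ed/ each carry
# a feature absent from the node, so the empty default wins -- an OI ("like").
assert insert_morpheme({"+present"}) == "0"
```

The subset check (`feats <= node_features`) encodes the consistency requirement, and `max` over feature-set size encodes maximal specification.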
An OI form like Mary like ice cream will have two possible analyses. Now, suppose a third person masculine pronoun is used with this representation. Will he or him emerge? Since (17b) claims that if AGR is not in the representation, the default case is used, and since (17c) assumes that ACC is the default case in English, the ACC form (him) will emerge, and we will get (19).

(19) a. him like [-AGR, +TNS] ice cream
     b. he like [+AGR, -TNS] ice cream

Carrying through the analyses, one can see the general pattern for a child in the OI stage:

(20) a. he likes ice cream [+AGR, +TNS]
     b. he like ice cream [+AGR, -TNS]
     c. him like ice cream [-AGR, +TNS]
     d. *him likes ice cream

21 Again, ignoring the possibility of both AGR and TNS being missing; see Schütze and Wexler, 1996a; Schütze, 1997, for discussion.
(20d) is ungrammatical for the OI child because the presence of /s/ on the verb shows that AGR is present in the syntactic representation, but the presence of him shows that AGR is not present. This is the pattern of results that holds, as Schütze and Wexler argue. (17c) assumes that 'default case' is different in different languages. Schütze and Wexler (and for detailed argumentation see Schütze, 1997) provide evidence for this choice in the adult language, and it is assumed (17e above) that children in the OI stage know which forms are defaults. Schütze and Wexler use this assumption to predict the empirically correct fact that in languages like German and Dutch, subjects are almost uniformly NOM even for OI's, distinctly different from the results in English and French. I have simplified somewhat for ease of exposition. Schütze and Wexler do not assume that there is a 'default' case in a language. Rather, they provide a set of features for the items which bear case which predicts how the less-specified forms will show up when specifications are missing. This is in accordance with the assumptions of Distributed Morphology. In particular, in English an ACC form like him will not be specified for a Case feature but will be specified only for phi features (person, gender, number). For example, /he/ is specified as [+NOM, +3rd, +singular, +masculine] whereas /him/ is specified as [+3rd, +singular, +masculine]. Thus when no case feature exists on a DP in the syntactic representation, only the ACC form him can be inserted, since other forms will have case features specified which will clash with the lack of a case feature on the DP in the syntactic representation. In German it is the 'NOM' form of the pronoun that has no case feature specified, whereas the ACC form is specified [+ACC]. Thus the NOM form is inserted when no case feature exists in the syntactic tree. This is a precise DM account of 'default' forms.
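Combining the verbal entries in (18) with the pronoun entries just quoted, the pattern in (20) falls out of the same subset competition. The sketch below is my own illustration, not Schütze and Wexler's implementation; in particular, letting AGR contribute the 3rd/singular features on the verb node is an expository assumption:

```python
# Pronoun entries as quoted from Schutze and Wexler; verb entries from (18).
PRONOUNS = {
    "he":  {"+NOM", "+3rd", "+singular", "+masculine"},
    "him": {"+3rd", "+singular", "+masculine"},
}
VERBS = {"likes": {"+3rd", "+singular", "+present"}, "like": set()}

def best(vocab, node):
    """Most fully specified item whose features all appear on the node."""
    fitting = [w for w, feats in vocab.items() if feats <= node]
    return max(fitting, key=lambda w: len(vocab[w]))

def clause(agr, tns):
    """Subject + verb for a 3sg masculine present clause under ATOM.

    AGR contributes NOM on the subject and 3rd/singular on the verb
    (an expository assumption); TNS contributes +present on the verb.
    """
    subj = {"+3rd", "+singular", "+masculine"} | ({"+NOM"} if agr else set())
    verb = ({"+3rd", "+singular"} if agr else set()) | ({"+present"} if tns else set())
    return f"{best(PRONOUNS, subj)} {best(VERBS, verb)}"

assert clause(agr=True,  tns=True)  == "he likes"   # (20a)
assert clause(agr=True,  tns=False) == "he like"    # (20b)
assert clause(agr=False, tns=True)  == "him like"   # (20c)
# (20d) *him likes is underivable: /s/ requires AGR, him requires AGR absent.
```

Switching the default (caseless) pronoun entry to the NOM form, as in German or Dutch, would yield NOM subjects even for OI's, mirroring the cross-linguistic contrast described above.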
Thus we have a case of variation in development (many subject case errors in English, none in German or Dutch), a strong cross-linguistic difference, which is predicted by a model which assumes that the development of UG principles is the same in both languages (essentially as in adult language except that AGR/TNS may be deleted), and that VEPS and VEKI are true, so that the children have learned the correct values (for example, the features of verbal inflections and the features on case: NOM/ACC features are known, 'default case' is known, etc.). The 'difference' in development is simply the consequence of universal principles developing in a certain way, but interacting with (correctly) learned parameter-settings so that certain differences in surface behavior emerge. Results of this nature demonstrate in the field of language acquisition the familiar everyday lessons from science (e.g. the hard sciences) and from linguistic theory. Namely, deeper, more correct explanations of phenomena are necessary and useful; a surface difference may result from what is essentially a deeper commonality. The UG state of children in Dutch and English is the same (i.e. basically UG, but AGR/TNS Omission, in both languages). The variation (subject case 'errors' in English but not in Dutch) is due to a property of the languages themselves (the form of default case), which the children have learned, correctly, for each language. Neither language is more 'difficult' to learn; the English-speaking children simply give what appears to be a more 'incorrect' behavior (ACC instead of NOM case on subjects)
because of the surface form of default case. The sophistication of knowledge of grammar in the English- and Dutch-speaking children is identical: UG modified by AGR/TNS Omission, together with the correct setting for default case in the input language. Why isn't omission of AGR or TNS from a syntactic representation of a root/matrix declarative sentence a violation of VEPS or VEKI? It isn't a violation of VEPS because the omission does not reflect a mis-set parameter. I am assuming that the requirement for AGR and TNS in the representation of a root declarative clause is a property of UG, whether of the computational component itself or of the interface of the computational component and the conceptual/interpretive component. There is nothing in UG that allows a 'setting' which will permit AGR/TNS to not exist in root structures. A violation of a universal, non-parametric property is, by definition, not a mis-set parameter. As for VEKI, the hypothesis that children know the grammatical properties of inflectional morphemes, note that the requirement is that children know what syntactic, morphological and phonological features are specified on a morpheme. These specifications do not require that the morphemes always appear in a sentence, even a root/matrix declarative sentence. Such a requirement could only come from a requirement that a node or phrase of a particular kind (e.g. AGR, TNS) appear in a phrase-marker (with a certain interpretation). Knowledge of inflection cannot produce the obligatory behavior. I have shown that children know the properties with which a morpheme is specified in the lexicon (e.g. that a particular agreement morpheme has certain specified agreement features, or that a particular INFL morpheme has a 'strong' property, so that movement to it must be overt). The obligatory nature of the presence of a morpheme in a structure depends on the syntactic representation, not on the lexical specification of the morpheme. 
Thus the omission of AGR or TNS does not violate VEKI.

9. What causes subjects of OI's to raise: D-features and the EPP

A classic problem in the theory of OI's, ever since Wexler's (1990, 1992, 1994) proposal that OI's were missing INFL, is: what causes the subject of an OI to raise from a lower VP position to a higher position, the Specifier of some INFL projection? If INFL is missing, what feature is being checked when the subject raises? On Wexler's TNS-omission theory, for example, if TNS is missing in an OI, why do subjects raise? An equivalent way of putting the question is: how do subjects of OI's get their case? If TNS is missing and TNS is responsible for NOM-assignment, how is NOM assigned? If default (ACC in English) case is assigned because TNS is missing, why does the subject raise, since classically in contemporary grammar subjects raise to satisfy a (NOM) case-feature? That is, subjects raise to INFL to get case; but if the case of an OI is not assigned by an INFL element, but is default, why does raising of the subject take place? On Economy grounds, why doesn't the subject stay in place?
The word order facts in the OI stage show that there must be a functional projection to which the subject moves. In sentences with negation, the basic word order observed is:

(21) NPi NEG [VP ti V[-finite] ...]

Assuming, as is standard, that NEG is generated to the left of VP and that the subject is generated in the Specifier of VP (as in (21)) or in a left-adjoined position to VP, we can see that the subject NPi raises to some position to the left of NEG. In adult syntax, this is the specifier of some INFL position. But if V is non-finite (an OI, as in (21)) then it is standardly assumed that no case is assigned to the subject of a non-finite verb, and thus case can't be assigned to NPi, the subject. The problem then is: why does the subject raise? First, let me very briefly list some of the references which show that the subject of an OI in fact does appear to the left of negation, as in (21).

(22) Some evidence for raised subjects of OI's:
a. Medial Neg in English (Harris and Wexler, 1996), e.g. Mary not go
b. Mainland Scandinavian OI's (Wexler, 1990ff.), though the subject could be in Spec,C
c. Faroese subjects before NEG in OI's (Jonas, 1995a,b, who shows that specific subjects exist before NEG and argues that that's a raised position in Faroese)
d. French (Pierce, 1989, 1992; Deprez and Pierce, 1994; Levow, 1995), where subjects of OI's mostly appear to the left of NEG pas

Although I do not have the space to review the evidence in detail here, it is accepted in the field that subjects appear to the left of NEG, even in OI's. This is overwhelmingly true. There is a small suggestion that at a very early part of the OI stage, the subject may actually not raise, and surface to the right of NEG (Pierce, 1989, 1992; Deprez and Pierce, 1994). However, at best this is a very short-lived stage. In fact, whether this possibility exists at all is not clear; Stromswold (mss.)
argues that the Deprez and Pierce and Pierce evidence doesn't support the suggestion of such a possibility at all. She points out that most of the sentences with negation in initial position don't have subjects, thus making it impossible to determine whether the subject is still in the VP. She also makes a number of other methodological points. Wexler (1990, 1994) pointed out that this VP-internal stage didn't have much evidence for it, and subsequent work has confirmed rather than thrown doubt on that view. At any rate, there is no question or controversy at all that during the great bulk of the OI stage, subjects surface consistently to the left of NEG, so that even if there turns out to be a VP-internal-subject possibility, it is extremely short-lived and doesn't characterize the OI stage. Thus the problem remains: why do subjects of OI's raise, if they don't do it to receive (or check off) case? The problem is quite serious for any theory of the OI stage which assumes that children basically have UG representations and syntax. Simple theories like
Wexler's TNS Omission theory (1990ff.) assume that this is so; but why then do subjects of OI's raise? Consider Rizzi's (1994a,b) 'Truncation Theory'. Since on this theory OI's must be missing the TNS projection, and also any projection higher than TNS, how does the subject get case? The assumption of this theory is that the relevant functional projections come in the order AGRP TP VP. Thus an OI is a bare VP, since it is missing TNS and thus all higher projections. But how is case assigned to the subject of a bare VP? In fact, Rizzi at least speculates that all subjects of OI's are in fact in some kind of adjoined 'Topic' position, which isn't represented at all by a functional category, something like (23):

(23) [ Subjecti [VP ti ... ]]

Presumably the subject has raised to some kind of 'Topic' position, for semantic, not case reasons. This proposal leaves at least two questions unanswered. First, how does NOM case get assigned to the subject if there is no appropriate inflectional projection to assign it? (We know, even from the English facts, that NOM is often assigned, even though it is not default.) Second, why should all subjects of OI's be in topic position? There is no evidence to support this view. Basically, Truncation theory has a difficult time with the raising and case of a subject.22 Schütze and Wexler's (1996a,b) AGR/TNS Omission Model described in the last section predicts that when NOM is assigned by INFL, AGR is in the representation, so that if the sentence is an OI, TNS must be missing. So far, so good: AGR (which assigns NOM) motivates the raising of the subject, so that case can be assigned/checked. But what happens when TNS is present and AGR is missing, as ATOM assumes is possible? Default case is assigned under straightforward assumptions from Distributed Morphology.
That is, if no case feature is specified on the subject (since AGR is not present), then only the ACC form of the pronoun (which has no case features specified on it in English) will be consistent with the representation, and this ACC form will be inserted. (See an earlier footnote.) So far so good. But why does the subject raise? In fact, the subject does raise even in this case, when default case is assigned, producing forms like (24a), but not (24b):

(24) In the OI stage:
a. him [Default case] not go (OI stage word order)
b. *not him [Default case] go (NOT the OI stage word order)
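The insertion reasoning here can be sketched as a small toy computation in the spirit of Distributed Morphology's late insertion: the form whose specified features are a subset of the features present on the syntactic node wins, with the most highly specified candidate preferred. The two-entry vocabulary and feature labels below are my own illustrative simplification, not an actual formalism from the literature.

```python
# Toy sketch of Distributed-Morphology-style late insertion for English
# 3sg masculine subject pronouns. Illustrative simplification: 'he' is
# specified for NOM; 'him' carries no case features (the default form).
VOCABULARY = {
    "he": {"NOM"},   # nominative form, specified for NOM
    "him": set(),    # ACC form, unspecified for case (the 'elsewhere' form)
}

def insert_pronoun(node_features):
    """Pick the most highly specified form whose features are a subset
    of the case features on the syntactic node."""
    candidates = [(form, feats) for form, feats in VOCABULARY.items()
                  if feats <= node_features]
    # The most specified consistent candidate wins; the bare default loses ties.
    form, _ = max(candidates, key=lambda pair: len(pair[1]))
    return form

# AGR present: NOM is assigned to the subject
print(insert_pronoun({"NOM"}))  # -> he
# AGR missing (a TNS-only OI): no case feature on the subject, so only
# the unspecified ACC form is consistent, and default case surfaces
print(insert_pronoun(set()))    # -> him
```

On this sketch, "default case" is simply the spell-out of a node bearing no case feature, which is the reasoning behind (24a).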
22
A related problem for Truncation theory is that it predicts that OI's are not possible with negation, since NEG is superior to TNS, and thus NEG is omitted whenever TNS is omitted. Rizzi takes this as a virtue of Truncation Theory, since he argues that the facts are true for French. But, in fact, as Levow (1995) has shown, there is an ample number of negative OI's in the French OI stage. Of course, negative OI's are also abundant in the Germanic OI stage, a fact which caused Haegeman (1995) to propose that NEG was below TNS in Germanic, thus allowing NEG without TNS under Truncation. The facts of negation in the OI stage thus argue against Truncation, unless negation were universally under TNS, which would cause other problems for the Truncation model.
Moreover, in the other Germanic languages, since sometimes the observed NOM is default (when AGR is missing), we would expect some NOM subjects in those languages to appear to the right of negation, but they don't. Thus it is quite clear that all subjects, whether assigned NOM by INFL (AGR) or not, raise out of the VP. The AGR/TNS Omission Model at least allows a functional position for the subject to move to in an OI (AGR or TNS, when the other one is missing).23 But it still leaves unanswered the question of why the default-case subject of an OI raises up. If TNS doesn't assign case, and AGR is missing, by Economy (Greed) we should expect the subject not to raise. Under any theory that assumes that subjects raise to be assigned case, we would expect that the subject of an OI with the case-assigning projection AGR missing would not raise, and would stay to the right of negation. This problem of the raising of and case assignment to the subject of an OI has been a serious and interesting problem in the theory of OI's. If it couldn't be solved, we would be left with a real dilemma about the nature of the child's knowledge of UG. It is a virtue of the precise investigation of children's grammar in the OI stage that the problem has indeed surfaced. In my opinion, it is a sign of the maturity of the field of developmental psycholinguistics that a problem of this nature could even arise. For it takes enough structure in the theoretical analysis to let a problem arise. It is thus gratifying that in fact a rather satisfying answer has emerged. Chomsky (1995: Chapter 4) argues that it can't be a Case property that attracts the subject out of VP to Spec,INFL. Subjects raise to higher positions even when they don't receive Case there. So EPP (the Extended Projection Principle) is not driven by Case in a Minimalist (Economy) theory.
Consider an infinitival (raising) construction, as in (25):

(25) Mary INFL(+FIN) seems [e1 to INFL(-FIN) [VP e2 like it here]]

DP Mary in (25) first moves to the lower Spec,INFL, but it doesn't receive Case there, since non-finite INFL doesn't assign (or check) case, certainly not NOM case. In general EPP applies cyclically, and can apply many times, but Case is only checked once, when NOM is assigned/checked. But if case were the reason for movement of the subject, there would be no reason for the subject to undergo its first move, to the Spec of the non-finite INFL. Economy (Greed) would thus predict that it wouldn't move. However, it does move. Thus case checking cannot be the reason for the movement of the subject, for EPP. Thus Chomsky proposes (26).

(26) EPP is the requirement that a D (Determiner) feature be checked
23
Schütze and Wexler (1996a,b) and Schütze (1997) discuss the possibility that when both AGR and TNS are missing, GENitive case surfaces on the subject (my go). If both AGR and TNS are missing, how could the subject raise? Where is it sitting? In fact, we don't know whether genitive subjects do raise to the left of negation. There isn't enough analyzed empirical material at the moment to discuss this situation, so far as I know. See Vainikka (1993/94) for a discussion of genitive subjects in young English-speaking children.
In other words, the INFL projection has a D-feature which must be checked by a suitable D-feature on a DP. The subject raises up and checks off this feature. Crucially, the relevant feature is not a Case feature. Thus in infinitivals such as (25), the non-finite lower INFL has a D-feature which must be checked. Thus it attracts the subject DP Mary, which has this D-feature. The upper (finite) INFL also has a D-feature, which must be checked, so it then attracts the DP Mary, checking off its D-feature. In OI representations, the INFL is non-finite. But it will still have a D-feature, just as infinitives do in the adult grammar. Thus this D-feature will still have to be checked off, and the subject will raise to Spec,INFL to accomplish this, even when no case-assignment or checking is accomplished by INFL (again as in some embedded infinitival constructions in the adult grammar, as in (25)). Thus the problem of why subjects raise in OI representations is solved; in fact, raising is forced, by the same mechanisms which force raising in the adult grammar. Case assignment or checking is seen to be irrelevant, whether in adult representations or in OI representations. To see this in more detail in the ATOM, we assume (27).

(27) Both AGRS and TNS have a D-feature24 which must be eliminated by checking against the D-feature of a DP which raises up for checking

Briefly and informally, what forces movement in the (Chomsky, 1995: Chapter 4) theory that I am adopting is that -Interpretable (non-interpretable) features (that is, features that play no role at LF) must be eliminated by checking; thus they delete as soon as they check, and they obligatorily check, so that they can delete. Chomsky (1995: 282) assumes that "features of the target that enter into checking relations [are] invariably -Interpretable".
Otherwise the movement/checking would be "locally superfluous", so that the -Interpretable nature of the target features helps to avoid "excessive computational complexity" (1995: 283). The functional categories are the targets, thus it is their features that are necessarily -Interpretable. Thus D-features on AGRS and TNS must be -Interpretable (in contrast to the D-features on a DP, which are interpretable and thus don't delete after checking). Chomsky (1995: Chapter 4) assumes that TNS has a D-feature. When this feature is strong, overt movement is forced, thereby displaying the Extended Projection Principle effects. I am assuming in (27) that AGRS also has a D-feature, so that24 movement is forced to AGRS as well as to TNS, checking the D-feature in each case. I am making no particular assumptions about whether the D-feature on AGRS or TNS is strong in every language; it could vary across languages and possibly vary between AGRS and TNS even within a language. We will return to this question on the basis of the child data in the OI stage. In finite clauses (adult or child), the D-feature on TNS attracts the subject to raise to Spec,TNS and then the D-feature on AGR attracts the subject to raise to Spec,AGR. In OI's, according to the ATOM, there are two possibilities: with AGR present and TNS missing, or with TNS present and AGR missing.25

(28) a. NPi [No CASE Assigned] TNS[D] [VP ti V ...]
     b. NPi [NOM] AGR[D] [VP ti V ...]

(29) a. him like ice cream
     b. he like ice cream

If TNS is present, but AGR missing, as in (28a), the D-feature on TNS causes the subject to raise. Since TNS by itself does not assign case, no case feature is assigned/checked by TNS and Default case is spelled out by the morphology, yielding (29a). If AGR is present, but TNS is missing, as in (28b), the D-feature on AGR causes the subject to raise. Since AGR assigns/checks NOMinative case, this shows up on the subject, yielding (29b). Note that the subject raises up in OI structures in English whether or not NOM has been assigned to the subject, as we have pointed out. Thus the subject raises up whether AGR or TNS is the sole INFL category in that structure. Thus we can conclude from the child grammar that AGRS and TNS in these languages both have a strong D-feature. Whether the D-feature on AGRS and TNS must have the same strength value for a given language remains to be investigated. Consider VSO languages, like Irish. Carnie (1995) argues that the subject actually raises up to the lower of two INFL positions, but that the verb raises up to the higher of the two INFL positions, the VSO order resulting. We can take the higher position to be AGRS and the lower position to be TNS.

24 I have worked out the theory here assuming that AGRS heads a projection, just as does TNS. Chomsky (1995: Chapter 4) questions the assumption that AGR heads a projection, and the issue remains open (see Jonas, 1995a,b, for arguments that an INFL projection in addition to TNS is necessary). If it turns out that AGR should not be looked on as heading a projection, then it seems possible that a theory in the same spirit as I present here can be implemented. Possibly a single INFL projection will have both TNS and AGR features, either of which may be omitted by the OI child. One might attempt to replicate the effect of AGRS in the OI stage with considerations of the verbal functional category v of Chomsky (1995: Chapter 4). However, there doesn't seem to be an obvious way to then explain the relation between null-subject languages and the lack of the OI stage, as is done with AGRS in Section 12. See Schütze (1997) for additional discussion.
Thus the subject raises up (overtly) only to Spec,TNS. Thus in Irish we can take the D-feature on TNS to be strong but the D-feature on AGRS to be weak. Thus, apparently, the strength of the D-feature on AGRS and TNS can vary within a language. Consider the OI stage of Irish, as mentioned by Wexler (1995). Suppose both AGRS and TNS are present; then we get the adult VSO finite structure. Suppose AGRS is missing from the structure but that TNS is present. Then we get a non-finite verb (because agreement can't be checked). The non-finite verb doesn't raise and the subject raises up overtly (via the strong D-feature on TNS) to Spec,TNS. Thus we get a non-finite SVO structure (and since AGRS isn't present, the case on the subject is whatever Irish default case turns out to be). Suppose, on the other hand, that TNS is missing from the structure, but AGRS is present. Again, we get a non-finite verb which doesn't raise, and a subject in this case which doesn't raise overtly, again yielding an SVO non-finite structure. In this case, NOM case will be assigned to the subject since AGRS is present (and the checking can take place at LF). As Wexler pointed out, the pattern of VSO finite sentences and SVO non-finite sentences is exactly what is observed in child Irish. (I know of no reported evidence on subject case in OI Irish.)

The UG-motivated assumption (26) that subject-raising (EPP) is motivated by a categorial feature rather than a case feature solves the problem of why the subject moves up even though it doesn't receive structural case in OI's. Motivation of movement through a categorial feature thus receives strong support from acquisition research concerning very early inflectional development. As has happened before (e.g. the case of Binding Principle B, and the distinction between binding and reference; see Wexler and Chien, 1985; Chien and Wexler, 1990; Avrutin and Wexler, 1992; Wexler, in press), the child's grammar shows a grammatical process in a purer form than does the adult grammar. Adults must have AGR in root clauses, and thus NOM case is assigned, and it appeared as if it was Case that was driving movement. The child, in the same simple clause, can be missing AGR, so that it becomes quite clear that Case can't be driving movement; the actual underlying process shows up more clearly than in adult grammar, where it was obscured by the obligatory existence of AGR (and thus of NOM case on the subject) in standard declarative clauses.26

One might ask: how can a representation which is missing AGR (or TNS for that matter) still allow us to claim that basically children are applying UG syntax, since these projections are obligatory in simple clauses?

25 Again, for simplicity we ignore the possibility of both AGR and TNS missing. See an earlier footnote for discussion.
It is quite difficult to find a principle of syntax which requires the existence of AGR or TNS in a clause. If one searches through, say, GB syntax or Minimalist syntax, one doesn't find a principle which requires the presence of these projections. Rather, the projections are assumed to exist, and then, given properties of syntax, certain consequences follow, for example, the raising of a subject. Even the existence of TNS is not explained in the syntactic theory. Perhaps the best place to look for conditions which would make TNS obligatory in declarative root clauses would be a theory of the syntax/interpretation interface, for example, in some kind of 'Anchoring' Principle (Enç, 1987). But it is important to stress that the existence of these projections in declarative representations is not part of the 'Computational System' of the syntax, as discussed in Chomsky (1995). Thus to the extent that the child is omitting projections, it might be looked on as an 'interface' property that the child is making a mistake on. Wexler (in press) formulates the hypothesis that much (most? all?) of syntactic development (growth) (as opposed to 'learning', as in parameter-setting) is the growth of interface properties, perhaps in a way analogous to the growth of coordination of other physiological systems (e.g. eye-hand coordination). However these broader issues turn out, the fact that there was a word order problem (why does the subject of an OI raise?) and a case problem (how is it possible that the subject of an OI can receive NOM CASE or non-NOM CASE, when it's clear that children know case syntax and morphology?) should have driven acquisition researchers to create a theory of syntax which does not assume that case is what drives the EPP (raising of a subject). The data are actually better in acquisition, in my opinion, because of the existence of OI's, than in adult syntax. Even though the syntactic theory of D-motivation of EPP was developed independently of the acquisition results, I hope that this discussion of why the acquisition results should have led acquisition researchers to reformulate the theory of UG will have the effect of making researchers more open to using theoretically and empirically well-grounded acquisition results as the basis for reformulating elements of the theory of syntax in the future. Of course, the basic assumption that is necessary for such an approach is that the computational system of the syntax is mostly in place in young children, an assumption which seems to me by now to be well-grounded empirically. I hope here to have demonstrated that the OI child is a model system for teasing apart the effects and syntactic significance of specific INFL categories. Studying an OI child is like studying an animal with a gene 'knock-out' technique. The fact that the OI child is missing highly specific aspects of grammar (perhaps interface aspects) allows us to deduce which properties of UG are determined by these aspects. In my view, the case for the integration of child syntax and adult syntax is now substantial, not only in principle but in practice.

26 Of course there are some non-finite root clauses in adult grammar, for example various kinds of exhortatives (Rain in Edinburgh? Never!), but these are marginal enough that they haven't played much of a role in the discussion of the syntax of basic clause structure. For discussion of the relation of non-finite root clauses in adults to non-finite root clauses in children, see Weverink, 1989; Wexler, 1990ff.; Rizzi, 1994a,b among others.
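The ATOM plus D-feature account developed in this section, for the English pattern in (29), can be summarized as a toy computation. The encoding below (the function name, the truth-valued head inventory, and the case labels) is my own schematic sketch of the logic, not a formalism from the paper; it assumes, as argued above for English, that both AGR and TNS bear strong D-features.

```python
# Toy model of the ATOM + D-feature account for English child clauses.
# Assumptions (from the discussion): any INFL head present bears a
# -Interpretable D-feature that must be checked, forcing the subject to
# raise; only AGR assigns/checks NOM; with AGR absent, the morphology
# spells out the default (ACC in English) form.

def oi_clause(agr_present, tns_present):
    """Return (subject_raises, subject_case_form) for an English clause
    in which the given INFL heads are present, assuming strong D-features
    on both AGR and TNS."""
    # The subject raises whenever some INFL head's D-feature needs checking.
    raises = agr_present or tns_present
    # NOM is assigned/checked by AGR; otherwise default case surfaces.
    case = "NOM (he)" if agr_present else "default ACC (him)"
    return raises, case

# TNS present, AGR missing: raised subject with default case,
# as in (29a) 'him like ice cream'
print(oi_clause(agr_present=False, tns_present=True))
# AGR present, TNS missing: raised subject with NOM,
# as in (29b) 'he like ice cream'
print(oi_clause(agr_present=True, tns_present=False))
```

The point of the sketch is simply that raising is predicted in both OI configurations, while case alternates with the presence of AGR.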
Moreover, the influences of the fields can and should (in practice as well as in principle) be bi-directional. As Hamburger and Wexler (1973) concluded in their last sentence, "[t]he bridge that Chomsky has re-erected between psychology and linguistics bears two-way traffic".

10. Developmental variation across languages: The null-subject/optional infinitive correlation

Before we turn to the important question of what causes the OI stage, we should note that there is variation across languages; some languages appear not to undergo an OI stage, as suggested in the original cross-linguistic description (Wexler, 1992, 1994). If some languages don't go through the OI stage, how can we say that the stage is the result of universal linguistic growth or maturation? Doesn't the very existence of the cross-linguistic variation imply that the finiteness requirement is easier to 'learn' in one language than in another language, so that the OI stage itself must be the result of some kind of slow or difficult learning? The answer is 'no'; as we will see in the next section, the cross-linguistic variation is once again to be considered as the consequence of a universal (across languages) development interacting with particular well-learned properties of a language. First, however, it is necessary to find the correct empirical generalization concerning which languages undergo the OI stage. This is not the place to review the evidence concerning particular OI languages in any detail, but let me just extremely briefly summarize the general outline. What do I mean by an 'OI language'? Namely, one in which in early development a substantial proportion of root clauses (that in the adult grammar are required to be finite) are produced by the child in non-finite form. There is sometimes confusion about the name Optional 'Infinitive'. Even in the earliest work (Wexler, 1990, 1994) it was understood that OI's are not only what is usually described as 'infinitives', but that there are a variety of non-finite forms that appear in the stage (e.g. missing auxiliaries mean that finiteness isn't marked, as in Mary going, a common sentence in the OI stage instead of Mary is going in English). The usual measure, however, is the proportion of clauses that should be finite that are actually produced as infinitives,27 and that is the measure that I will adopt here. (In fact we will see later that a language may, on principled and empirical grounds, not literally have root infinitives, but may have other types of non-finite forms.) Again, for reasons of space I won't actually discuss the percentage of OI's in each language, but simply list them according to what is generally understood about them, basically following Wexler (1990ff., 1996), though with some additions.28

(30) OI languages: All Germanic languages studied to date, including Danish, Dutch, English, Faroese (Jonas, 1995a,b); Icelandic (Sigurjonsdottir, 1992); Norwegian (Wexler, 1990, 1994); Swedish (Wexler, 1990, 1994). Also French (Pierce, 1989; Weissenborn, 1991). Also Irish (Wexler, 1995, based on data in Hickey, 1990; see also Guilfoyle, 1996); Russian (Bar-Shalom and Snyder, 1997); Brazilian Portuguese (Wexler and Secco, in preparation); Czech (Moucka and Wexler, in preparation).
(31) Non-OI languages: Italian (Wexler, 1992, 1994, based on Schaeffer, 1990 data; Guasti, 1994); Spanish (Grinstead, 1994; Torrens, 1995); Catalan (Torrens, 1995); Tamil (Sarma, 1995); Polish (Klepper, 1996).

Based on (some of) the above observations, Wexler (1990, 1994) suggested it was rich-agreement languages which didn't allow OI's, but some of the OI languages (e.g. Icelandic) have almost as rich agreement as Italian, perhaps as rich, depending on verb conjugation. Also, surface morphology usually doesn't turn out to be exactly identical with underlying grammatical processes in syntactic investigations,29 so it would be surprising if the existence of such a striking and consistent
piece of developmental syntax as the Optional Infinitive phenomenon was strictly tied to surface morphology. As an empirical generalization based on the above material, Wexler (1996) (see also Sano and Hyams, 1994) proposed the Null-Subject/OI Generalization.

(32) The Null-Subject/OI Generalization (NS/OI)
Children in a language go through an OI stage if and only if the language is not an INFL-licensed null-subject language.

Wexler based this proposal on the contrast between languages like Italian, which don't show OI's, and the Germanic languages and French, which do show OI's. Since then there has been a substantial amount of research on the OI stage which substantiates the conclusion. It is important to understand the nature of the proposal. Some languages omit many subjects but don't seem to license subject-omission through INFL. Rather, the subjects seem to be omitted through some kind of a discourse process. Thus, although Polish and Russian both allow subjects to be omitted, Franks (1995) argues that Polish is an Italian-style (i.e. INFL-licensed) null-subject language, whereas Russian isn't. Thus we expect Russian to show OI's in an early developmental stage, but Polish not to show OI's. Such appears to be the case (Bar-Shalom and Snyder, 1997, for the Russian data, as well as for discussion of the null-subject status of Russian and the relation to NS/OI; Klepper, 1996, for Polish). Similarly, Czech is classified by Franks (1995) as probably/possibly null-subject, but it definitely shows properties (e.g. overt expletives and the strong possibility of unmarked subject pronouns) which lead Moucka and Wexler (in preparation) to consider it not a null-subject language, and therefore we would expect it to have OI's. A rather striking case is Hebrew, since it is well understood (e.g. Borer, 1984) that part of the verbal inflection paradigm in Hebrew (non-present, non-3rd person) licenses null-subjects, whereas part of the paradigm does not license null-subjects, although discourse-related omitted subjects are possible. Rhee and Wexler (1995) showed that NS/OI holds within Hebrew, that is, OI's are allowed only in that part of the inflectional paradigm which does not license null subjects. Thus NS/OI seems to hold, so far rather strongly. It is a striking generalization which has a good deal of empirical support, making rather subtle and precise predictions concerning separation between languages on syntactically deep characteristics. It cries out for explanation. Of course, the explanation of NS/OI requires first an explanation of why OI's occur. Before proposing such an explanation in the next section, I would like to mention again the great value of detailed empirical quantitative research in development related to clear ideas of syntax. Without both a detailed quantitative map of aspects of children's early verb forms in many different languages and an understanding of the syntax of these languages (so that one can distinguish between surface characteristics of morphological forms and the question of what are the more abstract syntactic characteristics of the languages), it would have been impossible to arrive at such clear and striking generalizations. In my opinion both the methodological and theoretical depth of the study of language acquisition has increased markedly in recent years, and the fact that both methodology and theory have been treated rigorously together is one of the more salient reasons for this increase in insight in the field. The traditional dichotomy in the field of developmental psycholinguistics between 'theory' and 'empirical methodology' was always a red herring; one can hardly separate the two in serious research. The methodological and empirical standards of the field have been raised by OI research; given the nature of science, which is always at its best an interaction between theory and experiment, it could hardly be otherwise.

27 Again, it has been known since the earliest cross-linguistic work on the OI stage (Wexler, 1990ff.) that the infinitival particle (to in English, zu in German, de, à in French, etc.) does not show up on the OI verb. Thus 'infinitive' refers to the form of the verb itself, an aspect of verbal morphology.
28 Percentages of OI's in various languages, compared, can be found in many references, including Phillips (1995a,b).
29 For example, although there definitely is a correlation between richness of verbal morphology and licensing of null-subjects (putting aside languages like Japanese, etc.), still there is no agreed-on precise morphological characterization of languages which license null-subjects. Similarly, although there is definitely a correlation between richness of verbal morphology and verb-raising, the search for an empirically adequate statement of this correlation across languages is only partially successful. The appropriate conclusion seems to be that the syntactically-relevant morphological property is somewhat more abstract than can be stated as a generalization over surface morphology; thus, certain languages have an INFL that licenses and identifies null-subjects, others don't. It is this kind of syntactically-relevant feature that we should look for in attempting to generalize as to what features of a language license OI's.
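Generalization (32) can be stated as a toy predicate over the language classifications reported above. The dictionary encoding below is my own illustration; the Russian and Polish classifications follow the Franks (1995) analysis just discussed, and the list is a sample, not exhaustive.

```python
# The NS/OI generalization (32) as a toy predicate. The classification
# of each language as INFL-licensed null-subject or not follows the
# discussion in the text (a sample, illustrative encoding).
INFL_LICENSED_NULL_SUBJECT = {
    "Italian": True, "Spanish": True, "Catalan": True, "Polish": True,
    "English": False, "Dutch": False, "French": False,
    # Russian omits subjects, but via discourse, not INFL (Franks, 1995)
    "Russian": False,
}

def predicts_oi_stage(language):
    """(32): children go through an OI stage iff the language is not an
    INFL-licensed null-subject language."""
    return not INFL_LICENSED_NULL_SUBJECT[language]

print(predicts_oi_stage("Italian"))  # -> False: no OI stage predicted
print(predicts_oi_stage("Russian"))  # -> True: an OI stage is predicted
```

Note that the predicate turns on the syntactic licensing mechanism, not on surface subject omission: Polish and Russian both drop subjects, yet only Russian is predicted to show OI's, matching the reported data.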
11. What causes the OI stage?: The Unique Checking Constraint

11.1. The Unique Checking Constraint defined and the derivation of ATOM

Wexler (1990, 1994) proposed the TNS-Omission Model of the OI stage, and suggested that it was properties of the child's knowledge of TNS that resulted in the OI stage.30 Since the ATOM argues that AGR as well as TNS may be deleted in the OI stage, we cannot make do empirically with a strict model of TNS-omission, but must rather explain why AGR or TNS may be deleted. Furthermore, any theory of OI's should explain why NS/OI (32) holds. Why is the AGR or TNS projection omitted early on in some languages, but not in others? Therefore, we must attempt to derive the (so far empirically adequate) AGR/TNS Omission Model from more fundamental assumptions and then attempt to understand why null-subject languages show different effects from non-null-subject languages. We must explain:

(33) In the OI stage:
a. Finite sentences are sometimes used
b. Non-finite sentences are sometimes used
c. Finite and non-finite verbs show up in the appropriate positions
30 Wexler proposed three different models, which differed in a number of ways. The details don't matter here, only the point that it was knowledge of TNS that was relevant. I should also point out that it has been suggested in other places (for example Sano and Hyams, 1994; Wexler, 1994, 1996) that the knowledge of TNS that children didn't have was a kind of pragmatic or interpretive knowledge. This is consistent with the point I made earlier in this paper that linguistic development in general, and development out of the OI stage in particular, might be due to development at the interface. However, given the arguments of the ATOM that AGR can be omitted along with TNS, it does not appear as if the OI stage is due purely to properties of TNS.
d. Case and finiteness properties follow the ATOM
e. NS/OI holds, that is, the ATOM doesn't hold for null-subject languages: OI's are not used

Ever since the TNS-Omission model of the OI stage in Wexler (1990ff.), there have been a variety of models of the OI stage proposed. Although different in quite interesting detail, they all bear a strong resemblance to the TNS-Omission model. Namely, they all propose that the underlying cause/analysis of the OI stage involves the ability to omit certain functional features or projections in the syntactic representation, or, to put it another way, the lack of knowledge of the obligatoriness of certain features or projections. In some sense, from the TNS-Omission model to the current model, it is the underspecification of a feature and/or projection that characterizes models of the OI stage. Thus the TNS-Omission model assumes that children have a problem with TNS so that TNS is omitted, and this characteristic holds over the three models presented in Wexler (1994). Similarly, Rizzi's (1994a) truncation theory assumes that OI children omit certain projections because of the lack of knowledge that the root of a sentence must be CP; thus certain projections are omissible. Schütze and Wexler's AGR/TNS Omission Model modifies the TNS-Omission model so that AGR may be omitted as well as TNS. Finally, Hoekstra et al. (1996) assume that the number feature may be omitted in the OI stage rather than the TNS feature. In order, then, we have models of the OI stage which involve underspecification of TNS (Wexler), any (functional?) category (Rizzi), AGR and TNS (Schütze and Wexler) and number (Hoekstra et al.). In this section I would like to propose an entirely different kind of model, which accounts for the OI stage and its properties not as the direct result of the underspecification of a feature, but from a more fundamental constraint on the computational system of syntax.
I intend by this move to integrate a wider variety of developmental phenomena together and also to tie the theory of the OI stage closer to current syntactic theory. The underlying theory will involve deeper derivations than the underspecification theories proposed to date, since the assumptions will be further from the data described. If correct, such a move has illuminating consequences for a science, as linguistic theory has demonstrated. It is more difficult to easily believe in such a theory, of course, because it is less directly phenomenological, depending for its support on its ability to describe and predict a wide variety of phenomena.31 Such a theory should be proposed tentatively, but vigorously pursued. In fact, the theory to be presented will integrate a wider class of phenomena, in an unpredicted way, not only deriving properties (33) above, but also applying for example to Italian structures in the OI stage that have previously been unexplained, or not even discussed, given that it was thought that Italian was not an OI language. Thus the promise for such a theory has begun to be met. Future research will indicate whether such deeper theories may in fact be important to investigate in the study

31 See Wexler (1982) for a discussion of Einstein's observation that as science advances just such nonobvious, less common-sensical theories and concepts become necessary.
of linguistic development. Experience in other areas of linguistics as well as in science more generally suggests that they will.

The question to ask is, why are finite sentences sometimes not the representation used by the OI child, especially since, as Wexler (1990ff.) argues, the child knows completely correctly all the relevant finite morphology? Let us look at the structure and derivation of a finite sentence using the syntax that we have proposed, following fairly standard assumptions.

(34) a. AGRS[D] TNS[D] [VP DP V ...]
b. [AGRSP DPj AGRS0 [TNSP tj TNS0 [VP tj V ...]]]

As before, for illustrative purposes I am using English SVO (left-headed) word order, but the structural properties apply to right-headed languages as well; the order of constituents is irrelevant. (34a) shows the left-to-right order of the functional projections AGRS and TNS, followed by the VP with its subject DP. As we assumed in (27), both AGRS and TNS have a D-feature, which implies that a D-feature must raise to check them off, since features of functional categories like AGRS and TNS are -Interpretable and -Interpretable features must be eliminated. Following usual theory, the subject DP first raises up to check off the D-feature of TNS and then raises up to check off the D-feature of AGRS. The derivation is shown in (34b), with traces of the subject DP in the original position of the subject in the VP and in its intermediate position, after the first movement, in Spec,TNS. The final position is Spec,AGRS.

What could go wrong for a young child with this derivation of a finite sentence in (34b), given that the child knows the morphology associated with AGR and TNS and the relevant syntax? What might be problematic for the child is the fact that two checkings are required, that DP must check its D-feature twice, first against the D-feature of TNS, then against the D-feature of AGR.
(35) Unique Checking Constraint (UCC) (on kids in the OI stage)
The D-feature of DP can only check against one functional category.

Sometimes a child produces finite utterances, and UCC doesn't apply (we will later discuss why UCC only seems to apply sometimes). Consider a simple declarative sentence. Suppose UCC does not apply to a child's representation. Then the child carries out the derivation in (34), checking the D-feature of the subject twice, and producing an adult finite sentence as in (36a).

(36) a. Mary likes ice cream
b. Mary like ice cream

Now suppose the child attempts to derive a representation for a simple declarative sentence, but assume that this time the UCC (35) does apply. The child will have a beginning representation like (34a). The D-feature on TNS will attract the D-feature of the subject DP and the D-feature on TNS will be checked off. But now AGRS
must have its D-feature checked. However, since we have assumed that the Unique Checking Constraint (UCC) in (35) applies, the DP's D-feature, having already checked the D-feature on TNS, cannot check a second D-feature, on AGRS. Thus the terminating representation will be (37).

(37) AGRS[D] [DPj TNS [VP tj V ...]]

In (37) I have indicated by the notation [D] that the D-feature of AGRS remains unchecked. Since by (27) the D-feature of AGRS must be eliminated by LF (since, as the feature of a functional category (target), it is -Interpretable), the representation in (37) fails to converge, and is ruled out as ungrammatical by the grammar. That is, we assume (38), the unmarked fundamental assumption on children's grammars, since it is UG-consistent, and an assumption for which there is excellent evidence given all the knowledge of early syntax that we have demonstrated, in particular the demonstration that the subject raises up to the left of negation in English and similar languages:

(38) Children in the OI stage know that:
a. the features of functional categories are -Interpretable
b. a derivation with unchecked -Interpretable features doesn't converge (is ungrammatical)

Given that we have assumed that UCC holds and have shown that the derived representation in (37) then fails to converge, and that the child knows this (38), how can the child obtain a grammatical representation? We are continuing to assume that the child in this instance is governed by UCC. Given that UCC holds, there is no way for the lexically-induced representation (34a) to converge. Therefore, the child attempts the minimal meaning-preserving change to (34) that will allow convergence given UCC.32 This change will be to allow only one INFL functional category, either AGR or TNS, so that all (i.e. only one) D-features are checked, even given UCC. Thus the child uses either representation (39a) or representation (39b):

(39) a. AGRS[D] [VP DP V ...]
b. TNS[D] [VP DP V ...]
That is, either TNS is omitted (39a) or AGR is omitted (39b).33 The subject DP in either case is raised to check off the D-feature of the INFL functional category.

32 For simplicity of presentation I am talking about what the 'child attempts', etc., but I definitely do not intend that as a serial-order proposal or at all as a production model or processing model of any kind. I am talking about alternatives in the grammar of the child. When I discuss implementations of UCC later in this section (for example, as Minimize Violations) this will become quite clear. The 'alternatives' are no more to be looked on as alternatives in a production model than are alternatives in any kind of grammatical theory which compares potential representations.
33 One important point which there is no room to discuss here is that it should be possible to distinguish more than one kind of OI in some languages, depending on whether AGR or TNS is omitted. For
Since there are no unchecked (i.e. undeleted) -Interpretable features, the derivation proceeding from (39a) or (39b) converges. But given the child's knowledge of Distributed Morphology and the morphophonological features of the language, as discussed in the presentation of ATOM in Section 8, in some cases the adult inflectional material (made necessary by the presence of both AGR and TNS) will not be spelled out; rather, the inflectional material appropriate to representation (39a) or (39b) will be spelled out. In the case of English third person singular present main verbs it turns out that in either case (39a or 39b) the spellout is the same, namely the zero morpheme, and (36b) will result, in this case in which UCC holds.

Does UCC apply only to cases of overt movement, or to LF movement as well? The simplest, unmarked assumption of UCC as a constraint on the computational system would be that it applies to cases of double checking whether or not the checking is overt. To test this assumption consider VSO languages like Irish, discussed in Section 9. As we showed there, the D-feature of AGRS in Irish is weak, movement of the subject DP to AGRS and checking of the D-feature of AGRS occurring only at LF, thus accounting for the VSO order. We also argued that Irish is an OI language, with the SVO order for root clauses possible in the OI stage with a non-finite verb. Thus AGRS or TNS must be missing in the OI structures, as we argued. This result follows from UCC so long as UCC rules out cases of double D-checking even if one (or more) of the checking movements is covert. Thus the simplest unmarked assumption, that UCC is oblivious to the 'overtness' of the checking movement, seems correct.34

11.2. The interface nature of the requirement for AGR/TNS

Thus we have derived the AGR/TNS Omission Model from the Unique Checking Constraint.35 That is, we have shown that UCC derives properties (33a-d). Since we know that ATOM is satisfied empirically over a wide range of developmental phenomena, UCC is a good candidate for a developmental constraint on early grammar.

some detailed discussion relating to Dutch morphology and development, see Wexler et al. (1998). Also note that we would expect a kind of default agreement to be used in languages which have such a default (i.e. where certain person/number combinations weren't specified on some agreement morphemes) even when an 'infinitive' isn't predicted. Thus in French, if the third person singular agreement morpheme isn't specified with agreement features it will appear when AGR is omitted, rather than an infinitival. Such phenomena have been observed by Ferdinand (1997).
34 The fact that UCC applies to covert movement as well as overt movement is an argument against a 'processing' or 'resource limitation' account of UCC. Why should processing restrictions apply to covert movement? I know of no processing results, in adult or child language, which would argue that processing limitations apply to covert movement.
35 Borer and Wexler (1992) argued for the Unique External Argument Proto-Principle (UEAPP) in very early child grammar. UEAPP says informally that a subject may only be the subject of one verbal element (so that the subject may not be the subject of an auxiliary and a participle, for example). In some ways, UCC is reminiscent of UEAPP, though stated in current syntactic terminology; UCC says a subject (i.e. a subject's D-feature) can't check twice. Details differ between the two proposals; it would be useful to see if they could be integrated so that the phenomena that UEAPP explains (different from the ones discussed in this paper) could be understood in UCC terms.
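The way UCC derives the ATOM pattern can be made concrete in a small sketch. The following Python toy is my own illustration, not part of Wexler's formalism: it treats a derivation as convergent just in case every functional head's -Interpretable D-feature gets checked, and models UCC as a cap of one checking per subject DP. The function name and representation are invented for exposition.

```python
# Toy sketch (illustrative only) of how the Unique Checking Constraint (UCC)
# derives the AGR/TNS Omission Model (ATOM). Functional heads carry
# -Interpretable D-features that must be checked off by LF; the subject DP's
# D-feature does the checking. Under UCC it may check at most one head.

def converges(functional_heads, ucc_active):
    """A derivation converges iff every head's D-feature gets checked.

    functional_heads: heads whose D-feature attracts the subject DP,
    e.g. ["TNS", "AGRS"] for the full clause (34a). Under UCC the DP can
    check only one head, so further -Interpretable D-features go unchecked.
    """
    checks_available = 1 if ucc_active else len(functional_heads)
    unchecked = len(functional_heads) - checks_available
    return unchecked <= 0

# Adult grammar (no UCC): the full clause (34), with both AGRS and TNS, converges.
assert converges(["TNS", "AGRS"], ucc_active=False)

# OI child (UCC): (34) crashes, since AGRS's D-feature is stranded, as in (37)...
assert not converges(["TNS", "AGRS"], ucc_active=True)

# ...but the reduced structures (39a)/(39b), omitting TNS or AGR, converge.
assert converges(["AGRS"], ucc_active=True)  # (39a): TNS omitted
assert converges(["TNS"], ucc_active=True)   # (39b): AGR omitted
```

The point of the sketch is only that a single local restriction (one checking per DP) suffices to rule the full structure out while ruling either reduced structure in, which is exactly the ATOM pattern.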
Before going on in the next section to see how UCC also derives the crucial NS/OI property (33e), we should discuss some of the properties of UCC and how it may be implemented in the grammar. First, I have stated UCC in one natural way given current Minimalist (checking) theory, which works in deriving the OI stage (the ATOM). Namely, the D-feature on DP can only check once. However, there are a variety of other ways of stating UCC which could also derive ATOM, following the leading idea that it is the double checking/movement process which causes a functional category to be omitted, so that convergence takes place. Perhaps in the OI stage, subject DP's can only check once, no matter which feature is being checked, and the checking of the two D-features is only a special case. Or perhaps, if checking isn't the right mechanism in which to state UCC, there is some other way of stating the constraint. Perhaps we should state that a subject (or some other DP?) can only move once, not twice or more, so that ATOM derives from the inability of the DP to undergo two movements, first to TNS and then to AGR. Such a proposal might have different empirical consequences from the constraint against double-checking, depending on whether DP's move for reasons other than D-checking. Since Chomsky (1995, Chapter 4) proposes that wh-movement is also motivated by D-checking, it's not clear what non-D-checking-motivated movements there are. Suppose that some movements, such as scrambling, topic-movement, etc., are motivated by particular semantic features rather than by the checking of a D-feature, though this is far from clear (after all, topicalization is close to wh-movement, and scrambling is usually thought to involve some 'specificity' feature, and D is the home of specificity). If there are such non-D-motivated movements of DP, then a constraint against double movement should prevent topicalization after a single movement to AGR or to TNS.
Thus Italian children in the OI stage should not allow topicalization, although they don't have OI's (as we later derive). There will be a variety of other consequences, but there is no room here for the careful syntactic analysis that would be required to attempt to distinguish consequences of a single movement hypothesis from the single checking hypothesis that I present. There are probably other ways of implementing UCC, depending on syntactic theory and, crucially, on what further empirical cases it is applied to. Further empirical investigation of children's grammars will be crucial in determining the best way of implementing UCC. For the moment the way I have stated UCC (as a constraint against double checking) works for the OI stage. Note that UCC applies to DP's in general, and not just to subjects. Thus UCC will apply, for example, to direct objects that have to undergo double movement/checking. Hagstrom (mss.) has shown that UCC in fact applies to cases of double movement/checking of direct objects in child Korean (in the OI age range), deriving the existence of word order errors involving the object and negation. To the extent that the UCC predicts the properties of the OI stage and applies also to constructions which don't involve the grammatical subject and associated (AGRS, TNS) INFL projections, it will turn out that the explanation for the OI stage cannot be found in a child deficit concerning the particular subject-related INFL projections. Rather, the explanation will be broader based, affecting the computational system of
the syntax in more general ways, although the system of verbal morphology and its associated (subject) specifier has been the focus of research to date. At the moment the question remains open, but the general direction of research investigation is clear.

What is the nature of the child 'deficit' that causes UCC? There are various possibilities. On the one hand, it may be that the child's syntax simply states UCC, thus yielding a more restrictive syntax than the adult UG has. This could be true to the extent that the child only allows the single checking against INFL, and doesn't allow double checking. Consider this possibility for the moment.36 The child has a more 'restrictive' syntax than exists in adult UG, since grammatical (convergent) possibilities in adult UG are ruled out in the OI child (i.e. double-checking derivations, which converge in adult UG, are ruled out as non-convergent in OI/UCC-constrained syntax). This restriction is thus an instance of Borer and Wexler's (1992) UG-Constrained Maturation, the defining characteristic of which is that the child's possible grammatical representations are a subset of UG representations, with 'wild' (non-UG-compatible) representations ruled out. Thus the move from OI syntax to adult syntax is a case of pure 'growth', with representational possibilities added over time, but no non-UG possibilities ever allowed. How is this possible, given that we have assumed that the OI child - unlike the adult - allows simple clauses with either AGR or TNS missing, i.e. clauses with the structure (39)? There are various possibilities. One reasonable analysis seems to be that the OI child knows that AGRS and TNS are required in simple clauses, but that UCC forces one of them to be omitted if the derivation is to converge. Recall that the presence of AGRS and (finite) TNS in a root clause does not appear to be a consequence of the computational syntax of UG.
There is no 'principle' of Minimalist syntax, say, that forces the existence of AGRS or TNS. Consider the most 'motivated' projection - TNS - most motivated because it is motivated on semantic or interpretive grounds. But exactly what is the 'principle' that forces the existence of finite TNS in a root clause? Assuming that it is something 'interpretive' (the best guess), some kind of 'anchoring' (as in Enç, 1987), by definition this 'interpretive' property (a property of the interface between syntax and the 'conceptual' system) does not play a role in 'convergence', a formal/syntactic/computational property. Thus OI clauses converge; their oddness is due to something interpretive/conceptual. In particular, the structures (and the associated derivations) in (39) converge, so far as can be told from current syntactic theory. Thus the OI child has ruled out as non-convergent some UG-convergent structures (double-checking ones such as (34)), and the price she pays is that in order to obtain a close-enough match that converges in her grammar she has to assume some interpretively-odd (i.e. interface-odd), but convergent, structures (like (39)). On this view we can maintain that all structures which converge for the child also converge for the adult.
36 We will return to the 'optionality' that seems to be inherent in the OI stage, since finite sentences are also grammatical in this stage, and discuss the question of whether UCC has to be seen as an 'optional' principle. The answer will be no.
(40) UG-Compatible Convergence (UGCC)
If a structure converges in child syntax, then it converges in UG.

UG-Compatible Convergence is a refinement of UG-Restricted Maturation, making clear that what is specifically disallowed in child grammar is a derivation which is looked on as grammatical in the computational system of the syntax (i.e. convergent) for the child if that derivation is not looked on as grammatical in the computational system of the syntax (i.e. convergent) for the adult. The intention of (40) is that 'wild grammars' - those which allow structures to converge which UG doesn't allow to converge - are not permitted in child grammar.37 It is thus gratifying that a property (UG-Restricted Maturation) which has been taken to cover so much of grammatical growth can also be implemented in a natural way to cover the large range of empirical properties discovered about the OI stage. However, we should ask what it means for the 'interpretive/conceptual' properties which require AGR and TNS to be violable by the child.

11.3. Minimize Violations

Note that we don't have to assume that the child doesn't know that AGRS and TNS are required in simple declarative clauses (except for exhortatives, etc.). The child could very well know this property, but be willing to sometimes violate it in order to get a convergent derivation, when this is demanded by UCC. We can assume (41):

(41) Minimize Violations (MV)
Given an LF, choose a numeration whose derivation violates as few grammatical properties as possible. If two numerations are both minimal violators, either one may be chosen.38

Let's assume that Minimize Violations applies in child grammar.39 Suppose the child's grammar attempts to derive sentence (36a). Suppose the numeration (the

37 UGCC is not a principle of grammar, adult or child. Rather it is a (hypothesis about a) property of child grammars. As an anonymous reviewer points out, UGCC will have to contend with many problems. For example, English-speaking children often produce questions without inversion. How do these converge for adult grammar? We would have to ask whether non-inversion structures converge for the child. If they do, we would have to ask what their structure is, and whether with this structure they converge in UG. I am not prepared to give a discussion of this topic here, but the answer isn't obvious without investigation.
38 The idea is that numerations are chosen so as to yield an LF. Presumably, say, TNS is required for a certain LF, but when TNS (and AGR) is chosen UCC is violated. A numeration without TNS may yield the same LF (with tense/time filled in from 'context') but may violate whatever property requires TNS in a root structure. For a different view on how competing numerations may enter into the explanation of the OI stage, see Schütze (1997).
39 Putting off the question of whether Minimize Violations applies in adult grammar; it very well might, though with a slightly different grammar (e.g. no UCC) the consequences will be different in certain cases.
words which enter into the sentence) include the words in (36a), with the verb likes as one member of the numeration set. This sentence doesn't converge given UCC, since (37) results, and (37) has an unchecked (-Interpretable) D-feature. To derive (36a) so that no D-feature is left would require a violation of UCC, but no violation of the interpretive/conceptual property which requires AGRS and TNS. On the other hand, suppose the child's grammar attempts to derive a sentence beginning with the numeration of the words in (36b), which are identical to (36a) except that the verb is like rather than likes. Deriving this sentence (as in (39a) or (39b)), with AGRS or TNS missing, involves no violation of UCC but does involve a violation of the interpretive/conceptual property which requires AGRS and TNS in a sentence. Given that the two numerations (from (36a) and from (36b)) are both minimal violators (one violation in each case), either representation may be chosen as the grammatical representation. Thus the 'optional' behavior in the OI stage.

In adults, there are no grammatical properties violated in the derivation of (36a), and one violation of a grammatical property (AGRS or TNS is missing) in the derivation of (36b). Thus (36a) is the only possible sentence. In the OI child, on the other hand, there is one violation (of UCC) in the derivation of (36a) and one violation (that AGRS or TNS is missing) in the derivation of (36b). Thus either (36a) or (36b) may be chosen in OI grammar (one governed by UCC), since they are both minimal violators. 'Optionality' in the OI stage thus is a consequence of a tie between two representations, each of which violates one property of child syntax. It happens that one of those properties (the requirement that AGRS and TNS appear) is also a property of adult syntax. But UCC is not a property of adult syntax. Therefore we get slightly different systems in the child and adult, both being consequences of the same fundamental system (with MV, etc.).40

40 The implementation of UCC via MV predicts, on the simplest assumptions, that finite sentences such as (36a) have the same grammaticality status as the OI sentences (36b). Since both finite and OI sentences are produced in the OI stage, this assumption seems as if it is confirmed. In general it is difficult to tell from the proportions of finite and OI sentences produced whether one is 'more' grammatical than the other. The basic facts are that the proportion of finite sentences increases over time in the OI stage, with the earliest stages showing extremely few finite sentences. Wexler (1990ff.) suggests that it is conceivable that there is a (very young) stage during which only OI's are produced, but the data are too scanty to determine whether this is true in general. So, if the proportion of finite sentences were to determine which form was 'more' grammatical, we would have to say it changed over time. So the production data are consistent with an equivalent grammaticality status for finite and OI sentences. The one potentially testable prediction of the MV view might be that, if grammaticality judgment tests were conducted, finite sentences would be judged to have the same grammaticality status as OI sentences, namely grammatical. The only judgment data about the OI stage of which I am aware is on English. Rice et al. (in press) have shown that English-speaking children in the OI stage judge OI sentences (e.g. (i)) to be more grammatical than sentences with agreement violations (e.g. (iii)). Grammatical (finite, with no agreement violation, e.g. (ii)) sentences are judged somewhat better than OI sentences (i), but the OI sentences are closer to the grammatical (ii) judgments than to the ungrammatical (iii) judgments.
(i) Mary go/going
(ii) Mary goes/is going
Notice that both numerations 'converge' for the adult. One is simply the adult structure. The other involves no syntactic violations, and is therefore convergent, with the functional category (AGRS or TNS) missing. That is, nothing about the structure is non-convergent, so far as can be told from current syntax. (Of course we have to assume that if AGRS is missing, no case feature is selected from the lexicon on the subject DP.41 But exactly such an assumption of 'no case filter' is what Minimalism requires, as Chomsky (1995: Chapter 4) discusses.)

The Minimize Violations view of the UCC appears to substantiate work in other areas of linguistic development. Babyonyshev et al. (1994) investigate the growth/maturation of Argument-chains (A-chains) (Borer and Wexler, 1987, 1992) by applying the theory to the Genitive of Negation in Russian. Borer and Wexler (1992) predicted that children who don't find A-chains grammatical will give an unergative analysis (with subjects directly generated in subject position) to unaccusatives. Babyonyshev et al.'s (1994) experiment shows that children before five sometimes give an A-chain analysis to unaccusatives and sometimes don't (i.e. the same child does one or the other). They explain this 'optional' result by assuming that children have a *A-chain constraint (i.e. nontrivial A-chains are ungrammatical) in their syntax but also know UTAH, an interpretive/interface condition, which dictates that unaccusatives must generate their subject in object position. Thus any unaccusative for a *A-chain child will violate either *A-chain, if it is given an unaccusative analysis, or UTAH, if it is given an unergative analysis. Thus either of these analyses - unaccusative or unergative - can be chosen; they are equivalent with respect to the number of violations.
Thus it may turn out that the MV view, with violation of one or the other constraint possible, is a general explanation for what appears to be 'optional' behavior in children which is obligatory in adults. The Minimize Violations implementation of the UCC thus explains in a natural way the 'optionality' property of the OI stage without recourse to mechanisms of optionality that don't exist in the adult grammar. The optionality is the result of a stronger UG constraint than exists in adult UG, but not the result of a fundamental difference in computational syntax such that optionality is a property of OI grammar but not of adult grammar.42
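The tie-counting logic of Minimize Violations can likewise be sketched. The Python toy below is my own hypothetical illustration (the function name and data representation are invented): each candidate numeration is paired with the set of grammatical properties its derivation violates, and MV admits every candidate tied for the fewest violations.

```python
# Toy sketch (illustrative only) of Minimize Violations (41): score each
# candidate derivation by the grammatical properties it violates, and
# return all candidates tied for the minimum, since MV allows either
# minimal violator to be chosen.

def minimal_violators(candidates):
    """candidates: {name: set of violated properties}; return all ties for minimum."""
    fewest = min(len(v) for v in candidates.values())
    return {name for name, v in candidates.items() if len(v) == fewest}

# OI child (UCC active): the finite derivation (36a) violates UCC (double
# D-checking); the OI derivation (36b) omits AGRS or TNS, violating the
# interpretive requirement that they appear. One violation each: a tie,
# so both are grammatical for the child, giving 'optionality'.
child = {
    "(36a) Mary likes ice cream": {"UCC"},
    "(36b) Mary like ice cream": {"AGR/TNS required"},
}
assert minimal_violators(child) == {
    "(36a) Mary likes ice cream",
    "(36b) Mary like ice cream",
}

# Adult grammar (no UCC): the finite derivation violates nothing, so it
# alone is the minimal violator and the OI sentence is ruled out.
adult = {
    "(36a) Mary likes ice cream": set(),
    "(36b) Mary like ice cream": {"AGR/TNS required"},
}
assert minimal_violators(adult) == {"(36a) Mary likes ice cream"}
```

The same tie-breaking structure would cover the Babyonyshev et al. unaccusative case: *A-chain and UTAH each cost one violation, so both analyses survive.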
(iii) Mary am going/I goes
Thus the results are somewhat equivocal on this issue. Although clearly OI sentences are judged more grammatical than are agreement-violating sentences, it's not absolutely clear whether finite, agreeing sentences are more grammatical than are OI sentences or whether they are equivalently grammatical. The ideal result given the point of view just presented would be for OI sentences to be as grammatical as finite agreeing sentences, at least during the heart of the OI stage. This may yet turn out to be true. Experiments in other OI languages would be extremely useful in this regard. It would be good to compare finite agreeing sentences against sentences which don't violate UCC and violate no other constraint, but since simple grammatical sentences, on the syntactic view that we have presented, always involve a violation of UCC, it is not clear how to do this.
41 If NOM case, say, is selected, then it won't meet the requirement which pairs AGR with it.
42 The MV view of UCC seems to imply that computational constraints (like UCC) and interpretive/interface constraints (like the requirement for AGR/TNS) have the same status in child grammar, since either constraint may be violated if one of them has to be violated. Suppose we wanted
K. Wexler / Lingua 106 (1998) 23-79
11.4. Interpretable features?

UCC (35) simply assumes that double-checking cannot take place grammatically in the OI child. Is it possible to derive UCC from more fundamental assumptions? A potential avenue for exploration focuses on what it is in UG which allows or forbids double-checking at all. Chomsky (1995) assumes that features can be divided into interpretable features (those which must stay around in a derivation until LF) and uninterpretable features (those which don't stay around until LF because they play no interpretive role). Interpretable features don't delete after checking; after all, they must show up at LF. Thus interpretable features can (and often do) check more than once. Uninterpretable features delete after checking, so they can check only one time.
to attempt an implementation of the OI stage which in fact did treat purely syntactic/computational constraints as non-violable by the child, and interpretive/conceptual/interface constraints as violable if necessary to maintain the non-violation of the computational system. We could assume (i) instead of MV (41).

(i) Preference for Convergence (PC)
Given a choice between a convergent derivation (in child syntax) which omits an interpretively motivated functional projection and a non-convergent (in child syntax) derivation which includes the projection, choose the convergent derivation.

The Preference for Convergence interpretation of UCC thus chooses an OI representation, in which AGR or TNS is omitted, over a representation in which double-checking of D is permitted. Thus PC implies that finite sentences are always ungrammatical for the child, the OI sentence always being preferred. Thus we would have to specify that UCC applies only optionally, an unexplained stipulation at odds with the spirit of Minimalism. One possibility is that UCC is not a syntactic/computational constraint after all, but is the consequence of 'processing' problems in the child. For some reason, the OI child's processing system can't always check a feature twice, or can't always move the same constituent twice, or has some related property that makes the derivation of finite sentences complicated. We would also have to assume that the processing system is subject to 'optional' behavior in a way in which the syntactic system is not. This is a possible explanation, but we should be careful before following the time-honored (but rarely supported) route of assuming that all child deviations are due to the processing system.
See Wexler (in press) for arguments against this being any kind of preference, and also see Crain and Wexler (in press) and Crain and Thornton (1998) for the Modularity Matching Model, which argues that the general spirit of linguistic theory as well as empirical results imply that child processing systems should be looked on as having the same essential properties as the adult systems. There is no independent evidence for the processing 'difficulty' of double-checking, and children in the OI stage show excellent ability to carry out many operations in a sentence (for example, to assign case and agreement correctly). There is no empirical evidence or theoretical analysis of children's processing abilities/difficulties which would support this claim. It might be just another case of ignoring the problem. On the other hand, it is a possibility that UCC is due to some kind of processing difficulty (of a highly special kind, yet to be discovered), a claim which is difficult to evaluate empirically given the underdeveloped state of processing theory and data in children. The PC view implies that finite sentences are derived by UCC not applying and are therefore totally grammatical, whereas OI sentences violate an interpretive/interface constraint. Thus finite sentences should be judged as more grammatical than OI sentences. These predictions are different from the predictions made by the MV-without-PC view, discussed in a previous footnote. As I discussed in that footnote, the empirical results are not completely clear. Also, if 'processing' is the explanation for the UCC and its optional application, then it is not clear how processing will affect grammaticality judgments.
Case features, for example, having no interpretive role, are [-interp], and don't stay around until LF, deleting after checking once. Thus more than one NOM can't be checked by the same INFL (I put aside questions of multiple specs, as in Ura, 1996). D, on the other hand, does play an interpretive role, and thus is [+interp] when it appears on a DP (as we pointed out before, D is [-interp] when it is on a functional category like AGRS or TNS). A D-feature thereby can check more than once. An example is a raising construction, as in (25), repeated here as (44). The subject DP Mary has a D-feature which first checks the D-feature of the -fin INFL in the embedded clause. This D-feature (since it is [+interp]) does not delete after this checking. The subject Mary then raises to the higher +fin INFL and the same D-feature on Mary checks the D-feature on this INFL; thus double-checking.

(44) Mary INFL(+FIN) seems [e1 to INFL(-FIN) [VP e2 like it here]]

Another example is the derivation of a finite sentence that we have assumed in (34), repeated here as (45). The DP raises to TNS, and its D-feature checks off the D-feature of TNS. The D-feature on DP, being [+interp], does not delete. The DP then raises to AGRS, and the same D-feature on the DP checks off the D-feature of AGRS. Thus double-checking, by a D-feature.

(45) a. AGRS[D] TNS[D] [VP DP V ...]
b. [AGRSP DPi AGRS0 [TNSP ti TNS0 [VP ti V ...]]]

Thus on the view of UG in Chomsky (1995), a feature can check more than once if and only if it is [+interp]. Since the UCC forbids double-checking in the child where it is allowed in the adult, this immediately suggests the following hypothesis to explain why UCC holds in the OI child.

(46) For the OI child, D can not only be [+interp] (as it is for the adult) but also [-interp], as it is not for the adult.

Consider a finite sentence, such as (34)/(45). This sentence requires the D-feature on the subject to check twice, against TNS and then against AGRS.
If the child chooses a [+interp] D, the derivation converges, as it does for the adult. Thus finite sentences are grammatical for the OI child. Suppose, on the other hand, that the child chooses a [-interp] D on the DP, as is allowed by (46). Then the D-feature of DP deletes after checking one INFL category. Thus in (34)/(45), after checking D on TNS, the D-feature of the DP deletes, and no further checking can take place. Thus the D-feature of AGRS remains unchecked, as in (39), and the derivation crashes. But if the child omits AGRS or TNS, as in (39), only one checking of the D-feature on the subject DP is necessary. The ([-interp]) D-feature on DP deletes after checking the D-feature on the one INFL category in (39) (whether AGRS or TNS), but this doesn't cause the derivation to crash, since no unchecked D-features remain.
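The checking logic just described can be summarized in a small sketch (my own illustrative formalization, not the paper's formalism): a [+interp] D-feature survives every checking, while a [-interp] D-feature deletes after one, so a derivation converges only if the number of D-bearing INFL categories does not exceed the checkings the subject's D-feature can supply.

```python
# Illustrative sketch: convergence of a derivation as a function of how
# many INFL categories bear a D-feature and whether the child chose
# [+interp] or [-interp] D on the subject DP.

def converges(n_infl_d_features, d_is_interpretable):
    # A [+interp] D-feature never deletes, so it can check any number
    # of D-features; a [-interp] D-feature deletes after one checking.
    max_checks = n_infl_d_features if d_is_interpretable else 1
    return n_infl_d_features <= max_checks

# Finite clause (45): both AGRS and TNS bear a D-feature.
print(converges(2, True))   # adult-like [+interp] choice: converges
print(converges(2, False))  # [-interp] choice: AGRS left unchecked, crash
# OI clause (39): AGRS or TNS omitted, only one D-feature to check.
print(converges(1, False))  # converges even with a [-interp] D
```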
Thus the effects of UCC can be derived via a misunderstanding of the interpretive conditions on D. The child has to consider D as [-interp] sometimes. This might be the result of a possible interface/pragmatic deficit. Children in the OI stage often omit determiners,43 although it isn't clear what relation there is between determiner omission and interpretability.44 Furthermore, there is evidence (Maratsos, 1976; Karmiloff-Smith, 1979; Hickmann et al., 1996; Kail and Hickmann, 1992, among others) that young children often use the instead of a. Thus children don't necessarily interpret specificity correctly, possibly not understanding that D is always interpretable. Det is the home of specificity (or of old versus new information).45 Thus this may also be suggestive evidence that children don't understand the interpretive features on DET.46 If the effects of UCC are due to a misunderstanding of the interpretive characteristics of D (46), then we expect that only double-checking of D will be ruled out by the OI child. We don't know enough about other properties to test this assumption at the moment. At any rate, we should consider both UCC as a fundamental constraint on OI grammar and the derivation of UCC from flaws in the knowledge of the interpretation of features as potential hypotheses as to why ATOM holds in the OI stage.

43 Though in general determiners in OI languages are not omitted as often as TNS/AGR is omitted. Not enough has been studied about this topic, but there is some good quantitative evidence about English in Rice and Wexler (1996) and about Dutch in unpublished work I am conducting with Jeannette Schaeffer and Gerard Bol.
44 Suppose determiners are licensed strictly through their interpretable features. If a determiner is treated by a child as [-interp], there may be no further licensing property for the determiner and it is therefore omitted. Compare this situation with Case, which is licensed by formal features. NOM case, for example, is licensed by AGRS. As discussed earlier (and see Schütze and Wexler, 1996a,b), we know that NOM is almost never deleted if AGRS appears. Of course the speculations in this footnote must be supported by an understanding of the licensing conditions for determiners. Another way of thinking about the assumption that kids think D can be [-interp] is that they are sometimes treating D as pleonastic in some sense, a formal feature without interpretation.
45 Hoekstra et al. (1996) have attempted to relate determiners and TENSE, as has Wexler (1996), but the models presented are quite clearly different from the one presented here. In those models it is the semantic properties of DET and TNS that are compared (having to do with particular features, for example number features in Hoekstra et al., and referential features in Wexler). The UCC, on the other hand, involves a particular computational constraint on checking.
46 Wexler and Chien (1985) and Chien and Wexler (1990) and many other references argue that the 'Delay of Principle B' effect is due to pragmatic conditions. Avrutin and Wexler (1992) and Avrutin (1995) relate the definite-for-indefinite errors mentioned above (the instead of a) to the Principle B effect. Thus the OI stage might be tied together with the Delay of Principle B phenomenon. A problem for this analysis is that the Delay of Principle B effect lasts much longer (to around age 5) than does the OI stage. Thus although there may be a pragmatic/interpretive basis for both late developments, it is unlikely that it is exactly the same pragmatic/interpretive deficit.

12. The derivation of NS/OI

We have seen how the ATOM and the raising of subjects even in OI's follow from the Unique Checking Constraint, on the assumption that D-checking is the motivator for the Extended Projection Principle. It remains only to show that NS/OI - the puzzling generalization that the OI stage exists if and only if the adult language being developed is an INFL-licensed null-subject language - follows from UCC. At first sight it seems that UCC is in no position to derive NS/OI. For UCC simply disallows the D-feature on DP to check twice, to check two functional categories, thereby forcing the omission of AGR or TNS, with its associated D-feature. Why shouldn't this apply in all languages, since we assume that the hierarchy of functional categories is identical in finite clauses in both null-subject and non-null-subject languages? It turns out, however, that the nature of null-subject languages is exactly such that UCC won't apply. Recall that we are maintaining VEPS, so that all relevant parameters are set correctly in both null-subject and non-null-subject languages. Moreover, we make the unmarked assumption that development proceeds equivalently with respect to constraints that grow in all languages. Thus UCC holds in null-subject languages like Italian at a given age just as much as it does in French or German or English. Why then doesn't UCC force Italian children in the OI age range to drop AGR or TNS, thus displaying OI phenomena? The crucial reason is (47):

(47) In INFL-licensed null-subject languages, AGR is not -D

In null-subject languages like Italian, intuitively and traditionally, AGR is D; it doesn't need D. Being -D (having the D-feature on AGR) would mean that AGR needs to be checked against some subject D-feature. The traditional idea, and what still underlies the analysis of null-subject grammars in contemporary syntactic theory (e.g. Rizzi, 1994a,b; Zagona, 1982), is that AGR is pronominal or nominal in some sense. The pronominal nature of AGR in null-subject languages is what makes a subject unnecessary. The strength (pronominal nature) of AGR is what makes null-subject grammars like Italian possible.
AGR itself licenses or identifies (we won't distinguish these notions here) the null subject. Thus in null-subject grammars, AGR is pronominal. How is this to be characterized? Essentially, pronominals are D's. I suggest that the +D feature is an extremely natural way to characterize AGR in a null-subject language, capturing within Minimalist theory the traditional idea. But if AGR is +D, it can't be -D; that is, it can't have the requirement that it needs a D-feature to check it. AGR is D; it doesn't need a D. Thus (47) simply states a property of null-subject languages, and in fact we should state the property more fully as (48):

(48) AGR is not -D in a language if and only if the language is an INFL-licensed null-subject language47
47
In (47) and (48) I don't commit to whether AGR is itself +D when it is not -D; this is a matter of execution which doesn't seem crucial here; probably the +D nature of AGR is a natural way to implement the hypothesis.
When AGR is -D, the subject must raise to it so that the D-feature of the subject can check off the D-feature of AGR. When AGR is not -D, there is no D-feature on AGR which must be checked off by the subject. Therefore AGR does not have to attract the subject, at least not for reasons of checking off D.48 We have no reason to think that the D-feature doesn't exist on TNS in Italian as well as in the non-null-subject languages. So, in contrast to finite clauses in the non-null-subject languages, which have the pre-movement/checking analysis in (34a) (repeated as (49a)) and the final representation in (34b) (repeated as (49b)), finite clauses in null-subject languages have the pre-checking analysis in (50a) and the final representation in (50b).

(49) Finite clause analysis in non-null-subject language
a. AGRS[D] TNS[D] [VP DP V ...]
b. [AGRSP DPi AGRS0 [TNSP ti TNS0 [VP ti V ...]]]

(50) Finite clause analysis in null-subject language
a. AGRS TNS[D] [VP DP V ...]
b. [AGRSP AGRS0 [TNSP DPi TNS0 [VP ti V ...]]]49

Since there is no D-feature on AGRS in null-subject languages, the subject DP doesn't raise to AGRS to check off a D-feature. (See the last footnote for the question of whether the subject raises to AGRS at all. Whatever the status of this question, it is clear that it is not a D-feature that drives the movement in null-subject languages.) Rather, the subject DP only raises to TNS to check off the D-feature on TNS. Now suppose UCC applies. As we have seen, in non-null-subject languages, since double-checking can't take place, we would wind up with (37) (repeated as (51a)) instead of (49b). The D-feature on AGRS is unchecked and the derivation crashes. On the other hand, in a null-subject language like Italian, there is no D-feature on AGRS; thus UCC applies only vacuously. UCC says that DP can't check a D-feature a second time, but since there is no second D-feature in a null-subject language, UCC doesn't change the derivation.
The output representation (51b) is the same as the adult (non-UCC-constrained) output representation (50b). Since there is no unchecked D-feature in (51b), this representation doesn't crash.
48
I won't work out the detailed syntax of the null subject here, in particular the question of whether the subject raises to AGR for other reasons. But the proposal I have made in the text seems to me quite likely to be on the right track, given current work on null-subject languages. Let's assume the empty subject (pro) is in Spec,AGR, not because it has been attracted by some -D feature, but rather for reasons of identification; pro is deficient in features, which must check against AGR. What about lexical subjects? Since they have agreement features which must match AGR, haven't they raised to AGR? Barbosa (1995) has argued that lexical subjects in the null-subject Romance languages are in a higher position than INFL, presumably in some kind of focus-like position. So presumably these subjects have raised to this higher position to check off some kind of interpretive (focus-like) feature, not to check off a D-feature. Presumably checking against a -D feature of AGR is never a process in null-subject languages like Italian.
49 We ignore questions of verb movement, simply leaving the verb in situ in the representations so as not to complicate them.
(51) Finite clause terminating representation constrained by UCC:
a. Non-null-subject language: *AGRS[D] [DPi TNS [VP ti V ...]]
b. Null-subject language: AGRS [DPi TNS [VP ti V ...]]

Thus, under UCC, the representation of a finite sentence in a non-null-subject language (51a) crashes, but in a null-subject language the representation of a finite sentence (51b) does not crash. Therefore, whereas OI's become good representations in a non-null-subject language, they are not good representations in a null-subject language. Thus we have derived NS/OI. This derivation proceeds no matter which of the alternative implementations of UCC discussed in Section 11 we assume. For example, if we assume Minimize Violations (MV), we showed that in a non-null-subject language the UCC-constrained representation of a finite clause ((37), (51a)) had one violation, UCC. And the alternative representation in which AGRS or TNS is missing has one violation. So MV allowed either the finite or the OI representation to be chosen. But in a null-subject language, the UCC-constrained representation of a finite clause (51b) has no violations, whereas the representation in which AGRS or TNS is omitted has one violation. So MV selects the finite clause as the only grammatical representation. Similarly for the other implementations discussed. For example, if D can sometimes be [-interp] for a child, we assume that the same is true of a child developing Italian. It is just that the deletion of the D-feature after one checking will have no effect in Italian and other null-subject languages, because D does not have to check a second time in these languages. What do we have to assume about the child to make this derivation go through?

(52) a. The OI child knows the UG syntax of null subjects (48)
b. The OI child knows the correct parameter-setting for AGR in her language, i.e. whether AGR is -D or not
c.
The OI child is subject to UCC, independently of the language being acquired

(52a) is simply the assumption that the child knows a UG property, namely whatever properties implicate (48), the relation between an AGR that is not -D and the licensing of null subjects by AGR. (52c) is the one assumption that we have made to derive the OI stage and the ATOM more generally. (52b) is an instantiation of Very Early Parameter-Setting; it simply assumes that the OI child knows the correct setting of this parameter (whether AGR is -D or not). Not only is it thus an instance of the general hypothesis of this paper, but in Section 5 we gave some strong evidence that the null-subject parameter is correctly set in the OI age range. Thus the assumptions in (52) essentially come for free. NS/OI is forced; if it weren't true we would have something to explain. Crucially, UCC, a developmental constraint, something which the child grows out of, applies to the child universally. It only appears not to apply in null-subject languages, because of the nature of the
language itself, as instantiated in a parameter value, something which the young child knows. Crucially, the child is the same with respect to underlying developing constraints, the UCC in this case, whether the child is speaking a non-null-subject language with OI's, like French or German, or a null-subject language without OI's, like Italian or Spanish. UCC holds universally. Moreover, the correct parameter-setting has developed universally; it is just that a particular parameter-setting makes the effects of the developmental constraint (UCC) vacuous. This is exactly the kind of result we should expect; the only thing that has to be learned is a parameter value, and it is learned, early and well. A part of the genetic program - the UCC - applies in interaction with the parameter-setting to yield what appear to be very different phenomena - a large proportion of OI's in non-null-subject languages, and no (or very few, presumably due to noise) OI's in null-subject languages. Such a result makes us think we are on the right track. We may ask further questions. Since we assume that UCC actually holds in null-subject languages, though it applies vacuously in the simple finite sentence case, can we find any structures in null-subject languages to which UCC should apply non-vacuously? Consider a structure in Italian with an auxiliary verb, such as the passato prossimo, the periphrastic perfect tense construction, the equivalent of Mary has laughed (meaning Mary laughed) or Mary is come (meaning Mary came). Belletti (1990) has argued that auxiliaries in Italian head their own phrase. Thus we would expect such a structure to look like (53).

(53) AGRS TNS[D] AUX[D] [VP DP V ...]

Exactly where the AUX is in (53) with respect to the INFL categories AGRS and TNS isn't important for our purposes; probably it starts out below them. But what is crucial is the assumption I have made in (53) that AUX (like TNS but unlike AGRS) has a D-feature.
The EPP, which forces movement to TNS, will force movement through an AUX if one exists. Whatever licenses/motivates the [D]-feature on TNS should do the same for an AUX; it 'needs' a subject, i.e. a DP, i.e. it has a [D]-feature. In the adult grammar of Italian, then, DP will first be attracted to AUX, checking off AUX's D-feature, and then the DP will be attracted to TNS, checking off TNS's D-feature. But note that two checkings of the D-feature of DP are necessary in the derivation that applies to (53): first against AUX, then against TNS. Thus the derivation of (53) will violate UCC, unlike the derivation of a simple finite sentence with only a main verb (50), which only has one D-feature. Thus (53) will crash in OI-stage Italian (i.e. Italian constrained by UCC) even though OI's are not produced instead of finite main verbs. How can (53) be fixed so that UCC isn't violated? The simplest possible change seems to be to omit AUX.50
50
An alternative is to assume that AGR or TNS is omitted in (53), given UCC, and that no auxiliary may then be spelled out, given the morphology. I am not sure how to empirically distinguish the alternatives. At any rate, the various possibilities converge on the omission of the auxiliary as the way to satisfy UCC.
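The cross-linguistic predictions of UCC developed in this section can be made concrete with a small illustrative simulation (again my own formalization, with hypothetical structure lists, not anything in the paper): count how many functional heads in a clause bear a D-feature; if more than one, a UCC-constrained child must omit a D-bearing head.

```python
# Illustrative sketch: which clause structures survive UCC. A clause is
# a list of (head, bears_D_feature) pairs; the subject's D-feature can
# check at most one D-bearing head for the UCC-constrained child.

def d_checkings(clause):
    return sum(1 for _, has_d in clause if has_d)

def ucc_outputs(clause):
    """Structures available to a UCC-constrained child: the intact
    clause if at most one D-checking is needed; otherwise every
    structure obtained by omitting one D-bearing head."""
    names = [head for head, _ in clause]
    if d_checkings(clause) <= 1:
        return [names]
    return [[h for h in names if h != drop]
            for drop, has_d in clause if has_d]

english_finite = [("AGRS", True), ("TNS", True)]                  # (49a)
italian_finite = [("AGRS", False), ("TNS", True)]                 # (50a): AGR not -D
italian_aux    = [("AGRS", False), ("TNS", True), ("AUX", True)]  # (53)

print(ucc_outputs(english_finite))  # omit AGRS or TNS -> OI phenomena
print(ucc_outputs(italian_finite))  # intact -> no OI stage in Italian
print(ucc_outputs(italian_aux))     # a D-bearing head omitted
```

For (53), the text (and footnote 50) argues that the possible repairs converge on omission of the auxiliary; the sketch represents this simply as the omission of one D-bearing head.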
For then the sentence will have a representation like that of a finite main verb (50), even though a participle will exist in the sentence instead of a finite verb. Thus UCC predicts that auxiliaries will be omitted in OI-age Italian. Exactly this occurs, as has been shown in Lyons (1997). Namely, a substantial proportion of auxiliaries are omitted in OI-age Italian. Thus despite the fact that Italian is not an OI language in the sense of root infinitives appearing instead of finite main verbs, it does display another property of OI languages - auxiliary omission. And this is exactly the property that UCC predicts Italian should exhibit.

13. Conclusion: On the seamlessness of linguistics and psycholinguistics

In this paper I have argued that when something in syntax is learned (a parameter value), it is learned very early. Developmental constraints often take much more time to work themselves out. There is no reason to think that it is learning - experience-dependent properties - that is the cause of the different structures used by children. Rather, the genetic program underlying language growth is the cause of many (most? all?) of these structures. Children set parameter values correctly at the earliest observed ages (VEPS). Moreover, they are masters of inflectional morphology, at least with respect to the central cases that have been investigated (VEKI). What takes more time is for over-severe constraints on the nature of UG to be developmentally/genetically relaxed. The OI stage contains linguistic structures which at first sight look strikingly different from adult structures. Deeper analysis, however, suggests that the OI stage is the result of one extra, more severe constraint in the child's UG - the Unique Checking Constraint.
The analysis presented derives what had appeared to be some serious difficulties for the analysis of the OI stage, namely, that subjects of OI's raise, that subjects of OI's can receive NOM case, and, most strikingly of all, that many languages do not undergo the OI stage. All these properties are seen to follow from UCC and from interesting conclusions drawn in contemporary syntactic theory - especially about the D-motivation for the Extended Projection Principle. In this regard the theory of linguistic development and the theory of syntax have drawn closer together, depending on the same particular assumptions, a very satisfying development. The study of language acquisition turns out to have the capacity to add to linguistic theory - to the central questions of the nature of UG and of its development. It seems to me that the actual study of linguistic development now makes it clear that one underestimates this capacity of language acquisition to add to theory if one assumes that acquisitional studies are done simply to 'confirm' adult analyses or to show that the child 'has' UG. In my opinion, the theory of linguistics has much to gain by incorporating the problems of acquisition directly; there are already a number of real examples. And acquisitionists must confront the possibility of proposing changes in linguistic theory on the basis of developmental results. Of course, the proposed changes must be based on clear analysis, integrating child and adult data. As a methodological lesson, it might be suggested that the most interesting place, in the current state of the art, to look at children's language is
where the child shows different behavior from the adult, where the child appears to have different grammaticality judgments. This is where we have a good chance of learning something about the particular nature of the genetic program underlying UG. This is especially true since we have every reason to believe (VEPS, VEKI) that it isn't learning that is responsible for the difference. One may distinguish two views in the history of psycholinguistics, even in generative grammar-influenced research. One is the view of generative grammar, which says that performance systems contain grammars. The second view is what I call the Independence view, which I attribute to (at least) Fodor et al. (1974). The Independence view says that performance systems don't contain grammars (rather, performance systems are based on 'strategies'). Independence means that results in psycholinguistics don't influence analysis in linguistic theory and, conversely, results from the study of adult language don't influence developmental psycholinguistics. Independence thus has the advantage of allowing each field to go about its business without worrying about the other field. It does simplify research by allowing a related field's results to be ignored. Perhaps for this reason Independence seems to me to have been the dominant view in generative grammar-related psycholinguistics. Despite this seeming advantage, in my view Independence is the less fruitful view. The fields can influence each other and should. The trade-off for the greater complexity created by having to integrate different kinds of methodologies and domains of phenomena is the greater insight obtained by bringing to bear the different results from both fields, thus constraining the problem in useful ways. In this paper I hope to have gone a small way toward showing that results in developmental psycholinguistics are relevant in detail to the study of the nature of language.
References

Avrutin, S., 1995. Psycholinguistic investigations in the theory of reference. Doctoral dissertation, Massachusetts Institute of Technology.
Avrutin, S. and K. Wexler, 1992. Development of Principle B in Russian: Coindexation at LF and coreference. Language Acquisition 2(4), 259-306.
Babyonyshev, M., R. Fein, J. Ganger, S. Avrutin, D. Pesetsky and K. Wexler, 1994. Maturation of A-chains: Preliminary new evidence from the acquisition of Russian unaccusatives. Paper presented at the 19th Boston University Conference on Language Development (BUCLD 19).
Barbosa, P., 1995. Null subjects. Doctoral dissertation, MIT.
Bar-Shalom, E. and W. Snyder, 1997. Root infinitives in child Russian: A comparison with Italian and Polish. In: A. Sorace, C. Heycock and R. Shillcock (eds.), Proceedings of the GALA '97 Conference on Language Acquisition, 22-27. Edinburgh: University of Edinburgh.
Batchelder, W.H. and K. Wexler, 1979. Suppes' work in the foundations of psychology. In: R.J. Bogdan (ed.), Patrick Suppes, 149-187. Dordrecht: Reidel.
Behrens, H., 1993. Temporal reference in German child language. Doctoral dissertation, University of Amsterdam.
Belletti, A., 1990. Generalized verb movement. Torino: Rosenberg and Sellier.
Bloom, P., 1990. Subjectless sentences in child language. Linguistic Inquiry 21, 491-504.
Bohnacker, U., 1997. Determiner Phrases and the debate on functional categories in early child language. Language Acquisition 6, 49-90.
Borer, H., 1984. Parametric syntax. Dordrecht: Foris.
Borer, H. and K. Wexler, 1987. The maturation of syntax. In: T. Roeper and E. Williams (eds.), Parameter setting, 123-172. Dordrecht: Reidel.
Borer, H. and K. Wexler, 1992. Bi-unique relations and the maturation of grammatical principles. Natural Language and Linguistic Theory 10, 147-189.
Boser, K., B. Lust, L. Santelmann and J. Whitman, 1992. The syntax of V-2 in early child German grammar: The strong continuity hypothesis. Proceedings of the Northeastern Linguistic Society (NELS) 22, 51-66.
Bromberg, H. and K. Wexler, 1995. Null subjects in child wh-questions. In: C. Schütze, J. Ganger and K. Broihier (eds.), Papers on language processing and acquisition. MIT Working Papers in Linguistics 26.
Brown, R., 1973. A first language: The early stages. Cambridge, MA: Harvard University Press.
Brown, R. and C. Hanlon, 1970. Derivational complexity and the order of acquisition of child speech. In: J.R. Hayes (ed.), Cognition and the development of language, 11-53. New York: Wiley.
Carnie, A.H., 1995. Non-verbal predication and head-movement. Doctoral dissertation, MIT.
Chien, Y.-C. and K. Wexler, 1990. Children's knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1(3), 225-295.
Chomsky, N., 1995. The minimalist program. Cambridge, MA: MIT Press.
Clahsen, H., 1991. Constraints on parameter setting: A grammatical analysis of some acquisition stages in German child language. Language Acquisition 1(4), 361-391.
Clahsen, H., M. Penke and T. Parodi, 1994. Functional categories in early child German. Language Acquisition 3(4), 395-429.
Crain, S. and R. Thornton, 1998. Investigations in universal grammar: A guide to experiments on the acquisition of syntax and semantics. Cambridge, MA: MIT Press.
Crain, S. and K. Wexler (in press). On methodology in the study of language acquisition: A minimalist/modular approach. In: W.C. Ritchie and T.K. Bhatia (eds.), Handbook of language acquisition. San Diego, CA: Academic Press.
Crisma, P., 1992. On the acquisition of wh-questions in French. Geneva Generative Papers 0, 115-122.
Deprez, V. and A. Pierce, 1994. Crosslinguistic evidence for functional projections in early child grammar. In: T. Hoekstra and B.D. Schwartz (eds.), Language acquisition studies in generative grammar: Papers in honor of Kenneth Wexler from the 1991 GLOW Workshops. Philadelphia, PA: Benjamins.
Dresher, E. and J.D. Kaye, 1990. A computational learning model for metrical phonology. Cognition 34, 137-195.
Elman, J.L., E.A. Bates, M.H. Johnson, A. Karmiloff-Smith, D. Parisi and K. Plunkett, 1996. Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Enç, M., 1987. Anchoring conditions for tense. Linguistic Inquiry 18(4), 633-657.
Ferdinand, A., 1997. The development of agreement in early French. Doctoral dissertation, University of Amsterdam.
Fodor, J., T. Bever and M. Garrett, 1974. The psychology of language. New York: McGraw-Hill.
Franks, S., 1995. Parameters of Slavic morphosyntax. New York: Oxford University Press.
Fukui, N., 1988. Deriving the differences between English and Japanese: A case study in parametric syntax. English Linguistics 5, 249-270.
Gibson, E. and K. Wexler, 1994. Triggers. Linguistic Inquiry 25(3), 407-454.
Gleitman, L.R. and E. Wanner, 1982. Language acquisition: The state of the art. In: E. Wanner and L.R. Gleitman (eds.), Language acquisition: The state of the art, 3-48. Cambridge: Cambridge University Press.
Greenfield, P., 1991. Language, tools and the brain: The ontogeny and phylogeny of hierarchically organized sequential behavior. Behavioral and Brain Sciences 14, 531-595.
Grimm, H., 1993. Syntax and morphological difficulties in German-speaking children with Specific Language Impairment: Implications for diagnosis and intervention. In: H. Grimm and H. Skowronek (eds.), Language acquisition problems and reading disorders: Aspects of diagnosis and intervention. Berlin: Walter de Gruyter.
Grinstead, J., 1994. Tense, agreement and nominative case in child Catalan and Spanish. M.A. thesis, UCLA.
Guasti, M.-T., 1994. Verb syntax in Italian child grammar: Finite and nonfinite verbs. Language Acquisition 3, 1-40.
Guilfoyle, E., 1996. The acquisition of Irish and the internal structure of VP in early child grammars. In: A. Stringfellow, D. Cahan-Amitay, E. Hughes and A. Zukowski (eds.), Proceedings of the 20th Boston University Conference on Language Development. Somerville, MA: Cascadilla.
Haegeman, L., 1990. Understood subjects in English diaries. Multilingua 9, 157-199.
Haegeman, L., 1995. Root infinitives, tense and truncated structures in Dutch. Language Acquisition 4(3), 205-255.
Hagstrom, P., 1998. Implications of child errors for the syntax of negation in Korean. Manuscript, Dept. of Linguistics and Philosophy, MIT.
Halle, M. and A. Marantz, 1993. Distributed morphology and the pieces of inflection. In: K. Hale and S.J. Keyser (eds.), The view from Building 20: Essays in linguistics in honor of Sylvain Bromberger, 111-176. Cambridge, MA: MIT Press.
Hamburger, H. and K. Wexler, 1973. Identifiability of a class of transformational grammars. In: K.J.J. Hintikka, E. Moravcsik and P. Suppes (eds.), Approaches to natural language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics. Dordrecht: Reidel.
Harris, T. and K. Wexler, 1996. The optional-infinitive stage in child English. In: H. Clahsen (ed.), Generative perspectives on language acquisition, 1-42.
Hickey, T., 1990. The acquisition of Irish: A study of word order development. Journal of Child Language 17, 17-41.
Hickmann, M., H. Hendriks, F. Roland and J. Liang, 1996. The marking of new information in children's narratives: A comparison of English, French, German and Mandarin Chinese. Journal of Child Language 23(3), 591-619.
Hoekstra, T., N. Hyams and M. Becker, 1996. The underspecification of number and the licensing of root infinitives. In: A. Stringfellow, D. Cahan-Amitay, E. Hughes and A. Zukowski (eds.), Proceedings of the 20th Boston University Conference on Language Development, 293-306. Somerville, MA: Cascadilla.
Hyams, N., 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel.
Hyams, N. and K. Wexler, 1993. On the grammatical basis of null subjects in child language. Linguistic Inquiry 24(3), 421-459.
Jonas, D., 1995a. Clause structure and verb syntax in Scandinavian and English. Doctoral dissertation, Harvard University.
Jonas, D., 1995b. On the acquisition of verb syntax in child Faroese. MIT Working Papers in Linguistics (MITWPL) 26, 285-280.
Kail, M. and M. Hickmann, 1992. French children's ability to introduce referents in narratives as a function of mutual knowledge. First Language 12, 73-94.
Karmiloff-Smith, A., 1981. The grammatical marking of thematic structure in the development of language production. In: W. Deutsch (ed.), The child's construction of language. London: Academic Press.
Klepper-Schudlo, A., 1996. Early verbal inflection in Polish and the optional infinitive phenomenon. Talk presented at the Workshop on the Acquisition of Morphology in L1, Seventh International Morphology Meeting, Vienna, Austria.
Levow, G.-A., 1995. Tense and subject position in interrogatives and negatives in child French: Evidence for and against truncated structure. MIT Working Papers in Linguistics (MITWPL) 26, 281-304.
Lyons, I., 1997. Spontaneous production of past participles in Italian and underspecification of number. In: A. Sorace, C. Heycock and R. Shillcock (eds.), Proceedings of the GALA '97 Conference on Language Acquisition, 96-101. Edinburgh: University of Edinburgh.
Manzini, R. and K. Wexler, 1987. Parameters, binding theory, and learnability. Linguistic Inquiry 18, 413-444.
Maratsos, M., 1976. The uses of definite and indefinite reference in young children: An experimental study of semantic acquisition. Cambridge: Cambridge University Press.
Mills, A., 1985. The acquisition of German. In: D. Slobin (ed.), The cross-linguistic study of language acquisition. Volume 1: The data, 141-254. Mahwah, NJ: Erlbaum.
Moucka, R. and K. Wexler (manuscript in preparation).
Phillips, C., 1995a. Right association in parsing and grammar. In: C. Schütze, J.B. Ganger and K. Broihier (eds.), Papers on language processing and acquisition. MIT Working Papers in Linguistics 26, 37-93.
Phillips, C., 1995b. Syntax at age two: Cross-linguistic differences. In: C. Schütze, J.B. Ganger and K. Broihier (eds.), Papers on language processing and acquisition. MIT Working Papers in Linguistics 26.
Pierce, A., 1989. On the emergence of syntax: A crosslinguistic study. Doctoral dissertation, Massachusetts Institute of Technology.
Pierce, A., 1992. Language acquisition and syntactic theory: A comparative analysis of French and English child grammars. Dordrecht: Kluwer.
Platzack, C., 1990. A grammar without functional categories: A syntactic study of early Swedish child language. Working Papers in Scandinavian Syntax 45, 13-34. University of Lund.
Plunkett, K. and S. Strömqvist, 1990. The acquisition of Scandinavian languages. In: J. Allwood (ed.), Gothenburg Papers in Theoretical Linguistics. Gothenburg: University of Gothenburg.
Poeppel, D. and K. Wexler, 1993. The full competence hypothesis. Language 69(1), 1-33.
Quartz, S.R. and T.J. Sejnowski, 1997. The neural basis of cognitive development: A constructivist manifesto. Behavioral and Brain Sciences 20, 537-596.
Rasetti, L., 1996. Null subjects and root infinitives in the child grammar of French. Geneva Generative Papers 4(2), 120-132.
Rhee, J. and K. Wexler, 1995. Optional infinitives in Hebrew. In: C.T. Schütze, J.B. Ganger and K. Broihier (eds.), Papers on language processing and acquisition. MIT Working Papers in Linguistics 26, 383-402.
Rice, M. and K. Wexler, 1996. Toward tense as a clinical marker of Specific Language Impairment in English-speaking children. Journal of Speech and Hearing Research 39, 1239-1257.
Rice, M., K. Wexler and P. Cleave, 1995. Specific Language Impairment as a period of extended optional infinitive. Journal of Speech and Hearing Research 38, 850-863.
Rice, M., K. Wexler and S. Redmond (in preparation). Comprehension of an extended optional infinitive grammar: Evidence from English-speaking children with Specific Language Impairment.
Rizzi, L., 1994a. Some notes on linguistic theory and language development: The case of root infinitives. Language Acquisition 3(4), 371-393.
Rizzi, L., 1994b. Early null subjects and root null subjects. In: T. Hoekstra and B. Schwartz (eds.), Language acquisition studies in generative grammar. Amsterdam: Benjamins.
Roeper, T. and B. Rohrbacher, 1995. Null subjects in early child English and the theory of economy of projection. Unpublished manuscript, University of Massachusetts, Amherst, and University of Pennsylvania.
Sano, T. and N. Hyams, 1994. Agreement, finiteness and the development of null arguments. In: M. Gonzalez (ed.), Proceedings of NELS 24, 543-558. Amherst, MA: GLSA, University of Massachusetts.
Santelmann, L., 1995. Verb second grammar in child Swedish: Continuity of Universal Grammar in wh-questions, topicalization and verb raising. Doctoral dissertation, Cornell University.
Sarma, V., 1995. How many branches to a syntactic tree? Disagreements over agreement. In: J. Beckman (ed.), Proceedings of NELS 25, 89-103. Amherst, MA: GLSA, University of Massachusetts.
Schaeffer, J.C., 1990. The syntax of the subject in child language: Italian compared to Dutch. M.A. thesis, University of Utrecht.
Schönenberger, M., 1998. The acquisition of verb placement in Swiss German. Doctoral dissertation, University of Geneva.
Schönenberger, M., A. Pierce, K. Wexler and F. Wijnen, 1995. Accounts of root infinitives and the interpretation of root infinitives. Geneva Generative Papers 3(2), 47-71.
Schütze, C., 1997. INFL in child and adult language: Agreement, case and licensing. Doctoral dissertation, MIT.
Schütze, C. and K. Wexler, 1996a. Subject case licensing and English root infinitives. In: A. Stringfellow, D. Cahan-Amitay, E. Hughes and A. Zukowski (eds.), Proceedings of the 20th Boston University Conference on Language Development, 670-681. Somerville, MA: Cascadilla.
Schütze, C. and K. Wexler, 1996b. What case acquisition data have to say about the components of INFL. Talk presented at the WCHSALT Conference, Utrecht University, June 28-30.
Sigurjónsdóttir, S., 1992. Binding in Icelandic: Evidence from language acquisition. Doctoral dissertation, UCLA.
Thornton, R. and K. Wexler (in press). Principle B in child language. Cambridge, MA: MIT Press.
Torrens, V., 1995. The acquisition of syntax in Catalan and Spanish. MIT Working Papers in Linguistics (MITWPL) 26, 451-471.
Ura, H., 1996. Multiple feature-checking: A theory of grammatical function splitting. Doctoral dissertation, MIT.
Vainikka, A., 1993/94. Case in the development of English syntax. Language Acquisition 3(4), 257-325.
Valian, V., 1991. Syntactic subjects in the early speech of American and Italian children. Cognition 40, 21-81.
Van Kampen, J., 1997. The development of wh-movement in Dutch. Doctoral dissertation, University of Utrecht.
Wagner, K., 1985. How much do children say in a day? Journal of Child Language 12, 475-487.
Weissenborn, J., 1991. Functional categories and verb movement: The acquisition of German syntax reconsidered. In: M. Rothweiler (ed.), Spracherwerb und Grammatik: Linguistische Untersuchungen zum Erwerb von Syntax und Morphologie. Linguistische Berichte, Sonderheft 3.
Weverink, M., 1989. The subject in relation to inflection in child language. M.A. thesis, University of Utrecht.
Wexler, K., 1982. A principle theory for language acquisition. In: E. Wanner and L.R. Gleitman (eds.), Language acquisition: The state of the art. Cambridge: Cambridge University Press.
Wexler, K., 1990. Optional infinitives, head movement and the economy of derivations in child grammar. Paper presented at the Annual Meeting of the Society of Cognitive Science, MIT.
Wexler, K., 1992. Optional infinitives, head movement and the economy of derivation in child grammar. Occasional Paper 45, Center for Cognitive Science, MIT.
Wexler, K., 1994. Optional infinitives, head movement and the economy of derivations. In: D. Lightfoot and N. Hornstein (eds.), Verb movement, 305-350. Cambridge: Cambridge University Press.
Wexler, K., 1995. TNS, DET and D-feature strength and interpretability in early child grammar. Talk presented at the Workshop on Clausal Architecture, Bergamo, Italy.
Wexler, K., 1996. The development of inflection in a biologically based theory of language acquisition. In: M.L. Rice (ed.), Toward a genetics of language, 113-144. Mahwah, NJ: Erlbaum.
Wexler, K. (in press). Maturation and growth of grammar. In: W.C. Ritchie and T.K. Bhatia (eds.), Handbook of language acquisition.
Wexler, K. and Y.-C. Chien, 1985. The development of lexical anaphors and pronouns. Papers and Reports on Child Language Development (PRCLD), Stanford University, 138-149.
Wexler, K. and P. Culicover, 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Wexler, K. and H. Hamburger, 1973. On the insufficiency of surface data for the learning of transformational languages. In: K.J.J. Hintikka, E. Moravcsik and P. Suppes (eds.), Approaches to natural language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics. Dordrecht: Reidel.
Wexler, K. and R. Manzini, 1987. Parameters and learnability in binding theory. In: T. Roeper and E. Williams (eds.), Parameter setting, 41-76. Dordrecht: Reidel.
Wexler, K. and G. Secco (manuscript in preparation).
Wexler, K., J. Schaeffer and G. Bol, 1998. Verbal syntax and morphology in Dutch normal and SLI children. Paper presented at the 1998 IATL Conference, Israel.
Wexler, K., C. Schütze and M. Rice (in press). Subject case in normal and SLI children: Evidence for the AGR/TENSE deletion model. Language Acquisition 7.
Zagona, K., 1982. Government and proper government of verbal projection. Doctoral dissertation, University of Washington, Seattle.
Lingua 106 (1998) 81-112
Aspects of root infinitives*
Teun Hoekstra (a), Nina Hyams (b, **)
(a) Leiden University/HIL, PO Box 9515, 2300 RA Leiden, The Netherlands
(b) Dept. of Linguistics, UCLA, 405 Hilgard Avenue, Los Angeles 90025, USA
Abstract

This paper discusses the phenomenon of root infinitives (RIs) in child language, focussing on a distributional restriction on the verbs that occur in this construction, viz. event-denoting verbs, as well as on a related aspect of interpretation, viz. that RIs receive modal interpretations. The modality of the construction is traced to the infinitival morphology, while the eventivity restriction is derived from the modal meaning. In contrast, the English bare form, which is often taken to instantiate the RI-phenomenon, does not seem to be subject to the eventivity constraint, nor do we find a modal reference effect. This confirms the analysis, which traces these effects to the infinitival morphology itself, which is absent in English. The approach not only provides a precise characterization of the distribution of the RI-phenomenon within and across languages; it also explains differences between the English bare form phenomenon and the RI-construction in languages with genuine infinitives by reference to the morphosyntax of the languages involved. The fact that children appear to be sensitive to these distinctions in the target systems at such an early age supports the general thesis of Early Morphosyntactic Convergence, which the authors argue is a pervasive property of the acquisition process.

Keywords: Syntax; Acquisition; Root infinitives; Eventivity; Modality
1. Introduction

The phenomenon of root infinitives (RIs) in child language has received a lot of attention in the recent literature. There has been much debate over the distribution of the phenomenon both within and across languages, and also over the structure underlying RIs. In this paper we focus on the interpretive properties of RIs, an area which has not been very widely investigated. As we will show, there seems to be a constraint on the aspectual nature of the verbs occurring in RI-constructions, viz. only eventive verbs are allowed in such constructions, whereas stative predicates occurring during this same period typically require finiteness. We shall refer to this as the Eventivity Constraint (EC). Also, RIs typically do not get a deictic tense interpretation, but rather receive a modal interpretation. We call this the Modal Reference Effect (MRE). Interestingly, both the EC and the MRE found in Dutch and other languages appear to be absent from early English bare V-constructions. The fact that English bare V-constructions show neither the effects of the EC nor seem to have modal reference suggests that these two properties are related, one of the points which we will argue in this paper. More generally, we will address the following questions: (i) What is the nature of the Eventivity Constraint? (ii) Why do RIs receive modal interpretations, and how does this relate to the Eventivity Constraint? (iii) Why is English different?

The paper is organized as follows. In Section 2, we introduce the RI-phenomenon and discuss its main properties as they have emerged from the recent literature. In this section we also discuss several analyses of the RI-phenomenon, focusing on our own underspecification analysis of RIs (Hoekstra and Hyams, 1995, 1997). We show that various syntactic and distributional differences between RIs and English bare V-constructions can be explained as effects of the different morphological systems. This proposal is part of a more general hypothesis concerning the relationship between the child's development of morphosyntax and his development of the interface or discourse conditions governing the expression of the morphosyntax, which we call Early Morphosyntactic Convergence.

* The research for this paper was made possible through a grant from NWO (Dutch Organization for Pure Scientific Research) to Nina Hyams, and a UCLA Faculty Senate Grant. This support is hereby gratefully acknowledged. We thank the audience attending a presentation of this work at the University of Southern California, March 21, 1998, for their comments and questions. We also thank Carson Schütze, Neil Smith and two anonymous Lingua readers for their comments.
** Corresponding author. Phone: +1 310 206 8375; E-mail: [email protected]

0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved.
PII: S0024-3841(98)00030-8
As we have argued elsewhere (Hoekstra and Hyams, 1995; Hyams, 1997), children readily converge on the specific morphosyntax of the adult target language, but have less restricted interface conditions than adults. In Section 3, we present the empirical evidence for the EC and for the claim that RIs receive modal interpretations. In this section we also show how English child language is different in these respects from other child languages that display the RI-phenomenon. In Section 4 we discuss the merits of another theory of RIs, the Null Modal Hypothesis (NMH), which might account for the modal interpretation of RIs. We conclude that although the NMH has a lot of explanatory potential, it meets a number of insurmountable empirical problems. Finally, in Section 5 we directly address the questions in (i) to (iii). We introduce our account of the modality of RI-utterances, and we discuss the relationship between modality and aspect, showing how the EC is in fact an immediate consequence of the modality. We then explain why English is different in these respects from the other languages that have been studied. We explain these interpretive effects in terms of the morphosyntactic properties of the respective adult languages, which also show a (limited) RI-effect. In the conclusion we briefly discuss the interface differences between adults and children that allow children to have a more liberal use of RIs than adults.
2. The Root Infinitive Phenomenon

2.1. Properties of RIs

It has long been noted that children acquiring Dutch, German, Swedish, French and many other languages pass through a stage in which they use infinitives in root contexts - so-called root infinitives (RIs) (term due to Rizzi, 1994), as in (1).

(1) a. Papa schoenen wassen. (Dutch, Weverink, 1989)
       Daddy shoes wash-inf.
    b. Michel dormir. (French, Pierce, 1992)
       Michel sleep-inf.
    c. Thorsten das haben. (German, Poeppel and Wexler, 1993)
       Thorsten that have-inf.
    d. Jag också hoppa där & där. (Swedish, Santelmann, 1995)
       I also hop-inf. there and there
Van Ginneken commented on the existence of RIs in Dutch child language as early as 1917, noting that such utterances had a modal reference. More recent investigation of a number of child languages shows that there are systematic patternings of finite and non-finite clauses in early stages. First, RIs appear in positions in accordance with the target grammar. In French, for instance, finite verbs appear to the left of the negative adverb pas, while infinitives appear to its right (Emonds, 1978; Pollock, 1989). Pierce (1992) shows that, similarly, in French child language RIs appear to the right of pas while finite verb forms appear to its left, as in (2) (cf. also Meisel, 1990; Verrips and Weissenborn, 1992).

(2) [+finite]
    Elle a pas la bouche. (she has not a mouth)
    Veux pas lolo. ((I) want not water)
    Marche pas. ((she) walks not)
    Ça tourne pas. (that turns not)

    [-finite]
    Pas la poupée dormir. (not the doll sleep-inf.)
    Pas manger la poupée. (not eat-inf. the doll)
    Pas casser. (not break-inf.)
    Pas tomber bébé. (not fall baby)
    (Pierce, 1992)
Note that the finite and non-finite forms in (2) occur during the same time period, side by side in the same transcripts, so that RIs cannot be treated as an earlier developmental stage. Table 1 below shows the form-by-position interactions obtained by Pierce.

Table 1
Finiteness and position of negation in French (Philippe) (Pierce, 1992)

          Finite verb   Non-finite verb
Neg V     11            77
V Neg     185           2

Similarly, various studies of Dutch and German and other V2 languages have shown that RIs appear in the position of infinitives (clause-finally in Dutch and German, after the negative adverb in the Scandinavian languages), while finite forms appear in second position (German: Jordens, 1991; Meisel, 1990; Boser et al., 1992; Weissenborn, 1991; Poeppel and Wexler, 1993; Clahsen and Penke, 1992; Dutch: de Haan, 1986; Jordens, 1991; Swedish and Danish: Plunkett and Strömqvist, 1990). Data from the German child Andreas, presented in Table 2, are illustrative of this general finding.

Table 2
Finiteness and position of verb in Andreas (Poeppel and Wexler, 1993)

                 Finite verb   Non-finite verb
V2 position      197           6
Final position   11            37

Andreas corpus: Wagner, 1985; CHILDES, MacWhinney and Snow, 1985.
As discussed in Hoekstra et al. (1996), the RI-phenomenon is not due to a lack of knowledge of the relevant finite morphology nor to a lack of knowledge of Spec-head agreement requirements. When finite forms are used, agreement is almost always correct. Table 3 provides the frequencies of agreement errors occurring in finite utterances in several child languages. The number of such errors is under 4%, very low even by the most stringent acquisition standards.

Table 3
Percentage of subject-verb agreement errors in early language

Child        Language   Age        N      %Error   Source
Simone       German     1;7-2;8    1732   1%       Clahsen and Penke, 1992
Martina*     Italian    1;8-2;7    478    1.6%     Guasti, 1994
Diana*       Italian    1;10-2;6   610    1.5%     Guasti, 1994
Guglielmo*   Italian    2;2-2;7    201    3.3%     Guasti, 1994
Claudia      Italian    1;4-2;4    1410   3%       Pizzuto and Caselli, 1992
Francesco    Italian    1;5-2;10   1264   2%       Pizzuto and Caselli, 1992
Marco        Italian    1;5-3;0    415    4%       Pizzuto and Caselli, 1992
Marti*       Cat/Span   1;9-2;5    178    0.56%    Torrens, 1992
Josep*       Cat/Span   1;9-2;6    136    3%       Torrens, 1992
Gisela*      Catalan    1;10-2;6   81     1.2%     Torrens, 1992
Guillem*     Catalan    1;9-2;6    129    2.3%     Torrens, 1992

(*Data available on CHILDES, MacWhinney and Snow, 1985; Martina, Guglielmo, Diana corpora: Cipriani et al., 1991; Marti, Josep, Guillem, Gisela corpora: Serra and Sole, 1992)
Furthermore, RIs bear correct infinitival morphology, e.g. -en in Dutch and German, -er/-ir in French, -a in Swedish, and so on. The data just reviewed lead to the conclusion that RIs constitute a grammatical category in their own right in a number of child grammars, with properties that differentiate it from finite constructions. This conclusion will be further reinforced by evidence we discuss below concerning other distributional and semantic differences between RIs and finite verbs.

Let us now turn to English. Wexler (1994) has argued that English-speaking children also show an RI-stage. Utterances of the sort in (3), where the verb is missing the 3rd person singular -s, are analyzed by Wexler as the English analogue of the RI.

(3) a. Eve sit floor.
    b. Cowboy Jesus wear boots. (CHILDES, Brown, 1973)
The evidence for Wexler's conjecture is obviously much less strong than the evidence from other languages. First, English lacks a clear infinitival ending. Moreover, in English there is no positional difference between finite and non-finite verb forms.1 However, by assuming that it instantiates the RI-phenomenon, the lack of -s inflection in English finds an independently motivated explanation, and it brings English in line with the other languages studied. An alternative for English might be to analyze utterances such as those in (3) as resulting either from a breakdown in agreement or from the dropping of the -s affix due to production problems. But neither of these suggestions is theoretically very compelling. Why would English children make agreement errors or drop finite morphology when children acquiring other languages do not? In fact, with regard to agreement, Harris and Wexler (1996) show that, to the extent that this can be tested given the weak inflectional system of English, young English-speaking children do not make agreement errors. Of 1352 sentences containing the first person subject I from the corpora of several English-speaking children, only 0.02% occur with a verb bearing -s (e.g. I goes). Many more such errors would be expected if these children simply failed to respect Spec-head agreement requirements. On the other hand, an account in terms of affix dropping would not explain the distribution of negation in early English utterances. Harris and Wexler show that bare negatives in early English, that is, negative adverbs unsupported by an auxiliary, only occur with uninflected verbs as in (4a) and never with inflected forms as in (4b). (@ indicates that the form or utterance is unattested in the acquisition data.) As Harris and Wexler note, this asymmetry is unexpected if the bare form results simply from the dropping of finite morphology.2

(4) a. Mommy not go.
    b. @Mommy not goes.

1 Exceptions are the verbs have and be, but as we will see later, auxiliary verbs never occur as RIs.
2 Children use both doesn't and don't with third person singular subjects in negative utterances, where the latter type of utterance could be considered an agreement error. We assume that do is inserted here to support n't. It is interesting to note that when inverted, auxiliary do is always correctly inflected.
Despite the theoretical appeal of Wexler's conjecture, we will show that it is too general. As we will see, there are both quantitative and qualitative differences in the behavior of the English bare form and the root infinitive in the other languages studied. We will further show that these differences stem precisely from the morphosyntactic difference between true infinitives and the English bare form, which is not in fact an infinitive.

2.2. The grammatical representation of finiteness and RIs

Now that we have concluded that RIs constitute a construction type in its own right, different from finite constructions, we turn to the question of their grammatical representation. Various approaches are available in the literature. One such approach, championed by Radford (1990) and others, holds that the RI-phenomenon results from the absence of functional categories. The idea is that functional categories are subject to maturation, and that early grammars consist solely of lexical projections (cf. also Lebeaux, 1988). Since the Infl-projection, or whatever inflectional projections comprise the label Infl, is a functional category, the absence of -s in the examples in (3) is an automatic consequence of this maturational theory. It should be noted that this explanation is strongly inspired by the bare form phenomenon in English, the only language considered by Radford and most other proponents of this theory. The presence of an infinitival marker in the RI-utterances in (1) is less evidently in congruence with the idea that early grammars have only lexical projections. Moreover, the fact that the RI-construction occurs alongside finite clauses and shows different properties argues strongly against this maturational account. Another approach is Rizzi's (1994) theory of truncation, according to which RIs result from truncating part of the top structure of full clauses. On this approach the child's grammar has in principle the full clausal structure available.

The Full Clause Hypothesis also underlies various analyses which postulate the underspecification of particular functional categories in the top structure of the clause. This has been the tack that we have taken (cf. Hoekstra and Hyams, 1995, 1996, 1997). Our underspecification analysis, the details of which we outline below, is part of a more general theory of the relationship between two aspects of language: the morphosyntax and the interface (or discourse) conditions which govern the expression of morphosyntax. We propose that children converge on the morphosyntax of their ambient language very quickly, as is evidenced by the kind of properties we reviewed above concerning agreement and form-position correlations, but that children are less restricted than adults in the grammatical options allowed by the relationship between grammar and discourse. In particular, we take the position that finite and non-finite constructions are grammatical in the adult system as in the child's system, but that the RI-construction is much more limited in the adult output because of a bleeding relationship that exists between RIs and finite utterances. We shall return to this difference between adult and child systems in Section 5. Let us now briefly sketch the representational system that we envision.

Our basic assumption with respect to finite clauses is that they are grammatically anchored. By this we mean that the temporal location of the eventuality denoted by
the VP is fixed through a temporal operator, which we assume is located in C, following Enç (1987) and Guéron and Hoekstra (1989). The notion of finiteness refers to this fixation, as finiteness makes visible a chain between the operator and the verb, or, more specifically, the Tense position. Tense itself is taken to be a pronominal, which receives the status of pronominal variable if it is connected to a tense operator (TO) through a visible tense-chain. This is depicted in (5):

(5) TO_i F_1 ... F_n ... Tense_i VP

Languages vary with respect to the morphological extensions that are used to make the tense-chain visible. This is to say that the notion of finiteness is expressed in different ways in different languages. Some languages give overt manifestation of finiteness in terms of a tense-morpheme, for example, Japanese. Others express finiteness with person morphology, and yet others through number morphology. So, in Dutch, a present tense finite verb is morphologically marked for singular or plural, but not for either person or tense. It is therefore number morphology in Dutch that makes a tense-chain visible in the case of present tense.3 A language such as Japanese makes a tense-chain visible through the morphological expression of tense, and languages such as Italian, Spanish and Catalan make the chain visible through the expression of (at least) person morphology. We furthermore observe that not all child languages allow RIs. The empirical generalization in this respect is that RIs occur only in languages where the expression of finiteness may be done exclusively through number morphology. Languages where finiteness is always expressed with person agreement, or with tense-morphemes, appear not to allow RIs. This can be seen in Table 4.
In this table we divide the languages into non-RI languages, where the phenomenon occurs sporadically at best, and RI-languages, where it occurs with much higher frequencies (with some amount of variation depending on the child and on the language). It is worth noting that English has the highest frequency of non-finite forms, ranging from 75% to 81%. We will return to this point below (in Section 4) when we discuss the differences between English and the other RI-languages. In order to capture this empirical generalization, Hoekstra and Hyams (1995) propose that the RI-phenomenon results from the optional underspecification of the functional head Number. If Number is unspecified in languages where only number morphology expresses finiteness, e.g. Dutch, no tense-chain is made visible and the verb is non-finite. In this instance, Tense is not bound as a pronominal variable by a tense operator, but rather has the status of a free pronoun, which gets its interpretation discursively. In that case, the structure is unanchored. In person-marking and
3 We take the -t suffix to be an extension of singular number. Its absence in first person follows from the assumption that first person is unmarked (cf. Kayne, 1989). We similarly follow Kayne in assuming that English -s expresses singular number, the absence of -s in first person following from the same assumption, and its absence in the case of you following from the assumption that you is grammatically plural, like French vous. The inclusion of German and French among the RI-languages may be surprising in view of the fact that these languages apparently have person marking morphology. See Hoekstra and Hyams (1995) for motivation and discussion of this point.
Table 4
Frequencies of root infinitives in child language (partially based on Sano and Hyams, 1994)

Non-RI languages
Language (source)           Child       Age        %RIs
Italian (Guasti, 1994)      Diana, Martina; ages 2;0, 1;11, 2;1, 2;0-2;5; %RIs 0.00, 0.16, 0.04, 0.07
Italian (Schaeffer, 1990)   Paola       1;7-2;6    0.08
                            Daniele     1;7-2;6    0.06
                            Massimo     1;7-2;6    0.07
                            Gabriele    1;7-2;6    0.05
                            Orietta     2;1-2;5    0.10
                            Elisabeta   1;9-2;5    0.05
                            Francesco   2;6-2;8    0.05
Spanish (Grinstead, 1994)   Juan        1;7-2;0    0.12
                            Damariz     2;1-2;4    0.10
Catalan (Torrens, 1992)     Guillem     1;11-2;6   0.03
                            Marti       2;0-2;5    0.03
                            -           2;3-2;8    0.02
Japanese (Sano, 1995)       Toshi       2;8-2;10   0.08
                            Masanori    2;4        0.10
                            Ken         -          -
RI languages
Language (source)                        Child      Age         %RIs
French (Pierce, 1992)                    Nathalie   1;9-2;3     0.49
                                         Philippe   1;9-2;6     0.20
                                         Daniel     1;8-1;11    0.43
Swedish (Platzack; from Guasti, 1994)    Freja      1;11-2;0    0.38
                                         Tor        1;11-2;2    0.56
                                         Embla      1;8-1;10    0.61
German (Weissenborn; from Guasti, 1994)  S.         2;1         0.46
                                         -          2;2         0.40
Dutch (Weverink, 1989)                   Laura      1;8-2;1     0.36
                                         Tobias     1;10-1;11   0.36
Dutch (Haegeman, 1994)                   Fedra      1;10-2;1    0.26
                                         Hein       2;4-3;1     0.16
Icelandic (Sigurjónsdóttir, p.c.)        Birna      2;0-2;3     0.36
English                                  Eve        1;6-1;10    0.78
                                         Adam       2;3-3;0     0.81
                                         Nina       2;4-2;5     0.75
tense-marking languages the verb will always bear the relevant finite morphology since these heads may not be left unspecified (by hypothesis) and hence RIs do not occur. Alternative underspecification accounts, which postulate that tense itself may be optionally unspecified in early grammar (Wexler, 1994) or that agreement is absent altogether (Clahsen et al., 1994), do not explain the lack of RIs precisely in those languages which exclusively mark tense, such as Japanese, and the Romance person-marking languages. The question arises as to why number should have a privileged status, i.e. why should number, but not person or tense, have the option of remaining unspecified in the early grammar? It is quite understandable that tense does not have this status, if we are right in claiming that the RI-phenomenon basically derives from Spec-Head agreement (cf. Hoekstra et al., 1996a, and Section 4 below), and hence is based on a specification in DP. But the question remains as far as the distinction between number and person is concerned. It would take us too far afield to go into this question in any depth, but the difference in behavior in this respect is no doubt related to a more fundamental difference between the roles of number and person in the grammar. Number is a nominal feature, classifying nominal structures in a paradigmatic opposition. Its grammatical status is also evident from the fact that it is normally involved in various agreement relationships (noun-adjective agreement, for example). Person, in contrast, is not a nominal feature in the same sense: it does not classify nominal expressions along some dimension, nor is it generally found in agreement relationships. Rather, person is deictic, allowing reference to participants in the speech situation, and its occurrence in the grammar is often limited to agreement when tense, similarly a deictic category, is also involved. Thus,
if underspecification in the verbal domain (i.e. RIs) is a reflex of underspecification in the nominal domain, and if number, but not tense or person, is a nominal feature, then we see why underspecification is restricted to number. How the underspecification of number in child language relates to the deeper differences between number and person noted above is a question which requires further investigation. Having provided a brief overview of the RI-phenomenon, we now turn to the main topic of this paper, viz. certain distributional and interpretive properties of RIs that have hitherto received far less attention.
3. The Eventivity Constraint and the Modal Reference Effect

As discussed above, the observation that children use non-finite verbs has led to the investigation of distributional and other syntactic properties associated with different verb forms. A second line of research, dating back to the early 70s, is more interested in the interpretive correlates of the different inflections that children use in the early stages of language development. The basic observation is that different inflections distribute selectively over different aspectual classes of verbs. The most influential version of this 'aspect-before-tense' hypothesis is Antinucci and Miller (1976), who argue that Italian (and English) children use participles not to denote preterites, but rather resulting states, and hence, that their distribution is limited to accomplishment verbs (cf. also Shirai and Andersen, 1995; Bronckart and Sinclair, 1973; Bloom et al., 1980). The link between inflection and aspectual class raises the question as to whether there are particular aspectual properties associated with RIs. Relevant to this question is the finding of De Haan (1986), based on early Dutch, that auxiliary verbs do not occur as RIs (see also Sano and Hyams, 1994). According to De Haan, elements of the AUX category, which expresses time and modality, always occur in finite form and in second position, while elements of the category V, which express notions such as act and change, occur in non-finite form in final position. Jordens (1991) notes, however, that the finiteness restriction is not limited to auxiliaries, but applies more generally to statives. Jordens (1991: 1423) describes the distribution of inflections across aspectual classes of verbs in early Dutch as in (6):

(6) finite: statives, resultatives
    infinitives: activities
    participles: resultatives, activities
Although we disagree with Jordens' characterization of activities and resultatives, that is not pertinent to the present discussion. We will focus instead on the point that is important for our purposes, which is the observation that stative verbs are exclusively finite, or, put differently, that RIs do not allow stative predicates, but rather, require event-denoting predicates. We formulate this as the Eventivity Constraint (EC), as in (7):
(7) The Eventivity Constraint (EC)
    RIs are restricted to event-denoting predicates

The non-occurrence of auxiliary verbs as RIs noted by De Haan is subsumed under the EC. Ferdinand (1996) observes that there is an eventivity constraint in early French as well. During the RI stage in French, stative verbs are exclusively finite, while event-denoting verbs occur both in finite and non-finite forms. In (8) we list the set of stative verbs which Ferdinand finds during this stage. These verbs occur in finite form only.

(8) Stative verbs in early French: finite only
    avoir (have), être (be), s'appeler (be called), manquer (be absent, lack), vouloir (want), croire (believe), plaire (please), aimer (love), adorer (adore), espérer (hope), savoir (know), se souvenir (remember), devoir (must), falloir (be necessary), pouvoir (can), aspectual aller (go)

A particularly good example of the EC at work is Ferdinand's observation that the verb aller (go) does occur as an RI, but only in its main verb use. In its use as an inchoative auxiliary it is always finite. Thus, we find Mama aller ('Mommy go-INF') and Mama vait manger ('Mommy is going to eat') but not *Mama aller manger ('Mama go-INF eat-INF'). Wijnen (1996) provides relevant quantitative data from four Dutch children.4 Out of 1883 RIs in their corpora, 1790/1883 or 95% are eventive verbs, and only 93/1883 or 5% are stative verbs. In fact, the number of real stative verbs may be even more limited. The most frequently found 'stative verbs' are hebben (have), zien (see) and horen (hear). But the latter two verbs, while not controllable, are certainly not stative. Hebben, on the other hand, denotes a state, but the interpretation of hebben in at least some of these RI-contexts is 'get' rather than 'have', as is also a possible interpretation of hebben in adult Dutch (cf. Ik heb een fiets gehad voor mijn verjaardag 'I have had (= got) a bicycle for my birthday'), and of have in English (cf. Mary had a baby last night).
Thus, the figures given by Wijnen may actually overestimate the number of stative RIs, which is already strikingly low as compared to eventive RIs. We conclude, therefore, that stative verbs (at least with stative, i.e. non-inchoative interpretation) hardly ever occur in RIs. In contrast to the RI situation, Wijnen finds that the finite verbs are evenly split between eventive and stative verbs, as shown in Table 5.
4 The ages of the children in Wijnen's study are as follows: Josse 2;0.7-2;6.22; Matthijs 1;11.10-2;8.5; Niek 2;7-3;2.13; Peter 1;9.6-2;1.26. Josse and Matthijs' data were collected by Gerard Bol and Evelien Krikhaar; Niek's data were collected by Frank Wijnen. These corpora are available through CHILDES (MacWhinney and Snow, 1985).
Table 5
Inflection of eventive and non-eventive verbs in four Dutch children (based on Wijnen, 1996)

              Finite   RIs
Eventive      350      1790
Non-eventive  349      93
Total         699      1883
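The proportions cited in the text (95% eventive RIs, 5% stative) follow directly from the counts in Table 5. As a quick arithmetic check (the variable names below are ours, not from the source):

```python
# Recompute, from the Table 5 counts (Wijnen, 1996), the proportions
# discussed in the text: RIs are overwhelmingly eventive, while finite
# verbs are evenly split between eventive and non-eventive verbs.
table5 = {
    "eventive":     {"finite": 350, "ri": 1790},
    "non_eventive": {"finite": 349, "ri": 93},
}

total_ris = sum(row["ri"] for row in table5.values())       # 1883
eventive_share = table5["eventive"]["ri"] / total_ris       # ~0.95
stative_share = table5["non_eventive"]["ri"] / total_ris    # ~0.05

print(f"RIs: {eventive_share:.0%} eventive, {stative_share:.0%} stative")
# prints "RIs: 95% eventive, 5% stative"
```

Note also that the finite column (350 vs. 349) confirms the even split reported below.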
In Russian, there is a similar eventivity constraint on RIs. While the number of RIs in early Russian is relatively small, Van Gelderen and Van der Meulen (1998) find that 98% of Varya's (CHILDES, MacWhinney and Snow, 1985; data collected by E. Protassova) RIs are eventive. We see, then, that in several typologically distinct child languages, RIs are subject to the EC. A further interesting property of RIs is that their reference is not free. As noted previously, it is an old observation that RIs typically have a modal interpretation (Van Ginneken, 1917). More recently, Hoekstra and Jordens (1994) argue for the modality of Dutch RIs on the basis of patterns of negation. The child they studied uses two negative forms: the adult negative adverb niet with finite verbs, and the adult anaphoric negative adverb nee in combination with RIs. This nee stands in opposition to combinations of modal verbs plus niet, such as kan-niet (cannot) and mag-niet (may not). Nee is later replaced by moet-niet (must-not). Contextual analysis of nee with RIs confirms this modal negative meaning. Plunkett and Strömqvist (1990) make the same observation concerning the modality of RIs for Swedish, Ingram and Thompson (1996) for German, and Meisel (1990) and Ferdinand (1996) for French. Wijnen (1996) provides the quantitative data in Table 6 on the reference of Dutch RIs.

Table 6
Temporal reference of RIs and finite verbs in four Dutch children (Wijnen, 1996)

               Present      Future/modal   Past      Total
RIs            194 (10%)    1625 (86%)     64 (3%)   1883
Finite verbs   657 (93%)    21 (3%)        21 (3%)   699
These data show that while finite verbs have mostly present tense interpretations, the modal reference is the most frequent one for RIs. Let us formulate this finding as in (9):

(9) The Modal Reference Effect (MRE)
    With overwhelming frequency, RIs have modal interpretations

Given that the modal itself is not overtly expressed, we can only state the kinds of modal messages that these RIs seem to convey. These include deontic and boulomaic
modality, expressing necessities and desires. The meaning of the RI sentences is inferred from the linguistic and non-linguistic context of the utterance.5 Some examples follow.

(10) a. Eerst kaartje kopen!
        first ticket buy-INF
        'We must first buy a ticket.'
     b. Niekje buiten spelen.
        Niekje outside play-INF
        'Niek (= speaker) wants to play outside.'
     c. Papa ook boot maken.
        Papa also boat make-INF
        'Papa must also build a boat', or 'I want Papa to also build a boat.'
     d. Jij helicopter maken.
        You-NOM helicopter make-INF
        'You must build a helicopter.'

Let us now turn to English again. Here the situation is very different: neither the EC nor the MRE seems relevant to the English bare form phenomenon. Let us first consider eventivity. Ud Deen (1997) checked the distribution of finiteness across eventive and non-eventive verbs in the files of Adam and Eve (Brown, 1973). The verbs which Ud Deen counted as stative were know, need, and want. Only third person subject sentences were counted so that the finite/non-finite status of the verb was clear. Repetitions and imperative sentences were excluded. Ud Deen found numerous examples of bare stative verbs such as those in (11).

(11) a. Man have it.
     b. Ann need Mommy napkin.
     c. Papa want apple.
5 In Wijnen's (1996) study an utterance was taken to be on-going (present) when the utterance and the eventuality it referred to co-occurred. This was inferred either from contextual information in the transcript or from the response of an adult interlocutor. The utterance was classified as 'past' if context suggested that it referred to a past eventuality, and the utterance was classified as 'future' if it referred to an as yet unrealized eventuality. Wijnen notes that these were often expressions of the child's wishes or desires, as in (i), as is also reflected in the fact that an adult interlocutor would recast the utterance using a modal, as in (ii) (examples from Wijnen). In our tables we refer to this category as 'future/modal'.
(i)  ME:  Papa bouwen
          Daddy build-INF
     FAT: geef jij de blokjes maar aan dan
          'well, hand me the building blocks then'
(ii) NIE: drinke(n)!
          drink-INF
     FAT: wil je in die kamer drinken?
          want you in that room drink
          'do you want to have a drink in that room?'
Table 7 reports the quantitative results of Ud Deen's analysis.6

Table 7
Finiteness of eventive and non-eventive verbs in English (Adam and Eve) (based on Ud Deen, 1997)

              Finite   Bare verb
Eventive      81       199
Non-eventive  8        65
Total         89       264
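The English/Dutch contrast discussed in the surrounding text can be recomputed from the counts in Tables 5 and 7. A minimal arithmetic check (variable names are our own):

```python
# English (Table 7, Ud Deen, 1997) vs. Dutch (Table 5, Wijnen, 1996):
# how non-eventive verbs distribute over finite and non-finite forms.
english_bare = {"eventive": 199, "non_eventive": 65}    # English bare forms
english_noneventive = {"finite": 8, "bare": 65}         # English non-eventives
dutch_noneventive = {"finite": 349, "ri": 93}           # Dutch non-eventives

bare_total = sum(english_bare.values())                 # 264
share_noneventive_bare = english_bare["non_eventive"] / bare_total
share_bare_of_noneventive = 65 / (65 + 8)
share_finite_dutch = 349 / (349 + 93)

print(f"{share_noneventive_bare:.0%} of English bare forms are non-eventive")
print(f"{share_bare_of_noneventive:.0%} of English non-eventive verbs are bare")
print(f"{share_finite_dutch:.0%} of Dutch non-eventive verbs are finite")
```

The three printed shares (25%, 89%, 79%) match the percentages cited in the discussion of Tables 5 and 7.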
We see first that the English bare form is not at all limited to eventives, in contrast to Dutch RIs; approximately 25% (65/264) of the bare verbs are non-eventive, while only 5% of the Dutch RIs are (cf. Table 5). The difference between English and Dutch is even more striking when we look at the breakdown of the non-eventives. The English non-eventives occur most often in the bare verb form. Of the 73 tokens of non-eventive verbs, 65 (89%) are bare forms and only 8 are finite. This is again in marked contrast to the situation in Dutch, where non-eventive predicates are typically finite: 349 of 442 (79%) are finite. And recall that the figure of 93 non-eventive RIs in Dutch is inflated due to the inclusion of verbs such as hear and see which are not really stative. Thus in English non-eventive verbs most often occur in non-finite form. Clearly the EC is not operating on English bare forms. Ud Deen (1997) also looked at the reference of the bare forms, and again, as shown in Table 8, we see that in contrast to Dutch, modal reference is not the dominant one. Rather, the bare forms mostly have a deictic temporal interpretation (present or past), with the present tense here-and-now interpretation being the most frequent. Only 13% of the English bare forms have a modal interpretation (34/264, collapsing across eventives and non-eventives). This is in contrast to Dutch, where 86% of RIs have a modal reading.

Table 8
Temporal reference of bare forms in English (Adam and Eve) (based on Ud Deen, 1997)
              Past       Present     Future/modal   Total
Eventive      56 (28%)   109 (55%)   34 (17%)       199
Non-eventive  3 (4%)     62 (96%)    0              65
Total         59         171         34             264
In fact, there is little difference between the reference of bare forms and finite forms, as can be seen by comparing Table 8 with Table 9.

6 The files included in this analysis are Eve: files 1-12 (age 1;6-1;11) and Adam: files 1, 8, 10, 12, 14, 20, 22, 24, 28, 30 (age 2;3-3;5) from the CHILDES database (Brown, 1973; MacWhinney and Snow, 1985).
Table 9
Temporal reference of finite forms in English (Adam and Eve) (based on Ud Deen, 1997)

              Past       Present    Future/modal   Total
Eventive      33 (40%)   38 (47%)   10 (12%)       81
Non-eventive  0          8 (100%)   0              8
Total         33         46         10             89
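Putting the Dutch counts (Table 6) next to the English counts (Tables 8 and 9, row totals), the modal-reference contrast can be recomputed as follows (a sketch with labels of our own choosing):

```python
# Rate of future/modal readings: Dutch RIs (Table 6) vs. English bare
# and finite forms (Tables 8 and 9, column totals).
dutch_ris = {"present": 194, "future_modal": 1625, "past": 64}
english_bare = {"present": 171, "future_modal": 34, "past": 59}
english_finite = {"present": 46, "future_modal": 10, "past": 33}

def modal_rate(counts):
    """Share of forms with a future/modal reading."""
    return counts["future_modal"] / sum(counts.values())

print(f"Dutch RIs:      {modal_rate(dutch_ris):.0%} modal")       # 86%
print(f"English bare:   {modal_rate(english_bare):.0%} modal")    # 13%
print(f"English finite: {modal_rate(english_finite):.0%} modal")  # ~11% (reported as 'only 10%' in the text)
```

The computation confirms the near-identity of bare and finite forms in English, against the sharp RI/finite split in Dutch.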
As might be expected, the predominant reference for finite verbs is temporal. Only 10% of the verbs in Table 9 have a modal reading (10/89, collapsing across eventive and non-eventive), which is very close to the 13% modal reading that we find for bare forms. Again, the contrast with Dutch is noteworthy; in Dutch, 86% of RIs have a modal interpretation, while only 3% of the finite verbs do (cf. Table 6). On the basis of this evidence we conclude that the bare form in English is subject neither to the EC nor to the MRE. The fact that English is different in both respects from the other languages studied suggests that the EC and the MRE are related. Having laid down the empirical groundwork, let us then return to the questions formulated in the introduction: first, why are RIs in Dutch and other languages restricted to eventive verbs (the EC); second, how does this relate to the future/modal interpretation (the MRE); and third, why is English different in these respects? In the section that follows we present one possible answer to these questions, the Null Modal Hypothesis (NMH). We will show, however, that despite its explanatory potential, the NMH does not provide an adequate account of various properties of RI-constructions, nor does it provide a basis for explaining the crosslinguistic differences associated with RIs.

4. The Null Modal Hypothesis

A possible explanation for the EC and the MRE takes the form of a two-part hypothesis, as in (12):

(12) (i) The structure of RI-utterances contains a non-overt modal verb, and
     (ii) Modal verbs select eventive predicates.

Let us refer to this as the Null Modal Hypothesis. Various suggestions leading to such a hypothesis can be found in the literature. Plunkett and Strömqvist (1990: 48) suggest that negative RIs (e.g. inte Mamma tvätta 'NEG Mummy wash-INF') have a missing modal verb, in order to account for the observed word order with preverbal negation. Boser et al.
(1992) adopt a version of the Continuity Hypothesis, according to which children's apparently non-finite utterances are structurally identical to finite adult utterances. They account for sentences which lack a finite verb in German by postulating a null auxiliary in the underlying structure. The null auxiliary moves to C, which blocks the raising of the infinitive, thereby explaining the fact that German RIs occur in sentence-final position. Kramer (1993) also postulates a
null modal for RIs in order to account for the Case marking of lexical subjects which sometimes occur with RIs. Ferdinand (1996) goes one step further. She suggests not only that RIs have a null modal, but also that modals select eventive predicates. The NMH has a large explanatory potential. First, it explains the presence of infinitives in root contexts; the lack of a finite verb is only apparent, as there is a phonologically null finite modal verb and it is this modal verb that selects the infinitival form. The fact that the RI occurs in the position where an infinitive normally occurs in the adult language is also accounted for, as noted above; it is the null modal that undergoes V-to-I (to C) and hence the main verb cannot raise. The surface syntax of such utterances is therefore accounted for. The NMH also explains the modal reading of RIs, which is established through the meaning of the null modal itself. Finally, to the extent that a connection can be established between modal verbs and the aspectual nature of their complements, the EC might follow from it as well. We return to this point below. However, despite its considerable appeal, the NMH runs into a number of theoretical and empirical problems, some of which are noted in the literature. The gist of some of the problems is that the NMH obliterates the morphosyntactic distinction between finite utterances and RIs. As a result, it is unable to capture those properties that are dependent on finiteness, and which are missing in RIs. One such case concerns topicalization in V2 languages. As Poeppel and Wexler (1993) point out, the NMH has no proper account for the asymmetry in topicalization in finite and non-finite clauses (this observation is attributed to David Pesetsky).
Under the standard theory of topicalization in (adult) V2 languages (Den Besten, 1983), topicalization targets the CP, which itself requires a finite verb to move into its head-position, yielding the structure in (13b) for a topicalization construction like (13a):

(13) a. Aan Marie wil ik een boek geven. (Dutch)
        To Mary want I a book give-INF
        'To Mary I want to give a book'
     b. [CP aan Marie_j [C wil_i [IP ik een boek t_j geven t_i]]]

As has been observed by a number of people, including Boser et al., children acquiring V2 languages adhere to the adult restrictions; they topicalize in finite sentences, but not in RIs. Table 10 summarizes the data from three V2 languages, German, Dutch and Swedish. It shows that non-subject initial RIs are virtually absent, that is, RIs with objects or adverbs as topics do not occur, while topicalization of non-subjects in finite clauses is a robust phenomenon.7 Under the view that RIs are genuinely infinitival structures, as we argue, this asymmetry is easily accounted for. According to the NMH, on the other hand, RIs are covert finite clauses and hence should pattern like overt finite clauses.
7 Under the standard analysis of V2, subject initial main clauses also arise through topicalization (movement of the subject from Spec,IP to Spec,CP). However, subject initial clauses are also amenable to a non-movement analysis (cf. Travis, 1984; Zwart, 1994). Only when a non-subject (object or adverb) occurs in clause-initial position are we certain that topicalization has occurred.
Table 10
Number of subject initial and non-subject initial RIs and finite clauses

           Subject initial      Non-subject initial
           Finite     RI        Finite     RI      Source
German     130        24        50         0       Poeppel and Wexler, 1993
Dutch      1223       101       1346       5       Haegeman, 1994
Swedish    145        147       61         1       Platzack, 1992
Boser et al. (1992) are aware of this problem and suggest the following solution. Because the auxiliary in C is a null category, it must be licensed by an overt specifier with the same agreement features. This condition is satisfied only when the subject occupies the specifier of CP position, since in this case the features of the null auxiliary are recoverable from its specifier. If an object or adverb occupies the Spec,CP position, the null auxiliary is not properly licensed or identified because it does not share agreement features with the null modal. This then explains why in the presence of a null auxiliary (i.e. in RIs) the subject has to be in clause-initial position. Apart from the question of how other features of the modal verb are recoverable (e.g. its modal value), the Boser et al. version of the NMH requires that the null auxiliary be licensed by an overt subject in its specifier. Hence, the analysis predicts that RIs should only occur with overt subjects. But in fact, as shown in Table 11, in various languages that have been studied the vast majority of RI-subjects are null. Particularly relevant to the present discussion are the V2 languages, German, Flemish and Dutch, which Boser et al. specifically analyze as null modal languages. Note that English does not follow the pattern. English bare forms most often occur with overt subjects and do not distinguish themselves from finite forms in this regard. We return to the idiosyncratic behavior of English below. The asymmetrical behavior of finite clauses and RIs with respect to topicalization is mimicked in WH-constructions. As WH-movement targets the specifier of CP and induces V-movement to C, it is expected that it will only occur in constructions with finite verbs.8 This prediction is correct.
As Table 12 shows, in Dutch, German, Swedish and French, WH-questions are virtually absent in non-finite utterances, though these languages clearly have non-WH RIs, as shown by the percentages of [-finite] verbs in the 'all clauses' column.9

8 This expectation is based on the generalization that V-movement to C is restricted to finite verbs. It is not immediately evident whether this restriction is related to WH-movement (or V-movement to C), or whether it is a corollary of the requirement on adult grammar that independent clauses be finite. In Hoekstra and Hyams (1997), we in fact argue that child English has non-finite I-to-C movement, and hence that the finiteness requirement on V-to-C movement in the adult grammar is indeed a consequence of the independent requirement on adults that root clauses be finite.
9 The data in Table 12 are based on the following sources: Dutch: Hein 2;4-3;1 (Haegeman, 1994); German: various children (Kursawe, 1994); Swedish: various children (Santelmann, 1994); French: Philippe 2;1-2;3 (Crisma, 1992); English: Adam 2;3-3;1 (Phillips, 1995).
Table 11
Percentage of null and overt subjects in finite and non-finite clauses

                               Finite verbs            Non-finite verbs
Lang      Child                Overt  Null  Total      Overt  Null  Total     Source
Flemish   Maarten 1;11         75%    25%   92         11%    89%   100       Kramer
German    Simone 1;8-4;1       80%    20%   3636       11%    89%   2477      Behrens, 1993
German    Andreas              92%    8%    220        32%    68%   68        Kramer, 1993
French*   Nathalie 1;9-2;3     70%    30%   299        27%    73%   180       Kramer, 1993
French*   Philippe 2;1-2;6     74%    26%   705        7%     93%   164       Kramer, 1993
Dutch     Hein 2;3-3;1         68%    32%   3768       15%    85%   721       Haegeman, 1994
English   Eve 1;6-2;3          90%    10%   86         89%    11%   155       Phillips, 1995
English   Adam 2;3-3;0         69%    31%   113        80%    20%   242       Phillips, 1995

* For French, only preverbal subjects were counted.
The WH-facts bring to light a further difficulty of the NMH as formulated by Boser et al. (1992). Under their proposal, it is expected that non-subject WH-RIs will not occur, since the null modal would not be licensed in this case, but that subject WH-RI questions should freely occur. This prediction is not borne out. As just noted, in Dutch, German, Swedish and French, non-finite WH-questions are virtually absent and hence we do not find the subject-non-subject asymmetry expected on the Boser et al. account.10

Table 12
Percentage of RIs in finite and non-finite WH-questions and in all clauses

           WH-questions               All clauses
           +Finite       -Finite      +Finite      -Finite
Dutch      88 (97%)      2 (3%)       3768 (86%)   721 (16%)
German     306 (99.6%)   1 (0.4%)     -            -
Swedish    675 (99.6%)   5 (0.4%)     -            -
French     114 (100%)    0 (0%)       921 (83%)    195 (17%)
English    69 (43%)      92 (57%)     134 (40%)    203 (60%)
10 Given that there is in fact no difference between subject and non-subject questions, Barry Schein (p.c.) suggests that the relevant generalization might be that a null auxiliary may not occur in C. Such a suggestion is consistent with the distribution of ha (have) deletion in adult Swedish, which, according to Holmberg (1986), can occur only in embedded clauses, where the empty ha would be governed by the complementizer. This hypothesis would indeed capture the non-occurrence of WH-RIs, but it is unable to explain the actual occurrence of RIs in general, as these always involve a null modal in C. One might adopt the non-unitary analysis of V2 (cf. Travis, 1984; Zwart, 1994), according to which only non-subject initial main clauses involve V-movement to C, while the verb moves no further than I in subject initial main clauses. The absence of non-subject initial RIs then follows from Schein's suggestion, assuming now that null modals are allowed in I, but not in C. However, this would make child grammar different from adult Swedish, in which ha-deletion is also excluded in subject initial main clauses, which supports a unitary V-to-C analysis for adult V2 languages.
Table 12 shows that English differs from the other languages in allowing non-finite WH-questions, e.g. 'Where Teddy sleep?' Nor does English show the asymmetry with respect to subject and non-subject questions; infinitival WH-questions include both subject and non-subject questions (cf. Guasti and Rizzi, 1996; Roeper and Rohrbacher, 1994). The NMH has no basis for explaining this crosslinguistic difference, as it is unclear why English would allow null modals in WH-questions while the other languages do not. For the same reason, the NMH also fails to provide a basis for explaining the fact that certain languages exhibit an RI-stage and others do not. Recall from Section 2.2 that RIs are not found in all child languages. Rather, whether a language shows an RI-stage or not is dependent on the inflectional properties of the adult language: languages which mark number exclusively show an RI-stage, while tense- and person-marking languages do not (Hoekstra and Hyams, 1995). It is not easy to see how the NMH could account for this generalization. Why would the former languages allow null modals, and the latter not? A final point for which the NMH provides no understanding concerns the range of subject types found in RI-constructions. As the NMH analyzes RIs basically as finite clauses with a null finite verb, the only expectation we may have is that RIs occur only with overt subjects, since these are required to license the null modal. However, as we already showed in Table 11, this expectation is not at all borne out. Rather, with the exception of English, RIs occur predominantly with null subjects.11 This null-subject-RI correlation is expected on our underspecification approach, as we will now explain. The following discussion is based on Hoekstra and Hyams (1995), Hoekstra et al. (1996) and Hoekstra and Hyams (1997).
Given that Number is an inflectional ingredient in both the nominal and the verbal system, the underspecification of Number hypothesis leads us to expect cross-categorial effects as well as correlations between the finiteness specification in the clause and in its subject. More precisely, we argue that the optionality of determiners (e.g. doggie bark) observed in child languages, as well as the occurrence of certain types of null subjects, result from the underspecification of Number in the nominal domain. On our
11 It is important to be clear about the nature of the null subjects. We assume that the subject of non-finite clauses, including RIs, is a radically underspecified null element, essentially PRO. On the other hand, there is the pro subject of the Romance pro-drop languages, which we take to be a fully specified pronominal minus a phonological matrix. This is the element that appears in subject position in finite clauses in Romance child and adult languages. A question arises as to the identity of the null subject in finite clauses in non-pro-drop languages. Although finite clauses generally occur with lexical subjects, there is still a percentage of null subjects (cf. Table 11). It seems clear that in the V2 languages what is involved is a null topic, as first argued by De Haan and Tuijman (1988), an option also available in the adult languages, but in a more restricted way, cf. Dutch Die film heb ik al gezien 'That movie have I already seen' with an optionally null object topic, and Dat mag niet 'That may not = that is not allowed' with an optional null subject topic. The evidence for a null topic analysis is that these null subjects almost always occur in first position (Spec,CP), and not in post-verbal position (Spec,IP) (cf. Poeppel and Wexler, 1993; Haegeman, 1995). Null subjects also occur with finite verbs in English, which is not a V2 language. Nevertheless, there is evidence to suggest that these null elements also arise from topic or diary drop. Roeper and Rohrbacher (1994) show that null subjects in finite contexts drop significantly in WH-contexts, where the WH-phrase occupies the Spec-CP position.
T. Hoekstra, N. Hyams / Lingua 106 (1998) 81-112
analysis the specification of functional heads within DP, through definite determiners, pronouns, or plural marking, results in 'finite' DPs whose reference is obtained through grammatical anchoring via a D-chain, the nominal counterpart of a T-chain. On the other hand, underspecified DPs, for example certain null subjects and bare N subjects, are parallel to RIs in that they lack a D-chain and are therefore grammatically unanchored structures. In Section 2, we established that children by and large observe the requirements of Spec-head agreement. We assume that the presence or absence of finiteness on the verb is determined by the functional specification within the subject DP with which it agrees. Assuming that a null subject does not carry a specification for number, the verb likewise should be unspecified for number, hence non-finite. The correlation between RIs and null subjects bears out this expectation. It can be seen from Table 11, however, that despite the high frequency of null subjects in RI-utterances, there are also a fair number of overt subjects with non-finite verbs. This result is not surprising once we understand that not all overt DPs have equal status. In our analysis, we distinguish between specified DPs (= finite DPs), whose anchoring is visible either through a definite determiner or a plural marker, and unspecified DPs, consisting of bare Ns (e.g. doggie). We also take pronouns to be specified DPs since they occupy the D-head. Given that bare-N DPs are not marked for the agreement features relevant for number, they should freely occur as subjects of RIs. Conversely, specified DPs (pronouns, DPs with determiners, plurals) can license an inflected verb, but they should not in principle occur with RIs. Consider Table 13, where we lay out the different DP-types that occur with finite and non-finite utterances in Dutch child language.¹²

Table 13
Distribution of overt subjects for Dutch children (Niek and Hein)

                    Non-finite V          Finite V
                    n        %            n        %
Overt Det           4        1%           382      99%
Plural              2        1%           180      99%
Pronoun             169      4%           4407     96%
0-det (bare N)      28       6%           423      94%
As expected, specified DPs occur overwhelmingly with finite verbs. Infinitives, on the other hand, which are negatively marked for finiteness, do not occur with specified DPs, again as predicted. Somewhat unexpected in this table is the fact that 0-dets also occur with finite verbs in the majority of cases (e.g. hondje zit hier 'doggie sits here'). Hoekstra and Hyams (1997) explain this in the following way: the verb form in these cases is the singular form, which in Dutch, as in many languages,
¹² The analysis and counts of Hein's data were done by Yvonne Heynsdijk, whose help is hereby gratefully acknowledged. The category Det includes determiners and demonstratives, but not quantifiers.
is the default form. We understand 'default' here in a very specific sense to mean that a marked form (i.e. third person singular) is licit without there being a specified nominal subject that licenses its features. This status can be more directly observed in the use of default forms in impersonal constructions, such as impersonal passives (e.g. Er wordt gelachen 'there is laughed' = 'People are laughing'), which lack any kind of nominal subject. Note that being default does not mean being exempt from agreement requirements. Agreement requires non-conflict of features. Therefore, even though it is a default form, the third person singular verb form may not take a subject that conflicts with its features (e.g. a first person singular subject). The notion of default just means that the form is licit even if there are no features that it corresponds to in the specifier. Let us now turn to English, and consider the same partitioning of subject types. In Table 14 we have collapsed the various types of specified DPs (pronouns, DPs with overt determiner, and plurals), and we contrast them with bare-N DPs (0-dets).

Table 14
Distribution of overt subjects for English children (Nina and Adam)

                    Non-finite V          Finite V
                    n        %            n        %
Specified DP        406      32%          877      68%
0-det               96       92%          8        8%
The most noteworthy observation in this table is that 0-det DPs are virtually excluded with the finite -s form. This difference with Dutch is immediately explained once we realize that the -s form in English is not a default form, unlike the Dutch -t form. As a marked and non-default form, it is licit only if its feature specification is licensed by agreeing features in the specifier. Since 0-det DPs do not have these features specified, they cannot license the -s form. The second way English differs from Dutch concerns the occurrence of fully specified DPs with the non-finite (bare) form (32%) (e.g. the doggie bark). This kind of subject is excluded with Dutch infinitives since infinitives are marked as non-agreeing forms, as noted above. This is not the case for the English bare form, however. The English bare form is not negatively marked for agreement, but just unmarked. The difference between the bare form and a real infinitive is crucial and is responsible for a range of differences in the two languages. Since Spec-head agreement is formulated as a non-conflict condition, the occurrence of a specified DP in the specifier of an unmarked form does not trigger a Spec-head agreement violation. Hence, we find in early English the examples in (14a), while we do not find the Dutch examples in (14b).

(14) a. The doggie/he bark.
     b. *Het hondje/hij hier zitten.
        the doggie/he here sit-INF
Why then do adults not produce sentences such as (14a)? The difference with respect to agreement between the child's grammar and the adult grammar is that in the adult grammar the more specified form must be selected. This is a different requirement from Spec-head agreement, which merely requires non-conflict of features, and which is adhered to in both the child and adult grammar. The result of this discussion is that functionally, the English bare form is ambiguous between an infinitive, i.e. an unanchored structure in our terms, and a finite form, i.e. an anchored structure,¹³ in which an unmarked form is selected rather than the more specific marked form. This ambiguity of the bare form resolves many of the problems raised by Wexler's conjecture. On the one hand, the English bare form, because of its unmarked status, occurs with the full range of subject types. This accounts for the higher percentage of lexical subjects occurring with bare forms than with real infinitives (cf. Table 11). On the other hand, the dual function of the bare form accounts for its overall higher frequency as compared to RIs (cf. Table 4). The fact that the form that functions as an infinitive is a bare form in English, while it is a marked form in the other languages that we discuss, means that while V raises to Infl in the latter languages, Infl does not attract the bare form in English. This difference accounts directly for the asymmetry that we observed in Table 12 with respect to infinitival WH-questions. We follow Rizzi (1990) in assuming that in WH-questions a WH-feature raises from Infl to C. In English, this movement may take place as pure feature movement, whereas in the languages with a real infinitive, feature movement necessarily affects the entire category, i.e. the infinitival verb form (cf. Chomsky, 1995). As the infinitive is marked as a non-agreeing form, this results in ungrammaticality.
We refer to Hoekstra and Hyams (1997) for further discussion of WH-questions in English vs. real infinitive languages. In this section we have established that a true infinitive is different from the bare form in English. This difference allows us to understand (a) the quantitative differences in frequency between RIs built on real infinitives and the bare verb form construction, (b) the distribution of subject types, and (c) its occurrence in WH-questions. In the next section we will address the question of whether this difference between real infinitives and the English bare form is also responsible for the differences we find with respect to the Eventivity Constraint and the modality of the constructions, discussed in Section 3. Note that by discarding the Null Modal Hypothesis we are left with the question of where the observed modality of RIs stems from, and how this relates to the Eventivity Constraint. It is to these questions that we now turn.

¹³ The question of what it means for a form to be finite or non-finite is not so easily answered. We can isolate, on syntactic grounds, positions where infinitives are required (e.g. after to, as in to be happy), as well as positions where finite forms are required (e.g. in that-complements, as in that he is happy), where different forms of the verb occur, if there are such different forms available. Given the poverty of English inflection, this is most often not the case, as in to go there and that you go there. This situation can be approached in two ways: one is to enter the form go twice (or more often), with different feature matrices; the other is to have one unmarked form, chosen as the elsewhere form if the context does not require a more specific form. It is the latter approach that we opt for. Hence, the bare form is marked neither as an infinitive nor as a finite form, but is compatible with syntactic positions of either sort.
5. Modal reference and the Eventivity Constraint

5.1. The modality of infinitives

Although an infinitive is not inflected for person, number or tense, it is not a stem form either. Hence, the analysis of the infinitive should not only be negative, that is, in terms of absence of tense and agreement, but should also address the relevance of the infinitival morpheme itself. The absence of an infinitival affix in the English bare form has consequences for its interpretation. Giorgi and Pianesi (1997) note that the English bare form denotes not only the processual part of an event, but includes the completion of that event. In this respect it differs from real infinitives, as the contrasts in (15) show.¹⁴

(15) a. *I see John cross the street.
     b. I saw John cross the street.
     c. Ik zie/zag Jan de straat oversteken.
        I see/saw John the street cross-INF
        'I see/saw John cross the street.'

Giorgi and Pianesi argue that English verbs have the feature [+perfective]. Hence, the bare form complement denotes the entire event, including its completion. This is incompatible with the present tense of see, but it is compatible with the past tense saw, whence the contrast in (15a,b). The Dutch infinitive, in contrast, is not inherently perfective, but may refer to the processual part of an event. In this respect, it is like the English -ing form, which may occur in the complement of present tense perception verbs as well (cf. I see John crossing the street). The other Germanic languages, as well as the Romance languages, all of which have genuine infinitives, work like Dutch in this respect. As a non-finite form, the infinitive contrasts with the participle in an aspectual sense. Whereas a participle refers to the completion of an eventuality, the infinitive denotes that the event is not yet realized. This aspectual value of the infinitive makes it understandable that in the Romance languages, the future tense is built on the infinitive, cf.
French j'arriver-ai 'I shall arrive', built on arriver 'arrive-INF'.

¹⁴ Giorgi and Pianesi note that the reason for the ungrammaticality of (15a) is basically identical to the reason why event-denoting verbs cannot occur in the simple present in English. As is well known, this restriction only applies insofar as such sentences denote ongoing events. So, under any kind of quantification, the simple present with event-denoting verbs is fine, as in (i).
(i) a. John often visits his parents.
    b. When John visits his parents, he ...
Neil Smith (p.c.) observes that the bare verb complementation in (15a) is equally fine under quantificational conditions, as in (ii):
(ii) a. I can see John cross the street.
     b. Whenever I look out of the window these days, I see John cross the street.
That the completion of the observed event is nevertheless included can be seen in the oddness of (iii). Its Dutch translation with an infinitive does not have this oddness.
(iii) ??I saw John cross the street when he was hit by a car.
We want to argue that it is this aspectual value of [-realized] that is the basis for the modal interpretation. Children's RI-utterances contrast with finite utterances precisely in this respect: while finite utterances describe actual states of affairs, RIs do not refer to actual eventualities, but to eventualities that are not realized, and are therefore interpreted as statements of desire with respect to these eventualities. Importantly, children's RIs are very similar to RIs in adult language in this respect. Adult RIs have a much more restricted use, but to the extent that they occur, they have a similar [-realized] aspectual value, with an imperative or counterfactual meaning. Consider the following two categories of adult RIs (cf. Wijnen, 1996).¹⁵

(16) jussives
     Hier geen fietsen plaatsen!
     here no bicycles place-INF
     'Don't put bicycles here!'

(17) Mad Magazine sentences
     Jan met mijn zus trouwen?! Dat nooit.
     John my sister marry-INF?! That never.

Jussives are closest to the kinds of RIs used by children. Like most of the children's RIs, they involve deontic modality. The category of Mad Magazine sentences likewise denotes non-realized eventualities. The possibility of the eventuality is mentioned, which is then commented on in the next statement. So we maintain that the modal interpretation of children's RIs is determined by the inherent quality of infinitives as being marked [-realized]. And this is a feature of adult RIs as well. It is important that the modality is indeed present in the structure of RIs itself. In this respect, our analysis contrasts with a recent analysis of the modality of German RIs provided by Ingram and Thompson (1996). While Ingram and Thompson also reject the Null Modal Hypothesis, they present an explanation of the modality of RIs in terms of a model which we may call 'Acquisition by Association'. According to this model, children produce RIs because they form a semantic association between the infinitive and modal meanings.
This association is based on the input they receive: sentences in which the infinitive occurs in the context of an overt modal. For example, the German child will hear connected discourse as in (18) or complex sentences such as (19), in which the infinitive occurs with the modal wollen 'want'.
¹⁵ Wijnen also mentions as an instance of an adult RI an infinitival response, as in (i):
(i) a. Wat ben je aan het doen?
       what are you at the do-INF = 'What are you doing?'
    b. Plaatjes draaien.
       records play-INF = 'Playing records'
However, (ib) is not an RI-utterance, but a kind of nominal infinitive. Such an infinitival response is impossible with full object DPs, or other material typical of full clauses, as shown in (ic).
    c. *Mijn nieuwe plaatje draaien.
       my new record play-INF = 'Playing my new record'
(18) Was will der? Mit dem Auto fahren?
     what wants he? with the car drive-INF?

(19) Ich glaube der will mit dem Auto fahren, wohl?
     I think he wants with the car drive, right?

As a result of such input the child comes to 'associate' modality with the infinitival form of the verb. Ingram and Thompson propose that in language acquisition 'what [the children] hear is what you get'. They claim that "children will by the nature of their conservative learning strategies show surface forms that look like the input language to which they are exposed" (Ingram and Thompson, 1996: 101). In one sense, Ingram and Thompson's claims are trivially true; a child exposed to English acquires English, not Swahili. In another sense, however, the claim is quite simply false, since children produce all sorts of things that they do not get in the input, for example, overgeneralized forms such as goed instead of went and mouses instead of mice, or sentences with missing functional elements, such as Teddy sleeping, Where dragon? So, if it is true that children are conservative and stick to what they hear, why do they so often go astray, producing forms distinct from adult forms, but which nevertheless have a systematicity of their own? We agree with Ingram and Thompson that children's RIs have a modal meaning. The difference between their view and ours is that they assume that the modal meaning comes about through the child's associating infinitives with modals in the input, while our position is that the modal meaning is based on an element that is part of the grammatical representation of the RI, viz. the infinitival morpheme. The fact that RIs also occur - be it in a much more limited way - in adult grammar with a modal meaning, supports our view. Ingram and Thompson's association model begs the question of why the infinitives in the adult language also have a modal value (cf. examples (16) and (17)).
Moreover, the claim that the modal value is assigned to infinitives on the basis of input with overt modals predicts that English-speaking children's bare forms should also have a modal value, since they too occur in the input with modals (viz. John can/may/will go). As we noted earlier, the bare forms produced by English-speaking children typically do not have modal value (cf. Table 9). Our analysis explains this difference between English and languages with true infinitives.¹⁶
¹⁶ Ingram and Thompson also argue that children are conservative in their acquisition of inflectional morphology, representing individual inflected verb forms holistically rather than as morphologically complex forms. Thus, for such a child kommt 'comes' is not grammatically related to geht 'goes' or any other third person singular verb, since the child does not analyze the -t suffix. In a similar fashion, it must be that the child does not relate gehen 'go-INF', kommen 'come-INF' or any other infinitival forms, since infinitives are also morphologically unanalyzed. Ingram and Thompson's child does not have any verbal categories, in fact. There is no category of 'infinitive' and hence no way of capturing any generalization over this category, including the one that forms the basis of their paper, that infinitives have a modal value. They also have no way of describing the various syntactic generalizations distinguishing finite and non-finite forms discussed in the text and elsewhere.
5.2. The eventivity constraint

Now that we have established the source of the modality of RIs, we are in a position to address the question of the source of the Eventivity Constraint. Let us first look somewhat more closely at a particular aspect of modality. As is well known, modal verbs are crosslinguistically ambiguous between epistemic and deontic readings. This ambiguity is triggered not by a lexical ambiguity in the modal itself, but rather is determined by the nature of the complements with which it combines. Let us consider the modal must, which denotes necessity.¹⁷ When combined with a stative predicate, we normally obtain the epistemic reading: the truth of the state denoted by the complement is evidentially necessary, e.g. in view of the available evidence. Alternatively, the state denoted by the complement can be said to be necessary in order to comply with some condition. Consider the following examples:

(20) a. John must be British.
     b. If Bill wants to qualify for this job, he must know French.

(20a), a pure case of epistemic modality, states that, based on some kind of evidence, it is necessarily true that John is British. In (20b), where necessity comes out as a requirement, Bill's knowing French is presented as a necessity for him to qualify for a job. When combined with an event-denoting complement, on the other hand, must doesn't denote the necessary truth of the event, but rather the necessity of the event taking place, i.e. deontic modality. Since the event itself cannot be evaluated as to its truth, must is prospective. This is illustrated in the examples in (21), where (21a) asserts that at the moment of speech there is some obligation for some future event to take place, and (21b) similarly asserts that there is a future event of tearing down the house that must happen.

(21) a. John must read this book.
     b.
The house must be torn down.

We see, then, that deontic modality arises in combination with event-denoting predicates, while epistemic modality is typically found with state-denoting predicates. More or less the same can be observed in Dutch. Consider the following examples:

(22)                                              deontic    epistemic
     a. Jan moet/kan het antwoord weten.             ?           +
        John must/can the answer know
     b. Jan moet/kan dit boek lezen.                 +           ?
        John must/can this book read

¹⁷ We are obviously not interested in alethic truth.
The epistemic reading is most easily available in (22a). It is possible to get a deontic reading, but that requires an eventive, i.e. inchoative, interpretation for know, viz. as 'come to know'. Conversely, with an event-denoting complement, as in (22b), the deontic reading is the most easily accessible reading. The epistemic reading can also obtain, but requires an imperfective reading of the infinitive, something which is possible in Dutch, but not in English, since the English bare form is inherently perfective, as we discussed above. So, the epistemic reading of (22b) is basically identical to the epistemic reading of the English sentence John must be reading this book, with the progressive, while the deontic reading is equal to (21a). The relationship between stativity and epistemic readings can also be brought out by considering unambiguously epistemic predicates, such as seem and believe. As is well known, the infinitival complements of these predicates are required to be stative, as shown in (23) and (24).

(23) a. John seems to know French.
     b. John seems to be reading this book.
     c. John seems to have read this book.
     d. *John seems to read this book.
     e. John seems to dance.
(24) a. Jan schijnt het antwoord te weten.
     b. Jan schijnt dit boek gelezen te hebben.
     c. Jan schijnt dit boek te lezen.

Again, while the English example in (23d) is ungrammatical, since it only allows a perfective event interpretation, the Dutch (24c) is grammatical because the infinitive can be construed as continuous, without perfectivity, and hence gives rise to a reading similar to (23b). (23e), with the event-denoting verb dance, is grammatical, but only has the reading of 'John being a dancer', hence a stative (property) reading, not that of 'he seems to be dancing'. We therefore conclude that epistemic modality requires states, while deontic modality requires events. To the extent that RIs occur with stative predicates, most notably hebben 'have', the deontic modal imposes an inchoative interpretation on them, so that Thorsten Ball haben ('Thorsten ball have') means 'Thorsten must get the ball'. Barbiers (1995: Ch. 5) provides an explanation for these correlations. He argues that modals are not inherently ambiguous between epistemic and deontic meanings, but that the difference between these two readings comes about as a function of different scales upon which the modal operates. In the case of epistemic modality, this scale involves the truth value of the proposition that the modal modifies, ranging from 0 (false) to 1 (true), with moeten/must taking the value 1, and kunnen/may taking some intermediate value on the scale of 0 to 1. Deontic modality, on the other hand, involves what he calls a 'polarity transition', the scale referring to the necessity or probability of the transition taking place. The expression moeten/must P, where P is the proposition modified by the modal, presupposes that P is not the case, and states that the transition from not-P to P is required. With kunnen/can modality,
the transition is said to possibly take place. Hence, John must have a gun presupposes that 'John has a gun' is not the case, and requires a transition so that 'John has a gun' is the case. Given this transition, P cannot be a stative predication, but must be a predication of an eventive nature. For our purposes here, we can summarize these observations as in (25).

(25) Epistemic modality is found in combination with stative predicates.
     Deontic modality is found in combination with event-denoting predicates.

Let us assume that (25) is correct. The next important result, which has been widely established in the literature, is that children under three years of age do not seem to have an epistemic use of modality (Wells, 1979; Shepherd, 1981; Pea et al., 1982; Stephany, 1986). We remain agnostic as to the reason for this developmental delay of epistemic modality, which may be either of a purely linguistic nature, or determined by a delay in the child's conceptual development. However, this early restriction to deontic modality coupled with the result in (25) provides an immediate account of the EC. As deontic modality is the only modality that children have at their disposal, the fact that only event-denoting complements are found in modal environments is a consequence of this restriction.¹⁸ A final point about the modality of RIs concerns the observation that they seem to express boulemaic (desiderative) modality, i.e. that Thorsten Ball haben would mean 'Thorsten wants to have the ball' rather than 'Thorsten must have the ball'. These readings are obviously hard to distinguish. As argued in Hoekstra (1994), deontic modality, which involves 'obligation' or 'permission', has two parameters: the source of the obligation/permission and the target of the obligation/permission. Consider the following examples.

(26) a. Jan moet meespelen (van zijn vader).
        John must with-play (of his father)
        'John is required (by his father) to play with the team'
        or
        'It is required (by John's father) that John play with the team.'
     b. Er moet een doelpunt gescoord worden.
        there must a goal scored become
        'It is required that a goal be scored.'

¹⁸ An alternative way to think about the eventivity constraint is in terms of denotata. Eventive predicates denote objects in the world, viz. events, at least when they have an ongoing activity interpretation. Statives, on the other hand, denote properties of their subjects. This idea is inspired by the distinction argued for in Kratzer (1989), according to which events, but not states, have an event argument. Following our idea that RIs receive their temporal reference through discourse context, that is, that T functions like a free pronoun, T can only refer to objects in the world, and hence not to properties. For this reason stative RIs are excluded. We initially took this line of reasoning (Hyams and Hoekstra, 1995; cf. also Wijnen, 1996; Avrutin, 1996), but rejected it later on. One problem for this proposal is that the overwhelming majority of RIs in Dutch, German, etc. do not refer to ongoing events, but rather have a modal interpretation. This is unexplained on a denotational account.
Consider first the target of obligation. (26a) is ambiguous in this respect. The obligation to play with the team may be directly on John, viz. John's father orders John to play with the team. Alternatively, the target of obligation might be someone other than John, viz. the coach of the team must put John on the team, or the other children must allow John to play with them, etc. The availability of this second reading is clear in (26b), where the obligation cannot be on the subject 'a goal'. As (26a) shows, in Dutch the source of the obligation may be overtly specified by a van-phrase, here by van zijn vader. We would like to argue that in children's RIs, it is the child, i.e. the speaker, who is the source of the obligation. This makes a deontic modal very much like a boulemaic modal, which is equally prospective, and where the subject is also the source of the desire. Let us now return to English. Recall from Section 3 that English bare form utterances are qualitatively different from RIs in other languages in not having modal reference, and also in not being subject to the EC. We are now in a position to explain this result. Since the modality of RIs is connected to the infinitival morpheme, we do not in fact expect English bare forms to induce modal interpretations. And since the sensitivity to the EC is a direct consequence of the kind of modality inherent in RIs, neither do we expect English bare form utterances to be subject to the EC. The English bare form, though functionally an infinitive, is very different from a true morphological infinitive. This difference manifests itself most clearly in child language because there exists a stage in which infinitives can be used more freely (i.e. in root contexts). Yet, the difference is not limited to child language, but also shows up in the limited use of unanchored infinitives in the respective adult languages. Consider the jussive and Mad Magazine sentences discussed above.
In Dutch these are subject to the EC, just as children's RIs in Dutch are. This is shown in the examples in (27)-(28). On the other hand, remarkably, the only adult English type of bare form construction, the Mad Magazine sentence, does not seem to be subject to the EC. Thus (29) is fine.

(27) jussives
     *Morgen alle antwoorden weten!
     tomorrow all answers know-INF

(28) Mad Magazine sentences
     *Jan alle antwoorden weten?! Dat geloof ik niet.
     John all answers know-INF?! That believe I not.

(29) John know all the answers?! I don't believe it.
6. Concluding remarks By way of conclusion, let us add a few words concerning the difference between children and adults with respect to the use of RIs. As just noted, RIs are not ungrammatical in adult grammar, but simply have a much more limited use. We assume that this difference between adults and children does not reflect a grammatical difference
between the two populations, but rather a difference at the interface of grammar and discourse. Broadly put, children seem to have more options available to satisfy interface requirements. Functional categories stand squarely at the interface of grammar and discourse; C is the grammatical expression of the pragmatic force of a sentence, whether interrogative, emphatic, declarative, etc., D is responsible for the referentiality of nominal expressions, and I carries the finiteness which fixes the temporal reference of a sentence vis-à-vis speech time. These categories anchor the sentence into a discourse representation and there is, we believe, an inherent tension between the grammar and extragrammatical mechanisms with respect to this anchoring. In the adult system the grammar generally wins out, while in the child's system there is a greater reliance on discourse and presuppositional information. On our analysis, RIs are unanchored structures in which the eventuality is not fixed through the grammatical mechanism of syntactic binding of a variable by a syntactic operator. Rather, it is discursively interpreted in the manner of a free pronoun. Pronoun resolution depends on discourse and other contextual and presuppositional information. There is a tension between syntactic binding and pronoun resolution, along the lines discussed above. As Table 8 shows, the predominant reading for the English bare V-construction is a temporal here-and-now (or past) interpretation. In other words, the child's bare form construction, in as far as it instantiates an unanchored structure, has a reading which is indistinguishable from a properly anchored (i.e. finite) structure. When this situation arises in the adult system, that is, the situation in which a reading obtained through syntactic binding is indistinguishable from a reading obtained through free pronoun resolution, the grammatically determined interpretation takes precedence (cf.
Reinhart, 1983, and later work for this perspective in the case of pronouns). Thus, RIs are normally blocked in the adult system except in particular registers (e.g. Mad Magazine sentences). In the child's language, in contrast, both the grammatical and discourse-related mechanisms are available in the interpretation of functional material. This lends some substance to the intuition often expressed that children's language is more heavily discourse dependent than the adults'. In Dutch and other languages, where RIs are inherently modal, the competition does not come from the alternative in which the infinitive is replaced by a finite verb, but rather from a structure with an overt modal (eg. Papa moet de toren bouwen 'Papa must build the tower'), the latter representing the grammatical solution. In such languages, the unanchored RI does not receive the here-and-now interpretation of the English bare V because the infinitival morpheme imposes a modal reading. This difference between English and the RIs languages with respect to the locus of the competition accounts for the often noted fact that RIs in child language decline in tandem with the rise of modal verbs (cf. a.o. Jordens, 1991, Wijnen, 1994). Our expectation for English, yet to be tested, is that the bare V-forms will decline relative to deictic finite forms, while the proportion of modal sentences will remain roughly constant.19
19 We would include here not only sentences with true modals such as can, must, should, etc., but also expressions of modality such as hafta, wanna, gonna, oughta, and so on.
References

Antinucci, F. and R. Miller, 1976. How children talk about what happened. Journal of Child Language 3, 167-189.
Avrutin, S., 1996. Events as units of discourse representation in root infinitives. MIT Occasional Papers in Linguistics 12, 65-91.
Barbiers, S., 1995. The syntax of interpretation. Ph.D. dissertation, HIL/Leiden University.
Behrens, H., 1993. Temporal reference in German child language. Ph.D. dissertation, University of Amsterdam.
Besten, H. den, 1983. On the interaction of root transformations and lexical deletive rules. In: W. Abraham (ed.), On the formal syntax of the Westgermania. Amsterdam: Benjamins.
Bloom, L., K. Lifter and J. Hafitz, 1980. Semantics of verbs and the development of verb inflection in child language. Language 56, 386-412.
Boser, K., B. Lust, L. Santelmann and J. Whitman, 1992. The syntax of CP and V2 in early child German. NELS 23, 51-65.
Bronckart, J.P. and H. Sinclair, 1973. Time, tense and aspect. Cognition 2, 107-130.
Brown, R., 1973. A first language. Cambridge, MA: Harvard University Press.
Chomsky, N., 1995. The minimalist program. Cambridge, MA: MIT Press.
Cipriani, P., A.M. Chilosi, P. Bottari and L. Pfanner, 1991. L'acquisizione della morfosintassi: Fasi e processi. Padova: Unipress.
Clahsen, H., S. Eisenbeiss and M. Penke, 1994. Underspecification and lexical learning in early child grammars. Essex Research Reports in Linguistics 4.
Clahsen, H. and M. Penke, 1992. The acquisition of agreement morphology and its syntactic consequences. In: J. Meisel (ed.), The acquisition of verb placement: Functional categories and V2 phenomena in language acquisition, 181-224. Dordrecht: Kluwer.
Crisma, P., 1992. On the acquisition of Wh-questions in French. In: GenGenP, 115-122. Geneva: University of Geneva.
Emonds, J., 1978. The verbal complex V'-V. Linguistic Inquiry 9, 151-175.
Enç, M., 1987. Anchoring conditions for tense. Linguistic Inquiry 18, 633-657.
Ferdinand, A., 1996. The acquisition of the subject in French. Ph.D. dissertation, HIL/Leiden University.
Gelderen, V. and I. van der Meulen, 1998. Root infinitives in Russian: Evidence from acquisition. Term paper, Leiden University.
Ginneken, Jac. van, 1917. De roman van een kleuter. 's-Hertogenbosch: Malmberg.
Giorgi, A. and F. Pianesi, 1997. Tense and aspect: From semantics to morphosyntax. New York: Oxford University Press.
Grinstead, J., 1994. Consequences of the maturation of number morphology in Spanish and Catalan. M.A. thesis, UCLA.
Guasti, M.-T., 1994. Verb syntax in Italian child grammar: Finite and non-finite verbs. Language Acquisition 3(1), 1-40.
Guasti, M.-T. and L. Rizzi, 1996. Null AUX and the acquisition of residual V2. BUCLD 20 Proceedings, 284-295.
Guéron, J. and T. Hoekstra, 1989. T-chains and the constituent structure of auxiliaries. In: A. Cardinaletti, G. Cinque and G. Giusti (eds.), Constituent structure: Papers from the Venice GLOW, 35-99. Dordrecht: Foris.
Haan, G. de, 1986. A theory-bound approach to the acquisition of verb placement in Dutch. In: G. de Haan and W. Zonneveld (eds.), Formal parameters of generative grammar 3, 15-30. University of Utrecht.
Haan, G. de and K. Tuijman, 1988. Missing subjects and objects in child grammar. In: P. Jordens and J. Lalleman (eds.), Language development, 101-122. Dordrecht: Foris.
Haegeman, L., 1994. Root infinitives, tense and truncated structures. Language Acquisition 4(3), 205-255.
Haegeman, L., 1995. Root null subjects and root infinitives in early Dutch. In: C. Koster and F. Wijnen (eds.), Proceedings of the GALA 1995, 239-250. Groningen: Center for Language and Cognition.
Harris, T. and K. Wexler, 1996. The optional infinitive stage in child English: Evidence from negation. In: H. Clahsen (ed.), Generative approaches to first and second language acquisition, 1-42. Amsterdam: Benjamins.
Hoekstra, T., 1994. HAVE as BE plus or minus. In: G. Cinque et al. (eds.), Festschrift for Richard Kayne, 199-215. Washington, DC: Georgetown University Press.
Hoekstra, T. and N. Hyams, 1995. The syntax and interpretation of dropped categories in child language: A unified account. Proceedings of WCCFL 14, CSLI, Stanford University.
Hoekstra, T. and N. Hyams, 1996. Missing heads in child language. In: C. Koster and F. Wijnen (eds.), Proceedings of the GALA 1995, 251-260. Groningen: Center for Language and Cognition.
Hoekstra, T. and N. Hyams, 1997. Agreement and the finiteness of V2: Evidence from child language. To appear in the Proceedings of BUCLD 21.
Hoekstra, T., N. Hyams and M. Becker, 1996. The underspecification of Number and the licensing of root infinitives. BUCLD 20 Proceedings, 293-306.
Hoekstra, T. and P. Jordens, 1994. From adjunct to head. In: T. Hoekstra and B. Schwartz (eds.), Language acquisition studies in generative grammar, 119-149. Amsterdam: Benjamins.
Holmberg, A., 1986. Word order and syntactic features in the Scandinavian languages and English. Ph.D. dissertation, University of Stockholm.
Hyams, N., 1996. The underspecification of functional categories in early grammar. In: H. Clahsen (ed.), Generative perspectives on language acquisition, 91-128. Amsterdam: Benjamins.
Hyams, N. and T. Hoekstra, 1995. Nominal and verbal finiteness and the specification of number in early grammars. Talk presented at HIL/Leiden University and MIT.
Ingram, D. and W. Thompson, 1996. Early syntactic acquisition in German: Evidence for the modal hypothesis. Language 72(1), 97-120.
Jordens, P., 1991. The acquisition of verb placement in Dutch and German. Linguistics 28, 1407-1448.
Kayne, R., 1989. Notes on English agreement. Manuscript, CUNY.
Krämer, I., 1993. The licensing of subjects in early child language. MITWPL 19, 197-212.
Kratzer, A., 1989. Stage-level and individual-level predicates. Manuscript, UMass.
Kursawe, C., 1994. Fragesätze in der deutschen Kindersprache. M.A. thesis, University of Düsseldorf.
Lebeaux, D., 1988. Language acquisition and the form of the grammar. Ph.D. dissertation, UMass.
Lightbown, P., 1977. Consistency and variation in the acquisition of French. Ph.D. dissertation, Columbia University.
MacWhinney, B. and C. Snow, 1985. The child language data exchange system. Journal of Child Language 12, 271-296.
Meisel, J., 1990. Inflection: Subjects and subject-verb agreement. In: J. Meisel (ed.), Two first languages: Early grammatical development in bilingual children, 237-300. Dordrecht: Foris.
Pea, R.D., R.W. Mawby and S.J. MacKain, 1982. World-making and world-revealing: Semantics and pragmatics of modal auxiliary verbs during the third year of life. Paper presented at the 7th BU Conference on Child Language Development.
Phillips, C., 1995. Syntax at age 2: Crosslinguistic differences. MITWPL 26, 325-382.
Pierce, A., 1992. Language acquisition and syntactic theory: A comparative analysis of French and English child grammars. Dordrecht: Kluwer.
Pizzuto, E. and M.-C. Caselli, 1992. The acquisition of Italian morphology: Implications for models of language development. Journal of Child Language 19(3), 491-558.
Platzack, C., 1992. Functional categories and early Swedish. In: J. Meisel (ed.), The acquisition of verb placement: Functional categories and V2 phenomena in language acquisition, 63-82. Dordrecht: Kluwer.
Plunkett, K. and S. Strömqvist, 1990. The acquisition of Scandinavian languages. Gothenburg Papers in Theoretical Linguistics 59.
Poeppel, D. and K. Wexler, 1993. The full competence hypothesis of clause structure in early German. Language 69, 1-33.
Pollock, J.-Y., 1989. Verb movement, Universal Grammar and the structure of IP. Linguistic Inquiry 20, 365-424.
Radford, A., 1990. Syntactic theory and the acquisition of English syntax. Oxford: Blackwell.
Reinhart, T., 1983. Anaphora and semantic representation. Chicago, IL: The University of Chicago Press.
Rizzi, L., 1990. Relativized minimality. Cambridge, MA: MIT Press.
Rizzi, L., 1994. Some notes on linguistic theory and language development: The case of root infinitives. Language Acquisition 3, 371-393.
Roeper, T. and B. Rohrbacher, 1994. True pro-drop in child English and the principle of economy of projection. Manuscript, University of Massachusetts at Amherst.
Sano, T., 1995. Roots in language acquisition: A comparative study of Japanese and European languages. Ph.D. dissertation, UCLA.
Sano, T. and N. Hyams, 1994. Agreement, finiteness, and the development of null arguments. Proceedings of NELS 24, 543-558.
Santelmann, L., 1995. The acquisition of verb second grammar in child Swedish. Ph.D. dissertation, Cornell University.
Schaeffer, J., 1990. The syntax of the subject in child language: Italian compared to Dutch. M.A. thesis, University of Utrecht.
Serra, M. and R. Solé, 1992. Language acquisition in Spanish and Catalan children: A longitudinal study. University of Barcelona.
Shepherd, S.C., 1981. Modals in Antiguan Creole, child language acquisition, and history. Ph.D. dissertation, Stanford University.
Shirai, Y. and R. Andersen, 1995. The acquisition of tense-aspect morphology: A prototype account. Language 71(4), 743-762.
Stephany, U., 1986. Modality. In: P. Fletcher and M. Garman (eds.), Language acquisition, 375-400. Cambridge: Cambridge University Press.
Torrens, V., 1992. The acquisition of Catalan and Spanish. Talk given at the psycholinguistics lab, UCLA.
Travis, L., 1984. Parameters and effects of word order variation. Ph.D. dissertation, MIT.
Ud Deen, K., 1997. The interpretation of root infinitives in English: Is eventivity a factor? Term paper, UCLA.
Verrips, M. and J. Weissenborn, 1992. Routes to verb placement in early German and French: The independence of finiteness and agreement. In: J. Meisel (ed.), The acquisition of verb placement: Functional categories and V2 phenomena in language acquisition, 283-331. Dordrecht: Kluwer.
Wagner, K., 1985. How much do children say in a day? Journal of Child Language 12, 475-487.
Weissenborn, J., 1991. Functional categories and verb movement: The acquisition of German syntax reconsidered. In: M. Rothweiler (ed.), Spracherwerb und Grammatik: Linguistische Untersuchungen zum Erwerb von Syntax und Morphologie. Linguistische Berichte, Sonderheft 3, 190-224.
Wells, G., 1979. Learning and the use of the auxiliary verb. In: V. Lee (ed.), Language development, 250-273. London: Croom Helm.
Weverink, M., 1989. The subject in relation to inflection in child language. M.A. thesis, University of Utrecht.
Wexler, K., 1994. Optional infinitives, verb movement and the economy of derivation in child grammar. In: D. Lightfoot and N. Hornstein (eds.), Verb movement. Cambridge: Cambridge University Press.
Wijnen, F., 1994. Incremental acquisition of phrase structure: A longitudinal analysis of verb placement in child Dutch. Manuscript, Groningen University.
Wijnen, F., 1996. Temporal reference and eventivity in root infinitives. MIT Occasional Papers in Linguistics 12, 1-25.
Zwart, J.-W., 1994. Dutch syntax: A minimalist approach. Ph.D. dissertation, Groningen University.
Lingua 106 (1998) 113-131
Genitive subjects in child English

Andrew Radford*
University of Essex, Essex CO4 3SQ, UK
Abstract

This chapter provides an analysis of children's so-called 'genitive subjects' (like my in My want one) within the framework of Principles and Parameters Theory. Child clauses with genitive subjects have been argued to have a very different syntactic structure from their adult counterparts, viz. to be nominal rather than clausal, or VPs rather than IPs, or projections of an underspecified (rather than a fully specified) INFL. I argue that the distribution of children's genitive subjects shows conclusively that the structures containing them are clauses rather than nominals. I go on to challenge the traditional analysis of her/my/our/its subjects as genitive pronouns, arguing instead that her subjects are objective, my/its subjects function as strong nominative pronouns for the children who use them, and that our subjects result from a lexical gap in the child's pronoun paradigm. I conclude that there is no evidence that English children go through a genitive subjects stage, and hence no evidence that the grammars developed by two- and three-year-old children are radically different from their adult counterparts.

Keywords: Case; Subjects; Genitive
1. Introduction

In numerous acquisition studies over the past two decades, there have been reports of young children (aged 1-3 years) producing structures like those in (1) below with (italicized) genitive subjects:

(1) a. My put it up 'side my bus. Her is jolly strong, isn't she? (Douglas 3;2, from Huxley, 1970)
    b. Her fell off ... Her didn't ... Her did it again (Polly 2;3, from Chiat, 1981)
* I am grateful to the Humanities Research Board of the British Academy for a grant which enabled me to undertake the research embodied in this paper, and to Martin Atkinson, Harald Clahsen, Roger Hawkins, Tom Roeper, Carson Schütze and two anonymous referees for helpful comments on an earlier draft of this paper.
* Phone: +44 1206 872215/+44 1255 435878; Fax: +44 1206 872085; E-mail:
[email protected]

0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved
PII: S0024-3841(98)00031-X
A. Radford / Lingua 106 (1998) 113-131
    c. Why did her have a runny tummy? Look, our found that other bit (Sophie 3;0, from Fletcher, 1985)
    d. Look what my got. Her crying now (Nina 2;3, from Vainikka, 1994)
    e. My can't get out of here (Child 9, from Rispoli, 1995)

(For familiarity, I describe child pronouns like my as genitive since they function as such in adult English, setting aside for the time being the question of whether they have the same genitive status in child English.)

Rispoli (1995) has shown that children's case errors are not random: although oblique (objective/genitive) pronouns are frequently extended to use as subjects, by contrast nominative pronouns are rarely extended to non-subject use. For example, in Rispoli's study, me was used as a subject 798 times and my 191 times: by contrast, I was used as a subject 11,791 times, as a possessor 3 times, but never as an object.

In this paper, my specific focus is on children's use of apparent genitive subjects. The theoretical framework I adopt is that of Principles and Parameters Theory (PPT). Within acquisition studies in a PPT framework, children's use of genitive subjects has been argued to provide crucial evidence about the nature of early child grammars. For example, Vainikka (1994) argues that genitive subjects provide us with evidence in support of a structure-building model of acquisition in which children gradually build up more and more complex clause structures, first projecting VPs, then forming an extended projection of VP into IP, and then finally forming a further extended projection of IP into CP.
By contrast, Schütze and Wexler (1996) argue that genitive subjects lend empirical support to a very different underspecification model of acquisition in which children's clauses have the same CP/IP/VP structure as their adult counterparts (so that there is continuity between adult and child structures), but in which functional heads are optionally underspecified with respect to grammatical features they carry in the corresponding adult grammars. A very different view (articulated in Pensalfini, 1995) is that children's genitive subjects suggest that their earliest clause structures may be nominal rather than verbal in nature. In this paper, I present a critique of all three analyses, ultimately arguing that children's so-called 'genitive' subjects may not be genitive at all. I begin by exploring the possibility that children's clauses with genitive subjects are nominals of some kind.

2. Nominal analyses

Pensalfini (1995) highlights potential parallels between child clauses with genitive subjects and adult gerund structures with genitive subjects such as that bracketed in (2) below:

(2) John's computer replies to e-mail without [his having to turn it on]

He maintains that adult genitive+gerund structures are NPs, with the subject being assigned genitive case by virtue of being in spec-NP (i.e. in the specifier position
within NP). He goes on to suggest that children's clauses with genitive subjects like those in (1) above are also NPs: he analyses them as nominalizations, though provides no details of their internal structure. However, there are a number of reasons to be sceptical of Pensalfini's claim that child clauses with genitive subjects are nominal(ization)s. As Pensalfini himself observes, since +ing is the suffix most commonly used to nominalize verbs in English (and indeed the only one which is productive), his analysis would lead us to predict that genitive subjects would mainly occur with +ing verb forms. However, if we look at the distribution of my subjects in the utterances produced by Nina between 1;11 and 3;0, we find 134 examples of my used as the subject of an uninflected verb (e.g. My have more), 5 examples of my used as the subject of a past tense verb (e.g. My saw that in a stocking), 3 examples of my used as the subject of got (e.g. My got that), and 11 examples of my used as the subject of an +ing form (e.g. My moving the legs). The fact that only 7% (11/153) of Nina's my subjects occur with +ing verb forms casts doubt on Pensalfini's analysis.1

What makes the nominalization analysis particularly problematic is the fact that a number of studies have reported children using genitives as subjects of verbs overtly inflected for (past) tense - as in the examples below:

(3) a. My taked it off. My cracked the eggs. My blew the candles out (Jeffrey 2;6, from Budwig, 1995)
    b. My finished (Katriona 2;4, from Huxley, 1970)
    c. And my broked it (Child 11, 2;8, from Rispoli, 1995)
    d. My caught it. My ate outside. My cried in the bed. See my made a poopoo (Nina 2;1/2;1/2;3/2;4, from Vainikka, 1994)
    e. My bumped it (Naomi 2;3, from Powers, 1996)
    f. My had a tape recorder (Peter 2;5, from Schütze and Wexler, 1996)
    g. My did get my leg dry (Betty 2;6, from the Bristol corpus)

Since tense is a verbal inflection (and verbs carrying past tense +d cannot be nominalized in English), it seems implausible to analyse structures like (3) as nominalizations.

A different nominal analysis of genitive clauses is proposed in Hamburger (1980), and modified in Powers (1996). Hamburger suggests that child clauses like My did it are noun phrases containing a determiner my which modifies a verb phrase. Recasting Hamburger's analysis within the DP framework, Powers (1996: 132) proposes that My did it has the structure (4) below:

(4) [DP [D my] [VP did it]]

1 Tom Roeper has suggested to me that the force of this conclusion is weakened by the fact that it overlooks the possibility that 'a child could be projecting a novel nominal that does not occur in the adult language' - a nominal headed by 'an invisible +ing nominalizer'. However, it is unclear what input data would lead the child to hypothesize an invisible nominalising morpheme for which there is no evidence in the adult English data which constitutes the child's speech input.
However, Powers' analysis is no less problematic than Pensalfini's. For one thing, it fails to account for the fact that genitives can occur as the subjects of auxiliaries (since auxiliaries are positioned within IP, not within VP), as in (5) below:

(5) a. Oh, my can't open it by myself (Child 3, 2;6, from Rispoli, 1995)
    b. My will do it again (Child 7, 2;4, from Rispoli, 1995)
    c. My would be lonely, won't I? (Douglas 3;2, from Huxley, 1970)
    d. My don't (Naomi 2;3, from Powers, 1996)
    e. No, my am coming up to play in there (Child 6, 2;5, from Rispoli, 1995)
The auxiliary status of items like can't and don't is underlined by the fact that they contain the clitic n't (which attaches only to auxiliaries). Structures like (5) are also problematic for Pensalfini's analysis, given that modals, do and finite forms of be cannot be nominalized in English. Further evidence against a nominal analysis of genitive clauses comes from the fact that genitive subjects also occur in interrogative structures which involve auxiliary inversion or wh-movement, as examples like those in (6) below illustrate:

(6) a. Can our do it again? (Sophie 3;0, from Fletcher, 1985)
    b. Should my make a airplane? (Child 9, 2;9, from Rispoli, 1995)
    c. Where my sit? (Sarah 3;0, from Vainikka, 1994)
    d. Look what my got (Nina 2;3, from Vainikka, 1994)
Since auxiliary inversion and wh-movement are found in clauses but not nominals, sentences such as (6) fatally undermine both Pensalfini's and Powers' analyses. In short, it seems highly unlikely that genitive-subject structures like those in (1), (3), (5) and (6) are nominal in nature, and far more likely that they are clausal.2 In the next section, I examine two recent clausal analyses of children's genitive subjects.
3. Clausal analyses

Vainikka (1994) notes that children around two years of age produce nonfinite clauses like those below with (italicized) genitive subjects:

(7) a. My get my car (Nina 1;11)

2 Tom Roeper has pointed out to me that this conclusion is potentially undermined by child structures such as 'That a my did it' reported by Hamburger. Note, however, that a DP analysis (of the type suggested by Powers) along the lines of (4) would fail to account for the co-occurrence of a with my. Perhaps (as Tom suggests) my is analysed by the child as an adjectival possessive (of the type which occurs in Romance languages). Joseph Galasso has pointed out to me that his son Nicholas (at around three years of age) frequently said 'It's my do it' in contexts where an adult would have said 'It's my turn'. One possibility is that my do it is an elliptical variant of 'my turn to do it': however, the fact that Nicholas did not generalize this structure to other verbs suggests the alternative possibility that Nicholas misanalysed do it as a noun.
    b. My see that (Adam 2;3)
    c. My pet him (Naomi 2;0)

She argues that structures like (7) are simple VPs headed by nonfinite verbs, with the subject occupying spec-VP (the specifier position within VP). In terms of the assumptions she makes, a sentence such as (7a) My get my car would be analysed as having the simplified structure (8) below:

(8) [VP My [V get] [NP my [N car]]]
with the possessor my in my car occupying spec-NP (the specifier position within NP), and the subject my in My get ... occupying spec-VP (the specifier position within VP). Vainikka argues that just as the possessor my in (8) carries genitive case by virtue of being the specifier of the noun car, so too the subject My carries genitive case by virtue of being the specifier of the verb get. In terms of the specific version of Case Theory which she adopts, genitive case is assigned to the specifier of a head noun or nonfinite verb.

A rather different analysis of structures like (7) is proposed by Schütze and Wexler (1996). They argue that there is continuity between adult and child grammars, and that clauses have the same IP structure in both, with subjects occupying spec-IP. They also argue that there is continuity between the case systems found in adult and child grammars, so that children 'know' that:

(9) An overt (pro)nominal subject is
    a. nominative if the subject of a [+agr] INFL
    b. genitive if the subject of a [-tns, -agr] INFL
    c. objective otherwise (= if the subject of a [+tns, -agr] INFL)
However, they claim that INFL may be underspecified in child grammars in respect of its tense and/or agreement features. Under their analysis, (7a) would be an IP which has the structure (10) below, with the subject my carrying genitive case by (9b):
(10) [IP My [I -tns, -agr] [VP [V get] my car]]

They argue that an analysis such as (10) maximizes continuity between adult and child grammars, since (they maintain) adult English gerund structures like my winning the race are clauses headed by a [-tns, -agr] INFL and also have genitive subjects.

Unfortunately, there are empirical shortcomings in both Vainikka's analysis and Schütze and Wexler's analysis. The two analyses have in common that they assume that genitive subjects are found only in nonfinite clauses. However, sentences such as (3), (5) and (6) above show that this is not the case at all; one example of each type is repeated in (11) below:

(11) a. My taked it off (Jeffrey 2;6, from Budwig, 1995)
    b. No, my am coming up to play in there (Child 6, 2;5, from Rispoli, 1995)
    c. Should my make a airplane? (Child 9, 2;9, from Rispoli, 1995)
Since my in examples like (3), (5), (6) and (11) is used as the subject of verbs and auxiliaries overtly inflected for tense/agreement, it clearly cannot be maintained that genitives only occur as subjects of nonfinite verbs.

Moreover, both analyses are problematic from a developmental perspective. Vainikka assumes that nonfinite verbs assign genitive case to their subjects: but this raises the (unanswered) question of how children come to acquire this type of case marking (since e.g. infinitives don't allow genitive subjects in adult English), and how they later come to delearn it. Equally problematic is Schütze and Wexler's claim that genitives occur as subjects of a [-tns, -agr] INFL in child and adult grammars alike. One developmental question which this raises is why Rispoli's (1995) data show that my subjects represent only 1.5% (191/12,780) of children's first person singular subjects: if child grammars license genitive subjects in root clauses, we should surely expect them to be used far more productively.

There are also theoretical questions posed by the two analyses. One such relates to the descriptive adequacy of the claim in (9b) that a [-tns, -agr] INFL licenses a genitive subject in adult English. (9b) would appear to be falsified by Mad Magazine sentences such as that produced by speaker B in the dialogue below:

(12) Speaker A: I heard that you got drunk at Nina's party last night
     Speaker B: Me/*my/*I get drunk at Nina's party?! Impossible - I was at home in bed with a good bottle of malt whisky

Here, INFL would seem to be [-tns], since the verb get is tenseless (it does not carry past tense even though the alleged incident took place in the past); and by hypothesis, INFL cannot be [+agr], since a [+agr] INFL requires a nominative subject by (9a). So, INFL is [-tns, -agr] in (12B), and yet has an objective subject - thereby falsifying (9b).
Note that subjects in adult Mad Magazine sentences are clearly in spec-IP rather than spec-VP, as can be seen from speaker B's reply in (13) below:

(13) Speaker A: How can we have a serious conversation when you won't take me seriously?
     Speaker B: Me not take you seriously?! You can't be serious!

If the subject me were in spec-VP, it would follow negative not (which is positioned between INFL and V): the fact that me precedes not suggests that me is in spec-IP. And yet, the subject me clearly does not carry genitive case, so further undermining (9b).

4. The problematic status of genitive subjects

So far, I have argued that children's structures with genitive subjects are clauses rather than nominals. Does this mean that children's clauses systematically license genitive subjects? There are a number of reasons for doubting this. One is that it is
far from clear that such structures are consistent with principles of UG (= Universal Grammar). The key point to note here is that children sometimes use genitives as the subjects of verbs which are clearly inflected for agreement, as the examples in (14) below illustrate:

(14) a. Now see my am (Child 4, from Rispoli, 1995)
    b. No, my am coming up to play in there (Child 6, 2;5, from Rispoli, 1995)
    c. My am mad (Child 9, 2;10, from Rispoli, 1995)

Now, while there are languages in which genitives can be used as subjects of finite verbs (e.g. Finnish, Icelandic and Russian), the verbs in the relevant constructions are typically impersonal (as Schütze (1997) notes). In other words, verbs don't agree in person and number with genitive subjects in finite clauses, but rather are in the default third person singular form, as the following Icelandic example (from Andrews, 1990: 171) illustrates:

(15) Sjúklinganna     var      vitjað
     the-patients(MGP) was(3S) visited(DEF)
     'The patients were visited'

(MGP = masculine genitive plural; 3S = third person singular; DEF = default.)

If UG determines that genitive subjects don't agree with finite verbs,3 and if child grammars are constrained by UG principles, doubts arise as to whether my subjects like those in (14) could be the result of child grammars systematically licensing genitive subjects in root clauses, since it would appear that my agrees with (a)m in such structures.

A second reason for questioning whether child clauses do systematically license genitive subjects comes from our earlier observation that genitive subjects are comparatively rare: Rispoli's (1995) study shows that my subjects represent only 1.5% (191/12,780) of children's first person singular subjects. If child grammars systematically license genitive subjects, we should expect them to be used far more frequently.

A third doubt about whether genitive subjects are licensed in children's clauses arises from the observation by Schütze (1997: 220) that children tend to use a very limited range of genitive subjects. Pensalfini's (1995) study shows that the only genitive subjects used by the four children he studied were my and her in the case of Eve, Nina and Naomi and my in the case of Peter, as shown in the table below:
3 An anonymous referee has pointed out to me that a potential problem with this claim is posed by ergative languages in which ergative case is identical to the genitive. It may be, therefore, that this claim has to be weakened so as to apply only to non-ergative languages such as English.
(16) Number of recorded examples of genitive subjects

Child    Age        My    Her    Others
Peter    2;0-2;8    39    0      0
Eve      1;6-2;3    13    5      0
Nina     2;1-2;5    12    114    0
Naomi    2;0-2;5    4     2      0
It would appear that children like Peter, Eve, Nina and Naomi simply don't produce other genitive-subject structures such as the following (where the equals sign means 'is intended to be synonymous with'):

(17) a. Our were hungry (= 'We were hungry')
    b. Your were sleeping (= 'You were sleeping')
    c. Their might frighten me (= 'They might frighten me')
    d. His couldn't see me (= 'He couldn't see me')
    e. Its was raining (= 'It was raining')
    f. Daddy's has gone to work (= 'Daddy has gone to work')
So, for children like these, production of genitive subjects is limited to her and my. This obviously raises the question of whether such forms really do represent a systematic syntactic error (with finite verbs and auxiliaries allowing genitive subjects), or whether they can be accounted for in some other way.4 In the next two sections, I show that there are other (more plausible) ways of analysing her and my subjects.

5. Her subjects

Schütze (1997: 78-9) suggests that her subjects in sentences like Her likes me are attributable to a gap in the child's lexicon, reasoning as follows: "At least some children go through a detectable stage at which some of the English pronoun forms are not produced at all; this is particularly common with respect to she ... The syntactic tree could be built up using feature bundles such as [pron, 3sg, fem, NOM] ... If there is no vocabulary entry with exactly this set of features, then the item with the greatest subset of these features will be inserted. Thus, a child who knows the word her, and knows that it is a feminine singular pronoun, could insert it in a tree, producing Her goes."

4 Vainikka (1994) suggests that many examples of child genitive subjects are mistranscribed as nominative+copula structures: in other words, she claims that what is transcribed as you're/they're/it's/he's may be a mistranscription of genitive your/their/its/his subjects (though I am sceptical about whether all three independent transcribers who transcribed the Brown corpus would have confused he's with his, given the clear differences in vowel quantity and quality between them). If this were so, children would be using a wider range of genitive subjects than is claimed here. However, the fact that genitive your/their/his/its subjects don't occur in contexts like (17) where such confusion is unlikely in principle casts doubt on this claim.
In other words, at a stage where the child's lexicon includes an entry for her but no entry for she, her would be used in contexts where adults require she. This would predict that children only use her as the subject of an agreement-inflected verb when they have no entry for she. However, doubt is cast on the generality of such an account by the fact that some children continue to use her subjects long after they have acquired she. For example, Pensalfini's (1995) study shows that Nina (from 2;1 to 2;5) used her 114 times as the subject of finite clauses, but also used she 12 times. Huxley's (1970) study suggests that Douglas went through a period of alternating between she and her subjects. Relevant examples are given in (18) below:

(18) a. She naughty girl (2;5). She doe(s)n't put them in (2;8). She is a big girl (3;2). She wasn't cooking (3;6)
     b. Her haven't got her glasses (2;9). He big so her able to ride on her big donkey (2;10). Her up in bed. Yes her can (2;11). Her bringed it (3;0). Her would just break it. Yes her sometimes locks it (3;4). But where is her coming? And her had yellow hair too (3;5). No her won't fit it right (3;5)

Indeed, we even find she and her subjects within the same sentence:

(19) Her is jolly strong, isn't she! (3;2) She kept hiding our balls and I needed to shoo her away but her didn't go (3;5).

For children like Nina and Douglas, it is clear that the use of her subjects can't be attributed to nonacquisition of she. A more plausible way of accounting for her subjects is to analyse them as objective rather than genitive forms: after all, adult English her serves both functions. Some evidence that her subjects are objective comes from the study by Pensalfini (1995) which shows that the three children in his study who produced her subjects (Eve, Nina and Naomi) also produced objective subjects; relevant figures are given in the table below:

(20) Number of recorded examples of objective subjects
    Child   Age       Me   Us   Him   Them
    Eve     1;6-2;3   13   1    2     2
    Nina    2;1-2;5   2    *    12    *
    Naomi   2;0-2;5   22   *    4     *

* There were no examples of we/us/our or they/them/their subjects in the relevant corpus.
Hence, it seems plausible (on distributional grounds) to conclude that (for such children) her subjects may be objective rather than genitive. Huxley's (1970) study of Douglas reveals a similar pattern: alongside her subjects, Douglas also uses overtly objective subjects like me/him/us/them, as the examples in (21) below illustrate (numbers in parentheses indicate the age at which the relevant recording was made):

(21) How me get them out? Now take them out when me finished. Douglas can't say them big (2;8). Me finish that up. That's how us put them on again. Me filling it up now (2;9). It is pretty when me put my socks on. Us able to make two trees. Yes them go round like that (2;10). Me able to nip onto that one. Him jumping out. Him only in the picture. Them match. How me put it under? Now him happy. Now us able, going to get more horse (2;11). Know what me keep for you? Us going to make a road for our cars. Them putting their shoes into the water. Right, after me read all stories Douglas will. One day us went to Granny's. What's for us having for lunch? Douglas see if me can do it. Us must come to look for it now (3;0). And him bumped into prison (3;1). When me big, I will go to playgroup. Him did get stung, didn't he? Him pulled out the telephone. Then us taked off all our clothes (3;2). Us need to have two piles, mustn't we? Us got a jigsaw what has it. Them can't go (3;3). Us got some round ones in our garden. And then them drove away. Driver can peer out, couldn't him! Them got names (3;4). Yes, us got a toy one in our room. When them have gone away. Us can see seagulls from here. Us couldn't keep them in the right place. No us haven't, haven't we not Mummy? (3;5) Us can make fire engines with that. Us going to have a visitor. Us went to see her the other day (3;6). Could us draw a picture of that? No, us buyed this in a shop (3;9).

One way in which we can account for Douglas' use of objective subjects is to suppose that he has already acquired the adult English case system and hence 'knows' that:

(22) An overt (pro)nominal is
     a. nominative if in a checking relation with a [+agr] INFL
     b. genitive if in a checking relation with a [+agr] D
     c. objective otherwise
(Here I am assuming, following Abney (1987), that genitive possessors serve as specifiers of a DP headed by a D which agrees with its specifier, and that adult gerunds with genitive subjects are DPs with the genitive in spec-DP.) Let's further suppose (following Schütze and Wexler) that the clauses children produce may be underspecified for tense, agreement or both. On this view, clauses containing a tensed verb or auxiliary with an objective subject (like One day us went to Granny's or Could us draw a picture of that?) are IPs headed by a [+tns, -agr] INFL, whereas clauses which lack a finite verb or auxiliary (like Me filling it up now or Now him happy) are IPs headed by a [-tns, -agr] INFL: both types of clause have a subject with (default) objective case by (22c), since it follows from (22c) that any clause headed by a [-agr] INFL has an objective subject. On this view, her subjects would carry objective case, and would be used as subjects of clauses headed by a [+tns, -agr] INFL in structures like Her bringed it and Her would just break it, and by a [-tns, -agr] INFL in structures such as Her able to ride on her big donkey.
One problem posed by this account, however, is that we find her used as the subject of s-inflected forms, as the examples in (18) and (19) above show (cf. Yes her sometimes locks it, But where is her coming?, Her is jolly strong, isn't she?). Analysing her as objective in such sentences would conflict with the claim implicit in (22) that objective subjects are used only in agreementless clauses. If we accept the assumption made by Schütze and Wexler that +s is an inflection marking (present) tense and (third person singular) agreement, we should expect that children would use only nominative (not objective) pronouns as the subjects of s-inflected forms. If so, her cannot be objective in sentences like But where is her coming?. However, the data from Douglas in Huxley (1970) seem to call this assumption into question. Douglas alternates between nominative he and objective him subjects with s-forms, as the examples below illustrate:

(23) a. He's got a mother (2;11). He's a clever pilot, he can fly upside down. When he crashes into the sea, this rescue boat go. He's quite like a duck (3;4). He can go in the lorry too, but not Peter Rabbit 'cos he is naughty (3;5)
     b. Him is driver. Him is bear. Him was at Granny's house, too (2;11). Him hits it with it. Then the postman comes to get it, then outs it, then puts it into big piles, doesn't him? (3;3). Him is getting some petrol (3;4) (Douglas, Huxley 1970)

One way of interpreting such data is to suppose that s-forms have a dual status. They can either mark agreement with a third person singular subject and so have a nominative subject by (22a), or they can represent an agreementless default form (i.e. a form which carries no agreement features in the syntax but is assigned the default third person singular value in the morphology) and so have an objective subject by (22c).
At any rate, the fact that Douglas uses him as the subject of s-forms undermines the credibility of claiming that her cannot be objective when used as the subject of s-forms. Having suggested that her subjects are probably best analysed as objective pronouns, I now turn to look at the status of children's my subjects.

6. My subjects

As already noted, my subjects occur in a wide range of structures such as those in (24) below:

(24) a. What my doing? (Eve 1;10, from Vainikka, 1994)5
     b. Where my sit? (Sarah 3;0, from Vainikka, 1994)
     c. Know what my making? (Nina 2;4, from Vainikka, 1994)

5 The form transcribed as my here may represent a contracted form of (a)m I?, so that What my doing? could be a mistranscription of What (a)m I doing? For this reason, sentences like (24a) should not be taken as clearcut examples of (potential) genitive subjects.
     d. My get my car (Nina 1;11, from Vainikka, 1994)
     e. My seen Terrence the digger (Bill 2;5, from Anderssen, 1996)
     f. My going in (Nina 2;3, from Powers, 1996)
     g. My taked it off (Jeffrey 2;6, from Budwig, 1995)
     h. My will do it again (Child 7, 2;4, from Rispoli, 1995)
Interestingly, my is often used in nominative contexts (i.e. contexts where adults require nominative I). This is particularly clear in the case of sentences like (25) below, where my is used as the subject of an auxiliary am which is inflected for both tense and agreement:

(25) a. Now see my am (Child 4, from Rispoli, 1995)
     b. No, my am coming up to play in there (Child 6, 2;5, from Rispoli, 1995)
     c. My am mad (Child 9, 2;10, from Rispoli, 1995)

What this suggests is that my may be functioning as a lexical variant of I, and thus carry nominative rather than genitive case: one possibility is that children who use my subjects misanalyse my as a strong form of the nominative pronoun I. Personal pronouns in English are generally of the form stem+affix, so that e.g. the pronouns he/him/his can be segmented as h+e/h+im/h+is and likewise the pronouns they/them/their as th+ey/th+em/th+eir. However, strong forms like him/them have weak (contracted) variants 'im/'em which have a null stem (e.g. in colloquial structures such as I want 'im to find 'em). The adult first person pronouns me/my/I can be segmented as /m+i/, /m+ai/, /Ø+ai/, with the /m/+ stem being restricted to oblique (objective/genitive) use, and the null stem /Ø/ being restricted to nominative use (and with +/ai/ serving as a nominative/genitive suffix, and +/i/ as an objective suffix). Since null-stem forms like 'im and 'em are weak variants of the strong forms him and them, it may be (as Tom Roeper has suggested to me) that some children initially hypothesize that the null-stem nominative form /ai/ 'I' is a weak form, and that it has a strong form counterpart /m+ai/ 'my' containing the overt first person singular stem /m/ which is found in forms like me/my/mine. This would mean that my is the strong form and I the weak form of the first person singular nominative pronoun.
An obvious consequence of this would be that children's my subjects (which from an adult perspective would appear to be genitives) are (from a child perspective) actually strong nominatives: the fact that my subjects occur in nominative contexts in sentences like (25) is therefore entirely to be expected.6 The hypothesis that children who use my subjects misanalyse my as a strong nominative pronoun predicts that children will use my as the subject of strong (uncontracted) auxiliaries, but not of clitic (contracted) auxiliary forms (if we assume that auxiliary contraction involves cliticisation of a weak pronoun to a weak auxiliary, as argued in Radford, 1997b). Although no relevant quantitative research has yet been undertaken, the data on genitive subjects reported in the existing literature (some of which are cited in (26) below) would generally appear to bear out this suggestion:

(26) a. My can't get out of here (Child 9, from Rispoli, 1995)
     b. Oh, my can't open it by myself (Child 3, 2;6, from Rispoli, 1995)
     c. My will do it again (Child 7, 2;4, from Rispoli, 1995)
     d. My would be lonely, won't I? (Douglas 3;2, from Huxley, 1970)
     e. My don't (Naomi 2;3, from Powers, 1996)
     f. Should my make a airplane? (Child 9, 2;9, from Rispoli, 1995)
     g. Now see my am (Child 4, from Rispoli, 1995)
     h. No, my am coming up to play in there (Child 6, 2;5, from Rispoli, 1995)
     j. My am mad (Child 9, 2;10, from Rispoli, 1995)

6 Note that the suggestion that children have two first person nominative pronoun forms (strong my and weak I) does not entail that they will conversely use I as a weak counterpart of possessive my. The assumption made here is that children posit that weak pronouns with a null stem have strong counterparts (not that strong pronouns with an overt stem have weak counterparts with a null stem). Since adult overt-stem forms like me/my/we/you/your/she have no null-stem counterparts, there is no reason for the child to expect genitive my to have the null-stem counterpart I.
As predicted, children seem to say my will rather than my'll, my would rather than my'd, my am rather than my'm, and so on.7 However, sentences such as (27) below at first sight seem to pose a problem for the assumption that my subjects are strong nominative pronouns, since they might appear to show that children sometimes use my as the subject of a nonfinite verb (e.g. an +ing or +n participle):

(27) a. My seen tractors. My seen Terrence the digger (Bill 2;5, from Anderssen, 1996)
     b. My been the sweeties shop. My driving this car (Kenny 2;8, from Anderssen, 1997)
     c. My moving the legs. My going in. My gonna make a egg (Nina 2;0/2;3/2;5, from Powers, 1996)

Since nominative pronouns occur only as the subject of finite (not of nonfinite) verbs in adult English, it might be thought that sentences like (27) undermine the claim that my is a strong nominative pronoun. However, it is interesting to note that the same children also use other nominative pronouns as subjects of clauses which contain no finite verb or auxiliary, as the examples in (28) below illustrate:

7 The only example of my used as the subject of a contracted auxiliary which I am aware of is the following:
(i) My'm gonna play cowboy (Child 9, 2;10, from Rispoli, 1995)
However, what is not clear from the transcription in (i) is whether 'm represents an unstressed nonclitic form of am with a schwa nucleus, or whether it represents a true nonvocalic clitic form /m/. An interesting question raised by (26d) My would be lonely, won't I? is why the subject in the tag should be I rather than my. It may be that tags typically involve cliticisation of a subject pronoun to an auxiliary, and that this is why the weak (clitic) form I is used here. The assumption that tags involve cliticisation of a subject to an auxiliary would account for the fact that tags do not allow (non-clitic) nominal subjects - hence the ungrammaticality of *Harry is lying, isn't Harry?
(28) a. He sitting on he knee. He not bigger. He not on your shoes. He running. He not weeing in there. They jumping? I seen scarecrow in the park. He not hot (Bill 2;5-2;6, from Anderssen, 1996)
     b. He have to open it? I got to be in bed (Kenny 2;7, from Anderssen, 1996)
     c. I popping balloons (Nina 2;0, from Vainikka, 1994)

One way of accounting for data like (27) and (28) is the following. Let's suppose that INFL in child grammars may be underspecified in respect of its tense or agreement features, and that in sentences like (27) and (28) INFL is specified for agreement but not tense. On this view, a sentence such as They jumping would have the simplified structure (29) below:

(29) [IP They [I -tns, +agr] jumping]

Because INFL is unspecified for tense and finite auxiliaries in English can only lexicalise a [+tns] INFL, the head I position of IP is empty in (29). However, since nominative pronouns like they occur as specifiers of a [+agr] INFL, the subject they in (29) carries nominative case by (22a). In much the same way, we can argue that a sentence such as My seen tractors has the structure (30) below:

(30) [IP My [I -tns, +agr] [VP [V seen] tractors]]

with the subject carrying nominative case by virtue of being the specifier of a [+agr] INFL, and with the strong form my being used because the subject is in spec-IP rather than cliticized to INFL.8 The more general conclusion which this line of reasoning leads us to is that sentences like (27) do not undermine the claim that my subjects are strong nominative pronouns.9
8 An interesting question posed by the assumption that I is a weak (and potentially clitic) nominative pronoun form for children who use my subjects is what I cliticises to in a sentence such as I got to be in bed. The answer may be that I attaches to a clitic variant of the auxiliary have in INFL which ultimately surfaces as a null form (perhaps because the cluster /vg/ in 'I've got' is reduced to /gg/ by assimilation and /g/ by degemination). If so, this would suggest that I subjects are in INFL, whereas my subjects are in spec-IP. For arguments that clitic subjects in adult English are in INFL, see Radford (1997b).

9 A fact which the analysis offered here has to account for is that the use of my as a subject pronoun is relatively rare. Of 12,780 first person singular subjects produced by the children studied in Rispoli (1995), 92% (11,791) were nominative, 6% (798) were objective, and only 1.5% (191) were genitive. In terms of the analysis offered here, what might be claimed is that most children correctly identify the properties encoded by specific lexical items from the outset (in keeping with the lexical continuity assumption made by Schütze, 1997), so that they know that I has a dual status as a strong/weak nominative form in English and hence don't make the error of thinking that my is a strong nominative pronoun. An alternative account of the rarity of my subjects is offered by Rispoli (1994, 1995, 1997), who claims that children's my subjects are the result of a sporadic retrieval error: more specifically, he claims that my subjects are retrieved when nominative I cannot be accessed in the child's lexicon. Rispoli's analysis amounts to the claim that my subjects are the result of a performance error: hence, his analysis is consistent with the more general claim made here that my subjects do not indicate that child grammars systematically license genitive subjects.
7. Our subjects

Although most instances of (potentially) genitive subjects reported in the acquisition literature are occurrences of my or her, we nonetheless find sporadic reports of children using other types of genitive subject. For example, Fletcher (1985) reports that Sophie used our as a subject in sentences such as those below:

(31) Can our do this - this red one? Look, our found that other bit. Can our do it again? Once our came back from somewhere. Once our came back from somewhere, and me found it there, Mummy (Sophie 3;0, from Fletcher, 1985)

Since Sophie uses our as the subject of finite verbs and auxiliaries, one possibility (suggested in Radford, 1997a) is that she extends the domain of genitive case assignment from the specifier of a [+agr] D in nominal structures like our car to the specifier of a [+agr] I in clausal structures like Once our came back from somewhere. This would mean that corresponding to the adult pattern of case assignment found in (32a) below, Sophie develops the alternative pattern in (32b):

(32) a. A [+agr] D can check genitive case
     b. A [+agr] head can check genitive case

In comparison with (32a), (32b) is categorially underspecified, since (32a) allows the specifier of a DP (but not an IP) to carry genitive case, whereas (32b) allows the specifier of DP and IP alike to carry genitive case. However, it is unlikely that Sophie's grammar systematically allows genitives to be used as subjects of clauses headed by a [+agr] INFL. For one thing, although she uses my, your and his as possessives in sentences such as the following (all produced by Sophie at 3;0):

(33) a. Give my one to - her. Why did you put that in my room? Mum, is this my xxx?
     b. Me going to watch you doing your riding lesson. Here's your one.
     c. There's his face

there are no examples of her using genitive my/your/his subjects. In fact, the only other potentially genitive pronoun she uses as a subject is her, e.g.
in sentences such as the following (all produced by Sophie at age 3;0): (34) Why did her have a runny tummy? And why did her have - two sweets, Mummy? Why did you give her - to her when her been flu? What did her have wrong with her? However, as noted earlier, her is not a clearcut example of a genitive subject since it could alternatively be analysed as an objective pronoun (As table (35) below illustrates, Sophie makes extensive use of me subjects).
The full range of subject pronouns used by Sophie in the four transcripts found in Fletcher (1985) is given in (35) below:

(35) Number of times Sophie uses pronouns as subjects

    Pronoun form   2;4   3;0   3;5   3;11
    I              4     0     65    36
    me             24    37    1     0
    you            15    8     30    21
    he             0     10    4     16
    she            0     1     0     2
    her            5     4     6     0
    we             0     0     7     3
    our            4     5     0     0
    they           0     0     0     4
An interesting pattern which appears to emerge from the table in (35) is that our subjects are used prior to the acquisition of we, and that once we is acquired (by 3;5) our subjects no longer occur. Although Sophie appears not to have acquired we by 3;0, she has acquired us as we see from (36) below:

(36) He given one to - two to Hester and two to us (Sophie 3;0, from Fletcher, 1985)

This suggests that her only first person plural pronoun forms at 3;0 are our and us. But if this is so, a lexical gap analysis (of the sort proposed by Schütze (1997) for sentences like Her goes) might be appropriate. That is, lacking the form we in her lexicon, Sophie resorts to using another first person plural pronoun instead in nominative contexts: since nominative and genitive pronouns share in common (under the analysis proposed in (22)) the fact that their case is checked by a [+agr] head (nominatives in clauses, genitives in nominals), our is used rather than us. If so, Sophie's our subjects are not the result of a syntactic error in case assignment, but rather the result of a lexical error (reflecting the fact that she has an incomplete pronoun paradigm at the relevant stage).
8. Its subjects

Brown (1973) reports that Adam used its as a subject; examples (from Stromswold, 1990) are given below (where § marks the number of the relevant file):

(37) Why its came off? (§19) Its breaks. Its can't fit with dis (§24). Its doesn't write (§26). Its opens. No, its doesn't go (§27). Adam said: When it go outside its moves (§28). Its fell so hard. Why its flies all by itself? (§33). Its will knock Paul? Its popped. Its looks like a popper who pop. (§34). Its pulls it. Its will hold it. Its comes apart. Its keeps (§35). Its makes like a sword (§36). Its comes off. Its hurts. Its runs away. Its breaks (§37). Its keeps falling off. Its could go in the tunnel like dat, could it, huh? (§39). Its turns (§40). Its doesn't talk (§41). See its makes some more colours. Its won't hurt (§42). Its stopped OK (§43). Its is real (§45). Its writes [...] (§47)

The status of its in utterances like (37) is far from straightforward. One possibility is that its comprises the pronoun it and the contracted variant 's of the auxiliary is. We could then say that a sentence like Its opens comprises the subject pronoun it, plus a contracted third person singular present tense auxiliary 's, plus a third person singular present tense verb opens: and this would be compatible with the view that sentences like (37) involve the use of a nominative it subject (rather than a genitive its subject). On this analysis, structures like Its comes apart might involve some form of tense copying and have a (simplified) structure along the lines of:

(38) [IP It [I 's] [VP [V comes] apart]]

with (present) tense marked not only on the auxiliary is but also on the verb comes. However, there are a number of aspects of the sentences in (37) which cannot be accounted for straightforwardly in terms of a tense-copying analysis like (38), under which 's is a present tense auxiliary. Firstly, only the contracted form its is found in such structures, not the full form it is. Secondly, although we find its in such structures, we don't find (e.g.) he's, she's, or Daddy's. Thirdly, its occurs as the subject of past-tense verbs like came/fell/popped/stopped/could. Fourthly, its occurs as the subject of sentences which already contain a tensed auxiliary like can't/will/won't/could/doesn't.
And finally, other cases of tense-copying in noninterrogative sentences reported in the acquisition literature involve the use of the auxiliary do rather than be, as the examples produced by Ross in the files indicated below illustrate (the data are from Stromswold, 1990):

(39) He's getting unhappy [#] and he doesn't likes to be unhappy (§32). I did fell (§34). But my boots does tickles. And he did jumped on there. I didn't disappeared (§35). Yeah [#] but for a long time it did worked (§39). You ask me [#] it doesn't exists (§55)

For reasons such as these, it seems more plausible to follow Brown (1973) in concluding that its is a single lexical item. But what is its status? One possibility is that its is misanalysed as a nominative pronoun (so that the child's pronoun form its serves a dual nominative/genitive function). Such a misanalysis might come about via mis-segmentation of adult forms like It's disappeared or It's raining, where the sequence its is misanalysed as a strong nominative pronoun rather than as a combination of a weak pronoun it cliticized to a weak auxiliary 's. This type of mis-segmentation error is all the more understandable if Adam's speech input is to a large extent based on African American Vernacular English (in which unstressed is typically has a null realization). Of course, this begs the question of why sentences like He's left or She's working aren't similarly mis-segmented, with he's and she's being misanalysed as (strong) nominatives. A plausible answer is that pronouns form a closed system, and children only reanalyse forms for which they already have existing entries within their own lexicon: since the child has no entry for forms like she's and he's (but does have an entry for its), only it's can be mis-segmented as a single item its, not he's or she's. Such an analysis would also account for the fact that Adam makes no productive use of thats or whats? as subjects: since these pronouns have no inflected case forms (e.g. no genitive form *thats/*whats), mis-segmentation of sentences like That's broken or What's happening? would not be expected.

9. Conclusions

There are two main conclusions which emerge from this study. The first is that children's structures with apparent genitive subjects are not nominals, but rather clauses. The second is that it is unlikely that children's clauses license genitive subjects in English: my and its subjects are arguably misanalysed as strong nominative pronouns; her subjects are objective; and our subjects are the product of a lexical gap in the child's pronoun paradigm. If so, we no longer have to concern ourselves with explaining why children acquiring English go through a stage when they produce genitive subjects in root clauses, even though there is no counterpart of this type of structure in adult English, and even though children acquiring other languages with nominative subjects don't appear to go through a stage of using genitive subjects (according to Schütze, 1997: 234). If our reasoning here is along the right lines, there is no genuine genitive-subject stage in the acquisition of English. More generally, there is no evidence that the grammars of two- and three-year-old children differ radically from their adult counterparts.
References

Abney, S.P., 1987. The noun phrase in its sentential aspect. Ph.D. dissertation, MIT.
Anderssen, M., 1996. The acquisition of functional categories. (Tromsø Studies in Linguistics 17.) Oslo: Novus.
Andrews, A., 1990. The VP-complement analysis in Modern Icelandic. In: J. Maling, A. Zaenen (eds.), Modern Icelandic syntax: Syntax and semantics 24, 165-185. San Diego, CA: Academic Press.
Brown, R., 1973. A first language: The early stages. Cambridge, MA: Harvard University Press.
Budwig, N., 1995. A developmental-functionalist approach to child language. Mahwah, NJ: Erlbaum.
Chiat, S., 1981. Context-specificity and generalizations in the acquisition of pronominal distinctions. Journal of Child Language 8, 75-91.
Fletcher, P., 1985. A child's learning of English. Oxford: Blackwell.
Hamburger, H., 1980. A deletion ahead of its time. Cognition 8, 389-416.
Huxley, R., 1970. The development of the correct use of subject pronouns in two children. In: G.B. Flores d'Arcais, W.J. Levelt (eds.), Advances in psycholinguistics, 141-165. Amsterdam: North-Holland.
Pensalfini, R., 1995. Pronoun case errors: Both syntactic and morphological. MIT Working Papers in Linguistics 26, 305-324.
Powers, S.M., 1996. The growth of the phrase marker: Evidence from subjects. Ph.D. dissertation, University of Maryland.
Radford, A., 1997a. Possessive structures in child English. Manuscript, University of Essex.
Radford, A., 1997b. Syntactic theory and the structure of English. Cambridge: Cambridge University Press.
Rispoli, M., 1994. Pronoun case overextensions and paradigm building. Journal of Child Language 21, 157-172.
Rispoli, M., 1995. Mechanisms of pronoun case error: Biased retrieval, not syntactic incompetence. Manuscript, Northern Arizona University.
Rispoli, M., 1997. The default case for subjects in the optional infinitive stage. In: E. Hughes, M. Hughes, A. Greenhill (eds.), Proceedings of the 21st Annual Conference on Language Development: Volume 1, 465-475. Somerville, MA: Cascadilla.
Schütze, C.T., 1997. INFL in child and adult language: Agreement, case and licensing. Ph.D. dissertation, MIT.
Schütze, C.T. and K. Wexler, 1996. Subject case licensing and English root infinitives. In: A. Stringfellow, D. Cahana-Amitay, E. Hughes, A. Zukowski (eds.), Proceedings of the 20th Annual Boston University Conference on Language Development: Volume 2, 670-681. Somerville, MA: Cascadilla.
Stromswold, K., 1990. Learnability and the acquisition of auxiliaries. Ph.D. dissertation, MIT.
Vainikka, A., 1994. Case in the development of English syntax. Language Acquisition 3, 257-325.
This page intentionally left blank
Lingua ELSEVIER
Lingua 106 (1998) 133-160
The second language instinct*
Bonnie D. Schwartz
Department of Linguistics and English Language, University of Durham, Elvet Riverside II, New Elvet, Durham DH1 3JT, UK
Abstract

This paper proposes that the notion 'language instinct' appropriately characterizes nonnative language (L2) acquisition in two distinct ways. I argue that like native language (L1) development, L2 development, even by adults, relies on language instincts - despite L1-L2 differences at intermediate stages and in ultimate attainment - and that a primary source of L1-L2 differences is differences in their respective initial states. A variety of acquisition data, from the L2 child, the L2 adolescent and the L2 adult, are used to illustrate and assess three models that adopt this general characterization of L2 acquisition: Minimal Trees (Vainikka and Young-Scholten, 1994), Weak Transfer (Eubank, 1993/94) and Full Transfer/Full Access (Schwartz and Sprouse, 1996). These proposals differ on the extent of L1 influence, i.e., on the representation of the L2 initial state, and I show that the L2 data support Full Transfer/Full Access.

Keywords: Second language acquisition; L2 initial state; transfer; Universal Grammar
1. Introduction

Most language acquisition researchers would agree that there is something akin to a language instinct for native language (L1). But can the same be said of nonnative language (L2) acquisition and especially L2 acquisition by adults? In this paper I try
* My sincere thanks to the audience at GALA '97, where this paper was first presented, and to its organizers, Antonella Sorace, Caroline Heycock and Richard Shillcock. A talk focussing on the end of this paper was subsequently (28 March 1998) part of a symposium at Pacific Second Language Research Forum (PacSLRF), and I would also like to thank that audience as well as the organizers, Yoichi Miyamoto and Shigenori Wakabayashi. For help in preparing the GALA talk and/or this written version, special thanks are due to Ute Bohnacker, Belma Haznedar, Teresa Parodi, Antonella Sorace and an anonymous reviewer; I am especially appreciative of the helpful suggestions received from Martha Young-Scholten (several times!) and Lydia White. Parts of this work were supported by grants, for which I am very grateful: from the British Academy, for travel to Tokyo to present at PacSLRF, and from the University of Durham, for PacSLRF and for the ACCESS (Adult and Child Crosslinguistic English Second Syntax) project, co-directed with Martha Young-Scholten. * Phone: + 44 191 374-2643; Fax: + 44 191 374-2685; E-mail:
[email protected]
to make the case that the notion 'language instinct' applies equally well in the characterization of L2 acquisition, with one crucial difference from L1 acquisition. Consider first how Steven Pinker conceives of 'the language instinct' in his 1994 book of the same title: "[Language] is a distinct piece of the biological makeup of our brains. Language is a complex, specialized skill, which develops in the child spontaneously, without conscious effort or formal instruction, is deployed without awareness of its underlying logic, is qualitatively the same in every individual, and is distinct from more general abilities to process information or behave intelligently. For these reasons some cognitive scientists have described language as a psychological faculty, a mental organ, a neural system, and a computational module. But I prefer the admittedly quaint term 'instinct' " (Pinker, 1994: 18)
His remarks on adult L2 acquisition are rather different: "Everyone knows that it is much more difficult to learn a second language in adulthood than a first language in childhood. Most adults never master a foreign language, especially the phonology - hence the ubiquitous foreign accent. Their development often 'fossilizes' into permanent error patterns that no teaching or correction can undo. [...] Many explanations have been advanced for children's superiority: they exploit Motherese, make errors unself-consciously, are more motivated to communicate, like to conform, are not xenophobic or set in their ways, and have no first language to interfere. [...] Holding every other factor constant, a key factor stands out: sheer age. [...] In sum, acquisition of normal language is guaranteed for children up to the age of six, is steadily compromised from then until shortly after puberty, and is rare thereafter." (Pinker, 1994: 290, 293)
My purpose here is not to argue that Pinker's observations are wrong or unfounded: adult L2 acquisition does seem more 'difficult'; accents and 'errors' and fossilization are the rule rather than the exception; and explicit instruction and corrective feedback are often futile. My purpose here is also certainly not to suggest that L2 acquisition is merely the replication of normal L1 acquisition (the position held by Epstein et al., 1996) - a position which is empirically untenable, as we shall see. Rather, what I propose to do is challenge Pinker's conclusion, namely, that the key factor is age and hence that over time the language instinct as conceived for L1 acquisition comes to be, in Pinker's metaphor, "dismantled" (Pinker, 1994: 294). I will argue that nonnative language acquisition - in adults and in children - depends on three components: the L2 initial state, Universal Grammar (UG) and exposure to Target Language (TL) input.

(1) L2 initial state + UG + TL input => development of L2 knowledge

The key position in my argument is that 'second language instinct' can be conceived of in two different ways. First, it can be bracketed as in (2), i.e. there is an instinct for L2 acquirers to transfer knowledge of their L1 grammar:

(2) [second language] instinct = transfer - the L2 initial state

Unlike in the familiar (often ad hoc) discussions of transfer in the L2 acquisition literature, I use 'transfer' here in a very explicit yet overarching way: what is transferred from the L1 defines the L2 initial state, that is, it is the starting point of L2 acquisition (e.g. Hoekstra & Schwartz, 1994; Schwartz & Eubank, 1996). The second way to bracket 'second language instinct' is as in (3), which is intended to convey the idea of relying on the language instinct a second time: in the course of L2 development, the intermediate systems - or grammars - of Interlanguage are constrained by Universal Grammar.

(3) second [language instinct] = Universal Grammar

The idea of combining transfer and UG is not new to the field of generative L2 acquisition, dating back to Lydia White's work of the 1980s. What is innovative is using transfer to define the L2 initial state. Various explicit proposals on the extent of L1 influence in the L2 initial state have recently appeared (e.g. Eubank, 1993/94; Vainikka and Young-Scholten, 1994; Schwartz and Sprouse, 1996), all adopting the schema of (1). These will be presented and evaluated in what follows, by way of a range of L2 data from child, adolescent and adult acquirers. The overall goal of this paper, then, is three-pronged: (i) to argue that L2 development does depend on the language instinct, despite obvious differences at intermediate stages and in ultimate attainment between normal L1 acquisition and typical (adult) L2 acquisition; (ii) to argue that 'a key factor' for these differences rests in large part on the L2 initial state; and (iii) to present relevant L2 data. In the course of so doing, I review three recent L2 acquisition models and, ultimately, argue for one conception of the L2 initial state in particular.
2. Child L2 development: Haznedar (1995, 1997a,b)

The first point to establish is that transfer is not something that distinguishes adult L2 acquisition from child L2 acquisition. Haznedar (1995, 1997a, 1997b) provides a longitudinal study of naturalistic child L2 acquisition. Her young subject, Erdem, is a 4-year-old Turkish-speaking boy acquiring English. This combination of languages is especially conducive to testing for the role of the L1 in L2 development, given the many syntactic differences between the two languages - principal among them, the surface word order of verb and object: English is VO, whereas Turkish is OV, as shown in (4), where the object kitap ('book') precedes the verb al ('buy').

(4) (Ben) kitap al -ma -yacag -im
    (I) book buy -neg -future -1sg
    'I will not buy books'
(Haznedar, 1995: 5, (7))
To make a case for L1 transfer, one needs to find a significant difference between English L1 development and Erdem's L2 development, where this difference plausibly derives from the structure of the L1. To preview, this is indeed what Haznedar found.
Erdem arrived in the U.K. at age 3;11; for two months he was mostly at home with his parents, both native speakers of Turkish. Erdem's initial encounter with English began (at age 4;1) when he started nursery school, giving him exposure to English for 2.5 hours a day, five days a week. There was no special English instruction at school, nor were there any other Turkish speakers in the class. Haznedar began collecting the data - all spontaneous production - from Erdem after only a month and a half at the nursery school, at a point which marks the onset of his English speech (at age 4;3). At the earliest interviews, Erdem produced only isolated words, usually nouns. These interview sessions took place on average three times per month. While Haznedar collected nearly three years of data, here we look at data from the earliest periods.1

One of the phenomena Haznedar (1995, 1997a,b) reports on concerns the position of the verb in relation to other VP-material, i.e. objects or adverbials. Erdem's early utterances containing a verb are consistently verb-final, like Turkish. In the first 8 samples, the object or adverbial precedes the verb (i.e. the order XV) in 21 out of 23 cases - a rate of 91.3%. Examples are given in (5) and (6) (where 'S' stands for 'sample number'):

(5) a. Investigator: Shall we play with your toys?
       Erdem: yes, toys play (Haznedar, 1995: 8, (11a): S3, 23 Mar 94)
    b. Investigator: Where are we going now?
       Erdem: Newcastle going (Haznedar, 1995: 8, (11b): S5, 11 Apr 94)
    c. [context: on swing at the playground] fast push (Haznedar, 1995: 8, (11c): S5, 11 Apr 94)
    d. would you like to outside ball playing? (Haznedar, 1995: 6, (9a): S7, 6 May 94)
(6) a. I something eating (Haznedar, 1995: 6, (9b): S8, 20 May 94)
    b. television watching (Haznedar, 1995: 6, (9c): S8, 20 May 94)
    c. this cartoon # this cartoon television looking
       'I watched this cartoon on television' (Haznedar, 1995: 8, (12c): S8, 20 May 94)

This contrasts with L1 English developmental data, which are (S)VO. Thus, the initial and consistent production of the XV order in Erdem's very early English is evidence, as Haznedar (1995, 1997a,b) argues, minimally for the transfer of VP from Turkish, i.e. evidence for the [second language] instinct. Not surprisingly, Erdem is not forever stuck with an OV English. As can be seen in Haznedar's Table 1, given in (7), Sample 9 (5 June 1994) marks an abrupt change in the position of the verb.
1 I am indebted to Belma Haznedar for her generous help, especially for providing a copy of the table. Note that Haznedar (1997a,b) somewhat reworks Haznedar (1995), but these differences are not pertinent to the discussion here.
(7) Table 1
Number and percentage of XV vs. VX utterances

   Sample  Recording date  XV   % XV    VX   % VX    Total
   S1      9 Mar 1994      0    0%      0    0%      0
   S2      17 Mar 1994     0    0%      0    0%      0
   S3      23 Mar 1994     2    100%    0    0%      2
   S4      4 Apr 1994      1    100%    0    0%      1
   S5      11 Apr 1994     7    100%    0    0%      7
   S6      22 Apr 1994     2    66.67%  1    33.33%  3
   S7      6 May 1994      3    100%    0    0%      3
   S8      20 May 1994     6    85.71%  1    14.29%  7
-> S9      5 Jun 1994      0    0%      21   100%    21
   S10     13 Jun 1994     4    9.52%   38   90.48%  42
   S11     17 Jun 1994     5    16.67%  25   83.33%  30
   S12     9 Aug 1994      0    0%      20   100%    20
   S13     23 Aug 1994     0    0%      57   100%    57
   S14     30 Aug 1994     1    6.67%   14   93.33%  15
   S15     16 Sep 1994     1    1.78%   55   98.20%  56
   S16     4 Oct 1994      1    1.20%   82   98.80%  83
   S17     12 Oct 1994     1    1.06%   93   98.94%  94
   S18     20 Oct 1994     1    1.06%   93   98.94%  94
   S19     1 Nov 1994      0    0%      69   100%    69
   S20     8 Nov 1994      0    0%      132  100%    132
   S21     15 Nov 1994     0    0%      79   100%    79
   S22     22 Nov 1994     0    0%      83   100%    83
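The rates in Table 1 are simple ratios of the raw counts. As an illustrative arithmetic check (a sketch, not part of Haznedar's study), the following recomputes the percentages and locates the sample at which VX first becomes dominant:

```python
# Illustrative check of Table 1 (Haznedar, 1995): XV vs. VX counts per sample.
# The two lists copy the raw counts from the table above, S1 through S22.
xv = [0, 0, 2, 1, 7, 2, 3, 6, 0, 4, 5, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
vx = [0, 0, 0, 0, 0, 1, 0, 1, 21, 38, 25, 20, 57, 14, 55, 82, 93, 93, 69, 132, 79, 83]

def pct(n, total):
    """Percentage of n out of total; 0 if the sample has no utterances."""
    return round(100 * n / total, 2) if total else 0.0

# (sample number, % XV, % VX) for each recording session
rates = [(i + 1, pct(x, x + v), pct(v, x + v)) for i, (x, v) in enumerate(zip(xv, vx))]

# First sample in which VX utterances outnumber XV ones
switch = next(s for s, _, p_vx in rates if p_vx > 50)
print(switch)  # prints 9, i.e. S9, the abrupt change noted in the text
```

Note that rounding ratios to two decimals reproduces the table's figures up to small rounding differences (e.g. 1/56 comes out as 1.79% rather than the printed 1.78%).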
From this point on, Erdem's utterances consistently display the order VX. Some early VX utterances are provided in (8):

(8) a. I am watching the television (Haznedar, 1995: 8, (14c): S8, 20 May 94)
    b. you eating apple (Haznedar, 1995: 8, (13a): S9, 5 June 94)
    c. my daddy always playing me (Haznedar, 1995: 8, (13b): S9, 5 June 94)
    d. I am talking very very fast (Haznedar, 1995: 8, (13c): S9, 5 June 94)
    e. big man is playing toys (Haznedar, 1995: 8, (13d): S9, 5 June 94)
    f. I'm drink the milk (Haznedar, 1995: 9, (14d): S9, 5 June 94)
    g. going this way (Haznedar, 1995: 9, (14a): S10, 13 June 94)
    h. my mum is go to the shopping (Haznedar, 1995: 9, (14b): S10, 13 June 94)
    i. this teddy bear is looking that (Haznedar, 1995: 9, (14e): S10, 13 June 94)
Under Haznedar's analysis, between Samples 8 and 9 the headedness of VP has switched from the Turkish head-final (SOV) to the English head-medial (SVO). Erdem's data thus support both ways of bracketing 'second language instinct': first there is transfer, and then the language instinct is engaged a second time, resetting the headedness parameter on the basis of TL input. (For another possible analysis of the change from surface OV order to surface VO order, see Schwartz, in press.)
2.1. Model 1: Minimal Trees

Haznedar's developmental data on verb placement repeat a pattern which is becoming increasingly familiar in the L2 acquisition literature (e.g. Jansen et al., 1981; Jagtman and Bongaerts, 1994), namely, that constituent order in lexical projections transfers, later to be replaced by TL constituent order. Such a view of transfer is consistent with the Minimal Trees hypothesis of Vainikka and Young-Scholten (1994, 1996a,b). They capitalize on the distinction between lexical and functional categories and claim that whereas lexical projections and their linear order transfer into the L2 initial state, functional projections do not. 'Growth' of an Interlanguage grammar stems from the addition of functional categories, based on Lexical Learning; in fact, under their view, L2 development consists of the progressive addition of functional structure, going up the tree - i.e. IP before CP. The Minimal Trees hypothesis is thus very similar (though not identical) to the Weak Continuity hypothesis (e.g. Vainikka, 1993/94; Clahsen et al., 1994) proposed for L1 acquisition.

The original empirical motivation for the Minimal Trees hypothesis was as follows. Vainikka and Young-Scholten compare L2 German data from adults whose L1 is either (i) Korean or Turkish (Vainikka and Young-Scholten, 1994), both OV, or (ii) Italian, Portuguese or Spanish (e.g. Meisel et al., 1981; Clahsen and Muysken, 1986; Vainikka and Young-Scholten, 1996a), which are VO. As is well known, the surface syntax of German in embedded clauses is OV; in main clauses German respects the verb-second constraint. As for the Interlanguage German, the least proficient Koreans and Turks produce main clauses that are (S)OV, whereas the Romance subjects start off (S)VO. In these early data, there is little evidence of (correct) verbal inflection or of auxiliary and modal verbs.
From this Vainikka and Young-Scholten conclude that functional projections are absent; hence, the L2 initial state, at the sentential level, consists of VP, head-medial (SVO) for the Romance speakers and right-headed (SOV) for the Koreans and Turks. Subsequently, the Koreans and Turks produce VO orders; this leads Vainikka and Young-Scholten to conclude that a functional head must now be present, to serve as a landing site for verb raising out of the VP, which has remained head-final. Since the morphological form of the (raised) verb is mostly incorrect,2 and since auxiliary and modal verbs are still rare, Vainikka and Young-Scholten hypothesize that this landing site heads an underspecified functional projection, FP (for Finite Phrase). FP becomes specified as a full-fledged AgrP only later, once auxiliaries and modals occur with some regularity and the appropriate inflectional morphology on the (raised) verb is found.
2 Verbal inflection does not agree with the subject or, more often, the verb is realized as an infinitival or stem form. These L2 data are thus strikingly different from those in German L1 acquisition, where virtually all raised verbs show targetlike agreement with the subject, that is, virtually no raised verbs realize either non-agreeing or infinitival/stem forms. See, e.g., Jordens (1990) and Poeppel and Wexler (1993).
2.2. Evidence for more transfer than posited in Minimal Trees

This first view of transfer - that L1 lexical projections alone constitute the L2 initial state - is the most restrictive of the three conceptions of transfer we will consider. In principle, there are three kinds of evidence that could undermine such a position. The first type is evidence that functional structure as instantiated in the L1 does characterize the earliest phases of L2 development. Such evidence might appear to be the most obvious type to look for; however, it is also often the most elusive and ambiguous. This is because functional heads are typically associated with morphological affixes or elements such as complementizers. When the regular use of (correctly) inflected verbs and of complementizers has been found - as in the study by Grondin and White (1996) on child L2 acquisition of French - a proponent of Minimal Trees can simply counter that the data do not reflect an early enough stage (as Vainikka and Young-Scholten, 1996b: 126 indeed argue). L2 data that unambiguously suggest early L1 functional structure do, nevertheless, exist. Consider, for instance, Erdem's early negative placement. His first negated utterances (both verbal and nominal) are consistently neg-final (e.g. finish no, Sample 1), in line with the patterning of negation in Turkish (see, e.g., (4)) - and completely unlike what is found in early L1 English. As Haznedar (1995, 1997a,b) argues, under the assumption that Neg is a functional head, the neg-final order is evidence against the very narrow conception of transfer defended by proponents of Minimal Trees.

The second type of counterevidence to transfer as conceived in Minimal Trees is evidence for Interlanguage phenomena necessarily invoking functional structure at some intermediate stage that, on the one hand, could not have arisen from TL input interacting with UG, but, on the other, are present in the L1 grammar.
And the third type involves a comparison of L2 development and L1 development: since Minimal Trees predicts L1-L2 similarities in progressively building the tree (that is, once the TL constituent order in lexical projections is acquired), it follows that substantial L1-L2 developmental divergences which implicate functional structure constitute a challenge to this conception of transfer. These two types of counterevidence are considered next, this time using data from adolescent L2 acquisition.

3. Adolescent L2 development: White (1990/91, 1991, 1992)

In a series of studies, White (1990/91, 1991, 1992) investigated the position of adverbs in the L2 English of 11- and 12-year-old francophone Canadians, whose exposure to English was in a school setting. A variety of data collection tasks - acceptability-judgment, preference and elicited-production - all show that these speakers experience considerable difficulty with English frequency (e.g. often) and manner (e.g. quickly) adverbs in sentence-internal position.

The facts on adverb placement in the two languages are well known: in French the finite thematic verb must precede the adverb ((9)), whereas English shows the
opposite pattern ((10)). The French SVAdvO order is ungrammatical in English, and likewise the English SAdvVO order is ungrammatical in French.

(9) French
    a. SVAdvO: Marie prend souvent le metro
    b. *SAdvVO: *Marie souvent prend le metro
(10) English
    a. *SVAdvO: *Marie takes often the metro
    b. SAdvVO: Marie often takes the metro
The L2 adolescents in White's studies readily accept and produce sentences in English with the ungrammatical SVAdvO order, as in (10a). Our concern is the source of this error. Recall that according to Minimal Trees, only lexical projections and their linear orientation transfer. Under a standard analysis of French, VP-adverbs are base-generated at the left periphery of VP (Emonds, 1978); so under Minimal Trees, the L2 acquirers' (L2ers') initial state will generate a phrase marker as in (11), with the adverb left-adjoined to VP (Pollock, 1989):
(11) [VP [AdvP Adv] [VP Spec V NP]]

In (11) the adverb precedes the verb; this is the sole position for the verb, since, by hypothesis, there are no functional projections in the L2 initial state - which means there is no position for the verb to move to. If (11) is the L2 adolescents' representation, how then can their SVAdvO error result? Note that when they get English input containing a sentence-medial adverb and a finite transitive verb, as in (10b), it will exhibit the SAdvVO order. With (11) as their initial representation and (10b) as the input type, francophone acquirers of English should not allow the finite thematic verb to precede the adverb. Yet, not only is the order SVAdvO frequently produced and accepted by the young adolescents in White's studies, but anecdotal evidence suggests that it also persists into advanced levels of Interlanguage English. Let us therefore assume that in addition to lexical architecture, more of the L1 grammar transfers, specifically, the functional projection immediately dominating VP (call it FP):

(12) [FP Spec F [VP [AdvP Adv] [VP Spec V NP]]]
If one also assumes transfer of French verb movement (Pollock, 1989), V to F in (12), the Interlanguage English VAdvO order is then derivable.3 Indeed, this is precisely the analysis that White (1990/91, 1991, 1992) assigns to these data. This presupposes, of course, that the L2 initial state must be characterized as more than the bare VP of Minimal Trees.

The final type of counterevidence to Minimal Trees is evidence for L1-L2 developmental differences that implicate functional structure. For the sake of argument, let us suppose, as Vainikka and Young-Scholten do, that functional projections 'grow' in L1 acquisition as well (e.g. Vainikka, 1993/94; Clahsen et al., 1994). This means that (11) is also the state of L1 English before the addition of functional structure. Again, the relevant type of input English L1 acquirers get is like (10b). And just as one would expect, children acquiring English as their native language never pass through a stage in which they allow the order SVAdvO, i.e. L1 English-speaking children do not incorrectly raise thematic verbs past the adverb; instead, they produce SAdvVO. In sum, transfer as conceived under Minimal Trees predicts that once the VP constituent order is targetlike, the progressive building of functional architecture will be the same in L1 and L2 development. But as the data show, the L1 and L2 developmental patterns are quite distinct. (For fuller arguments and counterarguments regarding these adverb placement data, see Vainikka and Young-Scholten, 1996b; Schwartz and Sprouse, 1996.)

The above arguments illustrate two types of counterevidence to the Minimal Trees conception of transfer: (i) the error that is observed is not derivable solely from the L1 VP plus TL input, and (ii) it is not found in L1 development. Transfer of more of the L1 grammar is apparently required to explain the data, specifically, at least the head (and specifier - see fn. 3) of the functional projection dominating VP (as illustrated in (12)).
3 To get the subject in front of the verb (SVAdvO), raising the subject leftward out of VP must also be assumed, another property attributable to transfer from French.
3.1. Model 2: Weak Transfer/Valueless Features

How much more of the L1 defines the L2 initial state? Eubank's (1993/94, 1996) 'Weak Transfer' hypothesis - also called the 'Valueless Features' hypothesis - offers one possible answer to this question. In Eubank's (1993/94) model, both lexical and functional projections from the L1 transfer; however, morphologically-driven syntactic information, i.e. 'strength' of inflection, does not. Eubank adopts the view that certain syntactic phenomena such as verb raising depend on the values of inflectional features (Pollock, 1989; Chomsky, 1993; inter alia), and that these feature-values are in turn determined by morphological paradigms (e.g. Rohrbacher, 1994). English has a meager verbal paradigm and hence its feature-value for inflection is [weak] (or [-strong]), whereas in French the verbal paradigm is fuller and inflection is assigned the value [strong]. This difference in feature-values correlates with differences in verb placement: finite verbs in French raise (overtly); finite main verbs in English do not.

Some of the original empirical motivation behind Eubank's model came from verb placement in French-English Interlanguage. He points out that the data from White's studies indicate that an adverb may either follow or precede the finite thematic verb, i.e. the L2ers allow both SVAdvO and SAdvVO. Eubank takes these data as evidence for the optionality of ('short') verb movement. This optionality arises, according to Eubank, from there being no way to specify the feature-value of the head of the functional projection (IP)4 dominating VP. This is because inflectional information necessary to the determination of the feature-value does not transfer (and has not yet been acquired). As the feature of INFL is initially 'valueless' in Eubank's terms, this 'results' in optional verb raising, capturing the permissive patterning of AdvVO and VAdvO.5 Development under the Weak Transfer model is dependent on acquiring morphology, specifically morphological paradigms: for example, once verbal inflection in English is acquired, the value of INFL is specified as [-strong], which in turn results in no verb raising.6

3.2. Evidence for more transfer than posited in Weak Transfer

Eubank's view of transfer - and hence of the L2 initial state - is broader than that of Minimal Trees: both lexical and functional categories transfer but strength of feature-values does not. Again there are at least three ways to put this second conception of transfer to the empirical test. The first type of counterevidence would be evidence that feature-values (i.e. [±strong]) as instantiated in the L1 do in fact characterize the earliest L2 stage; for instance, movement that is obligatory in the L1 being initially obligatory in the Interlanguage (or the opposite, i.e. obligatory non-movement operations in the L1 also realized in the initial Interlanguage as obligatory non-movement). The second type of evidence just requires that non-optionality of movement be in evidence at the earliest stage.7 Both these types of evidence are hard to find, principally because they both demand evidence from the ever-elusive 'earliest' stage of L2 development. The third type of counterevidence is in principle more readily available, precisely because it does not need data from the earliest phases. Rather, it merely requires comparing the L2 acquisition of a given language (L) by several L2ers or groups of L2ers whose L1s differ in regard to some feature regulating movement. The Weak Transfer hypothesis predicts that regardless of L1 strength specification, all L2ers acquiring L should pattern the same in regard to movement, specifically, permit optional movement. If L2 data on movement from L2ers with distinct L1s instead exhibit L1-correlated patterns, then such data would argue against this second conception of transfer. The argument should become clear as we go through the next set of data - this time from L2 adults.

4 In Eubank (1993/94), this is actually TP; for the illustrative purposes here, IP will do.
5 Why an unspecified value of a feature should give rise to optionality of movement has never been addressed; one could just as easily imagine that an unspecified value results in, say, no movement.
6 For the full set of data considered by Eubank in regard to verb raising in French-English Interlanguage, see his 1993/94 article, and for further developments, see Eubank (1996); for a critique of his approach and counteranalyses, see Schwartz and Sprouse (1996, 1997a).
4. Adult L2 development: Parodi et al. (1997)

Parodi et al. (1997) studied the acquisition of German nominals by native speakers of Korean, Turkish, Italian and Spanish. One of the phenomena they examined was the relative order of noun and adjective. The languages involved differ in this regard. Consider first the data in (13) for German, Korean and Turkish:

(13) a. German: jene drei interessanten Bücher
        those three interesting.pl books
        (Parodi et al., 1997: (7b))
    b. Korean: ku se -kwon -uy caemiissnun chaek -tul
        that three -class -gen interesting book -pl
        'those three interesting (volumes of) books' (Parodi et al., 1997: (11e))
    c. Turkish: ben -im pekçok ilginç kitab -im
        1sg -gen many interesting book -1sg
        'my many interesting books' (Parodi et al., 1997: (13b))

In these three languages, the adjective must precede the noun. For Korean and Turkish speakers, then, the adjective-noun order matches that of German, the Target Language. The situation is different, however, for speakers of Italian and Spanish, illustrated in (14):
7 Note that in regard to falsifying Weak Transfer, evidence could consist of any type of non-optionality of movement at the earliest phase. This could be either in conformity with the L1 or irrespective of the L1; the latter case would not constitute evidence of transfer, of course, but it would still be counterevidence to Weak Transfer.
(14) a. Italian: quei tre libri interessanti
        Spanish: esos tres libros interesantes
        those three books interesting.pl
        'those three interesting books' (Parodi et al., 1997: (18a))
    b. Italian: un uomo povero
        Spanish: un hombre pobre
        a man poor
        'a poor man' (Parodi et al., 1997: (18f))
    c. Italian: un pover' uomo
        Spanish: un pobre hombre
        a poor man
        'a pitiable man' (Parodi et al., 1997: (18g))
    d. Italian: quei tre interessanti libri (cf. (14a))
        Spanish: esos tres interesantes libros (cf. (14a))
        those three interesting.pl books
        'those three INTERESTING books' (Parodi et al., 1997: (18h))
In Italian and Spanish (henceforth 'Romance'), the typical position of adjectives is after the noun, as in (14a). Some attributive adjectives may occur both post-nominally and pre-nominally, but with a change in meaning (compare (14b) and (14c)). An attributive adjective whose position is typically post-nominal, as in (14a), can often occur pre-nominally, for emphasis, as in (14d). The question arises as to how to derive the relative orders between noun and adjective. It is assumed that the minimal structural configuration these languages share is as in (15):
(15) [FP F [NP AdjP [NP N]]]
The adjective is generated in an adjoined position, similar to the base order of adverbs in relation to verbs. The surface order adjective-noun - in German, Korean and Turkish - reflects this base order. As for Romance, much comparative research (e.g. Bernstein, 1991, 1992; Picallo, 1991; Cinque, 1994) has come to the consensus that the noun-adjective order results from raising the noun to some functional head, call it F, higher than the adjective. This is shown in (16):

(16) ... [F Ni [NP AdjP [NP [N' ti]]]]
In sum, the order Adj-N, found in all four languages at hand, reflects the order in which these elements are base-generated. The reverse order, N-Adj, permitted only in Romance, is derived via nominal head movement to a higher functional head.

The L2 German data come from three groups of untutored adult acquirers: (i) cross-sectional data from 8 Korean speakers, an elementary level (n=2) and a more advanced level (n=6), whose spontaneous speech was collected in the LEXLERN Project (see Clahsen et al., 1990); (ii) longitudinal and cross-sectional data from 3 Turkish speakers, the longitudinal data (n=2) from the ESF Project (see Klein and Perdue, 1992) and the cross-sectional data from von Stutterheim (1986); and (iii) longitudinal data from 4 Romance (3 Italian, 1 Spanish) speakers in the ZISA Project (Meisel et al., 1981), whose spontaneous speech was recorded at regular intervals, starting at about 1.5 to 5 months after arrival in Germany.8

Recall that the question of interest is whether the three L1 groups pattern the same, since only in Romance are post-nominal adjectives permitted. The table in (17) gives a breakdown of the incidence of post-nominal adjectives among the L2 subjects:

(17) Raw numbers and percentages of N-Adj order
     Koreans                    1/102 (1.0%)
     Turks                      0/103 (0%)
     Italian: Bongiovanni I     3/8 (37.5%)    Bongiovanni II  1/5 (20.0%)
     Italian: Lina I            3/23 (13.0%)   Lina II         0/8 (0%)       Lina III   1/11 (9.1%)
     Italian: Bruno I           9/32 (28.1%)   Bruno II        17/64 (26.6%)  Bruno III  0/12 (0%)
     Spanish: Ana I             7/28 (25.0%)   Ana II          0/10 (0%)
     (from Parodi et al., 1997: Table 2)

As the table shows, the noun-adjective order is differentiated by L1 group. In the data from the Korean and Turkish speakers, there was a total of 205 adnominal adjectives, 102 from the Koreans and 103 from the Turks. All except one (produced by Gabho, a level I Korean) appeared in pre-nominal position. Two example utterances are given in (18):

(18) a. das arme Fisch                        (Gabho, Korean, level I)
        the poor fish
        correct form: der arme Fisch          (Parodi et al., 1997: (25a))

8 Different data-collection periods for the Turkish and Romance speakers are indicated below - for example, 'Bongiovanni (cycle) II' is later than 'Bongiovanni (cycle) I'.
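For concreteness, the percentages in (17) follow directly from the raw counts. The small Python sketch below recomputes them; the counts are transcribed from the table as quoted in the text, while the dictionary layout is mine, not Parodi et al.'s:

```python
# N-Adj counts as quoted in (17) (from Parodi et al., 1997: Table 2):
# each value is (post-nominal N-Adj tokens, total adnominal adjectives).
n_adj_counts = {
    "Koreans":        (1, 102),
    "Turks":          (0, 103),
    "Bongiovanni I":  (3, 8),
    "Bongiovanni II": (1, 5),
    "Lina I":         (3, 23),
    "Lina II":        (0, 8),
    "Lina III":       (1, 11),
    "Bruno I":        (9, 32),
    "Bruno II":       (17, 64),
    "Bruno III":      (0, 12),
    "Ana I":          (7, 28),
    "Ana II":         (0, 10),
}

for speaker, (post_nominal, total) in n_adj_counts.items():
    pct = 100 * post_nominal / total
    print(f"{speaker}: {post_nominal}/{total} ({pct:.1f}%)")
# e.g. Bruno I: 9/32 (28.1%), matching the table
```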
     b. schöne Wald                            (Ilhami, Turkish, cycle I)
        nice forest
        correct form: der schöne Wald         (Parodi et al., 1997: (25b))
In contrast, both pre- and post-nominal adjectives were attested in the data from all the Romance subjects. Consider the utterances in (19) as illustration:

(19) a. neue Auto                             (Bongiovanni, Italian, cycle I)
        new car
        correct form: neues Auto
     b. eine Schlüssel normal                 (Bongiovanni, Italian, cycle I)
        a key normal
        correct form: ein normaler Schlüssel
        a normal key
     c. schöne Wetter                         (Bongiovanni, Italian, cycle II)
        nice weather
        correct form: schönes Wetter          (Parodi et al., 1997: (26b))
     d. von meine Schwester verheirat         (Bongiovanni, Italian, cycle II)
        from my sister married
        correct form: von meiner verheirateten Schwester
        from my married sister

These L1-differentiated results thus constitute evidence against Eubank's proposal on transfer. No matter what the exact content of the feature that drives overt noun movement in Romance - Number is often suggested - in order to derive the N-Adj order, this feature in F must have a [strong] value. Recall that under Weak Transfer, feature-values do not transfer; this means that all the L2 adults, including the Koreans and Turks, should start off without the value of F specified, i.e. F should initially be inert. This predicts that they should all allow both Adj-N and N-Adj, the latter resulting from (optional) noun movement. But as the data show, the adjective placement of the L2 adults is differentiated, dividing along lines of L1, which suggests that the source of the division is in fact the L1 grammar. The German Interlanguages of the Koreans and Turks do not generate the N-Adj order because this order is not generable in their L1 and because no German input would steer them otherwise. By contrast, in spite of this same German input, which gives no evidence of the N-Adj order, the Romance speakers do produce post-nominal adjectives because their L1 grammar generates this order via noun movement.
So one is led to the general hypothesis that where there is no movement in the L1, there is no evidence of movement in the initial Interlanguage, but where there is movement in the L1 (despite no evidence of it in the TL), this movement is seen in the initial stages of the Interlanguage - and beyond.

The above hypothesis does not imply that L2ers cannot acquire movement. If the Target Language has overt movement with respect to phenomenon P, then TL input containing P can provide evidence for movement to L2ers whose L1 lacks movement for P, and so it is predicted that movement (for P) will characterize their Interlanguage at some point subsequent to the L2 initial state. An example of just such a scenario can be found in the study by Hawkins et al. (1993) on the (tutored) acquisition of French by English speakers. While the absence of thematic verb raising characterizes their L1, they do come to know that in French all finite verbs raise past adverbs. Another example comes from the study by Bley-Vroman et al. (1988) on the acquisition of English by Korean speakers, but this time in regard to phrasal movement. Whereas Korean lacks overt wh-movement, these L2 adults do come to know that in English, overt wh-movement (in questions) is obligatory. Results such as these, then, are additional instances of the second way to bracket 'second language instinct', i.e. relying on UG again in the course of constructing the Interlanguage.

What about the opposite case, where the L1 has movement but the TL does not? Such a scenario exemplifies delearning, one which for a variety of reasons might prove more troublesome to L2 acquirers (for discussion, see White, 1989; Schwartz and Sprouse, 1994). This situation is in fact what the Romance speakers face in acquiring German nominals; they have to delearn noun raising. The longitudinal data in Parodi et al. speak to this issue. (17) shows that for all the Romance speakers, the percentage of N-Adj order in nominals declines over time, dropping all the way to zero for two of them, Bruno III (0/12) and Ana II (0/10). Only experimental evidence showing that Romance speakers (eventually) prohibit the N-Adj order in their German Interlanguage would definitively indicate that noun raising has been delearned - but these results are suggestive nonetheless.

4.1. Model 3: Full Transfer/Full Access

So far we have seen evidence of transfer in the L2 child, the L2 adolescent and the L2 adult. Although the Weak Transfer model posits more influence from the L1 than the Minimal Trees model, it was still found to be insufficient.
In the Full Transfer/Full Access model that Rex Sprouse and I have been developing (Schwartz, 1998, in press; Schwartz and Sprouse, 1994, 1996, 1997a,b; Sprouse and Schwartz, 1998), the most extreme of the three conceptions of transfer is adopted, a kind of 'no holds barred' view of L1 influence: in brief, the final state of L1 acquisition defines the initial state of L2 acquisition. We see Full Transfer/Full Access (FT/FA) as pushing many ideas in Lydia White's work (e.g. 1985, 1989, 1990/91) to their logical limit, contending that Interlanguage development is constrained both by the L1 grammar and by UG. According to the 'Full Transfer' part of FT/FA, the entirety of the L1 grammar (excluding the phonetic matrices of lexical/morphological items) is the L2 initial state; in other words, all of the abstract syntactic properties of the L1 transfer. This means that the L1 grammar is the first 'way station' for TL input, imposing analyses on this input and potentially deriving analyses quite distinct from those of the native speaker. Input that cannot be so accommodated at any point can cause the system to restructure; hence, syntactic development is 'failure-driven'. In some cases, this revision may occur rapidly; in others, much more time may be needed. All such revision is hypothesized to fall within the hypothesis space of UG, the same hypothesis space of L1 acquisition (hence the 'Full Access' part of FT/FA).
Embedded in this approach to L2 development are two auxiliary points. First, following Bley-Vroman (1983), Interlanguage should not be analyzed from the perspective of the Target Language grammar, but rather in terms of its own internal coherence (for discussion, see Schwartz, 1997). Second, convergence on the TL grammar is not guaranteed; this is because unlike in L1 acquisition, the L2 starting point is not simply open or set to 'defaults', and so the data needed to force L2 restructuring could be either nonexistent or obscure. Under FT/FA, the starting points of L1 and L2 acquisition differ, and the endpoints of L1 and L2 acquisition are likely to differ; however, this does not imply that the cognitive processes underlying L1 and L2 acquisition differ. Indeed, we maintain that precisely because (i) UG and learnability principles (Pinker's language instinct) are constant across L1 and L2 acquisition of L but (ii) their initial states are distinct, the 'final states' of L2 acquisition of L do not systematically replicate the final state of L1 acquisition of L.

5. More on second language instincts in L2 adults: Hulk (1991)

Perhaps the clearest L2 acquisition study that single-handedly exemplifies the essence of FT/FA, in relation to initial L1 influence as well as subsequent development, is Hulk (1991). Hulk investigated the development of verb syntax in Dutch acquirers of French. In this regard, Dutch and French differ in two important respects, illustrated in (20) (with non-pertinent details omitted):

(20) a. Dutch: SOV and, in matrix clauses, V2
        [CP AdvP [C V[+fin]i [IP S [VP ti [CP [C dat [IP S [VP NP V ]]]]]]]]
     b. French: (XP)SVO in matrix and embedded clauses
        [IP AdvP [IP S [VP V [CP [C que [IP S [VP V NP ]]]]]]]

Dutch is SOV in surface syntax, whereas French is SVO; Dutch is verb-second (V2) in matrix clauses, while French is not, although French does allow topicalization (of e.g. Adverbs) to pre-subject position - call it adjunction to IP.
These differences are exemplified in (21) and (22):

(21) Dutch
     a. SAuxOV             Jan heeft de aardbeien gegeten
                           Jan has the strawberries eaten
     b. AdvAuxSOV          Gisteren heeft Jan de aardbeien gegeten
                           Yesterday has Jan the strawberries eaten
     c. AdvVSO             Gisteren at Jan de aardbeien
                           Yesterday ate Jan the strawberries
     d. ... COMP SOV       (Ik geloof) dat Jan de aardbeien at
                           (I believe) that Jan the strawberries ate
     e. ... COMP SOVAux    (Ik geloof) dat Jan de aardbeien gegeten heeft
     f. *SAuxVO            *Jan heeft gegeten de aardbeien
     g. *AdvSVO            *Gisteren Jan at de aardbeien
     h. *AdvSAuxVO         *Gisteren Jan heeft gegeten de aardbeien
     i. *... COMP SAuxVO   *(Ik geloof) dat Jan heeft gegeten de aardbeien
     j. *AdvSAuxOV         *Gisteren Jan heeft de aardbeien gegeten
     k. *AdvAuxSVO         *Gisteren heeft Jan gegeten de aardbeien

(22) French
     a. *SAuxOV            *Jean a les fraises mangé
     b. *AdvAuxSOV         *Hier a Jean les fraises mangé
     c. *AdvVSO            *Hier mangeait Jean les fraises
     d. *... COMP SOV      *(Je crois) que Jean les fraises mange
     e. *... COMP SOVAux   *(Je crois) que Jean les fraises mangé a
     f. SAuxVO             Jean a mangé les fraises
     g. AdvSVO             Hier Jean mangeait les fraises
     h. AdvSAuxVO          Hier Jean a mangé les fraises
     i. ... COMP SAuxVO    (Je crois) que Jean a mangé les fraises
     j. *AdvSAuxOV         *Hier Jean a les fraises mangé
     k. *AdvAuxSVO         *Hier a Jean mangé les fraises

(Hulk, 1991: 15, (22)); (Hulk, 1991: 16, (26)); (Hulk, 1991: 16, (27)); (Hulk, 1991: 16, (25)); (Hulk, 1991: 15, (24)); (Hulk, 1991: 17, (32)); (Hulk, 1991: 19, (45)); (Hulk, 1991: 20, (51)); (Hulk, 1991: 20, (50)); (Hulk, 1991: 19, (46)); (Hulk, 1991: 20, (53))
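The complementary patterns in (21)-(22) can be summarized compactly. The following Python sketch is my own encoding, not Hulk's: it records which schematic orders each grammar licenses and recovers the three-way split (Dutch-only, French-only, out in both) that the discussion below relies on:

```python
# True = grammatical; orders keyed by the schematic labels of (21)/(22).
dutch = {
    "SAuxOV": True,  "AdvAuxSOV": True, "AdvVSO": True,
    "COMP SOV": True, "COMP SOVAux": True,
    "SAuxVO": False, "AdvSVO": False, "AdvSAuxVO": False,
    "COMP SAuxVO": False, "AdvSAuxOV": False, "AdvAuxSVO": False,
}
french = {
    "SAuxOV": False, "AdvAuxSOV": False, "AdvVSO": False,
    "COMP SOV": False, "COMP SOVAux": False,
    "SAuxVO": True,  "AdvSVO": True,  "AdvSAuxVO": True,
    "COMP SAuxVO": True, "AdvSAuxOV": False, "AdvAuxSVO": False,
}

dutch_only  = sorted(o for o in dutch if dutch[o] and not french[o])      # the (a-e) set
french_only = sorted(o for o in dutch if french[o] and not dutch[o])      # the (f-i) set
neither     = sorted(o for o in dutch if not dutch[o] and not french[o])  # (j) and (k)

print(neither)  # ['AdvAuxSVO', 'AdvSAuxOV']
```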
Notice that in Dutch, (21a-e) are grammatical, but the corresponding French versions, in (22a-e), are ungrammatical: for (21) and (22), the (a-c) main-clause examples are (topicalization with) V2, and in (a) and (b) as well as in the embedded clauses of (d) and (e), the object precedes the (thematic) verb. The exact contrary holds of the (f-i) set, these being grammatical in French but ungrammatical in Dutch: in (f-i) the object follows the (thematic) verb, and in (g) and (h) there is topicalization but no V2. Finally, the orders in (j) and (k) are impossible in both languages: (j), because it is OV like Dutch, but, like French, not V2; and (k), because it is VO like French, but V2, like Dutch.

Based on these differences between Dutch and French, Hulk devised a French acceptability judgment task consisting of 40 (written) items, the majority of which instantiated the variations on word order in (22).9 The vocabulary was kept as simple as possible in order not to unduly tax the students, the Beginners in particular. A total of 131 Dutch students of French were tested, divided into four groups: (i) 26 Beginners, in first-level, high-school French (who had "just started learning French" (Hulk, 1991: 21)); (ii) 64 students at the second level of high-school French, which I will call the 'Low-intermediate' group;10 (iii) 25 in high school at the third level; and (iv) 16 French majors in their first year at the Free University of Amsterdam. In what follows the focus will primarily be on the results of the first two groups of students, the Beginners and the Low-intermediates.

The first set of data to consider concerns the order between verb and complement. All three L2 acquisition models presented above predict that the Dutch OV system will initially be extended to French, and this is what is found, as shown in (23) and (24):

(23) Acceptance of OV (from Hulk, 1991: 22-23): ✓ in Dutch; * in French
     Sentence type                     Beginners   Low-intermediates   Intermediates
     a. SAuxOV (cf. (22a))             73%         40%                 2%
     b. ... COMP SOV (cf. (22d))       89%         31%                 8%
     c. ... COMP SOVAux (cf. (22e))    65%         26%                 0%

(24) Acceptance of VO (from Hulk, 1991: 22-23): * in Dutch; ✓ in French
     Sentence type                     Beginners   Low-intermediates   Intermediates
     a. SAuxVO (cf. (22f))             42%         86%                 100%
     b. ... COMP SAuxVO (cf. (22i))    27%         87%                 100%

9 Judgments on the grammaticality of these items by two native French speakers were as expected.
10 In the text of the article, Hulk (1991: 21) states that there were 21 subjects at this level, but in fn. 18 she writes: "In fact we had three groups of ... 20, 22 and 22 [students] from this level. In the results we have taken the average from the three groups in order to be able to compare them more easily with the other levels".
The Dutch speakers clearly approach French from the vantage point of their OV L1. Results of the Beginners show that sentences instantiating the unambiguously OV orders of (23) are overwhelmingly accepted, at rates between 65% and 89%; the unambiguously VO orders of (24) are accepted by the Beginners at much lower rates, 27% to 42%. Students at the second level, the Low-intermediates, exhibit the expected development, as shown especially in (24) by their high acceptance (86%-87%) of grammatical VO sentences. Nevertheless, the Low-intermediates' behavior on OV sentences, in (23), still shows remnants of the Dutch OV system, with acceptance rates from 26% to 40%. We will come back to this.

The next set of data concerns V2 vs. non-V2 orders. Recall that of the three models, only FT/FA predicts transfer of V2 from Dutch: Minimal Trees allows transfer of only lexical architecture; Weak Transfer allows transfer of both lexical and functional structure but not the strength of feature-values determining movement (in this case, verb raising to C - see below). Consider the tables in (25) and (26):

(25) Acceptance of V2 (from Hulk, 1991: 24): ✓ in Dutch; * in French
     Sentence type              Beginners   Low-intermediates   Intermediates
     a. AdvVSO (cf. (22c))      92%         50%                 32%
     b. AdvAuxSOV (cf. (22b))   92%         38%                 0%

(26) Acceptance of 'V3' (from Hulk, 1991: 24): * in Dutch; ✓ in French
     Sentence type              Beginners   Low-intermediates   Intermediates
     a. AdvSVO (cf. (22g))      38%         80%                 100%
     b. AdvSAuxVO (cf. (22h))   19%         85%                 100%
In (25), the Beginners' 92% acceptance rate of Dutch-like V2 orders is in striking contrast to their low rates, in (26), of French-like non-V2 (henceforth 'V3') - 19% to 38%. Transfer from the L1 is an obvious explanation for both sets of facts. Notice that the word order of (25b) is V2 and (unambiguously) OV; hence sentences of this type could also be included in (23), raising the upper limit of the OV-acceptance range to 92%, a fact we will also come back to.

As for development, again one can see that the Low-intermediates are moving closer to French: in (26), they accept French-like V3 orders at much higher rates than the Beginners (80%-85% versus 19%-38%); still, the fact that, in (25), the Low-intermediates accept the Dutch-like V2 orders 38% to 50% of the time testifies again to the continued influence of their L1 syntax.

Nevertheless, the data of the Low-intermediates, and, arguably, of the Beginners as well, do exhibit what looks to be optionality: the extent to which non-V2 orders are accepted (see (26)) is not matched by the extent to which V2 orders are rejected (see (25)). Since both V2 and V3 are being accepted, might such data then count as support for Eubank's Weak Transfer model? Recall that optionality of movement - in this case, V to (I to) C (see (20a)) - is the key observation Weak Transfer is designed to explain; feature-values regulating movement are said to be initially inert (in this case, some feature in C), and the consequence of an inert value is optional movement (but see fn. 5). Notice, however, that if this were the explanation for the Dutch students' acceptance of both V2 and V3, one would expect V2 and V3 main-clause declaratives to typify an early phase in all L2 acquisition of French, for instance, the L2 French of English speakers.
In all the studies on English-French Interlanguage - and there have been many - there has been no mention of ever finding V2 existing side by side with V3 in main-clause declaratives - in fact, no mention of finding anything remotely like declarative V2 at all.11 FT/FA accounts for the V2-V3 alternation as follows: (i) the existence of V2 in the early Dutch-French Interlanguage is an instance pure and simple of massive L1 influence: transfer of the whole Dutch clause with L1 feature-values, which includes the operations necessary for deriving V2 (i.e. [second language] instinct); (ii) the 'blooming' of V3 is the development that occurs - via UG kicking in a second time (i.e. second [language instinct]) - in response to French input (e.g. types (22g) and (22h)). We will return to this as well.

Continuing with the data from Hulk's study, we now examine the L2ers' results on the two remaining orders, given in (27), each of which is neither Dutch-like nor French-like. Here the focus will be on what is happening from level to level.

11 By contrast, similar findings of V2 and V3 coexisting in Interlanguage have been robustly documented in the L1-TL situation reversing the (L1=V2; TL=non-V2) design of Hulk (1991): the L2 acquisition of V2 Germanic by speakers of non-V2 languages such as Romance/English (e.g. duPlessis et al., 1987) and Turkish (Schwartz and Sprouse, 1994).
(27) Acceptance of non-Dutch, non-French orders (from Hulk, 1991: 24): * in Dutch; * in French
     Sentence type              Beginners   Low-intermediates   Intermediates
     a. AdvSAuxOV (cf. (22j))   30%         38%                 0%
     b. AdvAuxSVO (cf. (22k))   38%         64%                 8%
In (27a), the order is AdvSAuxOV, which is the combination of non-V2 (i.e. V3) plus OV. This order is not one that any of the groups like much, as the acceptance rate never rises above 38%. Why should this be? What we have seen so far actually tells us the answer. Consider, first, the table in (28), which compiles, for ease of comparison, the Beginners' relevant results laid out previously in (23), (25) and (26):

(28) Beginners' low acceptance of *AdvSAuxOV ((27a)/(28g)): OV and 'V3' comparisons
     Sentence number       Sentence type       Beginners
     a. (23a): OV          SAuxOV              73%
     b. (23b): OV          ... COMP SOV        89%
     c. (23c): OV          ... COMP SOVAux     65%
     d. (25b): OV          AdvAuxSOV           92%
     e. (26a): 'V3'        AdvSVO              38%
     f. (26b): 'V3'        AdvSAuxVO           19%
     g. (27a): OV & 'V3'   AdvSAuxOV           30%

(28a) through (28d) (formerly (23a-c) and (25b)) have already shown that the Beginners overwhelmingly accept OV (65%-92%). So OV cannot be the reason for their distaste for (27a)/(28g). However, they do not yet like V3 - compare (28e) and (28f) with (28g): in (28e) and (28f), these Beginners accept V3 at low rates of 38% and 19%; similarly, in (28g) they accept V3 at a rate of only 30%. Hence, the reason the Beginners do not like (27a)/(28g), AdvSAuxOV, is because it is V3.

As for the Low-intermediates, they do not like (27a) for the other reason, namely that it is OV. This is shown in (29), where the relevant Low-intermediates' acceptance rates are compared:

(29) Low-intermediates' low acceptance of *AdvSAuxOV ((27a)/(29g)): 'V3' and OV comparisons
     Sentence number       Sentence type       Low-intermediates
     a. (26a): 'V3'        AdvSVO              80%
     b. (26b): 'V3'        AdvSAuxVO           85%
     c. (23a): OV          SAuxOV              40%
     d. (23b): OV          ... COMP SOV        31%
     e. (23c): OV          ... COMP SOVAux     26%
     f. (25b): OV          AdvAuxSOV           38%
     g. (27a): 'V3' & OV   AdvSAuxOV           38%
First compare the Low-intermediates' 38% acceptance for AdvSAuxOV in (27a)/(29g) - which, recall, is V3 plus OV - with the V3-plus-VO orders in (29a) and (29b). Unlike the Beginners, the Low-intermediates accept V3 plus VO at very high rates, 80% to 85%. So, V3 cannot be the reason for their relative rejection of (29g). Rather, dislike of OV is the issue, as shown in (29c) through (29f) (spanning both matrix and embedded OV orders): for all these OV types, the Low-intermediates' acceptance of OV is comparably depressed, with rates of 26% to 40%. Drawing this all together, what's been deduced is in (30):

(30) *AdvSAuxOV ((27a)): rejected by Beginners and Low-intermediates for distinct reasons
     (i) Beginners ((28)): because it is not V2, i.e. [second language] instinct
     (ii) Low-intermediates ((29)): because it is not VO, i.e. second [language instinct]
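The deduction in (30) can be reproduced mechanically from the rates in (28) and (29). In the Python sketch below, the data layout and the averaging step are mine; the rates are those quoted above. Each unambiguous sentence type is tagged with the property it instantiates, and mean acceptance per property is computed for each group:

```python
# Acceptance rates (%) from (28)/(29): (property tag, Beginners, Low-intermediates).
rates = {
    "SAuxOV":       ("OV", 73, 40),
    "COMP SOV":     ("OV", 89, 31),
    "COMP SOVAux":  ("OV", 65, 26),
    "AdvAuxSOV":    ("OV", 92, 38),  # also V2, but unambiguously OV
    "AdvSVO":       ("V3", 38, 80),
    "AdvSAuxVO":    ("V3", 19, 85),
}

def mean_acceptance(prop, group_index):
    """Mean acceptance rate over all sentence types tagged with `prop`;
    group_index 1 = Beginners, 2 = Low-intermediates."""
    vals = [r[group_index] for r in rates.values() if r[0] == prop]
    return sum(vals) / len(vals)

# Beginners: high on OV, low on V3 -> they reject AdvSAuxOV (30%) for being V3.
print(mean_acceptance("OV", 1), mean_acceptance("V3", 1))  # 79.75 28.5
# Low-intermediates: high on V3, low on OV -> they reject it (38%) for being OV.
print(mean_acceptance("OV", 2), mean_acceptance("V3", 2))  # 33.75 82.5
```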
AdvSAuxOV, (27a), is disliked by the Beginners because of V3 but by the Low-intermediates because of OV. Or in other words, in keeping with my themes, AdvSAuxOV is disliked by the Beginners because of something they have not yet acquired - because they are still primarily relying on [second language] instinct, i.e. transfer of V2 - and by the Low-intermediates because of something (else) they have acquired - i.e. because of second [language instinct], i.e. UG accommodating VO input.

The last order to be examined is AdvAuxSVO, (27b), which instantiates the combination of V2 and VO. The first thing to note is that the pattern across the three levels is unlike anything seen so far: despite the fact that AdvAuxSVO is ungrammatical in French, the Low-intermediates accept it at a rate higher than the Beginners (64% compared to 38%) - whereas the Intermediates correctly reject it. Why do the Low-intermediates perform less like native speakers than the Beginners on this (and only this) particular word order? The answer, I believe, lies in a comparison of (31) and (32), where again the relevant acceptance rates presented earlier have been combined:

(31) Beginners' 'lower' acceptance of *AdvAuxSVO ((27b)/(31e)): V2 and VO comparisons
     Sentence number       Sentence type       Beginners
     a. (25a): V2          AdvVSO              92%
     b. (25b): V2          AdvAuxSOV           92%
     c. (24a): VO          SAuxVO              42%
     d. (24b): VO          ... COMP SAuxVO     27%
     e. (27b): V2 & VO     AdvAuxSVO           38%

Clearly, the reason for the Beginners' lukewarm acceptance of AdvAuxSVO cannot be because it is V2, since the V2 orders of (31a) and (31b) are accepted at a very high rate, 92%. On the other hand, the lukewarm acceptance rates of (31c) and (31d), from 27% to 42%, suggest VO is what is responsible for the Beginners' results on (27b)/(31e). They tend to reject VO because it is not an L1-based order.
Consider next the results of the Low-intermediates, where we saw a rise in the acceptance rate for (27b) in comparison to the Beginners, even though (27b) is ungrammatical in French.
(32) Low-intermediates' 'higher' acceptance of *AdvAuxSVO ((27b)/(32e)): VO and V2 comparisons
     Sentence number       Sentence type       Low-intermediates
     a. (24a): VO          SAuxVO              86%
     b. (24b): VO          ... COMP SAuxVO     87%
     c. (25a): V2          AdvVSO              50%
     d. (25b): V2          AdvAuxSOV           38%
     e. (27b): VO & V2     AdvAuxSVO           64%

I would like to suggest, as in (33), that the reason the Low-intermediates accept the ungrammatical AdvAuxSVO order ((27b)/(32e)) at a higher rate than the Beginners is because the Low-intermediates have essentially adopted VO (see (32a) and (32b)), but they have not yet completely abandoned V2. Compare, for instance, the results on (32e) and (32d): (32e), accepted at a rate of 64%, is the combination of VO and V2, whereas (32d), accepted at a rate of 38%, is the combination of (unambiguously) OV and V2.12

(33) *AdvAuxSVO ((27b)): Low-intermediates perform less like native speakers than Beginners
     (i) Beginners ((31)): because it is not OV, i.e. [second language] instinct
     (ii) Low-intermediates ((32)): because it is VO, i.e. second [language instinct]
The Beginners are still primarily relying on the OV order of their L1 and so they reject the order in (27b), AdvAuxSVO, because it is VO. And it is because it is VO that the Low-intermediates, by contrast, accept it at a higher rate - that is, in response to French input they have engaged the language instinct again, flipping the headedness of VP from OV to VO. So, in sum, the fact that the Low-intermediates' performance on AdvAuxSVO looks 'worse' is precisely because their French Interlanguage syntax is getting closer to that of the Target Language.

All in all, the Interlanguage of the Beginners exhibits extensive L1 influence, in line with the Full Transfer part of FT/FA: it is OV, V2, and not quite yet V3. On the other hand, the Interlanguage of the Low-intermediates, while still showing some effects from L1 Dutch, is clearly moving towards VO, is wavering about V2, and has V3. Thus, as mentioned earlier for the Low-intermediates (and the Intermediates - see (25a) and (26)), Adverb preposing can be associated with either V2 or V3. Of course, this state of affairs holds of neither the L1 nor the Target Language. Is this 'optionality' then a problem for the idea that UG constrains the development of adult Interlanguage? Why doesn't what is superficially V3 (from their French input) 'kick out' the V2 (from their L1 grammar) more immediately? The reason this does not happen is because the analyses of V2 and French-type topicalization need not be taken to be in complementary distribution - for example, V2 as movement to the CP-level and topicalization as adjunction to IP (Schwartz and Tomaselli, 1990), as in (34).

(34) a. [CP AdvP [C V[+fin] [IP S ...
     b. [IP AdvP [IP S ...

Indeed, as Hulk (1991: 28) points out, such a V2-V3 co-existence is distinctly reminiscent of Middle French, as documented, for example, by Vance (1989). Idealizing slightly, then, the SVO, V2, 'V3' Interlanguage syntax of the Low-intermediates, while no longer the syntax of Dutch and not yet the syntax of French, does seem to represent a natural language grammar.

The Low-intermediates' acceptance of AdvAuxSVO (at a rate of 64%) takes on an added significance in light of the resemblance between Middle French and the Low-intermediates' Dutch-French Interlanguage; specifically, it constitutes a clear argument for UG constraining (adult) L2 development, in line with the Full Access part of FT/FA: As we have seen, AdvAuxSVO falls out from the combination of V2 (from Dutch) and VO (from the input). But this combination itself is not a pattern that occurs in their French input (or in their L1 Dutch) - that is, AdvAuxSVO does not constitute part of their primary linguistic data - and yet the Low-intermediates accept it nonetheless. Thus, the acceptance of AdvAuxSVO represents a poverty of the stimulus effect, the strongest argument for the existence of UG there is.

12 The AdvVSO order in (32c) is derivationally ambiguous, from either a VO base order or an OV base order.
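The poverty-of-the-stimulus point - that V2 (from Dutch) plus VO (from the input) jointly license AdvAuxSVO, an order attested in neither language - can be sketched as a toy derivation. The Python encoding below is purely illustrative (my own schematic rendering of the word-order reasoning, not Schwartz's or Hulk's formalism):

```python
# Derive the Adv-initial surface order from two grammar settings:
# VP headedness ('OV' or 'VO') and whether the finite Aux raises to V2.
def adv_initial_order(headedness, v2):
    vp = ["O", "V"] if headedness == "OV" else ["V", "O"]
    if v2:
        return ["Adv", "Aux", "S"] + vp  # Aux raised to second position
    return ["Adv", "S", "Aux"] + vp      # topicalization as IP-adjunction, no V2

dutch         = adv_initial_order("OV", v2=True)   # Adv Aux S O V, cf. (21b)
french        = adv_initial_order("VO", v2=False)  # Adv S Aux V O, cf. (22h)
interlanguage = adv_initial_order("VO", v2=True)   # Adv Aux S V O, cf. (27b)

# The composed Interlanguage order matches neither source grammar's output:
assert interlanguage not in (dutch, french)
```

The point of the sketch is that the Interlanguage order is not copied from either input pattern; it emerges only from composing the transferred V2 setting with the newly acquired VO setting.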
6. Conclusion

Throughout this paper I have argued that the L1 grammar and UG (in light of TL input) together drive L2 acquisition. Earlier conceptions of L2 acquisition pitted one against the other (e.g. Dulay, Burt et al., 1982); these as well as recent proposals (Epstein et al., 1996; Platzack, 1996) are asking, quite simply, the wrong question: in discounting influence from the L1 in their attempts to make a case for UG, they end up with no story to tell for the kind of L2 data covered above.

Yet, even if the accounts I have offered are on the right track, they are not complete. In particular, my foci have been the L2 initial state and intermediate Interlanguages; what I have not broached is the topic of endstates. Recall from Section 1, however, that it was Pinker's (1994) remarks about adult L2 acquisition that instigated this essay, and there the concern seems to be exclusively with endstates.

Among the differences between adult L2 acquisition and L1 acquisition (see Bley-Vroman, 1990), endstates figure prominently. Serious projects exploring the characteristics of L2 endstates have been launched in recent years - for example the work of Sorace (e.g. 1993) and Borer (1995). Sorace (1993) has shown that even for near-native L2 speakers, endstates can be qualitatively different, dependent on properties of the L1. A related L1-L2 difference is fossilization, on which there is very little empirical research. Work by Lardiere (e.g. 1998, to appear) constitutes, to my knowledge, the sole documentation of fossilization, specifically, the total absence of development by an adult nonnative speaker in an input-rich environment. Her case
study shows, interestingly, that what resists change is inflectional morphology. Indeed, inflectional morphology, even something seemingly simple like English present and past tense suffixation, is notoriously problematic for (adult) L2 acquirers, and so this is another observational difference between L1 and L2 acquisition. Finally, the last difference I list, touched on earlier, is 'optionality': L2 systems, throughout development, seem to allow a lot more optionality or variation than is seen in L1 development. Recent attempts to shed light on the nature of the system underlying such variation include work by Eubank (1996), Sorace (1996), Prevost (1997) and White and Prevost (in press).

I bring up these observations to acknowledge that there are definite, nontrivial differences between normal L1 acquisition and typical adult L2 acquisition. What significance should one attach to them - especially the difference in endstates, which Pinker (in following Newport, 1990) appears to take as the primary point of comparison? Pinker, like Newport, seems to employ the criterion of 'identity in endstate' to assess UG's role in adult L2 acquisition (see also Bley-Vroman, 1990): in the acquisition of L, only if the L2 adult arrives at an endstate identical to the L1 endstate can one conclude that UG constrains adult L2 acquisition, and if instead non-identity is the result, then UG is not involved in adult L2 acquisition.

While 'identity in endstate' is a straightforward criterion, it is not necessarily a valid criterion. As argued in Schwartz (1990, 1994), the mere existence of differences between L1 and L2 endstates does not imply that the two necessarily instantiate fundamentally different, viz. epistemologically non-equivalent, knowledge types. In L1 acquisition, the child's resulting grammar is not necessarily identical to that of the input providers, and indeed the child's re-creation of grammar is often cited as one locus of language change (e.g. Andersen, 1973).
That grammars may differ from generation to generation - or, for that matter, more broadly in the evolution of language (e.g. Old English, Middle English, Modern English) - does not lead to the conclusion that these grammars are epistemologically non-equivalent. They can differ, but they are of the same type of knowledge. Thus, on the basis of endstate difference alone, one cannot deduce epistemological non-equivalence. Similarly for the comparison of L1 and L2 endstates: they may differ, perhaps even in significant ways, and they could be epistemologically equivalent (or non-equivalent); but simply because they are distinct does not entail epistemological non-equivalence. In short, non-identity between L1 and (adult) L2 endstates in and of itself does not tell us much about what is happening in (adult) nonnative language acquisition. (For related discussion, see also White, 1996.)

Instead of comparing L1 and L2 endstates, one can focus on the systems L2 acquirers build in the course of acquisition. At issue is the representation of L2 knowledge, and for this one should try to determine whether the independently motivated mechanisms of UG constrain Interlanguage syntax (duPlessis et al., 1987) - or, even better, whether nonnative speakers end up with knowledge of relatively obscure properties of the TL that are poverty-of-the-stimulus problems (e.g. Dekydtspotter et al., in press). Such tacks have the potential of revealing a great deal about what is happening in the development of nonnative systems. The empirical studies reviewed in this paper indicate that the L2 child and the L2 adolescent as well as the
L2 adult rely on the L1 grammar and show UG-constrained development. And if one finds UG to be constraining Interlanguages developmentally, then non-identity in L1-L2 endstates becomes an independent issue - one, of course, that needs to be explained. So, despite the fact that L2 acquisition does not mirror L1 acquisition initially, developmentally or even ultimately (Schwartz, 1992), the data indicate that second language instincts are in operation: initially, TL input is filtered via at least parts of the L1 grammar (Schwartz, 1987), irrepressibly, reflexively - instinctively, if you will. And developmentally, change is effected via the re-engagement of Universal Grammar - the other, original, language instinct.

References

Andersen, H., 1973. Abductive and deductive change. Language 49, 765-793.
Bernstein, J., 1991. DPs in French and Walloon: Evidence for parametric variation in nominal head movement. Probus 3, 1-26.
Bernstein, J., 1992. On the syntactic status of adjectives in Romance. CUNY Forum 17, 105-122.
Bley-Vroman, R., 1983. The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning 33, 1-17.
Bley-Vroman, R., 1990. The logical problem of foreign language learning. Linguistic Analysis 20, 3-49.
Bley-Vroman, R., S. Felix and G. Ioup, 1988. The accessibility of Universal Grammar in adult language learning. Second Language Research 4, 1-32.
Borer, H., 1995. Heblish: A study of steady state L2 acquisition. Manuscript, University of Massachusetts, Amherst.
Chomsky, N., 1993. A minimalist program for linguistic theory. In: K. Hale, S.J. Keyser (eds.), The view from Building 20: Essays in linguistics in honor of Sylvain Bromberger, 1-52. Cambridge, MA: MIT Press.
Cinque, G., 1994. On the evidence for partial N-movement in the Romance DP. In: G. Cinque, J. Koster, J.-Y. Pollock, L. Rizzi, R. Zanuttini (eds.), Paths towards Universal Grammar: Studies in honor of Richard S. Kayne, 85-110. Washington, DC: Georgetown University Press.
Clahsen, H., S. Eisenbeiß and A. Vainikka, 1994. The seeds of structure: A syntactic analysis of the acquisition of case marking. In: T. Hoekstra, B.D. Schwartz (eds.), Language acquisition studies in generative grammar: Papers in honor of Kenneth Wexler from the 1991 GLOW workshops, 85-118. Amsterdam: Benjamins.
Clahsen, H. and P. Muysken, 1986. The availability of Universal Grammar to adult and child learners: A study of the acquisition of German word order. Second Language Research 2, 93-119.
Clahsen, H., A. Vainikka and M. Young-Scholten, 1990. Lernbarkeitstheorie und Lexikalisches Lernen: Eine kurze Darstellung des LEXLERN-Projekts. Linguistische Berichte 130, 466-477.
Dekydtspotter, L., R.A. Sprouse and B. Anderson, in press. The interpretive interface in L2 acquisition: The process-result distinction in English-French Interlanguage grammars. Language Acquisition 6, 297-332.
Dulay, H., M. Burt and S. Krashen, 1982. Language two. Oxford: Oxford University Press.
duPlessis, J., D. Solin, L. Travis and L. White, 1987. UG or not UG, that is the question: A reply to Clahsen and Muysken. Second Language Research 3, 56-75.
Emonds, J., 1978. The verbal complex V'-V in French. Linguistic Inquiry 9, 151-175.
Epstein, S., S. Flynn and G. Martohardjono, 1996. Second language acquisition: Theoretical and experimental issues in contemporary research. Behavioral and Brain Sciences 19, 677-758.
Eubank, L., 1993/94. On the transfer of parametric values in L2 development. Language Acquisition 3, 183-208.
Eubank, L., 1996. Negation in early German-English Interlanguage: More Valueless Features in the L2 initial state. Second Language Research 12, 73-106.
Grondin, N. and L. White, 1996. Functional categories in child L2 acquisition of French. Language Acquisition 5, 1-34.
Hawkins, R., R. Towell and N. Bazergui, 1993. Universal Grammar and the acquisition of French verb movement by native speakers of English. Second Language Research 9, 189-233.
Haznedar, B., 1995. Acquisition of English by a Turkish child: On the development of VP and negation. Paper presented at the Language Acquisition Research Symposium (LARS), University of Utrecht, 11 May (Manuscript, University of Durham).
Haznedar, B., 1997a. Child second language acquisition of English: A longitudinal case study of a Turkish-speaking child. Ph.D. dissertation, University of Durham.
Haznedar, B., 1997b. L2 acquisition by a Turkish-speaking child: Evidence for L1 influence. In: E. Hughes, M. Hughes, A. Greenhill (eds.), Proceedings of the 21st annual Boston University Conference on Language Development, vol. 1, 245-256. Somerville, MA: Cascadilla.
Hoekstra, T. and B.D. Schwartz, 1994. Introduction: On the initial states of language acquisition. In: T. Hoekstra, B.D. Schwartz (eds.), Language acquisition studies in generative grammar: Papers in honor of Kenneth Wexler from the 1991 GLOW workshops, 1-19. Amsterdam: Benjamins.
Hulk, A., 1991. Parameter setting and the acquisition of word order in L2 French. Second Language Research 7, 1-34.
Jagtman, M. and T. Bongaerts, 1994. Verb placement in L2 Dutch. Paper presented at the American Association for Applied Linguistics (AAAL), Baltimore, 8 March (Manuscript, Delft University of Technology/University of Nijmegen).
Jansen, B., J. Lalleman and P. Muysken, 1981. The alternation hypothesis: Acquisition of Dutch word order by Turkish and Moroccan foreign workers. Language Learning 31, 315-336.
Jordens, P., 1990. The acquisition of verb placement in Dutch and German. Linguistics 28, 1407-1448.
Klein, W. and C. Perdue, 1992. Utterance structure (developing grammars again). Amsterdam: Benjamins.
Lardiere, D., 1998. Case and tense in the 'fossilized' steady state. Second Language Research 14, 1-26.
Lardiere, D., to appear. Dissociating syntax from morphology in a divergent L2 end-state grammar. Second Language Research.
Meisel, J.M., H. Clahsen and M. Pienemann, 1981. On determining developmental stages in natural second language acquisition. Studies in Second Language Acquisition 3, 109-135.
Newport, E., 1990. Maturational constraints on language learning. Cognitive Science 14, 11-28.
Parodi, T., B.D. Schwartz and H. Clahsen, 1997. On the L2 acquisition of the morphosyntax of German nominals. Essex Research Reports in Linguistics 15, 1-43. [Revised version submitted for publication]
Picallo, M.C., 1991. Nominals and nominalizations in Catalan. Probus 3, 279-316.
Pinker, S., 1994. The language instinct. London: Penguin Books.
Platzack, C., 1996. The initial hypothesis of syntax: A minimalist perspective on language acquisition and attrition. In: H. Clahsen (ed.), Generative perspectives on language acquisition, 369-414. Amsterdam: Benjamins.
Poeppel, D. and K. Wexler, 1993. The full competence hypothesis of clause structure in early German. Language 69, 1-33.
Pollock, J.-Y., 1989. Verb movement, Universal Grammar, and the structure of IP. Linguistic Inquiry 20, 365-424.
Prevost, P., 1997. Truncation and root infinitives in second language acquisition of French. In: E. Hughes, M. Hughes, A. Greenhill (eds.), Proceedings of the 21st annual Boston University Conference on Language Development, vol. 2, 453-464. Somerville, MA: Cascadilla.
Rohrbacher, B., 1994. The Germanic VO languages and the full paradigm: A theory of V to I raising. Ph.D. dissertation, University of Massachusetts, Amherst.
Schwartz, B.D., 1987. The modular basis of second language acquisition. Ph.D. dissertation, University of Southern California.
Schwartz, B.D., 1990. Un-motivating the motivation for the Fundamental Difference Hypothesis. In: H. Burmeister, P. Rounds (eds.), Variability in second language acquisition, 667-684. Eugene, OR: University of Oregon.
Schwartz, B.D., 1992. Testing between UG-based and problem-solving models of L2A: Developmental sequence data. Language Acquisition 2, 1-19.
Schwartz, B.D., 1994. L2 knowledge: What is the null hypothesis? In: W.T. McClure, B.D. Schwartz, I.-M. Tsimpli (eds.), Newcastle and Durham Working Papers in Linguistics 2, 145-154.
Schwartz, B.D., 1997. On the basis of the Basic Variety ... Second Language Research 13, 386-402.
Schwartz, B.D., 1998. On two hypotheses of 'Transfer' in L2A: Minimal Trees and Absolute L1 Influence. In: S. Flynn, G. Martohardjono, W. O'Neil (eds.), The generative study of second language acquisition, 35-59. Mahwah, NJ: Erlbaum.
Schwartz, B.D., in press. Some specs on Specs in L2 acquisition. In: D. Adger, S. Pintzuk, B. Plunkett, G. Tsoulas (eds.), Specifiers: Minimalist approaches. Oxford: Oxford University Press.
Schwartz, B.D. and L. Eubank, 1996. What is the L2 initial state? Introduction. Second Language Research 12, 1-5.
Schwartz, B.D. and R.A. Sprouse, 1994. Word order and nominative case in nonnative language acquisition: A longitudinal study of (L1 Turkish) German Interlanguage. In: T. Hoekstra, B.D. Schwartz (eds.), Language acquisition studies in generative grammar: Papers in honor of Kenneth Wexler from the 1991 GLOW workshops, 317-368. Amsterdam: Benjamins.
Schwartz, B.D. and R.A. Sprouse, 1996. L2 cognitive states and the Full Transfer/Full Access model. Second Language Research 12, 40-72.
Schwartz, B.D. and R.A. Sprouse, 1997a. Evidence for full transfer in German-English Interlanguage. Paper presented at the Second Language Research Forum (SLRF), Michigan State University, East Lansing, MI, 17 October (Manuscript, University of Durham/Indiana University).
Schwartz, B.D. and R.A. Sprouse, 1997b. Transfer: A tradition in transition. Paper presented at the American Association for Applied Linguistics (AAAL), Orlando, FL, 9 March (Manuscript, University of Durham/Indiana University).
Schwartz, B.D. and A. Tomaselli, 1990. Some implications from an analysis of German word order. In: W. Abraham, W. Kosmeijer, E. Reuland (eds.), Issues in Germanic syntax, 251-274. Berlin: Mouton de Gruyter.
Sorace, A., 1993. Incomplete vs. divergent representations of unaccusativity in nonnative grammars of Italian. Second Language Research 9, 22-47.
Sorace, A., 1996. Permanent optionality as divergence in non-native grammars. Paper presented at the 6th annual Conference of the European Second Language Association (EUROSLA), University of Nijmegen, 1 June.
Sprouse, R.A. and B.D. Schwartz, 1998. In defense of Full Transfer in German-English and French-English Interlanguage: Comparative L2 acquisition research. In: A. Greenhill, M. Hughes, H. Littlefield, H. Walsh (eds.), Proceedings of the 22nd annual Boston University Conference on Language Development, vol. 2, 726-736. Somerville, MA: Cascadilla.
Stutterheim, C. von, 1986. Temporalität in der Zweitsprache: Eine Untersuchung zum Erwerb des Deutschen durch türkische Gastarbeiter. Berlin: de Gruyter.
Vainikka, A., 1993/94. Case in the development of English syntax. Language Acquisition 3, 257-325.
Vainikka, A. and M. Young-Scholten, 1994. Direct access to X'-theory: Evidence from Korean and Turkish adults learning German. In: T. Hoekstra, B.D. Schwartz (eds.), Language acquisition studies in generative grammar: Papers in honor of Kenneth Wexler from the 1991 GLOW workshops, 265-316. Amsterdam: Benjamins.
Vainikka, A. and M. Young-Scholten, 1996a. The early stages in adult L2 syntax: Additional evidence from Romance speakers. Second Language Research 12, 140-176.
Vainikka, A. and M. Young-Scholten, 1996b. Gradual development of L2 phrase structure. Second Language Research 12, 7-39.
Vance, B., 1989. Null subjects and syntactic change in Medieval French. Ph.D. dissertation, Cornell University.
White, L., 1985. The pro-drop parameter in adult second language acquisition. Language Learning 35, 47-62.
White, L., 1989. Universal Grammar and second language acquisition. Amsterdam: Benjamins.
White, L., 1990/91. The verb-movement parameter in second language acquisition. Language Acquisition 1, 337-360.
White, L., 1991. Adverb placement in second language acquisition: Some effects of positive and negative evidence in the classroom. Second Language Research 7, 133-161.
White, L., 1992. Long and short verb movement in second language acquisition. Canadian Journal of Linguistics 37, 273-286.
White, L., 1996. Universal grammar and second language acquisition: Current trends and new directions. In: T. Bhatia, W. Ritchie (eds.), Handbook of second language acquisition, 85-120. New York: Academic Press.
White, L. and P. Prevost, in press. Accounting for morphological variation in L2 acquisition: Truncation or missing inflection? In: M.A. Friedemann, L. Rizzi (eds.), The acquisition of syntax: Issues in comparative developmental linguistics. London: Longman.
Lingua 106 (1998) 161-196
Learning Optimality-Theoretic grammars*

Bruce B. Tesar a, Paul Smolensky b,*

a Linguistics Department and Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ 08903, USA
b Cognitive Science Department and Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA
Abstract

We present evidence that Optimality Theory's account of Universal Grammar has manifold implications for learning. The general principles of Optimality Theory (OT; Prince and Smolensky, 1993) are reviewed and illustrated with Grimshaw and Samek-Lodovici's (1995) OT theory of clausal subjects. The optimization structure that OT provides to grammar is used to derive a principled decomposition of the learning problem into the problem of assigning hidden structure to primary learning data and the problem of learning the grammar governing that hidden structure. Methods are proposed for analyzing both sub-problems, and their combination is illustrated for the problem of learning a stress system from data lacking metrical constituent boundaries. We present general theorems showing that the proposed solution to the grammar learning sub-problem exploits the special structure imposed by OT on the space of human grammars to correctly and efficiently home in on a target grammar.

Keywords: Learnability; Acquisition; Optimality Theory
How, exactly, is learnability enhanced by the structure imposed by a particular grammatical theory on the space of possible human grammars? In this paper we summarize research on this question addressed to the particular theory of grammar provided by Optimality Theory (Prince and Smolensky, 1991, 1993); we argue that the structure imposed by Optimality Theory on Universal Grammar has strong consequences for learnability. Our central results were presented in Tesar and Smolensky, 1993, 1998; the discussion here focuses on issues and examples treated briefly or not at all in that article. Some of the fuller treatment provided here was presented previously in the unpublished report Tesar and Smolensky, 1996 (henceforth, 'T&S'); this report also treats additional topics which are briefly summarized here.

* Many thanks to the organizers and participants of the 1997 GALA conference, which provided a delightful and stimulating environment for pursuing a number of the issues discussed here. We are greatly indebted to Alan Prince, whose challenges, insights, and suggestions have been instrumental to the work summarized. We are also grateful for helpful discussion and suggestions from John McCarthy, Geraldine Legendre, Jane Grimshaw, Bruce Hayes, Linda Lombardi, Luigi Burzio, and Clara Levelt. Finally, we thank an anonymous reviewer for helpful comments and questions.
* Tesar: Phone: +1 (732) 932-7289; Fax: +1 (732) 932-1370; E-mail: [email protected]; Smolensky: Phone: +1 (410) 516-5114, Fax: +1 (410) 516-8020, E-mail: [email protected]
0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved
PII: S0024-3841(98)00033-3

1. Learnability and the structure of UG

Any grammatical theory that admits only a finite number of possible grammars reduces the search space of the learner from an infinite number of conceivable languages to a finite space, affording a dramatic asset to learnability. Or so it would seem. In fact, of course, such a grammatical theory really serves only to improve - albeit dramatically - the worst-case performance of the least informed search method of all, exhaustive search: at worst, in the finite case, search is guaranteed to terminate after all possible grammars have been examined; in the infinite case, search may continue forever. In learnability theory, comfort from the finiteness of the space of possible grammars is tenuous indeed. For a grammatical theory with an infinite number of possible grammars might be well-structured, permitting informed search which converges quickly to the correct grammar, even though uninformed, exhaustive search is infeasible. And it is of little value that exhaustive search is guaranteed to terminate eventually because the space of possible grammars is finite, if the number of grammars is astronomical. A well-structured theory admitting an infinity of grammars could well be feasibly learnable, while a poorly-structured theory admitting a finite but very large number of possible grammars might not. And indeed, a principles-and-parameters grammar space with n parameters admits at least 2^n grammars - more, if the parameters are not binary. Such exponential growth with the number of parameters quickly leads to spaces much too large to search exhaustively.
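To make the growth rates concrete, both space sizes (2^n grammars for n binary parameters; n! grammars for n ranked constraints, as discussed next) can be computed directly. A minimal illustrative sketch (function names are mine, not from the text):

```python
import math

def pp_space_size(n_params: int) -> int:
    """Grammars in a principles-and-parameters space with n binary
    parameters: one grammar per parameter setting, i.e. 2**n."""
    return 2 ** n_params

def ot_space_size(n_constraints: int) -> int:
    """Grammars in an Optimality-Theoretic space with n constraints:
    one grammar per total ranking of the constraints, i.e. n!."""
    return math.factorial(n_constraints)

# Both spaces are finite, yet already at n = 20 exhaustive search is
# hopeless: over a million P&P grammars, and over 10**18 OT grammars.
pp20, ot20 = pp_space_size(20), ot_space_size(20)
```

Finiteness alone is thus cold comfort: at these sizes, only a search informed by the structure of the grammar space can be feasible.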
An Optimality-Theoretic grammar space with n constraints admits n! grammars, which grows still faster. Thus to achieve meaningful assurance of learnability from our grammatical theory, we must seek evidence that the theory provides the space of possible grammars with the kind of structure that learning can effectively exploit.

Consider principles-and-parameters (P&P) theory in this regard; our discussion will be brief, since our objective is not theory comparison but presentation of results about Optimality-Theoretic learnability. Two types of learnability research are useful as contrasts to what we provide below. The first is cue learning, exemplified by work such as Dresher and Kaye (1990). They adopt a particular parametrized space of grammars, and analyze in great detail the relationships between the parameter settings and the overtly available forms. They propose a specific learning algorithm to make use of the structure provided by a specific theory. The overall approach of Dresher and Kaye's work differs from our approach in several respects. Their analysis is entirely specific to their particular parametric system for metrical stress; a cue learning approach to a parametric grammar for some other aspect of linguistic theory, or even an alternative parametric analysis of metrical stress, would essentially
require starting over from scratch. Further, their algorithm is evaluated entirely through computer simulations; despite the great specificity, they have little in the way of formal analysis to validate the algorithm. Our work aims to identify principles of learning general to the framework of Optimality Theory, principles which apply to all OT systems (not restricted even to, say, phonology). Such a goal is not likely to be reached all in one step. Our formal results to date address only the sub-problem of learning the grammar from full structural descriptions. The algorithms we propose apply to all OT systems, and the validity of the algorithms, with regard to both correctness and tractability, is ensured by theorems. These strong results on learnability from full structural descriptions open opportunities for results on the more general problem of learning grammars from overtly available data, while applying to OT systems in general.

Another approach to learnability within P&P, quite different from cue learning, is represented in the work of Gibson and Wexler (1994) and Niyogi and Berwick (1993). The triggering learning algorithm (and its variations) is designed to learn grammars from overtly available data. Like our work, it applies to any instance of a general class of systems: in their case, the class of P&P systems. Further, Niyogi and Berwick (1993) provide formal analysis of the algorithms. However, this work differs from ours in a way that represents the opposite extreme from cue learning.
The P&P algorithms are minimally informed by the grammatical theory; they treat the grammar only as a black box evaluating learning data as either analyzable or not, and involve either randomly flipping parameters in order to render an input analyzable (Gibson and Wexler's Triggering Learning Algorithm), or randomly flipping parameters without regard to immediate resulting analyzability (which, Niyogi and Berwick argue, can actually outperform the Triggering Learning Algorithm). These are simply generic search algorithms which employ no properties of the grammatical theory per se. Our approach, by contrast, makes explicit use of the special structure of OT. Further, the learnability results relating to triggers presume the existence of data which directly reveal individual parameter values, an assumption of unclear relevance to realistic grammars (see Frank and Kapur, 1996); we discuss this further in Section 6.1. Finally, regardless of the availability of such triggering forms, these algorithms offer no guarantee of tractability. In fact, the only result regarding time complexity is that the probability of learning the correct grammar increases towards 1.0 as the number of learning instances approaches infinity,1 leaving open the possibility of doing even worse than brute-force enumeration.

These two approaches to learnability analysis within P&P, then, either (i) use grammatical structure in the learning algorithm, but the structure of a particular parametric system, or (ii) develop general algorithms applicable to any P&P system, but algorithms so general that they apply just as well to any non-grammatical parametrized system. This dichotomy of approaches is likely a consequence of the nature of P&P. A particular P&P system, like one for stress, has sufficient structure to inform a learning procedure (option i). But as a general theory of how grammars

1 This is the case even when triggering forms exist for all parameter settings, and those forms appear among those presented to the learner with some reasonable frequency.
may differ (as opposed to how stress systems may differ), P&P provides little structure to exploit beyond the existence of a finite space for searching. But the situation in Optimality Theory is quite different. This theory is reviewed in Section 3, but the immediately relevant claims of OT are these:

(1) OT in a nutshell
a. What is it that all grammars have in common? A set of constraints on well-formedness.
b. How may languages differ? Only in which constraints have priority in case of conflict.

(Note that the constraints of (1a) are the same in all languages: they contain no parameters.) Unlike P&P, this is a theory of cross-linguistic variation with sufficient structure to enable grammatically-informed learning algorithms that are independent of substantive grammatical assumptions - this is the central result discussed in this paper:

(2) Main claim
OT is a theory of UG that provides sufficient structure at the level of the grammatical framework itself to allow general but grammatically-informed learning algorithms to be formally defined and proved correct and efficient.

The algorithms we develop are procedures for learning the priority ranking of constraints which, by (1b), is all that distinguishes the grammar of a particular language. These are unquestionably grammar-learning algorithms, not generic search algorithms. Yet the structure that makes these algorithms possible is not the structure of a theory of stress, nor a theory of phonology: it is the structure defining any OT grammar: (1). Of course, if a grammatically-uninformed learning algorithm, such as the Triggering Learning Algorithm, is desired, it can be instantiated as easily in OT as in P&P; in fact, Pulleyblank and Turkel (1995, 1998) have already formulated and studied the 'Constraint-Ranking Triggering Learning Algorithm'.
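For concreteness, here is a toy sketch of a triggering-style learner of the kind discussed above, with the grammar treated purely as a black box over binary parameters. Everything here - the helper names, the toy notion of 'analyzable', the trigger encoding - is my own illustration, not Gibson and Wexler's actual system:

```python
import random

def triggering_learner(data, analyzable, n_params, steps=5000, seed=1):
    """Greedy single-flip search in the style of the Triggering Learning
    Algorithm: on an unanalyzable datum, flip one randomly chosen
    parameter, keeping the flip only if it makes that datum analyzable."""
    rng = random.Random(seed)
    grammar = [rng.randrange(2) for _ in range(n_params)]
    for _ in range(steps):
        datum = rng.choice(data)
        if analyzable(grammar, datum):
            continue                      # no parsing failure, no learning
        i = rng.randrange(n_params)       # flip a single random parameter
        candidate = grammar[:]
        candidate[i] ^= 1
        if analyzable(candidate, datum):  # 'greediness': keep only if it helps
            grammar = candidate
    return grammar

# Toy data: each datum is a 'trigger' directly revealing one parameter's
# value - exactly the idealized assumption questioned in the text above.
target = [1, 0, 1, 1]
triggers = [(i, v) for i, v in enumerate(target)]
learned = triggering_learner(triggers, lambda g, d: g[d[0]] == d[1], 4)
```

Note how little grammatical knowledge the learner uses: the grammar enters only through the yes/no `analyzable` test, which is precisely why such algorithms are generic search rather than grammar-informed learning.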
Indeed, any number of generic search algorithms can be applied to the space of OT grammars: for example, Pulleyblank and Turkel (1995, 1998) have also applied the genetic algorithm to learning OT grammars. But unlike P&P, with OT we have an alternative to grammatically uninformed learning: learning algorithms specially constructed to exploit the structure provided by OT's theory of cross-linguistic variation.2
2 Any algorithm is a solution to an abstract, formal problem, and thus to any particular problem that can be cast as an instance of that formal problem. Thus our learning algorithms apply to any learning problem in which the learner's task is to find an unknown ranking of a given set of constraints with respect to which a given set of structures are optimal (3). Because the only known problem with this structure is that of learning grammars under OT, this algorithm can be said to exploit characteristically linguistic structure. By contrast, generic P&P learning algorithms apply to any learning problem in which the search space is parameterized; this weak structure is shared by a huge class of learning problems, in which parameterized grammars have no distinguished status. Thus the weak generic P&P learning algorithms are equally suitable a priori to Chomskyan P&P grammar learning, to OT grammar learning, to learning to distinguish rocks from submarines by their sonar echoes, and to a host of other problems. The algorithms developed below acquire their relative strength from exploiting stronger structure characteristic of a much more restricted class of learning problems: only of the OT grammar learning problem, as far as we know.

To begin the study of OT learnability, the problem to be solved must first be formulated rather precisely. This turns out to be a somewhat involved matter in itself: it is the topic of Section 2. This analysis requires one further concept from OT, which will be developed in Section 3.2; for now, the following synopsis will suffice:

(3) Harmony and Optimality
a. The grammar of a particular language is an evaluator of structural descriptions, assigning a degree of Harmony which (non-numerically) assesses the degree to which universal well-formedness conditions are met, taking into account language-particular constraint priorities. This provides the harmonic ordering of forms, ordering structural descriptions from maximal to minimal Harmony.
b. The grammatical forms of the language are the optimal ones: the well-formed structural description of an input is the one with maximal Harmony.

2. Decomposing the language learning problem

Our OT learnability results address a grammar learning problem which must be carefully separated from a closely related but quite distinct problem. Defining and justifying this separation is the goal of this section.

2.1. Grammar learning and robust interpretive parsing

To begin, we must distinguish three elements:

(4) The players, in order of their appearance
a. Overt part of grammatical forms: directly accessible to the learner.
b. Full structural descriptions: combine overt and non-overt, 'hidden', structure.
c. The grammar: determines which structural descriptions are well-formed.

These three elements are all intimately connected, yet we propose to distinguish two sub-problems, as schematically shown in Fig. 1.

(5) Decomposition of the problem
a. Robust Interpretive Parsing: mapping the overt part of a form into a full structural description, complete with all hidden structure - given a grammar.
b. Learning the Grammar - given a (robust) parser.
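The evaluation scheme in (1) and (3) can be sketched in a few lines: a grammar is just a ranking of the universal constraints, and the optimal candidate is the one whose violation profile, read off in ranking order, is best. The two constraints below are loosely modeled on Grimshaw and Samek-Lodovici's subject constraints, but the names, candidate encoding, and data are my own simplifications:

```python
def violations(candidate, ranking):
    """Violation profile of a candidate, in the grammar's ranking order."""
    return tuple(constraint(candidate) for constraint in ranking)

def optimal(candidates, ranking):
    """(3b): the grammatical form is the maximally Harmonic candidate,
    i.e. the one whose violation profile is lexicographically least."""
    return min(candidates, key=lambda c: violations(c, ranking))

# Illustrative constraints: clauses should have overt subjects; topical
# subjects should be dropped.
SUBJECT    = lambda c: 0 if c['overt_subject'] else 1
DROP_TOPIC = lambda c: 1 if c['overt_subject'] and c['topic_subject'] else 0

candidates = [
    {'form': 'ha cantato',     'overt_subject': False, 'topic_subject': False},
    {'form': 'lui ha cantato', 'overt_subject': True,  'topic_subject': True},
]

# Same universal constraints, different rankings, different languages (1b):
null_subject_lang  = optimal(candidates, [DROP_TOPIC, SUBJECT])
overt_subject_lang = optimal(candidates, [SUBJECT, DROP_TOPIC])
```

Under the first ranking the subjectless candidate wins; under the second, the overt-subject candidate wins - cross-linguistic variation from reranking alone, with no parameters anywhere in the constraints.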
[Figure: schematic linking the overt part of forms, full structural descriptions, and the grammar]
Fig. 1. Problem decomposition.
('Robust' will mean the ability to parse overt structure that is not grammatical according to the grammar; the importance of this will be discussed shortly.) The question we address is whether a grammatical theory provides sufficient structure so that procedures for both parsing and grammar learning can strongly exploit grammatical principles. We will argue that for Optimality Theory, the answer is yes.

We propose that the problems of parsing and grammar learning be de-coupled to some degree. Such separation does at first seem problematic, however. One of the central difficulties of language learning, of course, is that grammars refer crucially to non-overt, hidden structure. Let's take the acquisition of stress as an expository example. The problem, then, is that the grammatical principles concern metrical feet, yet these are hidden in the data presented to the learner: only the location of some stressed syllables is provided overtly. The learner can't learn the metrical grammar until she knows where the feet lie; but she can't know where the feet lie until she knows the grammar. It is the burden of this section to argue that, despite this conundrum, partial de-coupling of the parsing and learning problems makes good sense.

2.2. Iterative model-based solutions to the problem of learning hidden structure

The learner can't deduce the hidden structure in learning data until she has learned the grammar; but she can't learn the grammar until she has the hidden structure. This feature of the language learning problem is challenging indeed; but not at all special to language, as it turns out. This problem of learning in the face of crucial hidden structure has been extensively studied in the learning theory literature (often under the name 'unsupervised learning', e.g., Hinton, 1989).
Much of this work has addressed perception, including automatic speech recognition (mostly under the name 'Hidden Markov Models', e.g., Baum and Petrie, 1966; Bahl et al., 1983; Brown et al., 1990). This problem
has been quite successfully addressed, in both theory and practice, with a class of algorithms, the most important of which is the Expectation Maximization (EM) algorithm (Dempster et al., 1977; for recent tutorial introductions, see Nadas and Mercer, 1996; Smolensky, 1996a). The basic idea common to this class of algorithms, which we will call iterative model-based learning algorithms, may be characterized in very general terms as follows:3

(6) Iterative model-based solution to the problem of learning hidden structure
0. Adopt some initial model of the relation between hidden and overt structure (e.g., a grammar); this can be a random or a more informed initial guess.
1. Given this initial model, and given some overt learning data, find the hidden structure which, together with the observed data, is most probable according to the model. This first step of the algorithm is performed on all the available data.
2. Now that we have assigned hidden structure (initially incorrect) to the overt data, we use it to improve our model. We adopt as the new model that which makes most likely the analyzed training data: the overt data together with the hidden structure assigned in Step 1.
1'. Now that the model has been changed, it will assign different (generally more correct) hidden structure to the original overt data. So we re-assign hidden structure to the data, re-executing Step 1.
2'. This new assignment of hidden structure permits Step 2 to be repeated, leading to a new (generally improved) model. And so Steps 1 and 2 are executed repeatedly.

This is summarized in row a of (7); rows b-d summarize the rest of this section.
3. This characterization corresponds most closely to the Viterbi algorithm; see, e.g., Nadas and Mercer, 1996; Smolensky, 1996a.
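As a concrete illustration, the generic loop in (6) can be sketched in Python. The function names (`best_hidden`, `best_model`) and the toy clustering usage are our own illustrative stand-ins for the two sub-problem solvers, not part of the authors' proposal.

```python
# Schematic sketch of the iterative model-based scheme in (6).
# `best_hidden` and `best_model` are hypothetical stand-ins for the
# two sub-problem solvers (the Step 1 and Step 2 procedures).

def iterative_model_based_learning(overt_data, initial_model,
                                   best_hidden, best_model, n_iterations=10):
    model = initial_model                                  # Step 0
    for _ in range(n_iterations):
        # Step 1: assign each overt datum the hidden structure that is
        # most probable under the current model.
        analyzed = [(best_hidden(model, d), d) for d in overt_data]
        # Step 2: adopt the model that makes the analyzed data most probable.
        model = best_model(analyzed)
    return model

# Toy instance (k-means-style clustering, an illustration only): overt data
# are numbers, the hidden structure is a cluster label, and the model is a
# pair of cluster means.
data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]

def nearest_mean(model, d):
    return 0 if abs(d - model[0]) <= abs(d - model[1]) else 1

def reestimate_means(analyzed):
    groups = ([], [])
    for label, d in analyzed:
        groups[label].append(d)
    return tuple(sum(g) / len(g) if g else 0.0 for g in groups)

learned = iterative_model_based_learning(data, (0.0, 10.0),
                                         nearest_mean, reestimate_means)
```

Even from the poor initial model (0.0, 10.0), the loop settles on the two data clusters after a few iterations, mirroring the convergence behavior described in the text.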
(7) Iterative model-based learning algorithms, from probabilistic data modeling to OT

Execute repeatedly:

a. Under general probabilistic data modeling:
   Step 1: Find the hidden structure that is most probable when paired with the overt data, given the current model of overt/hidden relations.
   Step 2: Find the model that makes Step 1's pairing of overt and hidden structure most probable.

b. Under Harmony Theory / Harmonic Grammar:
   Step 1: Find the hidden structure that is most Harmonic (probable) when paired with the overt data, given the current grammar (model of overt/hidden relations).
   Step 2: Find the grammar that makes Step 1's pairing of overt and hidden structure most Harmonic (probable).

c. Under OT (RIP/CD):
   Step 1 (Robust Interpretive Parsing): Find the hidden structure consistent with the overt learning data that has maximal Harmony, given the current grammar.
   Step 2 (Grammar Learning): Find a grammar that makes Step 1's pairing of overt and hidden structure optimal.

d. Correctness criterion:
   Step 1: Correctly compute this hidden structure, given the correct model/grammar.
   Step 2: Correctly compute this model/grammar, given the correct hidden structure.
In various formalizations, it has been proven that iterative model-based algorithms converge to a model which is in some sense optimal; in practice, convergence often occurs after only a few iterations, even with a very poor initial model. The key to success is combining correct solutions to the two sub-problems addressed in Steps 1 and 2. Crucially, 'correct' here means: finding the correct solution to one sub-problem, assuming the other sub-problem has been correctly solved. As summarized in row d of (7):

(8) Correctness criteria for solutions of iterative model-based sub-problems
Step 1. Given the correct model of overt/hidden relations, correctly compute the hidden structure that is most probable when paired with the overt data.
Step 2. Given the correct hidden structure, correctly compute the model that makes the given pairing of overt and hidden structure most probable.

The iterative model-based approach to learning can be connected directly with OT with the mediation of Harmony Theory (Smolensky, 1983, 1986), from the theory of neural networks. In Harmony Theory, the well-formedness of a representation in a neural network is numerically measured by its Harmony value H, and the probability of a representation is governed by its Harmony: the greater the Harmony, the higher the probability (prob ∝ e^H). A representation has a hidden part and an overt
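The relation prob ∝ e^H amounts to a normalized exponential (softmax) over Harmony values. A minimal sketch, with Harmony values that are purely hypothetical:

```python
import math

def harmony_to_probs(harmonies):
    """Probability distribution over structures, with prob proportional to e^H."""
    weights = [math.exp(h) for h in harmonies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical Harmony values for three competing hidden structures paired
# with one overt form: the greater the Harmony, the higher the probability,
# so the first structure is the most probable interpretation.
probs = harmony_to_probs([-1.0, -2.0, -4.0])
```

Step 1 of the iterative algorithm then amounts to picking the hidden structure with maximal Harmony, i.e., maximal probability.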
part, and the Harmony function provides the model which relates these two parts: given some overt structure, associating it with different hidden structures leads to different Harmony values (and hence different probabilities). In Step 1 of the iterative learning algorithm (6), given some overt learning data we find the hidden structure which makes the overt data most probable: this means finding the hidden structure which maximizes Harmony, when associated with the given overt structure. In Step 2, we use this hidden structure to change the model: change the Harmony function so that the just-derived hidden/overt associations have the greatest possible Harmony. In Harmonic Grammar (Legendre et al., 1990a,b), an application of Harmony Theory to linguistics, the overt and hidden structures are parts of linguistic structural descriptions, and the model that governs the relation between overt and hidden structure is a grammar. In this context, the iterative algorithm (7a) becomes (7b), and the correctness criteria (8) for the corresponding sub-problems become (7d), with 'grammar' in place of 'model'. In Optimality Theory, the Harmony of structural descriptions is computed from the grammar non-numerically, and there is (as yet) no probabilistic interpretation of Harmony. But the learning procedure of the preceding paragraph still makes perfect sense; it is summarized in (7c), and labeled RIP/CD, for 'Robust Interpretive Parsing/Constraint Demotion': Constraint Demotion is the procedure we will propose for performing the Grammar Learning of Step 2. Given some overt learning data, RIP/CD first computes the hidden structure which has maximal Harmony when combined with the overt structure. Given learning data consisting of a sequence of syllables with stresses, for example, we find the foot structure which, in combination with the given stress pattern, has maximal Harmony.
Which foot structure this is depends jointly on the overt stresses and on the currently assumed grammar - the current ranking of metrical constraints. So the algorithm proceeds as follows. Start with an initial grammar (an issue we touch upon in Section 6.2). In Step 1 (the RIP step), use this grammar to assign (initially incorrect) hidden structure to the overt learning data by maximizing Harmony. In Step 2 (the CD step), use this hidden structure to learn a new grammar, one in which each combined hidden/overt structure of the currently analyzed data has higher Harmony than all its competitors. With this improved grammar, return to Step 1 and repeat. Whether this iterative algorithm can be proved to converge, whether it converges in a reasonable time, whether it can be converted into an algorithm that operates effectively on one datum at a time - these and other issues are all open research problems at the moment. But successful experimental results (Tesar, 1997, 1998b) and previous general experience with iterative model-based algorithms suggest that there are reasonable prospects for good performance, provided we can devise efficient and correct algorithms for solving the two sub-problems of our decomposition (7): robust interpretive parsing and grammar learning. The criteria of correctness (8) are now:

(9) Correctness criteria for solutions to sub-problems under OT
a. Robust interpretive parsing: Given the correct grammar, compute the hidden structure that is most harmonic when paired with the overt data.
b. Grammar learning: Given the correct hidden structure, find the grammar that makes the pairing of overt and hidden structure optimal.

We are proposing here to analytically separate the parsing and the grammar learning problems; the latter is the topic of this paper, and we restrict our remarks concerning the former to a few points. With respect to the first sub-problem (9a), parsing, it is essential that we have a parser that can use a grammar to assign hidden structure to overt forms that are not grammatical according to that very grammar: this is what we mean by 'robustness'. For our problem decomposition, we now see, imposes a seemingly paradoxical requirement. An overt form will be informative (allow the learner to improve the grammar) if the current grammar (incorrectly) declares it to be ungrammatical. Step 1 of the RIP/CD algorithm requires that we use our current (incorrect) grammar to parse this input (assign it hidden structure), even though the grammar declares it ill-formed. For many formal grammars, an ungrammatical form is, by definition, unparsable; yet Step 1 requires the grammar to parse it just the same. OT grammars can naturally cope with this demand. An OT grammar provides a Harmony ordering of all full structural descriptions (3). This Harmonic Ordering can be used in a variety of ways. The customary use is as follows: given an input I, Gen(I) is the set of all structural descriptions of I; we find the maximal-Harmony member of this set, and it is the output assigned to I. This use of the grammar corresponds to the 'language generation' problem of computational linguistics, or the 'language production' problem of psycholinguistics. But as proposed in Smolensky (1996b), Harmonic Ordering can be used for the reverse 'language interpretation' or 'language comprehension' problem as well. Now we are given an overt 'phonetic' form φ. The set Int(φ) is the class of all structural descriptions with overt part equal to φ.
Call the maximal-Harmony member of this set the interpretive parse assigned to φ by the grammar. Crucially for present purposes, this interpretation process makes sense even when the grammar declares φ ungrammatical (i.e., even when there is no input I for which the optimal member of Gen(I) has overt form φ). An algorithm that can compute this mapping from φ to its interpretive parse is thus a robust interpretive parser capable of performing Step 1 of the RIP/CD algorithm. Parsing algorithms that exploit the optimization character of OT grammars have been developed (Ellison, 1994; Tesar, 1994, 1995a,b, 1996; Eisner, 1997; see also Frank and Satta, in press; Karttunen, 1998). Under general formal assumptions on Gen and Con, these algorithms are proved to be correct and efficient. These algorithms address the parsing problem in the 'language production' direction, rather than the 'language interpretation' direction required here: given an input to the grammar, they compute its optimal structural description. We will call this production-directed parsing to contrast it with the interpretive parsing used in RIP/CD. There are a number of problems which must be solved in constructing parsing algorithms which meet the correctness criterion of robust interpretive parsing in (9a), but positive results have already been achieved (Tesar, in press; see also Hammond, 1997). All evidence to date indicates that the optimization character of OT does not increase the computational cost of parsing. Developers of grammatical frameworks
typically assume that the computational demands of parsing can be successfully met, and there is already evidence that this is the case for parsing in OT. We thus assume that algorithms for computing optimal interpretive parses from overt inputs exist; obviously, these are needed eventually for a comprehensive OT theory of language processing. The optimization character of OT means that a solution to the interpretive parsing problem naturally provides a robust parser: the optimization required to parse an ungrammatical structure is the same as that required to parse a grammatical structure. The point is that such an interpretive parser, already required independently of learning considerations, can be directly recruited via the RIP/CD algorithm into the service of grammar learning. In the context of theoretical - as opposed to computational - linguistics, formal, general, efficient algorithms for parsing are neither commonly exhibited nor commonly needed. We will see in our concrete little example of RIP/CD in Section 5 that, just as OT in theoretical linguistics has to date been entirely workable without explicit production-directed parsing algorithms, RIP/CD is equally workable without explicit interpretive parsing algorithms. The hand-calculational tool of constraint tableaux will allow us to find the interpretive parse of an overt structure just as it has always allowed OT practitioners to find the production-directed parse of an underlying form. We now put aside parsing until Section 5 and focus on the grammar learning problem. To meet the correctness criterion for this problem (9b), we need to solve the following formal problem:

(10) Our learning problem
Given: The universal components of any OT grammar
       Learning data in the form of [correct] full structural descriptions of grammatical forms
Find:  A language-particular OT grammar with respect to which all the forms in the given data are optimal

In Section 4 we summarize a family of solutions to this problem.
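The contrast just drawn between production-directed parsing (optimize over all parses of a given input) and robust interpretive parsing (optimize over all parses with a given overt part) can be sketched over a small finite candidate set. Everything below - the candidate encoding, the constraint names, the data - is our own illustrative construction, not the authors' parser:

```python
# Each candidate records its input, its overt part, and its violation marks.
# A ranking is a list of constraint names, highest-ranked first.

def harmony_key(marks, ranking):
    # Violation counts read off in ranking order; Python compares tuples
    # lexicographically, so a smaller key means a more harmonic candidate.
    return tuple(marks.count(c) for c in ranking)

def production_parse(candidates, inp, ranking):
    """Optimal structural description among the candidates for input `inp`."""
    pool = [c for c in candidates if c['input'] == inp]        # parses of inp
    return min(pool, key=lambda c: harmony_key(c['marks'], ranking))

def interpretive_parse(candidates, overt, ranking):
    """Max-Harmony description whose overt part equals `overt`; defined even
    when the grammar generates no output with that overt part (robustness)."""
    pool = [c for c in candidates if c['overt'] == overt]      # parses of overt
    return min(pool, key=lambda c: harmony_key(c['marks'], ranking))

# Toy grammar: overt form 'B' is ungrammatical (the production parse of the
# only input is 'A'), yet interpretive parsing still assigns 'B' a parse.
cands = [
    {'name': 'p1', 'input': 'I', 'overt': 'A', 'marks': ['C2']},
    {'name': 'p2', 'input': 'I', 'overt': 'B', 'marks': ['C1']},
    {'name': 'p3', 'input': 'I', 'overt': 'B', 'marks': ['C1', 'C2']},
]
ranking = ['C1', 'C2']
```

Here the same optimization routine serves both directions; only the pool of candidates over which it optimizes differs, which is the point made in the text.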
We briefly return in Section 5 to illustrate how our solutions to the grammar learning problem can be combined with robust interpretive parsing to yield the larger RIP/CD learning algorithm which operates only on overt learning data. In Section 6 we provide an overall summary of the OT learnability theory we are sketching here. We begin in Section 3 with a quick review of OT so that we may precisely specify what we mean in the learning problem (10) by 'the universal components of any OT grammar' and 'a language-particular OT grammar'. But before reviewing OT we should clear up one potential confusion regarding problem (10), the focus of this paper. At first glance, this problem may seem trivial, since knowing the full structural descriptions provides considerable information about the grammar which is not evident in the overt data. What this glance crucially misses is that in OT, the grammatical principles (constraints) interact in a rich, complex way. There is nothing like a transparent mapping from the hidden structure to the grammar: the explanatory power of OT lies precisely in the diversity of structural consequences of a constraint embedded within a hierarchy. Knowing the location of the metrical feet in a word, for example, leaves one far short of knowing the metrical grammar. For an OT grammar is a collection of violable constraints, and any given foot structure will typically involve the violation of many different constraints, and many OT language-particular grammars will be consistent with the given foot structure. A learning algorithm faces a serious challenge in deducing the grammar that explains not just the given form, but also the forms yet to be seen. (Linguists who have actually faced the problem of deducing OT grammars from a complete set of full structural descriptions can attest to the non-triviality of solving this problem, especially in the general case. Indeed, the class of learning algorithms we will discuss has practical value for linguists working in Optimality Theory. Given hypothesized structural descriptions of language data and a hypothesized set of constraints, the algorithms can quickly and easily provide a class of constraint rankings which account for the data, or directly determine that no such ranking exists; see T&S: §11.6.) The difficulty of the problem of deducing grammars from full structural descriptions is related to the explanatory power of OT as a linguistic theory. We return to this important issue, and some comparisons to Principles-and-Parameters learnability work, in Section 6.1, after we have presented the relevant OT principles and learnability results.
To summarize, in this section we have argued that the following proposal is well-motivated on general learning-theoretic grounds, and naturally applicable to OT:

(11) Problem decomposition
Two interrelated sub-problems may be distinguished:
- Robust interpretive parsing
- Grammar learning
Given a solution to each sub-problem, each correct provided the other sub-problem has been correctly solved, the two can be applied iteratively to yield the RIP/CD algorithm, which learns grammars from overt data.

Having developed a problem decomposition in which the grammar learning sub-problem is formulated as in (10), we now proceed to flesh out the problem with the particular grammatical structure provided by OT.

3. The grammar learning problem in Optimality Theory

Optimality Theory defines grammaticality by optimization over violable constraints. The defining reference is Prince and Smolensky, 1993 (abbreviated 'P&S' here; see also McCarthy and Prince, 1993). Sections 3.1 and 3.2 below provide the necessary OT background, while Section 3.3 formulates the precise OT grammar learning problem addressed in the subsequent sections. Readers familiar with OT may wish to move directly to Section 3.3.
We present the basics of OT as a series of general principles, each exemplified within the theory of the distribution of clausal subjects proposed in Grimshaw and Samek-Lodovici (1995) (see also Samek-Lodovici, 1994, 1996; Grimshaw and Samek-Lodovici, 1998). This theory, which we will dub 'GSL', is also used in Section 4 to illustrate the learning procedure, and in Fig. 2 to schematically depict the core structure of OT.

[Fig. 2. Optimization in OT. The figure shows Gen mapping the input (sing(x), x=topic, x=he; Tense=present perfect) to a set of candidate outputs, among them [IP heᵢ has [tᵢ sung]] and [IP has sung], from which the optimal candidate is selected.]
3.1. Constraints and their violation

(12) Grammars specify functions
A grammar specifies a function which assigns to each input a structural description or output. (A grammar per se does not provide an algorithm for computing this function, e.g., by sequential derivation.)

In GSL, an input is "a lexical head with a mapping of its argument structure into other lexical heads, plus a tense specification ... as in Grimshaw (1995). The ... input also specifies which arguments are foci, and which arguments are coreferent with the topic." (Grimshaw and Samek-Lodovici, 1995: 590). The example we will use is shown in (13) (and Fig. 1); it represents the predicate sing, in the present perfect tense, with a masculine singular argument that is the current discourse topic. An output in GSL is an X' structure, a possible extended projection for the lexical head in the sense of Grimshaw, 1991. Several examples are given in (13):

(13) Some outputs for the input I = (sing(x), x=topic, x=he; Tense=present perfect)
a. [IP has [sung]]
b. [IP heᵢ has [tᵢ sung]]
c. [IP has [[tᵢ sung] heᵢ]]
d. [IP it has [[tᵢ sung] heᵢ]]
174
B.B. Tesar, P. Smolensky I Lingua 106 (1998) 161-196
In the discussion below, these outputs will consistently be labeled a-d as in (13). Output a is a clause with no subject: the highest projection of the verb, labeled IP, has no Spec position. Output b has he in SpecIP, co-indexed with a trace in SpecVP. Output c has no SpecIP position, and he right-adjoined to VP, co-indexed with a trace in SpecVP; output d is the same, but with an expletive subject in SpecIP.

(14) Gen: Universal Grammar provides a function Gen which, given any input I, generates Gen(I), the set of candidate structural descriptions (or 'parses') for I.

For the input I given in (13), Gen(I) includes the four outputs a-d, along with others such as the entirely empty null parse, ∅. Each structural description of I in Gen(I) should be understood to include I itself as a subpart, along with the output X' structure. Following McCarthy and Prince (1995), we may assume that each structural description includes a correspondence relation linking the lexical heads in I with their correspondents in the output. Output (13a) displays underparsing: an element of the input, x, has no correspondent in the output. Output (13d) displays overparsing: an element of the output, it, has no correspondent in the input.

(15) Con: Universal Grammar provides a set Con of universal well-formedness constraints.

The constraints in Con evaluate the candidate outputs for a given input in parallel (i.e., simultaneously). Given a candidate output, each constraint assesses a multi-set of marks, where each mark corresponds to one violation of the constraint. The collection of all marks assessed a candidate parse p is denoted marks(p). A mark assessed by a constraint C is denoted *C. A parse a is more marked than a parse b with respect to C iff C assesses more marks to a than to b. (The theory recognizes the notions more- and less-marked, but not absolute numerical levels of markedness.) The GSL constraints are given in (16) (Grimshaw and Samek-Lodovici, 1995: 590).
(16) Constraints of the GSL theory of subjects
a. SUBJ(ECT): The highest A-specifier in an extended projection must be filled (Grimshaw, 1995).
b. FULL-INT(ERPRETATION): Elements of the output must be interpreted (Grimshaw, 1995).
c. DROP-TOP(IC): Arguments coreferent with the topic are structurally unrealized.
d. AL(IGN)-FOC(US): The left edge of a focussed constituent is aligned with the right edge of a maximal projection.
e. PARSE: Input constituents are parsed (have a correspondent in the output).

These constraints can be illustrated with the candidate outputs in (13), as shown in (17). (ALIGN-FOCUS is vacuously satisfied because this input has no focus.)
(17) Constraint tableau for (English-like) L1
Input: (sing(x), x=topic, x=he; T=pres perf)

                                     | PARSE | SUBJ | FULL-INT | DROP-TOP | AL-FOC
  ☞ b. [IP heᵢ has [tᵢ sung]]        |       |      |          |    *     |
     d. [IP it has [[tᵢ sung] heᵢ]]  |       |      |    *     |    *     |
     c. [IP has [[tᵢ sung] heᵢ]]     |       |  *   |          |    *     |
     a. [IP has [sung]]              |   *   |  *   |          |          |
We can interpret PARSE and FULL-INTERPRETATION as members of the FAITHFULNESS family of constraints, which play the important role in OT of requiring that an output faithfully parse its input: each input element has one output correspondent with identical featural content, and vice versa. (Relative to OT phonology, the technical details of FAITHFULNESS in OT syntax are more obviously an open question for research. In phonology the 'vocabulary' of the input and output are more nearly identical, so requiring one-to-one correspondence between input and output is more straightforward.)

3.2. Optimality and Harmonic Ordering

The central notion of Optimality now makes its appearance. The idea is that by examining the marks assigned by the universal constraints to all the candidate outputs for a given input, we can find the least marked, or optimal, one; the only well-formed parse assigned by the grammar to the input is the optimal one (or optimal ones, if several parses should tie for Optimality). The relevant notion of 'least marked' is not the simplistic one of just counting numbers of violations. Rather, in a given language, different constraints have different strengths or priorities: they are not all equal in force. When a choice must be made between satisfying one constraint or another, the stronger must take priority. The result is that the weaker will be violated in a well-formed structural description.

(18) Constraint Ranking
A grammar ranks the universal constraints in a dominance hierarchy. When one constraint C1 dominates another constraint C2 in the hierarchy, the relation is denoted C1 » C2. The ranking defining a grammar is total: the hierarchy determines the relative dominance of every pair of constraints: C1 » C2 » ... » Cn

(19) Harmonic Ordering (H-eval)
A grammar's constraint ranking induces a harmonic ordering ≺ of all structural descriptions.
Two structures a and b are compared by identifying the highest-ranked constraint C with respect to which a and b are not equally marked: the
candidate which is less marked with respect to C is the more harmonic, or the one with higher Harmony (with respect to the given ranking). a ≺ b denotes that a is less harmonic than b. The harmonic ordering ≺ determines the relative Harmony of every pair of candidates. For a given input, the most harmonic of the candidate outputs provided by Gen is the optimal candidate: it is the one assigned to the input by the grammar. Only this optimal candidate is well-formed; all less harmonic candidates are ill-formed.

A formulation of harmonic ordering that will prove quite useful for learning involves Mark Cancellation. Consider a pair of competing candidates a and b, with corresponding lists of violation marks marks(a) and marks(b). Mark Cancellation is a process applied to a pair of lists of marks: it cancels violation marks in common to the two lists. Thus, if a constraint C assesses one or more marks *C to both marks(a) and marks(b), an instance of *C is removed from each list, and the process is repeated until at most one of the lists still contains a mark *C. (Note that if a and b are equally marked with respect to C, the two lists contain equally many marks *C, and all occurrences of *C are eventually removed.) The resulting lists of uncancelled marks are denoted marks'(a) and marks'(b). If a mark *C remains in the uncancelled mark list of a, then a is more marked with respect to C. If the highest-ranked constraint assessing an uncancelled mark has a mark in marks'(a), then a ≺ b: this is the definition of harmonic ordering ≺ in terms of mark cancellation. Mark cancellation is indicated by crossing out cells in the tableau (20): the mark *DROP-TOP cancels between the two candidates b and c of (17), as both candidates fail to drop the topic he. After cancellation, candidate b has no remaining marks, and is thus more harmonic than c, which has one uncancelled mark *SUBJ (since c, unlike b, has no SpecIP position).
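Mark Cancellation and the resulting pairwise comparison can be sketched directly in code. The multiset representation via `collections.Counter` is our own implementation choice; the b/c data are read off tableau (17):

```python
from collections import Counter

def cancel_marks(marks_a, marks_b):
    """Remove marks common to the two lists; return the uncancelled
    remainders marks'(a) and marks'(b)."""
    a, b = Counter(marks_a), Counter(marks_b)
    common = a & b                       # multiset intersection
    return list((a - common).elements()), list((b - common).elements())

def more_harmonic(marks_a, marks_b, ranking):
    """True iff a candidate with marks_a is more harmonic than one with
    marks_b, under `ranking` (constraint names, highest-ranked first)."""
    ua, ub = cancel_marks(marks_a, marks_b)
    for c in ranking:                    # find the highest-ranked difference
        if ua.count(c) != ub.count(c):
            return ua.count(c) < ub.count(c)
    return False                         # equally marked: tied

# The b/c comparison of tableau (20), under the L1 ranking of (22):
L1 = ['PARSE', 'SUBJ', 'FULL-INT', 'DROP-TOP', 'AL-FOC']
b_marks = ['DROP-TOP']                   # [IP he_i has [t_i sung]]
c_marks = ['SUBJ', 'DROP-TOP']           # [IP has [[t_i sung] he_i]]
```

After cancellation, b has no uncancelled marks and c retains *SUBJ, so b is the more harmonic of the pair, as in the text.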
(20) Mark Cancellation
Input: (sing(x), x=topic, x=he; T=pres perf)

                                     | PARSE | SUBJ | FULL-INT | DROP-TOP | AL-FOC
  ☞ b. [IP heᵢ has [tᵢ sung]]        |       |      |          |   [*]    |
     c. [IP has [[tᵢ sung] heᵢ]]     |       |  *   |          |   [*]    |

(Marks in brackets are cancelled; in the original tableau, cancellation is indicated by crossing out the cells.)
Defining grammaticality via harmonic ordering has an important consequence:

(21) Minimal Violation
The grammatical candidate minimally violates the constraints, relative to the constraint ranking.

The constraints of UG are violable: they are potentially violated in well-formed structures. Such violation is minimal, however, in the sense that the grammatical parse p of an input I will satisfy a constraint C, unless all candidates that fare better than p on C also fare worse than p on some constraint which is higher ranked than C (or unless there simply are no candidates in Gen(I) that satisfy C).
Harmonic ordering can be illustrated with GSL by reexamining the tableau (20) under the assumption that the universal constraints are ranked by a particular grammar of a language L1 with the ranking given in (22). (This is a language like English with respect to the distribution of subjects.)

(22) Constraint hierarchy for (English-like) L1:
PARSE » SUBJ » FULL-INT » DROP-TOP » AL-FOC

The constraints are ordered in (20) left-to-right, reflecting the hierarchy in (22). The candidates in this tableau have been listed in harmonic order, from highest to lowest Harmony; the optimal candidate is pointed out with ☞. Starting at the bottom of the tableau, a ≺ c can be verified as follows. The first step is to cancel common marks: here, *SUBJ. Then c has a single uncancelled *DROP-TOP mark, marks'(c) = {*DROP-TOP}, while a has an uncancelled *PARSE mark, marks'(a) = {*PARSE}; PARSE dominates DROP-TOP; so a is less harmonic. Next we verify that c ≺ d: the sole uncancelled mark of c, *SUBJ, is assessed by a constraint that is higher-ranked in L1 than that assessing the uncancelled mark of d, FULL-INT. Finally, d ≺ b holds because d has an uncancelled mark while b does not. As shown in tableau (17), L1 is a language in which unfocussed topic-referring subjects are parsed into subject position (SpecIP). This English-like behavior changes to Italian-like behavior when the rankings of PARSE and SUBJ are lowered to their positions in the ranking defining language L2.

(23) Constraint hierarchy for (Italian-like) L2:
FULL-INT » DROP-TOP » PARSE » AL-FOC » SUBJ

As shown in tableau (24), now an unfocussed topic-referring subject is not parsed:

(24) Constraint tableau for (Italian-like) L2
Input: (sing(x), x=topic, x=he; T=pres perf)

                                     | FULL-INT | DROP-TOP | PARSE | AL-FOC | SUBJ
  ☞ a. [IP has [sung]]               |          |          |   *   |        |  *
     b. [IP heᵢ has [tᵢ sung]]       |          |    *     |       |        |
     c. [IP has [[tᵢ sung] heᵢ]]     |          |    *     |       |        |  *
     d. [IP it has [[tᵢ sung] heᵢ]]  |    *     |    *     |       |        |
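The reranking effect can be checked mechanically. The violation marks below are read off tableaux (17) and (24); the helper function is our own illustration of harmonic evaluation, not the authors' code:

```python
# Violation marks of the four candidate outputs in (13), as in tableaux
# (17) and (24) (AL-FOC is vacuously satisfied throughout).
marks = {
    'a': ['PARSE', 'SUBJ'],          # [IP has [sung]]
    'b': ['DROP-TOP'],               # [IP he_i has [t_i sung]]
    'c': ['SUBJ', 'DROP-TOP'],       # [IP has [[t_i sung] he_i]]
    'd': ['FULL-INT', 'DROP-TOP'],   # [IP it has [[t_i sung] he_i]]
}

def optimal(marks, ranking):
    """Name of the candidate whose violation profile, read off in ranking
    order, is lexicographically smallest, i.e., most harmonic."""
    def profile(name):
        return tuple(marks[name].count(c) for c in ranking)
    return min(marks, key=profile)

L1 = ['PARSE', 'SUBJ', 'FULL-INT', 'DROP-TOP', 'AL-FOC']   # English-like (22)
L2 = ['FULL-INT', 'DROP-TOP', 'PARSE', 'AL-FOC', 'SUBJ']   # Italian-like (23)
```

With the same four candidates and the same marks, reranking alone changes the optimal output from b (English-like L1) to a (Italian-like L2), which is exactly the typology-by-reranking point.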
The relation between L1 and L2 illustrates a principle of Optimality Theory central to learnability concerns:

(25) Typology by reranking
Systematic cross-linguistic variation is due entirely to variation in language-specific total rankings of the universal constraints in Con. Analysis of the optimal forms arising from all possible total rankings of Con gives the typology of
possible human grammars. Universal Grammar may impose restrictions on the possible rankings of Con.4

Analysis of all rankings of the GSL constraints derives a typology of subject distribution relating the presence or absence of expletive subjects, the pre- or post-verbal positioning of focussed subjects, and the presence or absence of topic-referring subjects (Grimshaw and Samek-Lodovici, 1995, 1998; Samek-Lodovici, 1996). As our last issue concerning OT fundamentals, we return to the question of infinity. In many OT analyses, the free generation of candidates by Gen leads to an infinite number of candidate structural descriptions of each input I. For example, syntactic analyses along the lines of Grimshaw's (1991) conception of extended projections suggest that there is no principled limit to the number of nested maximal verbal projections, so a parsimonious formulation of Gen might well admit candidate parses with any number of such projections. Or in phonology, the possibility of overparsing (epenthesis) typically leads to an infinite candidate set. It is then reasonable to ask whether, in the face of this infinity, the theory is well-defined. Of course, the overwhelming majority of formal systems in mathematics involve an infinity of structures, yet they are well-defined in the most rigorous sense. Echoing a theme of Section 1, the mere fact of infinity means only that the most primitive conceivable method, listing all the possibilities and checking each one, is infeasible. But even in finite cases, this method is commonly infeasible anyway. In order for an OT grammar to be well-defined, it must be that for any input, it is formally determinate which structure is optimal. The necessary formal definitions are provided in P&S: §5. In order to show that a given structure is the optimal parse of I, we need to provide a proof that none of the (possibly infinitely many) other parses in Gen(I) has higher Harmony.
One method, developed in P&S: 115, is the Method of Mark Eliminability: this proceeds by showing that any attempt to avoid the marks incurred by the putatively optimal output leads to alternatives which incur worse marks. Thus the infinite candidate set has a perfectly well-defined optimum (or optima, if multiple outputs incur exactly the same, optimal, set of marks). Yet it might still be the case that the task of actually computing the optimal candidate cannot be performed efficiently. But as discussed above, computational feasibility is not a problem either, at least in the general cases which have been studied to date. One reason is that the infinity of candidates derives from the unbounded potential for more structure than necessary to parse the input. But such extra structure is always penalized by a general sub-class of FAITHFULNESS constraints, the FILL or DEP family: these militate against empty syllable positions in phonology, empty X° positions in syntax (OBLIGATORY-HEADS of Grimshaw, 1993, 1995), uninterpretable elements
4. Ranking restrictions in UG typically encode universal markedness hierarchies; e.g., the universal ranking *LABIAL »UG *CORONAL expresses the universal markedness of labial relative to coronal place of articulation (P&S: §9); ARGOP-SPEC »UG MANOP-SPEC asserts that, when not in specifier position, argument operators are universally more marked than manner operators (Baković, 1998); REF & BAR »UG BAR states that, when crossing a barrier, a non-referential chain link is universally more marked than a referential link (Legendre et al., 1998).
(FULL-INT), and the like. Optimal structures may have extra structure, in violation of FILL, only when that is necessary to avoid violation of higher-ranking constraints. This will not be the case for unbounded quantities of extra structure. It follows that finite inputs will only have a finite number of structural descriptions which are potentially optimal, under some constraint ranking. Thus a parser constructing an optimal parse of a given input I need only have access to a finite part of the infinite space Gen(I). For example, the parsing algorithms developed in Tesar (1994, 1995a,b, 1996) construct optimal parses from increasingly large portions of the input, requiring an amount of computational time and storage space that grows with the size of the input only as fast as for parsers of conventional, rewrite-rule grammars of corresponding complexity. The structure in the space of candidates allows for efficient computation of optimal parses, even though the grammar's specification of well-formedness makes reference to an infinite set of parses.

3.3. The grammar learning problem

Having provided the necessary principles, we can now state a central part of the learning problem for OT grammars:

(26) The grammar learning (sub-)problem in Optimality Theory
Given: The universal components of any OT grammar:
- the set of possible inputs
- the function Gen generating the candidate outputs for any possible input
- the constraints Con on well-formedness
- learning data in the form of full structural descriptions of grammatical forms
Find: A language-particular OT grammar consistent with all the given data:
- a ranking (or set of rankings) of the constraints in Con with respect to which the optimal parses of the inputs in the learning data match the outputs in that data

The initial data for the grammar learning problem are well-formed outputs; each consists of an input together with the structural description which is declared optimal by the target grammar.
For example, the learner of the Italian-like language L2 might have as an initial datum the input I = (sing(x), x=topic, x=he; T=pres perf) together with its grammatical parse, p = [IP has [sung]] (24).

4. Constraint Demotion

Having isolated the grammar learning problem from the parsing problem in Section 2, and provided the grammar learning problem with the UG structure characteristic of OT in Section 3, we are now in a position to present our proposal, Constraint Demotion, for addressing the OT grammar learning problem (26).
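To fix ideas, optimality under a stratified constraint hierarchy can be sketched in executable form. The following Python sketch is ours, not part of the theory: the representation of candidates as (name, violation-count) pairs is an illustrative assumption, and the ranking shown is one consistent with the Italian-like datum above.

```python
# A hedged sketch: a grammar is a stratified hierarchy (a list of
# strata, top-ranked first); candidates are compared by their
# per-stratum violation totals, lexicographically (strict domination).

def mark_vector(marks, hierarchy):
    """Total violations per stratum, top stratum first."""
    return [sum(marks.get(c, 0) for c in stratum) for stratum in hierarchy]

def optimal(candidates, hierarchy):
    """Return the candidates whose violation profiles are minimal."""
    best = min(mark_vector(m, hierarchy) for _, m in candidates)
    return [name for name, m in candidates if mark_vector(m, hierarchy) == best]

# A ranking consistent with the Italian-like datum: DROP-TOP (with
# AL-FOC and FULL-INT) above a lower stratum containing SUBJ and PARSE.
hierarchy = [["DROP-TOP", "AL-FOC", "FULL-INT"], ["SUBJ", "PARSE"]]

# Two parses of I = (sing(x), x=topic, x=he; T=pres perf):
candidates = [
    ("[IP has [sung]]", {"SUBJ": 1, "PARSE": 1}),   # null-subject parse
    ("[IP he_i has [t_i sung]]", {"DROP-TOP": 1}),  # overt-subject parse
]
print(optimal(candidates, hierarchy))  # the null-subject parse is optimal
```

Under this ranking the null-subject parse's two low-stratum marks are outweighed by the overt-subject parse's single high-stratum DROP-TOP mark, as strict domination requires.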
Optimality Theory is inherently comparative; the grammaticality of a structural description is determined not in isolation, but with respect to competing candidates. Therefore, the learner is not informed about the correct ranking by positive data in isolation; the role of the competing candidates must be addressed. This fact is not a liability, but an advantage: a comparative theory gives comparative structure to be exploited. Each piece of positive evidence, a grammatical structural description, brings with it a body of implicit negative evidence in the form of the competing descriptions. Given access to Gen (which is universal) and the underlying form (contained in the given structural description), the learner has access to these competitors. Any competing candidate, along with the grammatical structure, determines a data pair related to the correct ranking: the correct ranking must make the grammatical structure more harmonic than the ungrammatical competitor.

This can be stated more concretely in the context of our running example, the GSL theory. Suppose the learner receives a piece of explicit positive evidence such as the form: p = ((sing(x), x=topic, x=he); T=pres perf; [IP has [sung]]). (Recall that in OT, full structural descriptions consist of an 'input', an 'output', and a correspondence between their elements. This example informs the learner that an unfocussed, topic-referring subject is not overtly realized in the target language.) Now consider any other parse p' of p's input I = (sing(x), x=topic, x=he; T=pres perf); e.g., the parse p' with output [IP he_i has [t_i sung]]. In the general case, there are two possibilities. Either the alternative parse p' has exactly the same marks as p, in which case p' has the same Harmony as p (no matter what the unknown ranking) and must be tied for optimality: p' too then is a grammatical parse of I. This case is unusual, but possible. In the typical case, a competitor p' and p will not have identical marks.
In this case the Harmonic Ordering of Forms determined by the unknown ranking will declare one more harmonic than the other; it must be p that is the more harmonic, since it is given as well-formed learning data, and is thus optimal. Thus for each well-formed example p a learner receives, every other parse p' of the same input must be sub-optimal, i.e., ill-formed, unless p' happens to have exactly the same marks as p. Thus a single positive example, a parse p of an input I, conveys a body of implicit negative evidence: all the other parses p' in Gen(I) (with the exception of parses p' which the learner can recognize as tied for optimality with p in virtue of having the same marks). In our GSL theory, a learner given the positive datum p knows that, with respect to the unknown constraint hierarchy of the language being learned, the alternative parse of the same input, p', is less harmonic: for I = (sing(x), x=topic, x=he; T=pres perf),

[IP he_i has [t_i sung]] ≺ [IP has [sung]].

Furthermore, corresponding harmonic comparisons must hold for every other parse p' in Gen(I). Thus each single piece of positive initial data conveys a large amount of inferred comparative data of the form:

(sub-optimal parse of input I, 'loser') ≺ (optimal parse of input I, 'winner')
Such pairs are what feed our learning algorithm. Each pair carries the information that the constraints violated by the sub-optimal parse loser must out-rank those violated by the optimal parse winner. This can be made precise using the characterization of optimality given above in Section 3.2; (27) is the result.

(27) The Principle of Constraint Demotion
For any constraint C assessing an uncancelled winner mark, if C is not dominated by a constraint assessing an uncancelled loser mark, demote C to immediately below the highest-ranked constraint assessing an uncancelled loser mark.

Constraint Demotion learning algorithms start with an initial ranking of the universal constraints Con in which most constraints are unranked with respect to each other; in the simplest case, all are unranked, and form one stratum of equally-ranked constraints. In tableaux, equal ranking of two adjacent constraints is indicated by a dotted line separating the constraints' columns. Then constraints are demoted following the Principle of Constraint Demotion in response to learning data. At each learning step, the current winner is indicated with ✓; ☞ denotes the structure that is optimal according to the learner's current grammar, which may not be the same as the winner (the structure that is grammatical in the target language). The constraint violations of the winner, marks(winner), are distinguished by the symbol ®. Depending on the precise form of the algorithm, a loser is chosen, and its marks in common with the winner are cancelled. Constraints are demoted according to (27) and more learning data is then analyzed.

To illustrate one demotion for the GSL theory, assume all the constraints of (16) are initially unranked, forming a single stratum. Then suppose the learner receives the form p above (the winner), and selects form p' above as the loser; these forms are respectively shown as a and b in tableau (28).
In this case, there are no common marks to cancel, so we proceed directly to constraint demotion. In order for a to be more harmonic than b, each of a's marks (®) must be dominated by at least one mark (*) of b. This is achieved by demoting the constraints a violates, SUBJ and PARSE, beneath the constraint that b violates, DROP-TOP. After demotion, there are now two strata in the constraint hierarchy, with SUBJ and PARSE forming the lower stratum and the remaining constraints the higher stratum.

(28) Constraint Demotion for (Italian-like) L2
  Input: (sing(x), x=topic, x=he); T=pres perf

                                   SUBJ | DROP-TOP | AL-FOC | FULL-INT | PARSE
  ✓ a. [IP has [sung]]              ®   |          |        |          |   ®
    b. [IP he_i has [t_i sung]]         |    *     |        |          |

(SUBJ and PARSE, bearing a's uncancelled marks ®, are demoted to a new stratum immediately below DROP-TOP.)
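The demotion step in (27) is simple enough to state as executable pseudocode. The following Python sketch is our own rendering (the function name and data structures are not T&S's); it reproduces the demotion illustrated in tableau (28).

```python
# A hedged sketch of one application of the Principle of Constraint
# Demotion (27). A hierarchy is a list of strata (sets of constraint
# names), highest-ranked first.

def demote(hierarchy, winner_marks, loser_marks):
    winner = set(winner_marks) - set(loser_marks)  # uncancelled winner marks
    loser = set(loser_marks) - set(winner_marks)   # uncancelled loser marks
    strata = [set(s) for s in hierarchy]
    # highest stratum containing an uncancelled loser mark
    # (assumes the pair is informative, i.e., `loser` is non-empty)
    top = min(i for i, s in enumerate(strata) if s & loser)
    for i, s in enumerate(strata):
        moved = s & winner
        if i <= top and moved:
            # demote to immediately below the highest loser-mark stratum
            s -= moved
            if top + 1 >= len(strata):
                strata.append(set())
            strata[top + 1] |= moved
    return [s for s in strata if s]

# Demotion (28): all five GSL constraints start as one stratum; the
# winner violates SUBJ and PARSE, the loser violates DROP-TOP.
h0 = [{"SUBJ", "DROP-TOP", "AL-FOC", "FULL-INT", "PARSE"}]
h1 = demote(h0, {"SUBJ", "PARSE"}, {"DROP-TOP"})
# h1 == [{"DROP-TOP", "AL-FOC", "FULL-INT"}, {"SUBJ", "PARSE"}]
```

Constraints already ranked below the highest loser-mark stratum are left in place, exactly as the "if C is not dominated" clause of (27) requires.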
Given a piece of positive data p, how is the corresponding piece of implicit negative data p' required by Constraint Demotion chosen? In the Error-Driven Constraint Demotion Algorithm (Tesar, 1998a), the loser p' is chosen to be the parse of the input I of p which the current grammar declares optimal: the current grammar erroneously declares this to be optimal, instead of p. It can be shown that this choice of p' will be informative for reranking if any candidate is; if the learner's current grammar correctly parses I as p (i.e., if p' = p), no learning can occur.

Having briefly described Constraint Demotion (CD), we now turn to its analysis. The following two theorems are proved in the Appendix of Tesar and Smolensky (1998).

(29) Correctness of Constraint Demotion
Starting with an arbitrary initial ranking of the constraints in Con, and applying Constraint Demotion to informative positive evidence as long as such exists, the process converges on a stratified hierarchy such that all totally-ranked refinements of that hierarchy correctly account for the learning data.5

(30) Data complexity of Constraint Demotion
The number of informative winner/loser pairs required for learning is at most N(N-1), where N = number of constraints in Con.

The significance of the data complexity result is perhaps best illustrated by comparing it to the number of possible grammars. Given that any target grammar is consistent with at least one total ranking of the constraints, the number of possible grammars is the number of possible total rankings, N!. This number grows very quickly as a function of the number of constraints N, and if the amount of data required for learning scaled with the number of possible total rankings, it would be cause for concern indeed. Fortunately, the data complexity of CD is quite reasonable in its scaling.
In fact, it does not take many universal constraints to give a drastic difference between the data complexity of CD and the number of total rankings: when N=10, the CD data complexity is 90, while the number of total rankings is over 3.6 million. With 20 constraints, the CD data complexity is 380, while the number of total rankings is over 2 billion billion (2.43 × 10^18). This reveals the restrictiveness of the structure imposed by Optimality Theory on the space of grammars: a learner can efficiently home in on any target grammar, managing an explosively-sized grammar space with quite modest data requirements by fully exploiting the inherent structure provided by strict domination.

It is important to note that the complexity result (30) gives an upper bound on the number of informative positive examples needed to learn a correct hierarchy. A positive example is informative if and only if the learner's current grammar erroneously

5 That is, the learned grammar generates all the positive evidence. Different initial rankings may lead to different generalizations to unseen examples. If the learner is to generalize conservatively from the evidence, the initial state must meet certain requirements: see Smolensky, forthcoming.
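The arithmetic behind these figures is easy to check directly from the formulas in the text:

```python
# N(N-1) informative pairs (30) versus N! total rankings.
from math import factorial

for n in (10, 20):
    print(n, n * (n - 1), factorial(n))
# n=10:  90 vs. 3,628,800 (over 3.6 million)
# n=20: 380 vs. 2,432,902,008,176,640,000 (about 2.43 x 10^18)
```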
declares it ungrammatical. Thus (30) provides an upper limit to the number of errors that can be made during learning. The only way to delay the learner from arriving at a correct grammar is to withhold examples on which the current grammar is incorrect. And even in this case, the learner's performance remains correct on all data not withheld. For the CD learner, every error made moves the grammar closer to a correct one. This is in clear contrast to the behavior of the trigger- or cue-based learners discussed in Section 1: these will continue to make errors for as long as it takes to receive a datum (a trigger or cue) with the special form needed to enable a learning step.

As argued in Section 1, the number of grammars made available by a grammatical framework is a rather crude measure of its explanatory power. A more significant measure is the degree to which the structure of UG allows rich grammars to be learned with realistically few positive examples. The crude number-of-grammars measure may be the best one can do given a theory of UG which does not enable the better learnability measure to be determined. In OT, however, we do have a quantitative and formally justified measure of learnability available in our N(N-1) limit on the number of informative examples needed to solve our grammar learning problem. And we can see precisely how large the discrepancy can be between the number of grammars made available by a UG and the efficiency of learning that its structure enables.

This dramatic difference between the size of the OT grammar space and the number of informative examples needed to learn a grammar is due to the well-structured character of the space of fully-ranked constraint hierarchies. It is useful to consider a set of parameters in the grammar space that suffice to specify the N! grammars: these parameters state, for each pair of distinct constraints C1 and C2, which is dominant, i.e., whether C1 » C2 or C2 » C1.
There are in fact N(N-1)/2 such dominance parameters, half the maximum number of informative examples needed to learn a correct hierarchy. Efficient learning via Constraint Demotion is possible because the search space allows these dominance parameters to be unspecified (constraints can be unranked), and because evidence for adjusting these dominance parameters can be assessed independently (via the Principle of Constraint Demotion). A single adjustment may not irrevocably set a correct value for any dominance parameter, but each adjustment brings the hierarchy closer to the target, and eventually the adjustments are guaranteed to produce a correct set of parameter values. Note that what is independently adjustable here is not the substantive content of individual grammatical principles: it is the interaction of the principles, as determined by their relative rankings. This is an important point to which we will return in Section 6.1.

5. Recomposing the language learning problem

Having discussed a solution to our grammar learning sub-problem, we now return to the larger problem decomposition in which this sub-problem was embedded in Section 2. There we proposed a learning algorithm, RIP/CD (7), which combines a
solution to our grammar learning problem with robust interpretive parsing to learn from overt learning data:

(31) RIP/CD: Iterative model-based approach to the Problem of Learning Hidden Structure under OT
Step 1. Find the hidden structure consistent with the overt learning data that has maximal Harmony, given the current grammar. [Robust Interpretive Parsing]
Step 2. Find a grammar that makes this pairing of overt and hidden structure optimal. [Grammar Learning by Constraint Demotion]
Starting with some initial grammar, execute Steps 1 and 2 repeatedly.

In this section we sketch an illustration of how RIP/CD might look when applied to the domain of learning stress systems. This illustration is tentative in several respects; we offer it to render concrete the general discussion of Section 2. Our objective in this example is to illustrate as expeditiously as possible how RIP/CD may proceed to learn a grammar from overt data only. To facilitate some more general comparisons, we consider the same grammatical domain as the learnability study of Dresher and Kaye (1990), stress systems. A serious study would start with a characterization of the metrical module of UG already under vigorous development in the OT literature: a set of structures and constraints such that all possible rankings of the constraints yield all and only the observed stress systems. To allow us to proceed directly to the relevant learnability issues, however, we will simply consider a few plausible constraints which allow some major dimensions of metrical variation to arise through reranking.

Two of the major dimensions of variation that a stress learner must cope with, in the fairly standard pre-OT terminology used by Dresher and Kaye (1990), are ±quantity-sensitivity and ±extrametricality. The former is particularly interesting because it can be seen as a rather major distinction in how the input is to be analyzed.
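The alternation in (31) can be sketched as a simple loop. In the Python sketch below, `rip` (robust interpretive parsing) and `learn` (Constraint Demotion) are stand-in callables, and the pass limit and convergence test are our own additions, not part of the proposal:

```python
# A hedged sketch of the RIP/CD loop (31).

def rip_cd(overt_data, grammar, rip, learn, max_passes=100):
    for _ in range(max_passes):
        changed = False
        for overt in overt_data:
            # Step 1: maximal-Harmony hidden structure for this overt form
            winner = rip(overt, grammar)
            # Step 2: rerank so that this full structure becomes optimal
            new_grammar = learn(winner, grammar)
            if new_grammar != grammar:
                grammar, changed = new_grammar, True
        if not changed:  # no datum forced a demotion: stop
            break
    return grammar
```

With a `learn` that never changes the grammar (i.e., the data are already correctly analyzed), the loop terminates after one pass and returns the grammar unchanged.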
It is convenient for our illustration to adopt the following four constraints:

(32) Illustrative metrical constraints6
a. BISYLL: A foot is bisyllabic.
b. WSP: A heavy syllable is a head of a foot. (Weight-to-Stress Principle)
c. PARSE-σ: A syllable is parsed into a foot.
d. NONFIN: A foot is not final in the prosodic word.
6 These constraints are closely related to those of P&S. BISYLL is half of the constraint FTBIN (P&S: 47, (61)), "Foot Binarity: Feet are binary at some level of analysis (μ, σ)". The WSP is P&S: 53 (67); PARSE-σ is the relevant member of the PARSE family, introduced explicitly in P&S: 58. NONFIN is formulated in P&S: 52 (66) as "NONFINALITY: No head of PrWd is final in PrWd"; our slightly simpler formulation is possible here because in the relevant candidates the final foot is the head of PrWd, the prosodic word.
To focus the analysis, we will assume in this example that the learner has already correctly ranked the constraints governing the trochaic/iambic distinction, the 'directionality' of footing, and whether the word's head foot is on the right or left; we also assume these are all higher-ranked than the constraints in (32) which we examine. Simultaneously ranking all these constraints would constitute not an illustrative example but a major study well beyond the scope of this paper. See Tesar (1997, 1998b) for more extensive studies.

Our first dimension of interest, the ±quantity-sensitive contrast, is characterized in part by the relative ranking of the two constraints (32a,b). When BISYLL dominates WSP, feet will always be bisyllabic even when that entails heavy syllables falling in the weak position in a foot; footing will be driven by the higher-ranked constraints on foot form and 'directionality' and by BISYLL; WSP will not be active and so the weight of syllables will not affect footing. On the other hand, when WSP » BISYLL, footing is weight-sensitive, with heavy syllables banned from foot-weak position.

Our second dimension, ±extrametricality, is governed by the two constraints (32c,d). In the +extrametrical case, the final syllable is left unfooted; this ensures that no foot is final in the word, in satisfaction of NONFIN; but this entails a violation of PARSE-σ since the final syllable is not parsed into a foot. Thus the +extrametrical case arises when NONFIN dominates PARSE-σ; the reverse ranking yields -extrametricality.

In our example, we assume the learner is faced with a stress system like that of Latin: stress is penultimate when the penult is heavy, otherwise it is antepenultimate. This is a quantity-sensitive stress system with +extrametricality. We assume that the learner's current ranking is wrong on both dimensions.
We assume the feet to be trochaic, right-to-left, with main stress on the right; as already remarked, we presume the learner to have correctly learned these dimensions. When we pick up the learner, the ranking of the four constraints in (32) is taken to be:
(33) PARSE-σ » NONFIN » BISYLL » WSP

We suppose the next datum is LLLL with antepenultimate main stress. Note that here we are taking as given to the learner only the overt structure: syllable weights (L=light, H=heavy) and stresses, assumed by Dresher and Kaye to be directly available to the learner. Note that this datum is not grammatical according to the current grammar, which declares that LLLL should be parsed as (LL)(LL) [-quantity-sensitive, -extrametrical], surfacing with penultimate rather than antepenultimate stress; this is shown in tableau (34). The optimal candidate according to the current grammar, a, is, as usual, marked with ☞.7 The correct parse, L(LL)L [+quantity-sensitive, +extrametrical], is shown with ✓.
7 We have of course omitted other candidates which must be shown sub-optimal, such as those violating the high-ranked constraints requiring trochaic feet, candidates like (L)(LL)(L) which are universally sub-optimal (universally less harmonic than a), etc.
(34) Ranking when we pick up the learner

  LLLL: before    | PARSE-σ | NONFIN | BISYLL | WSP
  ☞ a. (LL)(LL)   |         |   *    |        |
  ✓ b. L(LL)L     |   **    |        |        |
  ☛ c. (L)(LL)L   |   *     |        |   *    |

(☞ = optimal according to the current grammar; ✓ = correct parse; ☛ = robust interpretive parse.)
We want to learn from this informative datum by Constraint Demotion, but this requires a full structural description for the datum. According to our overall learning algorithm RIP/CD (31), we get a structural description by applying robust interpretive parsing, the first step of the algorithm. Recall from Section 2 that the correct interpretive parse is defined to be the most harmonic candidate among those that have the correct overt part. Candidate a has incorrect overt part (penultimate stress), while candidates b-c have correct overt part. Comparing them, we see that c has maximal Harmony: it is the robust interpretive parse, as indicated by ☛, although it is not the correct parse. (Note the need for robust parsing here: we need to use the current grammar to parse the datum into foot structure, even though this stress pattern is not grammatical according to the current grammar, which declares that input /LLLL/ must receive penultimate stress.)

The RIP/CD algorithm assigns the hidden foot structure of the robust interpretive parse c to this datum, and proceeds to the second step, grammar learning from the full structural description given by interpretive parsing. For this we will use Error-Driven Constraint Demotion. The relevant winner/loser pair is determined as follows. The loser, according to the Error-Driven algorithm, is the optimal candidate from the current grammar: a. The winner, from the interpretive parsing step, is c. Both the winner marks *PARSE-σ and *BISYLL must be dominated by the loser mark, *NONFIN. This is already true for *BISYLL, but PARSE-σ must be demoted to just below NONFIN, into the same stratum as BISYLL, yielding the ranking shown in (35). (Recall that the dotted line between PARSE-σ and BISYLL indicates they are equally ranked: both part of the same stratum.)

(35) After robust interpretive parsing + Error-Driven Constraint Demotion on datum LLLL

  NONFIN » { PARSE-σ, BISYLL } » WSP
Now c is optimal, as desired. But b is optimal as well: their constraint violations differ only in that b has an additional violation of PARSE-σ while c has an additional
violation of BISYLL; since these constraints occupy the same stratum in the hierarchy, these candidates are equally harmonic. After this demotion, the ranking of NONFIN and PARSE-σ has reversed, and the stress system has switched from -extrametrical to +extrametrical. (Lower-ranked PARSE-σ is still active in the grammar, however; for example, it causes c to tie for optimality with b, incorrectly, as it happens.) The quantity-insensitivity of the grammar has not changed; that requires an input with quantity contrasts, such as LHL. As shown in tableau (36), the current grammar parses the input LHL as (LH)L, with incorrect stress placement due to quantity insensitivity:

(36) Initial analysis of second datum
  LHL: before     | NONFIN | PARSE-σ | BISYLL | WSP
  ☞ a. (LH)L      |        |    *    |        |  *
    b. L(HL)      |   *    |    *    |        |
  ✓ c. L(H)L      |        |   **    |   *    |
  ☛ d. (L)(H)L    |        |    *    |   **   |

(PARSE-σ and BISYLL occupy a single stratum.)
To learn from this error, we apply the first step of RIP/CD: robust interpretive parsing. The maximal-Harmony parses with correct overt part (stress) are c and d (tied); we suppose the incorrect one is chosen, d. So the hidden (foot) structure imposed on the overt data is (L)(H)L. With this full structural description, we can now perform the second step of RIP/CD: Constraint Demotion. The winner/loser pair has d as winner and a as loser; the two constraints violated by the winner, PARSE-σ and BISYLL, must be demoted beneath the one constraint violated by the loser, WSP:

(37) After robust interpretive parsing + Error-Driven Constraint Demotion on datum LHL

  LHL: after      | NONFIN | WSP | PARSE-σ | BISYLL
    a. (LH)L      |        |  *! |    *    |
    b. L(HL)      |   *!   |     |    *    |
  ☞ c. L(H)L      |        |     |   **    |   *
  ☞ d. (L)(H)L    |        |     |    *    |   **
After demotion, two candidates c and d tie for optimality. We have assumed c to be the correct parse; d will have to be rendered less harmonic by a later demotion of PARSE-σ, or by constraints on foot form which we have not taken into account, such as the part of Foot Binarity, FTBIN, which requires feet to be binary at the level of
the mora, banning (L). But we will not follow this learning process any further. We simply observe that after the second iteration of RIP/CD, the first datum we considered is still assigned correct stress (although the structural ambiguity involving monomoraic feet still obtains):

(38) Resulting analysis of first datum, LLLL

  LLLL: final     | NONFIN | WSP | PARSE-σ | BISYLL
    a. (LL)(LL)   |   *!   |     |         |
  ☞✓ b. L(LL)L    |        |     |   **    |
  ☞ c. (L)(LL)L   |        |     |    *    |   *
To see how one novel datum is treated by this grammar, we consider LLH. As it happens, the grammar correctly assigns antepenultimate stress, properly handling the interaction between the quantity-sensitivity and extrametricality dimensions in the case of a final heavy syllable:

(39) Resulting analysis of novel datum LLH

  LLH: final      | NONFIN | WSP | PARSE-σ | BISYLL
  ☞✓ a. (LL)H     |        |  *  |    *    |
     b. (LL)(H)   |   *!   |     |         |   *
     c. L(LH)     |   *!   |  *  |    *    |
     d. (L)(LH)   |   *!   |  *  |         |   *
     e. L(L)H     |        |  *  |  * *!   |   *
     f. (L)(L)H   |        |  *  |    *    | *! *
(As the fatal violations *! in this tableau and tableau (37) show, every constraint in this little grammar is active, i.e., used to eliminate sub-optimal candidates.)

In concluding this little illustration, let us be clear about what it is and is not intended to show. As we have said, the formal properties of RIP/CD are the subject of future research; we are not presently in a position to claim for RIP/CD what we can claim for Constraint Demotion: that it efficiently computes a correct solution to its learning problem, namely, learning from overt data alone.8
8 In evaluating the overall state of learnability results in OT five years after its founding document, a relevant context is the state of learnability results in Principles-and-Parameters theory, more than fifteen years after its creation. Consider the examples discussed in Section 1. The primary claims of Gibson and Wexler (1994) are: (a) the failure of learnability in a particular, simple UG with three parameters, and (b) learnability given the existence of triggers, a conclusion challenged by Frank and Kapur (1996). Concerning the success of their learning algorithm specially designed for a particular metrical theory, Dresher and Kaye's assessment is that it "does quite well" (1990: 175); "while determining that a system is QS is usually unproblematic, the Learner does less well than a linguist in making the further discrimination between QS [Rime] and QS [Nucleus]" (1990: 175-176; "QS" = quantity-sensitive); "aside from some problems with extrametricality ... the Learner performs quite well within the bounds of its theory" (1990: 177). Clearly, deriving linguistically relevant formal learnability results is a rather difficult problem.

Our example is intended primarily to illustrate concretely how RIP/CD addresses the problem of learning when hidden grammatical structure is not provided in the primary learning data, and how Constraint Demotion can provide the grammar-learning engine at the heart of a larger learning algorithm operating on overt data: the only additional element needed is robust interpretive parsing. Rather than heuristic procedures for finessing absent hidden structure, based on particular properties of a given theory of, say, stress (e.g., the theory-specific cues of Dresher and Kaye, 1990), RIP/CD's solution to the problem of assigning hidden structure is grammatical-framework-general, based on the fundamental principles of optimization that are the defining foundation of the framework.

The example also demonstrates the point that, in linguistically interesting cases, determining the correct robust parse can be no more problematic than determining ordinary optimality: comparison of candidates using tableaux works in much the same way in the two cases. From the perspective of theoretical (as opposed to computational) linguistics, the availability of correct, efficient and highly general algorithms for computing robust interpretive parses is simply not an issue.

Finally, the example illustrates how the interaction of robust interpretive parsing and Constraint Demotion in RIP/CD leads to the minimal demotion that will fit the overt form. For interpretive parsing chooses the structural description for the datum which is closest to grammatical (among those with correct overt form); choosing this 'winner' for Constraint Demotion entails that the smallest demotion is required to render this winner optimal. Thus RIP/CD preserves a crucial property of Constraint Demotion: the reranking performed is always the minimal change that allows correct analysis of the given datum.

6. Concluding discussion

6.1. Parametric independence and linguistic explanation

Let us consider an important learnability consequence of the different conceptions of cross-linguistic variation found in Optimality Theory and in the Principles-and-Parameters framework. In P&P theory, cross-linguistic variation is accounted for by a set of parameters, where a specific grammar is determined by fixing each parameter to one of its possible values. Work on learnability focuses on the relationship between data and the parameter values, usually discussed in terms of triggers, a trigger being a datum (e.g., for syntax, a type of sentence) which indicates the appropriate value for a specific parameter (see, for example, the definitions of trigger in Gibson and Wexler, 1994; Frank and Kapur, 1996). It is significant that a trigger
provides information about the value of a single parameter, rather than relationships between the values of several parameters.9 This property is further reinforced by a proposed constraint on learning, the Single Value Constraint (R. Clark, 1990; Gibson and Wexler, 1994): successive hypotheses considered by a learner may differ by the value of at most one parameter. The result is that learnability concerns in the P&P framework favor parameters which are independent: they interact with each other as little as possible, so that the effects of each parameter setting can be distinguished from the effects of the other parameters. In fact, this property of independence has been proposed as a principle for grammars (Wexler and Manzini, 1987). Unfortunately, this results in a conflict between the goals of learnability, which favor independent parameters with restricted effects, and the goals of linguistic theory, which favor parameters with wide-ranging effects and greater explanatory power (see Safir (1987) for a discussion of this conflict). Optimality Theory may provide the opportunity for this conflict to be avoided.

In Optimality Theory, interaction between constraints is not only possible but explanatorily crucial. Cross-linguistic variation is explained not by variation in the substance of individual constraints, but by variation in the relative ranking of the same constraints. Cross-linguistic variation is thus only possible to the extent that constraints interact. The Constraint Demotion learning algorithm not only tolerates constraint interaction, but is based entirely upon it. Informative data provide information not about one constraint in isolation, but about the results of interaction between constraints: such data allow the learner to deduce the relative ranking of constraints by revealing which constraint violations are tolerated in cases of constraint conflict.
Constraints which have widespread effects benefit learnability: the marks they assess are visible in a broad range of contexts, affording increased opportunity for deducing rankings. Thus the results presented here provide evidence that in Optimality Theory, linguistic explanation and learnability work together: they both favor interacting constraints with wide-ranging effects and explanatory power.

This attractive feature arises from the fact that Optimality Theory defines grammaticality in terms of optimization over violable constraints. This central principle makes constraint interaction the main explanatory mechanism. It provides the implicit negative data used by Constraint Demotion precisely because it defines grammaticality in terms of the comparison of candidate outputs, rather than in terms of the structure of each candidate output in isolation. Constraint Demotion proceeds by comparing the constraint violations assessed to candidate structural descriptions. This makes constraint interaction the basis for learning. By making constraint interaction the foundation of both linguistic explanation and learning, Optimality Theory creates the opportunity for the full alignment of these two goals. The discovery of sets of constraints which interact strongly in ways that participate in diverse linguistic phenomena represents progress for both theoretical explanation and learnability. Clearly, this is a desirable property for a theoretical framework.

9 Under the normal definitions of trigger, a single datum can be a trigger for more than one parameter, but is such independently. In such a case, the datum would not be interpreted as expressing any relationship between the values of the two parameters.
B.B. Tesar, P. Smolensky / Lingua 106 (1998) 161-196
6.2. Summary

We close by summarizing the OT learnability theory developed in T&S, including aspects which space restrictions did not allow us to consider above. (For other approaches to learning in OT, see Hale and Reiss, 1997; Pulleyblank and Turkel, 1998; Boersma, to appear.) An Optimality-Theoretic grammar is a ranked set of violable constraints which defines a notion of relative Harmony of structural descriptions, the maximally harmonic or optimal structures being the grammatical ones. The consequences of constraint hierarchies for surface patterns can be quite subtle and often surprising. Remarkably different surface patterns can emerge from the reranking of the same set of universal constraints. All this is integral to the explanatory power of OT as a linguistic theory. But it also raises concerns about learnability. If the relations between grammatical forms and grammars are so complex and opaque, how can a child cope? Linguists working in OT are frequently faced with a hypothesized set of universal constraints and a collection of surface forms to which they have given hypothetical structural descriptions; the question is, is there a ranking of the constraints that yields the correct structures? Typically, this turns out to be a challenging question to answer. Of course, with even a modest number of constraints, the number of possible rankings is much too large to explore exhaustively. So the starting point of the present research is the question: are there reliable, efficient means for finding a ranking of a given set of constraints which correctly yields a given set of grammatical structural descriptions? As we have seen in Section 4, the answer is yes, if the learner is given informative pairs of optimal structures with suboptimal competitors. For any set of such data pairs consistent with some unknown total ranking of the given constraints, Constraint Demotion finds a stratified hierarchy consistent with all the data pairs.
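The reranking step driven by such winner/loser data pairs can be sketched as follows. This is a minimal reading of Constraint Demotion with mark cancellation; the data structures and names are our own illustration, not the authors' implementation.

```python
# Sketch of one Constraint Demotion step over a winner/loser pair of "marks"
# (names of the constraints each form violates). Illustrative only.

def cancel(winner_marks, loser_marks):
    """Mark cancellation: remove violations shared by winner and loser."""
    w, lo = list(winner_marks), list(loser_marks)
    for m in winner_marks:
        if m in lo:
            w.remove(m)
            lo.remove(m)
    return w, lo

def demote(hierarchy, winner_marks, loser_marks):
    """Minimally demote constraints violated by the winner to the stratum
    just below the highest-ranked constraint violated by the loser.
    `hierarchy` is a list of strata (sets of constraints), dominant first."""
    w, lo = cancel(winner_marks, loser_marks)
    if not lo:
        return hierarchy                      # uninformative pair
    rank = {c: i for i, stratum in enumerate(hierarchy) for c in stratum}
    target = min(rank[c] for c in lo)         # highest-ranked loser mark
    for c in set(w):
        if rank[c] <= target:                 # not yet dominated: demote
            hierarchy[rank[c]].discard(c)
            while len(hierarchy) <= target + 1:
                hierarchy.append(set())
            hierarchy[target + 1].add(c)
    return [s for s in hierarchy if s]        # drop any emptied strata

# All constraints start in one top stratum; one informative pair suffices
# here to rank C1 above C2 (C2 is demoted below C1's stratum).
h = demote([{"C1", "C2", "C3"}], winner_marks=["C2"], loser_marks=["C1"])
print(h)
```

An already-consistent pair leaves the hierarchy unchanged, which is why the procedure converges once all data pairs are accounted for.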
A key to these results is the implicit negative evidence that comes with each positive example: all the universally-given competitors to each optimal structure (excluding any which may have identical constraint violations); these are guaranteed to be sub-optimal and therefore ill-formed. The pairs of optimal forms and sub-optimal competitors are the basis of Constraint Demotion: the constraints violated by the optimal form are minimally demoted to lie below some constraint violated by the sub-optimal form (excluding canceled marks). Is it necessary that informative sub-optimal forms be provided to the learner? As we saw, the answer is no (Tesar, 1998a). Given a grammatical structural description as a learning datum, the learner can identify the input in the structural description, and compute the optimal parse of that input using the currently hypothesized hierarchy: that parse can be used as the sub-optimal competitor, unless it is equal to the given parse, in which case the example is not informative - no learning can occur. This is Error-Driven Constraint Demotion. Analysis like that discussed above in Section 4 shows that learning in all these cases is efficient, in the sense that the number of informative examples, or number of learning operations (demotions), is guaranteed to be no more than N(N-1), where
N is the number of constraints. This grows quite modestly with N, and is vastly less than the number of grammars, N!. This brings us to the current frontier of the formal results. In addition to demonstrating these results, we have also discussed in T&S some further implications of these results, and made several proposals for how the results might be further extended. The most important extension addresses the question, must full structural descriptions of positive examples be provided to the learner? The Constraint Demotion algorithms operate on constraint violations or marks, and these can be determined only from full structural descriptions. In Section 2, we proposed that the learner, given only the overt part of grammatical structures, can compute the full structural description needed for Constraint Demotion by using robust interpretive parsing: using the currently hypothesized grammar, the learner finds the maximal-Harmony structural description consistent with the overt form (and the currently hypothesized lexicon). Such parsing is a necessary part of the overall theory anyway, independent of learning, since grammar users must perform it when interpreting overt forms. Coupling interpretive parsing to the Constraint Demotion solution to the problem of learning a grammar from full structural descriptions yields an algorithm we call RIP/CD, a new member of the family of iterative model-based solutions to the general problem of learning hidden structure. In other learning domains, these solutions have been highly successful in both theory and practice; positive experimental results in the domain of acquiring stress have been achieved (Tesar, 1997, 1998b). In the case of phonology acquisition, must the learner be provided with the lexicon of underlying forms (necessary for interpretive parsing, as well as the inputs to production-directed grammatical parsing)?
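The scale of the saving expressed by the N(N-1) bound, as against the N! total rankings, is easy to check numerically (illustrative arithmetic only):

```python
import math

# Constraint Demotion needs at most N(N-1) informative examples (demotions),
# while the space of total rankings it implicitly searches has size N!.
for N in (5, 10, 20):
    print(N, N * (N - 1), math.factorial(N))
```

Already at N = 10 the bound is 90 demotions against over three million rankings, which is the sense in which the space of grammars is "efficiently navigated".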
In T&S: §9, we proposed that, as part of the same iterative process that is adapting the grammar to accommodate the structural descriptions produced by interpretive parsing (RIP/CD), the learner can incrementally learn the lexicon via lexicon optimization at the level of the morphological paradigm. At each stage of learning, the current grammar is used to find the underlying form for morphemes which yields the maximum-Harmony structural descriptions for paradigms. We provided there a miniature example of how this can work in the face of phonological alternation. Taken as a whole, our OT learning theory constitutes a proposal for how a learner, provided with the universal elements of any OT UG system, and the overt parts of forms grammatical with respect to some grammar admitted by that UG, could learn the grammar, the structural descriptions, and the lexicon. This proposal decomposes the problem into three subproblems: robust interpretive parsing, lexicon learning, and grammar learning. Currently, we have a set of formal results on the grammar learning sub-problem. How do these learnability considerations relate to OT work on actual acquisition? Smolensky (forthcoming, pursuing an insight of Prince (1993)) considers the question of the initial state, and develops a 'subset'-type argument which uses the OT principle of 'richness of the base' to show that, in general, FAITHFULNESS constraints must be low-ranked in the initial state if unmarked inventories are to be learnable. It turns out that the concept of robust interpretive parsing developed here makes sense
of the proposal that despite the relative poverty of their productions, children's inputs are essentially the correct adult forms (Smolensky, 1996b). This connects a fundamental principle of OT and learnability considerations to two important assumptions of much OT research on phonological acquisition: initial low-ranking of FAITHFULNESS and the hypothesis that children's inputs closely approximate the adult forms (Demuth, 1995; Gnanadesikan, 1995; Levelt, 1995; Pater and Paradis, 1996; Levelt and van de Vijver, 1998). And finally, how does the emerging OT learning theory relate to linguistic explanation? In Section 6.1 we observed that in OT, constraint interaction is simultaneously the key to both linguistic explanation and learnability: constraint conflict, resolved by language-specific ranking, provides both the explanatory power of OT as a linguistic theory, and the evidence learners need to deduce their target grammar. How, exactly, is learnability enhanced by the structure imposed by a particular grammatical theory on the space of possible human grammars? The work summarized in this paper provides evidence that Optimality Theory's claims about the structure of Universal Grammar have manifold implications for learning. The claim that constraints are universal entails that the learner can use a given set of constraints to evaluate structural descriptions. The claim that grammatical structures are optimal, and grammars are total rankings of violable constraints, entails that with every piece of explicit positive evidence comes a mass of implicit negative evidence, and that constraints can be ranked so that those violated by positive data are dominated by those violated by implicit negative data.
The claim that grammars are evaluators of structural descriptions turns out to provide a uniform basis for the problems of parsing overt forms to determine their hidden structure, parsing inputs to determine their grammatical output, and deducing new inputs for insertion into the lexicon: these are merely three different directions of accessing the evaluative structure that is the grammar (T&S: §9). The claim of richness of the base connects the OT basis for adult typologies with fundamental hypotheses underlying acquisition research. All these implications follow not from a particular OT theory of stress, nor an OT theory of phonology, but from the fundamental structure which OT claims to be inherent in all of grammar. Our learning algorithms derive from this general grammatical structure alone, and so apply to the learning of any OT grammar. At the same time, our algorithms are not generic search procedures, uninformed by a theory of grammar. The special, characteristically linguistic, structure imposed by OT on UG is sufficiently strong to allow the proof of learnability theorems which state that large spaces of possible grammars can be efficiently navigated to home in on a correct grammar.

References

(Rutgers Optimality Archive = http://ruccs.rutgers.edu/roa.html)

Bahl, L.R., F. Jelinek and R.L. Mercer, 1983. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-5, 179-190.
Bakovic, E., 1998. Optimality and inversion in Spanish. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.), Is the best good enough? Optimality and competition in syntax, 35-58. Cambridge, MA: MIT Press and MITWPL.
Baum, L.E. and T. Petrie, 1966. Statistical inference for probabilistic functions of finite state Markov chains. Annals of Mathematical Statistics 37, 1559-1563.
Boersma, P., to appear. Learning a grammar in Functional Phonology. In: J. van de Weijer, J. Dekkers and F. van der Leeuw (eds.), Optimality Theory: Phonology, syntax, and acquisition.
Brown, P.F., J. Cocke, S.A. Della Pietra, V.J. Della Pietra, F. Jelinek, J.D. Lafferty, R.L. Mercer and P.S. Roossin, 1990. A statistical approach to machine translation. Computational Linguistics 16, 79-85.
Clark, R., 1990. Papers on learnability and natural selection. Technical reports in formal and computational linguistics, No. 1, Université de Genève.
Dempster, A.P., N.M. Laird and D.B. Rubin, 1977. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.
Demuth, K., 1995. Markedness and the development of prosodic structure. In: J. Beckman (ed.), NELS 25, 13-25. Amherst, MA: GLSA, University of Massachusetts. Rutgers Optimality Archive 50.
Dresher, B.E. and J. Kaye, 1990. A computational learning model for metrical phonology. Cognition 34, 137-195.
Eisner, J., 1997. Efficient generation in primitive Optimality Theory. In: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics. Rutgers Optimality Archive 206.
Ellison, T.M., 1994. Phonological derivation in Optimality Theory. In: Proceedings of the Fifteenth International Conference on Computational Linguistics, 1007-1013. Rutgers Optimality Archive 75.
Frank, R. and S. Kapur, 1996. On the use of triggers in parameter setting. Linguistic Inquiry 27, 623-660.
Frank, R. and G. Satta, in press. Optimality Theory and the generative complexity of constraint violability. Computational Linguistics. Rutgers Optimality Archive 228.
Gibson, E. and K. Wexler, 1994. Triggers. Linguistic Inquiry 25, 407-454.
Gnanadesikan, A., 1995.
Markedness and faithfulness constraints in child phonology. Manuscript, University of Massachusetts. Rutgers Optimality Archive 67.
Grimshaw, J., 1991. Extended projection. Manuscript, Brandeis University, Waltham, MA.
Grimshaw, J., 1993. Minimal projection, heads, and optimality. Manuscript, Rutgers University, New Brunswick, NJ. Rutgers Optimality Archive 68.
Grimshaw, J., 1995. Projection, heads, and optimality. Linguistic Inquiry 28, 373-422.
Grimshaw, J. and V. Samek-Lodovici, 1995. Optimal subjects. In: J. Beckman, L. Walsh Dickey and S. Urbanczyk (eds.), University of Massachusetts Occasional Papers 18: Papers in Optimality Theory, 589-605. GLSA, University of Massachusetts, Amherst, MA.
Grimshaw, J. and V. Samek-Lodovici, 1998. Optimal subjects and subject universals. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.), Is the best good enough? Optimality and competition in syntax, 193-219. Cambridge, MA: MIT Press and MITWPL.
Hale, M. and C. Reiss, 1997. Grammar Optimization: The simultaneous acquisition of constraint ranking and a lexicon. Manuscript, Concordia University. Rutgers Optimality Archive 231.
Hammond, M., 1997. Parsing in OT. Manuscript, University of Arizona. Rutgers Optimality Archive 222.
Hinton, G., 1989. Connectionist learning procedures. Artificial Intelligence 40, 185-234.
Jakobson, R., 1962. Selected writings 1: Phonological studies. The Hague: Mouton.
Karttunen, L., 1998. The proper treatment of optimality in computational phonology. Manuscript, Xerox Research Centre Europe. Rutgers Optimality Archive 258.
Legendre, G., Y. Miyata and P. Smolensky, 1990a. Can connectionism contribute to syntax? Harmonic Grammar, with an application. In: Proceedings of the 26th Meeting of the Chicago Linguistic Society. Chicago, IL: Chicago Linguistic Society.
Legendre, G., Y. Miyata and P. Smolensky, 1990b. Harmonic Grammar - A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations.
In: Proceedings of the Twelfth Annual Conference of the Cognitive Science Society, 388-395. Cambridge, MA: Erlbaum.
Legendre, G., P. Smolensky and C. Wilson, 1998. When is less more? Faithfulness and minimal links in wh-chains. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.), Is the best good enough? Optimality and competition in syntax, 249-289. Cambridge, MA: MIT Press and MITWPL. Rutgers Optimality Archive 117.
Levelt, C., 1995. Unfaithful kids: Place of articulation patterns in early child language. Paper presented at the Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, September.
Levelt, C. and R. van de Vijver, 1998. Syllable types in cross-linguistic and developmental grammars. Paper presented at the Third Biannual Utrecht Phonology Workshop, June 11-12, 1998. Rutgers Optimality Archive 265.
McCarthy, J. and A. Prince, 1993. Prosodic morphology I: Constraint interaction and satisfaction. Manuscript, University of Massachusetts, Amherst, MA, and Rutgers University, New Brunswick, NJ. To appear as Linguistic Inquiry Monograph, MIT Press, Cambridge, MA.
McCarthy, J. and A. Prince, 1995. Faithfulness and reduplicative identity. In: J. Beckman, L. Walsh Dickey and S. Urbanczyk (eds.), University of Massachusetts Occasional Papers 18: Papers in Optimality Theory, 249-384. GLSA, University of Massachusetts, Amherst, MA. Rutgers Optimality Archive 60.
Nadas, A. and R.L. Mercer, 1996. Hidden Markov Models and some connections with artificial neural nets. In: P. Smolensky, M.C. Mozer and D.E. Rumelhart (eds.), Mathematical perspectives on neural networks, 603-650. Mahwah, NJ: Erlbaum.
Niyogi, P. and R.C. Berwick, 1993. Formalizing triggers: A learning model for finite spaces. A.I. Memo No. 1449. Artificial Intelligence Laboratory, MIT.
Pater, J. and J. Paradis, 1996. Truncation without templates in child phonology. In: Proceedings of the Boston University Conference on Language Development 20, 540-552. Somerville, MA: Cascadilla.
Prince, A., 1993. Internet communication, September 26.
Prince, A. and P. Smolensky, 1991. Notes on connectionism and Harmony Theory in linguistics. Technical Report CU-CS-533-91. Department of Computer Science, University of Colorado, Boulder, CO.
Prince, A. and P. Smolensky, 1993. Optimality Theory: Constraint interaction in generative grammar.
Manuscript, Rutgers University, New Brunswick, NJ, and University of Colorado, Boulder, CO. To appear as Linguistic Inquiry Monograph, MIT Press, Cambridge, MA.
Pulleyblank, D. and W.J. Turkel, 1995. Traps in constraint ranking space. Paper presented at Maryland Mayfest 95: Formal Approaches to Learnability.
Pulleyblank, D. and W.J. Turkel, 1998. The logical problem of language acquisition in Optimality Theory. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.), Is the best good enough? Optimality and competition in syntax, 399-420. Cambridge, MA: MIT Press and MITWPL.
Safir, K., 1987. Comments on Wexler and Manzini. In: T. Roeper and E. Williams (eds.), Parameter setting, 77-89. Dordrecht: Reidel.
Samek-Lodovici, V., 1994. Structural focusing and subject inversion in Italian. Manuscript, Rutgers University, New Brunswick, NJ. Presented at the XXIV Linguistic Symposium on Romance Languages, Los Angeles.
Samek-Lodovici, V., 1996. Constraints on subjects: An Optimality Theoretic analysis. Ph.D. dissertation, Rutgers University, New Brunswick, NJ. Rutgers Optimality Archive 148.
Smolensky, P., 1983. Schema selection and stochastic inference in modular environments. In: Proceedings of the National Conference on Artificial Intelligence, 378-382.
Smolensky, P., 1986. Information processing in dynamical systems: Foundations of harmony theory. In: D.E. Rumelhart, J.L. McClelland and the PDP Research Group (eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations, 194-281. Cambridge, MA: MIT Press/Bradford.
Smolensky, P., 1996a. Statistical perspectives on neural networks. In: P. Smolensky, M.C. Mozer and D.E. Rumelhart (eds.), Mathematical perspectives on neural networks, 453-495. Mahwah, NJ: Erlbaum.
Smolensky, P., 1996b. On the comprehension/production dilemma in child language. Linguistic Inquiry 27, 720-731.
Smolensky, P., forthcoming.
The initial state and 'richness of the base' in Optimality Theory. Linguistic Inquiry. [Previous version: Technical Report JHU-CogSci-96-4, Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, 1996. Rutgers Optimality Archive 154.]
Tesar, B., 1994. Parsing in Optimality Theory: A dynamic programming approach. Technical Report, Department of Computer Science, University of Colorado at Boulder, CO.
Tesar, B., 1995a. Computing optimal forms in Optimality Theory: Basic syllabification. Technical Report, Department of Computer Science, University of Colorado at Boulder, CO. Rutgers Optimality Archive 52.
Tesar, B., 1995b. Computational Optimality Theory. Ph.D. dissertation, Department of Computer Science, University of Colorado at Boulder, CO. Rutgers Optimality Archive 90.
Tesar, B., 1996. Computing optimal descriptions for Optimality Theory grammars with context-free position structures. In: Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics, 101-107.
Tesar, B., 1997. An iterative strategy for learning metrical stress in Optimality Theory. In: Proceedings of the Twenty-First Annual Boston University Conference on Language Development, 615-626. Rutgers Optimality Archive 177.
Tesar, B., 1998a. Error-driven learning in Optimality Theory via the efficient computation of optimal forms. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.), Is the best good enough? Optimality and competition in syntax, 421-435. Cambridge, MA: MIT Press and MITWPL.
Tesar, B., 1998b. An iterative strategy for language learning. Lingua 104, 131-145.
Tesar, B., in press. Robust Interpretive Parsing in metrical stress theory. In: Proceedings of WCCFL XVII. Rutgers Optimality Archive 262.
Tesar, B. and P. Smolensky, 1993. The learnability of Optimality Theory: An algorithm and some basic complexity results. Technical Report, Department of Computer Science, University of Colorado at Boulder, CO. Rutgers Optimality Archive 2.
Tesar, B. and P. Smolensky, 1996. Learnability in Optimality Theory (long version). Technical Report JHU-CogSci-96-3, Department of Cognitive Science, Johns Hopkins University, Baltimore, MD. Rutgers Optimality Archive 156.
Tesar, B. and P. Smolensky, 1998. Learnability in Optimality Theory. Linguistic Inquiry 29, 229-268.
Wexler, K. and M.R. Manzini, 1987.
Parameters and learnability in binding theory. In: T. Roeper and E. Williams (eds.), Parameter setting, 41-76. Dordrecht: Reidel.
Lingua 106 (1998) 197-218
Constraining the search for structure in the input1

Peter W. Jusczyk*

Departments of Psychology and Cognitive Science, Ames Hall, Johns Hopkins University, Baltimore, MD 21218, USA
Abstract

In their first year, infants develop sensitivity to specific phonetic, phonotactic, and prosodic features that are characteristic of their native language. Moreover, they are attuned to potential markers of syntactic organization that are present in the speech signal. Such studies raise questions about the range and nature of the capacities involved in the early stages of language acquisition. As speech researchers have begun to explore how infants learn about specific aspects of native language sound structure, there has been an increasing emphasis on the abilities of infants to pick up information about the structural organization of their language from exposure to the input. However, evidence from some recent investigations suggests that the search for distributional regularities in the input is constrained by both cognitive and linguistic factors.

Keywords: Prosodic markers; Phonotactics; Word segmentation; Discontinuous dependencies; Optimality Theory
1 Funding for many of the studies described and for the preparation of this manuscript was provided by a Research Grant from NICHD (#15795) and a Research Scientist Award from NIMH (#01490). In addition, the author is grateful to Ann Marie Jusczyk, Paul Smolensky, and three reviewers for comments on a previous version of the present manuscript, and to his many collaborators on the research projects reported here.
* Phone: +1 (410) 515-5165; Fax: +1 (410) 516-4478; E-mail: [email protected]
0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0024-3841(98)00034-5

1. Introduction

Thirty years ago, we had very little information about the capacities of infants for perceiving spoken language. In fact, it was not clear whether infants had any real ability to discriminate speech sound differences before they actually began to produce speech sounds themselves. However, with the development of different methodologies for exploring the perceptual capacities of infants, it became apparent that infants are picking up information about language long before they actually produce speech of their own. Indeed, the findings from the initial study by Eimas and his colleagues (Eimas et al., 1971), showing that even 1-month-olds have the capacity to perceive such subtle contrasts as the one between [ba] and [pa], led to a great deal of speculation about specialized innate linguistic capacities. Speculation about such capacities was furthered by findings of differences in infants' processing of speech and non-speech signals (Eimas, 1974, 1975; Morse, 1972) and by demonstrations that certain speech contrasts were perceived without prior experience (Streeter, 1976). Furthermore, evidence suggested that infants' capacities were not limited to those involved in the discrimination of speech contrasts. Other findings demonstrated that infants could compensate for the variability that occurs in speech due to differences between talkers (Kuhl, 1979) or changes in speaking rate (Eimas and Miller, 1980). Although there was some debate as to whether these underlying capacities were general perceptual ones (Jusczyk, 1982) or specific to language (Eimas, 1982), the focus of much of the early work was on illuminating the range of capacities that made it possible to acquire the sound structure of one's own language.
2. Finding cues to structure in the input

As infants gain experience with a particular language, speech perception capacities may be affected by the nature of the input, but in turn these capacities may also affect the course of language acquisition. An example of the first sort is the finding of a decline in infants' sensitivity to non-native contrasts, evidenced in the studies of Werker and Best and their colleagues (Best et al., 1995, 1988; Polka and Werker, 1994; Werker and Lalonde, 1988; Werker and Tees, 1984). My own research, inspired by some interesting observations of Gleitman and Wanner (1982), has addressed the second type of interaction, namely, the role that speech capacities might play in acquiring a native language. This interest led me to investigate when infants began to pick up information about characteristics that are particular to the sound structure of their target language. Identifying when infants display sensitivity to such features helps to clarify the potential role that such features may play in acquiring the language. For instance, in order for any prosodic marking of clausal or phrasal boundaries (by declinations in pitch, final lengthening, or pausing) to be useful to language learners in discovering the grammatical organization of the language, learners must first be able to detect these markers in fluent speech. To address precisely this issue, Deborah Kemler Nelson, Kathryn Hirsh-Pasek and I began a series of studies exploring the sensitivity of infants to the prosodic marking of grammatical units in fluent speech. The first hurdle was to devise a method to investigate infants' sensitivity to prosodic markers. We decided to compare how infants would respond to passages in which 1-second pauses were inserted either at clause boundaries or within clauses.
The rationale was that if infants already did perceive prosodic marking of clausal boundaries, then pauses coincident with these would be likely to exaggerate this organization, whereas pauses within clauses would compete with the perceived prosodic organization. To test this notion, we inserted pauses into speech from a mother talking to her 18-month-old and presented it to infants using the Headturn Preference Procedure (Jusczyk, in press; Kemler Nelson et al., 1995). In our first
study, we found that 10.5-month-olds, and even 7-month-olds, detected the prosodic cues to clause boundaries (Hirsh-Pasek et al., 1987). Specifically, the infants listened significantly longer to passages in which the pauses coincided with the clause boundaries than they did to ones in which the pauses occurred between words in the middle of clauses. As later studies showed, English-learning infants' sensitivity to these prosodic markers did not appear to be specifically tied to their native language (Jusczyk, 1989). In particular, we found that 4.5-month-olds would also respond to these prosodic markers not only in their own target language, English, but also in a foreign language, Polish (see Fig. 1). In fact, infants at this age also showed a similar tendency to respond to phrase boundaries in music that were marked by pauses. Krumhansl and I inserted a series of 1-second pauses into Mozart minuets either at phrase boundaries or within phrases and found that 4.5-month-olds displayed a preference for the sequences in which the pauses coincided with the phrase boundaries (Jusczyk and Krumhansl, 1993; Krumhansl and Jusczyk, 1990). Interestingly enough, an analysis of the musical structure of these pieces indicated that the musical phrase boundaries were marked by cues similar to those found in speech (i.e., a declination in pitch and a lengthening of final notes). Further investigations explored whether infants' sensitivity to prosodic marking of clausal units had any direct consequences in their processing of speech information. For example, might information contained within such a marked phrase be encoded as a unit and, therefore, better remembered than if it occurred in different units? Indeed, one early psycholinguistic demonstration of the psychological reality of clauses and phrases in adult language comprehension was that the organization that such units afford facilitates memory (Miller, 1964; Miller and Isard, 1963; Suci, 1967). 
Fig. 1. Average listening times of American 4.5-month-olds who heard the Coincident and Noncoincident versions of either English or Polish passages.

Using the high-amplitude-sucking procedure (Polka et al., 1995), we found that 2-month-olds were more apt to remember speech sounds that were contained within a single well-formed prosodic grouping than they were to remember the same information when it was presented without such a grouping (Mandel et al., 1994). Specifically, infants were more apt to detect phonetic changes in a sequence such as 'the rat chased white mice' to 'the cat chased white mice', after a 2-minute delay when the prosodic packaging was available than when it was not. Moreover, there is even some indication that this sort of prosodic packaging may help infants in remembering the sequential ordering of information. Mandel et al. (1996) presented 2-month-olds with the same information either within the same well-formed prosodic unit or as fragments of two adjoined prosodic units. During the first (or 'pre-shift') phase of the experiment, infants' sucking responses produced a word sequence such as 'cats jump wood benches'. Half of the infants heard this sequence spoken with prosody appropriate to a sentence. The other half of the infants heard the same sequence of words, but the sequence was constructed from two 2-word fragments ('Cats jump' and 'Wood benches') that were concatenated together. Each of these 2-word fragments was a well-formed prosodic unit that had been given in response to a question. However, when concatenated, they do not have the prosody appropriate for a sentence, making them less prosodically coherent than the sentential materials. After the infants' sucking patterns habituated to the initial word sequence, a 2-minute silent delay period occurred during which time a series of slides was shown. During the second (or 'post-shift') period, infants in control groups continued to hear the identical sequences that were played in the pre-shift phase; infants in the two experimental groups heard a new sequence involving the same words and phonetic information as before, but re-ordered (e.g., 'cats would jump benches'). Infants who had heard the sequence with sentential prosody in the pre-shift phase also heard the new sequence spoken with sentential prosody. Infants who had heard the concatenated fragments in the pre-shift phase heard the new sequence as a concatenation of two fragments. Only the infants who had heard the sentential materials showed significant increases in sucking to the word order changes in the post-shift period. Thus, 2-month-olds were better able to remember information about sequential order when it occurred within a well-formed prosodic unit. The fact that 2-month-olds display some capacity for remembering sequential information is interesting given the importance of such information in acquiring a native language. For example, sequential order information is important for the recognition of multisyllabic morphemes and words, as in distinguishing [ti' pat] (i.e., 'teapot') from [pat' ti] (i.e., 'potty'). Information about sequential order is also important in learning about word formation rules in a language, particularly for ones with a highly developed morphological system (e.g., Turkish), in which strict constraints exist on the ordering of morphemes within a word (Kenstowicz, 1994). Furthermore, in a language like English, serial order plays an important role with respect to the encoding of syntactic relations within sentences and in the organization of phrases. To summarize, these investigations demonstrate that, from a young age, infants are sensitive to markers of clausal units in the speech signal.
Moreover, the available prosodic packaging does appear to affect what infants encode and remember about speech. However, the capacity that infants display for responding to the prosodic marking of clausal units is apparently not tied to their experience with a specific language, and perhaps not to language at all. Regardless, this capacity clearly influences the perception of the organization of information in the speech signal and, in this sense, it has considerable implications for advancing the learner's acquisition of syntactic organization.
3. Developing sensitivity to language-specific features of the input

Only when we began to explore when infants pick up information about the more fine-grained features of the sound structure of the native language did we become cognizant of how sensitive infants are to the distribution of information in the input. The question that motivated these studies concerned the ability of infants to recognize when an unfamiliar word was likely to belong to their native language as opposed to some other language. In particular, we focused on when infants begin to recognize that native language words include particular phones and phonotactic sequences. Hence, whereas an English learner may hear a number of occurrences of
[ð] and [θ] in the input, a French learner will not. Similarly, whereas phonotactic sequences such as [db], [kt], and [zb] might be common as the onsets of syllables of Polish words, they do not occur as the onsets of syllables in English words. Eventually, learners discover these properties about words in their native language. The question is when. Whorf (1956) suggested that such information is learned at some time between 2 and 5 years of age. To see whether knowledge of these features of a language might develop even earlier, we decided to examine whether infants show signs of recognizing those patterns that belong to their native language (Jusczyk et al., 1993b). We began by having a bilingual Dutch-English speaker record lists of unfamiliar English and Dutch words. In each list of 15 words, at least 10 of the items violated the phonotactic constraints of the other language. For instance, English words ended in voiced segments or contained diphthongs that were not present in Dutch, whereas the Dutch lists contained words that began with phonotactic sequences such as [zw], [vl], and [kn], which are not permissible in English. American 6-month-olds were tested on 6 English and 6 Dutch lists of these sorts using the Headturn Preference Procedure. They listened about equally long to the English and Dutch lists, suggesting that they were not yet sensitive to the phonotactic characteristics of words in their native language. However, when 9-month-olds were tested on the same lists, a different pattern emerged: Dutch infants listened significantly longer to the Dutch lists than to the English lists, whereas American infants listened significantly longer to the English lists. To determine whether infants were responding on the basis of phonotactic cues rather than prosodic differences or other information, we removed most of the phonotactic information by low-pass filtering the lists.
When tested on these low-pass filtered versions of the lists, 9-month-olds showed no preference for either type of list, suggesting that the earlier preference was based on sensitivity to phonotactic differences in the lists. One interpretation of the results of this study is that, by 9 months, infants have learned the kinds of phonetic sequences that occur in words in their language. Alternatively, it is possible that the infants simply perceived some of the phonotactic patterns of the non-native language as odd because they had never heard them before. A subsequent study by Jusczyk et al. (1994) suggests that the first interpretation is correct. In their test materials, they only included items that contained phonotactic patterns that were permissible in English words. However, two types of lists of CVC items were constructed. In one type of list, all the items contained phonotactic sequences that occur frequently in English words. In the other type of list, all the items contained phonotactic sequences that, although permissible, occur with much lower frequency in English words. Once again, American 6-month-olds displayed no preference for either type of list. However, 9-month-olds listened significantly longer to the lists containing the higher frequency phonotactic patterns. These findings suggest that, between 6 and 9 months, the infants became more attuned to the frequency with which such patterns occur in the input. Sensitivity to the distribution of information in the input is not simply limited to phonetic and phonotactic properties. Jusczyk et al. (1993a) found evidence that sensitivity to the predominant stress pattern of native language words also develops
between 6 and 9 months of age in English-learners. Cutler and Carter (1987) had noted that a very high proportion of content words in English conversational speech begin with strong (or accented) syllables. Hence, among bisyllabic words, ones with strong/weak stress patterns (e.g., 'falter') occur much more frequently than ones with weak/strong patterns (e.g., 'default'). When tested on lists of strong/weak vs. weak/strong words, American 6-month-olds listened about equally long to each type of list. However, 9-month-olds displayed a significant preference for the lists of strong/weak words. Thus, once again, the infants showed a listening preference for those patterns that occur more frequently in the input. The picture that emerges from these investigations is that, between 6 and 9 months of age, infants are becoming more attuned to the kinds of sound patterns that occur frequently within their native language. In this sense, they are clearly learning about the structure and organization of sound patterns in the language. It is noteworthy that this sensitivity to the distribution of these patterns in the input coincides with the period in which declines in sensitivity have been reported for the discrimination of non-native phonetic contrasts (Best et al., 1995; Werker and Tees, 1984). This is an indication that infant speech perception capacities are developing to facilitate the processing of speech patterns in the native language (Jusczyk, 1986, 1993, 1997).

4. How sensitivity to frequently occurring patterns aids word segmentation

Although it is impressive that sensitivity is developing during the latter half of the first year to the frequency with which certain patterns occur in the input, what consequences does this have for acquiring the language? The most obvious use that the learner's knowledge of such patterns may have is in beginning to segment words from fluent speech.
Because speakers seldom pause between words in fluent speech, the acoustic information of one word often overlaps with that of immediately adjacent words (Klatt, 1980, 1986, 1989). For this reason, it is often suggested that, in segmenting words from speech, listeners rely on their knowledge of certain statistical regularities in the sound patterns of words in the language. For example, Cutler and her colleagues have suggested that English listeners' knowledge of the predominant pattern of strong initial syllables in words may be useful in finding word boundaries in fluent speech (Cutler, 1990; Cutler, 1994; Cutler and Butterfield, 1992; Cutler and Norris, 1988).¹ Specifically, Cutler and her colleagues have suggested that, as a first pass, listeners might adopt a Metrical Segmentation Strategy (MSS) whereby strong syllables are identified as onsets of new words in the speech stream. Similarly, Church (1987) hypothesized that listeners' knowledge of contexts in which certain allophones typically occur (e.g., aspirated [t] occurs at the beginning of English words) would provide useful information for word segmentation. In the same vein, English listeners' knowledge of the fact that sequences such as [db] and [kt]

¹ In recent instantiations of the MSS (Norris et al., 1998), this language-specific segmentation strategy feeds into a language-universal Possible Word Constraint (PWC) which penalizes parses that produce impossible words, such as consonant strings with no vowel.
cannot occur as syllable onsets could be helpful in segmenting certain word patterns from fluent speech (Brent and Cartwright, 1996; Myers et al., 1996). It is only relatively recently that researchers have begun to investigate the word segmentation abilities of infants. In part, studies in this area were hindered by the lack of an appropriate paradigm for testing infants' abilities to segment words. However, Jusczyk and Aslin (1995) were able to modify the Headturn Preference Procedure to study word segmentation. They familiarized infants with a pair of isolated words (e.g., 'feet' and 'bike' or 'cup' and 'dog') for 30 seconds each. Then, they tested 7.5-month-olds' recognition of these items when the words were embedded in each sentence of a short passage. Specifically, the infants heard 4 different 6-sentence passages. Two of these passages contained the familiar words; the other two passages contained the other two items that the infants had not been familiarized with. The 7.5-month-olds listened significantly longer to the passages that contained the words that they had been familiarized with, indicating that they did detect the occurrence of these words in the passages. By comparison, 6-month-olds tested on the same materials showed no significant preferences for any of the passages. As a further check on the capacity of 7.5-month-olds to segment words from fluent speech, Jusczyk and Aslin tried familiarizing the infants with two of the passages first, and then presenting them with repetitions of four isolated words (two of which occurred in the familiarization passages). Even though this task was potentially more difficult (because in the midst of many other words, the infants would have to notice that certain words were being repeated), the 7.5-month-olds' listening times were longer to the isolated words that had originally occurred in the familiarization passages.
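The Metrical Segmentation Strategy discussed above is essentially a single pass over a stress-annotated syllable stream: posit a word boundary at every strong syllable. The following sketch illustrates the idea; the (syllable, stress) input representation and the treatment of leading weak syllables are simplifying assumptions for illustration, not part of Cutler and colleagues' proposal.

```python
# Illustrative sketch of the Metrical Segmentation Strategy (MSS):
# a word boundary is posited before every strong ("S") syllable.
# Input is a list of (syllable, stress) pairs, stress being "S" or "W".

def mss_segment(syllables):
    """Group syllables into candidate words, starting a new
    candidate word at each strong syllable."""
    words = []
    current = []
    for syl, stress in syllables:
        if stress == "S" and current:
            words.append(current)
            current = []
        current.append(syl)
    if current:
        words.append(current)
    return [" ".join(w) for w in words]

# "the DOC-tor SAW you": boundaries fall before 'doc' and 'saw'
utterance = [("the", "W"), ("doc", "S"), ("tor", "W"),
             ("saw", "S"), ("you", "W")]
print(mss_segment(utterance))  # ['the', 'doc tor', 'saw you']
```

Note that a weak/strong word such as 'gui-tar' would be mis-segmented by this procedure: the strong syllable 'tar' starts a new candidate word, exactly the failure mode discussed below for words without the predominant English stress pattern.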
Jusczyk and Aslin's findings show that 7.5-month-olds are beginning to segment words from fluent speech, but they leave unexplained the means by which infants are able to do this. Together with Derek Houston and Mary Newsome (Houston et al., 1995; Newsome and Jusczyk, 1995), we have been exploring the extent to which English-learning infants rely on prosodic cues, such as predominant stress patterns, to begin to segment words from speech. For example, might English-learners use some form of the MSS in finding the boundaries of words in speech? To explore this possibility, we examined 7.5-month-olds' abilities to segment longer words from the speech stream. This provided us with the opportunity to examine words which either had or did not have the predominant stress pattern of English words. We began with words that had the predominant strong/weak stress pattern (e.g., 'hamlet' and 'kingdom' or 'doctor' and 'candle'). Infants were familiarized with pairs of such words, and then tested on a series of four test passages. The 7.5-month-olds gave evidence of detecting the familiarized strong/weak words in the passages because they listened significantly longer to the passages containing these than to the other passages. Of course, even though infants familiarized with 'doctor' and 'candle' listen longer to passages with these words, this does not necessarily imply that they detected the whole words in the passages. Perhaps the infants were only sensitive to the occurrence of strong syllables. Then they may have simply matched the initial syllables of the words 'doctor' and 'candle' to those in the passages. To examine this possibility, we tried familiarizing the infants with just the strong syllables of
these items, and then testing them on the passages containing the whole words. Under these circumstances, the infants did not show any significant preferences for the passages containing the words with the familiar strong syllables. Nor did infants who were familiarized with the whole words 'doctor' and 'candle' show any tendency to listen longer to passages with the words 'dock' and 'can' in them. Thus, with respect to familiarized words having strong/weak stress patterns, 7.5-month-olds appeared only to detect matches to the whole words, rather than to just the strong syllables. These findings with respect to strong/weak words are certainly consistent with what would be expected if English-learning 7.5-month-olds are following the MSS. However, a much stronger test of this hypothesis involves how infants respond to words without the predominant stress pattern. For example, consider weak/strong words such as 'surprise' and 'guitar'. A tendency to identify the onsets of new words with the occurrence of strong syllables would split both of these words up, such that 'prize' and 'tar' might be extracted. Consequently, to explore how infants would respond to words without the predominant English stress pattern, we familiarized 7.5-month-olds with pairs of weak/strong words, such as 'beret' and 'device' or 'surprise' and 'guitar'. Then they were tested on four passages, two of which included the familiar items, and two of which did not. In contrast to the previous results with the strong/weak words, the 7.5-month-olds showed no significant listening preferences for the passages containing the familiarized weak/strong words. Hence, just as proponents of the MSS would predict, these infants had difficulty detecting words which began with weak (or unstressed) syllables. Further evidence that English-learning 7.5-month-olds follow something like the MSS in segmenting words from fluent speech comes from additional studies that we carried out.
For example, it follows from the MSS that the reason infants who are familiarized with isolated words such as 'guitar' and 'device' have difficulty detecting these in fluent speech is that, in the latter context, they are likely to insert a word boundary between the initial weak syllable and the following strong syllable. This suggests that familiarizing infants with just the strong syllables of these words (e.g., 'tar' and 'vice') might lead them to listen longer to passages containing weak/strong words with these familiarized strong syllables (e.g., 'guitar' and 'device'). As is predicted by the MSS, this is exactly what happened. However, before concluding that English-learners begin segmenting words from fluent speech by using some version of the MSS, we must consider a potential paradox that is evident in the overall pattern of results. Namely, why do infants match familiarized strong syllables to weak/strong words in passages, but not to strong/weak words in passages? To answer this, we must take a closer look at the contexts in which the strong syllables occur in the passages (see Table 1). Notice that when 'dock' occurs as a strong syllable in a word like 'doctor', it is always followed by the same weak syllable. Thus, the distributional evidence indicates that these two syllables co-occur and form a single unit. In contrast, consider the situation for the strong syllable 'vice' in 'device'. In the passage, 'vice' is followed by a variety of different contexts, so the distributional evidence suggests that 'vice' has no consistent following syllable.
Table 1
A comparison of strong/weak and weak/strong passages

Doctor passage (strong/weak):
The doctor saw you the other day. He is much younger than the old doctor. I think your doctor is very nice. He showed another doctor your pretty picture. That doctor thought that you grew a lot. Someday, maybe you'll be a big doctor.

Device passage (weak/strong):
Your device can do a lot. Her device only fixes things. My new red device makes ice cream. The pink device sews clothes. We don't need that old device. I think that it is a plain device.
To test this account, we conducted two additional experiments. In the first of these, we re-wrote the passages so that each target word was always followed by the same weak syllable. For instance, 'guitar' was always followed by 'is' and 'device' was always followed by 'to'. Hence, the distributional evidence favored linking the strong syllable with a particular following weak syllable. In one experiment, infants were familiarized with strong syllables, such as 'tar' and 'prize', and then tested on the new passages. There was no evidence that infants generalized from the familiarized strong syllables to the new passages containing 'guitar is' and 'device to'. In a second experiment, the familiarization stimuli were changed to strong/weak nonsense words such as 'taris' and 'viceto'. This time, the infants did listen significantly longer to the passages containing 'guitar is' and 'device to'. In effect, the infants mis-segmented the nonsense words 'taris' and 'viceto' from these passages. Taken together, the findings from all of these experiments on two-syllable words suggest that English-learners do begin by identifying word onsets with strong syllables (see also Echols et al., 1997; Morgan, 1994; Morgan and Saffran, 1995). Where does such a strategy originate? That is, how do English-learners discover the predominant stress pattern of English words without already having some ability to segment words from fluent speech? One possibility is that infants discover the predominant stress pattern from isolated words that they hear spoken frequently to them. For example, the predominant stress pattern of most boys' names and many girls' names in English is strong/weak (Cutler et al., 1990). Moreover, even when formal names do not have the predominant stress pattern, parents often use nicknames that are strong/weak. So, Elizabeth becomes Betty or Lizzie; Jerome and Bernard become Jerry and Bernie, respectively.
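The distributional reasoning behind the 'doctor'/'device' contrast can be pictured as a simple count over following-syllable contexts: if one follower accounts for nearly all occurrences after a strong syllable, group the two syllables into a single unit. The counting scheme and the 0.9 threshold below are illustrative assumptions, not parameters from the studies.

```python
from collections import Counter

def consistent_follower(strong_syl, followers, threshold=0.9):
    """Group a strong syllable with a following syllable only if one
    follower appears after it with near-perfect consistency across
    the passage; otherwise treat the strong syllable as a word on
    its own."""
    counts = Counter(followers)
    best, n = counts.most_common(1)[0]
    if n / len(followers) >= threshold:
        return strong_syl + best   # consistent context: one unit
    return strong_syl              # varied contexts: monosyllable

# 'dock' is always followed by 'tor' in the doctor passage ...
print(consistent_follower("dock", ["tor"] * 6))  # 'docktor'
# ... but 'vice' is followed by a different syllable each time
print(consistent_follower("vice", ["can", "on", "makes", "sews", "need", "i"]))  # 'vice'
```

This also captures why infants familiarized with 'taris' and 'viceto' treated them as units in the rewritten passages: once 'tar' was always followed by 'is', the distributional evidence favored grouping them.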
In addition, the diminutive terms used most often with infants have the predominant stress pattern (e.g., 'mommy', 'daddy', 'doggie', 'kitty', 'baby', 'birdie', etc.). Thus, it is possible that the bias toward items with strong syllable onsets emerges from infants' experiences with names and diminutives. An alternative account of how the MSS emerges in development has recently been suggested by Cairns et al. (1997). They tested several computer simulation models which used segment-to-segment phonotactic constraints on the London-Lund Corpus of continuous speech. Cairns et al. found that the MSS could be discovered on the basis of a distributional analysis of English phonotactics, without reference to the prosodic structure per se. Of course, the question is whether this is how English-learners actually derive the MSS. Some findings from an ongoing study in our laboratory suggest that at the early stages, stress cues may dominate phonotactic cues (Mattys et al., in preparation). In particular, in one of our experiments, we pitted phonotactic cues to word boundaries against prosodic stress cues. For example, for some items, the prosodic cues suggested that a CVCCVC sequence be treated as a single word with a strong/weak pattern, but the phonotactics of the internal CC sequence implied a word boundary. For other items, the phonotactics of the internal CC favored treating the CVCCVC as a single word, but the prosodic cues implied a word boundary. The infants behaved in accordance with the prosodic cues. Results of other experiments in this study suggested that 9-month-olds are sensitive to the way in which prosodic and phonotactic cues typically line up, but that the stress cues seem to carry greater weight at this point in development. In addition to identifying the onsets of words in fluent speech with strong syllables at the beginning stages of word segmentation, English-learners also use distributional evidence to determine whether such candidate words are monosyllabic or include following syllables (for additional evidence on this point, see Brent and Cartwright, 1996; Saffran et al., 1996). A word segmentation strategy based on the MSS will do a reasonable job of recovering information about the occurrence of words that begin with strong syllables. Fortunately, the majority of content words in English fit this pattern. This should allow the learner to get started on building a vocabulary. Nevertheless, the proposed strategy will miss the onsets of words beginning with weak syllables. Consequently, as a word segmentation strategy, it is clearly incomplete. The language learner will need to use other sorts of cues to recover information about words without the predominant stress pattern.
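The cue-conflict result from the CVCCVC experiments can be pictured as a weighted vote between a stress cue and a phonotactic cue at the internal CC. The weights below are invented purely for illustration (chosen so that stress outweighs phonotactics, as the 9-month-old data suggest); they are not estimates from the study.

```python
# Toy weighted-vote model of a boundary decision at the internal CC
# of a CVCCVC sequence. Each cue votes +1 (favors a word boundary)
# or -1 (favors a single word). The weights are assumptions made
# only to reproduce the qualitative pattern: stress dominates.

STRESS_WEIGHT = 0.7
PHONOTACTIC_WEIGHT = 0.3

def boundary_at_cc(stress_cue, phonotactic_cue):
    """Return True if the combined evidence favors a word boundary."""
    score = STRESS_WEIGHT * stress_cue + PHONOTACTIC_WEIGHT * phonotactic_cue
    return score > 0

# Prosody says "one strong/weak word", phonotactics says "boundary":
print(boundary_at_cc(stress_cue=-1, phonotactic_cue=+1))  # False
# Prosody says "boundary", phonotactics says "one word":
print(boundary_at_cc(stress_cue=+1, phonotactic_cue=-1))  # True
```

In both conflict cases the decision follows the stress cue, mirroring the infants' behavior; with aligned cues the two sources simply reinforce each other.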
Finally, note that this particular strategy is a language-specific one in that it may work for languages (like English, Dutch, and Czech) in which initial syllable stress predominates. At the present time, a syllabic strategy and a moraic strategy (Otake et al., 1993) exhaust the list of prosodic segmentation strategies that have been found to occur. Infants learning other types of languages, such as French (in which final lengthening might provide a cue to word-final boundaries), evidently have to discover other kinds of regularities to predict word boundaries in speech.
5. How smaller chunks may help in word segmentation

What benefits can there be to using a word segmentation strategy that is bound to fail to identify a significant number of words in the input? One possibility is that breaking the input into smaller chunks could facilitate the task of discovering other possible cues to word boundaries. For example, by attending to information at the edges of these chunks, a learner may be able to pick up certain regularities regarding the kinds of allophones and phonotactic sequences that do and do not appear in such positions. Sensitivity to these features could provide further information about potential word boundaries in fluent speech. Indeed, as recent modeling studies show, access to phonotactic cues greatly improves performance in correctly locating word
boundaries in fluent speech (Brent and Cartwright, 1996; Cairns et al., 1997; Christiansen, Allen and Seidenberg, 1997). We have considered one such additional source of information about word boundaries suggested by Church (1987; see also Bolinger and Gerstman, 1957; Hockett, 1958; Lehiste, 1960; Umeda and Coker, 1974), namely, allophonic cues. For instance, in English, when [t] occurs in syllable-initial position, it is aspirated; otherwise it is typically unaspirated. Thus, once the distribution of these allophones is learned, the presence of an aspirated [t] would be an indicator of a potential word onset. Of course, this assumes that the listener has some capacity to distinguish the two kinds of allophones. Hohne and I investigated whether 2-month-olds had the capacity to discriminate the kinds of allophonic differences that mark the distinction between 'nitrate' and 'night rate'. The first occurrence of 't' in 'nitrate' is aspirated, released, and often retroflexed, and the following 'r' is devoiced, suggesting that it is part of a cluster. In contrast, the first 't' in 'night rate' is unaspirated, unreleased, not retroflexed, and sometimes glottalized, and the following 'r' is voiced, suggesting that it is syllable-initial. Together, the absence of aspiration for the first 't' in 'night rate' and the voicing of the following 'r' suggest that the phoneme /t/ is not syllable-initial. These allophonic differences were sufficient to allow 2-month-olds to discriminate tokens of 'nitrate' from 'night rate', even when other potential distinguishing characteristics were removed from the stimuli by cross-splicing (Hohne and Jusczyk, 1994). Although the capacity to discriminate among different allophones is a prerequisite to using allophonic cues in word segmentation, it is not a sufficient basis for doing so. One must also learn about the typical distribution of these allophones in words.
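Once the distribution of the [t] allophones has been learned, exploiting it is straightforward in principle: an aspirated token signals a potential syllable (and hence word) onset. The sketch below illustrates this; the symbol "th" for the aspirated allophone and the flat phone-list input are notational assumptions for the example, not a claim about the learner's representation.

```python
# Illustrative use of an allophonic cue to word boundaries: in
# English, syllable-initial /t/ is aspirated (written "th" here,
# a notational assumption). A learner who knows this distribution
# can flag each aspirated token as a potential onset.

def allophonic_onsets(phones):
    """Return the indices of candidate onsets signalled by an
    aspirated /t/ in an unbroken phone sequence."""
    return [i for i, p in enumerate(phones) if p == "th"]

# 'nitrate': the aspirated [t] begins the /tr/ cluster of the
# second syllable, signalling an onset at index 2
print(allophonic_onsets(["n", "ai", "th", "r", "ei", "t"]))  # [2]
# 'night rate': the first /t/ is unaspirated (and the following
# /r/ voiced), so no aspiration-based onset is flagged here
print(allophonic_onsets(["n", "ai", "t", "r", "ei", "t"]))   # []
```

A fuller model would combine several allophonic properties (aspiration, release, glottalization, devoicing of a following /r/), but the single-cue version suffices to show how a learned allophone distribution translates into boundary hypotheses.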
To determine when English-learners might be able to use allophonic cues in segmenting words from fluent speech, we (Jusczyk, Hohne and Bauman, in press) conducted an additional series of experiments with older infants. We began by familiarizing 9-month-olds with pairs of items, such as 'night rates' and 'doctor' (or 'hamlet' and 'nitrates'), and then testing them on their ability to detect the occurrence of these words in passages. With respect to familiarization with either 'hamlet' or 'doctor', the infants listened longer to the passage containing the familiarized word than they did to the passage with the other word. However, the same did not hold for either 'night rate' or 'nitrate'. Infants familiarized with 'nitrate' listened just as long to the 'night rate' passage as they did to the 'nitrate' passage, and similarly, for familiarization with 'night rate'. Consequently, there was no indication that English-learning 9-month-olds were able to use the allophonic differences between 'night rate' and 'nitrate' to recognize them in fluent speech contexts. By comparison, there was some evidence that older infants can draw on these allophonic cues in detecting these items in fluent speech contexts. In particular, 10.5-month-olds tested with the same materials did show significant listening preferences for the passages containing the member of the 'night rate' / 'nitrate' pair that they had been familiarized with. The picture emerging from these studies of early word segmentation abilities in English-learners is that of a gradual process, whereby the learner begins with some approximation of mature word segmentation, and then uses this crude parse to
uncover other potential cues. As the learners begin to draw on and integrate these different sources of information, they are in a better position to pick up still other cues that help to refine their developing word segmentation skills. So, although English-learners may rely on stress-based cues and distributional evidence about co-occurring following syllables as an initial basis for word segmentation, this constrains the search space and facilitates the detection of allophonic and phonotactic cues that can also be brought to bear on segmentation.
6. How sensitivity to distributional evidence helps in other ways

There are other ways in which sensitivity to how certain information is distributed in the input can help in the discovery of the inherent organization of the language. A case in point concerns function words. A learner who notices that certain function words occur consistently in some phrase contexts and not others might be able to use such information in determining what type of phrase it is. For example, in English, 'the' occurs at the beginnings, rather than at the ends, of noun phrases; prepositions may follow verbs, but they do not immediately precede them. As with the examples discussed in the preceding section, determining how function words are distributed in the input first requires some ability to detect such items and to distinguish among them. Of course, function words are often omitted in children's early word combinations (Brown et al., 1969). Moreover, such words are typically unstressed, leading many to wonder whether this may make them difficult to pick up in the input (Echols and Newport, 1992; Gleitman et al., 1988). However, investigations by Gerken and her colleagues (1990, 1991, 1993, 1994) suggest that the omission of function words in children's early speech is more attributable to constraints on production than on perception. Furthermore, there is some evidence that by 11 months, English-learners are sensitive to the appearance of certain grammatical morphemes in utterances (Shafer et al., 1992). In particular, using an electrophysiological measure, Shafer et al. found that infants distinguished a normal English passage from one in which the function words were replaced by nonsense syllables that were phonologically unlike real English function words. Recent dissertation research conducted in my laboratory by Shady (1996) used a behavioral measure (the Headturn Preference Procedure) to follow up on Shafer et al.'s observations.
In a series of experiments, Shady replicated and extended Shafer et al.'s finding that English-learning infants are sensitive to the presence of function words in sentences. As in Shafer et al.'s investigation, Shady found that 10.5-month-olds display significant preferences for passages with real function words over ones with phonologically dissimilar nonsense function words. Moreover, the significant preference for passages with real functors occurred even when these were pitted against passages with nonsense functors that were phonologically similar to real English functors. Consequently, it appears that by 10.5 months, English-learners have begun to recognize some of the function words that typically appear in English utterances. Shady's next step was to determine when English-learners develop sensitivity to the distributional properties of these function words (i.e., where do such words typically occur in English utterances?). To investigate this, she created a set of passages and then constructed a new set from these by interchanging some of the function words (see Table 2 for an example). Then, using the Headturn Preference Procedure, she compared infants' listening times to both types of passages. She found that 10.5-month-olds did not show a significant preference for either the normal or the scrambled passages. Thus, although infants at this age may know what kinds of function words should occur in English utterances, they do not appear to know where these words belong. Similar results were obtained when groups of 12.5-month-olds and 14-month-olds were tested. However, by 16 months, the infants displayed significant preferences for the passages with the correctly positioned function words (see Fig. 2).

Table 2
Examples of passages with correct and scrambled function words

Passage with correctly ordered function words:
A bike with three wheels is coming down the street. Johnny had seen that bike yesterday. The lady with him was his aunt. This red bike was missing for a day. That cover had fallen on it. We had found the bike next to her garage.

Passage with scrambled function words:
Is bike with three wheels a coming down the street. Johnny that seen had bike yesterday. Was lady with him the his aunt. Was red bike this missing for a day. Had cover that fallen on it. We the found had bike next to her garage.
Although further research is necessary to determine which function words are involved, it is clear that some sensitivity to the typical distribution of function words in English is developing between 14 and 16 months of age. This kind of information would then be available to language learners as a potential cue to the labeling of phrasal units that infants may have extracted on the basis of their sensitivity to the prosodic marking of such units. Once again, the prior division of utterances into smaller chunks, such as phrasal units, could have facilitated the process of discovering the typical distribution of elements such as function words within these units.
7. Constraining the search for regularities in the input

The evidence reviewed thus far indicates that infants have some impressive abilities to detect regularly occurring properties of language input. Indeed, as Saffran et al. (1996) concluded about their recent investigation of 8-month-olds' abilities to use distributional properties to segment words from an artificial language, infants are good statistical learners. However, the problem for accounts of language acquisition based on distributional analyses of the input is how to restrict the search so as to avoid extracting spurious correlations between features that are unrelated to the structure of the target language. The notion that learners must arrive at just the right set of generalizations about language structure within a relatively short exposure
P.W. Jusczyk / Lingua 106 (1998) 197-218
Fig. 2. Infants' responses to well-placed and misplaced function words in sentences: average listening times (and standard error bars) for infants at four ages to passages with correctly ordered (Natural) and scrambled (Unnatural) function words.
period is what has led some investigators to argue that there are strong constraints on the kinds of regularities that learners will attempt to pull out of the input (Chomsky, 1968, 1980; Fodor, 1983; Wexler and Culicover, 1980). In principle, there are a number of ways of constraining the search. On the one hand, the nature of the input itself may be constrained in a way that allows the learner to focus the search on the essential properties. For example, there may be restrictions on the complexity of the sentential structures that appear, the overall length of utterances, etc. This has been the view of those who have argued that child-directed speech provides just this sort of input for the learner (for a discussion of some of the pros and cons of this approach, see the papers in Snow and Ferguson, 1977). On the other hand, the critical constraints may be ones that the learner brings to the acquisition situation. The latter constraints can be of two sorts: general constraints on cognitive and perceptual capacities, or specialized constraints that are specific to language processing. In our research on underlying speech perception capacities, we are currently exploring the possibility of both generalized cognitive and perceptual constraints and specialized language-specific constraints. One current project investigating the role of general cognitive and perceptual constraints on language acquisition focuses on how the size of the learner's processing window affects the acquisition of discontinuous dependencies in English. Discontinuous dependencies involve relationships between constituents that can occur at some remove from each other in an utterance (such as the relationship between the morpheme 'is' and the verbal ending 'ing', as in 'the man is not buying the car').
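The role of a restricted window can be sketched in a few lines (a purely hypothetical illustration, not the authors' experimental procedure): treat an utterance as a sequence of syllable-sized tokens and ask whether the two halves of a dependency ever fall inside the same fixed-size span. The syllabification below, with 'ting' standing in for the 'ing' ending, is my own invention.

```python
def cooccurs_within_window(tokens, a, b, window):
    """Return True if tokens a and b ever fall inside the same
    fixed-size processing window (measured in tokens)."""
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        if a in span and b in span:
            return True
    return False

# 'is ... knitting' with a four-syllable adverbial in between;
# 'ting' stands in for the 'ing' ending.
utterance = "my aunt is al most al ways knit ting".split()
print(cooccurs_within_window(utterance, "is", "ting", 4))  # False: window too small
print(cooccurs_within_window(utterance, "is", "ting", 7))  # True: window spans the dependency
```

On this toy view, a learner whose window grows from four to seven tokens would move from missing to detecting the 'is ... ing' relation, which is the logic behind the 'starting small' results discussed below.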
The size of the processing window is a potential factor in the acquisition of discontinuous dependencies because English grammatical structure allows for the placement of intervening items between the two related constituents (e.g., 'my aunt is not always constantly knitting sweaters'). If the size of the processing window is severely restricted so that the related elements never co-occur within the window, then their relationship may not be discovered by the learner. Although a limited processing window would be disadvantageous in this situation, some have suggested that limits on the size of the processing window during development could actually facilitate the acquisition of certain kinds of grammatical relations (Elman, 1993; Newport, 1990, 1991). For example, Newport (1991) has noted that the existence of a restricted processing window at early points in development may facilitate the pickup of certain kinds of morphological relations. Similarly, Elman (1993) found that when the size of the processing window was restricted during the initial stages of learning, his connectionist acquisition model ultimately did a much better job of learning long-distance dependencies in sentences. Hypotheses about the effects of restricted processing windows on language acquisition assume that such constraints are present in learners. However, very little information is currently available about the size of learners' processing windows at different points in development. As a first step in this direction, Lynn Santelmann and I decided to investigate the ability of infants to process discontinuous dependencies when the constituents were located at different distances from each other (Santelmann and Jusczyk, submitted). In particular, we focused on the ability of 18-month-olds to detect the relationship in English between the auxiliary verb 'is' and the verb
ending 'ing'. We chose to examine this particular dependency because the 'ing' morpheme is common in English and is among the earliest bound morphemes that English-learners produce (Brown, 1973; de Villiers and de Villiers, 1973). To explore sensitivity to this dependency, we constructed a set of 'natural' passages so that each sentence contained an instance of 'is' paired with a verb ending in 'ing'. Then, a new set of 'unnatural' passages was created by replacing all instances of 'is' with the modal 'can'. This operation created ungrammatical sequences such as 'grandma can singing'. The passages were produced in synthesized speech by DECTalk so that the natural and unnatural passages would have the same prosody and stress on the critical constituents. We hypothesized that if infants were sensitive to the dependency between 'is' and the 'ing' ending, then they should listen significantly longer to the natural than to the unnatural passages. We began by testing 18-month-olds on passages in which the critical elements were separated by only a monosyllabic verb stem (e.g., 'is digging'). Our results indicated that the infants listened significantly longer to the natural than to the unnatural passages. Thus, we established that 18-month-olds are sensitive to this type of dependency. Our next goal was to vary the distance between the critical constituents in order to estimate the size of the processing window for infants at this age. We used the same basic passages as in the previous experiment, but in each sentence we inserted a four-syllable adverbial between the auxiliary and the verb with the 'ing' ending (e.g., 'is almost always digging'). This time the 18-month-olds showed no listening preferences for either the natural or unnatural passages, suggesting that the distance between the critical constituents fell outside of their processing window.
Two additional experiments were carried out by inserting a two-syllable adverbial in one case and a three-syllable adverbial in the other. The infants displayed a significant preference for the natural over the unnatural passages when the two-syllable adverbial was present, but not when the three-syllable adverbial was. Thus, the size of the processing window for 18-month-olds in detecting this kind of dependency appears to be on the order of between three and four syllables. Considerably more research needs to be done before we can make any definitive statements about the actual size of the processing window and how it changes during the course of development. In particular, we are trying to gather information about how older and younger infants respond to the same dependencies. In addition, we are exploring whether the same-sized window applies to other kinds of dependencies that 18-month-olds can detect. In the long run, information of this sort should be useful in evaluating the plausibility of claims made about the kind of information that learners, at various ages, can pick up from the input. Another line of investigation is exploring the existence of constraints that appear to be specifically linguistic in nature. For example, linguists have often held that there are constraints on the nature of syllable structures that occur in natural languages. In Optimality Theory (Prince and Smolensky, 1996), it is assumed that Universal Grammar provides a set of violable well-formedness constraints on the phonological organization of languages, including syllable structure. Because these constraints are considered to be universal, they are assumed to be part of the learner's innate endowment. The way that languages are said to differ is in terms of
how the particular constraints are ranked with respect to each other. Hence, the task for learners is to discover the way that these constraints are ranked within their particular target language.

To test the validity of some of the claims made about the existence of certain constraints on syllable structure proposed in Optimality Theory, Paul Smolensky and I have begun a series of studies investigating the extent to which infants give evidence of respecting these constraints. Two such constraints are ONSET (Syllables must have onsets) and NOCODA (Syllables do not have codas). Both types of constraints are violated in certain contexts in English (e.g., in the word 'at'). However, in other contexts, such as a non-initial syllable in a word, the ONSET constraint is rarely violated. With respect to the initial state of language learning, it is an empirical question as to whether constraints such as ONSET and NOCODA are highly ranked. If they are highly ranked, then one might expect infants to prefer linguistic forms in which these constraints are satisfied over ones in which they are violated. To explore this hypothesis, we created two types of lists made up of vowel-consonant-vowel stimuli (VCV's). For one type of list (V.CV), a syllable boundary was imposed between the initial vowel and the following consonant. In this type of syllable structure, ONSET and NOCODA are satisfied. For the other type of list (VC.V), both constraints are violated. In our first experiment, the stimuli were produced with a cross-splicing technique. For the V.CV structures, the initial V of an item like [o.kA] was taken from an item that began with an initially stressed V and whose second syllable began with the same consonant as the one that would be used in the CV portion of the item.
For example, we spliced out the o from [o' kxn] (i.e., 'au' kern') and appended it to the CV portion of an item with an unstressed initial syllable (e.g., [kA] from 'capitulate'). For the VC.V structures, the initial VC of an item like [ok.A] came from an item that began with an initially stressed VC (e.g., ok from 'awkward') and the final V came from the initial syllable of an item with an unstressed initial vowel (e.g., A from 'apitulate'). There were five different lists of each syllable type (each list consisting of 12 different VCV's). When the resulting lists were presented to 6-month-olds using the Headturn Preference Procedure, the infants showed a significant preference for the V.CV lists over the VC.V lists. In other words, they showed a preference for the structures in which the ONSET and NOCODA constraints were satisfied. To verify that our findings were not attributable to some artifact associated with the cross-splicing process, we re-ran the experiment using VCV stimuli that were produced as a single utterance by a talker who was instructed to produce each one as either a V.CV or VC.V utterance. When tested on lists composed of these new items, 6-month-olds again showed a significant preference for the V.CV lists over the VC.V lists. The preferences that infants display for the lists in which ONSET and NOCODA are satisfied appear to fit the predictions of Optimality Theory. In particular, these preferences are evident at a point in development (i.e., 6 months) prior to that at which they show much sensitivity to the specific sound properties characteristic of their native language. Although it is possible that infants may have extracted information about typical syllable structures from their experience with native language
input, there is little evidence from previous studies that infants at this age are sensitive to distributional frequencies of sound properties in the input (Jusczyk et al., 1993a; Jusczyk et al., 1994). One additional check on the role of previous experience with these sequences is to use syllables with phonetic properties that do not occur in the infant's native language environment. We are planning an investigation of this type. Infants may well begin language learning with a bias to prefer syllable structures in which constraints such as ONSET and NOCODA are satisfied. Obviously, additional research is needed to determine the extent to which even younger infants may display the same kinds of preferences, and also whether these biases are found in infants who have been exposed to languages other than English. It will also be worthwhile to examine other constraints assumed by Optimality Theory to determine the extent to which these really affect the course of acquisition.

8. Conclusions

The past thirty years of research in this area have greatly enhanced our understanding of infants' speech perception capacities and their likely role in language acquisition. In addition, a clearer picture is beginning to emerge of what and when infants learn about the sound organization of their native language. In hindsight, infants appear to be much more attuned to the distribution of information in the input than early language researchers expected them to be. Still, a future challenge for research in this area is to explain why infants are so good at extracting some kinds of information (e.g., word stress, phonotactics, frequencies of phones and phonetic sequences, etc.) from the input. In other words, what constraints are there on the regularities and patterns that infants discover in listening to language?

References

Best, C.T., R. Lafleur and G.W. McRoberts, 1995. Divergent developmental patterns for infants' perception of two non-native contrasts.
Infant Behavior and Development 18, 339-350.
Best, C.T., G.W. McRoberts and N.M. Sithole, 1988. Examination of the perceptual re-organization for speech contrasts: Zulu click discrimination by English-speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance 14, 345-360.
Bolinger, D.L. and L.J. Gerstman, 1957. Disjuncture as a cue to constraints. Word 13, 246-255.
Brent, M.R. and T.A. Cartwright, 1996. Distributional regularity and phonotactic constraints are useful for segmentation. Cognition 61, 93-125.
Brown, R., 1973. A first language. Cambridge, MA: Harvard University Press.
Brown, R., C. Cazden and U. Bellugi, 1969. The child's grammar from I to III. In: J.P. Hill (ed.), Minnesota Symposium on Child Psychology, Vol. II, 28-73. Minneapolis, MN: University of Minnesota Press.
Cairns, P., R. Shillcock, N. Chater and J. Levy, 1997. Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation. Cognitive Psychology 33, 111-153.
Chomsky, N., 1968. Language and mind. New York: Harcourt Brace Jovanovich.
Chomsky, N., 1980. Rules and representations. New York: Columbia University Press.
Christiansen, M.H., J. Allen and M.S. Seidenberg, 1997. Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes.
Church, K.W., 1987. Phonological parsing in speech recognition. Dordrecht: Kluwer Academic Publishers.
Cutler, A., 1990. Exploiting prosodic probabilities in speech segmentation. In: G.T.M. Altmann (ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives, 105-121. Cambridge: MIT Press.
Cutler, A., 1994. Segmentation problems, rhythmic solutions. Lingua 92, 81-104.
Cutler, A. and S. Butterfield, 1992. Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language 31, 218-236.
Cutler, A. and D.M. Carter, 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133-142.
Cutler, A., J. McQueen and K. Robinson, 1990. Elizabeth and John: Sound patterns of men's and women's names. Journal of Linguistics 26, 471-482.
Cutler, A. and D.G. Norris, 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14, 113-121.
de Villiers, J.G. and P.A. de Villiers, 1973. A cross-sectional study of the acquisition of grammatical morphemes in child speech. Journal of Psycholinguistic Research 2, 267-273.
Echols, C.H., M.J. Crowhurst and J.B. Childers, 1997. Perception of rhythmic units in speech by infants and adults. Journal of Memory and Language 36, 202-225.
Echols, C.H. and E.L. Newport, 1992. The role of stress and position in determining first words. Language Acquisition 2, 189-220.
Eimas, P.D., 1974. Auditory and linguistic processing of cues for place of articulation by infants. Perception and Psychophysics 16, 513-521.
Eimas, P.D., 1975. Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants. Perception and Psychophysics 18, 341-347.
Eimas, P.D., 1982.
Speech perception: A view of the initial state and perceptual mechanisms. In: J. Mehler, M. Garrett and E.C.T. Walker (eds.), Perspectives on mental representation: Experimental and theoretical studies of cognitive processes and capacities, 339-360. Hillsdale, NJ: Erlbaum.
Eimas, P.D. and J.L. Miller, 1980. Contextual effects in infant speech perception. Science 209, 1140-1141.
Eimas, P.D., E.R. Siqueland, P.W. Jusczyk and J. Vigorito, 1971. Speech perception in infants. Science 171, 303-306.
Elman, J.L., 1993. Learning and development in neural networks: The importance of starting small. Cognition 48, 71-99.
Fodor, J.A., 1983. The modularity of mind. Cambridge, MA: MIT Press.
Gerken, L.A., 1991. The metrical basis for children's subjectless sentences. Journal of Memory and Language 30, 431-451.
Gerken, L.A., 1994. A metrical template account of children's weak syllable omissions. Journal of Child Language 30, 431-451.
Gerken, L.A., B. Landau and R.E. Remez, 1990. Function morphemes in young children's speech perception and production. Developmental Psychology 25, 204-216.
Gerken, L.A. and B.J. McIntosh, 1993. Interplay of function morphemes and prosody in early language. Developmental Psychology 29, 448-457.
Gleitman, L., H. Gleitman, B. Landau and E. Wanner, 1988. Where the learning begins: Initial representations for language learning. In: F. Newmeyer (ed.), The Cambridge linguistic survey, vol. 3, 150-193. Cambridge, MA: Harvard University Press.
Gleitman, L. and E. Wanner, 1982. The state of the state of the art. In: E. Wanner and L. Gleitman (eds.), Language acquisition: The state of the art, 3-48. Cambridge: Cambridge University Press.
Hirsh-Pasek, K., D.G. Kemler Nelson, P.W. Jusczyk, K. Wright Cassidy, B. Druss and L. Kennedy, 1987. Clauses are perceptual units for young infants. Cognition 26, 269-286.
Hockett, C.F., 1958. A course in modern linguistics. New York: Macmillan.
Hohne, E.A. and P.W. Jusczyk, 1994.
Two-month-old infants' sensitivity to allophonic differences. Perception and Psychophysics 56, 613-623.
Houston, D., P.W. Jusczyk and M. Newsome, 1995. Infants' strategies of speech segmentation: Clues from weak/strong words. Paper presented at the 20th Annual Boston University Conference on Language Acquisition, Boston, MA.
Jusczyk, P.W., 1982. Auditory versus phonetic coding of speech signals during infancy. In: J. Mehler, M. Garrett and E. Walker (eds.), Perspectives on mental representation: Experimental and theoretical studies of cognitive processes and capacities, 361-387. Hillsdale, NJ: Erlbaum.
Jusczyk, P.W., 1986. Towards a model for the development of speech perception. In: J. Perkell and D.H. Klatt (eds.), Invariance and variability in speech processes, 1-19. Hillsdale, NJ: Erlbaum.
Jusczyk, P.W., 1989. Perception of cues to clausal units in native and non-native languages. Paper presented at the biennial meeting of the Society for Research in Child Development, Kansas City, Missouri.
Jusczyk, P.W., 1993. From general to language specific capacities: The WRAPSA Model of how speech perception develops. Journal of Phonetics 21, 3-28.
Jusczyk, P.W., 1997. The discovery of spoken language. Cambridge, MA: MIT Press.
Jusczyk, P.W., in press. Using the headturn preference procedure to study language acquisition. In: E. Bavin and D. Burnham (eds.), Advances in Infancy Research, vol. 13.
Jusczyk, P.W., A. Cutler and N. Redanz, 1993a. Preference for the predominant stress patterns of English words. Child Development 64, 675-687.
Jusczyk, P.W., A.D. Friederici, J. Wessels, V.Y. Svenkerud and A.M. Jusczyk, 1993b. Infants' sensitivity to the sound patterns of native language words. Journal of Memory and Language 32, 402-420.
Jusczyk, P.W., E.A. Hohne and A. Bauman, in press. Infants' sensitivity to allophonic cues for word segmentation. Perception and Psychophysics.
Jusczyk, P.W., E.A. Hohne and A.L. Bauman, 1995. Infants' sensitivity to word juncture cues. Paper presented at the 36th Annual Meeting of the Psychonomic Society, Los Angeles, CA.
Jusczyk, P.W.
and C.L. Krumhansl, 1993. Pitch and rhythmic patterns affecting infants' sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance 19, 627-640.
Jusczyk, P.W., P.A. Luce and J. Charles Luce, 1994. Infants' sensitivity to phonotactic patterns in the native language. Journal of Memory and Language 33, 630-645.
Kemler Nelson, D.G., P.W. Jusczyk, D.R. Mandel, J. Myers, A. Turk and L.A. Gerken, 1995. The Headturn Preference Procedure for testing auditory perception. Infant Behavior and Development 18, 111-116.
Kenstowicz, M., 1994. Phonology in generative grammar. Cambridge, MA: Blackwell.
Klatt, D.H., 1980. Speech perception: A model of acoustic-phonetic analysis and lexical access. In: R.A. Cole (ed.), Perception and production of fluent speech, 243-288. Hillsdale, NJ: Erlbaum.
Klatt, D.H., 1986. The problem of variability in speech recognition and in models of speech perception. In: J.S. Perkell and D.H. Klatt (eds.), Invariance and variability in speech processes, 300-319. Hillsdale, NJ: Erlbaum.
Klatt, D.H., 1989. Review of selected models of speech perception. In: W. Marslen-Wilson (ed.), Lexical representation and process, 169-226. Cambridge, MA: MIT Press.
Krumhansl, C.L. and P.W. Jusczyk, 1990. Infants' perception of phrase structure in music. Psychological Science 1, 70-73.
Kuhl, P.K., 1979. Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America 66, 1668-1679.
Lehiste, I., 1960. An acoustic-phonetic study of internal open juncture. New York: Karger.
Mandel, D.R., P.W. Jusczyk and D.G. Kemler Nelson, 1994. Does sentential prosody help infants to organize and remember speech information? Cognition 53, 155-180.
Mandel, D.R., D.G. Kemler Nelson and P.W. Jusczyk, 1996. Infants remember the order of words in a spoken sentence. Cognitive Development 11, 181-196.
Mattys, S., P.W. Jusczyk, P.A. Luce and J.L.
Morgan, in preparation. 9-month-olds' sensitivity to prosodic and phonotactic markers of word boundaries.
Miller, G.A., 1964. The psycholinguists. Encounter 23, 29-37.
Miller, G.A. and S. Isard, 1963. Some perceptual consequences of linguistic rules. Journal of Verbal Learning and Verbal Behavior 2, 217-228.
Morgan, J.L., 1994. Converging measures of speech segmentation in prelingual infants. Infant Behavior and Development 17, 387-400.
Morgan, J.L. and J.R. Saffran, 1995. Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Development 66, 911-936.
Morse, P.A., 1972. The discrimination of speech and nonspeech stimuli in early infancy. Journal of Experimental Child Psychology 13, 477-492.
Myers, J., P.W. Jusczyk, D.G. Kemler Nelson, J. Charles Luce, A. Woodward and K. Hirsh-Pasek, 1996. Infants' sensitivity to word boundaries in fluent speech. Journal of Child Language 23, 1-30.
Newport, E., 1990. Maturational constraints on language learning. Cognitive Science 14, 11-28.
Newport, E., 1991. Contrasting conceptions of the critical period for language. In: S. Carey and R. Gelman (eds.), The epigenesis of mind: Essays on biology and cognition, 111-130. Hillsdale, NJ: Erlbaum.
Newsome, M. and P.W. Jusczyk, 1995. Do infants use stress as a cue for segmenting fluent speech? In: D. MacLaughlin and S. McEwen (eds.), Proceedings of the 19th Annual Boston University Conference on Language Development, vol. 2, 415-426. Somerville, MA: Cascadilla.
Norris, D.G., J. McQueen, A. Cutler and S. Butterfield, 1998. The possible word constraint in the segmentation of continuous speech. Cognitive Psychology 34, 191-243.
Otake, T., G. Hatano, A. Cutler and J. Mehler, 1993. Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language 32, 258-278.
Polka, L. and J.F. Werker, 1994. Developmental changes in perception of non-native vowel contrasts. Journal of Experimental Psychology: Human Perception and Performance 20, 421-435.
Prince, A. and P. Smolensky, 1996. Optimality theory (Technical Report RuCCS-TR-2). Rutgers Center for Cognitive Science.
Saffran, J.R., R.N. Aslin and E.L. Newport, 1996. Statistical learning by 8-month-old infants. Science 274, 1926-1928.
Santelmann, L.M. and P.W. Jusczyk, submitted.
Sensitivity to discontinuous dependencies in language learners: Evidence for limitations in processing space. Cognition.
Shady, M.E., 1996. Infants' sensitivity to function morphemes. Ph.D. dissertation, State University of New York at Buffalo.
Shafer, V., L.A. Gerken, J. Shucard and D. Shucard, 1992. 'The' and the brain: An electrophysiological study of infants' sensitivity to English function morphemes. Paper presented at the Boston University Conference on Language Development, Boston, MA.
Snow, C.E. and C.A. Ferguson (eds.), 1977. Talking to children: Language input and acquisition. Cambridge: Cambridge University Press.
Streeter, L.A., 1976. Language perception of 2-month-old infants shows effects of both innate mechanisms and experience. Nature 259, 39-41.
Suci, G., 1967. The validity of pause as an index of units in language. Journal of Verbal Learning and Verbal Behavior 6, 26-32.
Umeda, N. and C.H. Coker, 1974. Allophonic variation in American English. Journal of Phonetics 2, 1-5.
Werker, J.F. and C.E. Lalonde, 1988. Cross-language speech perception: Initial capabilities and developmental change. Developmental Psychology 24, 672-683.
Werker, J.F. and R.C. Tees, 1984. Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development 7, 49-63.
Wexler, K. and P. Culicover, 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Whorf, B., 1956. Language, thought and reality. Cambridge, MA: MIT Press.
Lingua 106 (1998) 219-242
Words and rules*

Steven Pinker

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, E10-016, Cambridge, MA 02139, USA
Abstract

The vast expressive power of language is made possible by two principles: the arbitrary sound-meaning pairing underlying words, and the discrete combinatorial system underlying grammar. These principles implicate distinct cognitive mechanisms: associative memory and symbol-manipulating rules. The distinction may be seen in the difference between regular inflection (e.g., walk-walked), which is productive and open-ended and hence implicates a rule, and irregular inflection (e.g., come-came), which is idiosyncratic and closed and hence implicates individually memorized words. Nonetheless, two very different theories have attempted to collapse the distinction: generative phonology invokes minor rules to generate irregular as well as regular forms, and connectionism invokes a pattern-associator memory to store and retrieve regular as well as irregular forms. I present evidence from three disciplines that supports the traditional word/rule distinction, though with an enriched conception of lexical memory that has some of the properties of a pattern associator. Rules, nonetheless, are distinct from pattern association, because a rule concatenates a suffix to a symbol for verbs; it therefore does not require access to memorized verbs or their sound patterns, but applies as the 'default' whenever memory access fails. I present a dozen such circumstances, including novel, unusual-sounding, and rootless and headless derived words, in which people inflect the words regularly (explaining quirks like flied out, low-lifes, and Walkmans). A comparison of English to other languages shows that, contrary to the connectionist account, default suffixation is not due to numerous regular words reinforcing a pattern in associative memory, but to a memory-independent, symbol-concatenating mental operation.

Keywords: Linguistic rules; Acquisition; Inflectional morphology; Connectionism; Past tense
* I am deeply grateful to my collaborators on this project: Alan Prince, Gary Marcus, Michael Ullman, John Kim, Sandeep Prasada, Harald Clahsen, Richard Wiese, Anne Senghas, Fei Xu and Suzanne Corkin. Preparation of this paper was supported by NIH Grant HD 18381.

1. Words and rules

Language fascinates people for many reasons, but for me the most striking property is its vast expressive power. People can sit for hours listening to other people
0024-3841/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0024-3841(98)00035-7
make noise as they exhale, because those hisses and squeaks contain information about some message the speaker wishes to convey. The set of messages that can be encoded and decoded through language is, moreover, unfathomably vast; it includes everything from theories of the origin of the universe to the latest twists of a soap opera plot. Accounting for this universal human talent, more impressive than telepathy, is in my mind the primary challenge for the science of language. What is the trick behind our species' ability to cause one another to think specific thoughts by means of the vocal channel? There is not one trick, but two, and they were identified in the 19th century by continental linguists. The first principle was articulated by Ferdinand de Saussure (1960), and lies behind the mental dictionary, a finite list of memorized words. A word is an arbitrary symbol, a connection between a signal and an idea that is shared by all members of a community. The word duck, for example, doesn't look like a duck, walk like a duck or quack like a duck, but we can use it to convey the idea of a duck because we all have, in our developmental history, formed the same connection between the sound and the meaning. Therefore, any of us can convey the idea virtually instantaneously simply by making that noise. The ability depends on speaker and hearer sharing a memory entry for the association, and in caricature that entry might look like this: (1)
    N
    |
  duck
/dʌk/   'bird that quacks'

The entry, symbolized by the symbol at the center (here spelled as English 'duck' for convenience), is a three-way association among a sound (/dʌk/), a meaning ('bird that quacks'), and a grammatical category ('N' or noun). Though simple, the sheer number of such entries - on the order of 60,000 to 100,000 for an English-speaking adult (Pinker, 1994) - allows many different concepts to be expressed in an efficient manner. Of course, we don't just learn individual words. We combine them into strings when we speak, and that leads to the second trick behind language, grammar. The principle behind grammar was articulated by Wilhelm von Humboldt as 'the infinite use of finite media'. Inside everyone's head there is a finite algorithm with the ability to generate an infinite number of potential sentences, each corresponding to a distinct thought. The meaning of a sentence is computed from the meanings of the individual words and the way they are arranged. A fragment of the information used by that computation, again in caricature, might look something like this:

(2) S → NP VP
    VP → V (NP) (S)

It captures our knowledge that English allows a sentence to be composed of a noun phrase (the subject) and a verb phrase (the predicate), and allows a verb phrase to be
S. Pinker / Lingua 106 (1998) 219-242
composed of a verb, a noun phrase (the object), and a sentence (the complement). That pair of rules is recursive: an element is introduced on the right-hand side of one rule which also exists as the left-hand side of the other rule, creating the possibility of an infinite loop that could generate sentences of any size, such as 'I think that she thinks that he said that I wonder whether ....' This system thereby gives a speaker the ability to put an unlimited number of distinct thoughts into words, and a hearer the ability to interpret the string of words to recover the thoughts. Grammar can express a remarkable range of thoughts because our knowledge of language resides in an algorithm that combines abstract symbols, such as 'Noun' and 'Verb', as opposed to concrete concepts such as 'man' and 'dog' or 'eater' and 'eaten'. This gives us an ability to talk about all kinds of wild and wonderful ideas. We can talk about a dog biting a man, or, as in the journalist's definition of 'news', a man biting a dog. We can talk about aliens landing at Harvard, or the universe beginning with a big bang, or the ancestors of native Americans immigrating to the continent over a land bridge from Asia during an Ice Age, or Michael Jackson marrying Elvis's daughter. All kinds of unexpected events can be communicated, because our knowledge of language is couched in abstract symbols that can embrace a vast set of concepts and can be combined freely into an even vaster set of propositions. How vast? In principle it is infinite; in practice it can be crudely estimated by assessing the average number of grammatical and sensible word choices possible at each point in a sentence (roughly, 10) and raising it to a power corresponding to the maximum length of a sentence a person is likely to produce and understand, say, 20. The number is 10^20, or about a hundred million trillion sentences (Pinker, 1994). Words and rules each have advantages and disadvantages.
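The recursive rule fragment in (2) can be illustrated with a toy generator showing how a finite grammar yields an unbounded set of sentences. The tiny lexicon, the depth cap, and the omission of the complementizer 'that' are illustrative simplifications, not part of the text:

```python
import random

# A toy grammar mirroring the fragment in (2): S -> NP VP, VP -> V (NP) (S).
# The recursion (VP can contain an S, which contains a VP ...) is what makes
# the set of generable sentences unbounded in principle.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "VP": [["V"], ["V", "NP"], ["V", "S"]],   # optional object or sentential complement
    "NP": [["I"], ["she"], ["the dog"]],
    "V":  [["think"], ["said"], ["barked"]],
}

def expand(symbol, depth=0, max_depth=4):
    """Rewrite a symbol into a list of words by randomly chosen productions."""
    if symbol not in GRAMMAR:          # terminal: a word of the lexicon
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth and symbol == "VP":
        options = [["V"]]              # past the cap, avoid the recursive expansion
    production = random.choice(options)
    words = []
    for sym in production:
        words.extend(expand(sym, depth + 1, max_depth))
    return words

print(" ".join(expand("S")))  # prints one randomly generated sentence
```

Raising the depth cap lets the VP -> V S loop nest arbitrarily deep, which is the 'infinite use of finite media' in miniature.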
Compared to the kind of grammatical computation that must be done while generating and interpreting sentences, words are straightforward to acquire, look up, and produce. On the other hand, a word by itself can convey only a finite number of meanings - the ones that are lexicalized in a language - and the word must be uniformly memorized by all the members of a community of speakers to be useful. Grammar, in contrast, allows for an unlimited number of combinations of concepts to be conveyed, including highly abstract or novel combinations. Because grammar is combinatorial, the number of messages grows exponentially with the length of the sentence, and because language is recursive, with unlimited time and memory resources speakers could, in principle, convey an infinite number of distinct meanings. On the other hand, by its very nature grammar can produce long and unwieldy strings and requires complex on-line computation, all in service of allowing people to convey extravagantly more messages than they ever would be called upon to do in real life. Given these considerations, a plausible specification of the basic design of human language might run as follows. Language maximizes the distinct advantages of words and rules by comprising both, each handled by a distinct psychological system. There is a lexicon of words for common or idiosyncratic entities; the psychological mechanism designed to handle it is simply a kind of memory. And there is a separate system of combinatorial grammatical rules for novel combinations of entities; the psychological mechanism designed to handle it is symbolic computation.
How can we test this theory of language design? In particular, how can we distinguish it from an alternative that would say that language consists of a single mechanism that produces outputs of different complexity depending on the complexity of the message that must be conveyed: short, simple outputs for elementary concepts like 'dog', and complex, multi-part outputs for combinations of concepts like 'dog bites man'? According to the word/rule theory, we ought to find a case in which words and rules express the same contents - but they would still be psychologically, and ultimately neurologically, distinguishable. I suggest there is such a case: the contrast between regular and irregular inflection. An example of regular inflection can be found in English past tense forms such as walk-walked, jog-jogged, pat-patted, kiss-kissed, and so on. Nearly all verbs in English are regular, and the class is completely predictable: given a regular verb, its past tense form is completely determinate, the verb stem with the suffix -ed attached.1 The class of regular verbs is open-ended: there are thousands of existing verbs, and hundreds of new ones are added all the time, such as faxed, snarfed, munged, and moshed. Even preschool children, after hearing a novel verb like rick in the laboratory, easily create its regular past tense form, such as ricked (Berko, 1958). Moreover, children demonstrate their productive use of the rule in another way: starting in their twos, they produce errors such as breaked and comed in which they overapply the regular suffix to a verb that does not allow it in standard English. Since they could not have heard such forms from their parents, they must have created them on their own.
The predictability and open-ended productivity of the regular pattern suggest that regular past tense forms are generated, when needed, by a mental rule, similar in form to other rules of grammar, such as 'to form the past tense of a verb, add the suffix -ed':

(3) Vpast -> Vstem + d

As with other combinatorial products of grammar, regulars would have the advantage of open-endedness, but also the disadvantage of complexity and unwieldiness: some regular forms, such as edited and sixths, are far less pronounceable than simple English verbs. In contrast, English contains about 180 'irregular' verbs that form their past tense in idiosyncratic ways, such as ring-rang, sing-sang, go-went, and think-thought. In contrast with the regulars, the irregulars are unpredictable. The past tense of sink is sank, but the past tense of slink is not slank but slunk; the past tense of think is neither thank nor thunk but thought, and the past tense of blink is neither blank nor blunk nor blought but regular blinked. Also in contrast to the regulars, irregular verbs define a closed class: there are about 180 of them in present-day English, and there have been no recent new ones. And they have a corresponding advantage compared
1. The suffix is pronounced /t/, /d/, and /id/ in walked, jogged, and patted, respectively - but these variants represent a predictable phonological alternation that recurs elsewhere in the language. Hence they appear to be the product of a separate process of phonological adjustment applying to a single underlying morpheme, /d/; see Pinker and Prince (1988).
with the regulars: there are no phonologically unwieldy forms such as edited; all irregulars are monosyllables (or prefixed monosyllables such as become and overtake) that follow the canonical sound pattern for simple English words. The idiosyncrasy and fixed number of irregular verbs suggest that they are memorized as pairs of ordinary lexical items, linked or annotated to capture the grammatical relationship between one word and the other: (4)
       V                 V
       |                 |
     bring ---------- brought [past]
Finally, the memory and rule components appear to interact in a simple way: if a word can provide its own past tense form from memory, the regular rule is blocked; that is why adults, who know broke, do not say breaked. Elsewhere (by default), the rule applies; that is why children can generate ricked and adults can generate moshed, even if they have never had a prior opportunity to memorize either one. The existence of regular and irregular verbs would thus seem to be an excellent confirmation of the word/rule theory. They are equated for length and complexity (both being single words), for grammatical properties (both being nonfinite forms, with identical syntactic privileges), and meaning (both expressing the pastness of an event or state). But regular verbs bear the hallmark of rule products, whereas irregular verbs bear the hallmark of memorized words, as if the two subsystems of language occasionally competed over the right to express certain meanings, each able to do the job but in a different way. The story could end there were it not for a complicating factor. That factor is the existence of patterns among the irregular verbs: similarities among clusters of irregular verbs in their stems and in their past tense forms. For example, among the irregular verbs one finds keep-kept, sleep-slept, feel-felt, and dream-dreamt; wear-wore, bear-bore, tear-tore, and swear-swore; and string-strung, swing-swung, sting-stung, and fling-flung (see Bybee and Slobin, 1982; Pinker and Prince, 1988). Moreover, these patterns are not just inert resemblances but are occasionally generalized by live human speakers. Children occasionally produce novel forms such as bring-brang, bite-bote, and wipe-wope (Bybee and Slobin, 1982). The errors are not very common (about 0.2% of the opportunities), but all children make them (Xu and Pinker, 1995). These generalizations occasionally find a toehold in the language and change its composition.
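The blocking-plus-default interaction described above can be sketched as a memory lookup with the rule as the elsewhere case. The tiny lexicon and the orthographic spelling shortcut below are illustrative simplifications, not an implementation from the text:

```python
# A minimal sketch of the blocking principle: an irregular past tense form
# stored in memory pre-empts the rule; otherwise the default -ed rule applies.
IRREGULAR_PAST = {"break": "broke", "bring": "brought", "go": "went", "sing": "sang"}

def past_tense(stem: str) -> str:
    if stem in IRREGULAR_PAST:     # memory lookup succeeds: the rule is blocked
        return IRREGULAR_PAST[stem]
    if stem.endswith("e"):         # crude orthographic adjustment, not phonology
        return stem + "d"
    return stem + "ed"             # elsewhere (by default), the rule applies

print(past_tense("break"))  # broke   (retrieved from memory)
print(past_tense("mosh"))   # moshed  (default rule, no memory entry needed)
print(past_tense("rick"))   # ricked  (novel verb, rule applies)
```

Deleting "break" from the dictionary makes the function return "breaked", which mirrors the overregularization errors of a child who has not yet consolidated the memory entry for broke.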
The irregular forms quit and knelt are only a few centuries old, and snuck came into English only about a century ago. The effect is particularly noticeable when one compares dialects of English; many American and British dialects contain forms such as help-holp, drag-drug, and climb-clumb. Finally, the effect can be demonstrated in the laboratory. When college students are given novel verbs such as spling and asked to guess their past tense forms, most offer splang or splung among their answers (Bybee and Moder, 1983). So the irregular forms are not just a set of arbitrary exceptions, memorized individually by rote, and therefore cannot simply be attributed to a lexicon of stored
items, as in the word-rule theory. Two very different theories have arisen to handle this fact. One is the theory of generative phonology, applied to irregular morphology by Chomsky and Halle (1968) and Halle and Mohanan (1985). In this theory, there are minor rules for the irregular patterns, such as 'change i to a', similar to the suffixing rule for regular verbs. The rule would explain why ring and rang are so similar - the process creating the past tense form literally takes the stem as input and modifies the vowel, leaving the remainder intact. It also explains why ring-rang displays a pattern similar to sing-sang and sit-sat: a single set of rules is shared by a larger set of verbs. The theory does not, however, account well for the similarities among the verbs undergoing a given rule, such as string, sting, fling, cling, sling, and so on. On the one hand, if the verbs in this subclass are listed in memory and the rule is stipulated to apply only to the verbs on the list, it is a mysterious coincidence that the verbs on the list are so similar to one another in their onsets (consonant clusters such as st, sl, fl, and so on) and in their codas (the velar nasal consonant ng). In principle, the verbs could have shared nothing but the vowel i that is replaced by the rule. On the other hand, if the phonological pattern common to the stems in a subclass is distilled out and appended to the rule as a condition, then the wrong verbs will be picked out. Take the putative minor rule replacing i by u, which applies to the sting verbs, the most cohesive irregular subclass in English. That rule could be stated as 'Change i to u if and only if the stem has the pattern Consonant - Consonant - i - velar nasal consonant.'
Such a rule would falsely include bring-brought and spring-sprang, which do not change their vowels to u, and would falsely exclude stick-stuck (which does change to u, even though its final consonant is velar but not nasal) and spin-spun (which also changes, even though its final consonant is nasal but not velar). The problem is that the irregular subclasses are family resemblance categories in the sense of Ludwig Wittgenstein and Eleanor Rosch, characterized by statistical patterns of shared features rather than by necessary and sufficient characteristics (Bybee and Slobin, 1982). While generative phonology extends a mechanism suitable to regulars - a rule - to capture irregular forms, the theory of Parallel Distributed Processing or Connectionism does the opposite, and extends a mechanism suitable to irregulars - memory - to capture various degrees of regularity. The key idea is to make memory more powerful. Rather than linking an item to an item, one links the features of an item to the features of another item. Similar items, which share features, are partly superimposed in the memory representation, allowing the common patterns to reinforce each other, and new items that are similar to learned items will activate the shared features and hence inherit the patterns that have been learned previously, allowing for a kind of generalization. Rumelhart and McClelland (1986) used this principle, which dates back at least to the British associationists' Law of Similarity, to devise a connectionist 'pattern associator' memory, with the following major components. The model has a bank of input nodes, each representing a bit of sound of an input stem such as 'vowel between two consonants' or 'stop consonant at the end of a word'. It also has an identical bank of output units representing the past tense form. Every input node is
connected to every output node. A verb is presented to the model by first dissolving it into its phonological features and turning on the subset of input nodes that corresponds to the features of the word. These nodes pass activation along the connections to the output nodes, raising their activation to varying degrees. The past tense form is computed as the word that best fits the active output nodes. The activation that is transmitted along the connections depends on the 'strengths' of the connections, and these connections are altered gradually during a training phase. Training consists of presenting the network with stems and their correct past tense forms; the connection strengths change gradually to capture the correlations among stem features and past features, averaged over the different stems and pasts in the training set. For example, the connection between the features in ing and the features in ung would be strengthened by cling-clung, string-strung, stick-stuck, and so on. Since connections are shared by any verb with given stem features, the trained model can generalize to new verbs according to their similarity to previously trained verbs and according to the strength of the connections from the shared features. The model generalizes because similar verbs are represented in overlapping sets of nodes, so any connection that is trained for one verb is automatically activated by a similar verb. In that way the pattern associator memory, implemented as a computer program, succeeded in learning several hundred regular and irregular verbs, and in generalizing with moderate success to new ones. It did so without any representations specific to words or specific to rules, using instead a single mechanism to handle regular and irregular forms.
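The pattern-associator scheme just described can be sketched in miniature. The character-trigram feature coding, the tiny training set, and the explicit candidate comparison below are illustrative stand-ins, not the actual Rumelhart-McClelland implementation (which used banks of phonological feature units):

```python
from collections import defaultdict

# Toy Hebbian pattern associator: stems and pasts are coded as bundles of
# features (here, character trigrams with word boundaries), and each training
# pair strengthens every input-feature -> output-feature connection.

def feats(word):
    w = "#" + word + "#"
    return {w[i:i + 3] for i in range(len(w) - 2)}

weights = defaultdict(float)

def train(stem, past):
    for fi in feats(stem):
        for fo in feats(past):
            weights[(fi, fo)] += 1.0

def score(stem, candidate_past):
    # Total activation of the candidate's output features given the stem.
    return sum(weights[(fi, fo)] for fi in feats(stem) for fo in feats(candidate_past))

# Train on an ing -> ung irregular family plus a couple of regulars.
for s, p in [("string", "strung"), ("sting", "stung"), ("fling", "flung"),
             ("cling", "clung"), ("slip", "slipped"), ("trip", "tripped")]:
    train(s, p)

# A novel verb similar to the family is drawn toward the irregular pattern.
candidates = ["splinged", "splung"]
best = max(candidates, key=lambda c: score("spling", c))
print(best)  # splung
```

Note that for an unusual-sounding stem such as ploamph, which shares no trigrams with any trained stem, every candidate scores zero, so the associator has nothing to go on - a small-scale version of the failure with dissimilar regulars discussed below.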
Subsequent connectionist models (e.g., Plunkett and Marchman, 1991; MacWhinney and Leinbach, 1991; Hoeffner, 1992; Hahn and Nakisa, 1998) differ in their details but share the assumption of a single pattern associator memory for regular and irregular forms, driven by input phonological patterns and generalizing according to phonological similarity. In this paper I will present evidence that neither of these alternatives to the word-rule theory is called for by the facts of regular and irregular inflection. I will argue for a modified version of the word-rule theory in which irregular forms are still words, stored in memory. Memory, however, is not just a list of unrelated slots, but is partly associative: features are linked to features (as in the connectionist pattern associators), as well as words being linked to words. By this means, irregular verbs are predicted to show the kinds of associative effects that are well modeled by pattern associators: families of similar irregular verbs are easier to store and recall (because similar verbs repeatedly strengthen a single set of connections for their overlapping material), and people are occasionally prone to generalize irregular patterns to new verbs similar to known ones displaying that pattern (because the new verbs contain features that have been associated with existing irregular families). On the other hand, I will argue that regular verbs are generated by a standard symbol-concatenation rule. I will present evidence that whereas irregular inflection is inherently linked to memorized words or forms similar to them, regular inflection can apply to any word, regardless of its memory status. That implies that regular inflection is computed by a mental operation that does not need access to the contents of memory, specifically, a symbol-processing operation or rule, which applies to
any instance of the symbol 'verb'. The evidence will consist of a dozen circumstances in which memorized forms are not accessed, for one reason or another, and in which as a consequence irregular inflection is suppressed and regular inflection is applied.

2. Circumstances in which memory is not accessed but regular inflection applies

A. Weak memory entry (rare word)

The first circumstance of compromised memory access comes out of the fact that human memory traces generally become stronger with repeated exposure. Thus if a word is rare, its entry in the mental lexicon will be weaker. The prediction of the modified word-rule theory is that irregular inflection will suffer, but regular inflection will not. Several effects of frequency bear this out. One is the statistical structure of the English language. Here is a list of the ten most frequent verbs in English, in order of decreasing frequency in a million-word corpus (Francis and Kucera, 1982):
 1. be     39175/million
 2. have   12458
 3. do      4367
 4. say     2765
 5. make    2312
 6. go      1844
 7. take    1575
 8. come    1561
 9. see     1513
10. get     1486
Note that all ten are irregular. Compare now the least frequent verbs in the corpus, the verbs tied for last with a frequency of one occurrence per million words. The first ten in an alphabetical ordering of them are:

3791. abate         1/million
3791. abbreviate    1
3791. abhor         1
3791. ablate        1
3791. abridge       1
3791. abrogate      1
3791. acclimatize   1
3791. acculturate   1
3791. admix         1
3791. adsorb        1
Note that all ten are regular, as are 98.2% of the rest of the list. (The remainder comprise one irregular root, smite-smote, and sixteen low-frequency prefixed versions of high-frequency irregular roots, such as bethink, outdraw, and regrind.) As this comparison shows, there is a massive correlation, in English and most other languages, between token frequency and irregularity. A simple explanation is that irregular forms (but not regular forms) have to be memorized repeatedly, generation after generation, to survive in a language, and that the commonly heard forms are the easiest to memorize. If an irregular verb declines in popularity, at some point a generation of children will fail to hear its past tense form often enough to remember it. Since the regular rule can apply to an item regardless of its frequency, that generation will apply the regular suffix to that verb, and it will be converted into a regular verb for that and all subsequent generations. Joan Bybee (1985) has gathered evidence for this conjecture. Old English contained about twice as many strong irregular verbs as Modern English, including now-obsolete forms such as cleave-clove, crow-crew, abide-abode, chide-chid, and geld-gelt. Bybee examined the current frequencies of the surviving descendants of the irregulars in Old English and found that it was the low-frequency verbs that were converted to regular forms over the centuries. Today one can actually feel the psychological cause of this historical change by considering past tense forms that are low in frequency. Most low-frequency irregulars sound stilted or strange, such as smite-smote, slay-slew, bid-bade, spell-spelt, and tread-trod (in American English), and one can predict that they will eventually go the way of chid and crew. In some cases a form is familiar enough to block the regular version, but not quite familiar enough to sound natural, and speakers are left with no clear choice for that slot in the conjugational paradigm. 
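The generational conjecture above - an irregular past survives only if each cohort of learners hears it often enough to memorize it, and once a generation misses it the default rule takes over for good - can be caricatured in a few lines. The retention curve and all the numbers below are invented purely for illustration:

```python
import random

# Toy simulation: each generation memorizes the irregular past with a
# probability that grows with the form's token frequency (a hypothetical
# retention curve); a single failure regularizes the verb permanently.

def simulate(freq_per_million, generations=30, seed=1):
    rng = random.Random(seed)
    irregular = True
    for _ in range(generations):
        if not irregular:
            break
        p_memorize = min(1.0, freq_per_million / 50.0)  # invented curve
        if rng.random() > p_memorize:
            irregular = False   # a generation fails to learn it: rule steps in
    return irregular

print(simulate(39175))  # True: a 'be'-like frequency survives indefinitely
print(simulate(1))      # False: a rare verb regularizes within a few generations
```

However crude, the dynamic reproduces the historical pattern Bybee documented: high-frequency irregulars persist, while the low-frequency ones drift to the regular class.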
For example, many speakers report that neither of the past participle forms for stride, strided and stridden, sounds quite right. In contrast, low-frequency regular past tense forms always sound perfectly natural (or at least no more unnatural than the stems themselves). No one has trouble with the preterite or participle of abate-abated, abrogate-abrogated, and so on. An excellent demonstration of this effect comes from cliches, idioms, and collocations, which are often used in a characteristic tense, allowing one to unconfound the familiarity of the verb with the unfamiliarity of a past tense form of the verb. A verb may be familiar in such a collocation, but only in one tense (say, present or infinitival); by shifting the tense, one can test judgments of a rare tensed form of a common verb. For irregular collocations, the result can often sound strange:

(5) You will excuse me if I forgo the pleasure of reading your paper before it's published.
*?Last night I forwent the pleasure of grading student papers.2

I don't know how she can bear that guy.
*?I don't know how she bore that guy.

2. Example from Jane Grimshaw.
I dig The Doors, man!
??In the 60's, your mother and I dug The Doors, son.

That dress really becomes her.
*But her old dress became her even more.

In contrast, expressions containing regular verbs that are habitually used in an infinitival form do not sound any stranger when put in the past tense; they sound exactly as good or bad as their stems. The following examples are common as stems and rare as past tense forms (because they typically occur in negations), but the past tense forms are unexceptionable:

(6) We can't afford it.
I don't know how he afforded it.

She doesn't suffer fools gladly.
None of them ever suffered fools gladly.

Michael Ullman (1993) has confirmed these claims quantitatively in several questionnaire studies in which subjects were asked to rate the naturalness of several hundred regular and irregular forms spanning large ranges in the frequencies of their stem and past tense forms. He found that ratings of regular pasts correlate significantly (0.62) with their stems but do not correlate (-0.14) with their frequencies (partialing out the ratings of the stem). That is what we would expect if judgments of regulars do not depend on the frequency of any stored past tense forms but are based on forms that may be computed there and then with the rule. The naturalness of the past tense form would instead be inherited from the naturalness of the stem, which is the input to the past tense rule. In contrast, ratings of irregular past tense forms correlated less strongly (0.32), though still significantly, with their stems, and did correlate significantly (0.35) with their frequencies (again, partialing out the ratings of the stem). This is what we would expect if the familiarity of an irregular form depended both on how familiar the verb is and how well one remembers its past tense form in particular.

B.
Difficult-to-analogize (unusual-sounding) verbs

Many connectionist modelers have pointed out that pattern associator memories can generalize to rare or new verbs based on their similarity to well-learned verbs and on the strength of the connections between the familiar sound patterns and their characteristic past tense forms (see Marcus et al., 1995, for quotations). People do exactly that for irregular verbs, generalizing an irregular pattern to a new verb if it is highly similar to an existing family of irregular words. However, people treat regular verbs differently; they apply the regular suffix to any new word, regardless of its sound. Recall that Bybee and Moder (1983) found that subjects generalized irregular patterns to novel stems that were highly similar to existing verbs exemplifying that pattern. The effect depended strongly on the degree of similarity. For forms such as spling, which is highly similar in onset, vowel, and coda to existing verbs such as
cling, fling, sling, slink, stink, and string, about 80% of the subjects provided irregularly inflected forms such as splang or splung. For forms such as krink, which is similar only in vowel and coda, about 50% of the subjects provided krank or krunk, and for forms such as vin, which shares only a vowel with existing irregulars, only about 20% provided van or vun - a classic generalization gradient. Sandeep Prasada and I replicated these three conditions in Bybee's experiment and added three parallel conditions involving novel verbs likely to receive regular inflection (Prasada and Pinker, 1993). Prototypical verbs like plip sound like many existing English regular verbs, such as clip, grip, slip, snip, strip, and trip. Intermediate verbs like smaig do not rhyme with any existing English verb root, and unusual verbs like ploamph are phonologically illicit in English and hence are very dissimilar to existing verbs. We presented the six classes of verbs to subjects and to the trained Rumelhart-McClelland pattern associator model. For the irregular verbs, the model did a reasonable impersonation of the human beings, showing a generalization gradient in which only the novel verbs that sounded like existing irregular verbs were readily given irregular past tense forms. For the regular verbs, on the other hand, the model and the human diverged. People provided regular forms for unusual-sounding ploamph at virtually the same rate with which they provided regular forms for familiar-sounding plip. The pattern associator, in contrast, had little trouble with plipped but was unable to generate forms such as ploamphed. Instead, it produced various blends and random combinations such as smairf-sprurice, trilb-treelilt, smeej-leefloag, and frilg-freezled. The failure is instructive. 
Pattern associator models, unlike symbol-processing computational architectures, do not have the mechanism of a variable, such as 'Verb', that can stand for an entire class regardless of its content and that can thereby copy over the phonological material of a stem so that it can be systematically modified to yield a past tense form. Instead the model must be painstakingly trained with items exemplifying every input feature (Marcus, 1999). When a new item exemplifying novel combinations of features is presented, the model cannot automatically copy over that combination; it activates whatever output features are most strongly connected to the features of the new item and generates a chimerical output form from them. (For more technical discussion of this limitation of pattern associator models, see Prasada and Pinker, 1993; Marcus et al., 1992; Marcus et al., 1995; Marcus, 1999.)

C-J. Information about the irregular form is trapped in memory because of the word's grammatical structure

Linguists, both professional and amateur, have long noted the phenomenon of systematic regularization: some irregular words mysteriously call for a regularly inflected form in certain contexts. Here are three examples:

(7) All my daughter's friends are low-lifes (*low-lives).
I'm sick of dealing with all the Mickey Mouses in this administration (*Mickey Mice).
Boggs has singled, tripled, and flied out (*flown out) in the game so far.
The phenomenon immediately shows that sound alone cannot be the input to the inflectional system. In the last example, a given sound, such as fly, can be mapped onto flew and flown when referring to birds but flied when referring to baseball players. The question is: what is the extra input causing the shift? Many psychologists, connectionists, and prescriptive grammarians have suggested that the missing input features are semantic: when a verb is given a new extended or metaphorical meaning, extra features for that meaning are activated, making the new verb less similar to its predecessor and decreasing activation of the associated irregular form (see Kim et al., 1991; Kim et al., 1994, for more precise versions of this explanation, and detailed discussion of its problems). This explanation, however, is clearly false; most semantic modifications in fact leave an irregular verb's inflectional forms intact. Here are some examples:

(8) Prefixing: overate/*overeated, overshot/*overshooted, outdid/*outdoed, preshrank/*preshrinked.
Compounding: workmen/*workmans, superwomen/*superwomans, muskoxen/*muskoxes, stepchildren/*stepchilds, milkteeth/*milktooths.
Metaphor: straw men/*mans, snowmen/*snowmans, God's children/*childs, sawteeth/*sawtooths, six feet/*foots long.
Idiom: cut/*cutted a deal, took/*taked a leak, bought/*buyed the farm, caught/*catched a cold, hit/*hitted the fan, blew/*blowed him off, put/*putted him down, came/*comed off well, went/*goed nuts.

These examples show that merely adding semantic features to a pattern associator, in the hopes that the resulting unfamiliarity of new combinations will inhibit highly trained irregular responses, is unlikely to handle the phenomenon. An explanation that does work comes (with modification) from the linguists Paul Kiparsky (1982) and Edwin Williams (1981): headless words become regular. As with grammar in general, the syntax of words encompasses a scheme by which the properties of a novel combination can be predicted from the properties of its parts and the way they are combined. Consider the verb overeat. It is based on the verb root eat:

(9)    V
       |
      eat
As with grammar in general, the syntax of words encompasses a scheme by which the properties of a novel combination can be predicted from the properties of its parts and the way they are combined. Consider the verb overeat. It is based on the verb root eat: (9) V I eat
The root is then joined with a prefix, yielding the following structure:
(10)
V
V I eat
prefix I over-
V I eat
The result is a new word that has inherited its properties from the properties of the rightmost morpheme inside it, in this case, eat. What syntactic category (part of speech) is overeat? It is a verb, just as eat is a verb. What does overeat mean? It refers to a kind of eating - eating too much - just as eat refers to eating. And what is its past tense form? Overate, not overeated, just as the past tense form of eat is ate, not eated. It appears that a new complex word inherits its properties from the properties that are stored in the memory entry of the rightmost morpheme - the 'head' - including any irregular forms. The pipeline of information from the memory entry of the head at the bottom of the tree to the newly created complex lexical item symbolized by the node at the top of the tree can be schematized as follows:
(11)    X
       / \
          X
         / \
            X
This principle, sometimes called the right-hand head rule, explains all of the preserved irregular forms presented above. For example, the compound noun workman is formed by combining the verb work with the noun man:
(12)       N
          /  \
         V    N
         |    |
       work  man
Once again, its properties are inherited from man, its rightmost morpheme or head. Workman is a noun, just as man is a noun; it refers to a kind of man, just as man refers to a man, and its plural form is irregular workmen, because the plural form of man is irregular men. Now here is the explanation for systematic regularization. Some complex words are exceptional in being headless. That is, they don't get their properties - such as grammatical category or referent - from their rightmost morpheme. The normal right-hand-head rule must be turned off for the word to be interpreted and used properly. As a result, the mechanism that ordinarily retrieves stored information from the word's root is inactive, and any irregular form stored with the root is trapped in memory, unable to be passed upward to apply to the whole word. The regular rule, acting as the default, steps in to supply the complex word with a past tense form, undeterred by the fact that the sound of the word ordinarily would call for an irregular form. Here is how the explanation works for one class of regularizations, compounds whose referent has rather than is an example of the referent of the rightmost morpheme. For example, a low-life is not a kind of life, but a kind of person, namely, a person who has or leads a low life. For it to have that meaning, the right-hand head
S. Pinker / Lingua 106 (1998) 219-242
rule, which would ordinarily make low-life mean a kind of life (the semantic information stored in memory with life), must be abrogated. With the usual data pipeline to the memory entry for the head disabled, there is no way for the other information stored with life to be passed upward either, such as the fact that it has an irregular plural form, lives. With the irregular plural unavailable, the regular -s rule steps in, and we get low-lifes.
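The head-inheritance logic just described can be sketched in a few lines of code. This is a toy illustration under my own assumptions (the lexicon entries, function names, and representation are not from the article): a complex word records its rightmost morpheme as its head, an inherited irregular form climbs up through that pointer, and a headless word falls through to the default rule.

```python
# Toy sketch of the right-hand head rule and headless regularization.
# All names and entries here are illustrative assumptions.

LEXICON = {
    "eat":  {"cat": "V", "irr_past": "ate"},
    "man":  {"cat": "N", "irr_plural": "men"},
    "life": {"cat": "N", "irr_plural": "lives"},
}

def complex_word(morphemes, headless=False):
    """A complex word keeps its rightmost morpheme as head,
    unless the right-hand head rule has been turned off."""
    return {"form": "".join(morphemes),
            "head": None if headless else morphemes[-1]}

def past(word):
    entry = LEXICON.get(word["head"], {})
    if "irr_past" in entry:
        # Pipeline intact: the stored irregular applies to the head inside the word.
        return word["form"][:-len(word["head"])] + entry["irr_past"]
    return word["form"] + "ed"      # default rule fills the gap

def plural(word):
    entry = LEXICON.get(word["head"], {})
    if "irr_plural" in entry:
        return word["form"][:-len(word["head"])] + entry["irr_plural"]
    return word["form"] + "s"       # default rule fills the gap
```

On this sketch, past(complex_word(["over", "eat"])) yields overate, while plural(complex_word(["low-", "life"], headless=True)) yields low-lifes, because the headless word cannot reach the stored irregular lives.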
Similar logic explains regularized forms such as still lifes (a kind of painting, not a kind of life), saber-tooths (a kind of cat, not a kind of tooth), flatfoots (policemen, not feet), bigmouths (not a kind of mouth but a person who has a big mouth), and Walkmans (not a kind of man, but a 'personal stereo'). This effect has been demonstrated in experiments in which four- to eight-year-old children are presented with 'has-a' compounds, such as snaggletooth, and asked to pluralize them. They provide regular plurals at a significantly higher rate than when they pluralize ordinary novel compound nouns with irregular roots (Kim et al., 1994).

The same explanation works for a second class of regularizations, eponyms. We hear Mickey Mouses because the ordinary noun mouse was converted to a distinct syntactic category, that for names, when Walt Disney christened his animated murine hero. (Names are syntactically distinct from common nouns, and hence must bear a different lexical category label, possibly 'NP', possibly a category specific to proper names. For present purposes it suffices to symbolize that category simply as 'Name'. See Marcus et al., 1995, for more detailed discussion.) Then in colloquial speech the name Mickey Mouse was converted back into a common noun, a Mickey Mouse, referring to a simpleton:
(14) [tree diagrams: the noun mouse is converted into the name Mickey Mouse, and the name is then converted back into a common noun, a Mickey Mouse; each conversion adds a layer of structure above the root mouse]
The new noun is headless, because the right-hand-head rule had to be turned off twice - to convert the noun mouse into a name, and then to convert the name back into a noun, both violations of the usual upward-copying process. With that process disabled, the irregular plural mice remains unexamined in the lexicon, and the regular suffixation rule fills the vacuum and yields Mickey Mouses. The same explanation works for most other pluralized irregular-sounding eponyms:³

(15) The Toronto Maple Leafs/*Leaves (a hockey team named after Canada's national symbol, The Maple Leaf).
Renault Elfs/*Elves (cars).
Michael Keaton starred in both Batmans/*Batmen (movie titles).
We're having Julia Child and her husband over for dinner. You know, the Childs/*Children are really great cooks.

As before, this effect has been shown to work in the language production of children (Kim et al., 1994). The explanation works in the verb system as well, in the class of regularized past tense and past participle forms of denominal verbs: verbs that have been formed out of nouns. In baseball, the verb to fly was converted over a century ago into a noun, a fly, referring to a high arcing ball. The noun was then converted back into a verb, to fly out, meaning 'to hit a fly that is caught'.
(16) [tree diagrams: the verb root fly is converted into the baseball noun a fly, which is then converted back into the verb to fly out; two layers of conversion separate the derived verb from the root]
The verb root to fly is thus sealed off from the derived verb to fly out at two layers of the structure, the one that converted the verb root to a noun (i.e., failed to copy upwards the information that the root's category is 'verb'), and the one that converted the noun back into a verb. Among baseball cognoscenti who can sense the fly ball in flying out, the irregular forms flew and flown are unable to climb out of the lexical entry for fly, and -ed applies as the last resort, yielding flied out. The same explanation works for other denominals, such as high-sticked/*high-stuck (hit with a high stick, in hockey), grandstanded/*grandstood (played to the grandstand), and ringed/*rang the city (formed a ring around). This regularization process can be documented experimentally in adults' and children's attempts to form past tenses of new verbs (Kim et al., 1991, 1994; see those papers, too, for explanations of apparent counterexamples to this principle). Similar explanations may be applied to four other kinds of rootless or headless derivation; for a full explanation, see Marcus et al. (1995):
³ There are some apparent exceptions, discussed in Marcus et al. (1994) and Kim et al. (1994).
(17) Onomatopoeia: The engine pinged/*pang; My car got dinged/*dang.
Quotations: While checking for sexist writing, I found three 'man's/*'men' on page 1.
Foreign Borrowing: succumbed/*succame; derided/*derode; chiefs/*chieves; gulfs/*gulves (all borrowed from French or Latin).
Artificial concoctions (truncations, acronyms): lip-synched/*lip-sanch (from synchronize); Ox's/*Ox-en (hypothetical abbreviation for containers of oxygen).

However, though regular forms can appear in many contexts that are closed to irregulars, there is one circumstance in which the reverse is true: inside compound words. An apartment infested with mice may be called mice-infested (irregular plural inside a compound), but an apartment infested with rats is called not *rats-infested (regular plural inside a compound) but rat-infested (singular form inside a compound), even though by definition one rat does not constitute an infestation. Note that there is no semantic difference between mice and rats that could account for the grammaticality difference; it is a consequence of sheer irregularity. Similar contrasts include teethmarks versus *clawsmarks, men-bashing versus *guys-bashing, and purple-people-eater versus *purple-babies-eater. In experiments in which subjects must rate the naturalness of novel compounds, Anne Senghas, John Kim and I (Senghas et al., 1991) have found that people reliably prefer compounds with irregular plurals, such as geese-feeder, over compounds with regular plurals, such as ducks-feeder, and that the effect is not a by-product of some confounded semantic, morphological, or phonological difference between regular and irregular plurals.

A simple explanation, based loosely on Kiparsky (1982), might run as follows. Morphological composition of words takes place in several stages. First there is a lexicon of memorized roots, including, according to the word/rule theory, irregular forms.
That lexicon supplies the input to rules of regular derivational morphology, which create complex words (including compounds) out of simple words and morphemes, outputting a stem. Stems are then inputted to a third component, regular inflection, which modifies the word according to its role in the sentence. In simplified form, the architecture of morphology would look like this:

(18) Memorized roots (including irregulars) → Complex word formation → Regular inflection
The word mice, stored as a root in the first component, is available as an input to the compounding process in the second component, where it is joined to infested to yield mice-infested. In contrast, rats is not stored as a memorized root in the first component; it is formed from rat by an inflectional rule in the third component, too late to be inputted to the compounding rule in the second. Hence we get rat-infested but not rats-infested.
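The three-stage architecture in (18) can be sketched as follows. This is a toy illustration under my own assumptions (the function names and root inventory are not the article's): compounding draws only on the stored roots of the first component, so an irregular plural like mice is available to it, while a regular plural like rats, created only downstream by regular inflection, is not.

```python
# Toy sketch of the level-ordered architecture in (18).
# The root inventory and names are illustrative assumptions.

# Stage 1: memorized roots; on the word/rule theory these include
# irregular plurals such as 'mice' and 'teeth', but never regular 'rats'.
MEMORIZED_ROOTS = {"rat", "mouse", "mice", "tooth", "teeth", "claw"}

def compound(first, second):
    """Stage 2: complex word formation sees only stage-1 roots."""
    if first not in MEMORIZED_ROOTS:
        raise ValueError(f"{first!r} is not a stored root: too late for compounding")
    return f"{first}-{second}"

def regular_plural(noun):
    """Stage 3: regular inflection, which applies after word formation."""
    return noun + "s"

# compound("mice", "infested") is fine, because mice is a stored root.
# 'rats' only comes into existence at stage 3, so compound("rats", "infested")
# raises, and the compound must be built from the singular: rat-infested.
```

On this sketch, mice-infested and rat-infested are derivable, but rats-infested is not, mirroring the teethmarks/*clawsmarks contrast.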
Gordon (1985) showed that 3-5-year-old children are sensitive to this principle. He asked them questions such as, 'Here is a monster who likes to eat X. What would you call him?' First he trained them on mass nouns such as mud, which don't take a plural, to introduce them to the compound construction, in this case mud-eater, without biasing their subsequent answers. Then he tested them by asking what they would call a monster who likes to eat rats. The children virtually always said rat-eater, not rats-eater. In contrast, they frequently called a monster who likes to eat mice a mice-eater - and those children who occasionally used the overregularized plural mouses in other contexts never used it in a plural such as mouses-eater.

In an interesting twist, Gordon checked to see whether children had had an opportunity to learn the distinction by noticing irregular-plural-containing compounds in their parents' speech, such as teethmarks, and simultaneously noticing the absence of regular-plural-containing compounds in their parents' speech, such as clawsmarks. He found that neither kind of plural is common enough in English for children to have reliably heard them; virtually all commonly used compounds take a singular first noun, such as toothbrush. Therefore children's sensitivity to the teethmarks/clawsmarks distinction is likely to be a product of the innate architecture of their language system, not a product of a tabulation of occurring and non-occurring forms in parental speech.

K. Childhood

Let us return now to the circumstances of impeded memory access and their differential effects on regular and irregular inflection. Recall that children, in their third year, begin to overregularize irregular verbs in errors such as comed, holded, and bringed. Many explanations have been offered in the forty years since these errors were called to the attention of modern psycholinguists, most portraying the child as a relentless pattern-extractor and regularity-imposer.
But these theories founder on the fact that children make these errors in a minority of the opportunities to do so. On average, about 95 percent of children's irregular past tense forms are correct. Gary Marcus and I (Marcus et al., 1992) proposed the simplest conceivable explanation. The most basic and uncontroversial assumption one can make about children is that they have not lived as long as adults (that is what the word 'child' means). Among the experiences that one accumulates over the years is hearing the past tense forms of irregular verbs. If children have not heard an irregular form sufficiently often, its memory trace will be weaker than the corresponding trace in adults, and they will retrieve it less reliably and with less confidence (just as adults are less confident with seldom-heard irregular past tense forms such as smote). If the child is at the age at which he or she has acquired the regular past tense rule, then the child will fill the gap by applying the rule, resulting in an overregularization.

Marcus, Ullman, and I gathered several kinds of evidence for this simple account. One example is a reliable effect of frequency: the more often the parent of a child uses an irregular form, the less often that child overregularizes it. This effect held for all nineteen of the children whose speech we examined, with a mean correlation coefficient of -0.33. This is expected if overregularization is an effect of insufficiently reinforced memory entries for irregular forms, and disappears as children hear those forms more and more often and remember them more and more reliably.

A second kind of evidence is a simple explanation of the long-noted phenomenon of 'U-shaped development', in which children, for several months, use only correct irregular past tense forms (when they overtly mark the past tense of such verbs at all) before producing their first error. Rumelhart and McClelland had suggested that the onset of overregularization was caused by an increase in the proportion of regular verbs in the child's vocabulary, which would provide the child's pattern-associator with a sudden abundance of inputs strengthening the connections for the regular pattern and temporarily overwhelming the connections implementing each of the irregular patterns. But the evidence is inconsistent with that hypothesis. The proportion of regular tokens remains unchanged in the parental speech directed at children, rather than increasing. The proportion of regular types in the child's vocabulary does increase (as it must, since there is a fixed number of irregulars but an open-ended set of regulars), but not at the right times for the Rumelhart-McClelland hypothesis: there is a negative correlation, not a positive one, over time between children's rate of acquiring new regular verbs and their rate of overregularizing irregular verbs.

Instead, we found that the onset of overregularization errors is best predicted by mastery of the regular rule. Before the first error, children leave regular verbs unmarked most of the time (e.g., Yesterday we walk); then there is a transitional phase in which the child begins to mark these verbs most of the time (e.g., Yesterday we walked). It is in this transitional phase that the first overregularization of an irregular form is produced.
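The retrieval-failure account can be illustrated with a toy simulation. The exposure-to-retrieval-probability function and all numbers below are hypothetical assumptions of mine, not the article's data; the point is only the mechanism: an irregular form blocks the regular rule when it is retrieved, and the rule applies as the default when retrieval fails.

```python
# Toy simulation of the retrieval-failure account of overregularization.
# The saturating exposure function and all counts are hypothetical.
import random

def past_tense(verb, irregular, heard_count, rng):
    # Assumed stand-in for memory-trace strength: retrieval probability
    # grows, and saturates, with how often the form has been heard.
    p_retrieve = heard_count / (heard_count + 10.0)
    if rng.random() < p_retrieve:
        return irregular              # retrieval succeeds: 'broke'
    return verb + "ed"                # retrieval fails: the rule yields 'breaked'

rng = random.Random(0)
# A seldom-heard irregular is overregularized far more often than a
# frequently heard one, mirroring the negative frequency correlation.
rare   = sum(past_tense("break", "broke",   2, rng) == "breaked" for _ in range(1000))
common = sum(past_tense("break", "broke", 200, rng) == "breaked" for _ in range(1000))
```

The simulation also captures the U-shaped pattern in spirit: before the child commands the rule there is nothing to fill the gap, so retrieval failures surface as unmarked forms rather than as errors.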
We argue that the tandem development of walked and breaked comes from a single underlying process, the acquisition of the 'add -ed' rule, which manifests itself in correct performance where the rule is called for and errors where it is not. Prior to the acquisition of the rule, a child who failed to retrieve broke had no choice but to leave it unmarked, as in Yesterday he break it; once the child possesses the rule, he or she can mark tense even when memory fails, though the form is incorrect by adult standards.

L, M. Disorders of word retrieval in the presence of intact grammar

The final and most direct demonstrations that memory impairment specifically affects irregular forms come from studies of neurological patients whose memory or grammatical systems have been differentially disrupted. Ullman and our collaborators presented a number of patients with a battery of past tense elicitation items of the form, 'Every day I like to verb. Yesterday I ___.' The verbs were regular, irregular, or novel, and the regular and irregular verbs were equated for frequency and, for a subset of them, pronounceability (e.g., irregular slept is similar to regular slapped; regular tried is similar to irregular bred). The prediction is that patients who are more impaired on vocabulary retrieval than on grammatical combination should (1) find irregular forms harder to produce than regular ones, (2) occasionally produce overregularized forms such as swimmed (for the same reason that children do), and (3) have little trouble producing past tense forms for novel
verbs such as plammed. Conversely, patients who are more impaired on grammatical combination than on vocabulary retrieval should (1) find regular forms harder to produce than irregular ones, (2) rarely produce overregularized forms, and (3) have grave difficulty producing past tense forms for novel verbs (Ullman et al., 1997).

In one case study, we tested a patient with anomic aphasia: following damage to left posterior perisylvian regions, he had severe difficulty retrieving words, though his speech was fluent and grammatical. Presumably the subsystem serving vocabulary storage or retrieval was more damaged than the subsystem serving grammatical composition. As predicted, he found irregular verbs harder to inflect than regular verbs (60% versus 89%), made frequent overregularization errors (25% of the opportunities), and was fairly good with novel verbs (84%).

In a control case study, we tested a patient with agrammatic aphasia: following damage to left anterior perisylvian regions, he had severe difficulty combining words into fluent sentences, but was less impaired at retrieving words. Presumably the subsystem serving grammatical composition was more damaged than the subsystem serving word retrieval. As predicted, he found regular verbs harder to inflect than irregular verbs (20% versus 69%), made no overregularization errors, and was poor at inflecting novel verbs (5%). Similar findings have been obtained by other researchers (Caramazza and Badecker, 1991; Marin et al., 1976).

We found a similar double dissociation when testing patients with neurodegenerative diseases. In Alzheimer's Disease, the most obvious symptom is an impairment in memory, including word retrieval, but in many cases the patients can produce relatively fluent and grammatical speech; this dissociation is thought to be caused by greater degeneration in medial-temporal and temporal-posterior regions of the brain than in the frontal regions.
That would predict that these patients would behave like anomic aphasics when producing past tense forms, and indeed they do: the anomic Alzheimer's Disease patients we tested had more trouble producing irregular forms than regular forms (60% versus 89%), made frequent overregularization errors (27%), and were relatively successful in providing past tense forms for novel verbs (84%).

The opposite pattern was predicted to occur in Parkinson's Disease. As a result of degeneration in the basal ganglia, which form an essential part of a circuit involving frontal cortex, Parkinson's Disease patients have many of the symptoms of agrammatism, and less severe impairments in retrieving words from memory. As predicted, the more impaired patients we tested had more trouble with regular verbs than with irregulars (80% versus 88%), had trouble inflecting novel verbs (65%), and never produced overregularization errors.

3. A Crosslinguistic Validation

All of these comparisons are tainted by a possible confound. An additional property differentiating regular from irregular verbs in English is type frequency: regular verbs are the majority in English. Only about 180 verbs in modern English are irregular, alongside several thousand regular verbs. Since pattern associators generalize
the majority pattern most strongly, it is conceivable that a pattern associator that was suitably augmented to handle grammatical structure would have the regular pattern strongly reinforced by the many regular verbs in the input, and would come to generalize it most strongly, perhaps in all of the default circumstances I have reviewed. This is a charitable assumption - taken literally, theories invoking pattern associators are driven by tokens rather than types: the models are said to learn in response to actual utterances of verbs, in numbers reflecting their frequencies of usage, rather than in response to vocabulary entries, inputted once for each verb regardless of its frequency of usage. Moreover, no pattern associator model yet proposed has plausibly handled the various grammatical circumstances involving headlessness (flied out, Mickey Mouses, and so on) in which irregular forms systematically regularize. But many connectionist researchers have held out the greater type frequency of regular verbs in English as the main loophole by which future pattern associators might account for the psycholinguistic facts reviewed herein (see Marcus et al., 1995, for quotations).

To seal the case for the word/rule theory it would be ideal to find a language in which the regular (default) rule applies to a minority of forms in the language. Note that this prediction is an oxymoron according to the traditional, descriptive definition of 'regular' as pertaining to the most frequent inflectional form in a language and 'irregular' as pertaining to the less frequent forms. But I am considering a psycholinguistic definition of 'regular' as the default operation produced by a rule of grammatical composition and 'irregular' as a form that must be specially stored in memory; the number of words of each kind in the language plays no part in this definition. One language that displays this profile is German.
The past tense is expressed in everyday speech by participles, which come in three forms: strong (involving a vowel change and the suffix -en), mixed (involving a vowel change and the suffix -t), and weak (involving the suffix -t). The weak forms are analogous (and historically homologous) to English regular verbs. The plural is even more complex, coming in eight forms: four plural suffixes (-e, -er, -en, -s) or no suffix, some of which can co-occur with an altered (umlauted) stem vowel. The form that acts most clearly as the default, analogous to English -s, is -s. This complexity, and various differences in the histories of the two languages, allow us to dissociate grammatical regularity from type frequency (see Marcus et al., 1995, for a far more extensive analysis).

Compare, for example, English -ed and German -t, both 'regular' by our definition. Among the thousand most frequent verb types in the languages, approximately 85% of those in English are regular, compared to approximately 45% of those in German. With larger samples of verbs, the gap narrows, but English always shows the higher proportion (see Marcus et al., 1995). But despite regular forms being in a large majority in English and a slight minority in German among the most commonly used verbs, speakers treat them alike. English speakers apply -ed to rare verbs, such as ablated; German speakers apply -t to rare verbs, such as geloetet ('welded'). English speakers apply -ed to unusual-sounding verbs such as ploamphed; German speakers apply -t to unusual-sounding verbs such as geplaupft. English speakers apply -ed to onomatopoeic forms such as dinged; German speakers apply -t to onomatopoeic
forms such as gebrummt ('growled'). English speakers regularize irregular-sounding verbs derived from nouns, such as flied out; so do German speakers, in forms such as gehaust ('housed'). And just as English-speaking children overregularize irregular verbs in errors such as singed, German-speaking children produce corresponding errors, such as gesingt.

Plurals provide an even more dramatic comparison. In English, -s is applied to more than 99% of all nouns; in German, -s is applied to only about 7% of nouns. Despite this enormous difference, the two suffixes behave quite similarly across different circumstances of generalization. In both languages, the -s suffix is applied to unusual-sounding nouns (ploamphs in English, Plaupfs in German) and to names that are homophonous with irregular nouns (the Julia Childs, die Thomas Manns). The suffix is applied, in both languages, to irregular-sounding eponyms (Batmans, Fausts) and product names (such as the automobile models Elfs, in English, and Kadetts, in German). The suffix is also applied in both languages to foreignisms, such as English chiefs, borrowed from French, and German Cafes, also borrowed from French. Regular suffixes are applied to truncations, such as synched in English and Sozis and Nazis (from Sozialist) in German. Similarly, in both languages the suffix is applied to quotations: three 'man's in English, drei 'Mann's in German. Despite the relatively few nouns in German speech taking an -s-plural, German-speaking children frequently overregularize the suffix in errors such as Manns, analogous to English-speaking children's mans. Intriguingly, even the circumstance that tends to rule out regular plurals in English, namely the first word in a compound, has a similar effect in German: just as we dislike rats-eater, German speakers dislike Autos-fresser ('cars-eater').
Many of these effects have been corroborated in experiments with German-speaking adults (Marcus et al., 1995) or children (Clahsen and Rothweiler, 1992; Clahsen et al., 1993a; Clahsen et al., 1993b), and they have been shown in other languages as well, such as Arabic (McCarthy and Prince, 1990).

These results, combined with a glance at the history of the two languages, provide an interesting insight into why regular words form the majority of types in many languages (though not German). In proto-Germanic, the ancestor of English and German, a majority of verbs were strong, the forerunners of today's irregular verbs. There was also a precursor of the weak -ed/-t suffix: the 'dental suffix', perhaps a reduced form of the verb do, which applied to borrowings from other languages and to derived forms, just as it does today. As it happens, the major growth areas in English verb vocabulary over the subsequent centuries were in just these areas. English borrowed rampantly from French (because of the Norman invasion in 1066) and from Latin (because of the influences of the Church and Renaissance scholars); I have estimated that about 60% of English verb roots came from these two languages. English is also notorious for the degree to which nouns can be freely converted to verbs; approximately 20% of our verbs are denominal (Prasada and Pinker, 1993). Intriguingly, both kinds of verbs, once introduced into the language, had to be regular on grammatical grounds, because they are rootless and headless. So the standard connectionist account of the correlation between type frequency and regularity may have it backwards. It is not the case that a majority of English verbs are regular, and that causes English-speakers to use the regular suffix as the default. Instead,
English-speakers and their linguistic ancestors have used the regular suffix as the default for millennia, and that is why the majority of today's English verbs became regular. German, which did not experience a centuries-long domination by a French-speaking elite, and which does not convert nouns to verbs as freely, retained a frequency distribution closer to the ancestral language (see Marcus et al., 1995, for more discussion of the history of the past and plural markers in Germanic languages). Despite these differences in frequency across time and space, the psychology of the speakers remains the same.

4. Conclusions

We have seen that despite the identical function of regular and irregular inflection, irregular forms are avoided, but the regular suffix is applied freely, in a variety of circumstances (from gelded to ploamphed to flied out to low-lifes to anomia) with nothing in common except failure of access to information in memory. Indeed, these circumstances were deliberately highlighted because they are so heterogeneous and exotic. Ever since Pinker and Prince's (1988) critique of the original Rumelhart-McClelland pattern associator, many connectionist researchers have responded with models containing ad hoc patches designed specifically to handle one or another of these circumstances (e.g., MacWhinney and Leinbach, 1991; Plunkett and Marchman, 1991; Daugherty and Seidenberg, 1992; Hare and Elman, 1992; Nakisa and Hahn, 1997; for critiques, see Marcus et al., 1992, 1995; Marcus, 1995; Prasada and Pinker, 1993; Kim et al., 1994). But the human brain is not wired with separate innate hardware features dedicated to generating seldom-produced quirks such as Mickey Mouses or three 'man's, as one finds in many of these models; the phenomena should be consequences of the basic organization of the language system.
The thrust of the argument herein is that within the word/rule theory, the phenomena fall out of the assumption that regular forms are default operations applying whenever memory retrieval fails to provide an inflected form. Regular inflection applies freely in any circumstance in which memory fails because regular inflection is computed by a mental operation that does not need access to the contents of memory, namely, a symbol-processing rule. Moreover, the comparison with German shows that the applicability of the regular as the default is not caused by the regular pattern being the majority of the child's learning experience. The evidence, then, supports the hypothesis that the design of human language comprises two mental mechanisms: memory, for the arbitrary sign underlying words, and symbolic computation, for the infinite use of finite media underlying grammar.

References

Berko, J., 1958. The child's learning of English morphology. Word 14, 150-177.
Bybee, J.L., 1985. Morphology: A study of the relation between meaning and form. Philadelphia, PA: Benjamins.
Bybee, J.L. and C.L. Moder, 1983. Morphological classes as natural categories. Language 59, 251-270.
Bybee, J.L. and D.I. Slobin, 1982. Rules and schemas in the development and use of English past tense. Language 58, 265-289.
Chomsky, N. and M. Halle, 1968. The sound pattern of English. New York: Harper and Row.
Clahsen, H. and M. Rothweiler, 1992. Inflectional rules in children's grammars: Evidence from the development of participles in German. Morphology Yearbook, 1-34.
Clahsen, H., M. Rothweiler, A. Woest and G.F. Marcus, 1993a. Regular and irregular inflection in the acquisition of German noun plurals. Cognition 45, 225-255.
Clahsen, H., G. Marcus and S. Bartke, 1993b. Compounding and inflection in German child language. Essex Research Reports in Linguistics, 1:1993, Colchester, England.
Daugherty, K. and M. Seidenberg, 1992. Rules or connections? The past tense revisited. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 259-264. Hillsdale, NJ: Erlbaum.
de Saussure, F., 1960. A course in general linguistics. London: Peter Owen.
Francis, N. and H. Kucera, 1982. Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Gordon, P., 1985. Level-ordering in lexical development. Cognition 21, 73-93.
Hahn, U. and R. Nakisa, 1998. German inflection: Single-route or dual-route? Unpublished manuscript, Oxford University.
Halle, M. and K.P. Mohanan, 1985. Segmental phonology of modern English. Linguistic Inquiry 16, 57-116.
Hare, M. and J. Elman, 1992. A connectionist account of English inflectional morphology: Evidence from language change. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 265-270. Hillsdale, NJ: Erlbaum.
Hoeffner, J., 1992. Are rules a thing of the past? The acquisition of verbal morphology by an attractor network. In: Proceedings of the Fourteenth Annual Conference of the Cognitive Science Society, 861-866. Hillsdale, NJ: Erlbaum.
Kim, J.J., G.F. Marcus, S. Pinker and M. Hollander, 1994. Sensitivity of children's inflection to grammatical structure. Journal of Child Language 21, 173-209.
Kim, J.J., S. Pinker, A. Prince and S. Prasada, 1991. Why no mere mortal has ever flown out to center field. Cognitive Science 15, 173-218.
Kiparsky, P., 1982. Lexical phonology and morphology. In: I.S. Yang (ed.), Linguistics in the morning calm, 3-91. Seoul: Hansin.
MacWhinney, B. and J. Leinbach, 1991. Implementations are not conceptualizations: Revising the verb learning model. Cognition 40, 121-157.
Marcus, G.F., 1993. Negative evidence in language acquisition. Cognition 46, 53-85.
Marcus, G.F., 1995. The acquisition of inflection in children and multilayered connectionist networks. Cognition 56, 271-279.
Marcus, G.F., 1999. The algebraic mind. Cambridge, MA: MIT Press.
Marcus, G.F., S. Pinker, M. Ullman, M. Hollander, T.J. Rosen and F. Xu, 1992. Overregularization in language acquisition. Monographs of the Society for Research in Child Development 57 (4, Serial No. 228).
Marcus, G.F., U. Brinkmann, H. Clahsen, R. Wiese and S. Pinker, 1995. German inflection: The exception that proves the rule. Cognitive Psychology 29, 189-256.
McCarthy, J. and A. Prince, 1990. Foot and word in prosodic morphology: The Arabic broken plural. Natural Language and Linguistic Theory 8, 209-283.
Pinker, S., 1994. The language instinct. New York: HarperCollins.
Pinker, S. and A. Prince, 1988. On language and connectionism: Analysis of a Parallel Distributed Processing model of language acquisition. Cognition 28, 73-193.
Plunkett, K. and V. Marchman, 1991. U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition 38, 43-102.
Prasada, S. and S. Pinker, 1993. Generalizations of regular and irregular morphological patterns. Language and Cognitive Processes 8, 1-56.
Rumelhart, D. and J. McClelland, 1986. On learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In: J. McClelland, D. Rumelhart and the PDP Research Group, Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Senghas, A., J.J. Kim, S. Pinker and C. Collins, 1991. Plurals-inside-compounds: Morphological constraints and their implications for acquisition. Paper presented at the Sixteenth Annual Boston University Conference on Language Development, October 18-20, 1991.
Ullman, M., 1993. The computation and neural localization of inflectional morphology. Doctoral dissertation, Dept. of Brain and Cognitive Sciences, MIT.
Ullman, M., S. Corkin, M. Coppola, G. Hickok, J.H. Growdon, W.J. Koroshetz and S. Pinker, 1997. A neural dissociation within language: Evidence that the mental dictionary is part of declarative memory, and that grammatical rules are processed by the procedural system. Journal of Cognitive Neuroscience 9, 289-299.
Williams, E., 1981. On the notions 'lexically related' and 'head of a word'. Linguistic Inquiry 12, 245-274.
Xu, F. and S. Pinker, 1992. Weird past tense forms. Journal of Child Language 22, 531-556.
Language Acquisition: Knowledge Representation and Processing
Author Index
Abney, S.P., 122 Allen, J., 208 Andersen, H., 156 Anderson, R., 89 Anderssen, M., 124-126 Andrews, A., 119 Antinucci, F., 89 Anttila, A., 8 Atkinson, M., 7, 9 Avrutin, S., 34, 53, 69, 107 Babyonyshev, M., 34, 66 Bahl, L.R., 166 Bakovic, E., 178 Bar-Shalom, E., 55, 56 Barbiers, S., 106 Barbosa, P., 70 Bard, E., 3 Baum, L.E., 166 Bauman, A.L., 208 Behrens, H., 28, 97 Belletti, A., 70, 73 Berko, J., 222 Bernstein, J., 144 Berwick, R., 9, 162 Best, C.T., 198, 203 Bhatia, T.V., 2 Bley-Vroman, R., 147, 148, 155, 156 Bloom, L., 89 Bloom, P., 2, 4, 38 Boersma, P., 8, 191 Bohnacker, U., 42 Bol, G., 69 Bolinger, D.L., 204, 207, 208 Bongaerts, T., 138 Borer, H., 4, 5, 25, 34, 41, 42, 56, 61, 63, 66, 155 Boser, K., 28, 83, 94-97 Brent, M.R., 6, 204, 207, 208
Broeder, P., 3 Bromberg, H., 33, 36-39 Bronckart, J.P., 89 Broselow, E., 11 Brown, P.F., 166 Brown, R., 31, 85, 92, 93, 120, 128, 129, 209, 213 Budwig, N., 115, 117, 124 Burt, M., 155 Butterfield, S., 203 Bybee, J.L., 223, 224, 227-229 Cairns, P., 6, 206, 208 Carnie, A.M., 52 Carter, D.M., 203 Cartwright, T.A., 6, 204, 207, 208 Caselli, M., 84 Chiat, S., 113 Chien, Y.C., 34, 42, 53, 69 Chomsky, N., 12, 23, 50, 51, 53, 62, 66-68, 101, 142, 212, 224 Christiansen, M.H., 208 Church, K.W., 203 Cinque, G., 144 Cipriani, P., 84 Clahsen, H., 4, 7, 28, 41, 84, 88, 138, 141, 145, 239 Clark, R., 10, 190 Coker, C.H., 208 Crain, S., 3, 4, 41, 67 Crisma, P., 38, 96 Culicover, P., 32, 212 Cutler, A., 17, 203, 206 Deprez, V., 12, 48 Daugherty, K., 240 de Haan, G., 84, 89, 90, 98 Dekydtspotter, L., 156 Demopoulos, W., 9 Dempster, A.P., 167 Demuth, K., 6, 193 den Besten, H., 95 de Villiers, J.G., 2, 213 de Villiers, P.A., 213 Dresher, E., 32, 162, 184, 185, 188, 189 Dulay, H., 155
duPlessis, J., 151, 156 Echols, C.H., 206, 209 Eimas, P.D., 197-198 Eisner, J., 170 Elbers, L., 9 Ellison, T.M., 170 Elman, J.L., 6, 24, 212, 240 Emonds, J., 83, 140 Enç, M., 53, 63, 87 Epstein, S., 10, 134, 155 Eubank, L., 133, 135, 142, 146, 156 Felix, S.W., 4, 147 Ferdinand, A., 61, 90, 91, 95 Ferguson, C.A., 212 Finer, D., 11 Fletcher, P., 4, 114, 116, 127, 128 Fodor, J., 75, 212 Francis, N., 226 Frank, R., 162, 170, 188, 189 Franks, S., 56 Frazier, L., 2 Fukui, N., 8, 25 Gelderen, V., 91 Genesee, F., 7, 9 Gerken, L.A., 209 Gerstman, L.J., 204, 207, 208 Gibson, E., 10, 32, 162, 188-190 Giorgi, A., 13, 102 Gleitman, L.R., 6, 43, 198, 209 Gnanadesikan, A., 193 Gold, E.M., 16 Gordon, P., 235 Greenfield, P., 38 Grimm, H., 25 Grimshaw, J., 161, 173, 174, 178 Grinstead, J., 55, 88 Grondin, N., 139 Gueron, J., 87 Guasti, M., 55, 84, 88, 98 Guilfoyle, E., 55 Haegeman, L., 34, 49, 96-98
Hagstrom, P., 62 Hahn, U.K., 3, 17, 225 Hale, M., 191 Halle, M., 44, 224 Hamburger, H., 31, 54, 115, 116 Hammond, M., 170 Hanlon, C., 31 Hare, M., 17, 240 Harris, T., 41, 42, 48, 85 Hawkins, R., 11, 147 Hayes, B., 8 Haznedar, B., 147 Heycock, C., 1 Hickey, T., 55 Hickmann, M., 69 Hinton, G., 166 Hirsh-Pasek, K., 198, 199 Hockett, C.F., 208 Hoeffner, J., 225 Hoekstra, T., 3, 4, 7, 8, 11-14, 58, 69, 81, 82, 84, 86-88, 91, 96, 98, 99, 101, 104, 107, 135 Hohne, E.A., 208 Holmberg, A., 97 Houston, D., 204 Hulk, A., 15, 148-152, 155 Huxley, R., 113, 115, 116, 121, 123, 125 Hyams, N., 3, 4, 7-9, 11-14, 32-34, 39, 40, 56, 81, 82, 86, 87, 89, 96, 98, 99, 101, 107 Ingram, D., 3, 91, 103, 104 Ioup, G., 147 Isard, S., 199 Jaeggli, O., 9 Jagtman, M., 138 Jansen, B., 138 Jonas, D., 48, 51, 55 Jordens, P., 83, 84, 89, 91, 109, 138 Jusczyk, P., 3, 6, 7, 16, 17, 197, 198, 199, 202-204, 208, 212, 215 Kail, M., 69 Kapur, S., 162, 188, 189 Karmiloff-Smith, A., 34, 68 Karttunen, L., 170
Kaye, J.D., 32, 162, 184, 185, 188, 189 Kayne, R., 87 Kemler Nelson, D.G., 198 Kenstowicz, M., 201 Kim, J.J., 230, 232, 233, 240 Kiparsky, P., 230, 234 Klatt, D.H., 203 Klein, W., 145 Klepper-Schudlo, A., 55, 56 Kramer, I., 94, 97 Kratzer, A., 107 Krashen, S., 155 Krumhansl, C.L., 199 Kucera, H., 226 Kuhl, P.K., 6, 198 Kursawe, C., 96 Lalonde, C.E., 198 Lardiere, D., 155 Lebeaux, D., 4, 5, 86 Legendre, G., 169, 178 Lehiste, I., 208 Leinbach, J., 225, 240 Levelt, C., 193 Levow, G.A., 49 Lightbown, P., 84 Lyons, J., 74 Mandel, D.R., 199 Manzini, R., 25, 42, 190 Marantz, A., 44 Maratsos, M., 69 Marchman, V., 225, 240 Marcus, G.F., 3, 228, 229, 232, 233, 235, 238-240 Matthews, R.J., 9 Mattys, S., 207 McCarthy, J., 172, 173, 239 McClelland, J.L., 3, 17, 224, 229, 236, 240 McDaniel, D., 3 MacWhinney, B., 3, 4, 84, 90, 91, 93, 225, 240 Meisel, J., 4, 83, 91, 145 Mercer, R.L., 167 Miller, G.A., 199
Miller, J.D., 6 Miller, J.L., 198 Mills, A., 28 Moder, C.L., 223, 228 Mohanan, K.P., 224 Morgan, J.L., 6, 206 Morse, P.A., 198 Moucka, R., 55, 56 Murre, J., 3 Muysken, P., 138 Myers, J., 204 Nadas, A., 167 Nakisa, R.C., 3, 225 Newport, E., 6, 156, 209, 212 Newsome, M., 204 Niyogi, P., 162 Norris, D., 17, 203 Otake, T., 17, 207 Paradis, J., 7, 9, 193 Parodi, T., 143-147 Pater, J., 193 Pea, R.D., 107 Penke, M., 84, 88 Penner, Z., 4 Pensalfini, R., 114-116, 119, 121 Perdue, C., 145 Petrie, T., 166 Phillips, C., 38, 55, 96, 97 Pianesi, F., 13, 102 Picallo, M.C., 144 Pierce, A., 12, 29, 32, 48, 55, 83, 84, 88 Pinker, S., 3, 4, 17, 18, 134, 148, 155, 156, 219, 220-223, 229, 239, 240 Pizzuto, E., 84 Platzack, C., 29, 88, 96, 155 Plunkett, K., 3, 29, 84, 91, 94, 225, 240 Poeppel, D., 28, 30, 37, 38, 41, 83, 84, 95, 96, 98, 138 Polka, L., 198, 199 Pollock, J., 83, 140, 142 Powers, S., 5, 115, 116, 124, 125 Prevost, P., 156 Prasada, S., 17, 229, 239, 240
Prince, A., 5, 17, 161, 172, 174, 192, 213, 222, 223, 239, 240 Pulleyblank, D., 164, 191 Quartz, S.R., 24 Radford, A., 3, 4, 14, 86, 113, 125-127 Rasetti, L., 33 Reinhart, T., 109 Reiss, C., 191 Rhee, J., 56 Rice, M., 42, 69 Rispoli, M., 114-119, 124-126 Ritchie, W., 2 Rizzi, L., 7, 8, 34, 37, 39, 49, 53, 58, 70, 83, 86, 89, 101 Roberts, I., 10 Robertson, D., 11 Roeper, T., 9, 36, 37, 98 Rohrbacher, B., 36, 37, 98, 142 Rothweiler, M., 239 Rumelhart, D.E., 3, 17, 224, 229, 236, 240 Saffran, J.R., 206, 207, 210 Safir, K., 9, 190 Samek-Lodovici, V., 161, 173, 174, 178 Sano, T., 39, 56, 57, 88, 89 Santelmann, L., 37, 83, 96, 212 Sarma, V., 55 Satta, G., 170 Schütze, C., 12, 23, 27, 33, 38, 42, 44, 45, 46, 49-51, 58, 64, 69, 114-120, 122, 123, 126, 128, 129 Schaeffer, J.C., 55, 69, 88 Schönenberger, M., 39, 40 Schwartz, B.D., 3, 7, 10, 11, 15, 133, 135, 141, 142, 147, 148, 151, 155-157 Secco, G., 55 Seidenberg, M.S., 208, 240 Sejnowski, T.J., 24 Senghas, A., 234 Serra, M., 84 Shady, M.E., 209 Shafer, V., 209 Shepherd, S.C., 107 Shillcock, R., 1 Shirai, Y., 89
Sigurjonsdottir, S., 55 Sinclair, H., 89 Slobin, D.I., 223, 224 Smolensky, P., 3, 5, 8-10, 16, 161, 162, 167, 168, 170, 172, 182, 191-193, 213 Snow, C., 84, 90, 91, 93, 212 Snyder, W., 55, 56 Sole, R., 84 Sorace, A., 1, 10, 11, 155, 156 Sprouse, R., 10, 15, 133, 141, 142, 147, 151 Stephany, U., 107 Strömqvist, S., 29, 84, 91, 94 Streeter, L.A., 198 Stromswold, K., 128, 129 Tees, R.C., 198, 203 Tesar, B.B., 3, 5, 8, 9, 16, 161, 162, 169, 170, 179, 182, 185, 191-193 Thomas, M., 11 Thompson, W., 91, 103, 104 Thornton, R., 3, 34, 67 Tomaselli, A., 155 Torrens, V., 55, 84, 88 Towell, R., 11 Travis, L., 95, 97 Tuijmann, K., 98 Turkel, W.J., 164, 191 Ud Deen, K., 92-94 Ullman, M., 228, 235-237 Umeda, N., 208 Vainikka, A., 4, 50, 133, 135, 138, 139, 141 Valian, V., 9, 39 van de Vijver, R., 193 van der Meulen, I., 91 Van Ginneken, J., 83, 91 Van Kampen, J., 35 Vance, B., 155 Verrips, M., 83 von Stutterheim, C., 145 Wagner, K., 28, 84 Wanner, E., 6, 43, 198 Weissenborn, J., 4, 28, 55, 83, 84, 88 Wells, G., 107 Werker, J., 6, 198, 203
Westermann, G., 17 Weverink, M., 53, 83, 88 Wexler, K., 3-5, 7, 8, 10-15, 23, 25, 27-39, 41-44, 46-49, 52-59, 61, 63, 65-69, 83, 84-86, 88, 95, 96, 98, 101, 114-119, 122, 123, 138, 162, 188-190, 212 White, L., 10, 11, 135, 139-142, 147, 156 Whorf, B., 202 Wijnen, F., 90-92, 103, 107, 109 Williams, E., 9, 230 Xu, F., 223 Young-Scholten, M., 133, 135, 138, 139, 141 Zagona, K., 70 Zwart, J., 95, 97
Language Acquisition: Knowledge Representation and Processing
Subject Index

A-chains acronyms agreement AGR/TNS Omission Model (ATOM) default missing (See also AGR/TNS Omission Model) number person allophones Alzheimer's Disease anchoring
5,66 234 41,44-46,55,70,84,85, 88 7, 8, 12, 27, 43-52, 57, 58, 61-62, 69, 72, 100 45-49, 53, 57, 60-63 87,98 87, 88, 98 203, 207-209 237 53, 86, 87, 98,101,108, 109
aphasia agrammatic anomic
237 237 210 89-95, 102-107 18,219 166
absence from Root Infinitives Italian missing
89
artificial languages aspect associative memory automatic speech recognition auxiliaries
bilinguals binding bootstrapping (prosodic) bound morphemes case accusative assignment to the subject checking: See checking default (See also genitive subjects) ergative errors nominative oblique Catalan checking case D feature double
73 55 9,202 53,69 6 213 68 12,44,46 12, 14,44,46,49 9, 26,43-46,49, 52,53, 122 119 114,128 12, 44,46, 52-54, 68 5,8 55 47, 62, 67 48-50,68 50,51,59-61 59-63, 67-69,73
Unique Checking Constraint child-directed speech clause boundaries cliches complex words compound words connectionism connection strengths connectionist models Constraint-Ranking Triggering Learning Algorithm content words continuity Continuity Hypothesis convergence conversational speech corpora (See also London-Lund Corpus) correspondence relations in structural descriptions crosslinguistic variation cue learning Czech Danish declination default agreement: See agreement default case: See case delearning in L2 development
early child grammars Early Morphosyntactic Convergence Economy electrophysiological recording
212 198, 199,201 227 231,234 18,230,232,234,239 3, 219, 224 225, 228 17, 18, 212, 225 164 203, 207 114, 117 4, 14, 94, 138 12,60,63, 64,66,67 203 3, 6 174
and Optimality Theory
denominal verbs determiners, missing developmental constraints developmental problem dialects of English diminutives discontinuous dependencies Distributed Morphology Dutch
12, 27, 57-74
13, 26, 54 164, 189, 190 32, 162, 163 55, 56, 207 55 198, 199
118 147 233 69, 98 7,9 4 223 206 212, 213 45,46, 61 13, 35, 37, 44, 46, 55, 69, 82-84, 87-100,102, 105-109, 148, 202, 207 4, 5, 114 82 8,47,50 209
endstates in L2 acquisition English
4,10,148,155, 156,157 29-34, 37,41-44,48, 55, 61, 69, 82, 85-109, 139 232, 233, 239 182, 186, 191
eponyms Error-Driven Constraint Demotion Algorithm Eventivity Constraint exhaustive search of possible grammars Expectation Maximization (EM) algorithm Extended Projection Principle Faroese features
13, 82, 89-109 162
D[eterminer] +/-Interpretable number (See also underspecification) phi semantic
final lengthening finite morphology Finite Phrase (FP) Finnish Flemish foreign borrowing fossilization (See also endstates in L2 acquisition) French
12,51,52,59-62,65, 68-74 51,59-61,65-69 13, 58, 87
41,46 62 198, 199, 207 8 138 119 96 234, 239 134 15, 28-32, 48, 56, 73,90, 91,96,97,139,147, 148, 202, 207 15, 147-155
Full Transfer-Full Access (FT/FA) function words functional categories (See also tense, agreement)
17,209-211 109 missing
Gen function and Optimality Theory generative phonology genetic algorithms genitive clauses: See genitive subjects genitive subjects
86 9, 174, 176, 178, 179
219, 224 164
clausal analyses of her-subjects in finite clauses
14,113-131 116-118 119-123 118
in interrogative structures in nonfinite clauses in root clauses its-subjects licensing of mistranscriptions of my-subjects nominal analyses of our-subjects unattested forms of with +ing forms with auxiliaries with default 3rd person singular agreement German
14, 116 117 118,119 128-130 118-120 123 115,117,123-126 14,114-116 127, 128 119,120,127,128,129 114 14, 116 119
28-30,37,41,42,44,46, 73,83,91,94,97,103, 138,145 143-147
German nominals in L2 acquisition German past tense German plural Germanic languages gerund structures Gold's Theorem grammar Greed Harmonic Grammar harmonic ordering and constraint hierarchies Harmony headless derivation headlessness headturn preference procedure Hebrew Hidden Markov Models high-amplitude-sucking procedure Icelandic idioms inert features infinitives
238 238 56, 102 114 16 219, 220 50 169 177
165,168-170, 191 233,238 230,231 5,7, 16, 198, 202,204, 209, 210, 214 56 166 199, 201 55, 119 227,230 142
Optional Infinitives (See also Root Infinitives, PRO) Root Infinitives (See also Optional Infinitives, Modal Reference Effect)
9,11,12,23-79 11,12-14,28,55,74, 81-109
inflection (See also tense, agreement)
42,47 development of irregular missing regular Very Early Knowledge of Inflection (VEKI)
24, 25,43, 86 17, 18, 219, 222-229, 231-240 4, 38,47 17,18,219,222,223, 225-229, 232-240 25-26,41, 45-47,74-75 219 7 8, 53, 86, 109 135, 142, 146,148, 155
inflectional morphology initial stage (of L1) interface properties interlanguage +/-Interpretable features: See features Irish Italian Japanese Korean L1-L2 differences in development in L2 endstates
52, 53, 55 29, 55, 56, 58, 62, 70-74, 87, 89, 138, 177, 179, 181 13, 29, 87 13, 62, 138, 145 15, 143, 145 15, 139, 141, 156 15, 156
L2 acquisition adolescent adult child earliest stage of failure-driven development in of morphological paradigms of noun movement of verb movement poverty of the stimulus in restructuring in L2 development: See L2 acquisition L2 initial state as L1 lexical projections as the totality of the L1 grammar
139-141 1, 2, 3,10, 143-155 135-137 139 147 142
143-147 139-141 155,156 10,147 10 138-141 10, 15,147-155
language instinct for a second language for native language Late Learning Early Emergence (LLEE) Latin
15,134, 135,133-160 133, 134,148 24-27, 40, 43 85
Law of Similarity learnability and Optimality Theory lexical entry lexical gaps lexical learning lexical parameterization hypothesis: See parameters lexical projection lexicon (See also mental lexicon) Logical Form logical problem of language acquisition London-Lund Corpus low-pass filtering Mad Magazine sentences Mainland Scandinavian maturation memory
138, 139 192, 193 64, 67, 68 2, 4 206 202 103, 118 30, 48 4, 10, 12, 41, 54, 63, 64, 86 6, 17, 18, 199, 226, 231, 235, 240 236 220, 221, 223, 226, 232, 234 230 17, 203-207
memory impairment mental lexicon (See also lexicon) metaphors Metrical Segmentation Strategy (MSS) metrical stress Minimal Trees hypothesis Minimalist Program Minimize Violations (MV) Modal Reference Effect (MRE) mora moraic segmentation strategy morphological learning morphological rules multisyllabic words musical phrases negation
162, 172 15, 138-141, 150 8, 27, 50, 53, 62, 63, 66,70 64, 66, 67,72 82,89-109
and infinitives and subject raising and verb movement: See verb movement in L2 acquisition negative evidence and Optimality Theory neural networks nominalisations
224 2, 10,161,162,172,193 16,165,171,183,188-191 233 120, 121, 128 15, 42,138
188 17, 207 7 17,201,212 201 199 62 49, 83, 85,91,94 48, 50, 60
135,139 3,9,31 9, 16, 180, 190, 191, 193 168 115
non-finite forms (See also infinitives) non-native contrasts Norwegian novel words Null Modal Hypothesis null subjects (See also parameters) Infl-licensed Null-Subject/Optional Infinitive Correlation number: See features number agreement: See agreement number underspecification: See underspecification Old English onomatopoeia Optimality Theory
198, 203 55 201, 206, 209, 233, 234, 236, 237, 239, 82,94-102 5, 8, 9, 33-39, 57, 70-74, 99 13, 56,70 27, 54-57, 62, 70-72
98
and constraint demotion and constraint ranking and constraint tableaux and constraint violation and data complexity and faithfulness constraints and grammar learning and grammar space and harmonic ordering and Harmony and infinity and iterative model-based algorithms and mark cancellation and markedness and metrical constraints and minimal violation and principles of learning and the grammar of subjects and the method of mark eliminability and the morphological paradigm and typology by reranking
227 234, 238 5, 8,9, 12, 16,17, 161-193, 197 5, 16, 179, 181-184, 186-192 5, 16, 164, 172, 175, 177, 179, 181, 190-193, 213-215 175,189 5, 16, 172, 173-176, 181, 190, 191, 193 182 175, 178, 192, 193 16 162, 193 16, 175-178 165, 168-170 178,179 168,192 176 174 184,185 176 163 161,173-175,178,180 178 192 177
optionality (See also infinitives) in early stages of L2 acquisition of verb movement possible sources of overparsing overregularization errors Parallel Distributed Processing parameters lexical parameterization hypothesis mis-setting of null subject setting of (See also cue learning) verb movement Very Early Parameter Setting (VEPS) word order Parkinson's Disease parsing and Gen in Optimality Theory and grammar learning and Optimality Theory past tense (See also inflection) pattern associator memory pausing perceptual learning person agreement: See agreement phi features: See features phonetic contrasts phonotactics phrase boundaries plurals Polish Portuguese positive evidence and Optimality Theory Possible Word Constraint pragmatic errors prefixing
142, 143 142, 151,154 8, 12, 63-66 174, 178 235-237, 239 17, 224
8,25 25 9,40,47 11,12,25,27,32-33,39, 40 24,73 11,29,30,37 11,25-43,46,47,70,72, 74,75 7 237
174, 2, 165, 166, 170, 172 5, 16, 171, 179, 180 6,17,18,219,222,224, 227-229, 231, 233, 236, 238 219, 224,225, 229, 230, 237, 238, 240 198, 199,203 3, 9, 27, 31, 32
198, 199, 201, 208 17, 197,201-203,206, 207, 209, 215 198, 199, 210 18, 238, 239 55, 56,199,200, 202 55,138 2 182 203 34, 35, 38 230
5-7 162
prelinguistic development principles and parameters grammar space Principles and Parameters theory (PPT) PRO in Optional Infinitives processing window production-directed parsing pronouns
16,113,114,162-164, 172,188-190 33-36 212,213 170 genitive nominative null-stem forms of objective strong forms of weak forms of
proper names prosodic markers Proto-principles psycholinguistics and linguistics quotations recursion restructuring (in L1) right-hand head rule robust interpretative parsing Robust Interpretative Parsing/Constraint Demotion Romance languages roofless derivation rules (See also word/rule theory) Russian -s forms scrambling segmentation sequential ordering Single Value Constraint sound-meaning pairing Spanish specificity speech perception speech rate speech variability "starting small" stress patterns
114,117,122,127 114,117,122,125 124 114,117,122 124, 126 124 206, 232, 239 17, 197-199, 201, 202, 204,207,210,213 5 11,27,54,57,74,75
234 221 5 231,232 165, 166, 169, 170,184, 185,189,192 16, 168-172, 183,184, 186-189, 192 13, 102,138,145,146 233 3, 17, 18, 219,221-223 55,56,66,91,119 85, 123 62 17,197,203, 204, 206-210 17, 199, 201 190 219 55,73, 87,138 62,69 2, 5, 7, 16, 197-199, 212, 215 198 198 212 202-204, 206, 207, 209, 213-215
stress system
161, 163,166,169,184, 187 166 203-207 4,114 12,47,50-53, 59,60, 68-71
stressed syllables strong syllables structure-building model subject raising (See also Extended Projection Principle) sucking Swedish syllabic segmentation strategy syllable onsets symbol processing symbolic theories systematic regularization target language (TL) T-chain: See tense tense (See also past tense)
37,55, 84,91,95-97 17,207 202,214 17, 229, 240 3 229, 231 134 missing operator T[ense]-chain
tied constraints topic drop topicalization transfer (See also Full Transfer-Full Access, Minimal Trees hypothesis, Valueless Features hypothesis) of feature values of functional projections of lexical projections Triggering Learning Algorithm triggers Truncation Hypothesis truncations Turkish type frequency UG principles (See also Universal Grammar) unaccusatives underpaying underspecification
13,42,45 7, 8,44,45,47-49, 53, 57-63,65,70-72 87 87,98 8 33-37 62, 95,96, 148, 149,155
150 141, 142, 150 15, 138, 142, 150 163, 164,183
9,10 7, 37, 38,49, 58, 86 234 135-137,145,201 237-239 119
in DPs (See also number) in L2 acquisition
66 174 7, 8, 58, 82, 86, 88, 98, 114, 127 99,100 138
of agreement (See also agreement) of functional heads of Infl of number of tense (See also tense) unergatives Unique Checking Constraint: See checking Universal Grammar
and Con and constraint violation and Gen and Optimality Theory
and ranking restrictions and the metrical module unsupervised learning U-shaped development utterance length Valueless Features hypothesis verb morphology verb movement and negation (See also L2 acquisition of verb movement) to Comp (See also Verb Second) to Infl verb raising: See verb movement Verb Second (See also verb movement) Viterbi algorithm weak syllables Weak Transfer hypothesis: See Valueless Features hypothesis wh-movement word frequency word/rule theory (See also rules)
117,122,126 114 14, 117 7, 8,13, 58, 87-89,98 117,122,126 66
2,4,9,10,15,16,26,40, 43,46, 48, 53, 68, 74, 134,135,147,157, 161,213 174,177,178 176 174, 180 164, 179,183,192,193, 213 178 184 166 236 212 15,142,143,150 3 15,29 29
28,95,96 28, 52,95,101 28, 30, 35,37, 83,95,96 167 205-207
36-39, 62, 96-98, 101 226-228, 235, 236,238, 240 222-226,234,238,240