Linguistic Evidence
Studies in Generative Grammar 85
Editors
Henk van Riemsdijk Harry van der Hulst Jan Koster
Mouton de Gruyter Berlin · New York
Linguistic Evidence Empirical, Theoretical and Computational Perspectives
Edited by
Stephan Kepser Marga Reis
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data Linguistic evidence : empirical, theoretical, and computational perspectives / edited by Stephan Kepser, Marga Reis. p. cm. – (Studies in generative grammar ; 85) Includes bibliographical references. ISBN-13: 978-3-11-018312-2 (cloth : alk. paper) ISBN-10: 3-11-018312-9 (cloth : alk. paper) 1. Linguistics – Methodology. I. Kepser, Stephan, 1967– II. Reis, Marga. III. Series. P126.L48 2005 410.72–dc22 2005031124
Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at <http://dnb.ddb.de>.
ISBN-13: 978-3-11-018312-2 ISBN-10: 3-11-018312-9 ISSN 0167-4331 © Copyright 2005 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents
Evidence in Linguistics Stephan Kepser and Marga Reis
1
Gradedness and Consistency in Grammaticality Judgments Aria Adli
7
Null Subjects and Verb Placement in Old High German Katrin Axel
27
Beauty and the Beast: What Running a Broad-Coverage Precision Grammar over the BNC Taught Us about the Grammar – and the Corpus Timothy Baldwin, John Beavers, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen
49
Seemingly Indefinite Definites Greg Carlson and Rachel Shirley Sussman
71
Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese Sonia M. L. Cyrino and Ruth E. V. Lopes
87
Aspectual Coercion and On-line Processing: The Case of Iteration Sacha DeVelle
105
Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study Serge Doitchinov
123
Processing Negative Polarity Items: When Negation Comes Through the Backdoor Heiner Drenhaus, Stefan Frisch, and Douglas Saddy
145
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs Veronika Ehrich
165
The Decathlon Model of Empirical Syntax Sam Featherston
187
Examining Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus Christiane Fellbaum
209
A Quantitative Corpus Study of German Word Order Variation Kris Heylen
241
Which Statistics Reflect Semantics? Rethinking Synonymy and Word Similarity Derrick Higgins
265
Language Production Errors as Evidence for Language Production Processes – The Frankfurt Corpora Annette Hohenberger and Eva-Maria Waleschkowski
285
A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions Mary Aizawa Kato and Carlos Mioto
307
The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses Gerard Kempen and Karin Harbusch
329
The Emergence of Productive Non-Medical -itis: Corpus Evidence and Qualitative Analysis Anke Lüdeling and Stefan Evert
351
Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity Wiltrud Mihatsch
371
Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus
393
The Plural is Semantically Unmarked Uli Sauerland, Jan Anderssen, and Kazuko Yatsushiro
413
Coherence – an Experimental Approach Tanja Schmid, Markus Bader, and Josef Bayer
435
Thinking About What We Are Asking Speakers to Do Carson T. Schütze
457
A Prosodic Factor for the Decline of Topicalisation in English Augustin Speyer
485
On the Syntax of DP Coordination: Combining Evidence from Reading-Time Studies and Agrammatic Comprehension Ilona Steiner
507
Lexical Statistics and Lexical Processing: Semantic Density, Information Complexity, Sex, and Irregularity in Dutch Wieke M. Tabak, Robert Schreuder, and R. Harald Baayen
529
The Double Competence Hypothesis. On Diachronic Evidence Helmut Weiß
557
List of Contributors
577
Evidence in Linguistics Stephan Kepser and Marga Reis
As is well known, the central objects of linguistic enquiry – language, languages, and the factors/mechanisms systematically (co-)governing language acquisition, language processing, language use, and language change – cannot be directly accessed; they must be reconstructed from the accessible manifestations of linguistic behaviour. These manifestations constitute the realm of possibly usable linguistic data. Since they fall into many types – introspective data, corpus data, data from (psycho-)linguistic experiments, synchronic vs. diachronic data, typological data, neurolinguistic data, data from first and second language learning, from language disorders, etc. –, and since each type, apart from historical data, can be instantiated by infinitely many tokens, the linguist’s central task of building theories about the above-mentioned linguistic objects is invariably bound up with several empirical tasks as well: (i) collecting/selecting a representative as well as reliable database from one or more data types, (ii) evaluating the various data types as to how they reflect linguistic competence (recall that even so-called primary data from introspection as well as authentic language production are complex performance data involving different nonlinguistic factors), (iii) assessing the relationship between the various data types such that comparison between studies of the same issue based on different data types is possible, and potential conflicts in results can in principle be resolved. As will be obvious, the three empirical tasks are largely interdependent. However, they are to a considerable degree dependent on linguistic theorising as well: Task (i) must typically be solved for specific linguistic problems, the specific shape of which is determined by linguistic theory proper. Tasks (ii) and (iii) must be related to theories about the interaction of linguistic competence with nonlinguistic faculties and factors in performance. Thus, gaining relevant linguistic evidence from the mass of potentially available data is neither a trivial matter nor a purely methodical one that can be pursued in isolation from concrete linguistic enquiry and their theoretical concerns. Moreover, providing useful data collections (be it appropriately annotated corpora, collections of controlled speaker judgements, experimentally elicited data, etc.) is also a linguistically challenging ‘practical’ task. In short, linguistic
evidence is an extremely important topic as well as a challenging problem for linguists of all persuasions. Given the fundamental nature of the problem, linguistic evidence is a remarkably new topic of linguistic discussion. Traditionally, concrete speech events, i.e., naturally occurring written or spoken utterances, were taken without further ado as the only relevant source of linguistic data, although the need for 'abstracting' the linguistically relevant traits from these data was by no means unknown (cf. Bühler 1932: 97, 1934: 14–15). Within structuralism, this tradition gained explicit methodological and theoretical status ('distributionalism'). Thus the explicit mentalistic turn of generative grammar, which claimed the priority of explanatory over descriptive goals and of introspective over corpus data, was bound to inspire a heated debate concerning the status of linguistics as an empirical science in general and the nature of proper linguistic evidence in particular. This debate, however, died down after the seventies with virtually no consequences for linguistic practice: Generative linguists continued relying more or less on introspective data gained in rather informal ways, non-generative linguists continued relying more or less on corpus data that were often just as informally obtained. In recent times, this has begun to change. Regarding the use of introspective data, an important turning point was the book by Schütze (1996), who was the first to argue forcefully for a systematic approach to the collection of speaker judgements. Since then, many authors have followed his lead and shown in various ways the necessity of controlling the many factors that influence speaker judgements in order to obtain more reliable data. As a consequence, there is a growing awareness among generative linguists that it is imperative to collect introspective data in systematically controlled ways, and moreover useful to complement them by data from other sources, both of which increasingly influence their linguistic practice. Regarding corpus data, the importance of this source of evidence has grown significantly since about the mid-nineties, when really large amounts of language data of many types became electronically available and easily accessible for the first time. Frequently, these data were annotated in linguistically relevant ways, which made these sources even more valuable. At the same time, computational linguists developed methods of accessing and evaluating these corpora. Consequently, linguists now have access to corpora that are several orders of magnitude larger than they were before. And the size and number of such corpora is still rapidly growing. Hence the renaissance of corpus linguistics to be observed since the nineties is by no means a coincidence. Both developments, by voiding mutual reservations concerning solidity
and practicability of method, have also paved the way for a rapprochement between introspective and corpus linguists, as evidenced by several recent publications in which the question of what should count as linguistic evidence is discussed from either perspective, on the whole opting for using corpus as well as introspective evidence (see, e.g., the recent special issues of Lingua and Studies in Language). But an astonishing number of participants in the discussion are still trying to argue that one of these types of linguistic evidence is generally significantly superior to the other (see, e.g., Lehmann (2004) and Borsley (2005b)). It is one of the main aims of this volume to overcome the corpus data versus introspective data opposition and to argue for a view that values and employs different types of linguistic evidence each in their own right. Evidence involving different domains of data will shed different, but altogether more, light on the issues under investigation, be it that the various findings support each other and help with the correct interpretation, or that, by contradicting each other, they point to factors of influence so far overlooked. This ties in naturally with the fact we started out with, namely that there are more domains and sources of evidence that should be taken into account than just corpus data and introspective data. These insights may sound simple, but, unfortunately, a look into the discussion on evidence in linguistics shows that they are not generally accepted. Apparently, it is not so much the origin of evidence that counts. What is more important is adequacy and the status of the data as true 'evidence'. Adequacy means that the data put forward to support a certain claim actually do so. This can only be decided on an individual level, i.e., for the particular linguistic problem in question. It is therefore of no concern to us here. Whether certain data can be regarded as true evidence touches the key questions of reliability and reproducibility of data. Reproducibility of data is a basic demand in all areas of science for these data to be considered true evidence for something. Typical counterexamples are example sentences held to be (un)acceptable by virtue of the linguist's own judgement only (especially if fortified by the belief in individual 'dialects'), or quoting a single occurrence of a construction found on the World Wide Web (regarded by some as the largest accessible corpus) as support for this construction's grammatical existence. Reliability encompasses reproducibility, but requires more. A proper analysis and control of the factors that influence the constitution of the data are necessary as well. With reproducibility and reliability secured, data can be fruitfully used as evidence for strengthening or refuting hypotheses. The contributions to the present book are examples of how this can be done
in linguistic practice. An important aspect of this book, and a consequence of what we pointed out at the outset about the theoretical underpinnings of issues of linguistic evidence, is the absence of purely abstract discussions of methodologies. Rather, all issues concerning linguistic evidence taken up in the various contributions are addressed in relation to specific linguistic research problems. The main reason for this is our belief that it is only with respect to concrete problems that the quality of the method and of the various types of evidence brought to bear on them can be evaluated. Apart from that, it is just more convincing to see how using different types of evidence and different methods of obtaining it may in fact further our understanding of such concrete problems. It stands to reason then that a volume on 'Linguistic Evidence' should cover a wide range of data types (and methods for turning data into evidence) to be applied to an equally wide range of linguistic phenomena. The present volume does: As for data types, many sources of evidence come into play: corpus data, introspective data, psycholinguistic data, data from computational linguistics, language acquisition data, data from historical linguistics, and sign language data. In several contributions, different data types are comparatively evaluated, which yields particularly insightful results. What is remarkably absent is any quarrel about the status of introspective vs. corpus data; both are recognised throughout as equally valid sources of evidence. We take this as a hopeful sign that the longstanding but fruitless either-or confrontation of these data types will finally be overcome. Different ways for gaining linguistic evidence are also well represented in this volume, papers applying/exploring psycholinguistic methods forming perhaps the largest group. A good part of them is concerned with experimental data from language processing, exploring systematic ways for measuring and interpreting these data. But there are also papers exploring methods for collecting reliable as well as reproducible grammaticality judgements. These data types and methods are applied insightfully to phenomena from such diverse areas as syntax, semantics, phonology, morphology, psycholinguistics, historical linguistics, language acquisition, corpus linguistics, computational linguistics, and patholinguistics. For books, such diversity of topics is not always a virtue. But in this case, it serves to underline the fundamental importance that issues of linguistic evidence have for all fields of linguistics. It also indicates that awareness of these issues has by now reached almost all these fields. The present book is based on the conference Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, which took place in Tübingen, January 29 – February 1, 2004. It was organised by the Collaborative
Research Centre (SFB) 441 on "Linguistic Data Structures. On the Relation between Data and Theory in Linguistics" at the University of Tübingen, which has supported in-depth studies of linguistic evidence in all its aspects since 1999. The contributions to this volume are elaborated versions of the conference presentations, plus a paper by H. Weiß designed to complement the historical section. Unfortunately, four papers presented at the conference were not submitted for publication. The editors of this volume wish to express their gratitude to the members of the collaborative research centre (SFB) 441 on Linguistic Data Structures at the University of Tübingen for many interesting discussions on key issues of evidence in linguistics, and for their vigorous support when organising the above-mentioned conference. In this regard we owe particular thanks to Sam Featherston, Beate Starke, and Dirk Wiebel. We also want to thank the members of the conference programme committee for their excellent work. When preparing the present volume we again received generous support from many, to whom we are very grateful. In particular, we wish to thank the colleagues who reviewed the papers for publication, for their extremely useful comments and criticisms, and the group of helpers without whom editing this volume might have become a mission impossible: Iris Banholzer, Ansgar Höckh, Chris Sapp, and Bettina Zeisler. We are also grateful to the German Science Foundation (DFG) for their generous support of the collaborative research centre 441 and of the conference on Linguistic Evidence.
Stephan Kepser and Marga Reis
November 2005
References

Borsley, Robert D. (ed.)
2005a Data in Theoretical Linguistics. Volume 115(11) of Lingua.
Borsley, Robert D.
2005b Introduction. Lingua, 115(11): 1475–1480.
Bühler, Karl
1932 Das Ganze der Sprachtheorie, ihr Aufbau und ihre Teile. In Bericht über den XII. Kongreß der Deutschen Gesellschaft für Psychologie, pp. 95–122. Fischer, Jena.
1934 Sprachtheorie. Die Darstellungsfunktion der Sprache. Fischer, Jena. 2nd edition Stuttgart, 1965.
Lehmann, Christian
2004 Data in linguistics. The Linguistic Review, 21: 175–210.
Penka, Martina and Anette Rosenbach
2004 What counts as evidence in linguistics. Studies in Language, 28(3): 480–526.
Schütze, Carson
1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press.
Gradedness and Consistency in Grammaticality Judgments Aria Adli
1 The importance of graded grammaticality judgments: a case study of que → qui in French
The methodological issue of the unreliability of certain introspective data circulating in the syntactic literature has already been mentioned by several authors (e.g. Schütze 1996; Adli 2004). One particularly problematic phenomenon is that questionable judgments are sometimes quoted in theoretical studies without prior critical empirical verification, contributing to the formation of "myths" in the literature. One case is the que → qui 'rule' in French. This rule, which has been introduced into the literature solely on the basis of uncontrolled introspective data, is not confirmed by an experimental study in which a controlled process of data collection is applied to a whole sample of test subjects and which makes use of a graded concept of grammaticality. The que → qui rule essentially states that an ECP violation can be avoided in French if qui is used instead of the usual complementizer que in sentences where a wh-phrase has been extracted from the subject position (see Perlmutter 1971; Kayne 1977). This rule rests on the empirical 'premise' that there should be a clear difference in grammaticality between (2a) and (2b) (all four sentences are taken from Hulk and Pollock 2001).
(1) a. Quel livre crois-tu que les filles vont acheter.
       which book think-you COMP-que the girls will buy
    b. *Quel livre crois-tu qui les filles vont acheter.
       which book think-you COMP-qui the girls will buy

(2) a. *Quelles filles crois-tu que vont acheter ce livre-là.
       which girls think-you COMP-que will buy that book-there
    b. Quelles filles crois-tu qui vont acheter ce livre-là.
       which girls think-you COMP-qui will buy that book-there
The que → qui rule has been an often-used argument in syntactic theorizing.1 The assumption is that this rule is a sort of loophole to avoid ungrammaticality, or in Pesetsky's words (1982: 308): "Qui does not occur freely as a complementizer, but only 'when needed' to avoid an NIC violation. [...] In other words, qui is a form of que which provides an 'escape hatch' from the effects of the NIC." Chomsky (1977) compares it with free deletion in COMP in English. Rizzi (1990; 1997) supports his assumptions concerning the agreement process in the COMP system with this rule. He states that in cases of felicitous subject extraction in French the agreeing complementizer is not ∅, but the overt form qui. He assumes that an ECP violation is produced if the agreeing form does not occur and C is in what he considers the unmarked form que. He further states that this rule is a morphological reflex of Spec-head-agreement between a trace and the head of COMP. Therefore Rizzi (1990: 56) assumes:
(3) qui = que + Agr
Rizzi (1990) accounts for the ungrammaticality of the object extraction (1b) by assuming that Spec-head-agreement requires a C-adjacent position of the extracted element. Furthermore, Rizzi (1990) assumes that the que → qui rule only applies when agreement occurs between C0 on the one hand and its specifier as well as its complements on the other hand. (Such a double agreement had already been described for Bavarian German by Bayer 1984 concerning sentences like Wenn-st du kumm-st.) The result would be as shown in (4): t' agrees with C0, t with I0 and – due to the identity of t and t' – C0 with the maximal projection of I0 (by transitivity).
[t’ C0 [ t I0 ...
One aim of this paper was to test this assumption in an experimentally controlled process of data collection using a graded concept of grammaticality. Such a graded concept is assumed in Chomsky (1964), but it is already given up in Chomsky (1965) in favour of a distinction between grammaticality and acceptability. However, a rather pre-theoretic concept of gradedness persists in the syntactic literature, sometimes tacitly through the use of symbols like “?”, “??”, etc. Furthermore, some principles even make use of theoretical predictions in line with a graded concept (e.g. ECP vs. subjacency violation).
In order to measure graded grammaticality judgments, an instrument based on the principle of graphic rating (cf. Guilford 1954: 270; Taylor and Parker 1964) has been developed. Part of the design is an extensive instruction and training phase. Judgments are expressed by drawing a line on a bipolar scale (and not by marking one of several boxes with a cross). Within the limits of a person's differential capacity of judgment, a theoretically infinite number of gradations are therefore possible. The test was presented in an A4 ring binder containing two horizontally turned A5 sheets (see Figure 1).

Figure 1. [Layout of the test binder: the upper sheet shows the reference sentence (labelled "comparaison"), the lower sheet the experimental sentence to be judged (labelled "Jugement"), each printed with a graphic rating scale beneath it.]
The upper sheet contained the reference sentence, the lower sheet the experimental sentence. The sentence, with the graphic rating scale under it, was printed in the middle of each sheet. After the subject had rated the experimental sentence on the lower sheet, he or she turned this page to go on with the next sentence. The upper sheet with the reference sentence was not turned and remained visible during the whole test. The judgments were given relative to the reference sentence, judged in the beginning by the subject himself, within both endpoints (obviously well-formed and obviously ungrammatical) given by the design. It was, therefore, a bipolar, anchored rating scale with the characteristic that the subjects chose the anchor for themselves. The reference sentence consisted of a suboptimal, but not extremely ungrammatical, sentence. The dependent variable was the difference between the judgment of a particular sentence and the judgment of the reference sentence. The test started, after the presentation of written instructions, with an interactive instruction and training phase of about 10 to 15 minutes. During this phase, two main concepts were introduced in a 9-step procedure: isolated grammaticality and gradedness (cf. Adli 2004: 85-88 for details). A pre-test revealed the importance of such an additional training phase. Although not directly visible to the naked eye, the concept of grammaticality was often confounded with extra-grammatical factors (e.g., the plausibility of the situation described by the sentence). The understanding of the concept of isolated grammaticality is necessary to reduce interferences with semantic and pragmatic effects. Furthermore, subjects had to replace the common distinction between grammatical and ungrammatical, or "good" and "bad", sentences with a truly graded notion of grammaticality. They were introduced to these two main concepts, among other things, by rating different training sentences and by explaining the reasons for their ratings to the experimenter, who could therefore adapt the instructions to the level of understanding of each subject. After instruction and training, the experimenter left the room. Given that reliability can generally be improved by the use of several items, each syntactic structure was presented in 4 lexical variants. Since the use of experimental methods in grammar research is recent, and not much experience exists yet, the evaluation of the instrument with regard to its reliability is important. A reliability analysis indicates the limits of an instrument concerning the precision of its measurements. Furthermore, the only three studies on the reliability of experimentally collected, graded grammaticality judgments I know of, namely Bard, Robertson and Sorace (1996: 61), Cowart (1997: 23) and Keller (2000: 215), rely on erroneous or improper calculations.2 Reliability is evaluated by Cronbach's α, which is a measure of internal consistency (see Cronbach 1951). It indicates the consistency between the different lexical variants of a sentence without taking into consideration mean differences between the variants. Indeed, the reliability of the measurements turned out to be sufficiently high (Cronbach's α = 0.85).
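As a purely illustrative aside, Cronbach's α of this kind can be recomputed from a subjects-by-variants matrix of judgment scores with the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the summed scores). The sketch below is not the author's analysis script; the function name and the toy data are invented for illustration only.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items matrix of judgment scores."""
    x = np.asarray(scores, dtype=float)
    n_subjects, k = x.shape
    item_var = x.var(axis=0, ddof=1)       # variance of each lexical variant
    total_var = x.sum(axis=1).var(ddof=1)  # variance of the per-subject sum scores
    return (k / (k - 1)) * (1.0 - item_var.sum() / total_var)

# Toy data: 6 subjects x 4 lexical variants of one construction
# (difference scores relative to the reference sentence; values invented).
judgments = [
    [12, 15, 10, 14],
    [ 8, 11,  9, 10],
    [20, 22, 18, 21],
    [ 5,  7,  6,  8],
    [15, 14, 16, 13],
    [10, 12, 11,  9],
]
print(f"Cronbach's alpha = {cronbach_alpha(judgments):.2f}")
```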
78 French native speakers participated in the experiment. Validity was ensured by means of a special index (called violation of trivial judgments), reflecting the capability of the subject to give graded grammaticality judgments (cf. Adli 2004: 89-91). By means of this criterion, those subjects who were deemed unable to perform this task could be identified and excluded; the data of 65 subjects could be utilized for the subsequent statistical analyses. Given that the measure of graded grammaticality does not reflect the categorical distinction between well-formed and ill-formed sentences, and given that such information is still – for theory-internal reasons – important, grammatical as well as ungrammatical constructions were included in the test design in order to make available comparative scale points for the interpretation process: The experiment did not cover only subject-initial and object-initial interrogatives with long extraction over que and/or qui. The clearly felicitous constructions (5a) and (5b) with a PP-parenthetical "d'après vous" and the sentences (6a) and (6b) with the expression "croyez-vous" at the position after the wh-phrase were also included – some aspects of their syntax are discussed in section 3 (see Adli 2004 for full details).3
(5) a. Quel appache, d'après vous, méconnaît les obstacles de l'hiver?
       which Apache according-to you ignores the difficulties of the winter
    b. Quel animal, d'après vous, rôtissent les esquimaux de l'igloo?
       which animal according-to you grill the Eskimos of the igloo
(6) a. (?)Quel architecte, croyez-vous, conçoit les demeures du président?
       which architect think you designs the residences of the president
    b. (?)Quel argent, croyez-vous, investissent les organisateurs du bal?
       which money think you invest the organisers of the ball
(7) a. ??Quel ingénieur, pensez-vous, qui conçoit la fusée de l'Aérospatiale?
       which engineer think you qui-COMP designs the rocket of Aérospatiale
    b. *Quel idiot, pensez-vous, que perd les clefs de la maison?
       which idiot think you que-COMP loses the keys of the house
    c. ?Quel appel, pensez-vous, que reçoivent les policiers du quartier?
       which call think you que-COMP receive the police officers of the district

The data was analysed with a two-way repeated measures ANOVA (variable A: "d'après vous" / "croyez-vous" / "pensez-vous qu-"; variable B: subject / object). I took into consideration not only information about the significance level, but also about the effect size of the differences (in terms of partial η², cf. Cohen 1973; see also Keren & Lewis 1979: 119). The hypothesis was tested at α = 5%, which approximately allows for α = β.4 In the following, only the relevant results concerning the que → qui issue will be given: In order to take into account the full details of the results, a complete set of orthogonal simple effects was tested as regards the subject interrogatives (cf. Bortz 1999: 254), contrasting (i) (5a) vs. (6a), (ii) (7a) vs. (7b), as well as (iii) (5a) and (6a) vs. (7a) and (7b).
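For a single-degree-of-freedom repeated-measures contrast of this kind, the effect size can be expressed as partial η² = SS_effect / (SS_effect + SS_error), which for a paired contrast equals F / (F + df_error). The following sketch merely illustrates that relationship on invented judgment scores; it is not the analysis actually run in the study, and the sample size and values are placeholders.

```python
import numpy as np

def paired_contrast(cond_a, cond_b):
    """F and partial eta-squared for a one-df within-subjects contrast.

    Works on the per-subject difference scores d:
      SS_effect = n * mean(d)^2
      SS_error  = sum((d - mean(d))^2)
      F         = (SS_effect / 1) / (SS_error / (n - 1))
      partial eta^2 = SS_effect / (SS_effect + SS_error) = F / (F + (n - 1))
    """
    d = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    n = d.size
    ss_effect = n * d.mean() ** 2
    ss_error = ((d - d.mean()) ** 2).sum()
    f_value = ss_effect / (ss_error / (n - 1))
    eta_sq = ss_effect / (ss_effect + ss_error)
    return f_value, eta_sq

# Invented ratings of two constructions by the same eight subjects.
a = [30, 25, 40, 35, 28, 33, 37, 29]
b = [10, 15, 22, 18, 12, 20, 25, 16]
f_value, eta_sq = paired_contrast(a, b)
print(f"F(1, {len(a) - 1}) = {f_value:.2f}, partial eta^2 = {eta_sq:.3f}")
```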
Figure 2. [Line chart of the mean grammaticality values (y-axis from [−]grammatical, −60, to [+]grammatical, +60) for subject questions and object questions in the conditions "...d'après vous...", "...croyez-vous...", "...pensez-vous qui...", and "...pensez-vous que...".]
The results show a partial η² of 0.183 (p < 0.001) for contrast (i), a value of 0.149 (p < 0.001) for contrast (ii), but an amount as high as 0.875 (p < 0.001) for contrast (iii). It appears that the qui-form (7a) is anything but felicitous. Though there is a significant difference between the qui-form (7a) and the que-form (7b) (i.e., the ungrammaticality of the construction with qui is not as sharp as the ungrammaticality of the construction with
que), it is a matter of fine-grained differences within the range of ungrammatical constructions. The set of orthogonal simple effects shows that the different subject-initial constructions divide into two clearly separated groups, with an eye-catching decrease in grammaticality between them. The form with qui thus cannot be considered the licensed counterpart of the form with que. The que → qui rule emerges as a myth, and it must consequently be eliminated from the discussion. All the same, Pesetsky (1982: 308) notes that "for some French speakers" the use of qui does not make the sentence grammatical. However, he assumes that these persons are speakers of particular dialects (without specifying them) and he does not therefore cast doubt on the que → qui rule. As to the question why the ungrammaticality of the qui-form is less sharp, I suggest that psycholinguistic factors are responsible: the use of qui instead of que evokes the structure of subject relative clauses (i.e. the nominative qui has a sort of resumptive character), which alleviates the repair mechanisms. Concerning this, it is interesting to note that Perlmutter (1971), the first to raise the que → qui issue, analyzes qui in sentences with long subject extraction as a relative pronoun, as his gloss to (8) shows.
(8) Qui a-t-il dit qui s'est évanoui?
    who did he say who fainted
    'Who did he say fainted?'
Another argument in favour of the assumption of alleviated repair mechanisms is, at least for declaratives, the relatively easy re-analysis of these constructions: The expression qu'il a dit in example (9), also taken from Perlmutter (1971: 102), can be omitted (along the lines of a parenthetical analysis). The remaining sentence (10) is a usual construction with a relative clause. In addition, the expression qu'il a dit itself in (9) is not well-formed (if at all, an expression with the PP-pronoun dont would be required), favouring a reanalysis of the whole sentence with deletion of this expression. A similar situation can be stated for (11), taken from Rizzi (1990: 56), to give another example from the relevant literature.
la speakerine qu’ il a dit qui s’est évanouie… the spokeswoman that he has said who fainted...
(10) la speakerine qui s'est évanouie…
     the spokeswoman who fainted...

(11) l'homme que je crois qui viendra…
     the man that I believe who come-FUT
2 Graded grammaticality and the measure of judgment consistency
It is not surprising that the measure of judgment consistency has so far been ignored in syntactic research, essentially because its calculation requires a metrical (i.e., graded) grammaticality scale. The procedure is similar to the reliability evaluation of the instrument described in the previous section. However, for this purpose, the reliability values are interpreted separately for each construction. The main assumption is that reliability differences between different syntactic structures, measured with the same instrument under the same conditions, do not represent a mere indicator with respect to the precision of the instrument, but constitute an interpretable measure in terms of grammar theory. The approach of measuring graded grammaticality judgments allows one not only to study the mean value for the judgments in a sample, but also to calculate the internal judgment consistency (one might also say "intra-individual judgment consistency") and to compare these values for different syntactic structures. This measure has the advantage of complementing the information about the exact grammaticality value with the information on the difficulty of giving stable judgments, allowing a more complete evaluation of the grammatical quality of a structure. I conducted reliability analyses using the average-measure intraclass correlation coefficient (ICC) of the absolute agreement type (cf. McGraw and Wong 1996).
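In McGraw and Wong's (1996) terms this is ICC(A,k): absolute agreement, average of the k measurements, under a two-way random-effects model. It can be obtained from the mean squares of a two-way (subjects × lexical variants) ANOVA as (MS_rows − MS_error) / (MS_rows + (MS_cols − MS_error)/n). The sketch below only illustrates that definition on invented data; it is not the study's analysis code.

```python
import numpy as np

def icc_a_k(scores):
    """Average-measure ICC, absolute agreement, two-way random model (ICC(A,k))."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape                      # n subjects (rows), k lexical variants (columns)
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

# Invented ratings: 5 subjects x 4 lexical variants of one construction.
ratings = [
    [14, 12, 15, 13],
    [ 7,  9,  8, 10],
    [21, 19, 22, 20],
    [11, 10, 12,  9],
    [17, 18, 16, 19],
]
print(f"ICC(A,k) = {icc_a_k(ratings):.2f}")
```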
Figure 3. [Bar chart of the average-measure ICC values (y-axis from 0.00 to 1.00) for subject questions and object questions in the conditions "...d'après vous...", "...croyez-vous...", "...pensez-vous qui...", and "...pensez-vous que..."; the plotted values range from 0.67 to 0.91.]
This value indicates the intra-individual degree of agreement between the judgments of the lexical variants for each construction. Taking into account differences in mean between the lexical variants, the ICC, derived from the analysis of variance, is a more severe (or conservative) measure than Cronbach's α (the specific form applied is the two-way model with random variables). The results given in the figure show that (i) consistency of grammaticality judgments is not a stable factor but depends on the respective construction, (ii) in terms of our examples, both subject extractions, namely with qui and with que, have comparable consistency values, and (iii) in French it is more difficult to give consistent judgments to object interrogatives than to subject interrogatives. Consistency in the judgment of the object interrogatives improves with increasing suboptimality, as a comparison of the last two figures reveals: Consistency is much higher for (7c) than for (5b) or (6b), i.e. there is an interaction between the degree of suboptimality and the sentence-initial element. Hence, the analysis of judgment consistency provides another piece of empirical evidence in terms of the discussion about the syntax and the processing of subject- vs. object-initial interrogatives in French. Given this result, two further questions arise concerning (i) the general difference between subject-initial and object-initial interrogatives and (ii) the pronounced suboptimality of the long object extraction (7c). These issues show how grammaticality values, as well as consistency values, come into play in a discussion.

3 Further issue: extraction, parenthesis and analogy

3.1 The contrast between subject-initial and object-initial questions
The results of the judgment consistencies are in line with the results of the judgment values themselves: Simple main effect tests of the variable B (subject vs. object) reveal significant differences for each of the three construction types:5 The difference between subject- and object-questions with the PP-parenthetical "d'après vous", (5a) and (5b), is significant. Subject questions have a higher grammaticality value than object questions (p < 0.034; partial η² = 0.068).6 The difference between subject- and object-questions with "croyez-vous", (6a) and (6b), is significant. Subject questions have a higher grammaticality value than object questions (p < 0.001; partial η² = 0.271).7
The difference between subject- and object-questions with long extraction, (7b) and (7c), is also significant. However, subject questions have in this case a lower grammaticality value than object questions (p < 0.001; partial η² = 0.574).8 I assume that psycholinguistic factors, which have an impact on grammaticality judgments, are responsible for this effect. Apart from a few exceptions (e.g. Farke 1994), a more difficult processing of object-initial sentences has often been claimed in the psycholinguistic literature (see Frazier and Flores d'Arcais 1989 for Dutch, de Vincenzi 1991 for Italian, Hemforth 1993 for German, cf. also Gorrell 2000). In line with Schütze (1996: 164), who claims that "any other factors that might make a sentence hard to parse" affect the judgment, I also assume that the more difficult processing of the French object-initial questions affects the judgments. The unambiguous interpretation of French subject-initial and object-initial interrogatives like (5a) through (7c), especially the correct interpretation of object-initial interrogatives, requires particular morphological, semantic and phonetic cues. Unlike German, French is not a case language.9 Other than the difference between (7b) and (7c), the difference in grammaticality between subject-initial and object-initial questions without long extraction, e.g. (5a) and (5b), is anything but trivial. So far, not much attention has been paid to differences in grammaticality between licensed constructions. Concerning these results, the follow-up question arises as to whether marked forms, as long as they are not clearly suboptimal, generally have a lower judgment consistency than their unmarked counterparts. Future research might give an answer to this issue.
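The partial η² values just cited can be recovered from the F statistics and degrees of freedom reported in notes 6-8 (df = 1, df_error = 64 in each case) via partial η² = F·df / (F·df + df_error). The short script below only redoes that arithmetic as a sanity check; the numbers are taken from the notes as printed.

```python
# Reported simple main effects (notes 6-8): (label, F, partial eta^2 as printed).
reported = [
    ("d'apres vous, subject vs. object",    4.700, 0.068),
    ("croyez-vous, subject vs. object",    23.842, 0.271),
    ("long extraction, subject vs. object", 86.093, 0.574),
]
df_effect, df_error = 1, 64
for label, f_value, eta_printed in reported:
    eta = (f_value * df_effect) / (f_value * df_effect + df_error)
    print(f"{label}: partial eta^2 = {eta:.3f} (printed: {eta_printed})")
```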
3.2 Long object extraction and analogy
We have stated so far that consistency in the judgment of the object interrogatives is lower compared to subject interrogatives. However, this only applies as long as the construction is not clearly suboptimal. The long object extraction (7c) has a much higher consistency than the two other object-initial constructions (5b) and (6b). The question arises why long object extractions are clearly suboptimal in French. Contrasting with the long subject extraction (7b) violating the ECP-condition, there is no obvious reason for explaining the low grammaticality value of long object extractions. In order to address this issue, we first need to turn to the constructions (6a) and (6b). I will call these constructions with the expression “croyez-
vous" right-adjacent to the wh-element, as suggested in Adli (2004), VIoC-constructions, contrasting with the long extraction cases like (7b) and (7c), which I call VImC-constructions.10 We have already stated that VIoC-constructions are slightly suboptimal: The plot of the grammaticality values showed that they have a slightly lower degree of grammaticality than the sentences (5a) and (5b) with the PP-parenthetical "d'après vous". In Adli (2004), I had already addressed the question as to whether French VIoC-constructions should be analyzed as simple matrix clauses with a parenthesis or as complex clauses with long extraction. I give a brief sketch of those results, however without entering into the details here: I assume sentences like (6a) and (6b) to be instances of simple clauses with a parenthesis. VIoC-expressions like "croyez-vous" exhibit certain properties characteristic of parenthetical constructions: (i) they can be omitted, (ii) they are restricted to the root position, (iii) they can appear in various positions in the sentence. (The same holds if the expression in question shows the canonical word order like "vous croyez" or if the object-initial question (6b) does not exhibit stylistic inversion.) With (6a) and (6b) being analyzed as parenthetical constructions, their slight suboptimality is assumed to be due to characteristics of the parenthesis (and not, for example, due to any movement operation). The comparison with the completely felicitous sentences (5a) and (5b) with the PP-parenthetical "d'après vous" suggests that the reason for the slight suboptimality of (6a) and (6b) resides in the fact that sentential parentheses like "croyez-vous" are not permitted in French interrogatives. The sentential property is related to the issue of the interpretive relation between the predicate and the object argument, the problematic point being the fact that the argument of VIoC-expressions like "croyez-vous" has to be specified by the host sentence (cf. Reis 1995; 1996 concerning German). We can observe that French declaratives with a sentential parenthesis are completely felicitous when they occur with an overtly realized object. Interestingly, their variants with interpretive integration (i.e. without an overtly realized object) show the same slight suboptimality effect.11

(12) a. Cet écrivain, on le sait, était un bon-vivant.
        this writer one it knows was a bon vivant
        'This writer, as is generally known, was a bon vivant.'
     b. (?)Cet écrivain, on sait, était un bon-vivant.
        this writer one knows was a bon vivant
(13) a. Cette maison, comme vous le savez, est très ancienne.
        this house as you it know is very old
        'This house is, as you know, very old.'
     b. (?)Cette maison, comme vous savez, est très ancienne.
        this house as you know is very old

One could assume that in French the grammar generally selects the form with a sentential parenthesis without interpretive integration as the "better candidate". However, sentential parentheses without interpretive integration are not possible in French interrogatives.

(14) a. *Où, le penses-tu, habite-t-elle avec l'enfant depuis 1985?
        where it thinks you lives she with the child since 1985
     b. *Où, tu le penses, habite-t-elle avec l'enfant depuis 1985?
        where you it thinks lives she with the child since 1985

The slight suboptimality of VIoC-constructions like (6a) and (6b) is therefore due to the fact that the form with interpretive integration, which is actually required in French, is not available in interrogatives and that the slightly suboptimal, integrated variant has to be used.12 This being said, we can turn back to the initial question as to why the long object extraction (7c) shows a high degree of suboptimality. A first intuition consists in the assumption of some kind of relationship between the (slight) suboptimality of object-initial VIoC-constructions like (6b) and the (strong) suboptimality of object-initial VImC-constructions like (7c). However, according to the present analysis, (6b) is a parenthetical construction and (7c) an extraction construction. It does therefore not seem easy to establish a relationship between them. Interestingly, Reis (2000a; 2000b) has shown on the basis of the characteristics of German was…w-constructions like (15a) that properties of extraction constructions like (15b) and properties of parenthesis constructions like (15c) can co-occur. In other words, parenthesis and extraction constructions are two related types of construction affecting each other – in a rather unorthodox manner – leading to hybrid phenomena.

(15) a. Was glaubst du, was er kochen sollte?
        what believe you what he cook should
     b. Was glaubst du, dass er kochen sollte?
        what believe you that he cook should
     c. Was sollte er glaubst du kochen?
        what should he believe you cook

Reis (2000a: 28) enumerates several properties of was…w-constructions that are typical for extraction constructions, e.g. the fact that the was-clause is always initial, that the related wh-clause must contain a wh-moved wh-phrase (and hence is not an ob-interrogative), that the was…w-construction can be embedded, that the was-clause may contain more complex verbs of saying, thinking or believing like behaupten (claim) or argwöhnen (suspect), etc. At the same time, she points out several properties typical for parenthesis constructions, e.g. the fact that only those predicates can appear as bridge verbs in was…w-constructions which can also appear in parenthetical was-sentences like (15c), that complex predicates involving es or full object NPs like scheint es (seems it) or hat sie das Gefühl (has she the feeling) are allowed, contrasting with extraction constructions. Therefore she claims that parenthesis and extraction constructions are related and that properties of the one can be transferred on the other by processes of analogy: "Since a convincing account of EC-IPC 'blends' [EC = extraction construction, IPC = integrated parenthetical construction, A.A.] is quite hard to give [...] it might seem better after all to treat the rather slight EC-traits of EV2 constructions as mere analogical transgressions [...] of the basic IPC pattern which the formal and interpretive closeness of prefinite IPCs to ECs gives rise to" (Reis 2000a: 27).13 Sternefeld (1998: 28) also takes into account the idea of interpretive closeness in his analysis of German was…w-constructions. He applies the concept of semantic parallelism referring to the compositional semantics of Dayal (1991) for (simple) Hindi wh-constructions. He assumes a relationship between German was…w-constructions and the semantically parallel colon-constructions. This assumption helps him to give at least a partial account for the ungrammaticality of (17a) and (17b) on the basis of the idea of analogy. Sternefeld (1998) tries to explain why (16) is grammatical, but not (17a) and (17b), although multiple questions are generally possible in German.

(16) Was glaubst du, wer gekommen ist?
     what believe you who come is

(17) a. *Was glaubt wer, wer gekommen ist?
        what believes who who come is
     b. *Wer glaubt was, wer gekommen ist?
        who believes what who come is
(18) a. *Was glaubt wer: Wer ist gekommen?
     b. *Wer glaubt was: Wer ist gekommen?

Sternefeld (1998) essentially argues that if we cannot explain the ungrammaticality of the hypotactic constructions (17a) and (17b), we should examine the semantically equivalent, paratactic constructions (18a) and (18b) more thoroughly. It appears that the paratactic construction, too, is ungrammatical. This observation still does not give a complete account for the asymmetry between (16) and (17a), (17b). However, we can assume that whatever is responsible for the ungrammaticality of (18a) and (18b) is also responsible for the ungrammaticality of (17a) and (17b), because (17a), (17b) and (18a), (18b) are semantically parallel. The idea of analogy underlying Sternefeld's analysis and also developed by Reis in the scope of her analysis of German was…w-constructions – namely "analogy rather than wh-movement plays the major role in accounting for long wh-extraction constructions" (Reis 2000b: 403) – also offers a possible explanation for the suboptimality of long object extractions in French. We can assume that whatever is responsible for the suboptimality of French VIoC-constructions like (6a) and (6b) is also responsible for the suboptimality of long object extractions like (7c). In other words: Even though the suboptimality of VIoC-constructions is due to reasons specific to parenthetical constructions (such as the assumption of the inadmissibility of sentential parentheses with interpretive integration in French), they also affect the VImC-construction by virtue of the closeness of parenthetical constructions and extraction constructions. However, one still needs to account for the fact that the suboptimality is more pronounced for the object-initial VImC-construction than for the object-initial VIoC-construction in French. The first must therefore be affected by an additional factor, causing a decrease in grammaticality not affecting the latter. I assume that in many languages the long object extraction shows a lower degree of grammaticality than corresponding VIoC-constructions. This fact has not yet received much attention in the literature – possibly because gradedness has been considered for a long time as an epiphenomenon, but maybe also because the long object extraction often serves as a counterpart to the clearly ungrammatical long subject extraction (leading to the effect that grey next to black seems whiter than grey next to white). Along these lines I also assume German long object extractions as in (19b) to have a lower degree of grammaticality than (19a).

(19) a. Welchen Anruf glaubst du erhielt der Anwalt meiner Frau?
        which call believe you received the lawyer of my wife
     b. Welchen Anruf glaubst du, dass der Anwalt meiner Frau erhielt?
        which call believe you that the lawyer of my wife received

In sum, we have seen that the controlled measurement of a graded concept of grammaticality does, on the one hand, allow one to obtain a fairly detailed picture of grammaticality contrasts. Syntactic discussions can thus be placed on a more solid empirical base. This reduces the risk of myth production as in the que-qui case, and it allows reliable assumptions on fine-grained differences, as the contrast between subject-initial and object-initial questions or the different degrees of suboptimality between object-initial VIoC- and VImC-constructions show. On the other hand, the same measurements can be analyzed from another point of view, namely with respect to judgment consistency. Judgment consistency seems to correlate with the degree of grammaticality as well as with certain structural properties, e.g. the initial element. It is a new and complementary source of information, and hence worth considering in grammar research.
Notes 1. 2.
3.
Concerning this Rizzi (1990: 56) writes: „A significant body of work has been devoted to the rule converting que into qui in French wh-constructions.“ To put it briefly, they confound cases and variables, i.e. they calculate a correlation for a sample of variables and not for a sample of persons. By calculating the mean of the judgments of a sentence for all persons they eliminate the variance within the sample. Rather, the test-retest-reliability should have been calculated for each sentence separately. If desired, the mean of the different reliabilities could then have been calculated (taking into account the Fisher-Ztransformation). In addition, Bard et al. (1996: 23) and Keller (2000) compared two independent samples. Rather, the test-retest-reliability is defined as the correlation of two repeated measurements with the same sample. Only these mistakes explain why Bard et al (1996) obtain r = 0.89, Keller (2000: 217) r = 0.90, and Cowart (1997) even a hardly realistic r2 = 0.97. Note that in the present study reliability is not calculated using the test-retest-reliability but by Cronbach’s D. Anticipating the results of the judgment test, the degree of grammaticality of the sentences (5a) to (7c) is indicated by the symbols (?), ?, ?? and *, roughly meaning “slightly suboptimal”, “suboptimal”, “highly suboptimal” and “ungrammatical”. This categorization is a simplification of the more detailled, metrical grammaticality values shown in the line chart and therefore does not convey the richness and precision of information of the gradedness approach. However, these categories are not the result of a mere recoding of the metrical
22
4.
5.
6. 7. 8. 9.
10.
11.
12. 13.
Aria Adli values, but rather an interpretation of the values in terms of a categorical concept of well-formedness. I consider D and E equally important with this issue. In other words, the conclusion that the grammaticality of two constructions is identical (i.e. a nonsignificant result) and the conclusion that the grammaticality of certain constructions are different (i.e. a significant result) has the same practical impact for the purposes of grammar research and should come along with the same error probability. The overall main effect B itself cannot be interpreted because of a hybrid interaction effect A x B (p < 0.000), i.e. both levels of variable B show the same decreasing trend whereas the three levels of variable A do not show the same trend (cf. also Bortz 1999: 289-291). Pillai’s PSB|a1 = 0.068; F = 4.7; df = 1; dferror = 64; partial K2 = 0.068; p < 0.034 Pillai’s PSB|a2 = 0.271; F=23.842; df=1; dferror=64; partial K2 = 0.271; p < 0.000 Pillai’s PSB|a3= 0.574; F=86.093; df=1; dferror=64; partial K2 = 0.574; p < 0.000 Different phonetic, morphological and semantic disambiguation cues were combined in the design of the material: (i) a lack of agreement concerning the number feature between the verb and the object, (ii) a morpho-phonetic realization of number, i.e. not only a readable but also an audible subject-objectdistinction, for the wh-element (by means of liaison) and for the verb (3rd group of conjugation), and (iii) a semantically founded assignment of the subject and object function in terms of selection constraints. VIoC is the abbreviation derived from the German expression for “verb-initial sentencial expression without COMP”, and VImC the abbreviation derived from the German expression for “verb-initial sentencial expression with COMP”. These expressions shall be descriptive and neutral terms, especially with regard to the theoretical issue as to whether VIoC-constructions like (6a) and (6b) are to be analyzed as instances of long extraction or of simple matrix clauses with a parenthesis. French and German are complementery with respect to the condition of interpretive integration in declaratives. The forms with interpretive integration are preferred in German. (A) ?? Karl begann, wie er das gesagt hatte, zu schreiben. Charles began as he this said had to write (A’) Karl begann, wie er gesagt hatte, zu schreiben. Charles began as he said had to write An explanation for the unavailability of the form with interpretive integration in French interrogatives has yet to be found. Reis (1995) calls German sentences like (B) EV2-constructions. She essentially states that they are not, as has been often assumed, instances of extraction but rather a particular form of parenthetical construction. (B) Was glaubst du sollte er kochen? what believe you should he cook
Gradedness and Consistency in Grammaticality Judgments
23
References Adli, Aria 2004
Grammatische Variation und Sozialstruktur. (= Studia Grammatica 58). Berlin: Akademie Verlag. Bard, Ellen Gurman, Dan Robertson and Antonella Sorace 1996 Magnitude Estimation of Linguistic Acceptability. Language, 72 (1): 32-68. Bayer, Josef 1984 COMP in Bavarian. Linguistic Review, 3: 209-274. Bortz, Jürgen 1999 Statistik für Sozialwissenschaftler. Springer, Berlin, New York. Chomsky, Noam 1964 Current Issues in Linguistic Theory. Mouton, Den Haag. 1965 Aspects of the Theory of Syntax. MIT Press, Cambridge (Mass.). 1977 On wh-movement. In Culicover, Peter, Thomas Wasow and Adrian Akmajian (eds.). Formal syntax. pp. 71-132. Academic Press, New York. Cohen, Jacob 1973 Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and Psychological Measurement, 33: 107-112. Cowart, Wayne 1997 Experimental Syntax. Applying Objective Methods to Sentence Judgments. London: Sage Publications. Cronbach, Lee J. 1951 Coefficient alpha and the internal structure of tests. Psychometrika, 16(3): 297-334. Dayal, Veneeta S. 1991 Wh-Dependencies in Hindi. PhD thesis, Cornell University. De Vincenzi, Marica 1991 Syntactic Parsing Strategies in Italian. Kluwer, Dordrecht. Farke, Hildegard and Sascha W. Felix 1994 Subjekt-Objektasymmetrien in der Sprachverarbeitung - Spurensuche. In Felix, Sascha W., Christopher Habel and Gert Rickheit (eds.). Kognitive Linguistik - Repräsentationen und Prozesse. Westdeutscher Verlag, Opladen. Frazier, Lyn and Flores D'Arcais, Giovanni B. 1989 Filler-driven parsing: A study of gap filling in Dutch. Journal of Memory and Language, 28: 331-344. Gorrell, Paul 2000 The subject-before-object preference in German clauses. In Hemforth, Barbara and Lars Konieczny (eds.). German Sentence Processing. pp. 25-63. Dordrecht: Kluwer.
Guilford, Joy P. 1954 Psychometric methods. MacGraw-Hill, New York. Hemforth, Barbara 1993 Kognitives Parsing: Repräsentationen und Verarbeitung sprachlichen Wissens. Infix, Sankt Augustin. Hulk, Aafke and Jean-Yves Pollock 2001 Subject positions in Romance and the theory of Universal Grammar. In Hulk, Aafke and Jean-Yves Pollock (eds.). Inversion in Romance. pp. 3-20. Oxford University Press, Oxford. Kayne, Richard S. 1977 French relative que. In Luján, Marta and Fritz Hensey (eds.). Current Studies in Romance Linguistics. Washington D.C.: Georgetown University Press. Keller, Frank 2000 Gradience in Grammar. Experimental and Computational Aspects of Degrees of Grammaticality. PhD dissertation, University of Edinburgh. Keren, Gideon and Charles Lewis 1979 Partial omega squared for ANOVA designs. Educational and Psychological Measurement, 39: 119-128. Levelt, Willem J. M. 1974 Formal Grammars in Linguistics and Psycholinguistics, Vol. III. Mouton, Den Haag. McGraw, Kenneth O. and Wong, S.P. 1996 Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1): 30-46. Perlmutter, David 1971 Deep and Surface Structure Constraints in Syntax. Holt, Rinehart and Winston, New York. Pesetsky, David 1982 Complementizer-trace phenomena and the Nominative Island Condition. The Linguistic Review, 1: 297-344. Reis, Marga 1995 Wer glaubst du hat recht? On so-called extractions from verb-second clauses and verb-first parenthetical constructions in German. Sprache und Pragmatik, 36: 27-83. 1996 Extractions from verb-second clauses in German?. In Lutz, Uli and Jürgen Pafel (eds.). On Extraction and Extraposition in German. pp. 45-88. John Benjamins: Amsterdam. 2000a Wh-movement and integrated parenthetical constructions. In Zwart, C. Jan-Wouter and Werner Abraham (eds.). Studies in Comparative Germanic Syntax. Proceedings from the 15th Workshop on Comparative Germanic Syntax. John Benjamins: Amsterdam.
2000b On the parenthetical features of German was-w-constructions and how to account for them. In Lutz, Uli, Gereon Müller, and Arnim v. Stechow (eds.) Wh-Scope Marking. pp. 359-407. John Benjamins: Amsterdam.
Rizzi, Luigi 1990 Relativized Minimality. MIT Press, Cambridge (Mass.). 1997 The fine structure of the left periphery. In Haegeman, Liliane (ed.). Elements of Grammar. pp. 281-337. Kluwer: Dordrecht. Schütze, Carson T. 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago. Sternefeld, Wolfgang 1998 Grammatikalität und Sprachvermögen. SfS-Report 02-98. Universität Tübingen. Taylor, James B. and Howard A. Parker 1964 Graphic ratings and attitude measurement: a comparison of research tactics. Journal of Applied Psychology, 48(1): 37-42.
Null Subjects and Verb Placement in Old High German
Katrin Axel
1 Introduction
This paper deals with null-subject constructions in Old High German (OHG). The term OHG refers to a group of dialects of the West Germanic branch whose written records span from c.850 to c.1050 A.D. The major OHG prose texts from the 8th and 9th centuries (i.e. the Monsee Fragments [MF], the Isidor [I], and the Tatian [T]) witness a wide range of null-subject constructions. For example, the quasi-argument iz ‘it’ is not always present in the context of time and weather expressions, and there are sporadic attestations of putative null pronouns with arbitrary reference:1 (1)
a. daz danne · nah · ist sumere2 (MF 19, 14) that then close is to-summer ‘that it is then close to summer’ b. noh intprennent lioht Inti sezzent íz untar mutti nor light-3PL lamp and put it under bushel ‘no-one lights a lamp and puts it under a bushel’ (T 137, 21)
Secondly, there are numerous sentences attested in which a referential subject is not overtly realized, see (2).3 Note that Modern Standard German, which has been classified as a semi-pro-drop language due to the occurrence of impersonal passives with an empty expletive, does not permit the omission of referential pronouns or quasi-arguments.4 (2)
a. Sume hahet in cruci (MF 18, 18) some-AKK hang-2PL to cross ‘Some of them you will crucify’ b. In dhemu druhtines nemin archennemes ... fater in the Lord’s name recognize-1PL ... father ‘In the name of the Lord we recognize ... the Father’ (I 279)
While the literature has widely acknowledged that in OHG, as in Proto-Germanic, non-referential subject pronouns do not have to be overtly real-
ized, it is commonly held that at the stage of OHG, the use of referential null subjects does not reflect properties of the native grammatical system anymore (e.g. Grimm [1898] 1967: 235; Hopper 1975: 31). In the philological literature, the omission of referential subject pronouns in OHG is generally considered as a special type of ‘loan syntax’ (e.g. Eggenberger 1961; Hopper 1975: 31). The central idea is that the subject omissions are ‘imposed’ on the OHG texts through a narrow or ‘slavish’ translation process and should not be considered a native part of the OHG grammar. As will be shown later on, the OHG null-subject property interacts with word order. Therefore this study is based on the major prose texts, namely on the Monsee Fragments, the Isidor and the Tatian for the earlier OHG period, and on Notker Labeo’s version of the De Consolatione Philosophiae for late OHG.5 Unfortunately, all the longer OHG prose texts that have been handed down are translations or commentaries of biblical, theological or philosophical sources originally written in Latin. This is why the translation process has to be evaluated carefully. The quantitative data have been calculated on the basis of Eggenberger’s (1961) extensive study on OHG null-subject occurrences.

To conclude, the presence of referential null subjects clearly shows that the OHG of the 8th and 9th century (henceforth: earlier OHG) – or, more precisely, the OHG dialects as they are documented in the written records – cannot be classified as a semi-pro-drop language. In the following it will be argued that earlier OHG allowed genuine pro-drop. I will demonstrate that null-subject usage in earlier OHG is intricately related to verb placement and that it is influenced by morphological factors (sections 4 & 5). The morpho-syntactic distribution of null subjects is determined by properties of the ‘native’ grammar of OHG and cannot be explained by means of ‘loan syntax’ (section 6). The loss of the null-subject property in late OHG was probably the result of a grammatical competition between null and overt subject pronouns (section 7). Finally, the case of OHG null subjects will be used to illustrate that turning to historical and diachronic evidence can be fruitful for syntactic theorizing in general (section 8).

2 Non-canonical properties of OHG null-subject constructions
Unlike in canonical null subject languages, where the non-expression of the subject pronoun is the norm, in the early OHG texts, null subjects and overt subject pronouns appear to systematically co-occur and both variants are frequently attested. In the MF, about two thirds of the sentences with sub-
ject pronouns have null subjects and about one third have overt subject pronouns. In the I and in the T the numbers are reversed: roughly 40% of cases have null subjects, and about 60% have overt subjects. These figures are not surprising given that in earlier OHG atonic overt and null pronominal subjects appear to exhibit parallel referential properties (see section 7).6 This contrasts sharply with the situation in Notker’s Consolatio in late OHG, where virtually all sentences contain overt subjects. An alternation between overt and null variants in early OHG can also be observed with non-referential subject pronouns. For example, overt subject pronouns with arbitrary reference (e.g. sie ‘they’, man ‘one’), see (3a), are attested alongside null pronouns with arbitrary reference as in (1b) above. As was argued by Jaeggli (1986: 66), in the canonical null-subject languages overt pronouns cannot have an arbitrary interpretation. A similar point can be made about quasi-arguments: they could equally be overtly realized in OHG, see (3b). (3)
a. noh sie nilesent fon thornun nor they NEG-pick from thorns ‘nor do people pick grapes from thorns’ b. uuanta iz abandet because it is-becoming-evening ‘for it is nearly evening’ (T 675,3)
uúinberu grapes (T 162,31)
The co-occurrence of referential pro-drop and overt non-referential subject pronouns is striking since in all the classic GB licensing-and-identification approaches (e.g. Rizzi 1986), any language which can license empty referential subjects will be capable of licensing empty non-referential subjects. A further striking characteristic of earlier OHG is that the use of an overt subject pronoun appears to be obligatory in certain syntactic configurations. As has already been observed in the philological literature (Eggenberger 1961: 168), null subjects appear to be largely banned from embedded clauses.

3 OHG sentence structure
In order to be able to develop an analysis for the OHG null-subject phenomenon, it is first necessary to outline some basic facts and assumptions about OHG sentence structure in general (see also Axel 2004, Axel in prep.).
OHG is an SOV language with asymmetrical word order. In subordinate clauses, the finite verb usually stays in sentence-final or sentence-late position, see (4). In root clauses, on the contrary, the finite verb is generally moved to the left periphery. In fact, verb position and its correspondence to clause type already seem to be fairly similar to the Present-Day German situation in the major OHG prose texts.7 In the core clause types, the predominant ‘productive’ word orders (i.e. those frequently realized against or without the model of the Latin) are: verb-second in (non-conjoined, complementizerless) declarative main clauses (5), verb-first in independent yes/no interrogatives (6a) and in imperatives (6b), and verb-second in independent constituent interrogatives with a fronted wh-phrase (7). Apart from verb-second, there is also a substantial amount of verb-first word order in OHG declaratives (see section 4). (4)
endi dhazs dhiu burc hierusalem aruuostit uuardh and that the city Jerusalem destroyed was ‘and that the city of Jerusalem was destroyed’ (I 468) et ciuitatem hierusalem in exterminatione fuisse
(5)
a. Dhinera uuomba uuaxsmin setzu ih ubar miin your womb’s fruit put I upon my ‘I will put the fruit of your womb upon my throne’ De fructu uentris tui ponam super sedem meam
(6)
a. bist thu uuîzago (T 109, 24) are you prophet ‘Are you the Prophet?’ proph&a es tú b. tuot riuua (T 103, 1) do-PL repentance ‘Repent!’ pænitentiam agite
(7)
abrahaman uuvo gisahi thu how saw you Abraham ‘how come you have seen Abraham?’ & abraham uidisti
hohsetli throne (I 611)
(T 451, 7)
The post-finite placement of the personal pronoun in topic-initial declaratives like in (5) clearly shows that the verb-second grammar of OHG is different from the Old English (OE) one. In the equivalent OE sentences personal pronouns are placed in pre-finite position following the topicalized
constituent. This ‘verb-third’ effect has been traced back to the fact that the finite verb does not move to C0 in OE declaratives but targets a position in the I-domain left of the position of clitic pronouns. 8 In contrast to OE, there is also no compelling evidence for overt verb movement to a sentence-medial projection in OHG subordinate clauses. All derivations from the strict verb-final pattern can be explained by mechanisms such as extraposition (8a), verb raising (8b) or verb projection raising that have persisted throughout the history of German (especially on a dialectal level), see also (Lenerz 1984: 169-179, Axel ).9 Furthermore, as in Modern German, there is no evidence for the overt movement of full subjects from their VP-internal position to a canonical subject position in the OHG middle field. However, overt pronoun subjects can generally be found in a left-peripheral Wackernagel position (see Axel in prep.). (8)
a. dhazs ir chihoric uuari gote (I 492) that he obedient be god-DAT ‘that he might be obedient to God’ ut esset deo subiectus b. dher chisendit scolda uuerdhan (I 587) who sent should be ‘who was to be sent’ qui mittendus erat
Analysed most simply, the OHG clause can thus be argued to consist of a C-projection whose head selects a verb phrase. Generalized V-to-C movement takes place when no complementizer is present.10

4 Syntactic distribution of null subjects
As already observed by Eggenberger (1961: 168), the distribution of (referential) null subjects in earlier OHG is characterised by a main/subordinate asymmetry: referential null subjects are largely restricted to clauses with verb-fronting, see (9a, b) for a declarative and an interrogative verb-second clause, and (6a) above for an interrogative verb-first clause. (9)
a. [CP Dizj [Cƍ quadi ... tj ti ... this-AKK said-1SG ‘This, I said...’ Hoc dixi
(MF 40,23)
b. [CP zihiuj [Cƍ gienguti tj ... [V° uz ti] ... (T 215,27) why went-2PL out ‘what did you go out (to see)?’ sed quid existis (uidere) As can be seen in the examples (9a, b), the OHG null subject generally corresponds to an empty subject pronoun in the Latin source. This is not really surprising since Latin is a full pro-drop language, where the non-realization of a pronominal subject is the norm. This is why it is more interesting to ask whether there are any cases in which Latin null subjects are systematically translated by overt subject pronouns. This is exactly what we find in verb-final/late clauses. In this syntactic context, overt subject pronouns are regularly used against the Latin. In the complex sentences in (10), subject pronouns (in bold face) are inserted into the middle field of the verb-final subordinate clauses (in squared brackets), whereas they are left out in the respective main clauses. In the Latin sources, both the subordinate clause and the main clause do not contain an overt subject pronoun. (10) a. ... [thaz uuir uuizumes] thaz sprehhemes (T 407,7) what we know that speak-1PL ‘... we speak of what we know’ quod scimus loquimur b. ... [so aer · danan fuor ·] quuam in iro · dhinchnjs ... when he thence passed came-3SG in their synagogue ‘when he had departed from there, he went into their synagogue’ ... cum inde transisset, uenit in synagogam eorum (MF 4,19)11 The main/subordinate asymmetry in null-subject distribution also emerges if one looks at the data from a quantitative perspective. Table 1 gives the rates of subject omission in main and subordinate clauses in the MF, the I and in the T. In the latter two texts approximately 40 per cent of main clauses with pronominal subjects contain null subjects and in the MF even almost two thirds of cases show subject omission. This contrasts sharply with the rate of subject omission in subordinate clauses, which is relatively low (7 to 15 perc cent) for all three texts.
Table 1. Overt vs. null-subject pronouns in main and subordinate clauses12

clause type      MF pron. subj.         I pron. subj.          T pron. subj.
                 overt      null        overt      null        overt       null
main             48 (36%)   84 (64%)    61 (56%)   48 (44%)    1434 (60%)  960 (40%)
subordinate      73 (85%)   13 (15%)    85 (91%)    8 (9%)     1180 (93%)   95 (7%)
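The percentages in Table 1 follow directly from the raw counts; the short sketch below (added here for illustration, not part of the chapter) re-derives them.

```python
# Re-derive the percentages of Table 1 from the overt/null counts
# (illustrative only; counts as printed above).
table1 = {
    "MF": {"main": (48, 84), "subordinate": (73, 13)},
    "I":  {"main": (61, 48), "subordinate": (85, 8)},
    "T":  {"main": (1434, 960), "subordinate": (1180, 95)},
}

for text, clauses in table1.items():
    for clause_type, (overt, null) in clauses.items():
        total = overt + null
        print(f"{text:2} {clause_type:11}: "
              f"overt {100 * overt / total:3.0f}%, null {100 * null / total:3.0f}%")
```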
How can we explain now why null subjects are (largely)13 restricted to root clauses in early OHG? As mentioned above, OHG main and subordinate clauses were distinguished structurally by the position of the finite verb. Thus, the skewed distribution of OHG null subjects could be derived without further assumptions if null subjects were somehow dependent on verb fronting. Since null subjects are frequently attested in verb-first and verbsecond wh-clauses, for example, it is clear that they must be licensed in post-finite root position. The interesting point is whether they are also licensed in the pre-finite position (in the ‘front-field’) of root clauses. At first sight, the most plausible analysis for subjectless declarative sentences with ‘surface’ verb-first order as in (11a) would place the null subject in prefinite position as indicated in (11b). If the null subject was analysed as constituting the front field, sentences like (11b) would be regular verb-second clauses in structural terms. (11) a. steig tho in stepped-3SG then into ‘He stepped into a boat’ Et ascendens in nauicula [Cƍ steigi tho b. [CP proj
skifilin... boat
(Tatian 193, 1)
tj in skifilin ti ...
Attractive though it might be, this analysis is untenable. It does not account for the fact that subjectless (declarative) verb-first order is relatively infrequent. In all three texts taken together there are only about 30 cases of subjectless declarative clauses which exhibit surface verb-first order independent of the Latin source. If these sentences were ‘hidden’ verb-second clauses, there is no reason why they should be so marginal compared to subjectless ‘surface’ verb-second order independent of the Latin. Interestingly, virtually all the subjectless declaratives with surface verb-first order show characteristics of true verb-first declaratives. In OHG, declarative verb-first order is e.g. attested in the context of motion verbs and with sentential negation expressed by proclitic ni which attaches to the finite verb.
To illustrate this point, compare the a versus b cases in (12) and (13), where examples for these two types of verb-first declaratives without overt subjects are contrasted with similar sentences with overt subjects. Note that the OHG verb first order is realized independently of the Latin.14 (12) a. quam thô In geiste In thaz went-3SG then in spirit in the ‚And he came in the Spirit into the temple’ & uenit In spiritu In templum
goteshûs temple (T 89, 31)
b. quamun sie thó inti gifultun beidu thiu skef came they then and filled both the boats ‚And they came and filled both of the boats’ (T 125,27) & uenerunt & impleuerunt man (T 285,14) (13) a. nihaben NEG-have-1SG man ‚I have no man’ hominem non habeo b. nisanta got sínan sun ... NEG-sent god his son ‘God did not send his Son ...’ non enim misit deus filium suum
(T 407, 30)
To conclude, there is no compelling evidence that in OHG referential null subjects can occur in pre-finite root position. Hence we can account for the main/subordinate asymmetry in a straightforward way: if we claim that null subjects are only licensed in post-finite position, this asymmetry falls out without further assumptions. In other words, it is highly plausible that null subjects are only licensed in a configuration in which they are c-commanded by a leftward-moved finite verb15:

(14) [V+AGR]k [pro ... tk]]

In OHG, the only way to obtain the required configuration for null-subject licensing is verb movement to C0.16
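As an illustration only (this formalization is added here and is not part of the chapter), the descriptive generalization behind (14) can be stated as a simple predicate over clause configurations: a referential null subject is predicted to be possible exactly when the finite verb has moved to C0 and the subject position follows it.

```python
# Sketch of the licensing condition in (14): pro must be c-commanded by the
# fronted finite verb, i.e. it is licensed only post-finitely in clauses with
# V-to-C movement.  The function name and encoding are illustrative
# assumptions, not the author's notation.

def null_subject_licensed(verb_in_c: bool, subject_post_finite: bool) -> bool:
    return verb_in_c and subject_post_finite

clause_types = {
    "verb-second declarative (subject in middle field)": (True, True),
    "verb-first interrogative":                          (True, True),
    "pre-finite (front-field) subject position":         (True, False),
    "verb-final subordinate clause":                     (False, False),
}

for label, (v_in_c, post_finite) in clause_types.items():
    print(f"{label:50} -> {null_subject_licensed(v_in_c, post_finite)}")
```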
5 Morphological distribution of OHG null subjects
The distribution of OHG null subjects is not only syntactically restricted but appears to be influenced by morphological factors as well. In the older OHG prose texts, a person split can be clearly observed (see also Eggen-
berger 1961: 169; van Gelderen 2000: 136). Referential null subjects are attested in all persons and numbers. However, as is illustrated in Table 2, it is only in the 3rd person singular and plural that the null variant is used more frequently than the overt one.17 The distribution cannot be related to variations in feature strength. The paradigms for both strong and all classes of weak verbs have six distinct forms in the present indicative. The only syncretism that can be found is between the 3rd and 1st person singular, which are identical in the past indicative and in the present and past subjunctive.18 There is one interesting piece of evidence, however, which corroborates the widely-held claim that there is a connection between rich inflection and pro-drop. In earlier OHG, there are two alternative verb endings in the 1st person plural, a short ( -m) and a long one (–mês) (see Eggenberger 1961: 104-108; Harbert 1999). Pronouns occurring with the short ending are virtually always overt. With the long ending, however, subject pronouns are very frequently omitted, but only if they occur in post-finite position (see also Eggenberger 1961: 108). It is at present still unclear how this skewed morphological distribution of OHG null subjects can be explained.19 It is still worth noting, however, that this phenomenon cannot be attributed to the influence of the Latin source texts since Latin is consistently pro-drop throughout the paradigm.
Table 2. Frequencies of overt and null subjects – with rate of null in parentheses – in the MF, I and T, differentiated by person/number
              MF pron. subj.      I pron. subj.       T pron. subj.
numb.  pers.  overt/null          overt/null          overt/null
SG     1      10/5 (33%)          36/2 (5%)           415/103 (20%)
       2      5/3 (38%)           3/2 (40%)           131/84 (39%)
       3      12/52 (81%)         15/29 (66%)         394/460 (54%)
PL     1      2/1 (33%)           2/3 (60%)           62/27 (30%)
       2      16/10 (38%)         1/0 (0%)            262/42 (14%)
       3      3/13 (81%)          4/12 (75%)          170/244 (59%)
TOTAL         48/84 (= 64%)       61/48 (= 44%)       1434/960 (= 40%)
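The TOTAL row of Table 2 can be checked against the six person/number cells; the snippet below (added for illustration, not part of the chapter) sums the columns and recomputes the overall null rates.

```python
# Consistency check for Table 2: column sums vs. the reported TOTAL row
# (illustrative only; counts as printed above, format (overt, null)).
table2 = {
    "MF": [(10, 5), (5, 3), (12, 52), (2, 1), (16, 10), (3, 13)],
    "I":  [(36, 2), (3, 2), (15, 29), (2, 3), (1, 0), (4, 12)],
    "T":  [(415, 103), (131, 84), (394, 460), (62, 27), (262, 42), (170, 244)],
}
reported_totals = {"MF": (48, 84), "I": (61, 48), "T": (1434, 960)}

for text, cells in table2.items():
    overt = sum(o for o, _ in cells)
    null = sum(n for _, n in cells)
    rate = 100 * null / (overt + null)
    ok = (overt, null) == reported_totals[text]
    print(f"{text}: {overt}/{null}, null rate {rate:.0f}%, matches TOTAL row: {ok}")
```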
6 Evaluation of the loan-syntax hypothesis
The results reached so far shed serious doubt on the hypothesis that the omission of referential subject pronouns in earlier OHG should be dismissed as a foreign feature which was imposed on the OHG texts through slavish copying from the Latin sources (e.g. Grimm 1967: 235; Hopper 1975: 31; Eggenberger 1961). One serious problem of this approach is that, while it may be possible to envisage a translation process in which a feature of a foreign-language source is consistently ‘copied’ into the target language, it is hard to explain why this copying process should be banned from specific morpho-syntactic environments defined by the target language. More concretely, in the case of OHG null subjects, it remains a puzzle under the loan-syntax hypothesis that null subjects were largely banned from pre-finite environments and from contexts with 1st person plural verbs ending in –m. Moreover, the loan-syntax hypothesis cannot explain why translators were more reluctant to preserve the null-subject variant with 1st and 2nd than with 3rd person. A further problem is that loan syntax cannot convincingly account for why there should be any instances of unexpressed subjects at all, both in the autochthonous literature and in texts which are not distinctly loyal to their Latin sources. Referential null subjects are, however, attested in e.g. the Hildebrandslied, an autochthonous poem, and quite frequently in Otfrid’s Gospel Harmony, a vernacular poem that cannot be considered an exponent of the ‘Latinized’ writing tradition and that generally does not contain grammatical features of a distinctly foreign nature. The most extensive and influential study promoting the loan-syntax hypothesis is Eggenberger (1961). In this investigation, the differences in the frequencies of subject omission between OHG texts are not considered as a reflex of a historical development of a native feature but claimed to be dependent on text type instead. In other words, Eggenberger claims that it is not the oldest OHG texts which show more subject omission, but rather the most Latinized ones. While Eggenberger’s careful quantitative investigation of null-subject occurrences in numerous OHG texts is very useful, his line of reasoning is not really convincing: Eggenberger argues that subject omission is most frequent and parallel to the Latin in ‘congruent’ texts (i.e. texts with properties of the interlinear glosses), relatively infrequent in the ‘original sources’, and subject to intratextual frequency shifts in works with a mixed character (with both properties of the original and congruent texts). His tripartition of texts (‘congru-
ent’-‘mixed’-‘original’) is, however, not consistently based on independent criteria, but again determined by the frequency and pattern of subject omission itself. This leads to rather idiosyncratic text classifications in some cases and, more importantly, it makes the whole argumentation circular. Moreover, Eggenberger’s conclusion that the age of texts does not have an impact on subject omission is too strong. If one looks at the longer prose works, it becomes clear that factors such as age and loyalty to foreignlanguage sources are combined in a relatively unidirectional fashion: the older ones are generally more Latinized than e.g. Notker’s late OHG works. Without more fine-grained criteria it is thus impossible to decide whether the older texts witness more subject omission because they are more Latinized or just because of their age. What may, however, well be the case, is that that the Latinized writing tradition had an impact on the overall frequency of null subjects in the OHG translation. As the use of the null subject variant was truly optional in post-finite contexts, writers/translators may have used subject omission more excessively than they would have had done in their entirely native production. It is even possible that the null-subject property actually belonged to an older language stage in the spoken language and that this feature was preserved in the OHG theological texts due to their archaic style and possibly also due to their close relationship to the Latin.20 The conditions governing the specific morpho-syntactic distribution of null subjects in early OHG must, however, still have been somehow part of the linguistic competence of the writers/translators. 7
The loss of OHG null subjects
Modern Standard German does not allow referential pro-drop anymore. In the southern dialects referential pro-drop is still possible in very restricted morpho-syntactic environments. These restrictions are, however, substantially different from those in OHG as null subjects in modern dialectal German occur most frequently in subordinate clauses in the context of complementizer inflection and are limited to second person (e.g. Bayer 1984, Weiß 1998, Weiß to appear). There is thus no continuity in the transmission of null subjects even on a dialectal level.21 It is therefore legitimate to ask at which stage the null subject property as it was witnessed in earlier OHG was driven out of the language. Judging from the sparse textual basis that has been handed down to us, the critical period is late OHG. In Notker’s writings and Williram’s paraphrase of the Song of Songs (11th century),
referential null subjects are no longer attested. The major prose and poetical sources from the Middle High German period do not systematically witness the use of referential null subjects either. In contrast to what has been claimed for the loss of null subjects in other languages (e.g. for Medieval French [Adams 1987; Vance 1997], for Old English [Gelderen 2000] etc.), no major morpho-syntactic innovations are identifiable that could have triggered this development in late OHG. In the philological literature, the introduction of overt subject pronouns in early Old High German has traditionally been linked to the weakening of verbal endings due a phonological weakening process which was a consequence of the introduction of word-initial accent (Held 1903: XIII, Behaghel 1928: 442). However, as was noted already by Grimm (1967: 235) and Eggenberger (1961: XI), this did not lead to a substantial levelling of inflectional distinctions during the OHG period.22 According to standard accounts, even Modern German inflection, which is still more impoverished than the late OHG one, would be ‘strong’ (Rohrbacher 1999) enough or sufficiently ‘uniform’ (Jaeggli and Safir 1989: 30) to license pro.23 This is why the absence of (thematic) pro-drop in the modern language is usually traced back to an extra-morphological factor, namely to the verbsecond parameter (see e.g. Jaeggli and Safir 1989: 32; Rohrbacher 1999). To conclude, referential null subjects are lost despite the stability of a rich verbal inflection. Given that the licensing of OHG null subjects was dependent on verb movement, another potential cause for the loss of the null-subject property in late OHG could be syntactic developments. This has been claimed for Medieval French, for example. The syntactic distribution of pro in Old French was very similar to the one in OHG. The loss of null-subjects in the history of French has been related to the decline or loss of inversion contexts, i.e. to profound changes that affected the original V2-grammar (Adams 1987; Vance 1997). However, such an explanation does not carry over OHG. If at all, the verb-second grammar was consolidated and not weakened during late OHG. Moreover, there are no other major syntactic developments identifiable that could have affected the null-subject property.24 An alternative perspective is offered by the account of Sprouse and Vance (1999). They argue that the replacement of null pronouns by overt pronouns in several Germanic and Romance languages should often be viewed as a result of a grammatical competition between null subjects and overt atonic forms. For null and overt subject pronouns to be in a real competition (in the sense of Kroch 1989) they must have the same referential properties. This is generally not the case in a canonical null-subject lan-
guage. In the case of OHG, however, there are some suggestive indications that null and overt subject pronouns had the same referential properties in post-finite environments in OHG. This is e.g. suggested by sentence pairs like in (15). The two sentences differ in the use of an overt vs. null subject pronoun, yet both sentences and their parts are very similar in wording and interpretation. Crucially, the overt realization of the subject pronoun does not trigger an emphatic or contrastive reading. (15) a. [Dhar ir quhad ...] chiuuisso meinida ir dhar sunu ... ‘where he said certainly meant he there son ‘Where he said >...<, he certainly meant there the Son ...’ (I 273) Dicendo enim >...< et filium et patrem ostendit b. [Dhar ir auh quhad ...] dhar meinida ‘where he also said there meant-3SG leohtsamo zi archennenne dhen heilegan gheist (I 274) easily to see the holy spirit’ ‘Where he said >...<, he clearly meant there the Holy Spirit’ Item dicendo >...< sanctum spiritum euidenter aperuit A second piece of evidence lies in the existence of overt non-referential pronouns alongside their null counterparts. In this case the overt and the null variant trivially have the same referential properties (viz. no referential properties) and thus it is clear that they are in competition.25 As Sprouse and Vance (1999) demonstrate, differences in parsing success, be they even very slight, may account for a diachronically unstable situation where null pronouns are replaced by overt pronouns over time. If one makes the not implausible assumption that utterances with null pronouns are more difficult to parse than those with overt pronouns, it will be expected that a drift towards a high frequency of overt pronoun use occurs. Once a language has two competing forms, speakers are attempting, subdoxastically, to match the relative frequencies as they perceive them in the surrounding linguistic data. If utterances with null subjects are more difficult to process and hearers fail to parse a certain amount of them correctly, these utterances are ‘lost’ for the calculation of frequency matching. This leads to an increase of the relative frequencies of overt variant in the production of these hearers when they function as speakers themselves. Over time the relative rates continue to shift in a constant direction– that is, in favour of the form involving higher parsing success (see Sprouse and Vance 1999: 256 for a detailed demonstration). This drift is, of course, not encoded in the grammar itself but is fuelled by the interplay of frequency monitoring in perception and frequency matching in production. In a com-
petition model of language change (inspired by Kroch 1989), such quantitative changes in the linguistic environment can be argued to ultimately lead to a true reorganization of grammar.26 Thus, Sprouse and Vance (1999) show that the replacement of null subjects by overt pronouns need not be related to any independent grammar-internal changes. This makes it possible to view the OHG developments within a wider typological context of Germanic and other Indo-European languages which show a strong tendency to replace null arguments by overt atonic pronouns even though their syntax and morphology differ considerably, both in synchronic and in diachronic respects.
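The dynamics sketched by Sprouse and Vance can be illustrated with a toy iteration (the parsing-success figure below is an assumed parameter, not a value from their study): if hearers frequency-match over the utterances they successfully parse, and null-subject utterances are parsed slightly less reliably than overt ones, the null rate drifts monotonically towards zero.

```python
# Toy frequency-matching simulation (illustrative only).  Null-subject
# utterances are parsed with probability `success_null` < 1, overt-subject
# utterances always; the next generation's production rate matches the
# frequencies observed among successfully parsed utterances.

def next_null_rate(null_rate: float, success_null: float = 0.95) -> float:
    parsed_null = null_rate * success_null
    parsed_overt = 1.0 - null_rate            # overt forms are always parsed
    return parsed_null / (parsed_null + parsed_overt)

rate = 0.60   # roughly the main-clause null rate of the earliest texts
for generation in range(101):
    if generation % 20 == 0:
        print(f"generation {generation:3d}: null-subject rate = {rate:.3f}")
    rate = next_null_rate(rate)
```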
8 The case of OHG null subjects and the relevance of historical evidence
In the GB-tradition, typological studies have found exceptions to virtually every proposed formulation of the pro-drop parameter. As there has been such extensive typological research into the nature of the null-subject phenomenon (see e.g. Gilligan 1987), it seems hard to justify the importance of supplementing this by evidence from remote language stages. As I have shown in the case of OHG, historical data are quite difficult to handle and appear to be influenced by various extra-grammatical factors. Worse still, the corpus of texts handed down to us is small and very heterogeneous with respect to text type, dialectal region, length, quality of translation etc. Nevertheless, historical evidence can often be very fruitful for syntactic theorizing. One important cross-linguistic generalization that had to be revised with the advent of historical studies was the claim that verb-second does not co-occur with referential pro-drop (e.g. Jaeggli and Safir 1989; Rohrbacher 1999). Jaeggli and Safir (1989: 32), for example, argue that in V2 languages, the separation between the locus of Case assignment (in Comp) and agreement features (in Infl) prohibits the identification of referential null-subject pronouns. The case of OHG sheds doubt on the alleged incompatibility of referential pro-drop and verb second (see also Harbert 1999). As a matter of fact, it fills an important typological gap: OHG is as yet the first language discussed which combines the null-subject property with an ‘asymmetric’, strictly ‘C-oriented’ verb-second grammar.27 Moreover, if we take the diachronic perspective into consideration, it should be noted that evidence from language change has been used quite often to strengthen the theoretical claim of a direct, robust relation between
morphological richness and null subjects (see Holmberg and Platzack 1995: 67; Rohrbacher 1999, and many others). As has already been objected by e.g. Sprouse and Vance (1999), many examples from diachrony do not confirm such a direct relation. The case of OHG also points in this direction. A difficult question that is still widely neglected in the literature is the relationship between historical and dialectal evidence in the syntactic domain. Unfortunately, the case of German null subjects, has not helped to clarify this point. At first sight, the presence of referential pro-drop in modern dialectal German indicates a suggestive continuity to the stage of OHG, but upon closer inspection, the exact licensing conditions of ‘historical’ and ‘dialectal’ pro differ considerably. Interestingly, however, it was in particular historical and dialectal evidence that identified many so-called ‘partial null-subject languages’ in the Romance and Germanic area. In these languages null subjects are restricted to specific morphosyntactic environments and their distribution is thus not solely governed by a parametrizable binary property of a licensing head.28 To conclude, the case of OHG null subjects not only offers important insights into the nature of the null-subject property, thereby becoming directly relevant for syntactic theorizing in general, but it also leaves open many interesting questions.
Notes
1.
2.
This work was supported by the Deutsche Forschungsgemeinschaft via the SFB 441 Linguistic Data Structures. I would like to thank Stephan Kepser and Marga Reis for organizing the conference Linguistic Evidence and for editing this volume. I am also very grateful to Helmut Weiß for his thoughtful comments on this paper. Thanks also to Werner Abraham, Anthony Kroch, Mary Kato, Konstanze Jungbluth, Marga Reis and Hubert Truckenbrodt for their comments at the conference. Furthermore, I would like to say thanks to Bettina Schreck for proofreading. For empty subjects in meteorological constructions and the like see e.g. Behaghel (1928: 444), Bishop (1977: 98) and Abraham (1993: 123). – As far as arbitrary null subjects are concerned, Hopper (1975: 81) considers them to be restricted to the North Germanic dialects in historical times (but see Eggenberger [1961: 102]) on arbitrary null subjects in OHG. Note that the phrase sumere is in the dative, which shows that it is the object of the preposition nah and not the subject of the sentence. See also Bishop (1997: 33.92) for the valences and theta roles of time expressions in OHG.
42 3.
4. 5.
6. 7. 8.
Katrin Axel It could be objected that the null-subject property of earlier OHG is in fact due to a more pragmatically conditioned type of subject omission, such as the socalled ‘topic drop’, which can be found in Modern German (e.g. (i) [Das] habe ich schon erledigt‘ ‘[that] have I already done’). However, a topic-drop analysis cannot account for OHG cases in which a non-subject XP has been topicalized and the omitted subject can therefore be argued to occur in the middle field, see, for example, (2) above. As will be shown below, this actually turns out to be the only syntactic configuration where null subjects are licensed in OHG. Note that topic-drop is cross-linguistically restricted to the highest specifier in a sentence. Neither can a topic-drop analysis can be extended to null subjects that are non-referential. The so-called quasi-arguments, see (1a), should be illicit as topics. Impersonal passives can also be found in OHG documents, see e.g. Behaghel (1928: 444) and Abraham (1993). The Monsee Fragments are cited by folio and line numbers, the Isidor is cited by line numbers and the Tatian by page and line numbers (of the whole edition, not of the manuscript) according to the following editions: The Monsee Fragments. Newly collocated text. Introduction, notes, grammatical treatise and exhaustive glossary and a photo-litographic fac-simile. Edited by G. A. Hench. Straßburg 1890/ Der althochdeutsche Isidor. Nach der Pariser Handschrift und den Monseer Fragmenten. Neu hrsg. von H. Eggers. Tübingen, 1964/ Die lateinisch-althochdeutsche Tatianbilingue Stiftsbibliothek St. Gallen Cod. 56. Unter Mitarbeit von Elisabeth De Felip-Jaud hrsg. von Achim Masser. Göttingen, 1994/ Notker der Deutsche. Boethius, »De consolatione Philosophiæ«. Buch I-V. Hrsg. von Petrus W. Tax. Tübingen, 1986, 1988, 1990). – The line numbers only indicate where the OHG sentence begins. – From section 3 onwards, the corresponding Latin sentence is given (without line numbers). – In some examples, underlining of the finite verb, bold face and/or bracketing have been added. – The fragmentary Monsee Fragments sections (= VIII, IX-XIII, XVI, XX, XXII, XXIV, XXVI, XXVIII, XXXI, XXXVII, XXXVIII, XLI) as well as the Monsee Isidor sections (XXXII-XXXVI) have been excluded from the study. – The Present-Day English translations of the Isidor examples have largely been adopted from Robinson (1997). Note that the ‘Avoid Pronoun Principle’ (Chomsky 1981: 65) predicts that the overt realization of a subject pronoun should be ruled out in cases where the more economical null-subject variant is licensed. See Dittmer & Dittmer (1998) and Lippert (1974) on T, Robinson (1997) and Lippert (1974) on I and Näf (1979) on Notker’s Consolatio. See alos Lenerz (1984) on OHG clause structure in general. See Eythórsson (1995:chapter 3) and the references cited threrein. – Some early OHG texts actually do contain some instances of XP–pronoun–Vfin order (e.g. Lippert 1974:57ff., Tomaselli 1995, Eythórsson 1995, Axel in prep.). The ‘inverted’ order (XP–Vfin–pronoun) is, however, much more frequent. In
Null Subjects and Verb Placement in Old High German
9.
10.
11. 12. 13.
14.
15. 16.
43
contrast to OE, there is also no clear contrast between ‘topic-initial’ and ‘operator-initial’ sentences as far as the positioning of pronouns relative to the finite verb is concerned (see Axel in prep.). In contrast to Modern Standard German, extrapositon is attested more frequently in OHG, and it also affects categories or parts of speech that are not normally extraposed in the modern language, such as the indirect object in (8a). See Haider (1993:chapter 6) for such a ‘minimal’ analysis of Modern German clause structure. – There are some OHG sentences attested where the finite verb has not moved even though there is no (overt) complementizer (see Lenerz 1984, who analyses such cases as instances of ‘Comp-drop’). The Latin given in Hench’s (1890) edition and cited in (10b) has been reconstructed. It is, however, almost certain that null-subject use and clause structure are identical in the ‘reconstructed’ and in the original source. The figures in Tables 1 & 2 have been calculated on the basis of Eggenberger (1961). The numbers include referential and arbitrary subject pronouns. The figures for MF are based on all sections excluding the ones listed in fn. 4. It could be objected that this post-finiteness restriction is not absolute. In all three texts there is some amount of subject omission in subordinate clauses, around 7 to 15 per cent according to Eggenberger (1961). Note, however that Eggenberger’s figures for subordinate clauses also include dependent clauses with verb-second order. I would like to claim that the remaining really problematic cases (i.e. null subjects in the context verb-final/late order) do not really falsify the post-finiteness restriction, as they can still be due to translational errors, oversights etc. Thus, I do concede that the Latin exerts some minor impact on the OHG translation. Still null-subject usage in general is not an instance of systematic ‘loan syntax’ since there is a clear overall tendency that overt subject pronouns are inserted in verb-final/late environments. Examples where a sentence-initial ampersand remains untranslated or is rendered by verb-first order and a post-finite adverbial, see (14a, b), should be considered as instances of independent, non-Latinized word order. Note that in other contexts, sentence-initial ampersands functioning as discourse connectives in T are either translated literally by means of inti ‘and’, or they are rendered by a sentence-inital adverbial like thô ‘then’ or thanne ‘then’. See also Barbosa (1995: 181) on synchronic and diachronic evidence for a ccommand relation between agreement and pro Note that the exact postion of the null subject is irrelvant for the claim that its licensing depends on V-to-C-movement. With the exception of overt pronominal subjects, that usually occupy a left-peripheral Wackernagel position in OHG (see Axel in prep.), I assume that subjects (including null subjects) generally stay in VP-internal position. If one argued for a canonical subject position in the middle field on universal grounds, condition (14) would equally be satisfied in the context of V-to-C.
44
Katrin Axel
17. The OHG person split is different from the one in modern Germanic complementizer agreement dialects, in which it is generally the 2nd person that figures most prominently in null-subject construction (see e.g. Bayer 1984). – Note that the presence of 3rd person null subject has been considered a strong diagnostic feature for true pro-drop languages (e.g. Vainikka & Levy 1999). 18. In the weak classes II and III there is also syncretism between third person singular and second person plural indicative present. 19. One idea that suggests itself would be to relate the freer omission of 3rd person subject pronouns to the fact that they can be used anaphorically. Sigurðsson (1993) shows that in Old Icelandic subject and object pro was licensed under free co-indexing with an NP in preceding discourse. Note, however, that in the OHG texts, null subjects do not always have an NP antecedent. Null subjects are, for example, also witnessed discourse initially and they do not necessarily require narrative discourse topicality, even though there seems to be a tendency to use an overt subject pronoun when a change of subject reference occurs (see Lippert 1974: 35). Van Gelderen (2000: 194) offers an account for OE pro-drop (which also favours the omission of 3rd person): 3rd person pronouns are more specified in terms of phi-features; this is why they can be dropped more easily. It is unclear if such an analysis can be applied to OHG. 20. See Weiß (2005) for a discussion of the problematic aspects concerning the ‘writing competence’ underlying historical texts. 21. There is, however, one similarity between the modern partial pro-drop West Germanic dialects and those of earlier OHG: in both cases pro has to be ccommanded by C0. If we furthermore make the plausible assumption that pro is only licensed if Agr is in C0, we can link the absence of pro drop in subordinate clauses in OHG to the lack of complementizer inflection. What still remains a puzzle is why in the modern German dialects, pro is only licensed with complementizer inflection and is banned from configurations with V-toC0 movement where it would be c-commanded by the finite verb. See, however, Bayer (1984) on some verb-first conditionals with pro drop in Modern Bavarian. 22. The different persons/numbers largely remain distinct despite phonological weakening. The only really interesting case is the first person plural: here, the long ending was lost in favour of short –m, which has subsequently been replaced by –n, thereby creating some syncretism between the 1st and 3rd person plural present and past subjunctive of both strong and weak verbs. Note, however, that null subjects were already ruled out in the context of the former –m ending, which did not create such syncretism. 23. According to Jaeggli and Safir’s (1989: 30) condition of “Morphological Uniformity”, pro is licensed only if verbal paradigms have either all derived forms or all underived forms. 24. Harbert (1999) discusses the possibiliy that the loss of thematic pro-drop could be related to the loss of V-to-I-movement. In earlier OHG texts there
Null Subjects and Verb Placement in Old High German
25. 26.
27.
28.
45
are some instances of XP-Pron-Vfin word order, which Eythórsson (1995) has claimed to represent residual instances of V-to-I movement in the context of topicalization. As Harbert argues, however, OHG null subjects also occur in the core contexts for V-to-C-movement (interrogatives, sentential negation), see also (6a), (9a, b) and (13a) above, which shows that OHG pro-drop cannot be explained by the presence of putative residues of V-to-I-movement. Note that OHG non-referential null subjects are subject to similar distributional tendencies as their referential counterparts since they are more frequent in post-finite environments than in pre-finite position. Note that the assumption of a competition situation combined with the assumption of a higher parsing success associated with overt subject pronouns only predicts that null-subject use drops to very small frequencies over time. The complete loss must still have been the result of a grammatical reanalysis. OHG did not have overt verb movement to a sentence-medial I-projection, neither in main nor in subordinate clauses (see Axel in prep.). This makes OHG very different from Old French, which has been shown to be a more IPrelated V2-language. See e.g. Vanelli, Renzi, and Benincà (1985) on the medieval stages of French, Rhaeto-Romance, and the northern Italian dialects and Sprouse and Vance (1999) on the history of Icelandic. As for modern dialects, see e.g. Sprouse and Vance (1999) on Surselvan, and De Crousaz, and Shlonsky (2003) on Franco-Provençal.
References Abraham, Werner 1993 Null subjects in the history of German: From IP to CP, Lingua, 89: 117-142. Adams, Marianne 1987 From Old French to the Theory of Pro-Drop. Natural Language and Linguistic Theory, 5:1-32. Axel, Katrin 2004 The syntactic integration of preposed adverbial clauses on the German left periphery: A diachronic perspective. In The Syntax and Semantics of the Left Periphery, Horst Lohnstein and Susanne Trissler (eds.), pp. 23–48. Berlin/New York: Mouton de Gruyter. Axel, Katrin in prep. Studien zur althochdeutschen Syntax: Linke Satzperipherie, Verbstellung und Verb-zweit. Phil. diss., University of Tübingen. Barbosa, Marie do Pilar P. 1995 Null Subjects. Ph.D. diss. Massachusetts Institute of Technology.
46
Katrin Axel
Bayer, Josef 1984 Comp in Bavarian Syntax. The Linguistic Review, 3:209–274. Behaghel, Otto 1928 Deutsche Syntax. Eine geschichtliche Darstellung. Band 3. Satzgebilde. Heidelberg: Carl Winter. Bishop, Harry M. 1977 The Subjectless Sentences of Old High German. Ph.D. diss. University of California, Berkley. Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris. De Crousaz, Isabelle & Ur Shlonsky 2003 The distribution of a subject clitic pronoun in a Franco-Provençal dialect and the licensing of Pro. Linguistic Inquiry, 34: 413–442. Dittmer, Arne, and Ernst Dittmer 1998 Studien zur Wortstellung – Satzgliedstellung in der althochdeutschen Tatianübersetzung. Für den Druck bearbeitet von Michael Flöer und Juliane Klempt. Göttingen: Vandenhoeck und Ruprecht. Eggenberger, Jakob 1961 Das Subjektspronomen im Althochdeutschen. Ein syntaktischer Beitrag zur Frühgeschichte des deutschen Schrifttums. Phil. Diss., University of Zurich. Eythórsson, Thórhallur 1995 Verbal syntax in the Early Germanic languages. Ph.D. diss. Cornell University. van Gelderen, Elly 2000 A History of English Reflexive Pronouns: Person, Self, and Interpretability. Amsterdam Philadelphia: John Benjamins. Gilligan, Gary M. 1987 A cross-linguistic approach to the pro-drop parameter. Ph.D. diss., University of Southern California. Grimm, Jacob 1967 Reprint. Deutsche Grammatik. Band 3, Hildesheim: Olms. Original edition, Gütersloh: C. Bertelmann 1898. Haider, Hubert 1993 Deutsche Syntax – generativ. Vorstudien zur Theorie einer projektiven Grammatik. Tübingen: Gunther Narr. Harbert, Wayne 1999 Erino portun ih firchnussu, In Interdigitations. Essays for Irmengard Rauch; Wayne Harbert, Gerald F. Carr, and Lihua Zhang (eds.), pp. 257–268. New York: Peter Lang.
Null Subjects and Verb Placement in Old High German
47
Held, Karl 1903 Das Verbum ohne pronominales Subjekt in der älteren deutschen Sprache. Berlin: Mayer und Müller. Holmberg, Anders and Platzack, Christer 1995 The Role of Inflection in Scandinavian Syntax. New York: Oxford University Press. Hopper, Paul J. 1975 The Syntax of the Simple Sentence in Proto-Germanic. The Hague: Mouton. Jaeggli, Osvaldo 1986 Arbitrary plural pronominals. Natural Language and Linguistic Theory, 4: 43–76. Jaeggli, Osvaldo and Kenneth J. Safir 1989 The null subject parameter and parametric theory. In The Null Subject Parameter, Osvaldo Jaeggli and Kenneth J. Safir (eds.), pp. 1–44. Dordrecht: Kluwer. Lenerz, Jürgen 1984 Syntaktischer Wandel und Grammatiktheorie. Eine Untersuchung an Beispielen aus der Sprachgeschichte des Deutschen. Tübingen: Max Niemeyer. Lippert, Jörg 1974 Beiträge zur Technik und Syntax althochdeutscher Übersetzungen. Unter besonderer Berücksichtigung der Isidorgruppe und des althochdeutschen Tatian. Munich: Fink Kroch, Anthony 1989 Reflexes of grammar in patterns of language change. Language Variation and Change, 1: 199–244. Näf, Anton 1979 Die Wortstellung in Notkers Consolatio. Untersuchungen zur Syntax und Übersetzungstechnik. Berlin/New York: Mouton de Gruyter. Rizzi, Luigi 1986 Null objects in Italian and the theory of pro. Linguistic Inquiry, 17: 501–557. Robinson, Orrin W. 1997 Clause Subordination and Verb Placement in the Old High German Isidor Translation. Heidelberg: Carl Winter. Rohrbacher, Wolfgang B. 1999 Morphology-Driven Syntax: A theory of V to I Raising and Pro-Drop. Amsterdam: Benjamins. Sigurðsson, Halldór Á. 1993 Argument-drop in Old Icelandic. Lingua, 89: 247–280.
48
Katrin Axel
Sprouse, Rex & Barbara Vance 1999 An explanation for the decline of null pronouns in certain Germanic and Romance languages. In Language Creation and Language Change: Creolization, Diachrony and Development, Michel DeGraff (ed.), pp. 256–284. Cambridge, MA: MIT Press. Vainikka, Anne & Yonota Levy 1999 Empty subjects in Hebrew and Finnish. Natural Language and Linguistic Theory, 17: 613–671. Vance, Barbara 1997 Syntactic Change in Medieval French: Verb Second and Null Subjects. Dordrecht: Kluwer. Vanelli, Laura, Lorenzo Renzi & Paola Benincà 1985 Typologie des pronoms sujets dans les langues romanes. Actes du XVIIe Congrès International de Linguistique et Philologie Romanes (1983); Vol. 3, pp. 163–176. Aix: Université de Provence. Weiß, Helmut 1998 Syntax des Bairischen: Studien zur Grammatik einer natürlichen Sprache. Tübingen: Max Niemeyer. Weiß, Helmut 2005 The double competence hypothesis on diachronic evidence. In this volume. Weiß, Helmut to appear Inflected complementizers in Continental West Germanic dialects. To appear in Zeitschrift für Dialektologie und Linguistik.
Beauty and the Beast: What Running a BroadCoverage Precision Grammar over the BNC Taught Us about the Grammar — and the Corpus Timothy Baldwin, John Beavers, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen
“. . . every corpus I’ve had a chance to examine, however small, has taught me facts I couldn’t imagine finding out about in any other way” Chuck Fillmore (1992: 35) 1 Introduction The relative merits of corpus and native speaker judgment data is a topic of long-standing debate in linguistics (Labov 1972; Fillmore 1992, inter alia). In this paper, we approach the question from the perspective of grammar engineering, and argue that (unsurprisingly to some, cf. Fillmore) these sources of data are best treated as complementary to one another. Further, we argue that encoding native speaker intuitions in a broad-coverage precision implemented grammar and then using the grammar to process a corpus is an effective way to explore the interaction between the two sources of data, while illuminating both. We discuss how the corpus can be used to constructively road-test such a grammar and ultimately extend its coverage. We also examine limitations in fully corpus-driven grammar development, and motivate the continued use of judgment data throughout the evolution of a precision grammar. Our use of corpus data is limited to evaluating the grammar and exposing gaps in its lexical and constructional coverage, where actual grammar development is based on the combination of corpus and judgment data. In this sense, we distinguish ourselves from the research of, for example, Hockenmaier and Steedman (2002) wherein grammar development is exclusively corpus data-driven in an attempt to enhance coverage over a given corpus (i.e. the Penn Treebank). In this style of approach, only those lexical items observed in the corpus are added to the lexicon, and constructional coverage
50 Timothy Baldwin et al. is tailored to the given corpus. We claim that this approach leads to bias in coverage and restricts the generality of grammatical analyses. In §2, we review some of the arguments for and against both corpus and intuition data. In §3 we introduce the particular resources we used, viz. the English Resource Grammar (ERG; Copestake and Flickinger 2000) and a portion of the British National Corpus (BNC; Burnard 2000), and outline our methodology for combining the two sources of evidence. In §4 we present our results: a categorization of areas for improvement in the grammar as well as a categorization of sources of ‘noise’ in the corpus properly treated as outside the domain of the grammar. In §5 we discuss how these results can inform both future grammar development and syntactic theory. 2 Background While it might seem to be common sense that corpus data and judgment data are complementary sources of evidence, the recent history of the field of linguistics (certainly since the rise of Chomskyan generative grammar) has tended to relegate each of them to competing modes of investigation. Early 20th century American structuralists such as Boas and Sapir relied on both philological sources and elicited data. However, the modern notion of grammaticality (as representative of underlying grammatical principles) was absent from such work, a methodological stance partly due to the behaviorist assumption that mental structure was either non-existent or at least beyond the realm of exploration with empirical data (cf. Bloomfield 1933). It was not until Chomsky’s groundbreaking work in generative grammar that the notion of an inherent grammatical structure in the minds of speakers, and thereby an inherent mental structure to the language faculty, (re)entered mainstream modern linguistics (see in particular Chomsky 1957, 1959, 1965). With this new paradigm of linguistic inquiry came also the distinction between “competence” and “performance”, i.e. the knowledge a speaker has about his or her language vs. how that knowledge is used (see Chomsky 1964 for an early discussion). The study of competence has since received paramount importance, and native speaker judgments of grammaticality/acceptability are now frequently seen as the only means of investigating it. Corpora are instead (somewhat dismissively) relegated to studies of language use and deemed uninteresting to most generative grammarians, on the grounds that: – Corpora are limited in size and therefore may not reflect the full range of grammatical constructions.
– Corpora are full of errors due to processing and reflect other extra-grammatical factors (not part of competence).
– Corpora can only provide positive (attested) examples. Without information on contrasting ungrammatical examples, one cannot achieve a complete understanding of competence.
The competence/performance distinction and consequent division of types of data has survived in some form in every version of Chomskyan generative grammar.1 However, a significant (albeit somewhat dispersed) amount of literature calls into question the primacy of native speaker intuitions as linguistic data. The main argument is the general slipperiness of grammaticality data, highlighted primarily by the following objections:2
– Grammaticality is neither homogeneous nor categorical, but instead represents a cline of relative acceptabilities that vary from speaker to speaker.
– Grammaticality judgments are frequently formed in unnatural contextual vacuums (thereby producing unnatural judgments).
– Social/cultural biases color judgments (and for that matter so do biases of linguists toward their own theories).
– Relying solely on intuitions limits linguists to only the data they have the imagination to think up.
While few linguists have completely given up grammaticality judgments, their tenuousness has given much cause for reevaluation. Some researchers have tried to reduce acceptability judgments to other properties of the language faculty (see e.g. Boersma and Hayes 2001 and Boersma 2004 on the prototype/frequency basis of grammaticality in Stochastic OT). Others have argued instead for more controlled, experimental methods of judgment collection and interpretation to increase the quality of intuition data (Labov 1975, 1996; Schütze 1996; Cowart 1997; Keller and Asudeh 2000; Wasow 2002; Wasow and Arnold to appear), although these techniques are not necessarily practical in all circumstances (see fn. 4). However, a sizable number of linguists have in practice adopted the middle ground between more traditional introspection and corpus-based methods. Fillmore (1992) in particular argues for a methodology of linguistic analysis using corpora as a means of maintaining authenticity as well as a way of discovering new types
52 Timothy Baldwin et al. of expressions, while augmenting this data with (informal) native speaker intuitions as a way of filling out paradigms, exploring possible analyses, and drawing semantic generalizations. This approach solves many of the supposed problems of using corpora (the sparseness of data and lack of a basis for relative acceptability) while tempering the biases inherent in free-for-all introspection (see Svartvik 1992 for a collection of papers including Fillmore’s work arguing for and applying this approach). Similarly, descriptive grammars such as Quirk et al. (1985), Sinclair (1990), Biber et al. (1999) and Huddleston and Pullum (2002) have used corpus data in varying degrees to trace out the structure of the English language and unearth generalities, and intuition to fill in the boundaries of grammaticality. One can also find a contrast between corpus- and judgment-based methods in NLP research. This difference constitutes one of the underlying differences between broad-coverage precision grammars and shallow statistical parsers. Typically, broad-coverage precision grammars are based on grammaticality judgment data and syntactic intuition, and corpus data is relegated to secondary status in guiding lexicon and grammar development (e.g. Copestake and Flickinger 2000; Bouma et al. 2001; Bond et al. 2004). Shallow and/or statistical grammars, however, are often induced directly from treebank/corpus data and make little or no use of grammaticality judgments or intuition (Brill and Marcus 1992; Gaizauskas 1995). Their respective limitations are revealing of the philosophical debates between judgment-based vs. corpus linguistics: precision grammars tend to undergenerate—particularly when presented with novel constructions or lexical items—and shallow grammars to massively overgenerate. With broad-coverage precision grammars, the issue of undergeneration is addressed incrementally by grammar writers working with judgment data and analyses published in the linguistic literature to extend coverage. Developers of shallow grammars, on the other hand, tend not to deal with grammaticality and focus instead on selecting the most plausible of the available parses given the knowledge derived from the corpus. Following directly on this discrepancy between shallow and deep parsing, we illustrate in this paper how the hybrid approach advocated by Fillmore applies in the world of grammar engineering. We present a methodology for building a broad-coverage precision grammar using corpora as a primary source of data, enhancing and expanding that data with native speaker judgments in order to fully flesh out the paradigms in the corpora while staying true to their authenticity. We outline our methodology in the next section.
3 Methodology

3.1 The English resource grammar
The ERG is an implemented open-source broad-coverage precision Head-driven Phrase Structure Grammar (HPSG; Pollard and Sag 1994) developed for both parsing and generation. It has been engineered primarily in the context of applications involving genres such as conversations about meeting scheduling and email regarding e-commerce transactions. While these domains are relatively open-ended, their task-orientation leads to a significant bias in their lexical and constructional composition. Also, both are informal genres based on either transcribed speech or informal text, raising questions about the portability of the ERG to more formal corpora such as the BNC. The ERG contains roughly 10,500 lexical items, which, when combined with 59 lexical rules, compile out to around 20,500 distinct word forms.3 Each lexical item consists of a unique identifier, a lexical type (one of roughly 600 leaf types organized into a type hierarchy with a total of around 4,000 types), an orthography, and a semantic relation. The grammar also contains 77 phrase structure rules which serve to combine words and phrases into larger constituents, and compositionally relate such structures to semantic representations in a Minimal Recursion Semantics framework (MRS; Copestake et al. 2003). Of the 10,500 lexical items, roughly 3,000 are multiword expressions (MWEs; Sag et al. 2002). Development of the ERG has been corpus-driven in the sense that coverage is expanded according to the phenomena which appear in the corpora from the domains to which the ERG has been applied. However, the grammar is not a simple reflection of what has been found in the corpus. Rather, when a corpus example illustrates a previously untreated phenomenon, the grammar engineers construct a space of similar examples drawn from the corpora, then consult the linguistic literature, their intuitions, and other informants in order to map out a space of both grammatical and ungrammatical examples. The total of these investigations serves as the basis for the analyses coded in the grammar. It is in this sense that the ERG stands as an encoding of linguistic intuitions, albeit driven primarily by data found in corpora.4
Finally, we would like to emphasize that the ERG is a deep, precision grammar. By this we mean that it relates surface strings not only to syntactic structures but also to explicit, elaborated semantic representations, and further that it attempts to encode a sharp notion of grammaticality: only well-formed strings representing linguistic phenomena analyzed by the grammar will be parsed. Contrasting ill-formed examples will not. Avoiding ungrammaticality cuts down on spurious ambiguity in parsing, simplifying somewhat the problem of parse selection, and is crucial in avoiding ill-formed output in generation. This precision contrasts with shallow approaches to parsing which, as noted above, tend to deal with selecting plausible parses (generally through stochastic means) rather than grammaticality.
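To make the composition of the lexicon concrete, the following sketch shows one way the four pieces of information listed above could be represented; the field values, type names, and example entries are invented for illustration and are not taken from the actual ERG source.

```python
from dataclasses import dataclass

@dataclass
class LexicalEntry:
    """One lexical item: identifier, leaf lexical type, orthography, semantic relation."""
    identifier: str    # unique identifier within the lexicon (hypothetical name)
    lexical_type: str  # one of the leaf types in the grammar's type hierarchy (hypothetical name)
    orthography: str   # surface form
    relation: str      # semantic relation contributed to the MRS

# Hypothetical entries: the same orthographic form can carry several entries,
# one per lexical type, which is why a single missing type for a known word
# can still lead to parse failure (see section 4.1).
exercise_trans = LexicalEntry("exercise_v1", "v_trans_type", "exercise", "_exercise_v_rel")
exercise_intr = LexicalEntry("exercise_v2", "v_intrans_type", "exercise", "_exercise_v_rel")
```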
3.2 The BNC sample
To investigate domain portability, we tested the coverage of the ERG over a random sample of 20,000 strings from the written component of the BNC. Here, the term “string” is used to refer to a “sentence” token according to the original BNC tokenization, and intended to reflect the fact that significant numbers of such tokens are not syntactic sentences (see §4); the random sample was extracted from the 4.6m strings contained in the written portion of the BNC by iteratively selecting a random string from the set of non-selected BNC strings based on the scaled output of a random number generator. At present, unknown word handling in the ERG is restricted to number expressions and proper names. An input containing any word which does not fall into these classes or is not explicitly described as a lexical item therefore leads to parse failure. In order to filter out the effects of unknown words and focus on constructional coverage and the syntactic coverage of known words, we restricted our attention to strings for which we seem to have a full lexical span, i.e. which contain only word forms already known to the grammar. An important point to note for the discussion of results in §4 below is that our notion of lexical span still leaves plenty of room for lexical gaps, e.g. where a form may be included in the lexicon with only a subset of its appropriate parts of speech, subcategorization frames, or other idiosyncratic lexical properties. In order to apply this filter to the data, we first tagged the strings for part-of-speech and stripped away any punctuation not handled by the grammar (e.g. commas and periods). Based on the tagger output, we tokenized proper names and number expressions (both cardinal and ordinal), and finally used a table of British–American spelling variants to translate any British spellings into their American equivalents. After tokenization and spelling normalization, the proportion of strings for which the ERG had full lexical span was 32%. This analysis was done by building a lattice of simplex words and multiword expressions licensed by the grammar, and looking for the existence of a spanning path through the lattice.
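To make the lexical-span filter concrete, the sketch below shows one way such a check could be implemented: it builds a lattice of simplex-word and MWE edges over a tokenized string and tests for a spanning path. The function, the toy lexicon, and the MWE inventory are our own illustrative assumptions, not the actual ERG machinery.

```python
def has_full_lexical_span(tokens, simplex, mwes):
    """Return True if the token sequence can be tiled by known simplex words and
    multiword expressions, i.e. if a spanning path exists through the lattice."""
    n = len(tokens)
    reachable = [False] * (n + 1)
    reachable[0] = True
    for i in range(n):
        if not reachable[i]:
            continue
        if tokens[i].lower() in simplex:              # simplex edge of length 1
            reachable[i + 1] = True
        for mwe in mwes:                              # MWE edge of length len(mwe)
            j = i + len(mwe)
            if j <= n and [t.lower() for t in tokens[i:j]] == list(mwe):
                reachable[j] = True
    return reachable[n]

# Toy lexicon, invented purely for illustration:
simplex = {"always", "exercise", "gently", "to", "begin", "with"}
mwes = {("to", "begin", "with")}
print(has_full_lexical_span("Always exercise gently to begin with".split(), simplex, mwes))
```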
3.3 Combining the sources of evidence
We used the ERG to analyze the BNC sample in two ways. In the first instance, we used the ERG to effectively sift out the interesting new cases from the previously analyzed ones. Rather than looking at the raw corpus, we focused on those sentences in the sample which we were not able to parse. This significantly increased the signal-to-noise ratio, where the signal we were interested in was syntactic and lexical gaps in our grammar. We were also able to use the ERG as an aid in analyzing the unparsed sentences, by manually proposing paraphrases until the grammar was able to parse the string. The differences between the parsed paraphrase(s) and the original string indicate the phenomena which need to be added to the grammar or else excluded from it if ungrammatical or extragrammatical (see §4 below). We illustrate the application of the paraphrase method by way of the following sentence, which the ERG is unable to produce an analysis for:5

(1) @ Always exercise gently to begin with, building up gradually over a period of time and remembering that there is never any need to strain yourself.
We diagnosed the cause(s) of parse failure by first breaking the sentence down into unit clauses and isolating possible sources of error through a depth-first paraphrase process (sketched schematically below). The resultant unit clauses in the case of (1) are:

(2) a. Always exercise gently to begin with.
    b. It builds up gradually over a period of time.
    c. Remember that there is never any need to strain yourself.
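The diagnosis itself is carried out by hand, but its control structure is simple; the sketch below is our own schematic rendering, assuming a hypothetical parses() wrapper around the ERG and a human-supplied simplify() step that proposes stripped-down or substituted paraphrases together with a note on what was changed.

```python
def diagnose(clause, parses, simplify):
    """Depth-first paraphrase diagnosis: keep simplifying or substituting until the
    grammar parses a variant; each minimal contrast between a failing string and a
    parsing paraphrase is recorded as a suspected lexical or constructional gap."""
    gaps = []
    frontier = [clause]
    while frontier:
        string = frontier.pop()
        if parses(string):
            continue                          # nothing to learn from a parsing variant
        for paraphrase, note in simplify(string):
            if parses(paraphrase):
                gaps.append(note)             # the removed/substituted material is the culprit
            else:
                frontier.append(paraphrase)   # still failing: keep digging
    return gaps
```

Applied to (2a), for instance, simplify() would first strip the sentential modifiers and then substitute a known intransitive verb, exactly as in the walkthrough that follows.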
Applying the paraphrase method, we fed each sentence in (2) into the grammar one by one. The ERG failed to parse (2a), so we then stripped the clause of sentential modifiers, producing Always exercise gently. This too failed, whereupon we looked up exercise in the lexicon and found it lacked an entry as an intransitive verb. We then tried a paraphrase of the clause using the known intransitive verb walk, with and without to begin with. Always walk gently parsed whereas Always walk gently to begin with did not. This suggested that to begin with was a MWE currently missing in the ERG, and thus another source of parse failure. Turning to (2b), this expression likewise failed to parse. Once again, we tried stripping the sentential modifiers and proposed It builds up. This also produced parse failure, revealing the absence of a lexical entry for the intransitive verb particle construction build up. We
then verified that It eats gradually over a period of time parses, indicating no further problems within this clausal unit. Finally, (2c) also failed to parse, causing us to test the sentence again without the adverb never, i.e. Remember that there is a need to strain yourself. Since this paraphrase parsed, we concluded never was missing a lexical entry that would license this particular construction (as type adv vp aux, i.e. a polar adverbial licensed by an auxiliary verb). In total, therefore, we were able to identify 4 lexical gaps in (1). This methodology was similarly applied to other parse failures to identify a wide range of lexical and constructional gaps. Note that one advantage of this method is that it does not require an advanced knowledge of the workings of the ERG, only the ability to test linguistic hypotheses.

4 Results

Of the strings with full lexical span, the grammar was able to generate at least one parse for 57%. The parses were manually inspected (using a parse selection tool; Oepen et al. 2002). Of these, 83% of the strings were found to have been assigned a correct (i.e. preferred) parse. At first sight, the absolute coverage figures reported for parsing the BNC with the ERG must seem disappointingly low. At the same time, we felt reasonably content with the outcome of this first, out-of-the-box experiment: obtaining close to 60% grammatical coverage from applying to the BNC a hand-built precision grammar that was originally developed for informal, unedited English in limited domains (and lacks a large, general-purpose lexicon, a refined treatment of unknown words, and any kind of robustness facilities) seemed like a respectable outcome. Furthermore, the 83% correctness measure that we found in treebanking the analyses produced by the grammar appears to confirm the semantically precise nature of the grammar; as does an average ambiguity of 64 analyses per sentence for strings of length 10 to 20 words. To put these results into perspective, typical coverage figures for the ERG on new data from the closed (spoken) appointment scheduling and (email) e-commerce domains tend to range upwards of 80%, with average ambiguity rates of around 100 analyses per input. A recent experiment in manually adding vocabulary for a 300-item excerpt from tourism brochures gave the ERG an initial coverage of above 90% (at an average ambiguity of 187 analyses for an average string length of 13 words). In all three scenarios manual parse inspection of ERG outputs confirms analysis correctness measures of at least 90%. The somewhat lower average ambiguity over the BNC data presumably reflects the incomplete lexical coverage diagnosed below (§4.1).
Table 1. Breakdown of causes of parse failure

Cause of parse failure          Frequency
Missing lexical entry              41%
Missing construction               39%
Fragment                            4%
Preprocessor error                  4%
Parser resource limitations         4%
Ungrammatical string                6%
Extragrammatical string             2%
The ambiguity levels in each case contrast sharply with the thousands or even millions of ‘distinct’ analyses typically delivered by treebank-derived statistical parsers (Charniak 1997). We then turned to the 43% of the original sample which did not receive any parse, and used the methodology described in §3.3 above to diagnose and classify the cause(s) of parse failure. This analysis was carried out over a sampled subset of the original data set, 1190 items, or approximately 14%. In our analysis, we found seven categories of causes of parse failure, as detailed in Table 1. The frequencies in Table 1 were calculated by itemizing the causes of parse failure for each string which did not receive a parse, and summing up the frequency of occurrence of each cause across all strings. Note that the Fragment, Preprocessor error, Parser resource limitations, Ungrammatical string and Extragrammatical string categories apply at the string level. A single string can thus produce (at most) one count for each of these categories. The Missing lexical entry and Missing construction categories, on the other hand, operate at the word/constituent level. We made every attempt to exhaustively identify and sub-classify every such occurrence within a given string, resulting in the possibility for a single string to be counted multiple times in our statistics. The first two causes of parse failure represent clear lacunae in the grammar, and we argue that the third does as well. Preprocessor errors and parser resource limitations involve other components of the system (the preprocessor and the parser, respectively) failing, and don’t necessarily reflect on either the grammar or the corpus. Finally, the last two categories represent noise in the corpus which should not be accommodated in a precision grammar. In the remainder of this section, we illustrate each type of cause in turn, and then evaluate the strategy as a whole.
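The counting scheme just described can be stated compactly; the sketch below is our own illustration over hypothetical per-string diagnoses, and the normalization to percentages of total cause counts is our assumption about how the table was derived.

```python
from collections import Counter

STRING_LEVEL = {"fragment", "preprocessor error", "parser resource limitations",
                "ungrammatical string", "extragrammatical string"}

def tally(diagnosed_strings):
    """diagnosed_strings: one list of causes per unparsed string, e.g.
    [["missing lexical entry", "missing lexical entry", "fragment"], ...]."""
    counts = Counter()
    for causes in diagnosed_strings:
        for cause in {c for c in causes if c in STRING_LEVEL}:
            counts[cause] += 1      # string-level causes: at most one count per string
        for cause in (c for c in causes if c not in STRING_LEVEL):
            counts[cause] += 1      # word/constituent-level causes: every occurrence counts
    total = sum(counts.values())
    return {cause: 100.0 * n / total for cause, n in counts.items()}
```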
4.1 Missing lexical entries
Despite the restriction to strings with a full lexical span, we were nonetheless confronted by gaps in lexical coverage, which fall into two basic categories: incomplete categorization of existing lexical items and missing multiword expressions (MWEs). Incomplete categorization refers to missing lexical types for a given word token. While each ERG lexical item is annotated with a specific lexical type which determines its syntactic and semantic behavior, a gap in the full paradigm of types for a given orthographic form (e.g. the noun table, but not the verb) leads to parse failure. In some cases it appears that a general process is involved (e.g. a ‘universal grinder’ treatment of mass uses of prototypical count nouns as in Pelletier 1979), such that the most appropriate way to extend coverage is to add a lexical rule, but many more cases don’t seem amenable to this kind of treatment. Second, syntactically-marked MWEs—notably verb-particle constructions (e.g. take off ) and determinerless PPs (e.g. off screen, at arm’s length)—cause similar problems. Once again, we find general processes, such as valence patterns for action verbs with and without the completive particle up. However, such general processes hardly account for the full range of idiosyncrasies and partial generalizations of MWE. Frequently, then, the demands of precision grammar engineering dictate that the grammar explicitly license each observed verb-particle pair or determinerless PP rather than letting any particle appear with any verb or any count noun appear immediately after a preposition. The flip-side of requiring explicit licensing is a susceptibility to lexical gaps. The frequency with which MWEs appear in the data underscores the fact that they are not a minor annoyance, to be relegated to the periphery. To truly achieve broad-coverage and adequate semantic representations, a precision grammar must treat them as first class entities. Verb-particle constructions, for example, are estimated to account for 1.6% of word token occurrences in the BNC, and determinerless PPs 0.2% (Baldwin et al. 2005). Regardless of the class of lexical gap, the BNC data highlighted both lexical gaps which could easily have been identified through simple introspection (e.g. nominal attack), and more subtle ones such as the transitive verb suffer and the MWE at arm’s length. In future work, we intend to leverage the corpus via shallow parsing techniques to bootstrap semi-automatic lexical expansion efforts. We expect there to be limitations to corpus evidence, however, and that quirky constraints on some lexical entries will only be detectable via introspection. For example, the BNC data revealed a lexical gap for the use of tell meaning ‘discover’ or ‘find out’ in (3). Introspective in-
vestigation revealed that this sense of tell requires either one of a small set of modals or how (see (4)). While a subset of the collocations can be found in the BNC, there is no obvious way to automatically detect the full details of such idiosyncratic constraints on distribution.

(3) @ Not sure how you can tell.

(4) a. Can/could you tell?
    b. Are you able to tell?
    c. *They might/ought to tell. (on the intended reading)
    d. How might you tell?
    e. *How ought they to tell? (on the intended reading)
Further investigation of the corpus revealed instances of how could (one) tell and how does (one) tell, but not alternative modal collocates such as how might/would (one) tell. Thus, having been alerted to the presence of this expression in actual use, we used linguistic intuition in order to determine its full variability (see Fillmore 1992 and Fillmore and Atkins 1992 for a similar hybrid approach to the distribution and semantics of risk and home).

4.2 Missing constructions
In addition to known difficult problems (e.g. direct quotes, appositives and comparatives), we found many constructions which were more obscure, and might not have occurred to us as something to analyze without the aid of both the corpus (presenting the examples) and the grammar itself (sifting away all of the previously analyzed phenomena). We present a few such examples here, aiming not to provide full analyses but rather to motivate their interest. The first example (5) involves the pied-piping of an adjective by its degree specifier in a free relative construction. Such examples were not parsed by the ERG since it explicitly coded the expectation that adjectives were not allowed to pied-pipe in this context.

(5) @ However pissed off we might get from time to time, though, we’re going to have to accept that Wilko is at Elland Rd. to stay.
At first glance, it appeared that this particular configuration might be restricted to concessive uses of free relatives like (5). However, further investigation into another corpus (the Web, via Google) turned up examples like (6), indicating that this is in fact a general pattern for free relatives.
(6) @ The actual limit is however big your file system will let the file be.
The second example (7) involves a class of expressions which one might call quasi-partitives.

(7) a. @ He’s a good player, a hell of a nice guy too.
    b. That’s a bitch of a problem to solve.
    c. *He’s a hell on wheels/hell and a half/hell beneath us of a guy.
    d. *The hell of a guy that I met at the party last night. . .
In addition to hell of a (and its reduced forms helluva/hella), one also finds bitch of a (7b) and perhaps others. It appears that nothing can intervene between hell and of (7c). This construction presents a neat little semantic puzzle. First, note that it appears that the construction is restricted to indefinite NPs (7d). Thus it appears that hell of is attaching to the NP a nice guy, or perhaps hell of a is attaching to the N̄ nice guy. On the other hand, the of can be directly followed by a noun or by an adjective and then a noun. When there is an adjective present, hell of a seems to be acting semantically as an intensifier of the adjective. Given ordinary assumptions about semantic composition, it is not immediately clear how an element attaching syntactically to an NP/N̄ could semantically modify an adjective modifier inside that NP/N̄.
The next example (8a) involves exocentric NPs of the form [Det Adj], but (surprisingly, if one believes the textbooks) not restricted to referring to generic classes of humans (cf. the rich, the famous).6
(8) a. @ The price of train tickets can vary from the reasonable to the ridiculous.
    b. The range of airfares includes the reasonable and the ridiculous.
    c. An exhibit of the grotesque is on display at the museum today.
    d. *My collection of toy cars include the red and the blue.
Further reflection brought us to examples like (8b), which joins the two exocentric NPs with a conjunction rather than the from . . . to construct and (8c) which involves only one exocentric NP. The infelicity of (8d) indicates that this construction isn’t available with all adjectives, or perhaps with all construals of the resulting NP. We believe that the corpus example (8a) motivates an investigation into the classes of adjectives which can appear in this construction, the classes of referents the resulting NPs can have, and the relationship (if any) between adjective class and resulting potential referent classes. Our final example (9) involves a construction which licenses the use of any common noun as a title, paired with an enumerator from an ordered list (e.g. numbers, letters, alpha/bravo/charlie/. . . ).
(9) @ This sort of response was also noted in the sample task for criterion 2.
This example appears to involve a construction somewhat similar to the one that pairs a title like Prof. or Dr. with a personal name, and raises the question of whether that family of constructions might not include a few other members, again with slightly varied constraints. It is worth noting here that this example also represents a class of phenomena (including number names, quotatives, and time/date expressions) which are relatively frequent and commonplace in corpus data, but tend to go unnoticed in linguistic investigations which are not rooted in corpora. We speculate that this is because they are somehow more context-dependent and are therefore unlikely to crop up in the sort of decontextualized sentence generation which is typical in syntactic research. We take this to be a validation of our methodology: corpora are a rich source of largely unnoticed lexical items and construction types, some of which are context-dependent in a way which makes them unlikely to be noticed through introspection but still frequent enough to pose a problem for any parser. However, the inherent biases in corpora (e.g. frequency of some uses over others) might mask the underlying paradigms governing the distribution of these items, calling for a broader approach to updating a grammar like the ERG involving introspective analysis. Furthermore, using the existing grammar to analyze the corpus enriches the data sample presented to human analysts, thus enhancing the usefulness of the corpus.

4.3 Fragments
On the boundary between the grammar illuminating the corpus and the corpus illuminating the grammar, we find sentence fragments like those in (10). While these are clearly not grammatical sentences, they are grammatical strings, and some even represent idiomatic frames as in (10c).

(10) a. @ The Silence of the Piranhas
     b. Not good enough probably
     c. @ Once a Catholic, always a Catholic
     d. @ Mowbray?
We must therefore extend the grammar to include a wider notion of grammaticality, perhaps grounded in what can serve as a stand-alone utterance in a discourse or similar unit in a text (e.g. see Schlangen 2003 for a detailed analysis of a wide range of sentence fragments within this framework).
4.4 Preprocessor errors and parser resource limitations
Preprocessor errors involve common nouns or other elements (e.g. whilst) in (11) being mistagged as proper nouns7 or vice versa, causing errors in tokenization, leading in turn to unparsable inputs.

(11) @ Whilst doing this you need to avoid the other competitors.
Also, a small number of remaining British spellings caused parse failure in some cases. While these do not reflect directly on the ERG, they do illustrate one kind of noise in the corpus. That is, in any practical application, a precision grammar will have to contend with both inherent corpus noise (see §4.5) and noise added by other components of the NLP system.
Parser resource limitations refer to instances where the parser ran out of chart edges before creating any spanning parses which satisfied the root conditions. This occurred particularly for strings with a high level of coordination or modifier/attachment ambiguity. This problem can be mitigated to some degree at the hardware level by increasing the memory, or resolved more substantively through the adoption of a beam search-based parse selection facility. Beam search would take the form of dynamic pruning of improbable edges, determined relative to a treebank constructed from successfully parsed examples (Oepen et al. 2002). With such a facility, the parser should be able to find spanning edges even for very long and ambiguous sentences, whereas in the experiments here it was always attempting to parse exhaustively within the limits given. For the moment we ignore these limitations (which affected only a small number of candidate sentences).
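As a rough illustration of what such dynamic pruning could look like, the sketch below keeps only the top-scoring edges per chart cell; the edge representation and the scoring function are hypothetical stand-ins of ours, not the actual parser's implementation.

```python
import heapq
from collections import defaultdict

def beam_prune(edges, beam_size, score):
    """Keep at most beam_size edges per chart cell (start, end), ranked by a
    probability model that would be estimated from a treebank of parsed examples."""
    by_cell = defaultdict(list)
    for edge in edges:                        # edges are assumed to carry .start/.end spans
        by_cell[(edge.start, edge.end)].append(edge)
    kept = []
    for cell_edges in by_cell.values():
        kept.extend(heapq.nlargest(beam_size, cell_edges, key=score))
    return kept
```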
4.5 Ungrammatical strings
Whereas ungrammatical items in a manually-constructed test suite serve to contrast with minimally different grammatical examples and demarcate the constraints on a particular construction, naturally occurring ungrammatical items constitute instead haphazard noise. Even in the BNC, much of which is edited text, one finds significant numbers of ungrammatical strings, due to reasons including spelling and string tokenization errors (e.g. @*...issues they fell should be important...), typographical inconsistencies, and quoted speech. While larger NLP systems (into which a precision grammar may be embedded) should incorporate robust processing techniques to extract such information as is possible from ungrammatical strings in the input, the precision grammar per se should not be adapted to accommodate them. 8
At the same time, such ungrammatical examples can serve as a test for overgeneration that goes far beyond what a grammar writer would think to put in a manually constructed test suite. This underscores the importance of the treebank annotation step of our methodology. Having a human annotator effectively vet the grammar’s analyses also turns up any ungrammatical examples that the grammar (mistakenly) assigned an analysis to.

4.6 Extragrammatical strings
Extragrammatical effects involve unhandled non-linguistic or quasi-linguistic phenomena, associated with written presentation, interfacing unpredictably with the grammar. A prime example is structural mark-up, which can lead to unexpected effects, such as a in (12) being misanalyzed as an article, instead of stripped off the sentence. If a is taken as an article, the grammar correctly predicts the string to be ungrammatical. A pre-processing strategy can be employed here, although simply stripping the mark-up would be insufficient. An interface with the grammar will be required in order to distinguish between structural and lexical usages of (I), e.g. as illustrated in (13) and (14).

(12) @ There are five of these general arrest conditions: (a) the name of the person is not known to the police officer and he or she can not “readily ascertain” it.

(13) @ (I) That Mrs Simpson could never be Queen.

(14) @ “(I) rarely took notes during the thousands of informal conversational interviews.
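One purely hypothetical way to realise such a pre-processing interface is to strip candidate enumerator mark-up only when the grammar itself rejects the unstripped string; the regular expression and the parses() hook below are our own illustrative assumptions, not part of the system described here.

```python
import re

ENUMERATOR = re.compile(r"^\((?:[a-z]|[ivxIVX]+)\)\s+")

def preprocess(string, parses):
    """Strip leading list mark-up like '(a)' or '(I)' unless keeping it yields the only
    parsable reading, crudely separating structural from lexical uses of '(I)'."""
    match = ENUMERATOR.match(string)
    if not match:
        return string
    stripped = string[match.end():]
    if parses(string) and not parses(stripped):
        return string    # the token is doing real work, e.g. a pronoun reading of '(I)'
    return stripped      # treat it as structural mark-up, e.g. '(a)' in (12)
```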
4.7 Evaluation and summary
Our treebank annotation strategy successfully identified a large number of sentences and fragments in the BNC for which the current ERG was unable to provide a correct analysis, even where it did offer some (often many) candidate analyses. The paraphrase proposal worked well in diagnosing the specific source of the parse failure, across all of the types: lexical gaps, constructional gaps, fragments, ungrammatical strings and extragrammatical strings. The undergraduate annotator (previously unfamiliar with the ERG) using these techniques was able to correctly identify, diagnose, and document often subtle errors for about 100 BNC examples per day. The annotator’s analysis
64 Timothy Baldwin et al. was evaluated and extended in an item-by-item discussion of 510 such errors with the grammar writers. This precise, detailed classification of errors and their frequency in the subcorpus provides important guidance to the ERG developers both in setting priorities for hand-coded lexical and syntactic extensions to the grammar, and also in designing methods for semi-automatic acquisition of lexical items on a much larger scale.
5 Conclusions

We have explored the interaction of two types of evidence (corpus data and grammaticality judgments) from the perspective of grammar engineering. Combining the two sources of linguistic evidence as we did—encoding intuitions in a broad-coverage precision grammar and using this grammar to process the corpus—allowed us to explore their interaction in detail. The corpus provides linguistic variety and authenticity, revealing syntactic constructions which we had not previously considered for analysis, including many which fall outside the realm likely to be explored in the context of decontextualized example generation. Analyzing the corpus with the grammar allowed us to efficiently focus on the new territory, neatly sweeping away the well-known constructions which we have already incorporated. Since the as-yet unanalyzed constructions tend to be lower frequency, this ability to enrich the data that must be gone through by hand is crucial. Insisting on maintaining a notion of grammaticality in our precision grammar (rather than aiming to analyze every string in the corpus) leads us to recognize and categorize the noise in the corpus. Finally, as the corpus examples inspire us to add further analyses to the grammar, we incorporate additional intuition-based evidence as well as attested examples from other corpora gleaned from targeted searches. This is in fact required by the precision grammar approach: If we were to rely only on attested examples to craft our analyses (and especially examples from a single corpus or genre), they would be a very poor match to the actual state of the language indeed. We believe that any such attempt would necessarily end up being too permissive (leading to massive ambiguity problems and ill-formed output in generation) or incoherent, as one tried to incorporate unnatural constraints to match the attested examples too closely. In illustrating our methodology and providing a taste of the kind of results we find, we hope to have shown that precision grammar engineering serves both as a means of linguistic hypothesis testing and as an effective way to bring new data into the arena of syntactic theory.
Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. BCS-0094638 and also the Research Collaboration between NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation and CSLI, Stanford University. It has been presented at the International Conference on Linguistic Evidence (Tübingen, Germany, January 2004) and the Fourth International Conference on Language Resources and Evaluation (Lisbon, Portugal, May 2004). We would like to thank Francis Bond, Ivan Sag, Tom Wasow, and anonymous reviewers and audience members from both conferences for their valuable input.
Notes
1. See e.g. Chomsky (2001) and Newmeyer (2003) for recent discussions.
2. See Labov (1972, 1975, inter alia) for early discussion of some of these points; see Schütze (1996) for a detailed summary of critiques of grammaticality.
3. All statistics and analysis relating to the ERG in this paper are based on the version of 6 June, 2003.
4. As discussed in §2, a more rigorous alternative to standard introspection would be to use judgment data collected via experimental techniques. However, we find that in the development cycle of a project such as ours, it is not practical to carry out full-scale grammatical surveys for each contrast we want to encode. Thus we continue to use informal methods to collect introspective data (where more sophisticated surveys are not available in the literature) and rely on the corpus to show us when these methods have gone astray.
5. Following Bender and Kathol (2001), we indicate attested examples with @. Unless otherwise noted, all attested examples cited in this paper are from the BNC.
6. Such cases of so-called N̄-ellipsis are of course quite common in a number of other languages (Beavers 2003).
7. In this case, the capitalization might have been one factor in the mistagging.
8. We note, however, that it is possible to adapt a precision grammar to handle ungrammaticality (while recognizing it as such) by incorporating a combination of robustness root conditions, “mal-rules” and error-predictive lexical entries, and still produce a well-formed semantic representation (Bender et al. 2004).
66 Timothy Baldwin et al. References Baldwin, Timothy, John Beavers, Leonoor van der Beek, Francis Bond, Dan Flickinger, and Ivan A. Sag 2005 In search of a systematic treatment of determinerless PPs. In Patrick Saint-Dizier, (ed.), Syntax and Semantics of Prepositions. Kluwer, Germany. Baldwin, Timothy, Emily M. Bender, Dan Flickinger, Ara Kim, and Stephan Oepen 2004 Road-testing the English Resource Grammar over the British National Corpus. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004), pp. 2047–2050. Lisbon, Portugal. Beavers, John 2003 More heads and less categories: A new look at noun phrase structure. In Proceedings of the 2003 HPSG Conference, pp. 47–67. CSLI Publications, Stanford, USA. Bender, Emily M., Dan Flickinger, Stephan Oepen, Annemarie Walsh, and Tim Baldwin 2004 Arboretum: Using a precision grammar for grammar checking CALL. In Proceedings of the InSTIL/ICALL Symposium: NLP and Speech Technologies in Advance Language Learning Systems, pp. 83– 86. Venice, Italy. Bender, Emily M. and Andreas Kathol 2001 Constructional effects of just because . . . doesn’t mean . . . . In BLS 27. Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan 1999 Longman Grammar of Spoken and Written English. Longman, London, UK. Bloomfield, Leonard 1933 Language. Holt, New York, USA. Boersma, Paul 2004 A Stochastic OT account of paralinguistic tasks such as grammaticality and prototypicality judgments. Rutgers Optimality Archive 648. Boersma, Paul and Bruce Hayes 2001 Empirical tests of the gradual learning algorithm. Linguistic Inquiry, 32: 45–86. Bond, Francis, Sanae Fujita, Chikara Hashimoto, Kaname Kasahara, Shigeko Nariyama, Eric Nichols, Akira Ohtani, Takaaki Tanaka, and Shigeaki Amano 2004 The Hinoki treebank: A treebank for text understanding. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04), pp. 554–559. Hainan Island, China. Bouma, Gosse, Gertjan van Noord, and Robert Malouf 2001 Alpino: Wide coverage computational analysis of Dutch. In Computational Linguistics in the Netherlands 2000, pp. 45–59. Tilburg, Netherlands.
Beauty and the Beast 67 Brill, Eric and Mitchell Marcus 1992 Automatically acquiring phrase structure using distributional analysis. In Proceedings of the 4th DARPA Speech and Natural Language Workshop, pp. 155–159. Pacific Grove, USA. Burnard, Lou 2000 User Reference Guide for the British National Corpus. Technical report, Oxford University Computing Services. Charniak, Eugene 1997 Statistical techniques for natural language parsing. AI Magazine, 18: 33–44. Chomsky, Noam 1957 Syntactic Structures. Mouton, The Hague, Netherlands. 1959
A review of B. F. Skinner’s Verbal Behavior. Language, 35: 26–58.
1964 Current Issues in Linguistic Theory. Mouton, The Hague, Netherlands.
1965 Aspects of the Theory of Syntax. MIT Press, Cambridge, USA.
2001
New Horizons in the Study of Language and Mind. Cambridge University Press, Cambridge, UK. Copestake, Ann and Dan Flickinger 2000 An open-source grammar development environment and broadcoverage English grammar using HPSG. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC 2000), pp. 591–600. Athens, Greece. Copestake, Ann, Daniel P. Flickinger, Carl Pollard, and Ivan A. Sag 2003 Minimal Recursion Semantics. An introduction. Unpublished ms., http://www.cl.cam.ac.uk/˜acc10/papers/newmrs.ps. Cowart, Wayne 1997 Experimental Syntax: Applying Objective Methods to Sentence Judgments. SAGE Publications, Thousand Oaks, USA. Fillmore, Charles J. 1992 “Corpus linguistics” or “computer-aided armchair linguistics”. In Jan Svartvik, (ed.), Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August, 1991, pp. 35–60. Mouton de Gruyter, Berlin, Germany. Fillmore, Charles J. and Beryl T.S. Atkins 1992 Towards a frame-based lexicon: The semantics of risk and its neighbors. In Adrienne Lehrer and Eva Kittay, (eds.), Frames, Fields, and Contrasts, pp. 75–102. Erlbaum Publishers, Hillsdale, USA. Gaizauskas, Rob 1995 Investigations into the grammar underlying the Penn Treebank II. Technical report, Research Memorandum CS-95-25, University of Sheffield.
68 Timothy Baldwin et al. Hockenmaier, Julia and Mark Steedman 2002 Acquiring compact lexicalized grammars from a cleaner treebank. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC 2002), pp. 1974–1981. Las Palmas, Canary Islands. Huddleston, Rodney and Geoffrey K. Pullum 2002 The Cambridge Grammar of the English Language. Cambridge University Press, Cambridge, UK. Keller, Frank and Ash Asudeh 2000 Constraints on linguistic reference: An experimental investigation of exempt anaphors. Unpublished ms., University of Edinburgh and Stanford University. Labov, William 1972 Sociolinguistic Patterns. University of Pennsylvania Press, Philadelphia, USA. 1975 What is a linguistic fact? In R. Austerlitz, (ed.), The Scope of American Linguistics, pp. 77–133. Peter de Ridder, Lisse, Netherlands. 1996 When intuitions fail. In Lisa McNair, Kora Singer, Lise M. Dobrin, and Michelle M. Aucoin, (eds.), CLS 32: Papers from the Parasession on Theory and Data in Linguistics, pp. 76–106. Newmeyer, Frederick J. 2003 Grammar is grammar and usage is usage. Language, 79: 679–681. Oepen, Stephan, Kristina Toutanova, Stuart Shieber, Chris Manning, Dan Flickinger, and Thorsten Brants 2002 The LinGO Redwoods treebank. Motivation and preliminary applications. In Proceedings of the 19th International Conference on Computational Linguistics, pp. 1253–1257. Taipei, Taiwan. Pelletier, Francis J. 1979 Non-singular reference: Some preliminaries. In Francis J. Pelletier, (ed.), Mass Terms: Some Philosophical Problems, pp. 1–14. Reidel, Dordrecht, Holland. Pollard, Carl and Ivan A. Sag 1994 Head-Driven Phrase Structure Grammar. Studies in Contemporary Linguistics. The University of Chicago Press and CSLI Publications, Chicago, USA and Stanford, USA. Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik 1985 A Comprehensive Grammar of the English Language. Longman, London, UK. Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake, and Dan Flickinger 2002 Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2002), pp. 1–15. Mexico City, Mexico.
Schlangen, David 2003 A Coherence-Based Approach to the Interpretation of Non-Sentential Utterances in Dialogue. Ph.D. thesis, University of Edinburgh. Schütze, Carson 1996 The Empirical Base of Linguistics. University of Chicago Press, Chicago, USA. Sinclair, John, (ed.) 1990 Collins COBUILD English Grammar. Harper Collins, London, UK. Svartvik, Jan, (ed.) 1992 Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4-8 August, 1991. Mouton de Gruyter, Berlin, Germany. Wasow, Thomas 2002 Postverbal Behavior. CSLI Publications, Stanford, USA. Wasow, Thomas and Jennifer Arnold To appear Intuitions in linguistic argumentation. Lingua.
Seemingly Indefinite Definites Greg Carlson and Rachel Shirley Sussman
1 Introduction1
From the time of Bertrand Russell, the semantics of the English definite article has been the object of continued semantic analysis. Most analyses make some use of the notions of uniqueness or familiarity, however these are defined more precisely (see for instance Roberts 2003 for one recent analysis and review). In this paper, though, we wish to motivate through both experimental and non-experimental methodologies the claim that there is a sub-class of English definite articles which function differently, being much more akin to indefinites in their interpretations than the much larger and more general class of definite articles that is the primary focus of continued study. Recognizing this distinction may prove useful in future work on the semantics of the definite article, as the class of “indefinite definites”, or “weak definites”, represents a class of examples constituting the greatest challenge to uniqueness or familiarity-based accounts. Setting them aside as a separate group for different treatment may prove a fruitful research strategy.

2 The phenomenon
Our contention is that there is a subtle but perceptible contrast between the examples of (1) and those of (2):

(1) a. Mary went to the store.
    b. I’ll read the newspaper when I get home.
    c. Open the window, will you please?
    d. Fred listened to the Red Sox on the radio.

(2) a. Mary went to the desk.
    b. I’ll read the book when I get home.
    c. Open the cage, will you please?
    d. Fred listened to the Red Sox over the headphones.
In the examples of (1), intuitively, the particular identity of the store, newspaper, window, or radio is not thought to be especially important, in contrast to the desk, book, cage, or headphones in the examples in (2). This contrast, so put, is a vague intuition which nevertheless we find most English speakers agree with. This distinction, though, can be considerably sharpened by embedding such examples in constructions making use of VP-ellipsis, and asking whether the identity of the denotation of the NP must be preserved under anaphora. As a lead-in, consider the example in (3): (3)
Mary heard about the riot on the radio, and Bob did, too.
Here, Mary and Bob did have to hear about the very same riot. However, they clearly could easily have heard about it on different radios. This is because, our claim goes, “the riot” has no weak or indefinite reading, whereas “the radio” does. Note that this is the same judgment that would appear if “the radio” were replaced by the indefinite “a radio”. The contrast in (4) further sharpens this distinction. (4)
a. Fred went to the store, and Alice did, too. (OK as different stores)
b. Fred went to the desk, and Alice did, too. (must be the same desk)
“The store” has a weak reading, whereas “the desk” has only the regular definite interpretation. To substantiate these results, we presented materials of this sort in a judgement survey to 16 native speakers of English. Participants read a short description of a situation where two separate characters acted upon two separate items of the same type, and then were asked whether a target sentence containing VP ellipsis and a suspected weak indefinite provided an accurate depiction of the events described (see table 1).

Table 1. Examples of survey materials

Regular Definites
  Context sentence: Bill read Jane Austen's Pride and Prejudice, and Joe read The Hitchhiker's Guide to the Galaxy, by Douglas Adams.
  Target sentence: Bill read the book, and Joe did too.

Indefinite Definites
  Context sentence: At breakfast, Samantha read the New York Times. Across the table from her, Frances was reading the Democrat and Chronicle.
  Target sentence: Samantha read the newspaper, and Frances did too.
For weak definites, participants accepted the elided sentence as an accurate description 73% of the time, while for regular definites, the sentence was accepted only 24% of the time, which constituted a statistically significant difference between the two definite types (t1(15)=5.93, p<.001, t2(5)=6.14, p<.001)2. Thus, speaker judgements reflect a reliable and robust difference in the availability of the weak definite reading.
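The by-participants (t1) and by-items (t2) statistics reported above are ordinary paired comparisons; the sketch below shows how such a value could be computed, using invented per-participant acceptance rates purely for illustration (only the number of participants, 16, is taken from the study itself).

```python
import numpy as np
from scipy import stats

# Invented per-participant acceptance rates (proportion of "accurate" responses)
# for the two conditions; not the actual data from the survey.
weak = np.array([0.83, 0.67, 0.83, 0.67, 0.83, 0.67, 0.83, 0.67,
                 0.83, 0.67, 0.83, 0.67, 0.83, 0.67, 0.83, 0.50])
regular = np.array([0.17, 0.33, 0.17, 0.33, 0.17, 0.33, 0.17, 0.33,
                    0.17, 0.33, 0.17, 0.33, 0.17, 0.33, 0.17, 0.33])

result = stats.ttest_rel(weak, regular)  # paired, by-participants comparison (df = 15)
print(f"t1(15) = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```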
3 Distributional properties of weak definites
What is it that accounts for the difference between, e.g. “the store” vs. “the desk” in examples such as (4)? To get a handle on this, we are going to examine the distributional properties of bare singular count nouns in English, and then show that the class of weak definites shares this same class of restrictions. What we mean by “bare singulars” is exemplified in (5a), and we are going to basically be claiming that the examples in (5b) exhibiting definite articles should be analyzed similarly. (5)
a. Sue took her nephew to college/to prison/to class.
b. Sue took her nephew to the hospital/to the store/to the beach.
Very approximately, both the bare singulars and the weak definites are used in constructions which designate typical or habitual activities, but this is an extremely weak characterization that is intended only as an intuitive guide. English bare singulars, though commonly noted in descriptive grammars of English, have received limited attention in the theoretical literature (though bare singular count nouns in other languages have received more attention, such as Kallulli (1999) for Albanian, Borthen (2003) for Norwegian, or Munn and Schmidt (1999) for Brazilian Portuguese, among many others). One recent major work we are relying on which devotes itself to the subject of English singulars is Stvan (1998). In the following we are concentrating on the class of singulars which do not appear in adjuncts or conjunctions (Heycock and Zamparelli (2003) suggest a treatment of these as definites) and, to a reasonable degree of convincingness, are not parts of idioms (as e.g. being “with child” or told to “take heart”), among other subpatterns.
The class we are currently interested in is exemplified in (6):

(6) a. They found him in bed.
    b. The ship is at sea/at port.
    c. He’s in jail/in prison/in church.
    d. Mimi attended college/class/school.
First of all, this class is lexically restricted — it is a lexical feature of the noun itself that determines whether it can function as a bare singular. Even near synonyms of bare singular nouns do not necessarily function this way; the examples of (6) contrast with those in (7): (7)
a. *They found him in couch/cot/hammock (even if he sleeps there all the time).
b. *The ship is at ocean/lake.
c. *He’s in penitentiary/brig/mosque.
d. *Mimi attended seminar/institution/university (AmE)3.
Bare singulars do not admit of any modification, whether prenominal or postnominal4. The nouns of (8) are some found in (6), but the addition of modification renders them in need of an article or quantifier: (8)
a. *She traveled on sore foot.
b. *He was found in silk-sheeted bed.
c. *Mimi attended class taught by Prof. Linskowski.
d. *The ship is now in port that’s being dredged.
Another feature of these bare singulars is that a certain degree of “semantic enrichment” is added. (9)
a. Being in prison is not simply being in a prison, but that and more…
b. Being in bed is not simply being in a bed, but that and more…
For instance, being “in bed” is not simply a locative statement, though it is that in part, but also at least strongly implies that the individual is there for the purpose of resting, sleeping, that is, using a bed for its intended design purpose. For instance, one would not say of a person lying on a bed who is actively writing a dissertation on her laptop that that person is “in bed”. Or a person who is “in prison” isn’t just there, but also, e.g. incarcerated. Similar intuitions are found pretty much across-the-board with this class of bare singulars.
A fourth feature of the distribution of bare singulars is that they must be ‘lexically governed’— or, more neutrally, cooccur with a designated class of other lexical items. In English this is most often a preposition but verbs can govern them as well. Which items may serve as governors is specific to the lexical identity of the noun5. The examples in (10) have inappropriate governing lexical items and hence are not grammatical:

(10) a. *They found him on bed.
     b. *The ship is in sea. (OK in port)
     c. *He’s next to jail/prison/church.
     d. *Mimi destroyed college/class/school.

From a semantic point of view, it is somewhat difficult to determine if bare singulars are definite or indefinite — their distributional properties preclude application of the standard tests. However, it is very clear that, like bare singulars in other languages or existential readings of bare plurals in English, bare singulars appear to take narrowest possible scope with respect to other operators in the same sentence. If one considers them existentially quantified, then the existential quantifier does not have variable scope. So, for instance, in (11) there appear no readings where an existential quantifier takes scope over the quantifier in the subject or the negation.

(11) a. Each mobster went to prison.
     b. Most of them are in class.
     c. My seven sons attended college.
     d. Bob is not in bed.

Let us now return to the topic of weak definites. It turns out that this class of definites, once appropriately identified, shares precisely the same set of restrictions as the bare singulars. This is demonstrated in the examples below. We are not going to take the time here to establish that each instance of what we claim to be a weak definite is one; we are implicitly relying upon VP-ellipsis tests of the type described above in each instance. It is vital to note that, for the most part any noun or construction which allows for a weak reading also allows for a regular definite reading — there is a systematic ambiguity in other words (though a few highly colloquial exceptions to this have been identified, such as “the pokey” meaning “prison”, which has no regular definite reading). For instance, “the newspaper” has a weak reading but alongside it there is the possibility of a regular definite reading in all constructions. The weak reading only occurs under certain conditions whereas the regular reading may occur under all circumstances.
Like bare singulars, weak definites are lexically determined by the noun itself — it is a lexical property.

(12) a. He went to the hospital (wk) vs. He went to the building (no wk)
     b. Scarface is in the pen (wk) vs. Scarface is in the cage (no wk).
     c. They listened to the radio (wk) vs. They listened to the tape recorder (no wk).

Like bare singulars again, the weak reading disappears in the presence of modification6:

(13) a. He went to the 5-story hospital (no wk).
     b. They both checked the calendar that was hanging upside down (no wk).
     c. Each man listened to the red radio on the picnic table (no wk).

There is typically a certain amount of “semantic enrichment” associated with weak readings, in contrast to the regular definite readings.

(14) a. Going to the store is going to a store and more… (shopping)
     b. Being in the hospital is being in a hospital, and more… (healing)
     c. Looking at the calendar is looking at a calendar, and more… (gathering information)

And, like bare singulars, for a weak reading to appear the noun phrase must be appropriately “governed” by a set lexical item or a class of items determined by the identity of the noun. This is most often a preposition though verbs too may serve as governors.

(15) a. Kenneth is at the store (wk) vs. behind the store. (no wk)
     b. They took the crash victims to the hospital (wk) vs. past the hospital. (no wk)
     c. Sally checked the calendar (wk) vs. tore the calendar. (no wk)

As with the bare singulars, distributional restrictions preclude the usual tests for definiteness and indefiniteness. We do note, however, that weak readings of definites appear to take narrowest scope (if one considers them existentially quantified) with respect to other operators in the sentence.
In (16) we clearly see the possibility of distributed readings, in contrast to the examples in (17) which do not allow weak readings:

(16) a. Each man listened to the radio.
     b. Every professor went to the store.
     c. Four students were busy reading the newspaper.

(17) a. Each man scratched the radio.
     b. Every professor pulled the cart.
     c. Four students were busy watching the lawn mower.
4 Preliminary conclusions
At this point we claim to have isolated a class of noun phrases with definite articles which share the distributional and semantic properties of bare singulars in English7. Obviously, a detailed syntactic and semantic analysis is called for at some point. However, our aims in this paper are limited to establishing that there is a distinct subclass of definites in English. In the next section we turn to the question of whether this distinction can be behaviorally established, and, as we will shortly see, the experimental evidence supports this distinction as well. We are also going to experimentally evaluate a question which we could not resolve in the discussion above. One possible suggestion is that the weak definites are in fact indefinite NPs in disguise. That is, in "John read the newspaper" on the weak reading, the semantics is just that of "John read a newspaper". This idea is not as strange as it may seem at first sight, for from a strictly truth-conditional point of view the two are actually equivalent. If one says of John that he is "in bed", for instance, his presence in any bed with the intended purpose of rest will be sufficient. Or, if Mary went to the store (to shop), it need only be true that she went to some store or other. While one normally uses such constructions to indicate a bed, store, etc., which figure in the individual's habitual pattern of behavior, this is not a part of the assertion's actual truth conditions, and it ends up being truth-conditionally identical to an indefinite – modulo semantic enrichment, if indeed this is a part of the semantics and not just a (strong) implicature. We now turn to the experimental evaluation of this hypothesis. In the following sections, we will present empirical evidence for on-line processing differences between weak, or indefinite, definites and their more commonplace regular definite counterparts.
5 Experimental work

5.1 Background
Spivey, Eberhard, Sedivy, and Tanenhaus (2002) present a study elucidating the referential assumptions that are introduced by normal definite noun phrases beginning with the article “the.” In this study, participants were seated before a real-world display containing a group of three apples, a single apple sitting by itself on a towel, an empty towel, and an empty box. They were then given the spoken instruction to “put the apple on the towel in the box.” As they performed this task, their eye movements were monitored. Analyses of these eye movements revealed that upon hearing the definite determiner and the noun, participant attention was naturally drawn to the singleton apple, in spite of the fact that at this point, the instruction is still fully ambiguous as to which apple will be referred to (see figure 1). For example, the instruction may well have continued, “put the apple that’s the furthest to the left in the group of apples in the box.” Nevertheless, participants consistently (and correctly) ignored the group of three apples in favour of the singleton apple as soon as they had heard the definite NP.
Put the apple on the towel in the box
Figure 1. Example from Spivey, Eberhard, Sedivy, and Tanenhaus (2002)
The results of this experiment confirm the function of the definite article as put forth in Roberts (2003) – that is, that definite articles serve to pick out some sort of unique entity in the context. In light of their results, the Spivey et al. work can be seen as evidence that a normal definite article is automatically interpreted as referring to a "unique" entity in the context. Thus we should expect to see looks to items that can easily be isolated as having some unique property, which in the Spivey et al. experiment would be the apple that was separated from the group and sitting by itself.
Thus, the Spivey et al. experiment has provided us with a well-defined set of expectations of how regular definites will be processed on-line with respect to a certain context. However, given the consistent failure of indefinite definites to pick out a unique referent outlined in the previous section, we might expect them to behave differently in on-line tasks. Namely, we would expect that for indefinite definites, the tendency for a noun phrase of the form "the [noun]" to draw attention to singleton referents in the context should be lessened.

5.2 Our experiment
Our experiment was designed to determine whether indefinite definites would behave differently from regular definites during online referential processing. Specifically, we hypothesize that regular definites should draw participant attention to singleton targets, while indefinite definites will not.

5.2.1 Materials

We selected six nouns that often function as indefinite definites and matched them with comparable nouns that were obligatorily regular definites. The noun pairs were matched with verbs that were known to support the indefinite definite reading (as verified in our off-line judgement survey, described above) and placed into a sentential frame. This yielded a set of 6 pairs of matched experimental sentences: one version that contained an indefinite definite, and one version that contained a regular definite, but was otherwise identical (see table 2 for a full list of experimental materials). For each sentence pair, we constructed a visual context meant to depict the scene just before the action depicted in the sentence is carried out. The scene showed a human actor, and three tokens of the object that was to serve as the patient of the action. Two of these were clustered near each other in a group, while the third was alone and isolated from the group by some distance. Additionally, the scenes contained 3 distractor items that were not mentioned in the sentence, also presented in the form of a singleton and a small group of two (see figure 2). The arrangement of the items in the scenes was counterbalanced across items so as to avoid the possibility of participants coming to expect targets to appear in a particular location. This also served to avoid building participant expectations based on some interaction of object arrangement and actor eye gaze, body posture, etc.
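Counterbalancing of this kind is easy to make concrete. The following Python sketch is purely illustrative and assumes names of our own invention (the item list, the location labels, and the simple rotation scheme are placeholders, not the authors' actual stimulus software); it only shows one way target locations could be rotated across items so that no target type is tied to a fixed position in the display.

```python
# Illustrative only: a simple rotation of target locations across items,
# so that targets are not bound to one screen position.
# ITEMS and LOCATIONS are hypothetical placeholders, not the actual stimuli.
from itertools import cycle

ITEMS = ["radio", "window", "calendar", "phone", "door", "newspaper"]
LOCATIONS = ["upper-left", "upper-right", "lower-left", "lower-right"]

def assign_locations(items, locations):
    """Rotate target locations across items (a Latin-square-like scheme)."""
    rotation = cycle(locations)
    return {item: next(rotation) for item in items}

print(assign_locations(ITEMS, LOCATIONS))
```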
Table 2. Experimental materials

1. Regular definite: When she gets back from jazzercise class, Tammy will listen to the record.
   Indefinite definite: When she gets back from jazzercise class, Tammy will listen to the radio.
2. Regular definite: Later this afternoon, Marilyn will open the box.
   Indefinite definite: Later this afternoon, Marilyn will open the window.
3. Regular definite: Before she finalizes her plans, Tina will consult the map.
   Indefinite definite: Before she finalizes her plans, Tina will consult the calendar.
4. Regular definite: Before she has to go to school, Lisa will answer the letter.
   Indefinite definite: Before she has to go to school, Lisa will answer the phone.
5. Regular definite: When he is ready to go, Trevor will slam the lid on his way out.
   Indefinite definite: When he is ready to go, Trevor will slam the door on his way out.
6. Regular definite: After she finishes her breakfast, Lydia will read the book.
   Indefinite definite: After she finishes her breakfast, Lydia will read the newspaper.

Regular definites: "Lydia will read the book."   Indefinite definites: "Lydia will read the newspaper."
Figure 2. Scenes and spoken materials from the experiment
Crucially, however, the position of the actor and objects in the scenes remained constant across noun type conditions; that is, the indefinite definite version of an instruction was presented with exactly the same scene configuration as for the regular definite version. In this way we ensure that any differences observed in the processing and comprehension of indefinite vs. regular definites cannot be due to variations across conditions in target salience or proximity to the depicted actor.
5.2.2 Procedure

Participants saw the visual displays on a computer screen while they heard a pre-recorded spoken version of the sentence matched to the display from a nearby speaker. They had been instructed that after hearing the sentence, their task was to choose the item in the display that they thought was most likely to be involved in the upcoming action. By involving them in a task that forced them to referentially link the spoken materials to the provided visual context, we hoped to get an idea of the particular item that participants interpreted our nouns of interest as referring to. Given the referential properties of indefinite definites as well as the results of the Spivey et al. work, we expected that while regular definites would result in more participants choosing the singleton target as most likely to be involved in the action, indefinite definites should exhibit less of a tendency to be interpreted as referring to the singleton item. As participants were performing this task, we monitored their eye movements. A large body of work has established that eye movements are closely time-locked to spoken language comprehension and thus provide a useful tool for observing processes of reference resolution (Tanenhaus, Spivey-Knowlton, Eberhard and Sedivy 1995; Eberhard et al. 1995; Arnold et al. 2000; Runner, Sussman, and Tanenhaus 2003, inter alia). By analyzing the time-course of eye-movements participants make as the spoken instruction unfolds, we can get an idea of which items in the display are being considered at any given moment as referents for our target noun. A total of sixteen members of the University of Rochester community took part in the experiment. All had normal or corrected-to-normal vision. None of the participants had taken part in any of the earlier pilot versions of this experiment, or in the pen-and-paper survey reported above.

5.2.3 Results

This experiment yielded two types of results: target choice (member of the group target or the singleton target) and eye movements. For the indefinite definites, participants were much more likely to guess that a member of the group target would be involved in the upcoming action, choosing one of these items on 61% of trials. For regular definites, participants chose a member of the group target as involved in the action on only 33% of trials. This result illustrates two important aspects of indefinite definites. Firstly, target choice for indefinite definites and regular definites was significantly different, with indefinite definites eliciting more choices
of group targets (t1(15)=4.66, p<.001, t2(5)=3.45, p=.009). The second aspect to note is that for indefinite definites, choice of target item was equally distributed among the three available compatible targets, with each individual target being selected on a third of trials. For regular definites, a target item that was a member of the group target had only a 17% chance of being selected as the item most likely to be involved in the action described, while the singleton target had a 66% chance of being selected. In this way, regular definites exhibit a marked preference for the singleton target item, while indefinite definites fail to give rise to any specific expectation of which target item will be involved in the action. The analysis of the eye-movement data revealed a similar story. Here, during the window of time when participants would be hearing the target noun of the spoken materials, they were (marginally) significantly more likely to fixate the group target if they were hearing an indefinite definite noun phrase than if they were hearing a regular definite (t1(15)=1.09, p=.14, t2(5)=2.15, p<.05). The experimental evidence thus far supports the existence of two separate classes of definites; both target choice and eye-movements revealed a systematic difference between the regular and indefinite definites. This evidence is in accordance with the observations of semantic difference put forth in the first half of this paper. In addition to the hypothesized difference between regular and indefinite definites, a certain affinity between normal indefinites and indefinite definites had been predicted. In the interests of testing this prediction, the current experiment included a third condition, namely, one where the experimental materials contained a normal indefinite phrase (see figure 3).
Charlie will take a banana.
Figure 3. Scene and spoken materials for trials testing regular indefinites
Trials involving regular indefinites did not differ in form from the trials described above; participants were presented with a scene while they heard a sentence about an event that was about to occur. Their task was to select the item that they believed would be most likely involved in the upcoming event. Contrary to expectation, the results of the regular indefinite trials were markedly different from the indefinite definite trials. Participants were much more likely to select the singleton item as involved in the event for regular indefinites than for indefinite definites; while this target was chosen for regular indefinites on 89% of trials, for indefinite definites participants chose the singleton on only 39% of trials (t1(15)=6.89, p<.001, t2(5)=2.36, p<.05). Eye-movements also revealed a striking difference between the two conditions. During the region corresponding to the pronunciation of the noun in the experimental materials, participants were much more likely to be looking at the singleton target for regular indefinites (t1(15)=3.92, p<.001, t2(5)=2.87, p<.05), and at the group target for indefinite definites (t1(15)=5.65, p<.001, t2(5)=2.69, p<.05).
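Throughout these results, t1 and t2 are the familiar by-participants and by-items analyses (see note 2). For readers who want to see how such comparisons are typically computed, the following Python sketch gives the general recipe under assumed column names and an assumed input file; it is not the authors' analysis script, only a minimal illustration of treating first participants and then items as the random factor.

```python
# Illustrative sketch only: by-participants (t1) and by-items (t2) tests of
# target-choice data. The file and column names are hypothetical placeholders.
import pandas as pd
from scipy.stats import ttest_rel

# trials: one row per trial, with columns
#   participant, item, condition ('regular' or 'indefinite_definite'),
#   chose_group (1 if a member of the group target was chosen, else 0)
trials = pd.read_csv("target_choices.csv")

def paired_t(df, unit):
    # Average the binary choices within each unit (participant or item) and
    # condition, then compare the two conditions with a paired t-test.
    means = (df.groupby([unit, "condition"])["chose_group"]
               .mean()
               .unstack("condition"))
    return ttest_rel(means["indefinite_definite"], means["regular"])

t1 = paired_t(trials, "participant")   # participants as the random factor
t2 = paired_t(trials, "item")          # items as the random factor
print(t1, t2)
```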
6 Conclusions
The experiments reported here demonstrate that regular definites and indefinite definites constitute two separate and empirically distinguishable classes of noun phrase. They furthermore strongly suggest that the class of indefinite definites is also distinguishable from regular indefinites. Recognizing this subclass (and characterizing its boundaries), and pursuing a syntactic and semantic analysis for them, can now proceed on firmer ground than we might otherwise have had, with possible consequences for the overall treatment of definites. We also found out something that we could not easily evaluate using the direct evidence from grammaticality and meaning judgements; that is, that the weak or indefinite definites should not be accorded an analysis which identifies them with ordinary indefinite noun phrases. While the data we have worked with here have been exclusively from English, there are strong indications that similar subgroups can be found in other languages with definite articles.
Notes

1. This material was previously presented to the Linguistics Department at the University of Maryland, and we thank the audience for their helpful comments. Special thanks are due Paul Pietroski and Michael Israel for extended discussion. This material is based upon work supported by the NSF under Grant No. 0328849 (first author) and by NIH under Grant R01 HD27206 (second author).
2. t1 and t2 refer to statistical analyses that treat participants as the random factor, and experimental items as the random factor, respectively.
3. American and British English differ at least in the use of "university" and "hospital". Both are unacceptable in American English as bare singulars.
4. One reviewer noted the possibility of "He heard the program on local radio", which sounds just fine. We're not certain what is going on here, as in American English "?He heard the program on radio" has a marginal status that the reviewer's example does not. We also note that "local" does not generalize to other bare singulars as a modifier: "*The ship is in local port", "*Mary attended local class", etc.
5. While the "governing" item is typically adjacent, there are examples with certain nouns where matters are less clear:
   i) Prison is no place to make friends.
   ii) Class was really boring today.
   Such examples contrast with:
   iii) *Port is a good place for ships to arrive at.
   iv) *Foot is a hard way to travel long distances.
6. Purely affective modifiers, however, may appear:
   i) He's reading the ol' newspaper again.
   ii) So check the blasted/doggone... calendar again, OK?
   Even these, however, do not at all easily appear with bare singulars.
7. Chris Barker (p.c.) has independently noted that relational nouns function in the same 'weak' way. For instance, if a house is on "the corner", it is on one of four corners, three of which need not already have been eliminated from the discourse; it is truth-conditionally simply on a corner. One distinction we do note is that such relational nouns do not seem to distribute: to say "Every house is on the corner", the same corner must be involved for each, and not different corners, in contrast to the examples we are considering here.
References

Arnold, Jennifer, Eisenband, Jennifer, Brown-Schmidt, Sarah, and Trueswell, John 2000 The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition 76, 247-264.
Borthen, Kaja 2003 Norwegian Bare Singulars. Doctoral dissertation, NTNU, Trondheim.
Eberhard, Kathleen, Spivey-Knowlton, Michael, Sedivy, Julie, and Tanenhaus, Michael 1995 Eye movements as a window into real-time spoken language comprehension in natural contexts. Journal of Psycholinguistic Research 24, 409-436.
Heycock, Caroline and Zamparelli, Roberto 2003 Coordinated bare definites. Linguistic Inquiry 34, 443-469.
Kallulli, Dalina 1999 The comparative syntax of Albanian: On the contribution of syntactic types to propositional interpretation. Doctoral dissertation, University of Durham.
Munn, Alan, and Schmidt, Cristina 1999 Bare nouns and the morpho-syntax of number. Proceedings of the Linguistic Symposium on Romance Languages 1999.
Roberts, Craige 2003 Uniqueness in definite noun phrases. Linguistics and Philosophy 26, 287-350.
Runner, Jeffrey, Sussman, Rachel, and Tanenhaus, Michael 2003 Assignment of reference to reflexives and pronouns in picture noun phrases: evidence from eye movements. Cognition 89, B1-B13.
Spivey, Michael, Tanenhaus, Michael, Eberhard, Kathleen, and Sedivy, Julie 2002 Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. Cognitive Psychology 45, 447-481.
Stvan, Laurel 1998 The semantics and pragmatics of bare singular noun phrases. Ph.D. dissertation, Northwestern University.
Tanenhaus, Michael, Spivey-Knowlton, Michael, Eberhard, Kathleen and Sedivy, Julie 1995 Integration of visual and linguistic information during spoken language comprehension. Science 268, 1632-1634.
Animacy as a Driving Cue in Change and Acquisition in Brazilian Portuguese
Sonia M. L. Cyrino and Ruth E. V. Lopes
1 Introduction
The aim of this paper1 is to show that a feature that was relevant for language change is still operative in language acquisition, which should empirically confirm its importance for the change and its cue-like character, besides shedding light on the historical process. We focus here on the grammatical change that occurred in Brazilian Portuguese (BP) in object constructions, where the loss of the 3rd person clitic gave way to a null element in that position. We also examine present-day acquisition of the null category. Lightfoot (1999) has proposed that change in the frequency of some particular construction can serve as the triggering experience for grammatical change. At the same time this alteration is the cue for the acquisition of a different setting of a parameter. The effect on change is always seen a posteriori, hence features are assumed as relevant for it and their role in language acquisition is taken for granted. But their actual role in post-change language acquisition is seldom researched so as to confirm whether they still play a role in grammatical development, just as language acquisition data are seldom used to explain the change. That is our methodological path in this paper.

2 The null object in Brazilian Portuguese
As is well known, Brazilian Portuguese (BP) exhibits null objects in any syntactic context (1), as opposed, for example, to European Portuguese (EP), which, according to Raposo (1986), does not allow the null object in islands. Thus, a sentence like (2) is ungrammatical in EP, but grammatical in BP:
(1)
a. Comprei o casaco depois que experimentei . buy-1psPAST the coat after that try_on-1ps ‘I bought the coat, after I tried (it) on.’ b. Tirei o dinheiro do bolso e mostrei ao took-1ps the money from-the pocket and show-1ps to-the guarda. policeman ‘I took the money from my pocket and showed (it) to the policeman.’
(2)
O rapaz que trouxe agora mesmo da pastelaria era the boy that brought-3sg now just of-the pastry shop was teu afilhado your godson ‘The boy that brought (it) just now from the pastry shop was your godson.’
One striking aspect of BP null object is that it occurs more freely when the antecedent has a [-animate] feature: (3)
O Emilio perdeu [a carteira] e não consegue the Emilio lose_past_3sg [the wallet] and not can_pres_3sg achar /?ela em lugar nenhum. find_inf / it in place none 'Emilio lost his wallet and can't find it anywhere.'
(4)
A Clara não quer que [o filho] veja TV, the Clara not want_pres_3sg that [the son] watch_subj_3sg TV, então ela sempre leva */ele no parquinho. so she always take_pres_3sg (*him) in_the park_little 'Clara doesn't want her son to watch TV, so she always takes him to the playground.' If [+animate], the null object is also [-specific]:
(5)
a. O policial insultou [o preso] antes de torturar the policeman insult_past_3sg [the prisoner] before of torture_inf */ele. */ him ‘The policeman insulted the prisoner before torturing him.’
b. O policial insulta [presos] antes de torturar the policeman insult_pres_3sg prisoners before of torture_inf /? eles. /? them ‘The policeman insults prisoners before torturing them.’ Several proposals for the syntactic characterization of the null object in BP either as an empty pronoun (pro) or a variable have been proposed in the literature (cf. Raposo 1986; Galves 1987, 1989; Farrell 1990; Kato 1993; Bianchi and Figueiredo 1994; Kato 2000; Barra Ferreira 2000, among others). However, the examination of diachronic data (see below) and the observation that the null object makes strict and sloppy readings available (cf. (6)), led Cyrino (1997) to propose that the null object should be characterized as an instance of reconstruction at LF and deletion at PF (cf. Fiengo and May 1994): (6)
De noite, João liga seu aparelho de som, mas Pedro at night, João on_turn_pres_3sg his sound system, but Pedro desliga. . off_turn_pres_3sg (it) ‘At night, João turns on his sound system, but Pedro turns it off.’
The striking fact is that the sentence in (6) allows for strict (Pedro turns off João’s sound system) or sloppy reading (Pedro turns off his own sound system). However, if the null object is replaced by either a clitic or a full pronoun the ambiguity goes away, and only the strict interpretation is available: (7)
De noite, João liga seu aparelho de som, mas Pedro at night, João on_turn_pres_3sg his sound system, but Pedro desligao /ele. off_turn_pres_3sg itCL/it ‘At night, João turns on his sound system, but Pedro turns it off.’
We will rely on the findings reported above and compare the results of the diachronic study on the null object in BP (Cyrino 1997) with acquisition data, in order to examine how this kind of linguistic evidence is able to contribute to current linguistic theory relating syntactic change and acquisition.
3 Diachronic results
The diachronic study of null objects in BP (data from comedies and light plays, XVIth through XXth centuries2) shows that the null object construction increases through time, as can be seen in Table 1:

Table 1. Distribution of null vs. filled positions, in Cyrino (1997)

Century   null positions   filled positions   TOTAL
XVI       31 (11%)         259 (89%)          290 (100%)
XVII      37 (13%)         256 (87%)          293 (100%)
XVIII     53 (19%)         234 (81%)          287 (100%)
XIX       122 (45%)        149 (55%)          271 (100%)
XX        193 (79%)        51 (21%)           244 (100%)
The study also shows that there is an increase in the occurrences of the null objects with antecedents which are NPs [+specific, -animate] in the XIXth century, while the increase in the null objects with [-specific, +animate] antecedents happens only in the XXth century:

Table 2. Null objects according to specificity and animacy features in the antecedent. (Numerator = null; Denominator = null + overt objects)

Century   [+spec, +ani] NP   [+spec, -ani] NP   [-spec, +ani] NP   [-spec, -ani] NP
XVI       1% (1/78)          5% (3/61)          3% (1/8)           8% (2/26)
XVII      7% (2/31)          3% (2/69)          4% (1/24)          23% (15/61)
XVIII     5% (1/21)          8% (8/99)          0                  6% (2/32)
XIX       2% (1/46)          49% (37/75)        0                  8% (1/12)
XX        0                  87% (64/74)        57% (4/7)          93% (27/29)
An important observation brought to light by this study is that Portuguese has always allowed a construction dubbed as “propositional ellipsis” by Cyrino (1997), which could be replaced by a neuter clitic “o”, as in (8): (8)
Pedro pediu para ser o professor da turma Y, antes Pedro ask_past_3sg to be_inf the teacher of_the class Y, before de Jane (o) solicitar. of Jane (it) solicit_inf 'Pedro asked to be the teacher of class Y, before Jane asked for (it).'
However, Table 3 below shows that there is an increase for "propositional ellipsis" in BP, followed by a subsequent increase for the null object with the same type of ([-animate]) feature.

Table 3. Null objects (and ellipsis) according to type of antecedent, adapted from Cyrino (1997); VP-ellipsis and null complements in imperative clauses excluded. (Numerator = null; Denominator = null + overt objects)

Century   [+ani] NP    [-ani] NP      Propositional ellipsis
XVI       2% (2/86)    6% (5/87)      23% (23/99)
XVII      5% (3/55)    13% (17/130)   21% (14/68)
XVIII     5% (1/22)    8% (10/131)    45% (41/90)
XIX       1% (1/79)    44% (38/87)    83% (81/98)
XX        14% (4/28)   88% (91/103)   91% (97/107)
Cyrino (1997) observes that if using propositional ellipsis or the neuter clitic o in its place were just a matter of choice by the speaker - that is, for example, what Cyrino (1992) finds for EP – , we would expect no changes through time. But the data in BP shows that there is a change in the occurrence of this construction, as we see in the propositional column in Table 3, beginning in the XVIIIth century. In other words, in the XVIth century, one had the option of using or not using the neuter clitic, but the preference was for the clitic (77% of clitics in the data). But in the XXth century, the situation was reversed, with the preference for the ellipsis (9% of clitics in the data). Cyrino (1997) argues that the positive evidence for the child changed through time – she would hear more and more cases of ellipsis in a structure in which a neuter clitic was also allowed by the adult grammar. As a consequence, there is an extension of the possibility of the ellipsis/reconstruction analysis to other instances where the antecedent had the [animate] feature. Therefore, the null object in BP has appeared with the characteristic strict/sloppy ambiguity found in ellipsis constructions, as seen above. Thus it is a case of nominal ellipsis. The analysis for the null object in BP as ellipsis comes up due to historical facts, but also due to the possibilities for the interpretation of the empty category. Cyrino (1997) assumes that some pronouns which have low semantic value, such as it in English (9) and the neuter clitic o in Portuguese (8), can also be thought of as reconstruction at LF, and, because
of that, they can be null in languages which allow NP ellipsis. Such elements depend on their antecedent for the interpretation of their content: (9)
The man who gave his paycheck to his wife was wiser than the man who gave it to his mistress.
Cyrino, Duarte and Kato (2000) propose A REFERENTIAL HIERARCHY in order to explain the fact that the less referential items, which were lower in the hierarchy, were the first to become phonetically null, while the more referential items, higher in the hierarchy, were the first to become phonetically overt.

REFERENTIAL HIERARCHY
non-arguments < propositions < 3rd person [-animate] < 3rd person [+animate, -human] < 3rd person [+human] < 2nd person < 1st person
[-specific], [-referential] <------------------------------------------------> [+specific], [+referential]
Thus, [+N, +animate] arguments are in the highest position in the hierarchy, and non-arguments in the lowest one. Regarding pronouns, the speaker (= I) and the hearer (= you), being inherently human, are in the highest position and the third person that refers to a proposition is in the lowest position, with the [-animate] entity in the middle. The [±specific] features interact with all these features. The authors show that the animacy feature is relevant both to the loss of the null subject and to the appearance of the null object in BP. They propose that this Referential Hierarchy would be operative during language acquisition and, according to the data the child has access to, she will presume the pronouns in the language are null or overt. For the object, specifically, the authors predict that if the input exhibits a pronoun or a clitic in a lower position of the hierarchy, the child would consider it a weak pronoun in either a head or argument position; therefore, all the higher positions would also be lexical pronouns or clitics (e.g. English, European Portuguese). However, if the input shows a null object for a referential entity, say, for an [-animate] entity as in BP, the child assumes that all lower positions can be null. Thus, for a language that has the internal option for full or empty categories, one of the factors that can influence the choice is the animacy status of the antecedent. In this paper we add another feature to the Hierarchy, namely, α- and β-occurrences. According to Fiengo and May (1994), nominal elements may
be independent (α-) or dependent (β-) occurrences. α-occurrences are highly referential – they have their referentiality directly determined, independently of their syntactic position. Strict and sloppy readings stem from α-indexation and from β-indexation of pronouns, respectively. This is so because reference for α-occurrences is established independently for each occurrence, even if they are coindexed, whereas β-occurrences are indexical dependencies, being well formed if there is another occurrence with the same index value that they can depend on. Hence, a pronoun with a β-type index gets its reference from the element it is connected to. Thus, the revised Hierarchy would be as follows:

REFERENTIAL HIERARCHY (REVISED)
non-arguments < propositions < 3rd person [-animate] < 3rd person [+animate, -human] < 3rd person [+human] < 2nd person < 1st person
[-specific], [-referential], β-occurrences <------------------------------------> [+specific], [+referential], α-occurrences
The diachronic results shown above and the Referential Hierarchy proposed raise some questions concerning language acquisition, once we adopt the view that the child extended the ellipsis possibility to the structure of the other pronouns whose antecedent also had the [+specific, -animate] features. The natural question then is: are such features relevant for language acquisition, so that they could have started cuing the child's grammar, eventually leading her to consider structures with the other 3rd person clitics as structures allowing ellipsis? We turn now to the examination of acquisition data.

4 Acquisition results
Lightfoot (1994: 130) states that “there can be no change in grammars without change in trigger experiences”. In his view, shifts in trigger experience consist in changes in frequency, “changes resulting from the way that grammars were used rather than changes in the grammars themselves”. Such shifts may become critical for language acquisition, cuing a new grammar. That seems to be the case at hand. As we have seen, the historical data suggest that the positive evidence for the child changed through time – she heard more and more cases of
ellipsis in a structure in which a neuter clitic used to be allowed. This constitutes a shift in frequency, which, in turn, given UG architecture (the Referential Hierarchy), cued the child in extending the ellipsis possibility to the structure of the other pronouns whose antecedent also had the [+ specific, - animate] features. According to the analysis presented here, such expressions are the result of reconstruction of the antecedent at LF and can be elided at PF because their referentiality is very low. On the other hand, the strong pronoun, being high in referentiality, is the “audible” realization of the features of the antecedent. We have to bear in mind, then, that this is the picture for acquisition and from it one should expect that children would use the null option from the onset. We should also bear in mind that our aim is to check whether such features, which seemed to be relevant for the shifts in the child trigger experience, still play a role in their acquisition of the null object nowadays. In other words, could those features be the cue for grammar convergence? The spontaneous speech production of two children was examined. They are both daughters of highly educated parents. One of them, R., from São Paulo – a southeastern state of Brazil – was recorded from 1;9 to 2;8 years of age. The other, AC, from Rio Grande do Sul – the extreme southern state of the country – was recorded from 1;8 to 3;7 years of age. With regard to the null object, there are no observable dialectal differences between these two varieties of Brazilian Portuguese. For the analysis of the data, only transitive, ditransitive and ECM verbs were considered, those that in other Romance languages would require a clitic in anaphoric complements. Categorically null objects, such as in sentence ellipses (10) or short answers (11) were disregarded: (10) A(dult): E o que acontece na história do Príncipe do Egito? ‘And [what happens in the story of the Egyptian prince]?’ C(hild): Já esqueci . (AC, 3;7) already forget_past_1sg ‘I’ve already forgotten it.’ (11) A: A senhora aceita um suco? (adult and child pretend to host a tea party) ‘Would you, madam, like a glass of juice?’ C: Aceito . (AC, 2;1) accept-_1sg ‘Yes, I do.’
Table 4 presents the overall results for both children. We can see that although both of them use null objects, they are still quantitatively far from the adult expected grammar, where null objects reach around 60% and strong pronouns 15%, according to Duarte (1986). We will return to that point.3

Table 4. Overall results for both children

Null          Strong pronouns   DPs + bare Ns   Total
N     %       N     %           N     %         N     %
275   29.2    93    9.8         575   61        943   100
Considering only the null and pronominal realizations of the object, we reach the figures in Table 5 below, which show that when DPs and bare Ns are excluded, and the option is between a strong pronoun or a null category, the child's preference for the null becomes clear. But as we will discuss below, this does not mean that the child's null is always the same one.

Table 5. Mean results for null and pronominal realizations of the object

Child   Null           Strong pronoun   Total
        N      %       N      %         N      %
R.      134    75.2    44     24.8      178    100
AC      141    74.2    49     25.8      190    100
Both    275    74.7    93     25.3      368    100
The results on Tables 6 and 7 below show an increasing pattern of the use of pronouns over time, provoking a decrease in the use of nulls. Both children start out with 100% of nulls. But does it mean we are dealing with one and the same null category or does its status change over time? Tables 6 and 7 show that both children start out with a production of 100% of null objects, as we have already pointed out. Obviously such a figure decreases when pronouns kick in. For AC, it happens when she is 2;3 and for R., when she is 1;10. Looking at the data, what we see is that the initial null objects are instances of deictic-like elements, but when pronouns start to be produced in object position, the null category becomes anaphoric.
Table 6. Null and pronominal objects for each child over time (AC)

Age        Null            Pronoun          Total
           N      %        N      %         N      %
1;8-1;9    6      100      0      0         6      100
1;10       1      100      0      0         1      100
2;1        6      100      0      0         6      100
2;3        22     85       4      15        26     100
2;8        30     73       11     27        41     100
3;0        50     64       28     36        78     100
3;7        26     81       6      19        32     100
Total      141    74.2     49     25.8      190    100
Table 7. Null and pronominal objects for each child over time (R.)

Age        Null            Pronoun          Total            Total, both children
           N      %        N      %         N      %         N
1;8-1;9    3      100      0      0         3      100       9
1;10       12     75       4      25        16     100       17
2;1        62     69.7     27     30.3      89     100       95
2;3        50     84.7     9      15.3      59     100       85
2;8        7      64       4      36        11     100       52
3;0        -      -        -      -         -      -         78
3;7        -      -        -      -         -      -         32
Total      134    75.3     44     24.7      178    100       368
This should be clearer when we cross such results with the animacy feature of the antecedent. For now, let us look at some examples, comparing (12) – a deictic use of null – to (13), an anaphoric null: (12) a. Garda (= guarda) aqui. (R., 1;9) keep_imp here 'Keep it here.' (The child says the sentence while holding her pacifier, obviously referring to it.) b. Tila (= tira) umbassu (= embaixo). (R., 1;9) take_imp from_under 'Take it from under (the tape recorder)'. (When the child asked her mother to keep the pacifier, the mother placed it behind
the tape recorder. Now the child points to the pacifier while asking for it.) (13) Não vou guardar . (AC, 3;7) not will_1sg keep 'I won't put them away.' (referring to her toys. The child wants to watch a movie on TV, so she comes to her mother in order for her to turn the TV on. But the mother knows that the child was playing in her room and that there are toys all over the place. So her mother says: Put your toys away and then you may watch the movie. The child walks away, while muttering the sentence in (13).) Considering only the null objects, Table 8 presents the results for the semantic features [+animate] and [+specific] of the antecedent:

Table 8. Null objects according to the semantic features of the antecedent for one child (AC), during development. (Numerator = null; Denominator = null + pronominal)

Age    [-animate]       [+animate]     [+specific]      [-specific]
1;8    100% (2/2)       0              100% (2/2)       0
1;10   100% (1/1)       0              100% (1/1)       0
2;1    100% (3/3)       0              100% (3/3)       0
2;3    100% (17/17)     50% (1/2)      88.9% (8/9)      100% (10/10)
2;8    78% (25/32)      100% (1/1)     72% (18/25)      100% (9/9)
3;0    66.7% (36/54)    25% (3/12)     53.5% (30/56)    100% (9/9)
3;7    61.5% (16/26)    42.8% (3/7)    51.7% (15/29)    100% (4/4)
The most important result to be brought to light is the high percentage of [-animate] null objects, especially with [+specific] antecedents (70 instances over 30 with the [-specific] feature). Comparing Table 8 above to Table 2, we see that the results for the [-animate] feature are close to the XXth century data, as expected, while the unexpected case is for the [+animate] feature. When [+specific], the child should use a filled element and not the null. In any event, there are only 8 instances of such antecedents recovered by a null element (cf. numerators of second column in Table 8). This is probably an overgeneralization of the animacy feature, which seems to be the real cue for the acquisition not only of the null element, but also for the object pronominal system as a whole. Meanwhile, the child still has to deal with the specificity feature. This should explain why the child's grammar is still quantitatively far from the adult one. It also points to a piecemeal process involving semantic
interpretation and the referentiality hierarchy – fine-grained, subtle differences for the child to grasp. Turning to the use of pronouns, on the other hand, we get a neater picture. The [+animate] feature on the antecedent was divided into human and non-human. The non-human cases are the 8 instances in Table 8; as to the human ones, there are 14 instances (63.6%) all realized as a strong pronoun; nevertheless they are produced later. (14) E sabe quem pegou ele no final? (AC, 3;7) and know who catch_past_3sg him in the end 'And do you know who finally caught him?' (Referring to a baby) As we pointed out before, age 2;3 seems to be the critical period for the acquisition of the null object for AC. That is the age group when pronouns start to be used; therefore, it is the period when animacy of the antecedent becomes expressible by the child. According to our hypothesis, then, that is the period in which the child moves away from the deictic null category towards an adult-like representation of the null object (see examples 12 and 13 above, for the distinction) cued by the semantic features on the antecedents. Table 9 shows the results for R., this time with both animacy and specificity features for the antecedents.

Table 9. Null objects according to the semantic features of the antecedent for one child (R.), during development. (Numerator = null; Denominator = null + pronominal)

Age       [-anim/+spec]      [+anim/+spec]    [-anim/-spec]   [+anim/-spec]
1;9       100% (4/4)         0                0               0
1;10      92.3% (12/13)      0% (0/2)         0% (0/1)        0
2;1       75.3% (52/69)      57% (4/7)        46% (6/13)      0
2;3       95.5% (42/44)      45.5% (5/11)     33.3% (1/3)     0
2;8       70% (7/10)         100% (1/1)       0               0
Average   83.6% (117/140)    47.6% (10/21)    41.2% (7/17)    0
R. starts using pronouns in object position quite early, when she is 1;10. Thus, this seems to be the relevant age in which the following correlations apply: [+animate] antecedents are mostly expressed by a strong pronoun, and the null element is generally used to express [-animate] antecedents.
That is probably the age in which the child starts to move away from the deictic-like null to the anaphoric one. Although the age in which such phenomena crosscut the data is different for each child, the same strong correlations apply; in other words, the data comparison reveals a clear acquisition pattern. Next, Table 10 focuses on the comparison of the semantic features of antecedents for null in both children, considering their averages for all ages, to the historical results for the XXth century:

Table 10. Mean percentages of null for each child and results for XXth century

Child   [-anim/+spec]      [+anim/+spec]   [-anim/-spec]   [+anim/-spec]
AC      66.6% (70/105)     33.4% (8/22)    100% (30/30)    0
R.      83.6% (117/140)    47.6% (10/21)   41.2% (7/17)    0
XXth    87% (64/74)        0               93% (27/29)     57% (4/7)
The clearest result involves [-animate, +specific] features. Undoubtedly those are the most relevant features and the first ones to be manifested in acquisition, probably due to their cuing effect. However, it should be noted that the deictic-like nulls found in initial production fall into this category. As should be expected, the problem lies with the [+animate] feature. Children still use null categories for the [+animate, +specific] ones whereas adults would probably prefer a pronoun, although the percentages are not very high. The unexpected results have to do with R.'s low production of nulls with [-animate/-specific] antecedents. As for the [+animate/-specific] antecedents, there are few instances of them even in the historical data; therefore it doesn't seem to be a productive scenario for nulls, which should explain their absence in the children's data. We would like to explain these results by resorting to the hypothesis that children start out with a general β-indexation, which is taken here as an across-the-board initial strategy. Indeed, Foley et al. (2003), in an experimental study on VP-ellipsis knowledge in small children (aged 3;0-7;11) acquiring English, demonstrate that 59% of their subjects (total of 86 children) show sloppy readings only; 31% of them present both readings. Moreover, the authors show that there is a developmental effect in which strict readings or acceptance of both readings appears later and the strict interpretation is accessed less often. Additional support for our hypothesis comes from Thornton and Wexler (1999). Their results show that children first interpret pronouns as β-occurrences,
that is, as dependencies, even in structures where they are clearly α-occurrences, as in (15), where the pronoun cannot be construed as anaphoric to the subject in the adult grammar: (15) Mamma Bear washes her. This also occurs when there is VP ellipsis: (16) Pappa Bear licked him and Brother Bear did, too. (did too = licked him, Brother Bear). If the child β-indexes across-the-board, then she does so even when the antecedent bears the [+specific] or [+animate] feature values, which should be α-occurrences, high in the referential hierarchy in the adult grammar, as in: (17) João1 viu ele. (ele = o João1 in child grammar; o Pedro2 in adult grammar) João see-past-3sg him 'João saw him.'
Given that: a) the child initially β-indexes across-the-board; b) β-occurrences are low in the Hierarchy as opposed to α-occurrences; c) BP data available for the child show null elements in the low positions of the Hierarchy, one can hypothesize that in a language like BP, the child will first produce null objects even for [+animate, +specific] antecedents. Results in Table 10 are thus explained. When the strict reading becomes available, and, therefore, the α-indexation for [+specific] antecedents, then null decreases in such contexts, and the child converges into the target grammar. The acquisition data also suggest that diachronic change was possible due to the child's initial tendency for β-indexation, the existence of propositional ellipsis, and the Referential Hierarchy. Given the Referential Hierarchy and the input already providing evidence for the [-animate] cases, the child consequently extended the possibility of ellipsis to [+specific] elements as well. If UG has to deal with semantic features that are rather subtle for interpretation, then it seems quite natural that the child will start out with the least referential elements, according to the Referential Hierarchy. This
is a justified reasoning not only for BP but also for English, as can be seen in Table 11.

Table 11. Adapted from Fujino and Sano (2002: 7), Table 5

Child   Age        "it"/other pronouns        Null
E.      1;6        37 (53.6%) / 3 (4.3%)      4 (5.8%)
N.1     1;11       29 (18.3%) / 2 (1.3%)      3 (1.9%)
N.2     1;6-1;10   8 (30.7%) / 1 (3.9%)       1 (3.9%)
As seen above, we assume that some pronouns which have low semantic value, such as it in English and the neuter clitic o in EP, can also be thought of as reconstruction at LF, and as a consequence, they can be null in languages which allow nominal ellipsis – the BP case. Table 11 clearly shows the child's initial use of a neuter pronoun, exactly the picture found for BP, with the exception that in BP a null category is the natural choice in the paradigm. We can also predict the extension of the [-animate/+specific] features: if the null started out as a propositional null and extended to other objects with the same features, those should be the first anaphoric uses of null object by the child.

5 Conclusion
In this paper, we have shown that a feature that was relevant for a change in BP is still operative in language acquisition. We have assumed here, after Lightfoot (1999), that changes are not always grammar-driven. What happened in BP was a shift in the frequency of use of the neuter clitic o – having propositions as antecedent – and its null counterpart. Once the null was high enough, probably around the XXth century on, the animacy/specificity feature was extended to other null elements, working as a cue for the new grammar to be set. We saw a shift in trigger experience consisting of changes in frequency (of language use) that then drove a change in the grammar. That shift became critical for language acquisition, cuing a new grammar (Lightfoot 1994). In our case, the semantic features of the antecedent were the driving cue.4 Only by crossing diachronic and acquisition data does it become possible to thoroughly examine the role of a cue in change.
We believe there are important theoretical consequences to our proposal. The first one is that we may take cue-based theories seriously and try to show how a cue can be operative after a change occurred in a language, explaining the change itself. Secondly, it places some questions about acquisition proper within the generative framework. It seems clear that semantic features drove the change at hand and play a role in the acquisition of the object pronominal paradigm in BP. The question then is how languages select non-formal features for parameter (re)-setting. On the other hand, if semantic features are, by definition, interpretable at LF, then LF should also be taken into account as input to language acquisition.

Notes

1. The work reported here was supported in part by grants of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/Brazil) for both authors. We thank the audience at the International Conference on Linguistic Evidence held in Tübingen and an anonymous reviewer for their valuable comments. All remaining errors and shortcomings are the sole responsibility of the authors.
2. The data for the historical study come from the following comedies and light plays:
   a) XVIth century: 1. Dois autos de Gil Vicente ("da Mofina Mendes" and "da Alma"), edited by Sousa da Silveira, with fac-símile of the 1562 edition. Ministério da Educação e Cultura - Fundação Casa de Rui Barbosa, 1973. 2. Gil Vicente, Obras Completas, notes by Marques Braga, Lisboa, Livraria Sá da Costa, 1968. 3. Camões, Comédias, edited by Paulino Vieira, São Paulo, Nova Era, 1923.
   b) XVIIth century: 1. Gregório de Matos, Obras Completas, São Paulo, Edições Cultura, 1945.
   c) XVIIIth century: 1. Antonio José da Silva (o Judeu), Obras Completas, notes by José Pereira Tavares, Lisboa, Livraria Sá da Costa, 1957. 2. Caldas Barbosa, Viola de Lereno, notes by Francisco de Assis Barbosa, Rio de Janeiro, Imprensa Nacional, 1944.
   d) XIXth century: 1. Martins Pena, O Juiz de Paz da Roça and O Judas no Sábado de Aleluia, edited by Amália Costa, Rio de Janeiro, Organizações Simões, 1951. 2. Comédias de Martins Pena, edited by Darcy Damasceno with the collaboration of Maria Filgueiras, Rio de Janeiro, Edições de Ouro, 1966. 3. José de Alencar, O Demônio Familiar, Rio de Janeiro, Ministério da Educação e Cultura, Serviço de Documentação, Departamento de Imprensa Nacional, 1957. 4. Arthur Azevedo, O Tribofe, edited by Rachel Teixeira Valença, Rio de Janeiro, Nova Fronteira, 1986.
   e) XXth century: 1. Marques Rebelo, Rua Alegre, 12, Curitiba, Editora Guaíra, 1940. 2. Dias Gomes, O Pagador de Promessas, Rio de Janeiro, Bertrand Brasil, 1987. 3. Gianfrancesco Guarnieri, Um Grito Parado no Ar, São Paulo, Monções, 1973. 4. Miguel Falabella, No coração do Brasil, ms., 1992.
3. For a discussion about the high percentages of anaphoric DPs in early child language, see Lopes (2003). We will not explore these findings here.
4. An anonymous reviewer has pointed out that the 'animacy' feature that plays a role in change and language development could be taken as evidence for a more general conclusion, namely, that cognitive factors may be instrumental in both types of changes. We acknowledge that it might be so, but we do not want to commit ourselves to such a view here.
References

Barra Ferreira, Marcelo 2000 Argumentos nulos em português brasileiro. M.A. thesis, UNICAMP, Brasil.
Bianchi, Valentina and Maria Cristina Figueiredo Silva 1994 On some properties of agreement-object in Italian and Brazilian Portuguese. In Mazzola, Maryann (ed.), Issues and theory in Romance linguistics, pp. 181-197. Washington, DC: Georgetown University Press.
Cyrino, Sonia Maria Lazzarini 1997 O objeto nulo no português do Brasil - um estudo sintático-diacrônico. Londrina: Editora da UEL.
Cyrino, Sonia, Maria Eugênia Duarte and Mary Kato 2000 Visible objects and invisible clitics in Brazilian Portuguese. In Mary Kato and Esmeralda Negrão (eds.), Brazilian Portuguese and the Null Subject Parameter, pp. 55-74. Frankfurt am Main: Vervuert.
Duarte, Maria Eugênia Lamoglia 1986 Variação e sintaxe: clítico acusativo, pronome lexical e categoria vazia no Português do Brasil. M.A. thesis, PUC, São Paulo.
Farrell, Patrick 1990 Null objects in Brazilian Portuguese. The Linguistic Review 8: 325-346.
Fiengo, Robert and Robert May 1994 Indices and Identity. Cambridge: MIT Press.
Foley, Claire, Zelmira del Prado, Isabella Barbier and Barbara Lust 2003 Knowledge of variable binding in VP-ellipsis: language acquisition research and theory converge. Syntax 6: 52-83.
Fujino, Hanako and Tetsuya Sano 2002 Aspects of the null object phenomenon in child Spanish. In Juana Liceras and Ana Teresa Perez-Leroux (eds.), The Acquisition of Spanish Morphosyntax. Dordrecht: Kluwer.
Galves, Charlotte 1987 A sintaxe do português brasileiro. Ensaios de lingüística 13: 31-50.
Galves, Charlotte 1989 Objet nul et la structure de la proposition en Portugais Brésilien. Revue des Langues Romanes 93: 305-336.
Kato, Mary 1993 The distribution of pronouns and null elements in object position in Brazilian Portuguese. In William Ashby, Marianne Mithun, Georgio Perissinoto and Eduardo Raposo (eds.), Linguistic Perspectives on the Romance Languages, pp. 225-235. Amsterdam: John Benjamins.
Kato, Mary 2000 Pronomes fortes e fracos na sintaxe do português brasileiro [Strong and weak pronouns in Brazilian Portuguese syntax]. Ms.
Lightfoot, David 1994 Shifting triggers and diachronic reanalyses. University of Maryland Working Papers in Linguistics 2: 110-135.
Lightfoot, David 1999 The Development of Language: Acquisition, Change, and Evolution. Oxford: Blackwell.
Lopes, Ruth Elizabeth Vasconcellos 2003 The production of subject and object in Brazilian Portuguese by a young child. Probus 15: 123-146.
Raposo, Eduardo 1986 On the null object in European Portuguese. In Osvaldo Jaeggli and Carmen Silva-Corvalán (eds.), Studies in Romance Linguistics, pp. 373-390. Dordrecht: Foris.
Thornton, Rosalind and Kenneth Wexler 1999 Principle B, VP Ellipsis and Interpretation in Child Grammar. Cambridge: MIT Press.
Aspectual Coercion and On-line Processing: The Case of Iteration
Sacha DeVelle
1 Introduction
Aspectual coercion refers to an inferential process not explicitly stated in surface form structure. A prime example of this process is an iterative interpretation, that is, the encoding of a series of repetitions within a given situation, rather than a protracted event over time. The real-time processing implications of aspectual coercion in general, and iteration in particular, have been examined in recent on-line research. Evidence from two studies (Piñango, Zurif, and Jackendoff 1999; Todorova, Straub, Badecker, and Frank 2000) shows a processing load that emerges at or just after the durational adverbials for and until, following punctual verbs. Although these findings show where the coercive effect emerges, how it is triggered is still an open empirical question. Results from the study reported here show that plausibility constraints may provide an alternative explanation for the increased processing load reported in earlier work. More stringent controls in a) formal descriptions of iteration and b) sentence stimuli design are put forward as necessary features for future experimentation. On-line processing effects of coercion are discussed in light of Jackendoff's (1999, 2002) enriched composition hypothesis and the tripartite parallel architecture system.

2 Aspectual coercion
The interpretation of sentential meaning involves a complex interplay of verb class, arguments and/or other aspectually sensitive elements as integrative features of non-linguistic (conceptual) sentence level representations. A prime example of this integration is aspectual coercion. Take the following sentences:
(1) a. The author began the book about tribalism in Africa.
    b. The student borrowed the book for a week.
    c. The athlete won the competition for two years.
    d. The tourist photographed the sunset until nightfall.
The ultimate interpretation of each sentence is the result of an inferential process that goes beyond the information explicitly stated in surface form structure. In sentence (1a) the semantic combinatorial effects of the verb (begin), subject (author) and object noun phrase (book) yield an implicit complement reading interpretation that is The author began (writing/to write) the book. In sentence (1b) the effects of the durational adverbial for a week trigger a time period that follows the initial reading of The student borrowed the book. The example in (1c) suggests a sense of repetition via an achievement verb (won) and a singular object (the competition) across a prolonged time span (for two years). Sentence (1d) invokes a similar sense of repetition, within an immediate time span. The interactive effects of point action verb class (photographed), singular object (the sunset) and a durational adverbial (until nightfall) result in a repetitive effect absent from other class combinations (e.g., the same object and adverbial with an activity verb class such as The tourist watched the sunset until nightfall).

2.1 Iteration
Iteration describes the encoding of a series of repetitions within a given situation, rather than a single protracted event over time. An iterative reading is particularly enhanced by the semantic punctual feature of point action verbs (jump), a sub-class1 that has traditionally fallen under the rubric of achievement verbs. In general terms both verb types describe punctual events. However, an achievement verb has an inherent (telic) endpoint that encodes a change of state (to lose) or an inceptive meaning that focuses on the beginning of an action (to enter). In comparison, a point action verb lacks such complex meaning, has an arbitrary (atelic) final point and can terminate at any time. It may reflect either a single act (dive) or an iterative act (knock) (Frawley 1992). The conceptual expectations triggered by singular and iterative scenarios reflect an operation that re-aligns temporal differences between aspectual properties of the verb and external modifiers that place a temporal bound at the sentential level. The durational adverbials for and until are prime examples of external modifiers (The man knocked vs. The man knocked for a minute / until he fell asleep).2 Importantly for the present discussion, the English tense system does not incorporate an iterative morpheme that encodes the notion of repetition. The coercive process is purely semantic in nature. Furthermore, all the above combinatorial effects involve extra processing capabilities needed to integrate such features in the course of comprehension. The resulting sentence is not ungrammatical but often gives the comprehender pause (e.g. The girl dived in the pool for five minutes).
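The combinatorics just described, a punctual verb whose temporal profile clashes with a durational adverbial, can be made concrete in a small sketch. The verb-class labels and the tiny lexicon below are illustrative assumptions of mine, not a claim about any published implementation; the sketch merely flags the verb-adverbial combinations for which an iterative (coerced) reading would be expected.

# Toy sketch: flag verb/adverbial combinations that should trigger iterative
# coercion (point action verb + durational adverbial). The verb-class
# assignments below are illustrative assumptions, not an established lexicon.

VERB_CLASS = {
    "snoozed": "activity",          # durative, atelic
    "watched": "activity",
    "curtseyed": "point_action",    # punctual, atelic
    "photographed": "point_action",
    "knocked": "point_action",
    "won": "achievement",           # punctual, telic (change of state)
}

DURATIONAL_ADVERBIALS = ("for", "until")

def needs_iterative_coercion(verb: str, adverbial: str) -> bool:
    """True if a durational adverbial imposes a temporal bound that a
    point action verb can only satisfy by being read as a repeated event.
    Achievement verbs are excluded here: they may pattern either way
    (repetition as in (1c), or a result-state reading as in 'John left
    for a week')."""
    punctual = VERB_CLASS.get(verb) == "point_action"
    durational = adverbial.split()[0] in DURATIONAL_ADVERBIALS
    return punctual and durational

if __name__ == "__main__":
    for verb, adv in [("snoozed", "for a long time"),
                      ("curtseyed", "for a long time"),
                      ("photographed", "until nightfall")]:
        reading = ("coerced (iterative)" if needs_iterative_coercion(verb, adv)
                   else "simple composition")
        print(f"{verb} + {adv} -> {reading}")

The point of the sketch is only that the trigger is configurational: nothing in the surface morphology marks the repetition, so the re-interpretation must be computed from the verb class and the modifier.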
3 Linguistic models of coercion
Linguistic and computational approaches to aspectual coercion have assumed either a compositional process (Krifka 1998; Pustejovsky 1998; Jackendoff 1997, 2002) or what I will call a transitional operation (Moens and Steedman 1988; De Swart 1998). Compositional and transitional models are similar in key respects. Both advocate a) a type-shifting operation that aligns semantic mismatches between the verb and an external modifier, and b) an end result that involves a process of contextual re-interpretation. The two differ in emphasis, however: transitional models emphasize silent syntactic and semantic operators that provide aspectual transitions between meanings. There is less emphasis on the lexical decomposition evident in compositional approaches, and a greater focus on grammatical aspect.3 Compositional models, in contrast, place particular emphasis on the role of lexical semantics and the relationship between semantic and pragmatic structure. Lexical decomposition, which concerns the internal structure of nouns and verbs, and the role of external modifiers are integral factors in the coercive process. Compositional approaches are particularly concerned with the role of conceptual integration at the sentential level (Jackendoff 2002; Pustejovsky 1995). In particular, Jackendoff's (1999, 2002) parallel architecture (PA) model explains the sense of repetition necessary for iteration (1d) in terms of access to a conceptual structure (CS) module. Linguistic structure is represented by three independent (phonological, syntactic and semantic/conceptual structure) generative components. Importantly for the present discussion, all three levels are linked via bi-directional interfacing constraints and so can function independently from surface structure. The CS represents the necessary interface with language structures that encode pragmatics, inferences and world knowledge. For example, the sense of repetition that emerges from The tourist photographed the sunset until nightfall describes an enriched compositional process (i.e., multiple photographs) that is accessed via the CS. The enriched compositional hypothesis applied to iteration has recently been evaluated from an on-line processing
perspective (Piñango, Zurif, and Jackendoff 1999) and will be discussed below.

4 On-line processing evidence for iteration
Linguistic evidence for coercion has traditionally relied on data from truth conditional tests such as paraphrasing and adverbial tests (Dowty 1979). However, recent work in on-line experimental settings has investigated the possible processing costs of aspectual coercion. Two studies in particular have examined the role of iteration4 at the syntax-semantics interface (Piñango et al. 1999; Todorova et al. 2000). Evidence from cross modal lexical decision (CMLD) interference and reading time tasks has demonstrated where (within the sentence) the coercive process emerges. However, how this process is computed on-line is still an open empirical question.

4.1 Cross modal lexical decision time evidence for coercion
Piñango et al. (1999) used a CMLD interference task to test processing differences between paired sentences that varied only by verb type (e.g., activity and point action) as demonstrated in the following examples:

(2) a. The little girl snoozed for a long time^ after the grown-ups left the room.
    b. The little girl curtseyed for a long time^ after the grown-ups left the room.
Sentences were presented over headphones. At a certain position within the sentence (250 msec. after the durational adverbial), a probe appeared on the screen. The probe was either a word or a non word. Participants were instructed to respond to the probe by pressing yes (a real word) or no (a non word) via the mouse. The results showed an increased processing load, demonstrated by longer reaction times, for those sentences with point action verbs (2b). The authors interpreted this result as evidence for the enriched composition hypothesis (Jackendoff 2002). In (2b), the punctual (non-durative) feature of the point action verb is coerced via the temporal bound introduced at (or just following) the durational adverbial. The verb curtseyed is reinterpreted as an ongoing, repetitive process, independent of any syntactic or morphological marker. This is a prime example of enriched composition.
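To make the procedure concrete, here is a schematic sketch of the trial logic: the visual probe is scheduled 250 msec after the durational adverbial, and the latency of the secondary lexical decision to that probe is the dependent measure. The field names, the example timing value other than the 250 msec offset, and the scoring function are illustrative assumptions; this is a sketch, not the original presentation software.

# Schematic sketch of a CMLD interference trial (illustrative, not the
# original software). Probe onset = end of the durational adverbial + 250 ms;
# the reaction time to the secondary lexical decision is the measure of
# interest.
from dataclasses import dataclass

@dataclass
class CMLDTrial:
    sentence: str              # auditory sentence presented over headphones
    adverbial_offset_ms: int   # time at which the durational adverbial ends
    probe: str                 # visual probe (word or non-word)
    probe_is_word: bool

    def probe_onset_ms(self) -> int:
        # probe appears on screen 250 ms after the adverbial
        return self.adverbial_offset_ms + 250

def score_response(trial: CMLDTrial, said_word: bool,
                   response_time_ms: float) -> dict:
    """Log accuracy and latency of the secondary lexical decision."""
    return {
        "probe_onset_ms": trial.probe_onset_ms(),
        "correct": said_word == trial.probe_is_word,
        "rt_ms": response_time_ms,
    }

if __name__ == "__main__":
    trial = CMLDTrial("The little girl curtseyed for a long time after ...",
                      adverbial_offset_ms=2350, probe="garden",
                      probe_is_word=True)
    print(score_response(trial, said_word=True, response_time_ms=742.0))

The prediction is simply that, holding everything else constant across the paired sentences, the logged rt_ms should be longer when the carrier sentence contains a point action verb.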
In contrast, examples such as (2a) express a continuous process that is bounded by the durational adverbial. The end result is simple composition. However, there was the possibility that point action/adverbial modifier combinations were overall less plausible when compared to their activity sentence pairs, and thus more difficult to process independent of a specific coercion effect. This possibility was presumably ruled out by an off-line plausibility questionnaire presented to a separate group of participants (N = 20) who were asked to judge the overall plausibility of the same set of sentences. These results revealed no significant differences between the two conditions and led the authors to attribute the slower processing times to enriched composition. This point will be taken up again in section 4.3.

4.2 Reading time evidence for coercion
Piñango et al. (1999) interpreted longer reaction times for point action verb sentences as evidence for an enriched compositional operation that unfolds in real time. To further examine possible on-line effects of iteration, Todorova et al. (2000) used an on-line reading task that examined processing differences at two regions within the sentence: verb + singular/plural object (send a message vs. send messages) and type of adverbial modifier (durative, for a year vs. non-durative, last year). These comparisons are demonstrated in the following sentence examples:

(3) a. Even though Howard sent a large check to his daughter for many years, she refused to accept his money.
    b. Even though Howard sent large checks to his daughter for many years, she refused to accept his money.
    c. Even though Howard sent a large check to his daughter last year, she refused to accept his money.
    d. Even though Howard sent large checks to his daughter last year, she refused to accept his money.
Testing distinct coercion sites allowed for analysis of object cardinality, singular vs. plural, as a further possible factor that triggers repetition at the sentential level. The authors hypothesized that singular objects (representing a singular instance reading), combined with a durational adverbial (3a) would disrupt sentence comprehension to a greater extent than those sentences that incorporated a plural object and a durative modifier (3b).
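The logic of the design can be summarised as a 2 x 2 crossing of object cardinality and adverbial type, with only the singular object plus durative adverbial cell predicted to force an iterative (coerced) reading. A minimal sketch; the condition labels and abbreviated sentence frames are mine.

# Sketch of the 2 x 2 design: object cardinality crossed with adverbial type.
# Only the singular + durative cell is predicted to require iterative
# coercion. Labels and abbreviated frames are illustrative, not the stimuli.
CONDITIONS = {
    ("singular", "durative"):     "sent a large check ... for many years",
    ("plural",   "durative"):     "sent large checks ... for many years",
    ("singular", "non-durative"): "sent a large check ... last year",
    ("plural",   "non-durative"): "sent large checks ... last year",
}

def predicts_coercion_cost(cardinality: str, adverbial: str) -> bool:
    return cardinality == "singular" and adverbial == "durative"

for (card, adv), frame in CONDITIONS.items():
    print(f"{card:8s} {adv:12s} coercion cost expected: "
          f"{str(predicts_coercion_cost(card, adv)):5s} | {frame}")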
The authors employed a self-paced reading task. The results showed significantly longer reading times for sentence condition (3a), compared to sentence condition (3b) at, and immediately following, the durational adverbial. In comparison, no differences emerged between singular (3c) and plural object (3d) non-durational combinations. The results suggest that singular/plural objects operate in a similar manner to the mass/count distinction, with the former placing a further bound on the punctual feature of the preceding verb. When a punctual verb with a singular object is followed by a durational adverbial a semantic mismatch arises, reflected in increased reading times. In comparison, the compatibility of a plural direct object with the durational adverbial eliminates any possible semantic anomaly.

4.3 Methodological issues for on-line processing of iteration
Piñango et al. (1999) and Todorova et al. (2000) both provide on-line processing evidence for the phenomenon of iteration. However, a closer examination of both stimuli sets suggests other possible sources of the observed processing differences. In particular, the plausibility of some of the items in Piñango et al's (1999) sentence stimuli is a concern. Certain sentence combinations, such as (2b), may strike some readers as semantically odd. One source for such oddities could relate to conceptual differences that arise from the sub-type of point action verb (singular vs. iterative) and type of durational adverbial (for vs. until). In comparison, Todorova et al. (2000) kept the verb constant within item sets, as shown in example (3b). However, while each set incorporated punctual verbs, there was no discrimination between point action (kicked, hit, sent) and achievement verbs (entered, stole, found) or time spans (immediate vs. prolonged) within the stimuli set. Furthermore, the incorporation of indefinite rather than definite articles allowed for an iterative reading deemed ungrammatical with achievement (change of state) verbs and other verb combinations (Although the dragon devoured a girl from the village for years vs. *Although the dragon devoured the girl from the village for years…).5 Interestingly, a number of the sentence stimuli sets incorporated coercion examples that contained the notion of subject intention, demonstrated in (1b), and iteration across prolonged (1c) and immediate time spans (1d). Such variations make it difficult to ascertain the source of slower reading times. In order to assess the role of semantic plausibility, a replication of Piñango et al's (1999) plausibility questionnaire and CMLD interference task was carried out. Any differences emerging between the off-line (questionnaire) and on-line data would suggest that real-time processing differences may reflect semantic anomalies independent of coercion per se. Replication of the original findings, on the other hand, would be further evidence for Jackendoff's (1997, 2002) enriched composition hypothesis that describes iteration as an extra-linguistic process that operates in real-time.

5 Experiment 1

5.1 Method
Participants. Forty-one undergraduate students enrolled in a first year linguistics course participated in the experiment for course credit. The mean age was 20.41 (SD = 5.50). There were 5 males and 36 females. All spoke English as their first language.
Materials and procedure. The set of 25 sentence pairs (see Appendix A) was presented in an off-line questionnaire format. Sentences were interspersed among another 25 sentence pairs that functioned as fillers. Participants were asked to make a judgment on the ease of understanding for each sentence, based on their first impression. Responses were scored on a 5-point scale (1 not at all understandable to 5 completely understandable). The use of this terminology, rather than plausibility, was motivated by the outcome of an earlier pilot study (N = 6), in which 4 participants questioned the meaning of plausibility and its relationship to the sentences. In comparison, none of the 41 participants queried the intended meaning of ease of understanding. The experiment lasted approximately 20 minutes.

5.2 Results and discussion
Activity verbs were judged to be more understandable (M = 3.63) than point action (M = 3.29) sentences, as illustrated in Figure 1. A paired groups t-test revealed a significant difference, t(24) = 2.85, p = .009.6 This indicated that point action/durational adverbial combinations were interpreted as more difficult to understand than those sentences that contained activity verbs. These results contradict Piñango et al's (1999) original findings that showed no significant differences between activity and point action sentence pairs. This suggests that longer on-line reaction times to the point action/durational adverbial condition may be an artifact of the sentence stimuli. A replication of the on-line CMLD interference task, reported in Piñango et al. (1999), was also performed to see if similar on-line processing differences would be obtained.
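For concreteness, here is a minimal sketch of the item-level analysis reported above: per-item mean ratings for the activity and point action member of each of the 25 pairs, compared with a paired t-test. The numbers generated below are fabricated placeholders, not the experimental data.

# Minimal sketch of the Experiment 1 analysis: a paired t-test over the 25
# sentence pairs, comparing mean ease-of-understanding ratings for the
# activity vs. point action member of each pair.
# The ratings below are fabricated placeholders, not the actual data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_items = 25
activity_ratings = np.clip(rng.normal(3.6, 0.5, n_items), 1, 5)
point_action_ratings = np.clip(rng.normal(3.3, 0.5, n_items), 1, 5)

t, p = stats.ttest_rel(activity_ratings, point_action_ratings)
print(f"activity mean = {activity_ratings.mean():.2f}, "
      f"point action mean = {point_action_ratings.mean():.2f}")
print(f"paired t({n_items - 1}) = {t:.2f}, p = {p:.3f}")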
Figure 1. Experiment 1 mean difference between the two conditions (mean rating, 1-5, by verb type: Activity vs. Point Action).
6 Experiment 2

6.1 Method
Participants. A separate group of 20 students enrolled in a first year introductory linguistics course participated in the experiment as part of course work requirements. The mean age was 19.45 years (SD = 2.95). There were 2 males and 18 females. All participants were right-handed and spoke English as their first language.
Design and Procedure. The CMLD interference task was presented using Super Lab Pro (version 2). Fifty experimental sentences, interspersed among eighty fillers, were presented via headphones. The experimental procedure for order of presentation and probe attributes followed Piñango et al's (1999) original design. The position of the probe for the eighty filler sentences was randomly varied to avoid any participant expectancy effects. All experimental sentences were presented with the visual probe placed 250 msec. after the durational adverbial in question. Participants were tested together in a laboratory setting, during a session that lasted approximately 45 minutes. Following the original experimental procedures, participants were instructed to operate the mouse using their left (non-preferred) hand.7 In order to make sure that participants were focusing on the task, ten comprehension questions, related to ten filler sentences, were interspersed throughout the experimental set. The primary task was to listen to the sentences over headphones. The secondary task was to respond as quickly and as accurately as possible to the visual probe that appeared on the screen at a particular point within each sentence. The expected outcome was that those sentences that contained an implied iterative effect, namely the point action/durational adverbial condition, would trigger longer reaction times to the secondary task, due to increased processing load.

6.2 Results and discussion
The data from one participant were removed due to a high error rate in the comprehension test. The remaining 19 participants scored at least 80% on the comprehension questions presented on-line. Reaction times for each trial were screened and outliers more than 3 standard deviations beyond the mean were removed from the analysis. Figure 2 shows the means in milliseconds for activity (M = 742) and point action (M = 765) sentence pairs, compared to Piñango et al's (1999) original study.

Figure 2. Comparison of mean conditions for Piñango et al. (1999) and Experiment 2 (mean reaction time in msec by verb type: Activity vs. Point Action).

A paired t-test on subjects was significant, t(18) = -2.06, p < .03 (one-tailed), indicating that point action/durational adverbial combinations resulted in an increased processing load when compared to their paired activity counterparts. There was considerable variability in the items analysis, possibly as a result of the non-preferred use of the left hand. The on-line CMLD results replicate Piñango et al's (1999) on-line findings. However, the reverse finding emerged for the plausibility questionnaire. Unlike Piñango et al. (1999), the present replication revealed a significant difference, with point action/durational adverbial sentence pairs interpreted overall as more difficult to understand. Why could this be so? One possibility is a terminological difference in the wording of the questionnaire, namely plausibility (Piñango et al. 1999) vs. ease of understanding (the present study). As an anonymous reviewer pointed out, the wording of the present questionnaire, while intended to control for overall neutrality of the experimental sentence set, instead may have tested for the degree of difficulty between sentence pairs. While this possibility cannot be dismissed, a second replication of Piñango et al's (1999) questionnaire using identical wording (i.e. plausibility) also showed ratings for the enriched composition (point action verb) condition as significantly less plausible than their paired activity sentence counterparts (DeVelle 2004). The findings from both questionnaires strongly suggest that extraneous sentence stimuli design features contributed to the on-line processing results. I argue that the conceptual differences that arise from the sub-type of point action
verb (singular vs. iterative) and type of durational adverbial (for vs. until) combinations contribute to the present findings reported here. Piñango et al. (1999) presented 12 sentence pairs containing for and 13 sentence pairs with until, resulting in unique combination comparisons. While both adverbial types imply duration, they differ conceptually in how the bounding element is encoded. The adverbial until specifies the endpoint and leaves the starting point implicit. In contrast, for durationals encode a bounded element that is unspecified as to any specific beginning or endpoint of the period mentioned (e.g. The little girl curtseyed for a long time after the grown-ups left the room). The scope of for adverbial modification can also imply two meaning senses (e.g. duration and subject intention) that are further dependent upon verb class type. For example, the adverbial for takes scope over the activity for potentially iterative point action verbs (i.e. The tourist photographed the sunset for ten minutes). The end result emerges as an extended iterative reading. Singular point action verbs such as (2b) appear to lack the necessary iterative component, and this is why semantic oddities may emerge. Such oddities also emerge for examples such as The girl dived into the pool for twenty minutes that can imply a) the girl was in the pool for twenty minutes and b) the girl dived repeatedly into the pool (Piñón 1999). In contrast, sentences that contain achievement verb types (i.e. John left for a week) show that the durative adverbial modifies
the state that follows rather than the duration of the activity (Pustejovsky 1992). Note that Todorova et al's (2000) stimuli set also demonstrated increased reading times for iteration with for durationals. However, they did not distinguish between subject intention (1b), prolonged iterative effects (1c) and immediate iterative readings (1d), suggesting that more principled definitions for a given type of iteration are necessary for future on-line processing experimentation.
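As a concrete sketch of the screening and by-subject comparison reported in section 6.2: reaction times more than 3 standard deviations from the mean are discarded, per-subject condition means are computed, and the two conditions are compared with a one-tailed paired t-test. The data and the decision to apply the outlier criterion to the overall distribution are my assumptions; the original criterion may have been applied differently (e.g. per condition).

# Sketch of the Experiment 2 analysis: screen RT outliers (> 3 SD from the
# mean), average per subject and condition, then run a paired t-test by
# subjects. Data are fabricated; the screening here uses the overall mean
# and SD of each condition for simplicity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_items = 19, 25
# rows: subjects, columns: items; reaction times in ms for each condition
rt_activity = rng.normal(742, 120, (n_subj, n_items))
rt_point_action = rng.normal(765, 120, (n_subj, n_items))

def screen(rt):
    """Replace values more than 3 SD from the mean with NaN."""
    m, sd = rt.mean(), rt.std()
    return np.where(np.abs(rt - m) <= 3 * sd, rt, np.nan)

act = np.nanmean(screen(rt_activity), axis=1)       # per-subject means
pa = np.nanmean(screen(rt_point_action), axis=1)

t, p_two_tailed = stats.ttest_rel(act, pa)
print(f"activity M = {act.mean():.0f} ms, point action M = {pa.mean():.0f} ms")
# one-tailed p is half the two-tailed p when the effect is in the
# predicted direction (point action slower than activity)
print(f"t({n_subj - 1}) = {t:.2f}, one-tailed p = {p_two_tailed / 2:.3f}")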
7 General discussion
The present study investigated alternative reasons for on-line processing effects of iteration. Experiment 1 was at odds with Piñango et al's (1999) off-line plausibility findings, whereas Experiment 2 showed the same pattern of on-line results. A more recent replication of the original questionnaire that utilized the same wording (i.e. plausibility) revealed the same results as Experiment 1. Sentences that contained point action verbs were rated as significantly less plausible than their paired activity counterparts. I have cited semantic oddities for singular punctual readings and potential ambiguity effects on for durational readings as contributing factors towards the off-line results and the increased reaction times on the CMLD interference task. The on-line results from Experiment 2 replicate Piñango et al's (1999) original results and further support Jackendoff's (1997, 2002) enriched composition hypothesis, namely that iteration is an extra-linguistic process that emerges on-line. However, future on-line processing experimentation for iteration needs to apply a stringent sentence stimuli design to eliminate the alternative processing explanations presented here. A formal description of the type of iteration to be measured is also a necessary prerequisite. While the present research has addressed issues of a methodological nature, an important question is how to align linguistic models of coercion with performance models of language processing. Jackendoff's PA model (1999, 2002) is one compositional approach that attempts to bridge linguistic theory and on-line processing effects. Jackendoff attempts to explicitly align the notions of competence and performance within a processing theory that has explanatory power for coercion phenomena. Remember that iteration is purely semantic in nature. There is no overt syntactic marker in English that encodes this sense of repetition. The insertion of a repetition function at, or just following, the coercion region (i.e. enriched composition) is incorporated within the PA model via the conceptual processor.
This conceptual integration allows for re-interpretation of overall sentential meaning at the syntax-semantics interface. The PA model therefore has direct implications for language processing and poses a new set of questions for coercion phenomena. For example, can we assume that different types of coercion result in the same processing constraints? Coercion construed from The student began the book appears to involve a filling in of implicit subject noun and verb meaning (students read books), rather than a replacement of a default reading. In comparison, iteration may be better accounted for by a garden path strategy that involves replacing one reading with another (The man knocked on the door vs. The man knocked on the door for a minute).8 Such processing differences are not surprising given that the external modifiers for and until function as adjuncts and are not necessary to overall sentential well-formedness. To what extent coerced iterative readings differ from the processing effects of complement selection is yet to be determined. However, the compositional processes necessary for iteration (i.e. the insertion of a repetition function) and aspectual complement selection (the insertion of an underspecified activity reading) provide coercion data that can be empirically tested by the PA model. To finish, the studies discussed here motivate further consideration of the possible on-line processing costs associated with specific aspectual coercion types and type of durational adverbial (for, until). This is currently being examined for subject intention (1b) and immediate iterative time spans (1d). Aspectual coercion is a complex phenomenon and the psycholinguistic research to date is still in its infancy. However, the importance of these early on-line findings cannot be overestimated. Further experimental investigation into the real time processing mechanisms that govern extralinguistic processes can only benefit from this early work.

Appendix A

(1)
a. The boy rocked the dog for almost an hour until he realized that it was unconscious. b. The boy hit the dog for almost an hour until he realized that it was unconscious.
(2)
a. The boy screamed loudly until well into the evening at which point he was satisfied. b. The boy belched loudly until well into the evening at which point he was satisfied.
(3)
a. The little girl snoozed for a long time after the grown-ups had left the room. b. The little girl curtseyed for a long time after the grown-ups had left the room.
(4)
a. The man examined the little bundle of fur for a long time to see if it was alive. b. The man kicked the little bundle of fur for a long time to see if it was alive.
(5)
a. The man recited the material anxiously until the bell rang for a break. b. The man stamped the material anxiously until the bell rang for a break.
(6)
a. The tiger slept near the bushes for an hour after eating the biggest meal of his life. b. The tiger jumped near the bushes for an hour after eating the biggest meal of his life.
(7)
a. The little girl slumbered in the hall until her mom came to pick her up. b. The little girl twitched in the hall until her mom came to pick her up.
(8)
a. The boy slumped impatiently until the teacher asked him to leave the room to cool off. b. The boy interrupted impatiently until the teacher asked him to leave the room to cool off.
(9)
a. The man rested on the beach until the bell reminded him it was time to leave to avoid arrest. b. The man stumbled on the beach until the bell reminded him it was time to leave to avoid arrest.
(10)
a. The little girl dogpaddled in the pool until the teacher told her she should eat to keep her strength up. b. The little girl dived in the pool until the teacher told her she should eat to keep her strength up.
(11)
a. The woman serenaded her lover for hours before he woke up. b. The woman smacked her lover for hours before he woke up.
(12)
a. The boxer glared at his opponent until the referee separated them at the end of the round. b. The boxer swung at his opponent until the referee separated them at the end of the round.
(13)
a. The corn browned slowly until the guests arrived at the party. b. The corn popped slowly until the guests arrived at the party.
(14)
a. The man perspired heavily until someone opened the door to the house. b. The man coughed heavily until someone opened the door to the house.
(15)
a. The man meandered for an hour in the park after receiving the good news. b. The man sneezed for an hour in the park after receiving the good news.
(16)
a. The boy cried helplessly until the teacher said he could go home for a hot dinner. b. The boy nodded helplessly until the teacher said he could go home for a hot dinner.
(17)
a. The baby sang for an hour after her mom gave her food. b. The baby yawned for an hour after her mom gave her food.
(18)
a. The artist whistled songs for hours after she heard the music on the radio. b. The artist whistled the song for hours after she heard the music on the radio.
(19)
a. The man sprayed the worker until the bell rang to end the work shift. b. The man slapped the worker until the bell rang to end the work shift.
(20)
a. The snake hissed menacingly for an hour until the predator realized it was a lost cause. b. The snake jerked menacingly for an hour until the predator realized it was a lost cause.
(21)
a. The girl chased the criminal until the police came to her rescue in the park. b. The girl stabbed the criminal until the police came to her rescue in the park.
(22)
a. The light shone for about two hours after the last cab left the parking lot. b. The light flashed for about two hours after the last cab left the parking lot.
(23)
a. The insect glided effortlessly until it reached the far end of the garden that was hidden in the shade. b. The insect hopped effortlessly until it reached the far end of the garden that was hidden in the shade.
(24)
a. The bee tailed the bird for an hour in the hot afternoon sun. b. The bee stung the bird for an hour in the hot afternoon sun.
(25)
a. The boy dragged the dog for a long time in the park. b. The boy struck the dog for a long time in the park.
Notes

1. Smith (1991) introduced this class, termed semelfactive (from Latin semel 'once'). However, her definition does not distinguish between single and iterative situations. To avoid terminological confusion I employ the more general term point action.
2. The notion of (un)boundedness is often described as analogous with (a)telicity. For the purposes of this paper the bounded/unbounded distinction described here is based on temporal sentential boundaries, in contrast to the telic/atelic distinction that is denoted as a characteristic of the verb class (Depraetere 1995).
3. Moens and Steedman (1988) do not formally distinguish grammatical from situational aspect. However, their examples utilize varying tense and verb class that allow for this distinction.
4. Experimental evidence from self-paced reading and eye gaze duration has also demonstrated on-line processing effects for complement selection, as described in (1a) (see McElree et al. 2001; Traxler et al. 2002).
5. In this example the verb devour represents a non-punctual, and hence durative, accomplishment reading. However, the endpoint is inherently telic.
6. The original study ran an independent groups t-test on the questionnaire data. The present analysis used a paired groups t-test on both the questionnaire and CMLD task, given that each subject gave a response for all conditions.
7. Piñango et al. (1999) instructed participants to use their left hand to compare the results to those of patients with aphasia, whose dominant hand was no longer functional.
8. A reviewer pointed out that an OV language such as German places the coercion trigger (e.g. for a minute) before the verbal predicate, thus predicting the removal of the garden path effect. In English the sentence The man kept knocking on the door would appear to predict the same difference, although this is yet to be tested. Note also that German distinguishes between resultant state readings and iterative readings using für and lang respectively. English in contrast marks both meaning senses with for. (See Piñón 1999 and Moens and Steedman 1988 for further discussion.)
References

Depraetere, Ilse
1995 On the necessity of distinguishing between (un)boundedness and (a)telicity. Linguistics and Philosophy 18: 1-19.
De Swart, Henriette
1998 Aspect shift and coercion. Natural Language and Linguistic Theory 16: 347-385.
DeVelle, Sacha
2004 Off-line versus on-line processing: The role of verb class. Unpublished manuscript, University of Queensland, Australia.
Dowty, David
1979 Word Meaning and Montague Grammar. Dordrecht: D. Reidel.
Frawley, William
1992 Linguistic Semantics. London: Lawrence Erlbaum Associates.
Jackendoff, Ray
1997 The Architecture of the Language Faculty. Linguistic Inquiry Monograph. Cambridge, Mass.: MIT Press.
1999 Parallel constraint-based generative theories of language. Trends in Cognitive Sciences 3 (10): 393-400.
2002 Foundations of Language. Oxford: Oxford University Press.
Krifka, Manfred
1998 The origins of telicity. In Events and Grammar, Susan Rothstein (ed.), 197-235. Dordrecht: Kluwer Academic Publishers.
McElree, Brian, Michael Traxler, Richard Seely, and Ray Jackendoff
2001 Reading time evidence for enriched composition. Cognition 78: 17-25.
Moens, Mark and Mark Steedman
1988 Temporal ontology and temporal reference. Computational Linguistics 14 (2): 15-28.
Piñango, Maria, Edgar Zurif, and Ray Jackendoff
1999 Real-time processing implications at the syntax-semantics interface. Journal of Psycholinguistic Research 28 (4): 395-414.
Piñón, Christopher
1999 Durative adverbials for result states. In Proceedings of the 18th West Coast Conference on Formal Linguistics, 420-433. Somerville, MA: Cascadilla Press.
Pustejovsky, James
1992 The syntax of event structure. In Lexical and Conceptual Semantics, Beth Levin and Steven Pinker (eds.), 57-81. Cambridge, Mass.: Blackwell.
1995 The Generative Lexicon. Cambridge, Mass.: MIT Press.
Smith, Carlota
1991 The Parameter of Aspect. London: Kluwer Academic Publishers.
Todorova, Marina, Kathleen Straub, William Badecker, and Robert Frank
2000 Aspectual coercion and the on-line computation of sentential aspect. In Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society, Philadelphia, PA.
Traxler, Matthew, Brian McElree and Martin Pickering
2002 Coercion in sentence processing: Evidence from eye-movements and self-paced reading. Journal of Memory and Language 47: 530-547.
Why Do Children Fail to Understand Weak Epistemic Terms? An Experimental Study
Serge Doitchinov
1 Introduction

In this paper, I will present the results of two experiments that were conducted in order to find out whether children's late understanding of epistemic terms is related to the development of their ability to understand epistemic uncertainty or to their ability to recognise scalar implicatures. It is commonly admitted that epistemic modality serves to express the speaker's attitude towards the truth values of propositions by indicating some degree of certainty or uncertainty. In German, as well as in English, epistemic modality is mostly expressed by modal verbs (müssen 'must', können 'may, might') and epistemic adverbs (sicher 'certainly', vielleicht 'maybe, perhaps'). Naturalistic studies on children's language production show that the acquisition of epistemic expressions begins between two-and-a-half and three years of age (Stephany 1986; Ehrich 2005). Although children have apparently no difficulties using these expressions from the beginning, epistemic terms remain very rare in their spontaneous speech until 4;5 (year;month) or even later (Stephany 1986). Most researchers in the Piagetian tradition consider the acquisition of epistemic meanings by children to be dependent on the development of the concepts of necessity and possibility, with socially oriented necessity/possibility (i.e. the deontic meaning of modal expressions) being acquired prior to the more abstract epistemic and alethic ones (Stephany 1986). In a recent study Papafragou (2000) suggests that the onset of epistemic meanings of modal verbs is correlated with the children's progress towards the acquisition of a representational theory of mind. She argues that as long as children are unable (i) to make a sharp distinction between the real world and mental representations of it, and (ii) to recognise that there is not always a one-to-one mapping between these representations and the world, they probably cannot use and understand epistemic expressions reliably. In the last twenty years, a large body of studies in the field of developmental psychology has provided strong support for the view that a representational
theory of mind is acquired between 3;0 and 5;0, i.e. at about the same age span when the first epistemic expressions occur and become more frequent in children's spontaneous speech (cf. Wellman 1990; Bartsch and Wellman 1995). However, there are various experimental studies on the comprehension of epistemic terms that challenge the findings of the naturalistic studies by showing that the linguistic epistemic system is not fully understood before 8;0 or even later. For example, studies by Pieraut-Le Bonniec (1980), Noveck (2001) and Doitchinov (2001), among others, suggest that 6- and 7-year-olds are still unable to understand weak epistemic expressions (i.e. terms expressing a low degree of certainty like können or vielleicht) in the same way as adults. Above all, children of this age seem to systematically associate those terms with situations that normally require stronger ones like müssen or sicher. The findings of these studies suggest that children understand (weak) epistemic terms much later than they begin to use them productively in their spontaneous speech. In order to account for the findings of the experimental studies, two main hypotheses have been proposed in the literature. The first one focuses on the children's ability to distinguish between necessity and possibility conclusions (I will call it the inference based hypothesis), and the second on their ability to take scalar implicatures into account (I will call it the implicature based hypothesis). I will now briefly discuss both hypotheses.
The inference based hypothesis: A large body of studies relates the late understanding of weak epistemic terms to children's cognitive development, and in particular to their ability to distinguish between epistemic necessity and possibility, as well as to deal with the concept of physical indeterminacy. Since the early work of Pieraut-Le Bonniec (1980), it has generally been assumed that children under 9;0 tend to overgeneralise epistemic certainty, i.e. 'to be sure' even if in fact they do not have enough information to draw such strong conclusions (cf. also Byrnes and Overton 1986). Moreover, these studies suggest that children at this age also often fail to appreciate the distinction between empirically determinate (i.e. problems that have only one solution) and indeterminate problems. Specifically, children make mistakes when drawing inferences of epistemic possibility because they just take the first match of the evidence into account instead of considering further possibilities. Thus, they tend to confuse indeterminate problems with determinate ones (Morris and Sloutsky 2002). These findings fit very well, indeed, with the observation that children at this age often fail to understand that weak epistemic terms are used to express the fact that not enough evidence is
available to reach a definitive conclusion about a state of affairs. It is, therefore, possible that they confound the use of weak and strong epistemic terms because they tend to consider most situations as fully determinate.
The implicature based hypothesis: In a recent study, Noveck (2001) suggests that children's failure to interpret weak epistemic terms correctly is less due to some cognitive deficit than to the fact that they do not recognise scalar implicatures as often as adults do. Since Horn (1972), it has been well known that epistemic terms form scales that are ordered by degrees of certainty (e.g. <must, might>), and that linguistic scales give rise to so-called scalar implicatures. Linguistic scales have two important properties: logically, the truth of a stronger term always entails the truth of all weaker terms on a given scale. Therefore, a sentence with might is true in all situations in which a corresponding sentence with must is true. Pragmatically, however, when a speaker uses a weaker term of a given scale, s/he normally denies that all stronger terms of the scale would describe the situation appropriately. That is, the use of might by a speaker implicates that the use of must would be inappropriate, even if, from a purely semantic point of view, both terms could be used to describe the situation. To sum up: the use of might implicates not must. Children who systematically ignore scalar implicatures will treat weak scalar terms like might logically. This will lead them to automatically judge statements with might to be acceptable in situations that they know to be determinate and that normally require must. Noveck (2001) examined this possibility in three experiments. His first two experiments were designed to evaluate how children from 5;0 to 9;0 judge the appropriateness of sentences with might. His results suggest that even 9-year-olds consider sentences with might to be acceptable in situations that normally require must more often than adults. In a follow-up experiment (not with the same children), he observed exactly the same type of reaction with the French quantifier certains 'some', which is also a weak scalar term. From this, Noveck (2001) concludes that children are more likely than adults to accept under-informative statements.
The results of the experimental studies presented above do not allow us to clearly decide whether the late understanding of weak epistemic terms by children is due to their cognitive inability to understand epistemic uncertainty or to their inability to recognise scalar implicatures in the same way as adults. The typical reaction of the children tested in these studies can be taken as evidence for either of the hypotheses under review. Thus, what we need is further experimental research that tests for both hypotheses. In fact, the only
way to examine the role of both hypotheses in the acquisition of epistemic terms is to let the same sample of children carry out a set of tasks that would allow us to assess their cognitive as well as their linguistic abilities. This is the aim of the two experiments reported below.

2 Experiment 1

The experiment consists of three tasks: (i) The modal expression task (ME task) investigates the children's ability to understand weak epistemic expressions correctly; (ii) the implicature task was designed to assess the children's understanding of scalar implicatures; and (iii) the inference task examines their ability to deal with epistemic uncertainty. In order to gain a better insight into the relationship between the three abilities tested here, the three tasks were performed by the same sample of children. They were conducted by means of the picture selection paradigm. The same testing method was used in all tasks to reduce potential biases that might be caused by a different degree of complexity in the tasks.

2.1 Participants
Eighteen 6-year-olds (from 5;7 to 6;6, mean age 6;2), twenty-eight 7-year-olds (from 6;7 to 7;6, mean age 7;1), twenty-eight 8-year-olds (from 7;7 to 8;6, mean age 8;1) and ten adults, all monolingual speakers of German, participated in the experiment. The children were recruited from elementary schools and kindergartens in Stuttgart and Tübingen (Germany). The adults were students at the University of Tübingen.

2.2 Method and material

2.2.1 The ME task
In the ME task the children's understanding of the German epistemic modals können 'may' and vielleicht 'maybe/perhaps' was tested. The following sentence types were used to carry out the task:

(1) a. Es kann sein, dass der Junge im Haus ist.
       it can be that the boy in-the house is
       'It may be the case that the boy is in the house.'
Figure 1a. The “certainty story”
Figure 1b. The “uncertainty story”
Figure 1. Picture stories presented with the sentence in (1a).
    b. Der Junge ist vielleicht im Haus.
       the boy is maybe in-the house
       'Maybe the boy is in the house.'
The children were tested three times for each type of sentence. So, six different sentences with können and vielleicht were used in this task. Following Doitchinov's (2001) design, the children were presented with two picture stories at the same time. Each story consisted of two pictures (see Figure 1). The first story showed a boy performing some action. The actions shown were always of the same type: a boy moving from one location to another. At the end of the story, it was always possible to see that the boy was at the place described in the input sentences. This type of story was called the "certainty story" (see Figure 1a). The second story ("the uncertainty story", see Figure 1b) always showed the same beginning as the corresponding "certainty story". The only difference was that at the end of the "uncertainty story" the location of the boy was left uncertain (i.e. there was nobody to be seen in the second picture of the "uncertainty story"). To prevent the participant from confusing the two stories, one showed a boy with blond hair and the other a boy with brown hair. The factors [blond/brown], [certain/uncertain] and [left/right position in front of the participant] were randomised throughout the trials. The main test was preceded by a short warm-up session (not for the adult group). The goal of this session was to ensure that the participants were able to understand that the two pictures belonged to the same story. At the beginning of the warm-up session, the investigator told the participant that he wanted to show him/her a picture story about a boy ("look, this is the story of a boy with blond/brown hair!"). At this point, the investigator showed the participant a two-picture story similar to the "certainty story". The participant was given enough time to look at the pictures. Then, the investigator asked him/her to tell the story. Participants who were obviously unable to recognise that the two pictures formed one and the same story or that the story was about one and the same boy were excluded from the main test. At the beginning of the main test, the investigator told the participant that he wanted to play a small quiz-game with him/her: "Here are two different picture stories. As you can see in the two pictures here, the first story is about a boy with blond/brown hair. And, as you can see in the two pictures here, the second story is about a boy with brown/blond hair. Please look at the stories carefully! Tell me when you are ready, and then I will ask you a question and you will have to guess which story is the right one. If you listen carefully to my question, you will be able to guess correctly." At the beginning of each trial the investigator waited until the participant told him that s/he was ready. Then, the investigator asked him/her the following question: "Von welcher Geschichte spreche ich, wenn ich sage: INPUT SENTENCE?" ('Which story am I talking about, when I say . . . '). The participants were not only asked to choose one story, but also to give a justification of their choice. A similar task (but with differing weak epistemic sentences) was conducted in Doitchinov (2001): the results of this study showed that the children who correctly understood weak epistemic terms chose the "uncertainty story". They argued that they could know the boy was in the house in the "certainty story" or that there was nobody to see in the house at the end of the "uncertainty story". Children who did not understand the weak epistemic terms correctly also behaved very consistently: They nearly always chose the "certainty story", arguing that they could see the boy in the house. The same typical reactions are expected in the ME task of the present experiment.
2.2.2 The implicature task
The implicature task was designed to find out whether the participants preferred a pragmatic interpretation (i.e. under consideration of the scalar implicature) of the quantifier einige 'some' in a picture selection task. This task was very similar to the ME task. The participants were presented the beginning of a picture story, the ending of which was open. The first picture always showed a group of five children that were about to perform some action (see Figure 2a). The participants were presented two of three possible outcomes of the story in each trial (see Figure 2b).

Figure 2a. Beginning of a story with an open end
Figure 2b. Three possible outcomes (SOME case, ALL case, NONE case)
Figure 2. Picture story for the implicature task.

The first possible outcome showed that only three children performed the action at the end of the story (= the SOME case), the second that all children (= the ALL case) performed the action and the third that none of the children did (= the NONE case). The understanding of sentences that were quantified with einige 'some', alle 'all' and kein 'no' was tested:

(2) Einige/alle/kein Kind(er) sind/ist im Boot.
    some/all/no child(ren) are/is in-the boat
    'Some/all/no child(ren) are/is in the boat.'
The rest of the procedure was similar to the ME task: the investigator introduced the first trial by saying: "Here is the beginning of a story with two pictures. The story is about a group of five children. But we don't know yet the end of the story. Here are two possible outcomes for the story. I will ask you a question, and you will have to guess which one of the two pictures is the right ending of the story." After the participant was given enough time to look at the pictures, the investigator asked the following question: "Welches ist das richtige Ende der Geschichte, wenn ich sage: INPUT SENTENCE?" ('Which is the right ending of the story when I say: . . . '). The next trials were just introduced by saying that there was a new quiz of the same type. Three different stories were used and six different combinations of input sentences and possible outcomes were tested three times each. In the critical trials, the participants were given a sentence with einige in combination with the SOME and the ALL cases. The order of the trials was randomised, but the three critical trials were always performed before the three trials with alle in order to avoid the following bias: first, an alle sentence is associated with the ALL case and then, in contrast, all the following sentences with einige are associated with the SOME case. This task was designed to find out whether children prefer the under-informative semantic reading (ALL case) or the more accurate pragmatic reading (SOME case) of einige in the critical trials. Because the ME and the implicature tasks are very similar, a comparison of the results of both tasks allows one to assess whether a reluctance to take into account the scalar implicature with können/vielleicht is responsible for the choice of the "certainty story" in the ME task, as predicted by the implicature based hypothesis. If this were the case, one would expect that children who prefer the "certainty story" in the ME task would favour the ALL case in the critical trials of the implicature task.
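The two readings of einige and the picture choices they predict can be spelled out in a small sketch. On a purely logical (semantic) reading the einige sentence is compatible with both the SOME and the ALL outcome; on the pragmatic reading, which adds the scalar implicature 'not all', only the SOME outcome fits. The encoding below (treating 'some' as 'at least one' for simplicity) is an illustrative assumption of mine.

# Sketch of the predicted picture choices in the critical trials: logical
# (semantic) reading of 'einige' vs. pragmatic reading with the scalar
# implicature 'not all'. Outcome encoding is an illustrative choice.
OUTCOMES = {"SOME": 3, "ALL": 5, "NONE": 0}   # children performing the action
GROUP_SIZE = 5

def einige_true_semantically(n_performing: int) -> bool:
    # 'some' treated as logically true whenever at least one child acts
    return n_performing >= 1

def einige_true_pragmatically(n_performing: int) -> bool:
    # add the implicature: 'some' suggests 'not all'
    return 1 <= n_performing < GROUP_SIZE

for label, n in OUTCOMES.items():
    print(f"{label:4s}: semantic ok = {einige_true_semantically(n)}, "
          f"pragmatic ok = {einige_true_pragmatically(n)}")

# A child who ignores the implicature may accept the ALL picture for an
# 'einige' sentence; an adult-like responder should prefer the SOME picture.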
2.2.3 The inference task
The goal of the inference task was to assess the children’s ability to deal with the concept of epistemic uncertainty. The task was a modified version of Somerville et al.’s (1979) first experiment. The task was introduced as a quiz-game. The participants were presented a picture showing a child with three toys (see the pictures on the left in Figure 3). They were told that this child always forgets one of his/her toys at the door of his/her house when going home. In a second picture, two houses were shown with a toy at their doors (see the pictures on the right in Figure 3). The investigator asked the participant the following question: “Kannst du mir sagen, in welchem Haus das Kind wohnt? Oder brauchst du Hilfe von mir?” (‘Can you point to the house the child lives in? Or do you need some help from me?’). Three possible cases were taken into account: in case A, the problem was
Figure 3a. Determinate case
Figure 3b. Indeterminate case with neg./no evidence
Figure 3c. Indeterminate case with pos. evidence
Figure 3. Pictures for the inference task.
determinate, because only one toy in front of the houses could belong to the child (see Figure 3a). In case B and case C, the problem was indeterminate, because either none of the toys (case B) or both toys (case C) belonged to the child (see Figures 3b and 3c). Case B was called the inference-ne task, because it contained no evidence about the house of the child, and case C was called the inference-pe task, because in this case there was 'too much' positive evidence. Cases B and C were indeterminate problems, because the information available did not allow for a clear decision. The participants were tested three times for each case. They also had to justify their answers. It was expected that the participants who understand epistemic uncertainty would ask for help in case B, because none of the toys in front of the houses matched the child's toys, and also in case C, because both toys matched the child's toys. Following the inference based hypothesis, participants who pass the inference task should be more likely to succeed in the ME task.
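The determinacy logic of the three cases can be stated compactly: the trial is answerable only if exactly one of the two toys at the doors matches the child's toys; with zero matches (case B) or two matches (case C) the evidence is indeterminate and the adult-like response is to suspend judgement and ask for help. A minimal sketch, with toy names as placeholders:

# Sketch of the inference-task logic: a trial is determinate only if exactly
# one toy at the two doors matches the child's toys; otherwise the correct
# (adult-like) response is to suspend judgement and ask for help.
# Toy names are placeholders, not the experimental materials.
def classify_trial(childs_toys: set, toys_at_doors: list) -> str:
    matches = [toy for toy in toys_at_doors if toy in childs_toys]
    if len(matches) == 1:
        return "determinate: point to the matching house"
    return "indeterminate: ask for help"

child = {"ball", "doll", "kite"}
print(classify_trial(child, ["ball", "drum"]))  # case A: one match
print(classify_trial(child, ["drum", "car"]))   # case B: no match (inference-ne)
print(classify_trial(child, ["ball", "doll"]))  # case C: both match (inference-pe)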
2.3 Procedure
The young participants were tested individually in a separate room of their kindergarten or elementary school. The experiment was conducted in two sessions of about 15 minutes. In the first session, the participants performed the ME task. In the second session, they performed the implicature task first and then the inference task. About 40% filler trials were added to each task. The answers were tape recorded.

2.4 Results
In the ME task an answer was counted as correct if the participant chose the "uncertainty story" and gave an appropriate justification (i.e. rejecting the "certainty story", arguing that one could see the boy; or choosing the "uncertainty story", arguing the boy could not be seen). The results for vielleicht and können were counted separately. In the implicature task, only the three critical trials were taken into account (no participant had any difficulties in the other trials). To be successful in these trials, the participant had to show a preference for the SOME case. In the inference-ne and inference-pe tasks, an answer was counted as correct if the participant showed that s/he was not willing to choose one of the houses. A correct answer counted as 1/3 point, so that each participant reached a score between 0 and 1 for each task.

Table 1. Percentage of correct answers in Experiment 1. [With standard deviation in ().]

Age   können       vielleicht   implicature   inference-ne   inference-pe
6     0.09 (.25)   0.12 (.19)   0.89 (.32)    0.47 (.43)     0.22 (.35)
7     0.37 (.47)   0.40 (.47)   0.93 (.26)    0.66 (.45)     0.48 (.50)
8     0.64 (.47)   0.69 (.44)   1.00 (.00)    0.73 (.42)     0.68 (.46)
Ad.   1.00 (.00)   1.00 (.00)   1.00 (.00)    1.00 (.00)     1.00 (.00)
Table 1 shows the mean score of correct answers reached by the participants for each task/group. A 2 (SEX) × 4 (AGE: 6, 7, 8, adult) MANOVA was conducted. For all tasks, the MANOVA showed no effect for SEX (F(1, 82) = 0.29, p = .59 for können; F(1, 82) = .09, p = .77 for vielleicht; F(1, 82) = 3.55, p = .06 for implicature; F(1, 82) = 1.33, p = .25 for inference-ne and F(1, 82) = .35, p = .56 for inference-pe). In contrast, the analysis showed a significant effect of AGE in the ME task (F(3, 80) = 13.42, p = .000 for können and F(3, 80) = 14.11, p = .000 for vielleicht), in the inference-ne task (F(3, 80) = 3.42, p = .02), and in the inference-pe task (F(3, 80) = 8.27, p = .000). No effect of AGE was observed in the implicature task (F(3, 80) = 1.50, p = .22). Concerning the factor AGE, post-hoc Dunnett-C tests (α = .05, 2-tailed) were conducted for all tasks but the implicature task. These tests showed that the adults performed significantly better than the children in all tasks. In the ME task, the 8-year-olds understood both epistemic terms significantly better than the 6-year-olds. No other significant difference between the groups of children was observed in this task. In the inference-ne task, there was no significant difference between the performance of the different groups of children. In the inference-pe task, the Dunnett-C test shows the same result as in the ME task: the 8-year-olds solved this task significantly better than the 6-year-olds. No other significant differences in the results of the groups of children were noticed. A Wilcoxon test (α = .05, 2-tailed) was performed in order to assess whether the children understood vielleicht better than können, and whether they performed better in the inference task than in the ME task. The results of this test showed that there was only one significant difference: the 6- and the 7-year-olds performed significantly better in the inference-ne task than in the ME task (6-year-olds: z = −2.76, p = .006 between vielleicht and inference-ne; z = −2.85, p = .004 between können and inference-ne. 7-year-olds: z = −2.6, p = .009 between vielleicht and inference-ne; z = −2.52, p = .012 between können and inference-ne). More important for testing the validity of the implicature based hypothesis and the inference based hypothesis are the correlations between the results of the different tasks. Partial correlations controlling for AGE were computed, but without taking the adults' results into account. Table 2 shows the degree and the significance of the partial correlations. No significant correlations were observed between the ME and the implicature tasks. In contrast, there are highly significant correlations between the children's comprehension of the two epistemic terms in the ME task and their performance in judging epistemic uncertainty in both inference tasks. However, the correlations with the inference-pe task are somewhat higher than the correlations with the inference-ne task. Not surprisingly, the two inference tasks correlate with each other. It is also worth noticing that the highest significant correlation is between the results for both epistemic terms.
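The partial correlations reported in Table 2 remove the shared influence of AGE from each pairwise correlation. A minimal sketch of a first-order partial correlation is given below; the per-child scores are fabricated placeholders (the real data are not reproduced here), and this is not the authors' actual analysis script.

# Sketch of a first-order partial correlation (controlling for AGE), as used
# for Table 2. Scores are fabricated; each child has a 0-1 score per task.
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z removed:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(2)
n_children = 74                      # 18 + 28 + 28 six- to eight-year-olds
age = rng.choice([6, 7, 8], n_children).astype(float)
koennen = np.clip(0.25 * (age - 6) + rng.normal(0, 0.3, n_children), 0, 1)
inference_pe = np.clip(0.23 * (age - 6) + 0.4 * koennen
                       + rng.normal(0, 0.3, n_children), 0, 1)

print(f"partial r(koennen, inference-pe | age) = "
      f"{partial_corr(koennen, inference_pe, age):.2f}")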
Table 2. Degree and significance of partial correlations in Experiment 1.

                 können    vielleicht   implicature   inference_ne   inference_pe
können             —         0.91*         0.03          0.43*          0.52*
vielleicht       0.91*         —           0.00          0.43*          0.58*
implicature      0.03        0.00           —           −0.18          −0.01
inference_ne     0.43*       0.43*        −0.18            —            0.59*
inference_pe     0.52*       0.58*        −0.01          0.59*            —
* p = .000

2.5 Discussion
The results of the ME task indicate that an important developmental change toward a full mastery of the meaning of epistemic terms takes place between 6 and 8 years of age. The 6-year-olds gave only 9% (können) and 12% (vielleicht) correct answers. Most of the time, the 6-year-olds preferred the “certainty story” to the “uncertainty story”. This shows that children of this age are not yet able to understand reliably that vielleicht and epistemic können are weak epistemic terms. By contrast, the 8-year-olds were already quite confident in dealing with those terms: the children of this group mastered the ME task about two-thirds of the time, i.e. about six times more often than the youngest group. These results suggest that the most important part of the development of the understanding of weak epistemic terms occurs during this period of two years. Nevertheless, the results also suggest that even the 8-year-olds do not achieve the same competence in dealing with these terms as adults. In fact, the significant difference between the adults’ and the 8-year-olds’ performances cannot be explained solely in terms of stress in an experimental situation or difficulty remaining concentrated during the task. If this were the case, one would expect to find nearly 100% of the 8-year-olds giving just two correct answers out of three (which would lead to an average of exactly 2/3 correct answers, i.e. to the same results as observed). However, the analysis of the data of the ME task shows that, on the contrary, about 30% of the 8-year-olds gave zero or only one correct answer. This means that about 30% of the 8-year-olds were still very poor in understanding weak epistemic terms. The justifications given by the participants for their choice during the ME task also support the claims above. The typical reaction of the 6-year-olds in this task was to choose the “certainty story” and to justify this choice by
explaining that they could see the boy in the house at the end of the story. This reaction shows clearly that children of this age accept that a weak epistemic statement is compatible with a situation that should be described with a strong epistemic statement or a simple assertion. In contrast, the 7- and 8-year-olds who preferred the “uncertainty story” gave two types of justifications: some of the children argued that the “certainty story” was inadequate because they could see the boy looking out of the window, and the rest of the children argued that the “uncertainty story” was the right one because the boy had totally disappeared. Both the wrong justifications of the younger children and the right ones of the older groups indicate that the choice of the story by the children was not motivated by random guessing, but by their knowledge about the location of the boy. To sum up, the results of the ME task show that 6- and 7-year-olds (and to some extent even 8-year-olds) are more likely than adults to accept a weak epistemic term in a situation that normally requires the use of a stronger expression. However, the main goal of this study was to investigate whether this behaviour of young children is due to their inability to recognise epistemic uncertainty, as predicted by the inference-based hypothesis, or to some tendency to ignore scalar implicatures, as predicted by the implicature-based hypothesis. The following discussion of the results of the inference task and the implicature task will show that the children’s ability to solve the ME task is correlated only with their ability to recognise uncertainty, and not with some avoidance of taking scalar implicatures into account. The results of the inference task show the same developmental trend that was already observed in the ME task. In general, the children of the two younger groups were not able to suspend their judgements when it was impossible to decide which house was the right one. Only the 8-year-olds showed a reliable competence in making such judgements of uncertainty independently of the type (negative vs. positive) of evidence they were offered, even if about 30% were still not able to do this. It is also important to notice that the children of all groups could recognise indeterminacy more easily when the evidence was negative than when it was positive: this difference was 25% for the 6-year-olds, 18% for the 7-year-olds, but only 5% for the 8-year-olds. The better performance of the children in the inference_ne task compared to the inference_pe task is probably due to the fact that, in the inference_ne task, the children used a try-and-eliminate strategy that automatically led to the conclusion that none of the presented houses was a possible correct answer to the task. Thus, in the inference_ne task, there was no need for a comparison
between the two possible outcomes to lead the participant to the conclusion that they would need help from the investigator. The elimination of both houses in this task gave the participant enough information to infer that there was no solution to the problem. By contrast, in the inference_pe task, the use of such a try-and-eliminate strategy did not allow the elimination of any of the possible answers, because none of the proposed solutions turned out to be unsatisfactory at the end of the elimination process. Thus, at the end of such a fruitless elimination process, the participants could follow two strategies: (i) they could choose one house randomly, or (ii) compare the two possibilities. It is obvious that only the second strategy led to a correct answer in the inference_pe task. So, unlike the inference_ne task, the inference_pe task required a comparison between the two possible answers in order to come to the right conclusion, i.e. that the task was indeterminate. As previous research by Morris and Sloutsky (2002) has shown, young children have difficulties in drawing such a comparison. Their data suggest that children just pick out the first possibility that matches an answer, without checking whether the problem may have another possible solution. It is therefore probable that many children simply stopped the elimination process as soon as a possible solution to the task was found, without checking whether there could be another one. Following this procedure, the children who simply recognised the toy in front of the first house concluded that it was an acceptable solution to the problem and skipped the examination of the second house. As a possible explanation for this behaviour, Morris and Sloutsky (2002, 924) suggested that children use a ‘cut’ strategy to simplify the complexity of problems (i.e. they transform an indeterminate problem into a determinate one). It is likely that the same ‘cut’ procedure was used in the inference_pe task. Along the same lines as Morris and Sloutsky (2002), Pieraut-Le Bonniec (1980) found that young children often add irrelevant information to an unsolvable task in order to increase the determinability of the problem. Furthermore, the results of the ME task and the inference task indicate that the children’s abilities to recognise uncertainty and to understand expressions of epistemic uncertainty are interrelated. It is therefore probable that the participants who favoured the “certainty story” in the ME task did not realise that the “uncertainty story” was indeterminate. Basically, the “uncertainty story” can be interpreted in three ways: (i) one may decide that the boy is not in the house at the end of the story, (ii) one may decide that the boy is definitely in the house, or (iii) one may decide that both (i) and (ii) are possible. Naturally, only interpretation (iii) leads to an uncertainty judgement. It is probable that
many of the children who failed to pass the ME task gave interpretation (i) to the “uncertainty story”, because this interpretation automatically eliminates the “uncertainty story” as a possible answer. (Weak epistemic terms are, in fact, logically compatible with the “certainty story”, but never with a story in which one can be sure that the boy is not in the house.) By contrast, giving interpretation (ii) to the “uncertainty story” did not really help them to solve the task, because in this case both stories are equally good answers to the task. Although the results of this experiment suggest that the late understanding of epistemic terms is linked, to some extent, to the children’s ability to recognise epistemic uncertainty, one should be careful in concluding that children under 8;0 are unable to deal with epistemicity. As many previous studies show, children’s ability to recognise uncertainty may vary considerably depending on the complexity of the task they have to perform (cf. Byrnes and Beilin 1991). It is therefore possible that the children in this experiment tended to choose one of the houses in the inference_ne and inference_pe tasks even if they had some feeling that they did not really know which one was the right one. They might have expected a priori that the task would be solvable, and so they might not have felt on safe ground admitting that they did not know how to solve it. For the same reason, it is also possible that, in the ME task, many children were simply more cautious than adults, because they did not feel so secure in the environment of an experiment. This could have led some children to accept the weak statements in combination with a situation that would normally require the use of must or certainly, although they would reject them under circumstances that fit better with their experience in everyday life. Despite these possible objections, the main result of this experiment still suggests that children’s understanding of epistemic terms interacts with the way they are able to deal with their own ignorance about some facts, and not with their ability to recognise scalar implicatures, as the following discussion of the results of the implicature task will show. Whereas, in the ME task, nearly all 6-year-olds, many 7-year-olds, and some 8-year-olds demonstrated a strong preference for the “certainty story” (i.e. the only story that fits with a logical interpretation of weak epistemic terms), nearly all children preferred the more informative pragmatic reading of einige in the implicature task. Accordingly, no significant correlation was observed between the results of the two tasks. If a reluctance to take scalar implicatures into account were the reason for the children’s failure to understand the epistemic terms in the ME task, one would expect exactly the same pattern of answers
in the implicature task. This possibility is clearly inconsistent with the data of the experiment. The analysis of the justifications given by the children in the critical trials of the implicature task strongly supports this claim. In nearly 100% of the cases, the children argued that the ALL case did not fit with the einige sentences because the latter required that only a part of the group had performed the action. This is exactly the type of justification one expects when the choice is made according to the requirements of the scalar implicature. However, one should be careful with the interpretation of the results of the implicature task. In this experiment, the goal of the implicature task was only to obtain a good comparison between the reaction of the children to epistemic können and vielleicht, on the one hand, and to the weak scalar term einige, on the other. One should keep in mind that the implicature task was designed only to test the preferred reading of einige and not to assess whether the children accept both the semantic and the pragmatic reading of the quantifier. In other words, this task was not designed to assess children’s ability to recognise scalar implicatures directly. Therefore, it cannot be ruled out that children who do not recognise scalar implicatures at all might also prefer to associate einige with the SOME case for other reasons. One such possible reason could be, for example, that einige occurs more often with its pragmatic meaning in everyday conversation. In order to find out whether the preferred pragmatic reading of einige in the implicature task was really the result of a scalar implicature, a second experiment was carried out.
3 Experiment 2

3.1 Participants, method, material and procedure
Four 6-year-olds (from 5;7 to 6;6, mean age 6;2), four 7-year-olds (from 6;7 to 7;6, mean age 7;2), and four 8-year-olds (from 7;7 to 8;6, mean age 8;3), all monolingual speakers of German, participated in the experiment. They were recruited from elementary schools and kindergartens in Stuttgart and Tübingen (Germany). To assess the children’s ability to recognise scalar implicatures two tasks were set up. The first task was an exact repetition of the implicature task from the first experiment. This task was called the picture selection task (PST). The second task was a truth value judgement task (TVJT), inspired by Chierchia et al. (1998). The advantage of the TVJT is that it allows for testing the acceptance of sentences with einige in the ALL and the SOME cases
separately. In the TVJT, the participants were presented with the following scenario, acted out with small puppets: a group of five Smurfs (Smurfs are comic figures that are well known to German children) were observed by a sixth Smurf (the observer) while performing some action. For example, three of the five Smurfs wanted to cross a road, but two of them did not want to. Three possible outcomes were used for this situation. (i) In the ALL case, three Smurfs crossed the road; after a long discussion, the two reluctant ones joined the group again. (ii) In the SOME case, the reluctant Smurfs refused to cross the road, while the remaining three Smurfs went across. (iii) In the NONE case, no Smurf performed the action. Of course, only one of the possible outcomes was presented in each trial. At the end of the scene, the observer told the participant that he knew what had happened. Then he described the scene with one sentence of the same type as in (2). Afterwards, the participant was asked whether the description by the observer was correct or not. The participants had to perform the test three times for the same combination of input sentences and possible outcomes, as in Experiment 1. Three different scenes were used in this experiment. As in Experiment 1, about 40% filler trials were added. In the critical trials, it was expected that all participants would accept the einige sentences in combination with the SOME case, but that only the participants who took the implicature into account would reject this type of sentence in combination with the ALL case. The goal of this experiment was to determine what correlation could be observed between the children’s ability to recognise the implicature in the TVJT and their preferred reading of weak scalar terms in the PST. The participants were tested individually in a separate room at their kindergartens or schools in a single session of about 20 minutes. The order of presentation of the tasks was varied within the different groups.
3.2 Results
Because the answers of the participants were very homogeneous throughout the different trials (always 3/3 or 0/3 correct answers), no mean score of correct answers was computed. The participants were just evaluated as successful or not successful. In the PST, a participant was successful if s/he showed a preference for the SOME case when the input sentence was with einige and the choice was between either the SOME case or the ALL case (cf. above the critical trials in the implicature task of Experiment 1). In the TVJT, a participant was successful when s/he accepted the sentence with einige in
the SOME case and rejected it in the ALL case. Eleven children passed the PST successfully. This result largely replicates the finding of the first experiment that 6- to 8-year-olds strongly prefer the pragmatic reading of einige to the semantic reading. The same eleven children also succeeded in the TVJT. This result shows that at least the great majority of the children reliably took the scalar implicature into account when judging the appropriateness of einige in the TVJT. Only one child, a 6-year-old, failed to judge the einige sentences correctly in the TVJT. It is important to note that this child not only failed in the TVJT, showing that he did not recognise the implicature, but also had a strong preference for the semantic reading of einige in the PST. It is obvious that the results of the second experiment do not need extra statistical analysis to support the claim that there is an r = 1.0 (p = .000) correlation between the results of the two tasks.
3.3 Discussion
To the extent that a group of 12 children can be representative, the results of this experiment lead to two conclusions. First, the results of the TVJT indicate that children of this age almost always recognise and make use of implicatures in appropriate contexts. Second, the strong correlation between the results of both tasks in Experiment 2 shows that children who do not recognise implicatures strongly favour the semantic reading of scalar terms over their pragmatic reading, and vice versa. These two conclusions are critical for the interpretation of the data from the first experiment. They strongly support the claim that children who preferred the pragmatic reading of einige in the implicature task of the first experiment did so because they made use of a scalar implicature to solve the task, and not just because they relied on the most habitual meaning of einige in everyday conversation. The results of this study in general, and those of the second experiment in particular, (apparently) contradict Noveck’s (2001) proposal that a certain reluctance by children to draw implicatures may be responsible for the late understanding of weak epistemic terms. How can this apparent contradiction be explained? It is possible that Noveck’s design has, to some extent, masked children’s capacity to recognise scalar implicatures. In his third task, for example, children and adults had to judge whether a sentence like Certaines girafes ont un long cou (‘Some giraffes have long necks’) was felicitous or not. No further context was given to the participants, so that they had to rely on their world
knowledge to judge the appropriateness of the sentence. Because there is no natural context in the world that supports a pragmatic reading of this sentence, it was expected that the participants would reject it as soon as the implicature was recognised. His results show that 7-year-olds did not take the implicature into account (neither did the adults in 41% of the trials). However, it is important to keep in mind that the participants in Noveck’s experiment could follow two distinct strategies to solve the task. First, they might conclude that the sentence was pragmatically incorrect. Second, they might assume that the goal of the study was to test their ability to understand the logical properties of scalar terms. Following this second strategy, they would simply accept the sentences as semantically correct. The lack of a context that could potentially fit with a pragmatic interpretation of the input sentences may have led the participants to decide that what was being tested was their ability to understand the logical properties of certains. Moreover, the lack of a context that fits with a pragmatic reading of certains might have led the participants to give an existential meaning to the quantifier. Since some is a weak quantifier, it is ambiguous between a quantificational and an existential reading. For example, a sentence like Some children went to school can mean that some particular children went to school (quantificational, e.g. Marc, Martha and Nina [but not Adam and Susanna]), or simply that there are children who went to school (existential). While the quantificational reading always gives rise to an implicature in appropriate contexts, the existential reading leaves it completely open whether all or only a part of the children went to school; it only excludes the possibility that none of the children went to school. The settings of the present experiments differed in an important respect from Noveck’s design: they always gave rise to a comparison between a semantic and a pragmatic context that potentially fits with the tested weak scalar expressions. This was obvious in the first experiment and in the PST of the second experiment. However, the TVJT of the second experiment also provided a much more realistic context than Noveck’s third task did. The einige statements were always embedded in an everyday situation (e.g. crossing a road) that strongly favoured a pragmatic reading of the statements. Furthermore, in the ALL case of the TVJT, two Smurfs were at first reluctant to cross the road and had a discussion before they finally did so. This may have enhanced the probability that the correctness of the statement made by the observer was judged from a pragmatic and not from a purely logical point of view. The fact that, at the beginning of the scene, it was not certain that all of the Smurfs would cross the road, and that it always took time before they finally did, may have led the children to pay more attention to the distinction
between the pragmatic and the semantic reading of einige than in Noveck’s third task. The children were also more likely to accept that the observer giving an einige statement in the ALL case did not notice that the last two Smurfs had actually crossed the road, and therefore that he had made a pragmatically inadequate statement. Although the data of the present study clearly show that the choice of an adequate context influences children’s willingness to take scalar implicatures into account, the data of Experiment 2 do not completely contradict Noveck’s main assumption. Following Sperber and Wilson’s (1995) Relevance Theory, Noveck suggests that the semantic reading is the basic reading of weak scalar terms, and that scalar implicatures are not generalised. Sperber and Wilson argue, against neo-Gricean accounts (cf. Levinson 2000), that the pragmatic meaning of weak scalar terms is not automatically taken into account (as a generalised implicature), but must be added every time the context requires it. Consequently, Noveck suggests that, in his experiment, the children found no reason to add an implicature because no appropriate context called for it, and therefore they remained with the default (i.e. the logical) meaning of certains. The results of the present study do not contradict this claim, provided it is admitted that the contexts of the two experiments conducted here gave the children enough reason to take the implicature into account when judging the input sentences. Furthermore, in the second experiment, the fact that the 6-year-old who did not recognise the scalar implicature in the TVJT also strongly favoured the semantic reading of einige in the PST fits well with the assumption that the default reading of weak scalar terms is the semantic and not the pragmatic one. However, the empirical basis of the second experiment of this study is too weak to make any definitive claim about this. More data, especially from younger children, will be needed to confirm or contradict this last observation.

4 Conclusion

The results of the two experiments of this study suggest that the course of the acquisition of epistemic terms depends to a great extent on the development of children’s ability to process the information that underlies an epistemic inference. They also suggest that this ability is not yet fully mastered by eight years of age. This, however, does not mean that younger children are unable to use weak epistemic terms at all, but rather that their capacity to use such terms is simply limited by their partial inability to recognise epistemic uncertainty. Therefore, young children probably at first use weak epistemic terms
only rarely, and only in everyday situations with which they are very familiar. In this sense, the data provided by this study do not fully contradict the findings of naturalistic studies (cf. Ehrich 2005). However, the results of this study also suggest that young children should have difficulties relying on linguistic evidence (i.e. epistemically modalised utterances of a speaker) to infer epistemic possibility, and that they should occasionally overgeneralise the use of strong epistemic terms in their talk. It remains questionable, however, whether such mistakes can be observed in naturalistic studies at all.

Acknowledgements

This research has been supported by the “Deutsche Forschungsgemeinschaft” within the Sonderforschungsbereich 441 “Linguistic Data Structures”. I wish to thank Veronika Ehrich, Ira Noveck and two anonymous reviewers for insightful comments on this paper. I am also very grateful to Barbara Dillenburger and Frank Schlosser, who helped me conduct the experiments.
References

Bartsch, Karen and Henry M. Wellman
    1995  Children Talk About the Mind. Oxford University Press, Oxford, New York.
Byrnes, James P. and Harry Beilin
    1991  The cognitive basis of uncertainty. Human Development, 34: 189–203.
Byrnes, James P. and Willis F. Overton
    1986  Reasoning about certainty and uncertainty in concrete, causal and propositional contexts. Developmental Psychology, 6: 793–799.
Chierchia, Gennaro, Stephen Crain, Maria T. Guasti, and Rosalind Thornton
    1998  “some” and “or”: A study on the emergence of logical form. In Annabel Greenhill, Mary Hughes, Heather Littlefield, and Hugh Walsh (eds.), Proceedings of the 22nd annual Boston Conference on Language Development, volume 1, pp. 97–108. Cascadilla, Somerville, MA.
Doitchinov, Serge
    2001  ‘Es kann sein, daß der Junge ins Haus gegangen ist’. Zum Erstspracherwerb von können in epistemischer Lesart. In R. Müller and M. Reis (eds.), Modalität und Modalverben im Deutschen, pp. 111–134. Buske, Hamburg.
Ehrich, Veronika
    2005  Linguistic constraints on the acquisition of epistemic modal verbs. This volume.
Horn, Laurence R.
    1972  On the semantic properties of the logical operators in English. Mimeo: Indiana University Linguistics Club.
Levinson, Stephen C.
    2000  Presumptive Meanings. The Theory of Generalized Conversational Implicature. MIT Press, Cambridge, MA.
Morris, Bradley J. and Vladimir Sloutsky
    2002  Children’s solution of logical versus empirical problems: What’s missing and what develops. Cognitive Development, 16: 907–928.
Noveck, Ira A.
    2001  When children are more logical than adults: experimental investigations of scalar implicature. Cognition, 78: 165–188.
Papafragou, Anna
    2000  Modality: Issues in the Semantic-Pragmatic Interface. Elsevier, Amsterdam.
Pieraut-Le Bonniec, Gilberte
    1980  The Development of Modal Reasoning: Genesis of Necessity and Possibility Notions. Academic Press, New York.
Somerville, Susan C., B. A. Hadkinson, and G. Greenberg
    1979  Two levels of inferential behaviour in young children. Child Development, 50: 119–131.
Sperber, Dan and Deirdre Wilson
    1995  Relevance: Communication and Cognition. Blackwell, Oxford, 2nd edition.
Stephany, Ursula
    1986  Modality. In Paul Fletcher and Michael Garman (eds.), Language Acquisition, pp. 375–400. Cambridge University Press, Cambridge.
Wellman, Henry M.
    1990  The Child’s Theory of Mind. MIT Press, Cambridge, MA.
Processing Negative Polarity Items: When Negation Comes Through the Backdoor

Heiner Drenhaus, Stefan Frisch, and Douglas Saddy
1 Introduction
Various lexical elements, such as the German negative polarity item jemals (ever), exhibit an interesting property in that they can only occur in certain kinds of contexts. Negative polarity items must occur in a context in which the proper semantic/pragmatic properties are accessible, see (1a). If the context does not provide an (accessible) negator, the construction becomes unacceptable, see (1b).1
(1)  a.  Kein Mann war jemals glücklich.
         no man was ever happy
         ‘No man was ever happy.’
     b. *Ein Mann war jemals glücklich.
         a man was ever happy
         ‘A man was ever happy.’
Another important observation is that the violation of the polarity construction is not due to a word category mismatch, see (2):
(2)  a.  Kein Mann war gestern glücklich.
         no man was yesterday happy
         ‘No man was happy yesterday.’
     b.  Ein Mann war gestern glücklich.
         a man was yesterday happy
         ‘A man was happy yesterday.’
In (2), the polarity item jemals is replaced by the non-polarity adverb gestern (yesterday). Both sentences (2a) and (2b) are equally acceptable, independently of the presence of negation. Seeing that the negative polarity item jemals (ever) belongs to the word category adverb, the unacceptability of (1b) is not due to a violation of structural requirements. (1b) is
unacceptable, rather, because of a conflict between the specific lexical demands of the polarity item and the properties of the context. However, exactly which lexical properties restrict the occurrence of a polarity item is a matter of controversy in the theoretical linguistic literature. Linguistic descriptions of the distribution and interpretation of polarity items agree that the occurrence of polarity items is licensed by semantic (e.g. Horn 1997; Ladusaw 1980) or pragmatic (Chierchia 2001; Fauconnier 1980; Krifka 1995) properties, or by a combination of both (Baker 1970; Linebarger 1987). These properties, in addition, must be accessible to the polarity item, where accessibility is determined by hierarchical constituency (Haegeman 1995; Laka 1994; Progovac 2000). A negative polarity item is only licensed if it occurs in the scope of a negator, such as in (3a). As a consequence, linguistic theory predicts that a negative polarity construction is equally unacceptable irrespective of whether the context provides no negation at all, such as in (3b), or whether the negative polarity item is preceded, but not c-commanded, by a negator, such as in (3c).
(3)  a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
         no man who a beard had was ever happy
         ‘No man who had a beard was ever happy.’
     b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
         a man who a beard had was ever happy
         ‘A man who had a beard was ever happy.’
     c. *Ein Mann, der keinen Bart hatte, war jemals glücklich.
         a man who no beard had was ever happy
         ‘A man who had no beard was ever happy.’
In sum, syntactic as well as semantic/pragmatic information plays an important role in the licensing of negative polarity items in a sentence context. In order to shed light on the specific lexical properties of a negative polarity item like jemals (ever) and on the licensing conditions which are due to hierarchical constituency, we investigated negative polarity structures such as (3) in two studies. More specifically, we examined the influence of the hierarchical constituency and the linear order of the negator by using speeded acceptability judgment tasks and event-related brain potentials (ERPs).
2 The judgment and processing of polarity items
From a psycholinguistic point of view, the properties of polarity items raise questions with respect to syntactic and semantic processing. More specifically, we want to know how the human language processor responds to the different types of demands initiated by a polarity item. This is supposed to shed light not only on the specific nature of polarity items, but more importantly on how the specific properties of the polarity item interact with the restrictions provided by the context. Our experiments focused on the acceptability of negative polarity in three types of constructions, such as (3). In (3a) the negator kein (no) appears in the same clause (the main clause) as the negative polarity item jemals (ever) and is therefore accessible. The ungrammatical structure in (3b) does not contain negation, which leaves jemals (ever) unlicensed. In the third structure (3c), the negator precedes the negative polarity item, but the negation is not structurally accessible because it is too deeply embedded in the relative clause. If the linguistic description is correct that negative polarity items need a structurally accessible negator in order to be licensed, we expect structures (3b) and (3c) (where this condition is not met) to be rejected as ungrammatical significantly more often than structures such as (3a). However, linguistic theory does not provide a reason to assume that acceptabilities should differ depending on whether the negation is there but not accessible, as in (3c), or not present at all, as in (3b).
2.1 Experiment 1: Speeded acceptability judgment study
In a first experiment, we wanted to test how structures such as (3) are judged by native German speakers in a speeded grammaticality judgement task. This technique is believed to reflect on-line processing decisions, since it does not allow much time for deliberation, which could otherwise contaminate the subjects’ responses so that they no longer accurately reflect their grammatical intuitions.2
2.1.1 Methods
Participants 24 students from the University of Leipzig (mean age 21 years, 10 female) participated. They were all monolingual speakers of German and received course credits for their participation.
Materials 24 sets of lexical material were constructed in each of the three critical conditions (3a) to (3c), resulting in a total of 72 experimental sentences. Each subject saw a subset of 24 sentences (8 per condition). Additionally, these sentences were intermixed with 24 related and 80 unrelated fillers.

Procedure After a set of 12 training sentences (4 per condition), the 128 sentences of the experiment were presented in pseudo-randomized order in the center of a screen. The initial subject phrase, the nominal phrase within the relative clause, and each of the other words were presented in isolation for 300ms each, with a 100ms blank screen between presentations (interstimulus interval, ISI). 500ms after the last word of each sentence, subjects had to judge the acceptability of the presented sentence within a maximal interval of 3000ms by pressing one of two buttons. 1000ms after their response, the next trial began.

Data analysis We computed the mean accuracy percentages and mean response latencies in the correctly performed trials, per condition per subject as well as per condition per item. Subject and item means were analyzed in separate ANOVAs with the factor VIOLATION, with three levels: correct (COR), violation without negation (VNO), and violation with inaccessible negation in the relative clause (VNR). In order to control for violations of sphericity, the correction proposed by Huynh and Feldt (1970) was applied. Seeing that all possible single comparisons were computed in the case of a significant main effect, the alpha level was adjusted according to Keppel (1991).
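To make the logic of the separate by-subjects (F1) and by-items (F2) analyses concrete, the sketch below runs two one-way repeated-measures ANOVAs over simulated data. The data, the column names, and the use of the statsmodels AnovaRM class are assumptions made purely for illustration; they are not the software or data actually used in this study, and no sphericity correction is applied here.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one 0/1 judgment per subject x item x condition,
# with condition in {COR, VNO, VNR}; a real analysis would use the recorded responses.
rng = np.random.default_rng(1)
rows = []
for subj in range(24):
    for item in range(8):
        for cond, p in [("COR", 0.85), ("VNO", 0.83), ("VNR", 0.70)]:
            rows.append({"subject": subj, "item": item, "condition": cond,
                         "accuracy": rng.binomial(1, p)})
data = pd.DataFrame(rows)

# F1 analysis: aggregate over items so that subjects are the random factor.
by_subject = data.groupby(["subject", "condition"], as_index=False)["accuracy"].mean()
f1 = AnovaRM(by_subject, depvar="accuracy", subject="subject", within=["condition"]).fit()

# F2 analysis: aggregate over subjects so that items are the random factor.
by_item = data.groupby(["item", "condition"], as_index=False)["accuracy"].mean()
f2 = AnovaRM(by_item, depvar="accuracy", subject="item", within=["condition"]).fit()

print(f1)
print(f2)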
2.1.2 Results
Mean accuracy percentages and response latencies in the acceptability judgment task are displayed in Table 1.

Table 1. Mean accuracy rates (in percent) and reaction times (in ms) for all three conditions across all 24 subjects (with standard deviations in parentheses)

                                                    accuracy     reaction times
a. correct (COR)                                    85 (13.3)    540 (241)
b. violation without negation (VNO)                 83 (17.2)    554 (237)
c. violation with inaccessible negation (VNR)       70 (30.3)    712 (314)
The statistical analysis of the accuracies revealed a main effect of CONDITION (F1 (2,46) = 4.38, p < .05; F2 (2,46) = 7.68, p < .01). This main effect was due to the fact that subjects made more errors in rejecting condition VNR compared to both VNO (F1 (1,23) = 6.11, p < .05; F2 (1,23) = 10.80, p < .01) and COR (F1 (1,23) = 5.11, p < .05; F2 (1,23) = 8.89, p < .01). VNO and COR, however, did not differ from one another (both F < 1). The analysis of the response latencies revealed a similar picture. We found a main effect of CONDITION (F1 (2,46) = 8.75, p < .01; F2 (2,46) = 7.28, p < .01). Responses in condition VNR were slower compared to both VNO (F1 (1,23) = 26.68, p < .001; F2 (1,23) = 11.95, p < .01) and COR (F1 (1,23) = 10.25, p < .01; F2 (1,23) = 8.35, p < .05). Again, VNO and COR did not differ from one another (both F < 1).
2.1.3 Discussion
The results of the acceptability judgment experiment show that subjects rejected both violation conditions, as we expected them to do. However, the comparison between the incorrect conditions reveals that subjects more often accepted structures in which the negator precedes the negative polarity item without c-commanding it than structures without negation at all. Differences in response latencies point in the same direction. The reduction in accuracy as well as the higher reaction times in condition VNR imply that it is difficult for the language processor to inhibit the influence of the negation in the relative clause. This suggests that the negator may be wrongly used to license the polarity item despite the fact that it is not in a c-commanding position. Nevertheless, the fact that 70% of VNR constructions are rejected indicates that hierarchical constituency and accessibility play a crucial role in determining the acceptability of negative polarity constructions.
2.2 Experiment 2: Event-related brain potentials (ERPs)

2.2.1 Some preliminary remarks on ERPs
In order to investigate the on-line processing of negative polarity constructions, we used event-related potentials (ERPs). Before we describe the results of our ERP study in detail, we would like to give a brief overview of some language-related ERP effects which have been identified using this experimental technique.
Event-related potentials (ERPs) are an ideal tool for investigating language processing on-line because they are continuous and have a very high temporal (millisecond-by-millisecond) resolution (Kutas and van Petten 1994). Compared to quantitative measures (e.g. reaction times), ERP effects (so-called components) are characterized by a set of quantitative (peak latency) and qualitative parameters (polarity, topography, experimental sensitivity). In response to linguistically distinct experimental manipulations, distinct ERP patterns have been found. With regard to language processing, four main markers have been identified in the literature. They are labelled according to a nomenclature which refers to their polarity (N/negativity versus P/positivity), post-stimulus peak latency and topographic distribution. First, the early left anterior negativity (ELAN) occurs between 120 and 220ms with either a left or a bilateral anterior distribution. In several studies this component has been associated with phrase structure violations (cf. Friederici 2002; Hahne and Friederici 1999, 2002). Second, the left anterior negativity (LAN) is similar in topography and polarity to the ELAN, but peaks later, namely between 300 and 500ms, and occurs in response to morphosyntactic violations (Coulson, King, and Kutas 1998; Gunter, Schriefers, and Friederici 2000; Friederici and Frisch 2000). The third component is the so-called N400, a negativity with a latency typically peaking around 400ms after the onset of a critical element. It has a centro-parietal bilateral distribution, often with a slight right-hemisphere focus. The N400 reflects the processing costs of semantic or thematic integration, since it has been found in response to semantic as well as thematic violations (Kutas and Hillyard 1980a; Friederici and Frisch 2000), either of verb argument structure or of thematic hierarchies between case-marked arguments (Frisch and Schlesewsky 2001). Finally, the so-called P600 is a positivity peaking between 600 and 900ms with a centro-parietal distribution and has been associated with syntactic reanalysis and repair (Osterhout and Holcomb 1992). Additionally, this component has been found in response to enhanced syntactic complexity (Kaan, Gibson, Harris, and Holcomb 2000; Friederici, Hahne, and Saddy 2002), including ambiguity (Frisch, Schlesewsky, Saddy, and Alpermann 2002).
2.2.2 ERPs and polarity constructions
With regard to our study on negative polarity items, there are – to our knowledge – only two studies in the literature in which the processing of polarity items was tested by using event-related potentials. In a study carried out by Shao and Neville (1998), the differences between a correct sentence (4a) and its violation (4b) were tested.
(4)  a.  Max says that he has never been to a birthday party.
     b. *Max says that he has ever been to a birthday party.
Shao and Neville (1998) found, in ungrammatical sentences like (4b), an anterior negativity on the polarity item ever between 300 and 500ms, followed by a late positivity between 500 and 1000ms, compared to never, which meets the context requirements in (4a). Surprisingly, they suggested that the negativity can be associated with specific types of semantic processing that are bound to polarity constructions, although they did not find an N400. Additionally, it is well known that lexical differences between two elements affect ERP correlates (Kutas and van Petten 1994). Therefore, it cannot be excluded that Shao and Neville's findings were influenced by lexical differences between the two items tested (ever versus never). Saddy, Drenhaus, and Frisch (2004) investigated the failure to license positive polarity items ((5c) versus (5d)) and negative polarity items ((5a) versus (5b)) in German.
(5)  a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
         no man who a beard had was ever happy
         ‘No man who had a beard was ever happy.’
     b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
         a man who a beard had was ever happy
         ‘A man who had a beard was ever happy.’
     c. *Kein Mann, der einen Bart hatte, war durchaus glücklich.
         no man who a beard had was certainly happy
         ‘No man who had a beard was certainly happy.’
     d.  Ein Mann, der einen Bart hatte, war durchaus glücklich.
         a man who a beard had was certainly happy
         ‘A man who had a beard was certainly happy.’
Their results showed different processing reflexes associated with failure to license positive polarity items in comparison to failure to license negative polarity items. Failure to license both negative and positive
polarity items elicited an N400 component reflecting semantic integration cost. Failure to license positive polarity items, however, also elicited a P600 component ((5c) versus (5d)). The additional P600 in the positive polarity violations was interpreted as a reflex of higher processing complexity associated with a negative operator. Their results suggest a difference between the two types of violation; namely, that the processing of negative and positive polarity items does not involve identical mechanisms.
The present study
In an ERP experiment, we addressed the question of how structures such as in (3) are processed on-line. Seeing that ERPs provide qualitatively different types of responses (components) being associated with different types of linguistic information (see above), they can be used to answer the following questions: First, how do ERP patterns differ between acceptable (such as (3a)) and unacceptable structures (such as (3b) and (3c))? Second, what is the nature of the intrusion effect found in Experiment 1, i.e. the difference between (3b) and (3c)? Seeing that the violation of licensing a negative polarity item elicited a N400 component in Saddy et al. (2004), we expect a similar pattern for the violation conditions in our study, namely, that such a violation induces semantic/pragmatic integration problems. We do not expect a P600 effect in our study, seeing that Saddy et al. found this component only in response to a positive polarity violation. Over and above that, we are interested in the influence of the negator in the relative clause on the ERP pattern for the negative polarity item. As we have seen in the acceptability judgments in Experiment 1, a non ccommanding negation enhances the acceptability of the unlicensed negative polarity item. If this effect reflected already an on-line process, we would expect the violation effect (that is, the N400 that is generally expected according to Saddy et al. 2004) to decrease. 2.2.4
Methods
Participants 16 undergraduate students (mean age 21 years, 10 female) from the University of Potsdam participated in the ERP study after giving informed consent. All were right-handed and had normal or corrected-tonormal vision.
Materials Each subject read a total of 640 sentences in a randomized order in which 120 critical sentences (40 per condition) were intermixed with 120 related and 320 unrelated filler sentences. The total material was presented to each subject in two sessions with a one-week interval between the sessions. All sentences consisted of a main clause, in which a relative clause was embedded. The negator appeared in the grammatical condition (6a) in the main clause (accessible to the negative polarity item), in ungrammatical condition (6c) in the relative clause (not accessible to the negative polarity item) and the ungrammatical condition (6b) did not contain negation. (6)
(6)  a.  Kein Mann, der einen Bart hatte, war jemals glücklich.
         no man who a beard had was ever happy
         ‘No man who had a beard was ever happy.’
     b. *Ein Mann, der einen Bart hatte, war jemals glücklich.
         a man who a beard had was ever happy
         ‘A man who had a beard was ever happy.’
     c. *Ein Mann, der keinen Bart hatte, war jemals glücklich.
         a man who no beard had was ever happy
         ‘A man who had no beard was ever happy.’
Procedure After a set of 12 training sentences (4 in each of the critical conditions, see above), the 120 critical sentences were randomly presented in the center of a screen, with 400ms (plus 100ms interstimulus interval) for the initial subject phrase, the nominal phrase within the relative clause, and for each of the other words in isolation. 500ms after the last word of each sentence, subjects had to judge its well-formedness within a maximal interval of 3000ms by pressing one of two buttons. 1000ms after their response, the next trial began. The EEG was recorded by means of 16 AgCl electrodes with a sampling rate of 250Hz (impedances < 5kOhm) and referenced to the left mastoid (re-referenced to linked mastoids offline). Electrode positions are based on the nomenclature proposed by the American Electroencephalographic Society (Sharbrough et al. 1991). The horizontal electro-oculogram (EOG) was monitored with two electrodes placed at the outer canthus of each eye and the vertical EOG with two electrodes above and below the right eye.

Data analysis In order to see whether subjects judged the sentences in the way we expected them to do, the accuracy percentages of their judgments
as well as response latencies of the behavioral data were analyzed (see section 2.1.1 for explanation). For the ERP analysis, only trials with correct answers in the judgment task and without artifacts were selected (83% of all trials). In order to compensate for drifts, data were filtered with a 0.4Hz high pass. Single-subject averages were computed in a 1300ms window relative to the onset of the critical item (the polarity item) and aligned to a 200ms pre-stimulus baseline. ERPs were statistically analyzed in two time windows: 300–450ms for the N400 and 500–800ms for the P600 effects. ERP effects were statistically computed in repeated-measures analyses of variance (ANOVA) with two factors: a condition factor VIOLATION, with the three levels correct (COR), violation without negation (VNO), and violation with inaccessible negation in the relative clause (VNR), and a topographical factor REGION, with the three levels anterior (electrodes F3, FZ and F4), central (electrodes C3, CZ and C4) and posterior (electrodes P3, PZ and P4).
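The ERP quantification just described (epoching around the polarity item, baseline correction over the 200ms pre-stimulus interval, averaging over artifact-free trials, and mean amplitudes in the 300–450ms and 500–800ms windows) can be sketched schematically as follows. The array shapes, the simulated data, and the function names are assumptions made for illustration only; this is not the lab's actual analysis pipeline.

import numpy as np

FS = 250                      # sampling rate in Hz, as reported above
BASELINE = (-0.2, 0.0)        # 200ms pre-stimulus baseline
WINDOWS = {"N400": (0.30, 0.45), "P600": (0.50, 0.80)}   # analysis windows in seconds

def time_to_sample(t, tmin=-0.2):
    # Convert a latency in seconds to a sample index, given that the epoch starts at tmin.
    return int(round((t - tmin) * FS))

def mean_window_amplitude(epochs, window, tmin=-0.2):
    # epochs: array of shape (n_trials, n_channels, n_samples), time axis starting at tmin.
    # Returns the per-channel ERP amplitude averaged over trials and over the given window.
    b0, b1 = (time_to_sample(t, tmin) for t in BASELINE)
    # Baseline correction: subtract the mean of the pre-stimulus interval per trial and channel.
    corrected = epochs - epochs[:, :, b0:b1].mean(axis=2, keepdims=True)
    erp = corrected.mean(axis=0)                  # average over trials -> (n_channels, n_samples)
    w0, w1 = (time_to_sample(t, tmin) for t in window)
    return erp[:, w0:w1].mean(axis=1)             # mean amplitude per channel in the window

# Hypothetical data: 40 artifact-free trials, 16 channels, epoch from -200ms to 1100ms.
rng = np.random.default_rng(2)
epochs = rng.normal(0.0, 5.0, size=(40, 16, time_to_sample(1.1)))
for name, window in WINDOWS.items():
    print(name, mean_window_amplitude(epochs, window).round(2))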
2.2.5 Results of the behavioral data
Table 2 shows the mean accuracy percentages and response latencies in the behavioral data of the ERP study (acceptability judgment task).

Table 2. Mean accuracy rates (in percent) and reaction times (in ms) for all three conditions across all 16 subjects (with standard deviations in parentheses)

                                                    accuracy     reaction times
a. correct (COR)                                    94 (4.6)     575 (203)
b. violation without negation (VNO)                 95 (4.6)     529 (184)
c. violation with inaccessible negation (VNR)       89 (8.7)     595 (254)
The statistical analysis of the accuracies showed a main effect of CONDITION (F1 (2,30) = 5.29, p < .05; F2 (2,78) = 8.28, p < .001). This main effect was due to the fact that subjects made more errors in rejecting condition VNR compared to both COR (F1 (1,15) = 5.61, p < .05; F2 (1,39) = 8.04, p < .001) and VNO (F1 (1,15) = 10.57, p < .001; F2 (1,39) = 17.17, p < .0001). However, VNO and COR did not differ from one another (both F < 1). The analysis of the response latencies showed a different picture. We found a main effect of CONDITION in the item analysis (F2 (2,78) = 5.46, p < .001) which was only marginal in the subject analysis (F1 (2,30) = 2.82, p = .08). Responses in condition VNR were slower compared
to VNO (F1 (1,15) = 5.43, p < .05; F2 (1,39) = 8.31, p < .001). The responses in condition VNO were faster compared to COR in the item analysis only (F1 (1,39) = 2.69, p = .12; F2 (1,39) = 7.48, p < .01). VNR and COR, however, did not differ (F1 < 1, F2 < 1). The results of the behavioral data show a pattern similar to that in Experiment 1. Subjects rejected the incorrect conditions (VNO) and (VNR). The comparison between the two ungrammatical conditions showed that subjects incorrectly accepted sentences more often when negation linearly precedes the negative polarity item (VNR) than in sentences which do not contain negation.
2.2.6 Results of the ERP study
ERP patterns from the onset of the critical item (the polarity item, onset at 0ms) up to 1000ms thereafter are displayed in Figure 1.

Figure 1. ERP effects on the negative polarity item jemals (ever) from the onset up to 1000ms thereafter at a subset of nine electrodes. Negativity is plotted upwards. The solid line displays the grammatical condition (a), the dotted line displays the incorrect condition without any negation (b) and the broken line displays the incorrect condition where the relative clause contains negation (c). For presentation purposes only, ERPs were filtered off-line with an 8Hz low pass.

As can be seen from Figure 1, ERPs in both incorrect conditions (b) and (c) show a negativity that is broadly distributed compared to the correct condition (a). Moreover, we see a difference in the amplitude of the negativity between the two ungrammatical conditions (b) and (c), in that the negativity seems to be weaker in the condition where the negation is contained in the relative clause (c). Additionally, visual inspection reveals a positivity for both ungrammatical conditions (b) and (c) in comparison with the grammatical condition (a), especially at posterior sites. Statistical analysis for the N400 time window (300 to 450ms) revealed a main effect of VIOLATION (F (2,30) = 9.91, p < .001) which was due to a negative-going pattern in both VNO (F (1,15) = 17.22, p < .001) and VNR (F (1,15) = 6.42, p < .05) compared to COR. In addition, there was a marginal negativity in VNO compared to VNR (F (1,15) = 4.29, p = .08). Furthermore, there was an interaction VIOLATION x REGION (F (4,60) = 9.58, p < .001). Resolving this interaction revealed main effects of VIOLATION in the anterior (F (2,30) = 4.61, p < .05), the central (F (2,30) = 10.58, p < .001) and the posterior region (F (2,30) = 13.00, p < .001). VNO was more negative than COR in all three regions (anterior: F (1,15) = 5.97, p < .05; central: F (1,15) = 17.59, p < .001; posterior: F (1,15) = 27.05, p < .001). The pattern in VNR was also more negative than in COR, but only in the central (F (1,15) = 8.82, p < .05) and the posterior region (F (1,15) = 11.25, p < .001), not at anterior sites (F < 1). Interestingly,
VNO elicited a negativity compared to VNR in the anterior region (F (1,15) = 7.82, p < .05), but not in the two other regions (central: F (1,15) = 3.12, p = .15; posterior: F (1,15) = 2.51, p = .20). In the global ANOVA for the P600 time window (500 to 800ms), we found only a significant interaction VIOLATION x REGION (F (4,60) = 2.63, p < .05). Resolving the interaction, we found a main effect of VIOLATION only in the posterior region (F (2,30) = 5.04, p < .05). This main
effect was due to a positivity in both VNR (F (1,15) = 5.68, p < .05) and VNO (F (1,15) = 9.25, p < .05) compared to COR. There was no effect between the two violation conditions VNO and VNR (F < 1).
3 Discussion
In the present study, we investigated the processing of negative polarity items in German using speeded acceptability judgment tasks and ERPs. The results have shown that the language processor is sensitive to whether the licensing conditions of the negative polarity item can be met. Furthermore, our results showed that the language processor is sensitive to licensing information even when the licensor of the negative polarity item is not structurally accessible. We know from their linguistic description that polarity items have interesting lexical properties in that they are dependent on a specific context. A negative polarity item like the German jemals (ever) has to occur in the scope of an appropriate licensor, where scope is defined in hierarchical terms, namely, as a c-command relation. These syntactic and semantic constraints have to be met, otherwise the structure becomes ungrammatical. In short, there are two licensing conditions for negative polarity items. First, a licensor has to be present (semantic/pragmatic condition), and second, a licensor has to be structurally accessible (structural/syntactic condition). The linguistic characterization of this specific type of lexical element does not give reason to assume that acceptability should differ depending on whether negation is there but not accessible, as in (6c), or not present at all, as in (6b). Negation in a relative clause should not influence the unacceptability of structures with jemals (ever). In this sense, we would expect similar responses from the processing system when licensing conditions are violated. However, this seems not to be the case. Results of the judgment study (Experiment 1) – which are nicely replicated in the behavioral control task of Experiment 2 – show that subjects rejected both violations, but not in the same way. Interestingly, accuracies were lower and reaction times were higher in the violation condition in which negation appeared in the relative clause compared to the violation without negation at all. This suggests that the negator is (wrongly) used to license the polarity item even if it is not in a c-commanding position. Obviously, the language processor does not strictly adhere to structural restrictions. However, since judgment data cannot reveal the qualitative nature of this effect, and in order to see whether the intrusion is an on-line process, an ERP study was conducted.
The results of the ERP study showed that a negative polarity item both in the context without negation and in the context where the negation is not accessible elicits an N400 negativity followed by a late positivity (P600), compared to its grammatical counterpart. Following the ERP literature as described above (see Section 2.2.1), the N400 reflects enhanced costs of semantic integration, whereas the P600 can be seen as a marker of syntactic repair attempts. Although the N400 was predicted according to the study of Saddy et al. (2004), the P600 was not, since Saddy et al. found a significant positivity only in response to a positive polarity violation, but not on unlicensed negative polarity items. How can we account for the difference between the Saddy et al. study and the present study? One possibility could be that the P600 might be sensitive to the saliency of a (syntactic) violation (Coulson, King, and Kutas 1998). In other words, this component has been argued to increase as the violation becomes more salient. Saliency can be operationalized via the detectability of a violation (Osterhout and Hagoort 1999). In other words, the saliency of a violation should be inversely proportional to the number of errors subjects make when judging structures containing the respective violation (Osterhout and Hagoort 1999). Accordingly, we would expect the participants in the study of Saddy et al. (2004) – which did not show a significant P600 component – to have significantly higher error rates than the participants in the present study, where a significant P600 was found. This seems in fact to be the case.3 Moreover, Saddy et al. (2004) found the absence of a P600 for a jemals violation to contrast with the violation of a positive polarity item (durchaus (certainly)), which elicited both an N400 and a P600. It would be in line with the general argument of Saddy et al. (2004) if the violation of a negative polarity item jemals (ever) in the present study affected the P600, but not as strongly as a violation of a positive polarity item. Again, there are indications that this is indeed the case.4 Nevertheless, whether these interpretations satisfactorily explain the divergence between the two studies has to be systematically addressed in future experiments. Seeing that the elicited P600 does not distinguish between the two ungrammatical conditions, the negation in the relative clause does not seem to affect the parser's effort to repair a syntactically ill-formed structure. With regard to the N400, by contrast, we found differences in the amplitudes of this component between the two ungrammatical conditions. In the condition where negation is not accessible to the negative polarity item, the N400 was weaker compared to the condition without negation at all. This result suggests that a preceding negator can erroneously promote semantic/pragmatic integration despite a lack of structural accessibility.
In both cases, the processor tries to integrate the polarity item by 'looking for' a negator in the sentence context. In the case of a – structurally inaccessible, but linearly preceding – negator, the processor finds a suitable target and a potential analysis is entertained. This elicits a weaker N400 compared to the context without negation. In the latter case, the N400 is stronger because the processor does not find a target that allows a semantic/pragmatic integration of the polarity item. The fact that there is no observable difference in the P600 between the two violation conditions suggests that the negation in the relative clause has an impact only on the semantic/pragmatic integration of the negative polarity item (N400).

The results of this study point to further systematic investigations of polarity constructions, which will shed more light on the processing of these elements and the contexts they appear in. In our study, we have investigated the licensing conditions for negative polarity items in a specific type of construction. Further research should investigate the role of other licensors. WH-operators, for example, are considered another licensor of negative polarity items. It would be interesting to test whether the properties of this kind of operator induce similar or distinct ERP effects compared to the processing of constructions with negation. This would give us more insight into the licensing relations and environments in which negative polarity items appear. In general, the investigation of polarity constructions gives us the possibility of understanding the interaction of pragmatic, semantic, and syntactic phenomena.

4 Conclusion
The results of both the speeded acceptability judgement and the ERP experiment revealed that unlicensed negative polarity items are unacceptable on both semantic and syntactic grounds. Furthermore, a linearly preceding but structurally inaccessible negator can, on the one hand, erroneously enhance the acceptability of the structure in the judgement data, and, on the other hand, weaken the N400 effect in the ERP data, which reflects semantic integration problems. These results can be interpreted as follows: the simple existence of a potential licensor for a negative polarity item is sufficient to alter both the time course and efficiency of processing. It follows, then, that at least for the examples investigated here, there is a competition between semantic/pragmatic information and hierarchical constituency. A theoretical approach that is based only on structural relations would not predict this distribution.
Taken together, the results of the two experiments support an approach that combines semantic (pragmatic) properties and hierarchical constituency during the processing of negative polarity items.

Acknowledgements
The present research was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG) to D.S. (FOR 375/1–4). We want to thank Angela D. Friederici, Joanna Błaszczak – among many others – and two anonymous reviewers for many helpful suggestions, and Heike Herrmann, Beate Müller and Kristin Wittich for their support in data acquisition.
Notes
1. For the purpose of our paper, we restrict ourselves only to (constituent) negation. However, the characterization of the licensing contexts for negative polarity items is incomplete. Polarity items can be licensed by verbs with negative properties (e.g., We doubted that Peter was ever happy), by downward entailing weak quantifiers (e.g., Few men were ever happy), or in the context of questions (e.g., Who was ever happy). Note that there are contexts in which a polarity item is licensed even if it is not overtly c-commanded by negation (e.g., A doctor who knew anything about acupuncture was not available).
2. The results of a questionnaire study used to determine the viability of potential stimulus sentences showed that subjects did not discriminate between (3b) and (3c) types of constructions.
3. By comparing the error rates in the correct condition and the violation condition without a negation in the relative clause (which were identical in both studies) in the Saddy et al. study with those in the present experiment in a between-subjects ANOVA, we found an interaction VIOLATION x GROUP (F(1,30) = 7.99, p < .01). This interaction was due to the fact that the mean accuracies in the violation condition without a negation in the relative clause (which was used in both studies) were lower in the Saddy et al. study (86%) than in the experiment described in the present paper (92%). This difference turned out to be significant (F(1,30) = 5.96, p < .05). Interestingly, there was no difference in accuracies between the correct conditions from the two studies (95% versus 94%, F < 1). This shows that there was no general difference in performance between the two subject samples; the difference was confined to the detection of the violation.
4. Since we had positive polarity constructions with durchaus (certainly) as fillers in the present study, we could directly compare negative and positive polarity violation effects. We therefore chose only structures without negation in the relative clause (COR and VNO in the present study and the respective durchaus-counterparts), since these are identical to the conditions in Saddy et al. (in press). In a time window between 600 and 800 ms over all 9 electrodes, we found no significant difference in the direct comparison between the two correct conditions (F < 1). However, we found a significant difference between the incorrect conditions (F(1,15) = 4.7, p < .05), which was due to the fact that the pattern in the incorrect durchaus-condition was more positive-going than the pattern in the jemals-violation condition.
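The between-subjects comparison described in note 3 can be sketched in code. The following Python fragment is purely illustrative: the data frame, its column names ('accuracy', 'condition', 'group') and the use of statsmodels are assumptions of this sketch, not the authors' actual analysis pipeline, and the repeated-measures structure of the design is ignored for simplicity.

```python
# Illustrative sketch only; 'accuracy', 'condition' and 'group' are
# hypothetical column names, not taken from the original study.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def violation_by_group_anova(df: pd.DataFrame) -> pd.DataFrame:
    """Factorial ANOVA on judgement accuracies with a CONDITION x GROUP term.

    df: one row per subject and condition, with
        accuracy  - proportion of correct judgements,
        condition - 'correct' vs. 'violation (no negation)',
        group     - 'Saddy et al. (2004)' vs. 'present study'.
    The interaction term corresponds to the VIOLATION x GROUP effect
    reported in note 3; a plain OLS ANOVA ignores repeated measures.
    """
    model = smf.ols("accuracy ~ C(condition) * C(group)", data=df).fit()
    return anova_lm(model, typ=2)
```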
References
Baker, Curtis L. 1970 Double Negatives. Linguistic Inquiry, 1: 169–186.
Brown, Colin M. and Hagoort, Peter 1999 The Neurocognition of Language. Oxford: Oxford University Press.
Chierchia, Gennaro 2001 Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. Ms., University of Milan.
Coulson, Seana, King, Jonathan and Kutas, Marta 1998 Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes, 13, 21–58.
Fauconnier, Gilles 1975 Polarity and the scale principle. Chicago Linguistic Society, 11, 188–199.
Fauconnier, Gilles 1980 Pragmatic entailment and questions. In J. R. Searle et al. (eds.), Speech Act Theory and Pragmatics. Dordrecht: Reidel.
Friederici, Angela D. 2002 Towards a neural basis of auditory language processing. Trends in Cognitive Science, 6, 78–84.
Friederici, Angela D., Hahne, Anja and Saddy, Douglas 2002 Distinct neurophysiological patterns reflecting aspects of syntactic complexity and syntactic repair. Journal of Psycholinguistic Research, 31 (1), 45–63.
Friederici, Angela D. and Frisch, Stefan 2000 Verb argument structure processing: the role of verb-specific and argument-specific information. Journal of Memory and Language, 43, 476–507.
Frisch, Stefan and Schlesewsky, Matthias 2001 The N400 reflects problems of thematic hierarchizing. Neuroreport, 12, 3391–3394.
Frisch, Stefan, Schlesewsky, Matthias, Saddy, Douglas and Alpermann, Annegret 2002 The P600 as an indicator of syntactic ambiguity. Cognition, 85, 83–92.
Giannakidou, Anastasia 1998 Polarity Sensitivity As (Non)Veridical Dependency. Linguistik Aktuell/Linguistics Today, 23.
Gunter, Thomas C., Stowe, Laurie A. and Mulder, Gerben 1997 When syntax meets semantics. Psychophysiology, 34, 660–676.
Hagoort, Peter, Brown, Colin M. and Groothusen, J. 1993 The syntactic positive shift as an ERP measure of syntactic processing. Language and Cognitive Processes, 8, 439–483.
Haegeman, Liliane 1995 The Syntax of Negation [= Cambridge Studies in Linguistics 75]. Cambridge University Press.
Hahne, Anja 1998 Charakteristika syntaktischer und semantischer Prozesse bei der auditiven Sprachverarbeitung: Evidenz aus ereigniskorrelierten Potentialstudien. MPI Series in Cognitive Neurosciences 1.
Horn, Larry 1997 Negative polarity and the dynamics of vertical inference. In Forget, D., Hirschbühler, P., Martineau, F. and Rivero, M.-L. (eds.), Negation and Polarity: Syntax and Semantics (pp. 157–182). Amsterdam: John Benjamins.
Huynh, Huynh and Feldt, Leonard S. 1970 Conditions under which the mean square ratios in repeated measurement designs have exact F-distribution. Journal of the American Statistical Association, 65, 1582–1589.
Kaan, Edith, Harris, Anthony, Gibson, Edward and Holcomb, Phillip 2000 The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes, 15, 159–201.
Keppel, Geoffrey 1991 Design and Analysis. Upper Saddle River: Prentice Hall.
Krifka, Manfred 1995 The semantics and pragmatics of polarity items. Linguistic Analysis, 25, 209–257.
Kutas, Marta and Hillyard, Steven A. 1980a Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Kutas, Marta and Van Petten, Cyma K. 1994 Psycholinguistics electrified. In M. A. Gernsbacher (ed.), Handbook of Psycholinguistics (pp. 83–143). San Diego: Academic Press.
Ladusaw, William 1980 Polarity Sensitivity as Inherent Scope Relations. New York and London: Garland Publishing, Inc.
Laka, Itziar 1994 On the Syntax of Negation. New York and London: Garland Publishing, Inc.
Linebarger, Marcia 1987 Negative polarity and grammatical representation. Linguistics and Philosophy, 10, 325–387.
Osterhout, Lee and Hagoort, Peter 1999 A superficial resemblance does not necessarily mean you are part of the family: Counterarguments to Coulson, King, and Kutas (1998) in the P600/SPS-P300 debate. Language and Cognitive Processes, 14, 1–14.
Osterhout, Lee and Holcomb, Phillip J. 1992 Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31, 785–806.
Progovac, Ljiljana 2000 Negative and positive feature checking and the distribution of polarity items. In Brown, S. and Przepiorkowski, A. (eds.), Negation in Slavic. Slavica Publishers.
Saddy, Douglas, Drenhaus, Heiner and Frisch, Stefan 2004 Processing polarity items: Contrastive licensing costs. Brain and Language, 90 (1–3), 495–502.
Shao, Jenny and Neville, Helen 1998 Analyzing semantic processing using event-related brain potentials. The Newsletter of the Center for Research in Language, University of California, San Diego, 11 (5). http://crl.ucsd.edu/newsletter.html
Sharbrough, Frank W., Chatrian, Gian-Emilio, Lesser, Ruth P., Lüders, Hans O., Nuwer, Mark R. and Picton, Terence W. 1991 American Electroencephalographic Society guidelines for standard electrode position nomenclature. Journal of Clinical Neurophysiology, 8 (2), 200–202.
Linguistic Constraints on the Acquisition of Epistemic Modal Verbs
Veronika Ehrich
1 Introduction
The form and meaning of modal verbs (MVs) have been under linguistic debate for many years. The still ongoing discussion addresses the status of MVs as (i) auxiliaries or non-auxiliaries (functional or non-functional categories), (ii) as raising vs. control verbs, (iii) as sources of coherent infinitives and (iv) as systematically polyfunctional items, interpretable by reference to either circumstantial or epistemic discourse backgrounds in the sense of Kratzer (1991). The main problem at issue is the question whether the semantic polyfunctionality of MVs in general, and the epistemic MV-readings in particular, have a grammatical correlate in (one of) the properties (i–iii). Psycholinguistic studies investigating the acquisition of MVs have been mainly concerned with the cognitive basis of modal reasoning and the use of epistemic MVs as reflecting a developing Theory of Mind. In the present paper, I argue that the ontogenesis of epistemicity has a syntactic and a semantic basis as well, especially concerning the acquisition of the coherent infinitive and of different non-epistemic discourse backgrounds for MVs. I argue against monocausal accounts in acquisition research and try to demonstrate that different developmental paths in the domains of syntax, semantics and cognition converge in giving rise to epistemic meanings.

The paper is structured as follows. In section 2, I review the relevant semantic and syntactic properties of German MVs and the main findings of MV-acquisition research. Section 3 presents the results of a corpus study (Caroline-corpus in CHILDES) in relation to the competing (psycho-)linguistic approaches to epistemicity in language and language development.
2 Linguistic properties of German MVs
2.1 Semantic polyfunctionality
Modals relate a given state of affairs p, denoted by the subject-infinitive predication 'Max swim' in (1–2), to a given discourse background such as the subject's desires, abilities or obligations (1), or the speaker's evidence for p (2). According to Kratzer (1991), p is assessed as necessary if it follows from a given discourse background, and as possible if it is compatible with it. Kratzer defines three parameters determining the interpretation of a given MV: MODAL FORCE (necessity vs. possibility), MODAL BASE (circumstantial vs. epistemic) and ORDERING SOURCE (dispositional, deontic or realistic backgrounds for a circumstantial MODAL BASE (1), or between strictly epistemic and quotative-evidential readings for an epistemic MODAL BASE (2)).
(1)
CIRCUMSTANTIAL MODAL BASE
a. Max muss / kann täglich schwimmen. Er braucht das einfach.
   'Max has to / is able to swim every day. He simply needs that.'
   (Desire / ability to bring about p: Dispositional Ordering Source)
b. Max muss / kann jetzt schwimmen. Ich verlange / erlaube es.
   'Max is obliged / permitted to swim now. I request / permit that.'
   (Obligation / permission wrt. p: Deontic Ordering Source)
c. Max muss / kann zur Insel schwimmen. Das Wasser ist recht tief.
   'Max must / may swim to the island. The water is quite deep.'
   (Pure necessity / possibility of p: Realistic Ordering Source)
(2)
EPISTEMIC MODAL BASE
a. Max muss / kann täglich schwimmen. Sein Auto parkt immer beim See.
   'Max must / may swim every day. His car is always parked near the lake.'
   (Inference from speaker's knowledge: Strictly Epistemic Ordering Source)
b. Max muss täglich schwimmen – nach dem was seine Freunde sagen.
   'Max must swim every day – according to his friends.'
   (Evidence from hear-say: Quotative-evidential Ordering Source)
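The notions 'follows from' and 'is compatible with' can be given a rough formal paraphrase. The following is a simplified sketch in the spirit of Kratzer (1991), not a quotation of her definitions: it treats the MODAL BASE f as a function from worlds to sets of propositions and sets the ORDERING SOURCE aside.

\[
[\![\text{muss}\ \varphi]\!]^{w,f} = 1 \quad \text{iff} \quad \forall w' \in \bigcap f(w):\ [\![\varphi]\!]^{w'} = 1
\]
\[
[\![\text{kann}\ \varphi]\!]^{w,f} = 1 \quad \text{iff} \quad \exists w' \in \bigcap f(w):\ [\![\varphi]\!]^{w'} = 1
\]

On the full account, the ORDERING SOURCE g additionally ranks the worlds in ∩f(w) and restricts the quantification to the best-ranked ones; the circumstantial vs. epistemic contrast illustrated in (1) and (2) is a contrast in the choice of f.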
2.2 Syntactic correlates of semantic polyfunctionality

2.2.1 Raising and control

It is commonly assumed in generative syntax that MVs are control verbs in circumstantial and raising verbs in epistemic readings (Ross 1969; v. Stechow and Sternefeld 1988; Roberts and Roussou 1999). A recent variant of the raising hypothesis, proposed by Wurmbrand (1999), assumes that all German MVs are raising verbs in any reading, which implies that wollen ('wish to') – being a control verb – in fact doesn't belong to the class of German MVs. But wollen shares all features that define German MVs as a class: the preterito-present morphology, the embedding of bare infinitives, the coherent construction type and, most importantly, the semantic polyfunctionality allowing for bouletic as well as quotative-evidential readings. Hence, the assumption that wollen does not belong to the class of MVs is unconvincing.

Raising verbs do not assign a theta-role to their matrix subject and thus leave the subject position empty at the level of deep structure. The subject of the embedded verb must be raised to the subject position of the matrix clause, where it receives case. Control verbs, on the other hand, theta-mark their base-generated matrix subject, and the PRO-subject of the embedded clause is subject to control. These differences in derivational history are reflected in various distributional properties: raising verbs, as opposed to control verbs, embed impersonal constructions (including impersonal passives), they allow for expletives as well as sentential subjects, and their active-passive variants are truth-functionally equivalent. Application of these diagnostics to German MVs proves that the raising/control contrast cuts across the circumstantial/epistemic MODAL BASE distinction: Können, müssen, sollen, dürfen show raising properties in epistemic as well as in most circumstantial readings, whereas wollen behaves as a control verb in all its readings (see Öhlschläger 1989; Reis 2001).

While the raising/control opposition is orthogonal to the distinction of circumstantial vs. epistemic MODAL BASEs, there is a systematic interaction between raising and control in terms of ORDERING SOURCE. Behaviour with respect to truth-functional equivalence of active and passive is a case in point. Bouletic readings of müssen ('must') in (3) and ability readings of können ('can') in (4) have non-equivalent active-passive variants, which corresponds to the control pattern in this respect.
(3)
a. Ich freue mich so, dich zu sehen, dass ich dich umarmen muss.
   I enjoy myself so you to see, that I you hug-Inf must
   'I am so glad to see you that I have to (urgently wish to) hug you.'
b. Ich freue mich so, dich zu sehen, dass du von mir umarmt werden musst.
   I enjoy myself so you to see, that you by me hugged get-Inf must
   'I am so glad to see you that you have to (urgently wish to) be hugged by me.'
(4)
a. Der Zauberer kann ein Kaninchen aus dem Hut zaubern.
   the wizard can a rabbit out the hat produce-Inf
   'The wizard is able to produce a rabbit out of his hat.'
b. Ein Kaninchen kann (*ist fähig) von dem Zauberer aus dem Hut gezaubert (*zu) werden.
   a rabbit can by the wizard out the hat produced be
   'A rabbit can (*is able to) be produced out of the wizard's hat.'

Glosses in the examples are based on etymological (but not always semantically equivalent) English cognates. The appropriate English readings are presented in the translations.
Deontic active-passive alternants, on the other hand, seem to be truth-functionally equivalent (5a, b).
(5)
a. Das Kind kann / muss / soll den Großvater küssen.
   the child can / must / shall the grandfather kiss-Inf
   'The child may / has to / is supposed to kiss the grandfather.'
b. Der Großvater kann / muss / soll von dem Kind geküsst werden.
   the grandfather may / must / shall by the child kissed be
   'The grandfather may / has to / is supposed to be kissed by the child.'
c. Der Großvater kann / muss / soll sich von dem Kind küssen lassen.
   the grandfather can / must / shall himself by the child kiss-Inf let
   'The grandfather can / must / shall allow to be kissed by the child.'
The passive sentence (5b) is, however, ambiguous, and may be interpreted in terms of an obligation addressed to either the child or the grandfather as in (5c). The active-passive equivalence only holds for the first interpretation (obligation addressed to the child) and is thus dependent on the assumption that the obligation is addressed to the referent of the logical rather than the grammatical subject in (5b). The target of the obligation is, of course, irrelevant in cases where the embedded verb lacks a logical subject (6a), or where the ORDERING SOURCE is determined by a given course of events (6b, c) and thus forms a realistic discourse background. MVs clearly show the behaviour of raising verbs in these cases.
(6)
a. Es muss / kann heute regnen. (Bouletic, deontic)
   it must / can today rain-Inf
   'It must / may rain today.'
b. Wenn der Tank leer ist, kann / muss man ihn füllen.
   when the tank empty is can / must you it fill-Inf
   'When the tank is empty, you must / should fill it.'
c. Wenn der Tank leer ist, kann / muss er gefüllt werden.
   when the tank empty is, can / must it filled get-Inf
   'When the tank is empty it can / must be filled.'
The raising/control contrast, while cutting across the epistemic/non-epistemic opposition, is thus sensitive to differences in ORDERING SOURCE. Dispositional (bouletic or ability) readings of können and müssen are control readings; deontic readings have a control as well as a raising potential, depending on the source and the addressee of the obligation. Realistic readings show consistent raising behaviour, which they share with epistemic MV readings. While it is thus true that raising and control interact with the interpretation of MVs, this interaction takes place on the level of ORDERING SOURCE rather than MODAL BASE.

2.2.2 Strict coherence

The evidence presented so far shows that raising vs. control diagnostics do not clearly single out epistemic readings. Thus, for a child having to determine the grammatical basis of epistemicity, raising diagnostics are poor evidence. On the other hand, German MVs behave quite consistently with respect to the following syntactic requirements: MVs (i) uniformly govern the bare infinitive, and (ii) enter into obligatorily coherent
constructions. Reis (2001), therefore, argues that STRICT COHERENCE (defined by the combination of (i) and (ii)) is to be considered the syntactic correlate of semantic polyfunctionality. In a coherent construction, matrix verb and embedded verb get fused into a single verbal complex, to the effect that matrix and embedded clause integrate into a single clause (7a). As a consequence, extraposition of the infinitival complement is ungrammatical (7b), whereas fronting of MV plus infinitive is possible (7c).
(7)
a. ...dass er den Kuchen aufessen darf.
   ...that he the cake up-eat-Inf may
   '...that he may eat up the cake.'
b. *..., dass er darf den Kuchen aufessen. (No extraposition of Inf.)
   ..., that he may the cake up-eat-Inf
   '..., that he may eat up the cake.'
c. Aufessen dürfen hat er den Kuchen.
   up-eat-Inf may-Inf has he the cake
   'He was allowed to eat the cake up.'
Coherence diagnostics besides extraposition, in particular scrambling and adverbial scope ambiguities (see Kiss 1995), present positive evidence to the child and may help her to recognize the essential features of German MVs on which their polyfunctional semantics is based. English MVs, of course, exhibit the same kind of polyfunctionality although the coherence / non-coherence contrast does not exist in English. Why then should strict coherence and polyfunctionality be interdependent in German? Recall that MVs are auxiliary verbs in English but full verbs in German. Auxiliarization creates bondedness: auxiliary and full verb form a bonded complex. Strict coherence fulfils an analogous function: the integration of MV and bare infinitive into a single verbal complex creates a similarly bonded construction. In other words, strict coherence and auxiliarity are different parametrizations of the same underlying property of bondedness (see Reis 2004).

2.3 The ontogenesis of epistemicity
It is uncontroversial in acquisition research that children produce circumstantial MVs earlier than epistemic ones (Stephany 1995). The first MV productions reported in the literature have bouletic and ability
readings, followed by MVs in deontic interpretations. While first epistemic MVs have been documented for some 2;6-year-olds, it is commonly assumed that epistemic readings arise between age 3;5 and 6 or even later (see Doitchinov 2001 and this volume). Producing and understanding modal expressions in epistemic readings involves certain cognitive prerequisites (Shatz and Wilcox 1991), such as the ability to distinguish between actual and hypothetical states of affairs as well as between facts and mental representations of facts. The result that non-epistemic MV readings are acquired earlier than epistemic ones has a straightforward explanation on this account: epistemic reasoning requires the ability to relate the actual world to its possible alternatives, which, in turn, presupposes the availability of a THEORY OF MIND. A child is supposed to have developed a THEORY OF MIND as soon as she is able to refer to mental states (desires, beliefs) of her own and to contrast them with actual states of affairs or with assumptions (mental states) of other persons (Shatz et al. 1983). The contrastive use of mental verbs therefore counts as evidence that a child has acquired the cognitive prerequisites for modal reasoning. Recent studies investigating the acquisition of MVs are primarily concerned with the THEORY OF MIND as the possible basis of epistemicity (Papafragou 2002). While the case study presented here also reviews the use of mental terms, it is primarily focussed on the linguistic prerequisites of epistemicity in relation to the growing command of MV syntax in child language.

3 The corpus study

3.1 The data
The case study to be presented here is based on data from the Caroline-Corpus in CHILDES. It concentrates on Caroline's MV use between age 2;3 and 3;0. This period has been chosen because by age 2;2 Caroline has produced each of the German MVs at least once, and because her first epistemic MVs occur between age 2;6 and 2;10. Caroline starts using MVs by the age of 1;8. Table 1 documents the order of her first productions of MVs, and their respective semantic interpretations. In general, Caroline's early MVs confirm what is known from the literature: will, occurring by age 1;8 for the first time, is the earliest MV, while kann and muss are acquired last, by age 2;2.
Table 1. Caroline's first MVs

will (1;8)  <  darf (1;11)  <  mag nicht, soll nicht (2;0)  <  kann, muss (2;2)
bouletic       deontic         bouletic   deontic             ability, deontic
The data to be discussed in the following sections are presented in groups of three successive MLU stages (see Table 2). Caroline's early MVs are restricted to bouletic and deontic interpretations at stage I, followed by deontic and realistic MVs at stage II. Her first epistemic modals occur by stage III.

Table 2. MLU stages and age groups

              Stage I    Stage II   Stage III
MLU           2.34       3.41       4.42
Age Groups    2;3–2;4    2;5–2;8    2;9–2;10
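For readers unfamiliar with the measure, MLU values like those in Table 2 are conventionally obtained as the mean number of morphemes per utterance in a transcript. The toy Python function below illustrates that arithmetic only; it is not the procedure actually used for the Caroline corpus, and the pre-segmented input format is invented for the example.

```python
from typing import List

def mlu(utterances: List[List[str]]) -> float:
    """Mean length of utterance in morphemes.

    Each utterance is assumed to be pre-segmented into morphemes,
    e.g. ["die", "will", "tanz", "-en"]. Real CHILDES work would use
    the %mor tier and the CLAN tools rather than hand-made lists.
    """
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)

# Toy example: utterance lengths 2, 4 and 3 morphemes -> MLU = 3.0
print(mlu([["will", "das"],
           ["die", "will", "tanz", "-en"],
           ["kann", "nicht", "malen"]]))
```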
Actually, there are only three clearly epistemic examples in the entire corpus. This raises doubt as to whether the data may be considered substantial evidence at all. It has to be admitted that evidence from child corpora is questionable for several reasons: (i) Audio and video recordings, even if taken in short intervals, present just a selection of a child's linguistic production. Data from such recordings may be taken as representing a child's linguistic abilities if the behaviour under investigation is sufficiently frequent. Verb-placement studies are a case in point. Epistemic MVs are, however, quite infrequent even in adult language. A few instances of some infrequent linguistic type in a child corpus may thus be incidental rather than reflecting a child's systematic command of that type. (ii) Investigators can hardly avoid interpreting a child's utterances on the basis of their adult linguistic knowledge. It cannot be ruled out a priori that a child, who in fact tries to convey a message m (and is thus cognitively capable of conceptualizing m), is simply unable to make herself unambiguously understood. The reverse failure is possible, too: a child may have encoded her message in a grammatical form suggesting the interpretation m
to the investigator, although m does not correspond to the communicative intention of the child. (iii) Spontaneous speech suffers from all kinds of disruptions and is highly elliptical. This is unproblematic for the investigation of adult language because adults may always be assumed to be competent speakers of their mother tongue. But with respect to child language, it is often difficult to decide whether an utterance lacks full grammatical shape because the child has not yet fully acquired the construction in question, or whether the deviation is caused by contingent performance restrictions imposed on spoken language of children and adults alike. With these reservations in mind, I am ready to quote what I see as Caroline's first epistemic MVs.
(8)
***File CHI020322.cha":line 260;) *CHI: nage scheinlich eine reht [?] # . ‘nai[l] [prob]ably [entered?]’ *MOT: Nagel wahrscheinlich eingetreten ? *CHI: ja # scheinlich # vielleicht #1 soll sein # . ‘Yes, [prob]ably, perhaps, may be!’
(9)
***File CHI020708.cha":line 383;) *CHI: steck die # das ## in Mun Mund ## in Mund Mund xxx # . *CHI: des muss ein #1 mal rund #1 gewesen sein ## weil dis #2 ein Knetgummi#3. ‘This must once have been round, because this (is) a kneading gum.’
(10)
***File CHI020918.cha": line 86;) *MOT: so Caroline #1 jetzt sag mir noch mal # hast du Hunger #3 du # ? *CHI: gib mir noch mal ## xx xx eine gute Idee mach # . *CHI: wo ist denn [?] ## ein einen muss da oben sein . ‘Where is the cake? One must be up there.’
File references like ‘CHI020322.cha":line 260’ are to be read as follows: The digits refer to the child’s age at the occasion of the utterance, the first two digits refer to the year, the next two digits to the month and the last two to the day. In other words, Caroline was two years, 3 months and 22 days old when she uttered line 260 of file 020322 in (8).
In the following sections, I will discuss the question whether Caroline's early epistemics are to be explained in terms of one of the competing accounts of epistemicity sketched in section 2 above.

3.2 The auxiliarization hypothesis
Theoretical approaches as different as grammaticalization theory and generative syntax (often) analyze MVs as full verbs in circumstantial and (always) as auxiliary verbs (functional categories) in epistemic readings. Acquisition data seem to support this view. It has often been observed that German MVs – as opposed to ordinary full verbs – are generated in second position (V2) even in very early child language, when full verbs still occur mainly in final position. Caroline confirms this picture. She produces MVs in V2 from very early on, see (11) for illustration:
(11)
***File CHI020121.cha": line 493; *CHI: Ayche # dadze danzen ## . *MOT: was ist mit der Katze ? *CHI: die will tanzen. this-fem. will dance-Inf
The difference between MVs and FVs in terms of verb placement suggests that very young children misanalyze MVs as auxiliaries before recognizing them as full verbs in German (see Clahsen & Penke 1992 for a similar view). Assuming a (hypothetical) Aux-Parameter, set as ‘MV is Aux’ (which needs to be reset for German), would explain the verb placement data. The hypothetical Aux-Parameter might be seen as the developmental counterpart of the Auxiliarization Hypothesis. This would imply that the parameter must be reset for circumstantial MVs only, since – in view of this hypothesis – epistemic MVs are auxiliaries anyway. This is, however, incompatible with the well-established fact that epistemic readings are acquired later than circumstantial ones. If (i) epistemic MVs were auxiliaries anyway, and if (ii) children took German MVs as auxiliaries from early on, they should produce epistemic MVs from early on (provided that auxiliarization is a sufficient condition for epistemicity). The fact that epistemics occur later in acquisition thus doesn’t have a linguistic explanation under the Auxiliarization Hypothesis, and would call for a non-linguistic, cognitive account. While it is true that Caroline uses MVs in the finite V2 position quite regularly, it is also true that these MV-occurrences are in fact finite. Due to
their preterite-present morphology, German modals are endingless in the 1st and 3rd pers. sing. The lack of inflectional markers in MVs probably facilitates the acquisition of finiteness, and MVs are generated in V2 simply because they are among the first finite verbs in child language (see Jordens 1990, 2002). We compared Caroline’s MVs and her use of non-modal full verbs (FVs) in search of further evidence. The comparison includes the total of Caroline’s MVs for the period between age 2;3 and 2;10 (MLU-stages I to III) and the FV-occurrences from the first two and the last two files of each month in the same period. The result is straightforward: while only half of the occurring FVs are finite, they tend to occur in finite position even at stage MLU I.
Figure 1. Distribution of Finite MVs and FVs in Finite Positions [bar chart over MLU stages I–III; series: MV total, MV in V2/V1, FV total, FV-fin, FV-fin in V2/V1; y-axis: occurrences of MV and FV]
In other words, finite (non-modal) FVs and finite MVs do not behave differently with respect to verb placement. Evidence from word order thus does not provide much support for the auxiliarization hypothesis.

3.3 RAISING and CONTROL
The common RAISING/CONTROL diagnostics are not applicable to early child language: young children do not use impersonal constructions or expletives, and active/passive equivalence is hardly testable in corpus data
anyway. Accordingly, any attempt to show whether Caroline does or does not master RAISING by age 2;7, at which she produces her first epistemic MVs, will have to rely on alternative – and more indirect – tests. RAISING and CONTROL constructions differ mainly in the way they generate their matrix subject, and this difference might have a direct reflex in child language. We compared corpus-occurrences of wollen (a systematic control verb) and können (a raising verb in most of its readings) with respect to their subjects. Caroline, in fact, makes a clear difference between wollen and können in this respect: she omits the subject of wollen more than twice as often as the subject of können (Figure 2).
Figure 2. Percentages of Missing Subjects for Full Verbs (FV), Modal Verbs (MV) and for kann and will [line chart over MLU stages I–III; series: kann, will, MV total, FV; y-axis: percentage]
The proportion of full verbs (FVs) without subjects decreases from 35% at MLU I to 22% at MLU III. During the same period, the proportion of MVs without subjects also decreases from 51% to 32%. But whereas the percentage of omitted subjects remains almost constant for will, there is a remarkable decrease from MLU I to MLU II for kann. This distribution has a straightforward pragmatic explanation: Caroline uses will mainly in reference to her own wishes, preferences, and needs. While the omission of subjects is common in early child language in general, it is even more frequent when a child refers to ego. In fact, almost 90% of the will-occurrences in
the data are used for reference to ego, as opposed to less than 50% of the kann-occurrences and 50% of the MV-total (see Figure 3).
Figure 3. Proportion of Ego-Reference in the Use of MVs [line chart over MLU stages I–III; series: MV-ego, kann-ego, will-ego, FV-ego; y-axis: percentages]
The difference between kann and will occurrences with overt/non-overt subjects thus seems to have a purely pragmatic explanation. In adult language, subject omission is, however, not only pragmatically motivated, it is also syntactically constrained. A grammatical subject can only be omitted via TOPIC DROP: omission from the topic position (12a) is grammatical, omission from a non-topic position in the middle field (12b) is syntactically deviant.
(12)
a. Wie geht es Max_i? (__i) hat gestern angerufen.
   how goes it Max? (__i) has yesterday called
   'How is Max? (__i) has called yesterday.'
b. Wie geht es Max_i? *Gestern hat (__i) angerufen.
   how goes it Max? yesterday has (__i) called
   'How is Max? Yesterday has (__i) called.'
Topicalization moves a subject from its home position in Spec-I (or Spec-V) to the topic position. TOPIC DROP occurs when a topicalized phrase isn't spelled out in that position. Subject topicalization is impossible where the subject position in Spec-I has not (yet) been filled. Therefore,
TOPIC DROP applies to the matrix subject of a raising verb only after the subject of the embedded verb has been raised to the matrix subject position. In other words, RAISING is prior to TOPIC DROP. Control constructions, which base-generate their matrix subjects, impose no such priority constraint on TOPIC DROP. A child who has not yet acquired RAISING will therefore omit subjects of control verbs more easily than subjects of raising verbs. Since kann is a raising verb in most of its readings, the low percentage of subject omissions for kann may be seen as reflecting the fact that Caroline hasn't yet acquired RAISING by age ≤ 3. On this account, RAISING isn't a necessary prerequisite for her first epistemic MVs.

There is, however, an alternative explanation. Whereas the relative amount of missing subjects is almost equally distributed for kann and will at MLU I (50% of all kann / will occurrences are subjectless at MLU I, see Figure 2), there is a remarkable shift at the subsequent stages: the missing subject rate constantly remains at 50% for will, but decreases to 20% or less for kann, at stage MLU II already. This distribution has a straightforward semantic explanation: will is necessarily interpreted in terms of a subject-internal ORDERING SOURCE, whereas kann, which occurs not only in ability readings (13), but also in deontic (14) and realistic readings (15), allows for either an internal or an external ORDERING SOURCE.
(13)
***File CHI020622.cha": line 183; *CHI: guck mal ich schon wach ## xx mit ## aehm # mit # mit # sch # mit # mit # mit ## Hocke # sch #2. *CHI: ich kann noch ein stehen und schaukeln #1. I can another one stand and swing *MOT: wie bitte ## ? *CHI: schaukel ## auch im Stehn kann ich noch # kann ich schaukeln [...] swing even when standing can I swing
(14)
***File CHI020628.cha": line 535; Deontic Reading *CHI: wa kann ich malen ## wann kann ich malen ? ‘whe[n] can I paint when can I paint? ’ *MOT: erst wenn du dich hinsetzt #2. ‘only when you sit down.’
(15)
***File CHI020613.cha": line 136; (MOT and CHI are making puppets) *CHI: da kann man des durchstecken # und ein Clown machen. there can one that through-put and a clown make *MOT: Clown ist schon ziemlich schwer # .

The difference with respect to ORDERING SOURCE correlates with overt subject realization. We classified the MVs of the corpus by their ORDERING SOURCE and checked how many occurrences of each class were used with an overt subject. Figure 4 presents the results: deontic readings, which require an external ORDERING SOURCE, are relatively rare at MLU I (13% of all MV-occurrences), but their frequency increases (from 25% at MLU II to 34% at MLU III). Realistic MVs, which also have an external ordering source, are the least frequent MVs (no occurrence at MLU I, 7% at MLU II, and 9% at MLU III), but they hardly ever occur without a subject.
Figure 4. Overt Subjects and Ordering Source [line chart over MLU stages I–III; series: Dispo-sub, Deo-sub, Real-sub, MV-sub, FV-sub; y-axis: percentages]
The distribution of overt subjects is quite different: MVs in general as well as MVs in dispositional readings are used with an overt subject in about 50% of their occurrences, whereas the frequency of overt subjects for deontic MVs (with external ordering source) increases from 63% at MLU I to 83% at MLU III. Caroline, obviously, has the capacity to produce either
full clauses with subjects or elliptical ones without, but she seems to avoid the effort of producing a full clause whenever the semantics of the MV ensures that the intended message will be recognized anyway. Subjects of FVs and dispositional MVs (with internal ordering source) are randomly omitted between MLU I and MLU III, whereas subjects of deontic and realistic MVs (with external ordering source) are spelled out on a quite regular basis by stage MLU II. In other words, the increasing availability of different MV readings goes along with the development of a more elaborate syntax.

3.4 Bare infinitives and strict coherence
According to Reis (2001), STRICT COHERENCE is the syntactic correlate of epistemicity, its defining feature being the embedding of a bare infinitive. Thus, on the STRICT COHERENCE account, Caroline should have acquired the bare infinitive constraint on MVs before she produces her first epistemic MV readings. In fact, even some of her earliest MVs combine with a bare infinitive (see (11) above for illustration), and, by age 2;3, Caroline even distinguishes bare infinitives and zu-infinitives (16–18). But she is not very consistent in this respect: sometimes she omits the infinitive ending (18), and in 38% to 46% of her MV-productions she fails to embed an infinitive at all.
(16)
***File CHI020302.cha": line 418; *MOT: guck mal wer wohnt denn in dem schwarzen Haus? *CHI: ja #2 ein Dach ## musst du malen #1. yes a roof must you paint-Inf *CHI: ich mal #3 Dach # ein Dach #1.
(17)
***File "90-02-17.cha": line 235.
*CHI: ja #2 brauch keine Angst zu haben die Ente # .
      yes need not be afraid the duck
*MOT: ja aber die Eulen die fressen nämlich manchmal Enten #
(18)
***File CHI020325.cha": line 30;
*MOT: aber du kannst zum Beispiel # ne Strumpfhose naehen.
*CHI: strumpf # trumpf # Hose naeh kann doch nicht # .
      panty panty hose sew can yet not
While being able to obey the bare infinitive constraint in principle, she avoids the infinitive quite frequently. This suggests that integration of MVs and infinitives into a strict coherent construction poses considerable difficulties for her (compare Table 3).

Table 3. Distribution of Infinitives. The first figure gives the absolute number of MVs in each group, the second figure gives the absolute number of MVs embedding bare infinitives in that group; percentages of embedded infinitives in parentheses.

          MV total          Dispositional     Deontic          Realistic
MLU I     81, 31 (38%)      46, 13 (10%)      10, 9 (90%)      0
MLU II    540, 263 (46%)    340, 123 (36%)    136, 83 (61%)    43, 42 (97%)
MLU III   282, 122 (43%)    127, 57 (44%)     71, 61 (81%)     27, 22 (81%)
Again, there are remarkable differences with respect to the different MV readings. Deontic and realistic MV occurrences, which are based on an external ordering source, occur with bare infinitives almost twice as often as dispositional MVs based on an internal ordering source. Obviously, Caroline's performance on the bare infinitive constraint and her performance on the overt subject requirement follow the same strategy: dispositional MVs occur in more elliptical constructions, whereas deontic and realistic MVs tend to be used in fully integrated structures with overt subjects plus embedded infinitives.

We measured Caroline's MV-productions for the degree of integration. MV-constructions containing a bare infinitive in addition to an overt subject are counted as fully integrated (Integration Factor = 1), MVs accompanied by either a bare infinitive or an overt subject count as partially integrated (= 0.5). A zero degree of integration (= 0) is assumed where a MV construction lacks a bare infinitive as well as a subject. We calculated the Mean Integration Factor for each MLU stage by adding up the values obtained for the individual MVs at a given stage and dividing the sum by the total number of MVs occurring at that stage. Figure 5 shows that the degree of integration is lowest for MVs in dispositional readings, and highest for MVs in realistic readings (= 0.79 at MLU III).
Figure 5. Integration Factors for MVs in Different Readings [line chart over MLU stages I–III; series: MV-total, MV-disp, MV-deo, MV-real; y-axis: Integration Factor, scaled so that 1 = 100]
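To make the computation of the Mean Integration Factor concrete, here is a small sketch. Only the scoring rule (1 for subject plus bare infinitive, 0.5 for exactly one of the two, 0 for neither) and the averaging over all MVs of a stage follow the text; the coding of occurrences as boolean pairs is an assumption made for the example.

```python
def integration_factor(has_subject: bool, has_bare_infinitive: bool) -> float:
    """Score a single MV construction: 1, 0.5 or 0, as defined in the text."""
    return (0.5 if has_subject else 0.0) + (0.5 if has_bare_infinitive else 0.0)

def mean_integration_factor(mv_tokens) -> float:
    """Mean Integration Factor over all MV occurrences of one MLU stage.

    mv_tokens: iterable of (has_subject, has_bare_infinitive) pairs,
    a hypothetical coding of the corpus occurrences.
    """
    tokens = list(mv_tokens)
    if not tokens:
        return 0.0
    return sum(integration_factor(s, i) for s, i in tokens) / len(tokens)

# Toy example: two fully integrated MVs, one with only an infinitive,
# one bare MV -> (1 + 1 + 0.5 + 0) / 4 = 0.625
print(mean_integration_factor([(True, True), (True, True),
                               (False, True), (False, False)]))
```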
These data, again, are evidence for a close interaction between the syntax and the semantics of MVs in child language. This does not necessarily entail that syntax is the source of MV semantics or vice versa. It may very well be the case that semantic and syntactic capacities, while having developed separately up to a certain time (each in its own way and temporal order), converge at a certain point in bringing about a growing variety of MV readings with their specific syntactic shapes.
3.5 Evidence for a developing THEORY OF MIND
In order to find out whether Caroline had developed a THEORY OF MIND when producing the first epistemic MVs, the corpus was checked for occurrences of mental verbs like wissen ('know'), denken ('think'), meinen ('mean'), verstehen ('understand'), glauben ('believe'), finden ('judge') and vergessen ('forget'). These verbs are used in reference to mental states in 40% of their overall occurrences (Benz 2004). See (19) for illustration:
(19)
***CHI020413.cha": line 15. *MOT: sprichst du mit deiner Puppe #1? *CHI: ja # ja #1 [=! stoehnt] . *CHI: kann nich hinstelln # . *CHI: weisst du genau #6. know you exactly
Caroline’s use of sentential adverbs like vielleicht (‘perhaps’) is further evidence for her ability of modal reasoning. She uses vielleicht in a deliberating function at almost the same age at which she produces her first epistemic MVs (20). (20)
***CHI020814: line 81. *MOT: und da # hat er da sein Taschentuch #1? *CHI: nein ein Baby #1. [...] *MOT: und warum sitzt es da #1? *CHI: vielleicht # ist da #2 in Papis Bauch #2. perhaps is there [a baby] in daddy's belly *MOT: aber Papis haben keine Babys im Bauch ##
Caroline talks about consequences resulting from a possible action and uses first conditionals. The one in (21), though connected to an ongoing action, proves that she starts reflecting about alternative futures resulting from her actions already by age 2;7. (21)
***CHI020710.cha": line 237. *MOT: dis # ah ja ich probier es jetzt mal nur mit falten . *CHI: nein dis # brauchst du ## zum Kleben #3. *CHI: wenn dis #gar nich geht ## machen wir ohne ##Klebe #1. if this not works do we without glue *CHI: ich mach #1 [=! stoehnt].
There is, thus, firm evidence that Caroline has acquired an elementary THEORY OF MIND by age 2;7. She has not only acquired inferencing capacities but is also able to express her reasoning in appropriate linguistic terms.
4 Conclusion
MV-acquisition studies of the last twenty years have been mainly concerned with cognitive constraints on the rise of epistemic meanings, whereas the form-meaning correlation has hardly been tackled. By contrast, the present study is focussed on the interaction between the syntax and the semantics of modal verbs as evidenced by Caroline's development. Caroline's production of MVs shows that form and meaning of MVs are indeed tightly connected. This is not primarily a function of the MODAL BASE contrast between circumstantials and epistemics, but seems to depend on the contrast between INTERNAL (ability and bouletic readings) and EXTERNAL ORDERING SOURCEs (deontic, realistic and epistemic readings). Before using her first epistemic MVs by age 2;7, Caroline starts varying her syntax for circumstantial MVs. While elliptical constructions lacking a subject phrase, a bare infinitive, or both, are predominant in bouletic and ability readings of MVs even beyond age 2;10, Caroline uses a more elaborate syntax for deontic and realistic readings by age ≥ 2;4. The increase in semantic MV variation goes along with the production of more full-fledged syntactic structures. Caroline's growing capacity for handling semantic polyfunctionality and her growing command of MV syntax converge in the period from age 2;4 to 2;10. This is also the age when she produces her first epistemic MVs.

But syntactic development in general, and Caroline's growing command of strict coherence in particular, are probably not the only source of epistemicity. The fact that reference to mental states, first epistemic adverbs and conditionals temporally overlap with her first epistemic MVs indicates that the cognitive basis for modal reasoning develops across various grammatical categories. Obviously, syntactic progress, semantic diversification and cognitive development are all necessary prerequisites for the rise of epistemicity, but none seems to be sufficient by itself. The data reported here do not support any monodirectional account in terms of syntactic vs. semantic bootstrapping, nor in terms of strict cognitivism. Caroline's first epistemic MV uses seem to arise from converging developments in syntax, semantics and cognition. She makes use of whatever evidence is available to her in order to gain access to the grammar of MVs, and of whatever capacity she has in order to make herself understood. This is, perhaps, just the way language development works.
Acknowledgements
I would like to thank the editors and two anonymous reviewers for their helpful comments and suggestions. I am greatly indebted to the members of the 'Modal Verb Project' in the SFB 441, especially to Marga Reis, who shared their ideas about the syntax, semantics, and acquisition of modal verbs with me.
References
Benz, Judith 2004 Epistemische Ausdrücke in der Kindersprache. Zulassungsarbeit, Tübingen: Deutsches Seminar.
Clahsen, Harald and Martina Penke 1992 The Acquisition of Agreement Morphology and its Syntactic Consequences: New Evidence on German Child Language from the Simone-Corpus. In Jürgen M. Meisel (ed.), The Acquisition of Verb Placement, 181–223. Dordrecht: Kluwer.
Doitchinov, Serge 2001 „Es kann sein, dass der Junge ins Haus gegangen ist". Zum Spracherwerb von können in epistemischer Lesart. In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 109–134. Hamburg: Buske.
2005 Why do children fail to understand weak epistemic terms? An experimental study. [This volume.]
Jordens, Peter 1990 The Acquisition of Verb Placement in Dutch and German. Linguistics 28: 1407–1448.
2002 The acquisition of verb placement in Dutch and German. Linguistics 40: 687–765.
Kiss, Tibor 1995 Infinitive Komplementation – Morphologie, thematische und syntaktische Relationen. Neue Studien zum deutschen verbum infinitum. Tübingen: Niemeyer.
Kratzer, Angelika 1991 Modality. In Arnim v. Stechow and Dieter Wunderlich (eds.), Semantik. Ein internationales Handbuch der zeitgenössischen Forschung, 639–650. Berlin/New York: de Gruyter.
MacWhinney, Brian B. 2000 The CHILDES Project: Tools for Analysing Talk. Third edition. Mahwah, NJ: Lawrence Erlbaum Associates. http://childes.psy.cmu.edu
Müller, Reimar and Marga Reis (eds.) 2001 Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9. Hamburg: Buske.
Öhlschläger, Günther 1989 Zur Syntax und Semantik der Modalverben des Deutschen. Tübingen: Niemeyer.
Papafragou, Anna 2002 Modality and theory of mind. Perspectives from language development and autism. In Sjef Barbiers, Frits Beukema, and Wim v.d. Wurff (eds.), Modality and its Interaction with the Verbal System, 185–204. Amsterdam: Benjamins.
Reis, Marga 2001 Bilden Modalverben im Deutschen eine syntaktische Klasse? In Reimar Müller and Marga Reis (eds.), Modalität und Modalverben im Deutschen. Linguistische Berichte, Sonderheft 9: 287–318. Hamburg: Buske.
2004 Modals, so-called Semi-Modals and Grammaticalization. Unpublished ms., Tübingen University.
Roberts, Ian and Anna Roussou 1999 A formal approach to grammaticalization. Linguistics 37: 1011–1041.
Ross, John 1969 Auxiliaries as main verbs. In William Todd (ed.), Studies in Philosophical Linguistics, Series I: 77–102. Evanston, Ill.: Great Expectations Press.
Shatz, Marilyn, Henry M. Wellman, and Sharon Silber 1983 The acquisition of mental verbs. A systematic investigation of the first reference to mental state. Cognition 14: 301–321.
Shatz, Marilyn and Sharon A. Wilcox 1991 Constraints on the acquisition of English modals. In Susan A. Gelman and James P. Byrnes (eds.), Perspectives on Language and Thought: Interrelations in Development, 319–353. New York: Cambridge University Press.
Stechow, Arnim v. and Wolfgang Sternefeld 1988 Bausteine syntaktischen Wissens. Ein Lehrbuch der generativen Grammatik. Opladen: Westdeutscher Verlag.
Stephany, Ursula 1995 Function and Form of Modality in First and Second Language Acquisition. In Anna Giacalone Ramat and Grazia Crocco Galeas (eds.), From Pragmatics to Syntax. Modality in Second Language Acquisition, 105–120. Tübingen: Narr.
Wurmbrand, Susanne 1999 Modal verbs must be raising verbs. In Proceedings of the 18th West Coast Conference on Formal Linguistics (WCCFL 18): 599–612. Somerville, MA: Cascadilla Press.
The Decathlon Model of Empirical Syntax
Sam Featherston
1 Introduction

This paper reports our investigations into the data base of syntactic theory, specifically addressing the similarities and differences between corpus data and experimentally obtained well-formedness judgements, and sketching the implications which our findings have for the construct of grammaticality and the architecture of the grammar. The motivation for these studies was a dissatisfaction with the state of affairs in syntax, when two syntacticians can look at the same phenomenon and come up with widely differing analyses of what is going on. Another disappointment is the lack of any real forward movement in theory: alternative analyses seem to succeed each other more due to fashion than due to falsification. We might say that syntactic description, let alone syntactic explanation, is underdetermined by its data base.

This is in part due to the nature of the available evidence: most data feeding into syntactic theory has significant flaws: it is fuzzy and it reflects multiple factors, only some of which are relevant to theory. These factors are difficult to identify and even more difficult to distinguish (eg Schütze 1996). Judgements have been particularly criticized as a data type, partly because of their inherent qualities, but partly for the way that they have been used (eg Labov 1996). One problem is that, faced with the impreciseness of judgement intuitions, researchers have idealized this data type to a very great degree, reducing the scale to a binary opposition, with marginal values as unclear cases. In part as a response to this situation, some syntacticians have sought other data sources, such as corpus frequencies and processing studies, which has tended to split the field and furthered the development of schools of syntax, which have neither a common formalism nor a common data base which would permit rapprochement between them.

Related to this diversification into schools, a range of different grammar architectures have arisen. Generative syntacticians most commonly still use judgements, and assume a "live rail" grammar, in which any infringement of a grammatical rule causes a structure to be excluded absolutely. Those interested in competition models such as Optimality Theory (OT, Prince and
Smolensky 1993) will tend to use frequency data and allow some idealization, while those favouring probabilistic models will tend to take a more fine-grained approach to frequency data, and account for the variants using probabilities (eg Manning 2003). It need hardly be said that these models of the architecture of the grammar cannot all be right. Our view was that a more detailed study of data types and their characteristics might provide a way forward. When we have a more detailed understanding of the factors which each reflects and thus how the different types relate to each other, then we shall be in a better position to judge the evidence of each for syntax. This should also allow us to establish a well-founded procedure for idealization for each. With more finely grained data, we should be in a better position to determine how the grammar functions, and which architecture is therefore the correct one. In the following, we first sketch the sort of studies we have undertaken, outline the broad picture of the results, and then move on to the implications that these findings have for the relationship between data and theory and for the nature of the grammar. It turns out that the grammar has a rather different architecture to what is generally assumed. Note that this article aims to provide an overview of our results and our interpretation of their wider implications for theory; space does not permit discussion of the individual studies (see Featherston 2002, 2003, 2004, 2005a, 2005b).

2 Our studies

We have carried out many studies using both frequencies and judgements, aiming firstly to clarify these issues of data and data type, secondly to clear up outstanding questions in the syntax, and thirdly to clarify the nature of the grammar. We have performed experiments on German and English, and have addressed a range of syntactic structures, among others island constraints, reflexives, reciprocals, word order, parenthetical insertions and echo questions. Our frequency data for German is drawn from the COSMAS I corpus of German (IDS, Mannheim), and for English, the British National Corpus (Oxford). We have generally elicited our judgement data using a variant of the magnitude estimation procedure (Bard et al. 1993). This method has three main differences to standard judgement elicitation. First, only relative judgements are gathered. Subjects are asked whether example A sounds more or less "natural" than example B, and by how much, but no absolute criterion of well-formedness is used. This distinction between relative and categorical judgements is important (see Section 5.2 below), but it also has the simple
practical advantage that it defuses the problem of having to define a cut-off point between well-formed and ill-formed. Second, to anchor the judgements, subjects give their judgements relative to reference items and to their own previous judgements. Third, there is no imposed scale; no top or bottom limit nor minimum division between scores. Judgements are expressed in numerical form, and decimal fractions are allowed. This method allows informants to express all the differences in “naturalness” that they perceive, with no coercion to a given scale. When the limitation to a scale selected by the linguist is removed, the results exhibit more differentiation than conventional judgements are assumed to contain. But this additional information is an inherent part of grammaticality judgements, which was always potentially present. Previous collection methods were insufficiently sensitive to reveal this detail; deliberately so, since asking for categorical judgements is a form of idealization, of simplifying the explanatory task by reducing the amount of information gathered. In this paper we argue that this idealization has not only, as intended, simplified the job of explanation, but also distorted the picture, and led to some false conceptions of the way that the grammar and associated systems work.
3 Relative judgements
In this section we sketch out the general pattern that the results of the judgement studies showed. This is necessary, since these experimentally obtained judgements reveal a very different pattern to that often assumed (but see Keller 2000 and Cowart 1996 for discussion and much insight). Firstly, judged well-formedness is a continuum. Figure 1 shows the results of a typical experiment gathering judgements from informants who are not forced to use a particular scale. In the graph, the four different syntactic conditions tested appear along the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores reflecting “better” judgements. The error bars show the mean values and 95% confidence intervals of the scores for each condition. Let us be clear that these mean judgements can show no differential effect of lexis, context or plausibility, since these factors are fully controlled for. These error bars show only effects of structure. This experiment looked at the effect of discourse linking (Pesetsky 1987), which one might loosely paraphrase here as the effect of wh-item type on the permissible order of wh-items in multiple wh-questions. The standard view of the data is that (1a), (1b), and (1d) are good, but (1c) is ungrammatical, so syntacticians look for a factor affecting (1c).
Figure 1. Given freedom of judgement scale, informants do not just distinguish ’good’ and ’bad’ structures, but also ’good’ ones and ’better’ ones.
(1)
a. Who ate what?
b. Which person ate which food?
c. *What did who eat?
d. Which food did which person eat?
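Because informants judge on freely chosen scales, raw magnitude estimates are normalized before condition means such as those plotted in Figure 1 are computed. The short sketch below illustrates one common normalization (per-informant z-scores of log-transformed judgements) and the calculation of means with approximate 95% confidence intervals; the subject identifiers, raw scores, and condition labels are invented for illustration and are not the data from this experiment.

```python
import math
from statistics import mean, stdev

# Hypothetical raw magnitude estimates: judgements[subject][condition]
# (invented numbers; each informant uses their own freely chosen scale)
judgements = {
    "s1": {"1a": 20.0, "1b": 35.0, "1c": 4.0, "1d": 18.0},
    "s2": {"1a": 6.0,  "1b": 9.5,  "1c": 1.5, "1d": 5.5},
    "s3": {"1a": 55.0, "1b": 90.0, "1c": 10.0, "1d": 50.0},
}

def normalize(scores):
    """Log-transform and z-score one informant's judgements,
    removing differences in scale and spread between informants."""
    logs = {cond: math.log(v) for cond, v in scores.items()}
    m, s = mean(logs.values()), stdev(logs.values())
    return {cond: (v - m) / s for cond, v in logs.items()}

normalized = [normalize(scores) for scores in judgements.values()]

# Condition means and approximate 95% confidence intervals
# (normal approximation), as plotted with error bars in Figure 1
for cond in ["1a", "1b", "1c", "1d"]:
    vals = [subj[cond] for subj in normalized]
    m = mean(vals)
    ci = 1.96 * stdev(vals) / math.sqrt(len(vals))
    print(f"{cond}: mean={m:+.2f}  95% CI=[{m - ci:+.2f}, {m + ci:+.2f}]")
```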
The results of this study confirmed that (1c) is plainly worse than the other conditions, but the data also reveals that (1b) is not only good, but also clearly better than (1a) and (1d). What is more, (1b) is just about as much better than (1a), as (1d) is better than (1c). The factor we should be looking for therefore applies to both (1b) and (1d). If we use a model of well-formedness idealized to a binary opposition, in which (1a), (1b), and (1d) are all just good, not only do we do serious violence to the data, but we will also be looking in the wrong place for the correct syntactic account. In order to deal with this data, we must have a model of well-formedness as a continuum, on which there is not only good and bad, but also good and better. A model with good, bad, and intermediate positions (such as example structures in syntax with a question mark) will not suffice here. It follows that there are cases where the correct syntactic analysis of a structure can only be represented with a model of well-formedness as a continuum. In the following, we will illustrate our points about the data with reference to a particular piece of work in which both judgements and frequencies were collected. Let us be clear that this is just one example study, but other studies of the same type show the same basic patterns. The focus of this work was
Figure 2. The data pattern from judgement studies. Given the choice, informants do not choose to bunch structures as good or bad; instead they produce a continuum of well-formedness.
the realizations of coreferent objects in the mittelfeld in German. Example (2) shows just eight of the possibilities; we also tested full NPs as antecedents. Full linguistic details of this work are in Featherston (2002), but these are not necessary for a full understanding of the present paper. Note that we tested 16 conditions in the original study, but here we shall sometimes just report on eight of them, for clarity.
(2)
a. weil ich ihmi ihni (selbst) im Spiegel gezeigt habe
   as I him.DAT him.ACC self in.the mirror shown have
b. weil ich ihni ihmi (selbst) im Spiegel gezeigt habe
   as I him.ACC him.DAT self in.the mirror shown have
c. weil ich ihmi sichi (selbst) im Spiegel gezeigt habe
   as I him.DAT REFL self in.the mirror shown have
d. weil ich ihni sichi (selbst) im Spiegel gezeigt habe
   as I him.ACC REFL self in.the mirror shown have
Figure 2 shows the results of this study, this time with eight conditions. Here we have ordered the conditions not by their linguistic features (for this see Figure 3 below), but in order of their judged well-formedness, from best to worst. The judged well-formedness of these structures descends gradually. Looking at this continuum, it will be clear that the choice of any point at which to locate the cut-off point between well-formed and ill-formed will
Figure 3. This graph shows the results of the same judgement study on eight structural variants defined by three binary features. All syntactic features have an effect upon the judgements, and these effects are cumulative.
be arbitrary. These examples straddle the putative location of the cut-off point, since the best among them would be regarded as well-formed and the worst ones as ill-formed. In fact informants show no sign of using categorical well-formedness when given the option not to, instead they always use a continuum. We put forward our explanation of the intuition of categorical well-formedness in Section 5.2. We now turn to Figure 3, which shows the same data set but with the conditions ordered by their grammatical features. This was a 2x2x2 experimental design, that is, we tested eight structural variants differing on three binary parameters: a b c. The graph shows the conditions across the horizontal axis and the mean judged well-formedness on the vertical axis, with higher scores indicating better judgements. Each pair of error bars linked by a line is a minimal pair differing only in one of the three features a b c. In each case one of the pair violates a constraint and the other does not. We annotate the conditions in the graphic with the numbers 1 to 8 for easy identification, but the values of the three syntactic features for each condition are given on the baseline. For example, condition 1 has the values a:1, b:0 and c:0, which means that it violates constraints b and c, but not a. Let us look at the pattern of data. The clear finding is that well-formedness judgements directly correspond to the syntactic conditions, that is, the conditions are judged well-formed to the extent that they do not violate the syn-
tactic constraints, but any and every constraint which is violated affects the judgements. It is evident that each constraint differentiating a minimal pair has a consistent effect upon the judgements: the relationship between the scores assigned to each pair differentiated by a given constraint is the same. So the relationship between conditions 1 and 2 is the same as between 3 and 4, and between 5 and 6, and between 7 and 8. Put differently, the relation between all 1bc and all 0bc is consistent. Notice also that this is generally true: for each pair ab0 and ab1, and a0c and a1c, the relationship is the same. Whether the structure was good or bad before does not matter: the application of these constraint violation costs is blind and automatic. Notice also that the pairs close to each other (eg 1 and 2, 3 and 4,...), linked by the short broken line, differ in their ratings only moderately, which shows that this particular constraint has a relatively small violation cost. The other two sets of minimal pairs (1 and 3, 2 and 4, etc and 1 and 5, 2 and 6, etc) have greater violation costs, and consistently so, but the systematicity is just as evident. The violation of a given linguistic constraint entails a given difference in judgements. We can say that each linguistic factor has a quantifiable, and constraint-specific effect upon judged well-formedness. An additional important point (Keller 2000) is that these violation costs are cumulative. The violation of any constraint entails a violation cost in judged well-formedness, to which any further violation costs are added. It is thus systematically the case that more violations cause a structure to be judged worse. This raises the question whether any of these constraints could be regarded as a ‘hard’ constraint. Perhaps the traditional definition of a ‘hard’ constraint is one which excludes a structure from being part of the language. In judgement data we might expect a ’hard’ constraint to cause a violating structure to drop to the bottom of the scale. One might predict that a ’hard’ constraint would cause a structure to be judged so bad that no further additional constraint violation could make it any worse. Perhaps surprisingly, experience with this data type has shown that there is apparently no such thing as a ‘hard’ constraint on this definition. The effect of a violation is only ever to make a structure worse, by an identifiable amount; no constraint violation makes a structure so bad that it cannot be made worse by an additional violation. We refer to this quality of linguistic constraints in judgement studies as survivability, which is best understood in contrast to the OT concept of violability. OT’s violability means that under certain circumstances, constraints have no effect on the output, that is, they fail to apply. This is in part necessary because the only effect that a constraint can have in OT is to exclude categorically the violating structures - OT only has ‘hard’ constraints. Our
Table 1. Data from COSMAS, IDS, Mannheim (531 million word forms)
ihni ihmi            (“him.ACC him.DAT”)         0 hits
ihmi ihni            (“him.DAT him.ACC”)         0 hits
ihmi sichi           (“him.DAT REFL.ACC”)        0 hits
ihni sichi           (“him.ACC REFL.DAT”)        1 hit
ihni ihmi selbst     (“him.ACC him.DAT SELF”)    0 hits
ihmi ihni selbst     (“him.DAT him.ACC SELF”)    0 hits
ihmi sichi selbst    (“him.DAT REFL.ACC SELF”)   0 hits
ihni sichi selbst    (“him.ACC REFL.DAT SELF”)   14 hits
survivability means that all constraints always apply, exceptionlessly, and a given violation always has the same effect – there is no probabilistic element at all. The effect of a constraint violation is to cause a structure to be judged worse, but no violation excludes a structure. We lay out what can exclude a structure under Section 5.2 below. Notice that this is strong evidence that our informants are not using occurrence as a criterion when they give a judgement – they must be assumed to be responding to something else. In the light of the finding that violation costs as measured in judgements of well-formedness are cumulative, survivable and blindly applied, on the one hand, but, as we shall see in Section 5.1 below, not directly related to output frequency on the other, it seems reasonable to assume that these violation costs, and hence well-formedness as measured by judgements, are related to computational workload. This raises questions about what psycholinguistically plausible mechanism might allow us to convert cognitive workload into judgements, and why we have such an ability. We take up these questions in Section 5.2, but we now turn to consider the evidence of frequency data.
4 Judgement data and frequency data
Frequency data reveals a very different pattern. Table 1 contains the data pattern of the frequency study looking at the same variants of object coreference structures as those judged in Figures 2 and 3. The important point here is the distribution of forms found in the corpus: one structure is found fourteen times, another one is found once, but none of the others appear at all. Frequency data shows evidence of a competitive interaction of candidate forms, which would seem to indicate that the “best” structure of a comparison set usually wins through to be produced. Intuitively, this seems to be evidence
Figure 4. The contrast between COSMAS frequency data and experimental judgement data on the same phenomenon.
that there is a competition function in the grammar, which in particular Optimality Theory has raised to its central operating principle. Interestingly, slightly less “good” alternatives are sometimes produced, which would suggest that the competition for output functions probabilistically. This is the motivation for stochastic versions of OT (eg Boersma and Hayes 2001). Figure 4 allows us to compare the two data patterns directly, as it superimposes the two different measures of the same sixteen structures on a single graph. The error bars show the mean normalized judgements obtained for the sixteen structures tested (left-hand scale). These can be seen to increase steadily from the very bottom to the very top, while the frequencies (right-hand scale), represented by the line without error bars, creep across the bottom at zero, and only rise sharply at the right-hand end. The comparison of these two measurements of the same structures brings the contrast of the data patterns into sharp focus. The first point to notice is their similarity: the same structures come top in both data types. The highest frequency structure is judged best and the next highest is judged second best, which makes it seem likely that the two data types are at least in part measuring the same underlying factor. But we should also note the key difference: the judgement data demonstrates that at least some part of the human linguistic computation mechanism is sensitive to differences among structures which are so bad that they would never be produced, for the structural variants on the left are surely so bad that they would never appear in any corpus, no matter how big. Since this is the case, it is plain that the two data types
196 Sam Featherston are also in part not measuring the same factor. We can therefore exclude categorically the possibility that relative judgements merely reflect frequency or probability of occurrence in some way. The attested frequency and probability of occurrence of the worst two thirds of these structural variants is exactly the same, and it is zero. These structures have in all likelihood never been used in all of human history, but our subjects can readily distinguish them in judgements, and do so very consistently. Our Decathlon Model of wellformedness and the architecture of grammar attempts to specify what process differentiates the two types of data, frequencies and relative judgements. 5 The Decathlon Model The name of this model derives from the athletic discipline of the decathlon. In this event, competitors take part in ten different sub-disciplines, and their performances are converted into a numerical form according to a set of standard scoring tables. The sum of these scores decides who wins the medals. But the scores are calculated not on their relative performance in the subdisciplines, but in their absolute performances, which means that whether an athlete comes first, second, or third in a sub-discipline is of no significance, what matters is that they perform at their personal best. In a sense therefore, they are not so much competing against each other at this stage as against themselves. Competition between competitors takes place at the second stage, where the ten numerical scores are totalled, and the highest scorer takes the gold. Something similar seems to us to be happening in human linguistic processing, as will become clear in this section. The Decathlon Model is at once an outline architecture of a grammar and at the same time an account of the differences between data types. Our finding that gradience reflects a real psychological phenomenon related to constraint violation cost (see Section 3) demands that the architecture of syntax reflect this reality, which current models generally do not do. An empirically adequate and psychologically real grammar must have the following features: quantifiable violation costs, a continuum of well-formedness, and survivable constraints (ie no constraint violation necessarily results in the exclusion from the language of the violating structure); all this to account for our judgement data. It must also generate output competitively and probabilistically so as to reflect the data patterns observed in frequencies. The obvious way to achieve this is for our syntax model to distinguish between a grammatical module which applies syntactic constraints and another which selects output. Our Decathlon Model thus has a Constraint Applica-
Figure 5. The Decathlon Model of the grammar and grammaticality.
tion module, which applies constraints, assigns violation costs, and outputs form/meaning pairs, weighted with violation costs. We know certain things about the internal functioning of this module: constraints are applied blindly and exceptionlessly, and violation costs are cumulative. We may think of this module as containing the grammar, though it also contains the other factors which affect well-formedness in judgements. The second module, Output Selection, functions quite differently. Its task is to select from the possible form/meaning pairs the form which is to be output (in production processing) or the interpretation to be assigned to an input (in receptive processing), and exclude the others. It functions competitively and selects the best candidate on the basis of the weightings assigned by the Constraint Application module. This selection occurs probabilistically however, which accounts for occasional production of sub-optimal versions. In Figure 5 we see the computational steps which generate frequency data and judgements. In production we assume that an unformed message is delivered for formulation in the Constraint Application module, drawing on the resources of the lexicon. Incrementally, perhaps phrase by phrase, candidates for the linguistic representation together with their weightings are proposed to the Output Selection function, which selects the best, or one of the best. The arrows exiting the left-hand module show the candidate continuations of the structure passing to the selection module, their weightings represented by their offset positions. Sometimes two continuations will be roughly equally good: She turned the light off vs She turned off the light, in which case both will have about equal weightings and both will occur. Receptive processing makes use of the same two modules, Output Selection choosing in this case what form/meaning pair to assign to a given form, the input, rather than
198 Sam Featherston choosing what form/meaning pair to assign to a given meaning, the message the speaker wishes to convey. Giving judgements is a little different. The example is input processed as usual to determine its structure and meaning, but instead of returning the output of the selection module, relative judgements consist of returning the output of the Constraint Application function. Recall that Constraint Application outputs form/meaning pairs with a weighting. This of course requires the claim that the output of this module can be consciously accessed, as well as merely passed on as usual for selection. The capacity to be aware of finegrained cognitive workload is not something which we might have predicted for ourselves, but it is nevertheless not implausible, since we are certainly aware of more coarse-grained thinking effort. The difference between frequency measures and relative judgements can therefore be attributed to them being the outputs of two different modules of linguistic processing, both of which are independently motivated. This model has a number of explanatory advantages. First, it is firmly based on the primary data of syntax. It accounts for the differences in outcome patterns between data types, an outstanding question in linguistics. Frequency data reflects the output of the Output Selection module, which is (necessarily, since we produce only one form of an utterance) competitive. Since this module uses the weightings which are output by the Constraint Application module, we account for the fact that judgements and frequencies agree in identifying the same forms as optimal. These weightings are themselves functionally motivated by their identification with computational complexity, an explanatorily economical association, since we know of the existence of workload effects from other sources, such as processing data. The fact that output selection occurs probabilistically accounts for the occasional production of sub-optimal versions: rare but documented counterexamples in corpus data are thus no threat to grammatical generalizations in this model. Note that this is not an unprincipled method of accounting for awkward data, on the contrary, it makes strong and testable predictions: the most frequently occurring variant should be that which is judged best, but the much lower frequency of alternative variants should be strictly in order of their judged well-formedness. Second, it ties the grammar in to evidence from sentence processing. It is consensual that syntactic processing operates on-line, incrementally, and applies information from multiple sources in order to take decisions. It has often been suggested that the processor consists of a constraint component and a decision component which prunes less optimal interpretations or outputs
(see Featherston 2001 for discussion of parser types). Our model is compatible with the evidence that we readily understand structures with errors, for example. This makes it necessary that we should be able to assign a structure to input which contains faults. Our Constraint Application model can account for this well-documented characteristic, since structures with constraint violations are not immediately excluded from the language but merely given more negative weightings. No model which assumes that a grammatical violation cost is identical with exclusion from the language can do this. This fault-tolerant quality greatly extends the range of linguistic data that the grammar can account for. A third strength is that it provides some explanation of the wide variation in grammar architectures that we find competing in linguistic theory. Each of these captures a part of the fuller picture that we have sketched: until the recent interest in competition in syntax (eg M¨uller and Sternefeld 2001), it was generally the case that all constraints were thought to apply to all structures, unorderedly, blindly, and automatically. Our judgement data confirms the empirical reality of this and it is reflected in our model. OT, by contrast, is entirely committed to competition, motivated by the insight that it is generally the best of any competing set of structural alternatives which is produced. This too reflects a real aspect of the empirical data: the process of selecting a form to produce necessarily results in a competitive interaction – the non-occurrence of anything but the best. This is thus included in the Output Selection module in our model. Probabilistic grammars (cf Manning 2003) too have their motivation: there is indeed a probabilistic component in the linguistic production system, although our relative judgements suggest that it is no part of the grammar, which operates blindly and exceptionlessly, but is located further downstream at the selection stage. Each of these grammar types can achieve some success because each reflects an aspect of the data: the Decathlon Model shows that they need not be contradictory, and includes all three features simultaneously. Our fourth and last explanatory advantage concerns the position of the grammar in the wider picture of evidence about the way language works. Our model allows the syntax to cover a much wider range of phenomena. Such issues as linguistic variation and language acquisition can be accounted for in a model with exceptionless constraint application but a parameter of violation cost strength. For example, Aissen & Bresnan’s (2002) Stochastic Generalization notes that similar constraints may be found cross-linguistically, but they appear grammatical and categorical in one language while being mere statistical tendencies in another. We have a ready account of these findings:
200 Sam Featherston the same factors exist across languages, but their violation costs vary, due to interactions of constraints (for the superiority effect in German and English as an example of this, see Featherston 2005b). Not only the differences between languages, but also regional, sociolinguistic and even idiomatic variation can be encoded as differences in violation cost amplitude. The learning of the language-specific parts of these violation strengths can thus be seen as a part of the acquisition of syntax. Our model thus offers a far wider view of the linguistic environment than most approaches to syntax. In this it bears a resemblance to the syntax of the sixties and seventies, when questions about the position of grammar in a more general cognitive setting were a standard issue for syntacticians. More recently they have tended to see their role as developing grammars within a psycholinguistic framework which in the meantime has become not merely a consensus, but rather a part of the set of basic assumptions of syntactic theory. Syntacticians now tend to devise syntactic analyses within this given conceptual space, rather than question the shape and extent of the space itself. In our work we have aimed to re-open this debate, and revisit these assumptions in the light of the new data available. 5.1
Well-formedness does not directly trigger occurrence
Our model is also supported by data from the interaction of well-formedness and occurrence. The standard assumption is that the functions of constraint application and output selection are not to be distinguished, and that they take place with the same module. In generative grammar, this would predict that any structure which is generated and does not violate any constraint on structure is grammatical and may be produced, while in OT the last candidate remaining, the only well-formed one, is produced. Both of these thus assume that production depends directly on the grammar, and that well-formedness directly determines occurrence. The Decathlon Model however claims that production competition determines output, so that there is no single level of well-formedness that triggers occurrence in the output. In the light of this, consider the results of the experiment in Figure 6. This figure shows the results of an experiment which contained three unrelated sub-experiments, with their mean judgements indicated by error bars as before, arranged in ascending order of well-formedness, by sub-experiment. Each group of error bars is thus a set of structural alternatives competing to represent given semantic contents. This is clearest in the set on the right-hand side, where all are competing to represent a single semantic content, whereas
Figure 6. The mismatch of well-formedness and occurrence: Production is competitive.
the middle group are competing for three different semantic contents, and the left-hand group are competing for four different semantic contents. In each set, those structures which were found to occur in the COSMAS I corpus (IDS, Mannheim) are above the line, while those which do not occur are below the line. It is striking that the structures which occur always appear in a solid block, from the top of the group. This alone is strong evidence of competition for production, based on the weighting information which we can access as judgements. However, notice that the best two structures from the right-hand group, which are those which occur in the language, are nevertheless judged worse than some of the lower structural alternatives in the other groups, which do not occur. Let us be clear that these judgements were given by the same participants in the course of the same experiment, whose items were ordered randomly. The implication is clear: occurrence is not directly dependent upon well-formedness, but rather upon a competition function based on these weightings. This finding supports the distinction of the grammar and the production function, as in the Decathlon Model, but it is not compatible with an architecture in which these two are merged. 5.2
Categorical judgements and relative judgements
This insight into human linguistic processing offers an account of another outstanding question: Why do judgements, elicited under strictly controlled conditions, show that informants, given a free choice of scale, do not use a
202 Sam Featherston binary division or end points which might represent “fully grammatical” and “fully ungrammatical”? Our solution to this quandary is to distinguish the categorical judgements commonly used in syntactic work from the relative judgements obtained from our experimental studies. Our assumption of this dissociation is based upon several pieces of evidence. The strongest evidence for the reality of categorical judgements is quite simply our intuition that there are such things as “full grammaticality” (= “I would expect to hear this”) and “full ungrammaticality” (= “I would never expect to hear this”). Every speaker seems to have this, and neither its reality nor its relevance can be doubted: any naive informant, given a binary choice whether an example is good or bad, can immediately make sense of the question. It seems likely that the existence of this intuition is the reason for the standard linguistic assumption of dichotomous grammaticality. On the other hand, the results of carefully controlled experimental studies such as our own demonstrate conclusively that relative judgements exist too. Further evidence for the distinction is offered by a frequent comment in judgements of sets of sub-optimal structures: “I would never say it, but it is better than the other one”. The frequency of this type of reaction suggests that this intuition too is common to all speakers. With this response the informant is giving both types of judgement information: a categorical judgement and a relative one. This typical comment also gives us a clue about the difference between the two types: categorical judgements concern occurrence, while relative judgements reflect computational cost. Let us take these in turn. The categorical judgement, we argue, is an expression of the likelihood that a structure is good enough to occur in practice. As such it is probably dependent on one or both of two factors: firstly, our internal corpus of the language, made up of the effects of language exposure, which feeds information into every process which makes use of frequency. The question that the informant is internally answering (at least sometimes consciously) is: “Have I heard structures like this?” The second possible factor is our Output Selection function. The internal question here is: “Would this structure be produced or is there a better alternative which would be chosen in preference to it?”. Either way, categorical judgements reflect occurrence, and produce an essentially binary output in the same way that other occurrence-based data types do; a structure either does or does not occur. The relative judgement, on the other hand, reflects the cognitive workload in processing the form and semantic content of the structure, and relating the two. It reflects the function of the Constraint Application module, and consists of its standard output of a candidate form-and-meaning pair with an assigned weighting. This provides an account of why relative judgements can
distinguish between sets of structures which are all seriously ill-formed and none of which would ever occur. Such data cannot possibly reflect occurrence or frequency, since this is consistent across all such structures, but they nevertheless differ in computational workload. Notice that this too provides an explanation of our failure to find any reflection of the intuition that certain linguistic constraints are ‘hard‘ constraints in our judgements (see Section 3). Constraints felt to be ‘hard’ are those which have such high violation costs that structures violating them will, in practice, tend not to win the competition for output and thus not occur. We find no correlate of ’hard’ constraints in our relative judgements because their ‘hardness’ is a feature of occurrence, not computational complexity. Short people do not tend to become professional basketball players, in fact there may be no short basketball players. A ’hard’ restriction? Plainly not. Short people can become basketball players if their other qualities (agility, speed, good aim) can make up for the disadvantage of shortness in this context. This may in practice never occur, but nevertheless the restriction is not a ’hard’ one. Restrictions on syntactic structure work in the same way: certain violations may mean that violating structures will rarely or never be selected for output: but the link is not direct, and there is no ’hard’ restriction. 6 The nature of well-formedness If further work confirms that relative judgements reflect computational load and categorical judgements reflect possible occurrence, then a number of implications for the architecture and nature of the grammar would follow. First, at least a proportion of the restrictions on linguistic structure are ultimately functionally motivated, since they relate to the factor ease of use, and are ultimately emergent, in that the factors which drive the division into “better” and “worse” structures are themselves value-free. It should be clear that this conception of the cognitive roots of grammar has little in common with approaches more generally associated with the label emergent (eg Bybee and Hopper 2001), which use the factor frequency as the causal factor in the emergence of structure. Our ambition here is to account for (among other things) occurrence frequencies, not use them as explanations. We are associative and competition-driven thinkers: put differently, we are lazy thinkers, and we therefore prefer computationally easier processing tasks. But the processing of every word and every syntactic relation comes with a cost: this can be readily seen in judgement studies, where longer sentences are systematically judged worse than shorter sentences (more words
mean more computational load). There is of course nothing actually “wrong” with longer sentences: the interpretation of computational load as “badness” comes only at the stage when the production system has to deal with structural alternatives, and, at this stage but not before, forms which incur higher computational costs are dispreferred. Thus longer sentences are computationally more costly, but sometimes necessary and so they occur, whereas forms for which a more economical structural alternative exists may only be a little more costly, but even this little additional computational cost is unnecessary, which makes the structure unlikely to be selected for output. This model of well-formedness can perhaps be understood as resembling the world of economics. All expenses are dispreferred. Nevertheless, we are prepared to pay more for a motorbike than for a bicycle, because a motorbike does more than a bicycle. We therefore produce long or complex structures when these are appropriate, even though these are computationally costly. On the other hand, we are not prepared to buy the more expensive of two objects which perform the same function. Equivalently, when two structural variants communicate the same semantic content, we choose the less computationally costly alternative. But there is nothing per se about any structure which makes it good or bad – computational workload is not bad until we lazy thinkers judge it so. This analysis of the nature of judged well-formedness accounts for cumulativity, violation costs, survivability etc, but at the same time goes some way to explaining why there is evidence for the universality of grammatical restrictions (architecture-related factors are by their nature universal), and it does this within a psycholinguistically and empirically motivated framework.
7 Implications for data types and their relation to theory
It seems fair to state that a fundamental assumption underlying the use of frequencies as a source of evidence for syntax is that “good” structures are produced, and thus found in corpus data, while “bad” structures are not produced and thus not found. In a second step we might generalize that there is an assumption that better structures are produced more often than less good structures. These assumptions are confirmed by our findings, but they reveal that this is not the full picture: frequencies correlate with well-formedness in judgements among the very “best” structures, but provide no information about “poorer” candidates, because these undifferentiatedly do not occur. Or rather, they do not occur in the size of corpus to which we have access. If we are right in our suggestion that output competition is probabilistic, then,
in a big enough corpus, we should find not only the best and second-best candidates but also the third and fourth and so on. The fact that linguists are always finding structures in corpus data which they had assumed to be categorically excluded, but which do not appear to be mere slips of the tongue, must strongly support this suggestion (just search for ”What did who” in Google). It would follow that frequency measures and judgement data are mathematically related, since we could predict the score of a given item in a comparison set on the basis of the set’s scores from the other data type. They are not practically related, however, since the corpus size required would increase exponentially as we proceeded down the order of preferredness. It follows from our arguments here that the data type of choice for syntax must be relative judgements. Frequency measures give us the same information as relative judgements about the best (couple of) structural alternatives in each comparison set, but they give us no information about any of the others. Since the interaction of linguistic constraints is demonstrably cumulative, this is a severe disadvantage, especially as it tends to make linguists interpret relative restrictions on structure as absolute restrictions. Put briefly: if you want to know what people say, choose frequencies, but if you want to know why, you are better off with relative judgements. 8 Implications for syntactic theory These new, but empirically founded perspectives on data types and their implications for the nature of grammaticality and the structure of the grammar are in some ways revolutionary, since they require a number of conventional assumptions to be abandoned or revised. Much can remain unchanged, however, since linguists in the past, on the basis of the much more partial data they had available, often nevertheless correctly identified characteristics of the data set. For example, with only an individual’s judgements and without the immediate access to corpus data that we have now, the abstraction to an essentially binary model of grammaticality was a reasonable step, which has in many ways served the field well. On the other hand, our findings should make it clear to every syntactician that the current model of syntax has significant weaknesses. We can well understand how these came about, but that cannot be a reason not to move on. In fact the necessary reformulation of syntactic theory requires only two major steps. Syntacticians must first recognize that production processing has a role in deciding what linguistic forms are produced, and that occurrence only indirectly reflects well-formedness. This entails that output selection and the grammar are two separate processes, and we must decide which of these we are modelling.
206 Sam Featherston There are three possibilities: we can look specifically at the system of the constraints which apply to syntactic structures, our Constraint Application module, and disregard production factors. To do this we should use data types which exclude the effects of occurrence as far as possible, ie relative judgements, and refine our theory to more accurately reflect the attested data patterns. This is narrow syntax. Others will be more interested by the processing system: there is an extensive literature on sentence processing and numerous data-near models of how we go about using our embedded grammar. This work concerns the aspects of what we have called here the Output Selection module. The third approach is to look at the cumulative effect on output of the two modules, Constraint Application and Output Selection. This is what many syntacticians are currently doing, assuming themselves to be looking at just one system, but the mismatch in data patterns between frequencies and relative judgements reveals this work to be treating two heterogenous objects as one. Nevertheless, this is an interesting and worthwhile field of study in its own right, one closely related to traditional descriptive linguistics, in which the occurring patterns of a language are the issue, rather than the underlying causes of these patterns. Frequencies will be the data of choice for this study since they represent the selection of the output processing system from the candidates made available and weighted by the grammar. The insight that we can and should distinguish between the functioning of Constraint Application and Output Selection should bring about a major improvement in the empirical adequacy of syntax models, for the division of these two modules resolves at a stroke many of the inconsistencies which obscure the nature of the interaction of linguistic constraints. Syntactic theory will be far closer to the data, and hypotheses about the grammar will be far more constrained, surely a welcome development. Having cleared the picture by factoring out the competitive effects of output selection, we can take a look at the module containing the grammar, which we have called Constraint Application. The second major step in the revision of theory applies here, and consists of the specification of constraint violation costs. Each violation must have a quantified cost, since there are stronger and weaker violation constraints. The introduction of this parameter should alone bring about many of the changes in architecture which are necessary to adjust current theory to gradient grammaticality, as is demonstrated to be necessary in work such as Keller (2000) and Featherston (2005b). As soon as violation costs are accepted as a real variable, the other adjustments (survivability of constraint violations, cumulativity of violation costs, dissociation of categoricity and grammar-relevance) follow automatically.
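To make the proposed division of labour concrete, the following deliberately simplified sketch puts the two modules side by side: Constraint Application sums constraint-specific violation costs blindly and cumulatively (the graded weightings that relative judgements report), while Output Selection chooses among candidates competitively and probabilistically (the winner-take-nearly-all pattern that corpus frequencies reflect). The constraint names, costs, candidate forms, and the softmax-style selection rule are all invented for illustration; they are one possible rendering of the architecture sketched here, not a claim about its actual implementation.

```python
import math
import random

# Hypothetical constraints with quantified, constraint-specific violation costs
VIOLATION_COST = {"constraint_a": 2.0, "constraint_b": 0.5, "constraint_c": 1.2}

def constraint_application(candidates):
    """Blind, exceptionless, cumulative: every violated constraint adds its
    cost to the candidate's weighting; no violation excludes a candidate."""
    weighted = []
    for form, violations in candidates:
        cost = sum(VIOLATION_COST[c] for c in violations)
        weighted.append((form, cost))
    return weighted  # this graded output is what relative judgements report

def output_selection(weighted, temperature=0.5):
    """Competitive and probabilistic: lower total cost means a higher chance
    of being produced, so occurrence reflects the weightings only indirectly."""
    scores = [math.exp(-cost / temperature) for _, cost in weighted]
    total = sum(scores)
    r = random.random() * total
    for (form, _), score in zip(weighted, scores):
        r -= score
        if r <= 0:
            return form
    return weighted[-1][0]

# Candidate form/meaning pairs for one message (violations are invented)
candidates = [
    ("variant_1", []),                                 # violates nothing
    ("variant_2", ["constraint_b"]),                   # one mild violation
    ("variant_3", ["constraint_a", "constraint_c"]),   # two violations, still graded
]

weighted = constraint_application(candidates)
print("judgement-like weightings:", weighted)
print("produced form:", output_selection(weighted))
```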
These then are the lessons which we argue that syntax theory needs to draw from the closer inspection of its data base. First, we must redraw the boundary between grammar and production so as to distinguish between the effects of linguistic constraints, and the effects of our need to select just one way of formulating each utterance. Second, we must add the additional parameter of violation cost to our models of syntax. Not words and rules, therefore, are the basic components of the grammar, but words, rules and sanctions.
Acknowledgements
This work was carried out in the project Suboptimal Syntactic Structures of the SFB 441 Linguistic Data Structures supported by the Deutsche Forschungsgemeinschaft. Thanks are due to project leader Wolfgang Sternefeld, my colleague Tanja Kiziak and many other members of the SFB 441, as well as to Frank Keller for WebExp. All errors are mine.
References
Aissen, Judith and Joan Bresnan
2002 Categoricity and variation in syntax: The Stochastic Generalization. Talk at Potsdam Gradience Conference, 22.2.2002.
Bard, Ellen, Dan Robertson, and Antonella Sorace
1993 Magnitude estimation of linguistic acceptability. Language, 72: 32–68.
Boersma, Paul and Bruce Hayes
2001 Empirical tests of the gradual learning algorithm. Linguistic Inquiry, 32: 45–86.
Bybee, Joan and Paul Hopper
2001 Frequency and the Emergence of Linguistic Structure. Benjamins, Amsterdam.
Cowart, Wayne
1996 Experimental Syntax: Applying Objective Methods to Sentence Judgements. Sage, Thousand Oaks, California.
Featherston, Samuel
2001 Empty Categories in Sentence Processing. Benjamins, Amsterdam.
2002 Coreferential objects in German: Experimental evidence on reflexivity. Linguistische Berichte, 192: 457–484.
2003 That-trace in German. Lingua, 1091: 1–26.
2004 Bridge verbs and V2 verbs: The same thing in spades? Zeitschrift für Sprachwissenschaft, 23: 181–209.
2005a Magnitude estimation and what it can do for your syntax. Lingua, 115.
2005b Universals and grammaticality: Wh-constraints in German and English. Linguistics, 43.
Keller, Frank
2000 Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph.D. thesis, Edinburgh University, Edinburgh.
Labov, William
1996 When intuitions fail. In Lisa McNair, Kora Singer, Lise Dolbrin, and Michelle Aucon, (eds.), Papers from the Parasession on Theory and Data in Linguistics 32, pp. 77–106. Chicago Linguistics Society, Chicago.
Manning, Christopher
2003 Probabilistic syntax. In Rens Bod, Jennifer Hay, and Stephanie Jannedy, (eds.), Probabilistic Linguistics, pp. 289–341. MIT Press, Cambridge, MA.
Müller, Gereon and Wolfgang Sternefeld
2001 Competition in Syntax. Mouton de Gruyter, Berlin.
Pesetsky, David
1987 Wh-in-situ: Movement and unselective binding. In Eric Reuland and Alice ter Meulen, (eds.), The Representation of (In)Definiteness, pp. 98–129. MIT Press, Cambridge, MA.
Prince, Alan and Paul Smolensky
1993 Optimality Theory: Constraint interaction in generative grammar. Technical Report No. 2, Center for Cognitive Science, Rutgers University, New Brunswick.
Schütze, Carson
1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago.
Examining the Constraints on the Benefactive Alternation by Using the World Wide Web as a Corpus
Christiane Fellbaum
1
Introduction
This paper asks whether data gathered from the web can provide new insight into speakers' grammars and serve as evidence for linguistic theories. Our case study examines the poorly understood English Benefactive Alternation and the various constraints that have been proposed to account for its distribution. Web data show these constraints to be soft and subject to frequent violation and extension, raising the possibility of a constraint with "fuzzy edges." Our case study argues for the need to substitute or augment constructed data with web data to avoid theoretical biases and capture the full range of rule-governed linguistic behavior. The Benefactive Alternation relates two syntactic variants for the expression of an argument that bears the semantic role of Beneficiary. This argument can be realized either in a PP headed by for as in (1) or as the first of two direct objects in a double object construction as in (2):
(1) Chris baked/bought/stole a cake for Kim.
(2) Chris baked/bought/stole Kim a cake.
The direct object alternant of the Benefactive is subject to constraints:
(3) Chris baked/bought/decorated/sliced a cake for Kim.
(4) Chris baked/bought Kim a cake.
(5) *Chris decorated/sliced Kim a cake.
Prior explanations for contrasts like that between (2, 4) and (5) have been formulated in terms of semantic constraints on the verb's semantic class membership, the aspectual nature of the event, the Beneficiary and/or the Agent arguments, as well as the verb's morphophonological make-up.
1.1
The data
Previous investigations of the ill-understood Benefactive alternation are based on introspection and the examination of data generated by the investigator. Such data are often constructed in order to highlight a particular phenomenon, and theoretical bias may well influence and limit the range of data that should be considered for a full account. Judgements can be unreliable and vary both across speakers and within a given speaker, depending on factors such as the context in which a specific structure is embedded. Given the current availability of naturally occurring data, it seems timely to re-examine the alternation and the constraints that have been proposed in an empirical fashion.1
1.1.1 Using the web as a corpus
We used the World Wide Web as a corpus and a source for naturally occurring examples in order to examine a range of claims specifying how the Benefactive Alternation is semantically constrained. The web is a rich source of freely available linguistic data covering a wide range of speakers, topics, and styles, with much of the data generated spontaneously and unedited. The web is thus a logical alternative to conventional corpora, which tend to be hard to come by, small, or limited to a few domains. At the same time, there are no controls on web postings, and some linguistic data may not be suitable as evidence for linguistic theories. Web data are vulnerable to the charge of unreliability for several reasons. A major concern is data posted by non-native speakers. This is particularly worrisome in the case of English data, as English, unlike, say, Hungarian or Japanese, serves as the lingua franca for worldwide communication, for which the web is a preferred channel. To safeguard against non-native data, each URL needs to be examined, and those that are clearly of foreign origin must be excluded. Moreover, for each type of construction, we collected several data points. Perhaps most importantly, the data discussed in this chapter were presented at several conferences and in colloquia where they went unchallenged by the native English speakers in the audience. Some of the data cited in this paper could be dismissed as non-standard. Certainly, quite a few examples reflect language that one might not find in official publications or written language at all. But the kind of language that is often spontaneously generated in postings to chat groups nevertheless reflects speakers' grammars. Our work shows that data generated "naturally" and outside the context of a linguistic investigation force a rethinking of
previously proposed constraints on the Benefactive alternation, which we argue are too narrowly formulated. Not just anything goes, as far as web data are concerned. The fact that we could not find web data like (5) indicates that the construction is constrained in a principled way that needs to be understood. Of course, not finding a web example of a Benefactive with a particular verb or noun argument does not mean that it is categorically ruled out and ungrammatical. First of all, our search was unsophisticated in that we could not look for abstract syntactic patterns or entire verb classes, but had to search for specific strings, necessarily missing other, similar, ones. But even the absence of hits to more exhaustive searches cannot be overinterpreted. The asterisks in this chapter must therefore be interpreted not as strictly "ungrammatical," but as "unattested on the web." The motivation for this work was less a definitive formulation of the constraints on the Benefactive than a confrontation of the proposed constraints with attested data.
1.1.2 Searching for relevant structures
The PP alternant of the Benefactive, exemplified in (1) and (3), is essentially unconstrained, while (5) shows that the alternant where the Beneficiary is projected as the direct object (DO) is restricted. To determine the scope and nature of the constraints, we searched for examples of the DO alternants of the corresponding PP alternants. As this work was carried out prior to the development of the Linguist's Search Engine (Resnik and Elkiss 2004), we had no intelligent tool for targeted searches and had to rely on simple pattern-matching searches. Using GOOGLE, we formulated queries of the forms
(6) "she Ved me some"
(7) "he Vs her a"
where V was filled with specific verbs. We used a variety of verbs that occurred with high to medium frequency in the Brown Corpus, and whose semantic make-up was relevant to the constraints formulated. When testing hypotheses concerning the nature of the Agent or the Beneficiary, we looked for examples with specific nouns filling the argument slots. We excluded sexually explicit sites or other inappropriate data.
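As a rough illustration of this procedure, the sketch below generates exact-phrase query strings of the type shown in (6) and (7) from a verb list. The verbs and the naive inflection rules are invented placeholders (the study selected verbs by Brown Corpus frequency and semantic class, and irregular forms would need to be handled by hand); the resulting strings are simply submitted to the search engine and the hits then inspected manually, as described above.

```python
# Hypothetical verb list; the actual study used verbs chosen by
# Brown Corpus frequency and semantic class (see text).
VERBS = ["bake", "peel", "wash", "slice", "design"]

# Naive past-tense and 3rd-singular rules, for illustration only;
# irregular verbs would have to be listed explicitly.
def past(verb):
    return verb + "d" if verb.endswith("e") else verb + "ed"

def third_singular(verb):
    return verb + "es" if verb.endswith(("s", "sh", "ch")) else verb + "s"

def build_queries(verbs):
    queries = []
    for v in verbs:
        queries.append(f'"she {past(v)} me some"')          # pattern (6)
        queries.append(f'"he {third_singular(v)} her a"')   # pattern (7)
    return queries

for q in build_queries(VERBS):
    print(q)  # each string is submitted to the search engine as an exact-phrase query
```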
2
Beneficial events
We now examine some aspects of constructions that express events with a benefit. 2.1
What kinds of events can be beneficial?
The argument expressing a Beneficiary is always optional and is not part of the verb's theta grid. This suggests that verbs selecting for Beneficiaries do not inherently denote beneficial events, but can receive such an interpretation. A question that arises is, are there any constraints on the class of verbs that can express benefits? Many transitive and intransitive verbs from a wide variety of semantic classes can add a Beneficiary as a PP adjunct; the actions expressed by these verbs are not inherently actions performed for someone's sake or benefit. The sentences below, found on the web with the Beneficiary, are just as good without:
(8) There'll be an unloading zone at the transition area if you wish to have someone drop you off and park your car for you
home/att.net/~ata-jc/kaprules.html
(9) There is also a system-wide startup file which is run for you first
Orbit-net.nesdis.noaa.gov/ora/oraintranet/ctst/unix/c15.html
Similarly, many verbs permitting the DO alternant do not denote events with an inherent benefit:
(10) Peel me a grape
(11) Hurry, get my red shirt
(12) It feels as though someone had designed me a custom dress...
www.between-theshadows.com/shadows/fire/transformations/aboutme.html
(13) I asked Mom to wash me some clothes,
www.bad-krama.net/archive/arc39.html
(14) I ask Roberto if he can change me some money
www.newbury.net/deanwood/doc/Greece.htm
(15) And try to find me some aspirin while you're at it.
www.geocities.com/chocofeathers/ multifics/2ndc_chap8.htmll
In some cases, construing a beneficial reading is difficult. We could not find examples such as the following, though they are perfectly interpretable in the right context (e.g., where the Beneficiaries are a nurse and a stage director, respectively).
(16) I'll take a walk/swim/nap for you
(17) She fell down the steps for him
While these structures seem perfectly grammatical, given an appropriate context, the corresponding DO alternants do not:
(18) ??I take you a walk/swim/nap
(19) ??She fell him down the stairs
Unlike the unrestricted PP alternant, the DO alternant seems to be reserved for events where a beneficial reading can be constructed more easily and naturally.
2.2
No benefit
The alternation also occurs with events that have undesirable consequences for the DO:
(20) They have done nothing but ruin me my whole life
www.piedmont.tec.sc.us/worldlit/andr1.htm
(21) They have done nothing but ruin my whole life for me
(22) So they set you a trap
hot.ee/fanfic/thirteefull.html
(23) So they set a trap for you
There is a straightforward semantic contrast between benefactives and these "malefactives." It has often been observed that contrast is a particular kind of semantic similarity, as contrastive concepts tend to represent different values of a shared attribute or distinct points on the same scale. The fact that Benefactives and some "Malefactives" participate in the same alternation is consistent with a view in which they are semantically related.
2.3
Beneficiary or replaced Agent?
The for-phrase in the PP alternant is potentially ambiguous. Green (1974) cites, besides the Benefactive, the "instead-of" reading. In (24) and (25) respectively, the missionary taught the class, and Kahler gave the speech, in place of the writer:
(24) But on Tuesday, I stayed home, in bed part of the day! Another missionary taught the class for me. www.jacklynes.com/russia/letter16.htm
(25) ...had developed a very hoarse sore throat. So with the approval of my hosts, Kahler gave the speech for me and did very well indeed. www.nobel.se/noble/events/eyewitness/hench/
The distinction between the Benefactive and the "instead-of" readings is not always sharp, as the substitution seems to imply a benefit for the substituted Agent; this reading is avoided only when the PP is headed by instead of.
(26) Complete Grocery Shopping... We do all the shopping for you. There's no need for you to spend your valuable time ... www.shadowlief.com/what_we_offer.htm
(27) E-mail Software does all the work for you. www.homeuniverse.com/bulk.htm
In some cases, the context supplies world knowledge that disambiguates between the two readings of the for-phrase:
(28) ...pianist Vladimir Horowitz. After hearing two of John's compositions, which he played for the maestro one evening after dinner, Mr. Horowitz looked over to John ... www.johnsciullo.com
Most likely, John is performing for Horowitz's benefit here, not in his place. Under the "instead-of" reading, for can receive heavy stress, so long as no other constituent in the VP is focused. Thus, (29) means (30) and not (31):
(29) I'll do it FOR you
(30) I'll do it in your place
(31) I'll do it for your benefit
Green notes that in the direct argument alternant, no replacement interpretation is possible, and the NP is always a Beneficiary. This can be seen in (32) and (33), where only the beneficial context seems felicitous:
(32) Mary played Mr. Horowitz two of her compositions
...and the maestro listened attentively
??...while he was away on tour
(33) Mom washed me some shirts
...so I'd look neat for the job interview
??because I don't know how to operate the washing machine
2.4
The Agent as Beneficiary
Some verbs, including consumption and perception verbs like eat, drink, watch, listen, etc., denote events with an inherent benefit for the subject, the "ingester." It is difficult to construct an additional Beneficiary argument for these verbs. Nevertheless, some dialects of American English can add an explicit DO Beneficiary, in constructions that Curme (1986) calls a "personal dative."2 The DOs here must be Beneficiaries and not Recipients, as the verbs do not denote transfers. These Beneficiaries are either reflexives or object pronouns, necessarily coreferent with the subject. Web examples are:
(34) I'll have myself a little snack before bed
(35) I'll eat me some potted meat
(36) gonna listen me some Guns and Roses
(37) gonna watch me some uneducational TV
(38) Have yourself a merry little Christmas
In contrast to the Benefactives where subject and object have distinct referents, the corresponding PP alternants seem ungrammatical:
(39) *I'll have a little snack before bed for myself
(40) *I'll eat some potted meat for me/myself
(41) *gonna listen to some Guns and Roses for me/myself
(42) *gonna watch some uneducational TV for me/myself
(43) *Have a merry little Christmas for yourself
Non-ingestion verbs can have a reflexive beneficiary in a for-phrase, but only when there is a contrast:
(44) I'll wash some shirts for myself (but not for you)
The reflexives seem therefore to constitute a different phenomenon with these ingestion verbs; we will return to these data later.
2.4.1 Some Spanish data
In what appear to be related cases, Nishida (1994) examines Spanish sentences with transitive verbs that have alternants with an additional Reflexive:3
(45) Juan SE tomó una copa de vino
John (REFL) drank a glass of wine
(46) Yo ME comí diez manzanas
I (REFL) ate ten apples
The Spanish data seem similar to the English ones, and interestingly, Nishida's examples cover the same semantic classes of verbs: verbs of consumption/ingestion (like eat, drink, smoke, read), what could be described as semantically contrasting verbs (skip over, miss out), as well as verbs of acquisition (like steal, gain, learn). Nishida claims that the reflexive clitic in cases like the above overtly marks quantitatively delimited events. The reflexive variants of (45) and (46) thus can be translated as eat up/drink up, whereas the non-reflexives do not have this completive aspect. This contrast is very clearly shown in (47). Only the sentence with the reflexive necessarily refers to an event where the entire book was read (the English gloss is underspecified with respect to aspect):
(47) Juan (SE) leyó el libro anoche
John REFL read the book last night
Given the meaning difference, it seems surprising that sentences like (46), which have a delimited object (as opposed to, say, a bare plural NP), allow both the reflexive and the non-reflexive form. Not surprisingly, Nishida states that the native speakers he consulted preferred the reflexive form.
Despite the superficial similarity, Nishida's explanation for the Spanish data does not hold for English, where the sentences with reflexive Beneficiaries like (35-37) have a DO with a partitive, which marks them as nondelimited, and where the event is necessarily open-ended. We will return to the question of aspect in Benefactives in Section 4.4.
3
Argument status of the Beneficiary
The Beneficiary is always optional, hence it cannot be considered to be a subcategorized argument of the verb. Several explanations have been offered to account for the licensing of the Beneficiary. Larson (1990) suggests a mechanism he calls Argument Augmentation, which adds a Beneficiary to the verb's theta grid. Marantz (1984) proposes an affix-mediated increase in the verb's valency. Baker (1988) suggests that a zero affix allomorph of for attaches to the verb when the Beneficiary is in direct argument position.4 In support of the claim that a Beneficiary is not a true argument, it has been pointed out that it fails a standard test for argumenthood, passivization. In this respect, Beneficiaries contrast with Recipients:
(48) *Kim was selected/designed/sewn a wedding dress (Kim = Beneficiary)
(49) Kim was given/sent/mailed a present (Kim = Recipient)
However, a web search turned up numerous examples of passivized Beneficiaries (these may be more characteristic of British than American English):
(50) I was made coffee and sat and talked to Ella www.punternet.com/frs/fr_view.php?recnum=6798
(51) ...until early morn, when I was made tea and toast, ... pws.prserv.net/usinet/declair.diary1.htm
(52) Today, the teachers were fixed breakfast www.switzerland.k12.in.us/~pacerps/pdf
(53) "Oh," Nick said ...watching as he was poured a drink. "Brian?" http://cgi.allihave.net/fiction/hom/short/ForMyWedding.shmtl
(54) He was built a house at pool-side, to keep him in the shade. www.bronxbard.com/specials.html
(55) His 'friend' who came with him insisted that he was bought some trainers... duvel.lowtem.hukudai.ac.jp/~jim.climbing/manda3_report/node15.html
We will not further pursue the question concerning the argument status of the Beneficiary but, clearly, passivization does not distinguish Beneficiaries from real arguments like Recipients. We will return to a comparison of the Beneficiaries and Recipients later.
4
Constraints on the Benefactive alternation
A number of explanations have been proposed to account for the restrictions on the Benefactive alternation, specifically, the distribution of the DO alternant. These explanations have been formulated in terms of the verbs' lexico-semantic and morphophonological properties, the aspect of the beneficial event, and the semantics of the Agent and the Beneficiary arguments. We examine each of the proposed constraints in the light of pertinent web data.
4.1
Lexico-semantic constraints
Green presents the most extensive analysis of verbs that show the Benefactive alternation. She classifies these verbs into distinct groups: verbs of creation, selection, performance, and obtaining, as well as "symbolic actions." In parallel with Green's classification, benefits have been characterized as created entities (including entities created as a by-product of acting on another entity), prepared entities, (future) possessions and obtained entities (Green 1974; Levin 1993; Larson 1990; Jackendoff 1990; inter alia). We will examine each of these classes in turn with respect to the alternation.
4.1.1 Verbs of creation
A creation usually requires effort, and efforts tend to be undertaken only when they are associated with a benefit. A created entity can therefore readily be interpreted as a Benefit, and creation verbs generally allow Beneficiaries in direct argument position:
(56) In 1818-1819, Benjamin Henry Latrobe built the couple a house, which came to be known as Decatur House, next to Lafayette ... www.library.georgetown.edu/dept/speccoll.decatur
(57) My friend Ola fixed me a job. www.trance.org/sensphere/
(58) She made them waffles www.womenspace.ca/Fabrications/lorr3.htm
(59) and she bore him a son, Hasumat. www.sacred-texts.com/oah/oah/oah360.htm
An interesting subclass is constituted by cases where the creation is a by-product or result of another event, which is not a creation event:
(60) only if you clean me some room on this desk to work, right? www.geocities.com/dbzasuri/epics.itsoadc1.htm
The room is the result of the desk cleaning event; even though the room is the DO of the verb, what is cleaned is not the room but the desk.
4.1.2 Destruction verbs
It has been claimed that the Benefactive is not available for verbs of destruction (Wechsler 1991). However, we found quite a few web examples that refute this claim, including the following:
(61) ...kick the crap outta saint nick and burn him some pagans www.geocities.com/s1xlet/apathy19.html
(62) ...an idealistic 18-year-old eager to go kill him some Redcoats www.dvdjournal.com/reviews/p/patriot.shtml
These sentences show that the destruction of an entity (the Theme) may result in a benefit (for a Beneficiary that may or may not be coreferent with the Agent). Destruction and creation verbs are semantically related by virtue of the contrast between them, and an extension of the alternation from creation to destruction verbs could be attributed to this similarity. In (63) and (64), the destruction events do not appear to entail a benefit:
(63) Herons or other wild fowl shall destroy them their nest or eggs www.nprwc.usgs.gov/resource/1999/eastblue/ebexotic.htm
(64) The white missionary is trying to ruin them their way of life www.piedmont.tec.sc.us/worldlit/andr1.htm
Here, the DO seems to emphasize the "malefactive" effect of the destruction.
4.1.3 Verbs of preparation
Verbs expressing events where an Agent acts on an entity such that this entity is prepared for use or consumption easily take a Beneficiary argument and exhibit the alternation.
(65) Peel me a grape, Crush me some ice. Skin me a peach, save the fuzz for my pillow. www.amyandfreddy.com/cd/track5.html
(66) I asked Mom to wash me some clothes www.bad-karma.net/archive/arc39.html
(67) Honey, can you iron me a shirt? www.epinions.com/hmgd-review-6689-32384DB-3A231D50-prod1
4.1.4 Verbs of performance
Verbs of performance can be described as re-creations of a work of art such as a composition, a poem, or a song. Performances therefore resemble creations. The Beneficiary here is an Experiencer.
(68) and that's where I met Mel and Shaz, I played them some tunes www.portowebbo.co.uk/nottinghilltv/faces-kgee.htm
(69) Anyways, Herman has sang me some of his banana-fried lyrics members.aol.com/bumingler/set1/songs.html
(70) Morgane donned wooden clogs and danced us a dance beyondthebrochure.homestead.com/britnorm.html
Note that some performance verbs take both for- and to-phrases in the PP alternant:
(71) Moses and his sister Miriam both sang a song to the Lord... www.pcusa.org/ega/music/favoritesongs.htm
(72) That week at ending campfire, we sang a song for Cassie. kidsaid.com/stories/cassie.html
(73) ...anyone who would like to can play a piece to the school. www.childokeford.dorset.sch.uk/clubsandactivities/music.htm
(74) When I actually sit down to play a piece for others... www.violinist.com/discussion/response.cfm?ID=3688
4.1.5 Symbolic actions
In addition to the classes listed above, Green notes that certain "symbolic actions" performed for someone's benefit can undergo the Benefactive alternation. Green does not explain the exact nature of "symbolic actions;" the benefit is specific to the context. Among the examples we found are these:
(75) God said to Abraham: Kill me a son www.ieor.berkeley.edu/~goldberg/lecs/kierkegaard.html
(76) Baby open me your door www.geocities.com/SunsetStrip/Pit/8508/songs/chameleo.html
The verbs here (and those cited by Green) do not fit neatly into any semantic class. Moreover, the symbolic actions most clearly show the distinction between Benefactive and Recipient, since these cases do not involve a created/prepared/performed/obtained entity that is moved, transferred, or that comes into the possession of the Beneficiary.
4.1.6 (Future) possession
Many verbs of obtaining, describing a resultant possession for the Beneficiary, participate in the alternation:
(77) He said he'd rather go out and grab him some food www.dreamwater.net/art.jtdoc/hasil.html
(78) Retrieve me some cream cakes! Home.talkcity.com/ekochap8.htm
A subclass of the future possession verbs are the verbs of selection, where a to-be-possessed entity is chosen for the Beneficiary:
(79) ...please select me a good singer for about twelve shillings www.classicreader.com/read.php/sid3/bookid1506.
(80) I've written to Sylvia asking her to choose me a coat www.gerty.ncl.ac.uk/letters/l1170.htm
Obtained Entities, including abstract ones like emotions, can become the Beneficiary's possession:
(81) Radical Red: Get Me Some Self-esteem. by Laura Jones. www.thebody.com/tpan/julaug_01/self_esteem.html
(82) I was bought loads of drinks and got quite drunk. www.wrecked.co.uk/norames/ott3.html
4.2
Benefit and possession
Larson (1990), Daultrey (1997), Krifka (1999) inter alia, attribute to the DO alternant the requirement for a created/prepared/obtained entity that becomes the Beneficiary's possession. The benefits in the different verb classes conform to the notion of possession to varying degrees. Future possession verbs clearly do. The products of creation events also become straightforwardly possessions of the Beneficiary:
(83) Anyone who can create me some copies in other formats, please give me a shout! www.cunningham-king.freeserve.co.uk/YCC/Fornt%20Page.htm
(84) She read the recipes and cooked her husband some Spam
(85) composed me a few lovely haiku www/ghostinthemachine.net/weblog4200.html
Performance verbs are a subclass of creation verbs that involve no physical entity; Green argues that the audience's perception constitutes a kind of possession:
(86) Henn Parn with his dancing partner performed us the professional ballroom dances www.euroiniv.ee/evana/eng/ball2001.htm
The notion of possession is somewhat stretched here, as other ways of referring to a possessed entity seem infelicitous.
With a possessive adjective, the noun in (87), interpreted as an activity rather than as a result, is odd, as compared with the created noun in (88):
(87) You/I/she watched your/my/her dances
(88) Here is your/my/his spam
The product of a preparation or transformation event extends the notion of possession even further, as the Theme is already in the possession of the Beneficiary prior to the preparation or transformation:
(89) Well, the rest is his story? Honey, can you iron me a shirt?? www.epinions.com/hmgd_review-6689-32484DB-3a231D60-prod1
(90) You're a good boy, Joe. Now get busy and wash me some dishes. www2.xlibris.com/bookstore.book-excerpt.asp?bookid=902
(91) I asked Mom to wash me some clothes, www.bad-karma.net/archive/arc39.html
Sentences like (92) and (93) might be described as referring to the repossession of an entity that the Beneficiary owns:
(92) The captain shouted to the first mate, "Hurry, go to my cabin and get me my red shirt!" www.skywaystools.com/jokes1/html
(93) his segundo would fetch him his French hat, morning frock coat and a birch tree chair. Collections.ic.ca/skeena/Cataline.htm
Clearly, to equate Benefit with Possession is stretching the notion of possession considerably in many cases. Moreover, such an equation would not account for why the Benefactive exists as a phenomenon distinct from the Dative Alternation and applies to a different set of verbs.
4.3
The Latinate constraint
As in the case of Dative Shift, verbs of Latinate origin are said to be generally ineligible for the Benefactive Alternation (Levin 1993, inter alia).5 However, we found the following examples on the web, which include verbs from all the lexico-semantic classes associated with the Benefactive Alternation:
(94) Anyone who can create me some copies in other formats, please give me a shout www.cunningham-king.freeserve.co.uk/YCC/Front%20Page.htm
(95) Please compose me a short piece. www.uen.org/utahlink/activities/view_activity.cgi?activity_id=7511
(96) ...promised to procure me seeds mnlg.com/gc/species/c/cau_pla.html
(97) I shall decline your invitation to purchase me a beverage www.fabulamag.com/contest/august_html
(98) I am going to japan to acquire me a new slave home1.gte.net/methnews1/GLA.txt
(99) this networking helped to secure me a position www.geocities.com/SouthBeach.Jetty/9001/collegedays.html
(100) She produced me two gorgeous sows cavyclub.tripod.com/satinperuvian.html
(101) Can anyone..photocopy me the manual. www.driverforum.com/harddrive/1267.html
(102) Is there someone who could construct me a set of replicas www.taxidermy.net/forums/FeerTaxiArticles/
(103) ... a group of students performed us sketches about their school www.fast-trac.ofw.fi/report14.htm
All these examples have a clitic pronoun in direct argument position, which might suggest that for clitics, the Latinate constraint is relaxed. Indeed, we found fewer Latinate verbs with a full noun DO Beneficiary than with a pronoun, but our searches turned up quite a few examples, including the following:
(104) ... To secure our customers success in using our technology... Support.reachin.se/Downloadable/brochures/Core_technology.pdf
(105) Her aggressive and well planned marketing concepts, combined with her personable selling skills, guarantee her customers outstanding results. rearch_realtors.com/pennsylvania/wyomissing/Liz_Egner_94768886.html
(106) ...in order to ensure future generations an opportunity to appreciate and enjoy the West's rich heritage ... www.wstpc.org/About/Facts/htm
(107) SA has obtained his clients recognition all over the world... www.pontsoft.com/empresa/plasticm/eng/welcome.htm
We conclude that there is no restriction on the Benefactive alternation that can be formulated in terms of the etymological or morphophonological properties of the verb. Rather, speakers seem to employ this construction with certain Latinate verbs just as they do with semantically similar verbs that are of Germanic origin. In this respect, too, the Benefactive alternation resembles the Dative Shift. The restrictions on the Dative Shift have been formulated in terms of the Latin vs. Germanic origin of verbs like donate/contribute on the one hand and give/hand on the other hand. But this explanation does not hold, as sentences like (108) and (109) with Latinate verbs show:
(108) at her death she bequeathed him her whole property www.fordham.edu/halsall/pwh/plut_sull1.html
(109) he offered us some hope. www.ucsfhealth.org/childrens/profiles/sieberSamuel.html
Offer and bequeath are verbs of future possession. Pinker (1989) attributes to this distinction their apparently exceptional behavior with respect to the Dative Shift. But this explanation does not account for the Dative Shift with a verb like render, which denotes a transfer contemporaneous with the event time but which behaves syntactically like a verb of future possession:
(110) having rendered us the slightest service www.wtj.com/archives/suchet/suchet03a.htm
The distinction between future possession and possession at the time of the event cannot account for the distribution of the Benefactive alternation with non-Latinate verbs, either. While for many verbs, in particular the creation and preparation verbs, there is necessarily a delay between the event and the benefit derived from it, benefits derived from a performance must necessarily be co-temporaneous with the performance, and Pinker's proposed constraint for the Dative Shift would not work for verbs like sing, dance, and recite in sentences like these:
(111) She sang them a song
(112) He danced us a little jig
(113) Recite us your latest poem
4.4
Aspect
Green states that the Benefactive alternation is compatible only with accomplishments. But again, attested data appear to refute this claim. As we saw in connection with the reflexives and the Spanish data, DO Benefactives occur with events that have stative character:
(114) I always keep me some balled up paper by the phone. www.jolenestrailerpark.com/Storys/4htm
(115) We all loved the flavour and the development of this wine and I said "Keep me some of this for my seafood starter". ... www.wineoftheweek.com/hist/food200806.html
In the following example, the event denotes an open-ended activity:
(116) I'm gonna have me some fun. ww.atlyrics.com/quotes/p/predator.html
In fact, all the creation, preparation, and performance verbs can be turned into activities, as can be seen by their compatibility with temporal for-adjuncts:
(117) She baked them waffles for hours
(118) Mom washed me my shirts for years
(119) She sang us Christmas songs for weeks
It might be argued that (117-119) refer to repeated accomplishments; however, this is not the case in (114-116). We conclude that the restriction on the Benefactive alternation cannot be formulated in terms of the aspectual properties of the event. Indeed, it is not clear how such a constraint would be semantically motivated.6
4.5
Constraints on the noun arguments
Several restrictions have been formulated in terms of the semantics of the nouns expressing the Agent, the Benefit, and the Beneficiary.
4.5.1 Devotion or intention to please
Green states that the Benefactive construction expresses the Agent's devotion or desire to please the Beneficiary. The Agent intends the event to benefit the Beneficiary. We already saw examples that involve no benefit but rather an undesirable consequence. Moreover, devotion and desire to please presuppose the animacy, or, more precisely, the sentience, of both Agent and Beneficiary. The Agent must not only be the instigator and in control of the event, but capable of intent and the feeling or attitude of devotion; conversely, the Beneficiary must be capable of appreciation. While a majority of the attested Benefactive data conforms to this assertion, we found cases with inanimate Beneficiaries, which are clearly incapable of being pleased. One could argue that in the examples below, the Agents are devoted to a doll or a car, which are anthropomorphized; the "devotion constraint" is extended here.
(120) Brandy found a shirt sleeve...and made her doll a dress www.geocities.come/Haertland/Esates/3147/RPLOTN/julkids.html
(121) ... Bought my car some new boots... www.strangely.org/diary/200008/
There are also cases where the subject of the Benefactive is an inanimate Cause or a causing event; these data invalidate the claim that the event demonstrates devotion or a desire to please.
(122) ...the mixture of sand and clay and then let it stand in summer, the sun bakes you a brick. www.growise.com/Articles/sprhtm/bestsoilamendments.htm
(123) Luck Found Me a Friend in You. ... www.gocollect.com/p/cherished-teddie/special-occasions.html
(124) That deal saved me $6 www.100megsfree4.dom/blahthings/2001/march/mar16.html
These data suggest that the devotion/appreciation constraint is not a hard one. While in (124), the Beneficiary might also have played an agentive role in the deal and thus have been at least partially responsible for his own action and the ensuing consequences, no intention can be ascribed to the sun or luck.
4.5.2 Contemporaneous existence
Green includes in her constraints on the Benefactive the contemporaneity of Agent and Beneficiary; the reasoning is that the Beneficiary must be able to benefit from the product or result of the event or of the entire event. But a web search reveals that this constraint does not hold. People perform actions for the benefit of not-yet-born Beneficiaries:
(125) This again will save future generations time in collecting data ilil.essortment.com/craftstimecaps_rlmd.htm
(126) ...industrious cultivators of Abbasid times save their descendants expense and labour by providing them with building materials www.gerty.ncl.ac.uk/letters/1462htm
(127) agreement the Air Force and Raytheon and Hughes have negotiated on the Advanced Medium-Range Air-to-Air Missile will save future customers about $180 million. .. www.aerotechnews.com/starc/102797/102997d.html
We also found examples of actions performed to benefit deceased, i.e. no longer existing, Beneficiaries:
(128) We will see what we can do to get him a gravestone marker www.islesoford.com/idcgues.html
(129) if you don't buy a gravestone for Khveodor. You kept saying, it's winter, winter ... if you don't buy him one, he will come again, www.geocities.com.cmcarpenter28/Works/3deaths.txt
(130) I've saved my last dime to buy him a casket. amsterdam.nettime.org/Lists-Archives/nettime 9908/msg00038.html
Presumably, the speakers/writers of these sentences consider the deceased as being still among them and they extend the "contemporaneity" constraint proposed by Green accordingly.
4.5.3 The Beneficiary as employer
Green further states that, with the Beneficiary as the DO, the action performed for someone's benefit cannot be carried out when the referent of the subject is employed by the referent of the DO. Her (constructed) examples are (131-132), where Mr. Lubin pays the speaker:
(131) I baked cakes for Mr. Lubin.
(132) *I baked Mr. Lubin cakes.
But the "employment constraint" is more subtle. We found numerous examples on the web with DO Beneficiaries where the action is performed under conditions of employment:
(133) Happy Customers .. He built me a very fully loaded system www.grovecomputerservices/com/happy.htm
(134) Tom Mullins Web Design Studio prepares you a bid www.tommullinsdesign.com/flag-prices.htm
(135) In 1818-1819, Benjamin Henry Latrobe built the couple a house www.library.georgetown.edu/dept/speccoll/decatur/
What seems to account for the contrast in Green's examples and the web data is not just a broader context of employment or the commercial setting of the event, but the difference between the roles of an employer and a customer (I am grateful to Anthony Kroch and Philippa Cook, who, on separate occasions, suggested this interpretation). When the DO is unambiguously an employer, as in (136-139), the Benefactive alternant seems ruled out and only the for-PP alternant is grammatical; when the DO is a customer (a kind of temporary employer), the DO alternant is felicitous. We could find no sentences of the following kind:
(136) ??She stacked Wal-Mart shelves
(137) ??We cleaned The Maids houses
(138) ??They sold AIG insurance (cf. They sold insurance for AIG)
(139) ??Maradona played the Naples club soccer
Interestingly, the customer is also the Recipient of the product or result of the event, while the employer, who presumably passes the product on to his customers, is not. These data show once again the close semantic relation between Recipient and Benefactive.
5
Benefactive vs. Dative alternation
The Benefactive alternation resembles the Dative alternation both syntactically and semantically, and the two are often lumped together.
We discuss some similarities and differences and argue for a distinction between the two constructions.
5.1
Beneficiary and Recipient
Both Recipients and Beneficiaries can occur freely in PP adjuncts and, with as yet ill-understood restrictions, as direct objects of many verbs. We already saw the semantic similarity between the two kinds of roles in cases where a Beneficiary is also a Recipient:
(140) ...where he bought him some meat and a big loaf of bread. www.penguinreaders.com/downloads.Spreads/Olivertwist167.pdf
(141) I got her some cute summer dresses. www.livejournal.com/users/piazza_rox31/
Oliver and she both receive something and benefit from it. The semantics of verbs like buy and get include both transfer and benefit, and this may account for the syntactic similarity of these verbs with respect to the PP/DO alternation. Only the PP alternants for verbs like buy and get, which must be headed by for, not to, show the difference between the two kinds of arguments. Some verbs of transfer do not strictly subcategorize for a Recipient. They may select for an adjunct with either a Recipient or a Beneficiary:
(142) Kim mailed/sent/faxed a poem for John on his birthday. (John = Beneficiary)
(143) Kim mailed/sent/faxed a poem to John on his birthday. (John = Recipient)
Both arguments can co-occur as adjuncts in either order:
(144) Kim mailed/sent/faxed a poem to Mary for John.
(145) Kim mailed/sent/faxed a poem for John to Mary.
But when one of these arguments is in direct argument position, it must be the Recipient:
(146) Mary mailed/sent/faxed John a poem for Kim. (John = Recipient)
(147) *Mary mailed/sent/faxed John a poem to Kim. (John = Beneficiary)
This fact seems to suggest some kind of "primacy" of the Recipient over the Beneficiary with respect to argument status. Passivization data reinforce this intuition. Our search results indicate that verbs that can take both a Recipient and a Beneficiary can passivize only the Recipient:
(148) I sent/mailed flowers for/to John. (John = Recipient or Beneficiary)
(149) John was sent/mailed flowers (John = Recipient only)
By contrast, many verbs that do not (also) select for a Recipient can passivize the Beneficiary (cf. also the examples in (50) - (55)):
(150) The host poured drinks for us/*to us (us = Beneficiary)
(151) The host poured us drinks
(152) We were poured drinks
Nevertheless, some verbs with a strong transfer meaning component that undergo the Benefactive alternation apparently cannot passivize the Beneficiary argument:
(153) He got her shoes for her
(154) She fetched some clothes for him
(155) He got her her shoes
(156) She fetched him his clothes
(157) ??She was gotten her shoes
(158) ??He was fetched his clothes
Another difference between Recipients and Beneficiaries shows up in sentences where they are the sole argument. Some verbs that select for a Theme and a Recipient may delete the Theme when it is background knowledge shared among the discourse participants; here, the Recipient can be the sole DO:
(159) I paid him (the money)
(160) She served them (dinner)
(161) I trade you (my stamp collection)
(162) He showed me (the trick)
But verbs in double object Beneficiary constructions cannot delete the Theme:
(163) I'll cook you *(dinner)
(164) She prepared them *(lunch)
(165) We danced the children *(a folk dance)
(166) He bought her *(the ring)
Passives with Beneficiaries seem moreover constrained to events denoting the preparation of an entity and/or a transfer of possession. We found many non-transfer verbs in active constructions with a Beneficiary argument (either in a PP or as a DO) but the web yielded no corresponding passives:
(167) he composed her a song
(168) ??she was composed a song
(169) create me a website
(170) ??I was created a website
(171) wash me a shirt
(172) ??I was washed a shirt
(173) kill me some Redcoats
(174) ??I was killed some Redcoats
(175) ruin them their way of life
(176) ??they were ruined their way of life
(177) strike me a fire
(178) ??I was struck a fire
Like ingestion verbs, verbs of transfer can also select for a reflexive Recipient in DO position:
(179) Mr Graham-Cumming sent himself the same message 10,000 times... www.spamfo.co.uk/The_News/Scams_&_Fraud/How_to_make_spam_unstoppable/2/
(180) She had promised herself a night on the town www.skaro.com/write/trish/trish27.html
(181) she granted herself permission to lie. www.creativenonfiction.org/thejournal/articles/issue05/05editor.htm
For Recipients, the PP-alternant is attested, too, in contrast to the ingestion verb with a reflexive Beneficiary:
(182) ...a copy he sent to himself turned up in his own spam folder. ... www.careerjournal.com/jobhunting/resumes/20040413maher.htm
(183) But Lindo had promised to herself that she would never forget... www.hh.shuttle.de/hh/gyha/Facher/Englisch/joyluckmarl.htm
(184) ...the little birthday present she'd granted to herself. www.grandt.com/XanderZone/stories/read.php?story=Rejoined
These data show that there is some overlap between Beneficiaries and Recipients. First, a number of verbs select for both arguments. Second, the semantic role of a noun phrase often includes aspects of both Beneficiary and Recipient and cannot always be clearly distinguished. Third, the passivization data indicate a kind of "competition" between the Beneficiary and the Recipient, but suggest that only the latter has full argument status. A possible interpretation of the data is that the Beneficiary is a kind of sub-role of the Recipient, semantically more specified and syntactically more constrained. The Benefactive may be reserved for those cases not covered by the broader Dative/Recipient, namely, cases where no change of possession is necessarily involved or where the emphasis is on the benefit rather than a change of possession. Previous analyses of the Benefactive alternation, including Green's, have cast its semantics in terms of a change of possession, characterizing verbs of performance, creation, and preparation as metaphorical possession transfer. But this does not account for the fact that the alternations differ and are available for different verb classes.
6
Semantics of the alternants
We examined the constraints that have been proposed to account for the double object alternant of the Benefactive alternation. Web data demonstrate that many of the previously formulated constraints do not strictly hold and that speakers violate them regularly. However, the violations are not random but appear to be extensions demonstrating the "softness" of the semantic constraints. Given the semantic and syntactic overlap between the Dative Shift and the Benefactive Alternation, one might ask whether the explanations proposed for the constraints on the former can also help in understanding the latter. Krifka (1999, 2003) in his study of a large number of Dative-shifting verbs, argues for distinct meanings associated with the two alternants in many cases. He proposes that the DO syntax expresses a change of possession, where an Agent causes a Goal (or Recipient) to be in a state of possessing the Theme. The DO construction does not provide for a movement event, in contrast to the PP alternant, which expresses an event where the Agent causes the motion of the Theme towards a Goal.
Assigning specific semantics to these constructions, in the spirit of Goldberg (1995), seems to work well for the wide range of verbs showing the Dative alternation that Krifka discusses. An extension of this explanation to the Benefactive Alternation might be formulated roughly as follows. Analogously to Krifka's proposed analysis for the Dative Shift, the DO alternant causes a change of state in the Beneficiary, namely one where the referent necessarily becomes a Beneficiary and incurs the benefit. The PP alternant on the other hand simply expresses an event where an Agent intends a benefit for a potential Beneficiary; intention here could be interpreted as a kind of metaphorical movement of the benefit.7 Interestingly, the data from the reflexive Beneficiaries, often considered substandard or dialectal, provide some support for this analysis. Recall that these sentences involve verbs of ingestion and perception, where the Agent or Experiencer is necessarily coreferent with the Beneficiary:
(185) I'll have myself a little snack before bed
(186) I'll eat me some potted meat
(187) gonna listen me some Guns and Roses
(188) gonna watch me some uneducational TV
Unlike with the other verb classes that show the alternation, the PP alternant is not available for these verbs:
(189) *I'll have a little snack before bed for myself
(190) *I'll eat some potted meat for me/myself
(191) *gonna listen to some Guns and Roses for me/myself
(192) *gonna watch some uneducational TV for me/myself
Ingestion and perception events necessarily affect, and change the state of, the ingesting or perceiving entity, making these data consistent with a theory that says that the semantics of the double object alternant, but not those of the PP alternant, provide for a change of state. Green's constraint requiring the contemporaneous existence of Agent and Beneficiary constitutes a prerequisite for this analysis, while the data we found where the DO Beneficiaries are dead or as yet non-existent (sentences (125)-(130)) would be counterexamples. But it seems plausible that the speakers of these sentences conceptualize the Beneficiaries as living entities, capable of benefiting and undergoing a change of state.
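The contrast can be made explicit in a rough event-semantic schema, given here only as our own illustrative rendering in the spirit of Krifka's notation; the predicates BENEFIT and INTEND and their arrangement are simplifying assumptions, not his exact formulation:

\begin{align*}
\text{DO alternant:} \quad & \exists e\, \exists s\, [\mathrm{AGENT}(e,x) \wedge \mathrm{THEME}(e,z) \wedge \mathrm{CAUSE}(e,s) \wedge \mathrm{BENEFIT}(s,y,z)]\\
\text{PP alternant:} \quad & \exists e\, [\mathrm{AGENT}(e,x) \wedge \mathrm{THEME}(e,z) \wedge \mathrm{INTEND}(x, \mathrm{BENEFIT}(y,z))]
\end{align*}

On this rendering the double object variant entails a resulting state s in which the Beneficiary y actually incurs the benefit, while the for-variant only encodes the Agent's intention, matching the contrast drawn in the text.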
7
Restrictions on the Benefactive alternation consistent with the data
We saw that the previously proposed constraints on the Benefactive cannot fully account for the naturally occurring data found on the web. The web data indicate that the constraints, as they have been formulated, are too rigid, and speakers regularly extend them. But clearly, restrictions on the DO Benefactive do exist. The data we examined do not permit us to formulate any hard constraints. Instead, we can state only one necessary but not sufficient condition for the DO Benefactive alternation. The attested data all appear to show at least one common semantic feature, the control of the subject over the event.
7.1
Control with transfer verbs
We saw that the Benefactive is allowed in many cases where the subject acts in order to bestow a benefit on another person (or entity). Some verbs of obtaining, like buy, get, fetch, and steal, when used as simple transitives, imply that the Agent is also the Beneficiary or Recipient of the obtained entity. But the default Beneficiary or Recipient can be cancelled in the presence of another Beneficiary argument:
(193) ...santa said you guys gotta buy me my presents this year. www.sassyandseksi.com/buystuff.htm
(194) He wants someone to fetch him his shoes... www.washingtonpost.com/wpsrv/style/books/features/11980621.htm
(195) Gabby's mom stole me some pants from the hospital www.dyve.com/springman/avi/art/artnav.htm
Other verbs, like receive, denote events where the potential benefit must remain with the subject and cannot be passed on to another (non-subject) Beneficiary. The subject here is not only a necessary, but also a passive Beneficiary, and is not in control of the event where the possession changes ownership. We could not find any examples of these verbs with the Benefactive, either in the PP or in the DO alternant:
(196) *I'll receive me/you/her a little present
The subject's control over the event appears to be one (necessary but not sufficient) requirement for the Benefactive alternation.
Further evidence comes from the interesting case of polysemy presented by the verb find. It shows the Benefactive alternation (and a Benefactive reading of the corresponding for-NP phrase) when it refers to the result of a search effort that implies a goal or intent, but we could not find instances where it refers to an accidental or serendipitous finding, as in the constructed (200):
(197) Find Me My Property. Www.marinatradingpost.com/form1.html
(198) My husband ...made it his mission to find me my pink shoes. www.epinions.com/sprt-Basketball-Adidas_superstarII
(199) Find me my Perfect Mate! ... www.cutecards.net/platinum/icq/funlovetest.html
(200) ??Find me a wallet in the street
7.2
Control with consumption and perception verbs
For verbs of consumption and perception, the subject, the ingester, is necessarily the Beneficiary. An explicit Beneficiary, coreferent with the subject, can be added to emphasize the subject's causation of, or control over, the event:
(201) I'll have myself a little snack before bed www.dedecountryhome.com/BuddyBoy3.html
(202) I reckon I'll eat me some potted meat. www.math.gatech.edu/~mullikin/res/respics.htm
(203) gonna listen me some Guns and Roses www.angelfire.com/me3/NovaSparkle/xjournal02.html
(204) Gonna watch me some uneducational TV, damnit. www.champuru.com/08-2000/08-29-2000.html
Our web searches turned up no examples of Benefactives with verbs where the perception event is not caused or controlled by the subject, as with hear and see, which do not imply intention and hence control by the perceiver:
(205) ??I hear me some noises in the street
(206) ??I saw me an accident on the road
7.3
Inanimate controllers
Sentences like (207-208) below show that an inanimate Cause can have control over an event, even though it is incapable of intention and volition:
(207) The sun baked you the bricks
(208) Still, the fact is the current budget only bought us time. ... www.americanprogress.org/site/
A Cause may control an event because of its specific properties, much as in middle constructions, where an entity's particular property enables a potential event. No sentience, volition, or intention is required to cause a benefit. Control is thus the one common semantic component of the wide range of subjects in the DO alternant; all other previously proposed constraints were shown to be violated by attested data. While control does not seem like a satisfactory semantic characterization, we expect to better understand the nature of the arguments in the alternation as more sophisticated web searches yield pertinent data.
8
Conclusions and future work
The web data show that most of the constraints that have been proposed on the basis of constructed data are soft and speakers frequently violate and extend them, though most data fall into the kinds of patterns that previous researchers have suggested. The work reported here raises the question as to the core of a constraint and its "fuzzy edges." This case study shows the need for attested data, as constructed contrastive data, often labeled either "grammatical" or "ungrammatical," fail to capture the fuzziness of real constraints and often reflect the theoretical biases of the investigators who construct the data.8 Our main goal here was to test proposed constraints against attested data; we are not able to offer a revised full explanation of the alternation. The data are consistent with two observations: One, that the DO alternant requires that the subject have the abilities or properties required to bestow the benefit; two, that in the DO alternant, unlike in the PP alternant, a benefit is necessarily bestowed, resulting in a change of state of the affected entity, the Beneficiary. Traditionally, linguistic research had to rely on data based largely on the investigator's intuitions; attested, unsolicited, and naturally occurring data could not be obtained in a systematic fashion.
Corpora represent a first step toward research based on non-constructed data. In particular, the World Wide Web represents a very large and domain-independent corpus that can be mined easily and efficiently. Our method was clumsy, and we cannot claim to have found all the relevant data. Therefore, we are careful not to propose a definitive account of the Benefactive construction. We plan to re-examine the Benefactive construction, as well as other ill-understood grammatical phenomena, with the help of a sophisticated search tool (Resnik and Elkiss 2004). Resnik and Elkiss's Linguist's Search Engine allows the user to search data from Internet Archives for specific syntactic structures and to build custom tailored corpora with pertinent hits for the purposes of empirical investigation. This tool will allow the testing and possible refinement of linguistic theories, and permit their formulation in the light of relevant data that might not otherwise be considered.
Acknowledgements
This work was supported in part by NSF Grant Number IIS-0112429. I thank Mari Olsen, Philip Resnik, Usama Soltan, Manfred Krifka, Philippa Cook, Adele Goldberg, Artemis Alexiadou, Hans Kamp, Effi Georgialou, Anthony Kroch, and Ben Haskell for their critique and helpful comments.
Notes
1. Lapata's (1999) interesting study of the Dative and Benefactive alternations, using the British National Corpus, investigates the relative frequencies of the two alternations, the preference of the alternating verbs for the DO vs. the PP alternant, and the representative members of the participating classes, based on Levin (1993). Her quantitative focus is however quite different from ours and does not directly challenge the proposed constraints on the Benefactive alternation.
2. These constructions are described by Christian (1991) in her study of Appalachian speech. Christian states that they carry a "light benefactive meaning," but offers no further evidence for this assertion.
3. I thank Manfred Krifka for pointing these data out to me.
4. In principle, non-argument status would preclude the occurrence in direct argument position.
5. Verbs like contribute and donate are generally considered to be restricted from the Dative Alternation. While a web search showed up no sentences with contribute and DO Recipients, it turned up well over a thousand sentences like can anyone donate me some ice cubes? and you can donate me some money. Apparently, whatever semantic constraint blocks the alternation for contribute does not (or no longer) block it for donate.
6. If Benefit is equated with Change of Possession, then the aspectual constraint would be better motivated, as a change of possession tends to denote an accomplishment.
7. Such an explanation seems related to the holistic constraint, which is often assumed to account for the spray-load alternation. The argument projected as the direct object is fully (holistically) affected by the event, in contrast to the argument in the PP.
8. In an examination of the distribution of the Dative Shift on the basis of attested data, Bresnan and Nikitina (2003) claim that much data that is labeled "ungrammatical" is merely "improbable" and that the probability of their occurrence is linked to information structure.
240
Christiane Fellbaum
Krifka, Manfred 1999 Manner in Dative Alternation. In: Proceedings of the 18th West Coast Conference on Formal Linguistics, Sonya Bird, Andrew Carnie, Jason D. Haugen, and Peter Norquest (eds.), Tucson, AZ: University of Arizona. 2003 Semantic and pragmatic conditions for the Dative Alternation. Korean Journal of English Language and Linguistics 4:1-32. Lapata, Maria 1999 Acquiring Lexical Generalizations from Corpora: A Case Study for Diathesis Alternations. In: Proceedings of the 37th Meeting of the Association for Computational Linguistics, 266-274. College Park, MD. Larson, Richard 1990 On the Double Object Construction. Linguistic Inquiry 19:335-391. Levin, Beth 1993 English Verb Classes and Alternations: A Preliminary Investigation. Chicago, IL: University of Chicago Press. Marantz, Alec 1984 On the Nature of Grammatical Relations. Cambridge, MA: MIT Press. Nishida, Chiyo 1994 The Spanish reflexive clitic se as an aspectual class marker. Linguistics 23: 425-258. Pinker, Steven 1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press. Resnik, Philip and Aaron Elkiss 2004 The Linguist's Search Engine: Getting Started Guide. Technical Report: LAMP-TR-108/CS-TR-4541/UMIACS-TR-2003-109. University of Maryland, College Park. Wechsler, Stephen 1995 A Non-Derivational Account of the English Benefactive Alternation. Paper presentend at the 65th LSA Annual Meeting. Chicago, IL.
A Quantitative Corpus Study of German Word Order Variation
Kris Heylen
1
Introduction
Word order variation in German is an area of syntactic research in which the limitations of the types of data traditionally used in theoretical linguistics have become apparent. In a case study, we present a quantitative corpus analysis as a possible alternative to overcome these shortcomings. Traditionally, theoretical linguists have not had to worry much about the validity of the data on which they based their theories. For most linguists, obtaining relevant data seemed fairly unproblematic and easy. In the generative tradition, linguists could rely on the introspective grammaticality judgments of any single native speaker (including themselves) to uncover the principles of grammar. Researchers taking a cognitive-functional approach offered hermeneutic interpretations of “encountered” examples of language use to show how discourse properties or general cognitive abilities shape the grammar. Most research into word order variation in German was also based on these types of linguistic evidence. Recently however, there has been a growing awareness across the field of theoretical linguistics that this kind of data has limited reliability and is insufficient to deal with the complexity of linguistic phenomena. Several options are being pursued to provide grammar research with a more solid empirical basis. One of them is the use of large electronic corpora for collecting representative data samples in which theoretically interesting patterns of linguistic structure can be discovered and validated. This kind of corpus study will be applied here to word order variation in the Mittelfeld of the German clause.
2
Word order variation in German
Word order variation has been a longstanding issue within the study of German syntax. In the first section, we will briefly outline the major research questions involved, and we will point out how problems with data reliability have played an important role in this area of research.
The second section introduces the specific type of word order variation that we will pursue further in a following case study.
2.1
A challenge to traditional data types
German clause topology is characterized by the so-called Klammer construction. The German clause has two fixed positions, called Klammer (German for ‘brackets’), that are typically occupied by elements of the verbal group in main clauses or by a complementizer and the verbal group in subordinate clauses. These fixed positions subdivide the clause into three main “fields”. Of interest here is that the field between the two fixed positions, called the Mittelfeld (middle field), can contain multiple constituents and that these constituents do not always occur in the same order. Especially the relative order of verbal arguments like subject, direct and indirect object, co-occurring in the Mittelfeld, has been the subject of a lively debate within the German linguistics community. In this debate, problems of linguistic evidence have played a major role. At the debate's climax in the mid 1980s, linguists mainly differed in opinion as to whether the word order variation in the Mittelfeld was mainly determined by either grammatical or pragmatic factors.1 Both sides kept on coming up with examples that seemed to confirm the importance of the factors they had put forward, while refuting the effect of those suggested by the other side. Two main problems seemed to keep the debate from being settled: firstly, a great many factors were involved simultaneously, and secondly, each factor taken individually rarely had a categorical effect. This posed enormous problems for the traditional types of linguistic evidence on which the researchers relied. Without a categorical effect of the factors, grammaticality judgements were not unambiguous and not at all stable across speakers. Moreover, there was no obvious scale to interpret the resulting graded differences in grammaticality. The fact that multiple factors were involved made it well nigh impossible to control for all of these factors simultaneously in test sentences, let alone assess the contribution of each factor to the graded grammaticality. Towards the beginning of the 1990s, there was an increasing awareness that the main problem was indeed a methodological one: the traditional introspective data was unreliable and could not cope with the phenomenon's complexity. As a consequence, several new empirical methods were tried out.
One approach used psycholinguistic experiments based on processing time differences (e.g. Pechmann et al. 1996, Poncin 2001), and a second type of study looked at corpus material (e.g. Primus 1994; Kurz 2000). A third and more recent approach uses a sophisticated version of grammaticality judgments with a strict design, taken from multiple test subjects and analyzed with advanced statistical techniques (e.g. Keller 2000). Yet, these approaches do have problems of their own. Both the psycholinguistic experiments and the corpus studies continued struggling with the variation's multifactorial complexity: the psycholinguistic studies had to limit the number of factors because of the time-consuming and costly way of collecting data. Results were highly reliable but dealt with only one or two factors simultaneously. The corpus studies could investigate the effect of multiple factors in large amounts of actual usage data, but they lacked the statistical apparatus to deal with multifactorial complexity. The third method of enhanced grammaticality judgments can gather sufficient data relatively easily and has the appropriate statistics to deal with multifactoriality, but the heuristic status of grammaticality judgments themselves is not unproblematic. They certainly give a reliable, reproducible estimate of speakers' post-hoc introspective judgments of sentence acceptability, but it is unclear whether this, as claimed, directly reflects speakers' linguistic competence of grammatical constraints used in on-line production, while at the same time filtering out “performance2 noise”. It seems more probable that the relation of acceptability judgements to the grammatical system is a complex and indirect one, because assessing acceptability is a separate and complex cognitive activity that potentially introduces new biases. The case study presented below opts for corpus material as a data source for several reasons. Firstly, usage data as collected in corpora can be seen as the primary data in linguistics. Actual usage is what can be directly observed about language in reality. For a usage-based approach to grammar3, usage is primary because it fundamentally shapes the grammar. But even in a modular, autonomous grammar, usage data is not less primary than judgments because both are biased by performance, and in both performance noise can be filtered out in principle. Secondly, electronic corpora are getting larger by the day, so that gathering large amounts of data is relatively easy. Thirdly, corpus data deals with the problem of gradient effects fairly automatically by studying relative frequencies. Finally, the simultaneous effect of multiple factors can be studied directly by looking at actual usage, and these effects can be explored given the appropriate statistical apparatus. It is in this last respect that the case study below will try to improve on previous corpus studies that only used monofactorial analyses.
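As a minimal illustration of how relative frequencies capture gradient preferences, the Python sketch below tabulates the proportion of object-first versus subject-first orders for each level of a single factor. The observations and the factor (pronoun case) are invented for illustration only; a genuinely multifactorial analysis would require a proper statistical model on top of such counts.

from collections import Counter, defaultdict

# Invented toy observations: (word_order, pronoun_case) pairs standing in
# for annotated corpus hits; real data would carry many more factors.
observations = [
    ("object_first", "accusative"), ("object_first", "accusative"),
    ("subject_first", "accusative"), ("object_first", "dative"),
    ("subject_first", "dative"), ("subject_first", "dative"),
]

# Monofactorial view: relative frequency of each order per factor level.
counts = defaultdict(Counter)
for order, case in observations:
    counts[case][order] += 1

for case, dist in counts.items():
    total = sum(dist.values())
    for order, n in dist.items():
        print("%s, %s: %.2f" % (case, order, n / total))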
2.2 A specific type of word order variation
The case study focuses on a specific type of word order variation in the Mittelfeld, viz. the variation that occurs when both a full NP subject and a pronominally realized object are present in the Mittelfeld. In this case, the pronominal object can either precede the full subject NP (as in ex. 1) or follow it (ex. 2).4 The variation occurs with both direct and indirect object pronouns.

(1) Ein paar Tage später nahm ihn der SED-Chef der Uni beiseite
    a few days later took him the SED-chief of the Uni aside
    'A few days later the university's SED-chief took him aside'

(2) Später, als die Kommission ihn entlassen hat, sagt er, ...
    later when the commission him dismissed has says he
    'Later, when the commission has dismissed him, he says ...'
Although most reference grammars of German consider the word order with object-first (ex. 1) to be the more common one, both word orders seem to be freely interchangeable without any obvious difference in grammaticality or meaning. Because of this, traditional heuristic methods like grammaticality judgments cannot discriminate between examples and thus cannot detect the effect of relevant factors. Even the method of enhanced grammaticality judgments cannot detect a difference in acceptability between the two variants (Keller 2000: 108ff). The few other studies that discuss this type of variation (Lenerz 1994; Zifonun et al. 1997: 1511ff) admit that the influencing factors are hard to identify. Classifying any syntactic variation as "free" variation is explanatorily highly unsatisfactory and probably means, more often than not, that our methods for studying the phenomenon are insufficient. Admittedly, this type of variation might seem a hard nut to crack and not an obvious choice for starting to investigate German word order variation, but it also has a methodological advantage: by keeping the object pronominal, we reduce the multifactorial complexity, because pronouns vary less than full NPs in e.g. lexical form, length or discourse given/new status. In what follows, we look at whether a quantitative corpus study can overcome the deadlock of traditional grammaticality judgments.
3 A quantitative corpus study
In the first section we discuss the corpus that was used to collect data and how this data collection was done. The second section introduces the different factors whose effect on word order we will examine in section 4.

3.1 Collecting the data
The case study is based on data from the NEGRA corpus.5 The corpus was compiled at the University of Saarbrücken and consists of 20,602 morphosyntactically annotated sentences (355,096 tokens) taken from the 1992 editions of a local Frankfurt-based daily newspaper (Frankfurter Rundschau). Strictly speaking, this means that any of our findings will only hold for this specific type of language use, i.e. language use from this specific register (newspaper articles), location (region of Frankfurt) and time period (1992). Yet newspaper texts cover many topical domains and are written by multiple authors, often with different backgrounds, so that the patterns we find in this type of data might well be representative of modern German usage in general. Of course, whether there are regional, sociolinguistic or register differences is a question that is open to further empirical investigation.

For the data collection, relevant observations were defined as all clauses with a nominal subject and one pronominal object in the Mittelfeld. Taking advantage of NEGRA's morpho-syntactic annotation, we used a self-written Perl script to extract these observations automatically, and then did a manual check for precision on all retrieved observations and a manual check for recall on 20% of the corpus. The script had found all relevant observations, but also 15% spurious ones. After removing the irrelevant clauses, we were left with a total of 995 observations. This means that the construction is quite common, being present in about 5% of the corpus' sentences.

The observations were then annotated for the response variable word order (subject first vs. object first) and for a number of factors that are traditionally mentioned as relevant in the literature on word order variation. These factors had to be operationalized, so that each observation could be assigned a unique value during a process of mostly manual annotation. Basically, this process takes the form of an iterative annotation loop: when a new observation comes up that does not fit into the initial operationalization of a factor, this operationalization has to be revised to accommodate this observation. Then, all previously annotated observations have to be checked
again to see whether they are still in compliance with the new operationalizations of the different factors. This reiteration goes on until all observations are annotated adequately. It goes without saying that this is a very time-consuming process, but it is probably the price that has to be paid for reliable data. The final outcome of this annotation is a so-called data matrix, which states for every observation which word order was attested and which values were observed for each factor. This data matrix is then amenable to further statistical analysis. But before turning to this analysis, the next section discusses the factors that were investigated.
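As a purely illustrative sketch of what such a data matrix looks like, the following snippet (with invented values, not the actual NEGRA annotations or the author's code) builds a small table with one row per observation, the response variable, and the seven factors discussed in the next section:

    # Illustrative data matrix: one row per observation, with the response
    # variable (word order) and the seven annotated factors. All values are
    # made up for the sake of the example.
    import pandas as pd

    data_matrix = pd.DataFrame([
        ["object-first",  "acc", "agent", 5, 6, "animate",   "personal",  "main"],
        ["subject-first", "dat", "theme", 2, 8, "inanimate", "personal",  "subordinate"],
        ["object-first",  "acc", "agent", 9, 1, "animate",   "reflexive", "main"],
    ], columns=["order", "case", "role", "length_diff", "givenness",
                "animacy", "pron_type", "clause_type"])

    print(data_matrix)
    print(data_matrix["order"].value_counts(normalize=True))  # proportion per order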
3.2 The factors
Because the differences in grammaticality and meaning between the word order variants in our study are so elusive, it is very hard to make an initial guess about which factors might have an influence. Therefore we consulted the (vast) literature on word order and selected a set of the many factors that are mentioned as relevant. Seven of those are discussed in this study (see table 1 for an overview). Most of them pertain to properties of the subject, because its realization as a full NP allows for more diversity than the pronominal object.

Table 1. Factor overview

  FACTORS                                           VALUES
  1  Case of the pronominal object                  [dative] [accusative]
  2  Semantic role of the subject                   [agent] [recipient] [theme]
  3  Length difference between subject and object   number of syllables
  4  Given/new status of the subject                nine-point ordinal scale
  5  Animacy of the subject                         [animate] [inanimate]
  6  Pronoun type of the object                     [personal] [reflexive]
  7  Clause type                                    [main] [subordinate]
3.2.1 Case of the pronominal object
Grammatical case is probably the most basic factor in word order phenomena. Indeed, most German grammars explicitly refer to a constituent's case when describing preferred orderings. Grammatical case (nominative, accusative or dative) is mostly morphologically marked in German. Because our study keeps the presence of a nominative subject constant, the only variation in case occurs with the pronominal object, which can be either an accusative or a dative pronoun.

3.2.2 Semantic role of the subject
The semantic role or theta role of a constituent refers to the role that the referent of the constituent plays in the action denoted by the verb. It is a factor that figures prominently in many generative accounts of word order. In this study we distinguish three roles for the subject referent: Agent, when the referent performs an action or causes an action to take place; Recipient, when the referent is the active recipient of objects or stimuli; and Theme, when the referent is itself inactive in the action or state denoted by the verb.

3.2.3 Length difference between subject and object
The idea that length difference has an effect on word order was introduced by Jacob Wackernagel in the late 19th century and recapitulated by Behaghel as his "law of growing constituents", which claims that short constituents tend to precede longer ones. More recently, Hawkins' (1994) EIC theory claims that length difference is the main factor determining word order. In this study we measured the difference between subject and object in syllables rather than in number of words, because in a compound-friendly language like German individual words can already differ greatly in length. Because the object is always a one- or two-syllable word, this measure mainly reflects the length of the subject.

3.2.4 Given/new status of the subject
Given/new status refers to whether a referent was previously mentioned in the discourse or not. Its influence on word order was a basic assumption of
the Prague School. It is a factor that is notoriously difficult to operationalize because of the many intermediate categories between completely new and totally given. The nine-point scale we use here for the subject referent was developed as an opportunistic tool by Grondelaers (2000), based on earlier work by Ellen Prince (1981) and Mira Ariel (1990), and it has since proven its applicability in several analyses. The scale (table 2) classifies a referent by its degree of accessibility, given the current discourse model.

Table 2. Given/new scale

  VALUE  REFERENT ACCESSIBILITY IN CURRENT DISCOURSE
  1      Not accessible and unconstrained
  2      Not accessible but constrained by the context
  3      Not accessible but constrained by an anchor referent
  4      Accessible through encyclopaedic knowledge
  5      Inferable from an anchor referent
  6      Accessible in the wider linguistic context
  7      Inferable from the near linguistic context
  8      Accessible in the near linguistic context
  9      Accessible in the immediate speech context
3.2.5 Animacy of the subject
Whether a constituent's referent is animate or inanimate is proposed as the main determinant of German constituent order in the reference grammar by Zifonun et al. (1997). In this study, we only look at the animacy of the subject referent, because a substantial part of the objects, viz. those realised as reflexive pronouns, would simply mirror the subject's animacy.

3.2.6 Pronoun type of the object
Although reflexive pronouns also participate in the variation, they have a different semantics from personal pronouns, which could influence word order. Because of the reporting style of newspaper text, most pronouns are third person pronouns. For reflexives, the third person form sich is indeed the only one that was observed.
3.2.7 Clause type
In this study, clause type refers to the difference between main clauses and subordinate clauses. Whereas main clauses in German have the finite verb in second position (occupying the first Klammer), subordinate clauses have the finite verb in (near) final position (the second Klammer). Hawkins' (1994) theory predicts that this will lessen the tendency in subordinate clauses to put shorter constituents before longer ones. In our case, this would mean fewer short pronominal objects before long full-NP subjects in subordinate clauses.

4 Statistical exploration
After the annotation process described above, we had obtained a data matrix which states for every observation which order the subject and object appeared in, and which value was observed for each of the seven factors. Now, statistical analysis allows us to examine the correlations between word order and the factors. This can be done from two perspectives: if some theory of grammar has led us to formulate a hypothesis that makes an explicit prediction about the correlation between word order and some factor(s), we can test whether this hypothesis is confirmed by the data or not. This is called confirmatory analysis. On the other hand, if we have annotated our observations for a number of factors but do not yet have an explicit hypothesis about which factors determine the word order, and we would just like to know a bit more about the effects of these factors, we can explore the correlations between factors and word order in the data. This is called exploratory analysis. This exploration is meant to give a better insight into the data, which may well lead to new theoretical understandings and explicit hypotheses. In their turn, these hypotheses can again be tested. With the word order variation studied here, the main problem was precisely its elusive character, which prevented us from formulating an explicit hypothesis about what determined the variation. Instead we chose to look at a number of factors suggested by the literature. An exploratory statistical analysis can now give us an idea about which of these factors are actually relevant, in what way and to what extent. Note that a statistical analysis will not in itself provide an explanation; rather, it uncovers patterns that themselves need explaining. These analyses help to expose empirical facts that are not apparent at first sight. These facts should be the input for explanation finding and theory building. A good theory will try to generalize and make predictions for other cases than the ones it started from. Whether
these predictions are borne out by the "empirical facts" is then a question to be addressed in additional analyses.

The analyses presented below explore the data at levels of increasing complexity, with increasingly advanced statistical techniques. The main concern will be the kind of information that these statistical techniques provide, not their technical details.6 First, we look at the relative order of subject and object per se to see which order is dominant. Next, the effect on word order of each factor separately is investigated. Then, we examine the effect of one factor while controlling for a second factor. Finally, we assess the effect on word order of multiple factors simultaneously.

4.1 The proportion of object-first and subject-first
In studying syntactic variation, an obvious first question seems to be: how much variation is there? Are both orderings of subject and object equally frequent, or is there a clear default, dominant order? In our data, 889 out of 995 observations have object-first, whereas only 106 observations have subject-first. This proportion of 89.3% object-first confirms what most grammars of German say, viz. that object-first is the default order. For a future theoretical interpretation, this probably means that subject-first will be considered a marked order whereas object-first is the unmarked order.

We might also be interested to know how reliable the information about this proportion is. How sure can we be that the proportion of object-first we find in our data is a good estimate for the proportion of object-first in general? In fact, this is the basic question underlying all of statistics: how reliable are the results obtained from a sample of observations when compared to all possible observations? Intuitively, it is clear that the more observations we take into consideration, the more reliable our results will be. Statisticians use this property to determine confidence intervals from a sample. The 95% confidence interval for a proportion is the interval in which the true proportion for all possible observations will be situated with 95% certainty. The more observations we take into account, the more we can narrow down the interval. This confidence interval holds for all observations made under similar conditions as those under which the sample was collected. In our case, these conditions would be something like all observations that come from newspaper articles that appeared in local newspapers from central Germany in the early 1990s. For these conditions, the 95% confidence interval for the proportion of object-first is situated between
87.1% and 91.0%. We can now reliably say that object-first is indeed the default order.
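As a quick sanity check (not part of the original study), the reported interval can be approximately reproduced from the counts using the standard normal approximation for a proportion:

    # Approximate 95% confidence interval for the proportion of object-first
    # orders, using the normal approximation (the paper may have used an exact
    # or score-based interval, hence the small difference).
    import math

    n_object_first, n_total = 889, 995
    p_hat = n_object_first / n_total                   # 0.893
    se = math.sqrt(p_hat * (1 - p_hat) / n_total)      # standard error
    ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)
    print(f"{p_hat:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
    # roughly [0.874, 0.913], close to the 87.1%-91.0% reported in the text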
4.2 The effect of separate factors
Above, we introduced seven factors that we think might influence the relative ordering of a full subject NP and a pronominal object. In this section, we examine for each of the seven factors separately what its effect on word order is. For each factor, we look at two statistics: the χ² test7 tells us whether there is an association between the factor and word order. If there is an association, a "measure of association" tells us how strong the association is and what direction it takes.
4.2.1 Case of the pronominal object
Table 3 makes clear that there is not much of an effect of case on word order. The proportion of object-first versus subject-first is practically the same for observations with an accusative and a dative pronoun. The χ² test confirms that there is no significant association (p = 0.94).8 This lack of effect is somewhat unexpected, because case is considered to be relevant for word order by nearly all reference grammars. However, this may lead us to consider that although case per se is not important, some more specific interpretation of case might well have an effect, as we will see below (4.3).

Table 3. Word order by Case of object

  Case        OBJECT FIRST     SUBJECT FIRST
  ACCUSATIVE  724 / 810 (89%)  88 / 810 (11%)
  DATIVE      165 / 185 (89%)  20 / 185 (11%)
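For readers who want to re-check this kind of test, here is a rough sketch (not the author's SAS code) that runs a chi-square test on the printed counts of Table 3 with scipy:

    # Chi-square test of independence for the 2x2 table of case by word order.
    from scipy.stats import chi2_contingency

    table3 = [[724, 88],   # accusative: object-first, subject-first
              [165, 20]]   # dative:     object-first, subject-first
    chi2, p, dof, expected = chi2_contingency(table3)
    print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
    # p is large (no significant association), in line with the p = 0.94 reported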
4.2.2 Semantic role of the subject
The semantic role of the subject has a significant effect on word order (χ², p < 0.01). Agent subjects precede the object relatively more often than recipient subjects, and recipient subjects in their turn precede objects more often than theme subjects. Increased agentivity of the subject seems to favour subject-before-object ordering, something we indeed expect from the literature. If we consider the three semantic roles as levels on a scale of
agentivity, the so-called gamma index gives a measure of the strength of the association between agentivity and word order. The index ranges from -1 (perfect inverse linear association) through 0 (no association) to 1 (perfect linear association). Here the gamma index is -0.49. The negative sign means that high levels of agentivity correspond to relatively lower levels of object-first (i.e. relatively more subject-first). The absolute value of 0.49 indicates a moderate association.

Table 4. Word order by Semantic role of subject

  Role       OBJECT FIRST     SUBJECT FIRST
  AGENT      466 / 547 (85%)  81 / 547 (15%)
  RECIPIENT  104 / 116 (90%)  12 / 116 (10%)
  THEME      319 / 332 (96%)  13 / 332 (04%)
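The gamma index used here is presumably Goodman and Kruskal's gamma for ordinal association; under that assumption, the following sketch (not the author's code) recomputes it from the printed counts of Table 4 by counting concordant and discordant pairs:

    # Goodman-Kruskal gamma for an ordered r x c table: (C - D) / (C + D),
    # where C and D are the numbers of concordant and discordant pairs.
    def goodman_kruskal_gamma(table):
        concordant = discordant = 0
        rows, cols = len(table), len(table[0])
        for i in range(rows):
            for j in range(cols):
                for k in range(rows):
                    for l in range(cols):
                        if k > i and l > j:
                            concordant += table[i][j] * table[k][l]
                        elif k > i and l < j:
                            discordant += table[i][j] * table[k][l]
        return (concordant - discordant) / (concordant + discordant)

    # Table 4: rows ordered agent, recipient, theme (decreasing agentivity),
    # columns ordered (object-first, subject-first)
    table4 = [[466, 81], [104, 12], [319, 13]]
    print(round(goodman_kruskal_gamma(table4), 2))  # -0.49, matching the text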
4.2.3 Length difference between subject and object
Length difference between subject and object has a significant effect on word order (χ², p < 0.01). Smaller length differences lead to relatively more subject-first, as we can also see in table 5. Because the pronominal object is always short, length difference mainly reflects the length of the subject. This means that shorter subjects precede the object relatively more often than longer ones, which is what we expect from Behaghel's "law of growing constituents". The gamma index of -0.44 reflects a moderate inverse association between object-first and smaller length differences.

Table 5. Word order by Length difference

  Syllables  OBJECT FIRST     SUBJECT FIRST
  0-3        299 / 362 (82%)  63 / 362 (18%)
  3-6        212 / 233 (91%)  21 / 233 (09%)
  >6         378 / 400 (95%)  22 / 400 (05%)
4.2.4 Given/new status of the subject
Table 6 shows that the given/new status of the subject referent, as measured by its degree of accessibility, does not have a perfectly linear effect.9 Indeed, empirical data does not always show the neat results we would like.
However, more accessible subjects do seem to precede the object relatively more often, which we would expect from the theories of the Prague school. The (Mantel-Haenszel) χ² test confirms that there is a linear association (p < 0.01). A gamma index of 0.28 indicates that this linear association is relatively weak.

Table 6. Word order by given/new status of the subject

  Value  OBJECT FIRST     SUBJECT FIRST
  1      162 / 169 (96%)  7 / 169 (04%)
  2      48 / 55 (87%)    7 / 55 (13%)
  3      24 / 24 (100%)   0 / 24 (00%)
  4      103 / 117 (88%)  14 / 117 (12%)
  5      101 / 106 (95%)  5 / 106 (05%)
  6      255 / 291 (88%)  36 / 291 (12%)
  7      140 / 159 (88%)  19 / 159 (12%)
  8      56 / 74 (76%)    18 / 74 (24%)
4.2.5 Animacy of the subject
In table 7, animate subjects precede pronominal objects more often than inanimate subjects. This effect is statistically significant (χ², p < 0.01) and fits in with the effect of animacy that Zifonun et al. (1997) predict. Both word order and animacy have only two values, and the measure of association generally used in such cases is the odds ratio. Here, this is the odds in favour of subject-first with animate subjects divided by the odds in favour of subject-first with inanimate subjects, which gives a value of 2.33. The odds of having subject-first with animate subjects are more than twice the odds with inanimate subjects, a moderately strong association.

Table 7. Word order by Animacy of the subject

  Animacy    OBJECT FIRST     SUBJECT FIRST
  ANIMATE    532 / 614 (87%)  82 / 614 (13%)
  INANIMATE  357 / 381 (94%)  24 / 381 (06%)
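A rough re-check of that odds ratio from the printed counts (not the author's computation, which may have used unrounded figures):

    # Odds ratio for subject-first: animate vs. inanimate subjects (Table 7).
    animate_sf, animate_of = 82, 532
    inanimate_sf, inanimate_of = 24, 357

    odds_animate = animate_sf / animate_of        # ~0.154
    odds_inanimate = inanimate_sf / inanimate_of  # ~0.067
    print(round(odds_animate / odds_inanimate, 2))
    # ~2.29 from these rounded counts, close to the 2.33 reported in the text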
4.2.6 Pronoun type of the object
Personal pronouns follow the subject significantly more often than reflexive pronouns (χ², p < 0.01). Apparently, the fact that reflexive pronouns do not introduce a separate referent into the sentence's meaning has consequences for their ordering. This finding from our data exploration can now lead us to search for a theoretical interpretation. The odds ratio of 2.96 indicates a moderately strong association.

Table 8. Word order by pronoun type of the object

  Pronoun type  OBJECT FIRST     SUBJECT FIRST
  PERSONAL      141 / 179 (79%)  38 / 179 (21%)
  REFLEXIVE     748 / 816 (92%)  68 / 816 (08%)
4.2.7 Clause type
The marked order subject-before-object is much more frequent in subordinate clauses than in main clauses. There is indeed a significant association between clause type and word order (χ², p < 0.01). This is a finding we will also have to interpret further after completing our data exploration. The odds ratio of 7.41, meaning that the odds for subject-first in subordinate clauses are more than 7 times those odds in main clauses, indicates a very strong association.

Table 9. Word order by Clause type

  Clause type  OBJECT FIRST     SUBJECT FIRST
  MAIN         646 / 674 (96%)  28 / 674 (04%)
  SUBORDINATE  243 / 321 (76%)  78 / 321 (24%)
4.3 Stratified analysis
In the one-by-one analysis of factors, a surprising result was the lack of effect of object case on word order. Although there is no general effect, there might be an effect for specific types of pronominal objects. We therefore consider the effect of case for personal and reflexive pronouns separately. This is done in a so-called stratified analysis: we examine the effect of one factor (case) on word order, while controlling for a second factor (pronoun
type). Table 10 now tells us that there is a significant, moderately strong effect of case with personal pronouns (χ², p < 0.01; odds ratio = 2.92), but that there is no such effect with reflexive pronouns (χ², p = 0.60). There is also a test statistic, the Breslow-Day test, to check whether the effect of case is indeed significantly different for reflexives and personal pronouns. With a p-value of 0.02, we can say that the probability of the effect being the same is very small. One reason might be case syncretism: the reflexive sich has the same form for dative and accusative, whereas personal pronouns do have different forms for these cases. There may also be other reasons, but in any case the stratified analysis has revealed an interesting difference that we might want to interpret theoretically.

Table 10. Word order by pronoun case, stratified for pronoun type

  Pron. type  Case        OBJECT FIRST     SUBJECT FIRST
  PERSONAL    ACCUSATIVE  69 / 97 (71%)    28 / 97 (29%)
              DATIVE      72 / 82 (88%)    10 / 82 (12%)
  REFLEXIVE   ACCUSATIVE  655 / 713 (92%)  58 / 713 (08%)
              DATIVE      93 / 103 (90%)   10 / 103 (10%)
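To make the stratified comparison concrete, the following quick sketch (not from the paper) recomputes the stratum-specific odds ratios from the printed counts of Table 10:

    # Effect of case on word order, separately for personal and reflexive
    # pronoun objects (the two strata of Table 10).
    def case_odds_ratio(acc_of, acc_sf, dat_of, dat_sf):
        # odds of subject-first with an accusative object, divided by the
        # odds of subject-first with a dative object
        return (acc_sf / acc_of) / (dat_sf / dat_of)

    print(round(case_odds_ratio(69, 28, 72, 10), 2))   # personal:  ~2.92, as reported
    print(round(case_odds_ratio(655, 58, 93, 10), 2))  # reflexive: ~0.82, no clear effect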
4.4 Multifactorial analysis
In the previous sections, we have looked at the effect of the seven factors separately, or at the effect of one factor while controlling for a second factor. However, in the actual data, these seven factors are at work simultaneously. To investigate simultaneous effects, multifactorial statistical techniques are used. They address questions like: considering all factors at the same time, which ones actually have an effect, what is their combined effect, what is each factor's contribution to the combined effect, which factor is the most important one, and how well can we model the variation by the factors we have considered so far? First, we will look at a logistic regression model. Next, we discuss a Classification and Regression Tree (CART).
4.4.1 Logistic regression model
A logistic regression model is an advanced statistical technique that estimates the simultaneous effect of the factors on word order. First, a stepwise
selection procedure determines which factors actually have an effect, given that all seven factors are considered simultaneously. The procedure selects the factors in order of effect strength and adds these to the model until no factors are left that still make a significant contribution to the effect on word order. In table 11, we see that five factors with a significant effect (p-value < 0.01) are selected for the model. Clause type has the strongest effect, followed by length difference, subject animacy, pronoun type and subject givenness. The procedure also selects one interaction, between clause type and pronoun type. Apparently, the effect of pronoun type is not the same in main and subordinate clauses. The model now states the combined effect of all selected factors on the odds of having subject-first (because probabilities must lie between 0 and 1, the effect is modelled on a logarithmic odds scale).

Table 11. Logistic regression model

  Factor                        DF  Estimate  Odds ratio                   p
  INTERCEPT                     1   -5.114                                 <0.01
  1) Clausetype (subordinate)   1    2.512    Person: 2.29  Reflex: 12.3   <0.01
  2) Length diff. (small)       1    0.731    2.078                        <0.01
  3) Subj. animacy (animate)    1    0.988    2.687                        <0.01
  4) Pronountype (personal)     1    1.944    Subord: 1.3  Main: 6.99      <0.01
  5) Clausetype*Pronountype     1   -1.684                                 <0.01
  6) Subject givenness (high)   1    2.021    2.021                        <0.01

  Model statistics:
  - Goodness of fit (Hosmer & Lemeshow): 9.7 / 8 df / p = 0.28 (no lack of fit)
  - AIC: intercept only: 677.016; intercept and covariates: 529.890
  - Model signif. (LL ratio): 163 / 8 df / p < 0.01
  - Predicted correctly: 89.9% (dumb: 89.3%)

  Model information:
  - Subject-first modelled
  - Factors in order of entrance by stepwise selection procedure
The estimate10 for each factor indicates the contribution of that factor to the combined effect. If we translate the estimate from the logarithmic scale to the linear scale, we get an odds ratio. For example, the odds of having subject-first are 2.7 times greater with animate subjects than with inanimate ones. Because there is an interaction between clause type and pronoun type, we have two odds ratios for each of these factors. With personal pronouns, the effect on the odds of having a subordinate vs. main clause is only 2.3, whereas with reflexives that effect is much bigger, viz. 12.3. The intercept tells us something about how common subject-first is in general. The large negative value indicates that subject-first is far less common than object-first.

We are also interested in knowing how good our model is. We want to know how far we have come in getting a grip on the variation by having taken seven factors into consideration. The model significance statistic tells us that we have at least made some progress (p < 0.01). The Akaike Information Criterion (AIC) quantifies this progress. The first figure says something about how much variation there was without a model, the second figure about how much variation is left with the model. The figure drops, which means that we have explained (in a statistical sense) some of the variation. Unfortunately, AIC is a very technical measure that is difficult to interpret. A more intuitive measure is predictive strength: how well can we predict whether the object will precede or follow the subject if we use information about the factors in the model? We see that we can predict 89.9% of the observations correctly, which seems very good. However, because object-first is so much more common, it is not very difficult to get a good score on prediction. If we always choose object-first, we will be right 89.3% of the time, so the increase the model makes is very small (0.6%). That does not necessarily mean that we have a model that does not explain much: because the proportion of object-first versus subject-first is so skewed, it is just awfully difficult for the model to do better than chance. It does mean, however, that one always has to be careful in interpreting statistical measures. To get a better idea about how far we have come, we will look at another multifactorial technique, i.e. CART.
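For readers who want to see what fitting such a model looks like in practice, here is a hedged sketch using statsmodels on randomly generated stand-in data (the original analysis was run in SAS; the variable names, the formula and all values below are illustrative assumptions, not the NEGRA data):

    # Logistic regression of word order on several factors, including a
    # clause type x pronoun type interaction, fitted on synthetic data.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 995
    df = pd.DataFrame({
        "clause_type": rng.choice(["main", "subordinate"], n),
        "pron_type": rng.choice(["reflexive", "personal"], n),
        "animate": rng.choice([0, 1], n),
        "length_diff": rng.integers(0, 15, n),
    })
    # synthetic response, loosely mimicking the direction of the reported effects
    lin = (-3.0 + 2.0 * (df.clause_type == "subordinate")
           + 1.0 * (df.pron_type == "personal") + 0.8 * df.animate
           - 0.1 * df.length_diff)
    df["subject_first"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

    model = smf.logit(
        "subject_first ~ C(clause_type) * C(pron_type) + animate + length_diff",
        data=df).fit(disp=False)
    print(model.summary())
    print(np.exp(model.params))   # exponentiated estimates = odds ratios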
4.4.2 Classification and Regression Tree
CART is a statistical technique that produces a decision tree to classify observations according to some response variable, in our case word order, by making use of information about a number of factors. In constructing a
decision tree, the CART procedure tries to separate observations with object-first from observations with subject-first as well as possible. The observations are split up according to the values of a chosen factor. The procedure chooses the factor that allows the purest splits. The resulting groups of observations are split up again to maximize each group's purity. This procedure continues until the groups are as pure as possible, i.e. until they preferably contain either only subject-first observations or only object-first observations. These groups are found at the bottom leaves of the tree. To prevent the tree from making splits that only increase purity in the current data set, but are not good splits in general, the tree is cross-validated: the tree is grown on a subset of the data and tested on the remaining subset. By dividing the data repeatedly into different subsets, a tree can be tested several times. In the end, we get the most economical tree, i.e. the one that has the purest leaves with the smallest number of splits.

If we look at the tree for our data in figure 1, we see that the first split is made using clause type. This confirms what we knew from the logistic regression model, i.e. that clause type is the most important factor. Looking at the leaves, we see that the tree has not been very successful at separating subject-first from object-first. Some leaves purely contain object-first observations, but there are no leaves that are purely subject-first. There are not even any leaves where subject-first dominates! Obviously, we will have to take more factors into account than the seven we have looked at in order to explain the variation. However, the tree does tell us where the variation is strongest. The paths through the tree ending in a leaf define combinations of factor values, contexts if you like. We see that most variation occurs in the second leaf from the right, i.e. in subordinate clauses (split 1) with an agent or recipient subject (split 2) and a length difference smaller than 7 (split 3). In this context, we have 62% object-first and 38% subject-first. We can now return to our data and see whether the subject-first observations in this context have some interesting property in common that could lead us to formulate new hypotheses about what determines word order. In other words, CART trees can also be a good exploratory technique.
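As an illustration of the procedure (the original analysis used the tree package in R 1.8.1; the sketch below uses scikit-learn on made-up data, so factor names and values are assumptions), a small classification tree can be grown and cross-validated like this:

    # Grow and cross-validate a classification tree in the spirit of Figure 1.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    n = 995
    df = pd.DataFrame({
        "clause_type": rng.choice(["main", "subordinate"], n),
        "role": rng.choice(["agent", "recipient", "theme"], n),
        "length_diff": rng.integers(0, 15, n),
        "givenness": rng.integers(1, 10, n),
    })
    # synthetic response: subject-first is rare and mostly tied to subordinate clauses
    p = 0.03 + 0.2 * (df.clause_type == "subordinate") + 0.01 * df.givenness
    df["subject_first"] = rng.binomial(1, p.clip(0, 1))

    X = pd.get_dummies(df.drop(columns="subject_first"))
    y = df["subject_first"]
    tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20)
    print(cross_val_score(tree, X, y, cv=10).mean())  # 10-fold cross validation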
[Figure 1. Classification and Regression Tree for the 995 observations (89% object-first at the root node); first split on clause type, with further splits on semantic role, length difference, givenness and pronoun type; pruned by 10-fold cross validation; split criterion: deviance. Tree diagram not reproduced.]
5 Conclusions
Word order variation in German is an area of syntax where problems of linguistic evidence and data reliability have posed a serious challenge for linguistic research in the past. The case study presented above shows that a quantitative corpus analysis is a viable alternative to traditional data types for overcoming these problems. In this concluding section, we first sum up the merits of a quantitative corpus analysis as compared to traditional data types, and then we discuss some steps that might be considered for future research.
5.1 Merits of a quantitative corpus analysis
At the beginning of the paper, we discussed the problems with traditional data types, like grammaticality judgements, for studying word order variation in German: they are unreliable and cannot deal with gradience and multifactorial complexity. For the type of variation in our case study, traditional grammaticality judgements even appeared to be entirely useless. The question is then whether a quantitative corpus analysis has brought us any further. I think the answer is yes. Instead of mere intuitions, we now have reliable empirical evidence of the effect that a number of factors have on the variation. Gradience and multifactorial complexity could be tackled thanks to the statistical analysis. We now have an idea about which factors have an effect and about the strength of that effect, both for each factor separately and for factors in combination. Granted, the analysis also told us that we have not come awfully far in explaining the variation and that we will have to consider additional factors, but, to put it bluntly, at least we now know that we still do not know much. All of the findings of the analyses above are empirical facts that any theory of language has to account for. Some theories might split up their explanations for these facts between a theory of competence and a theory of performance, others might not. In any case, founding theoretical explanations on this kind of empirical data will make linguistic theories more reliable and falsifiable, so that their descriptive and explanatory adequacy can be compared.
5.2 Further steps
The multifactorial analyses, especially the CART analysis, have made clear that, although the seven factors investigated can explain some of the word order variation, we will have to take more factors into account. One intriguing result from the analysis is the strong effect of clause type: the 'marked' order subject-first is especially common in subordinate clauses. It might therefore be interesting to study subordinate clauses more closely and see whether the effect can be pinpointed to specific types of subordinate clauses, e.g. concessive or causal clauses.

As we stated at the beginning of this paper, the findings from this study are strictly speaking only valid for observations that instantiate the same type of language use as that represented in the corpus, i.e. newspaper articles from a local mid-German newspaper. In other types of language use, the ordering of subject and object might be different. Therefore, future research will have to take into account extra-linguistic variables like regional, social or register differences.

In this paper, we only presented an exploratory analysis of corpus data. Statistical exploration allowed us to uncover some interesting facts that would not have been obvious at first sight. However, these facts are not yet explanations. Based on these facts and additional exploration, we will ultimately have to formulate an explanatory model for the variation that fits within some theoretical framework. Ideally, that explanatory model will give rise to new hypotheses that can be empirically tested. In that way, the validity of the explanatory model can be confirmed, or it can become clear where modifications are necessary. Some of these hypotheses will be testable against corpus data. For others, however, corpus data may not be suitable. For example, some hypotheses relating to the cognitive foundation of grammar will only be testable with psycholinguistic or even neurolinguistic experiments. Therefore, corpus analysis is part of a whole set of data types that are necessary for sound empirical language research.

Notes

1. See Reis (1987) for an overview.
2. The division between competence and performance is itself not uncontroversial within the linguistics community.
3. E.g. Cognitive Grammar (Langacker et al.), Construction Grammar (Goldberg et al.), Emergent Grammar (Bybee, Hopper, et al.).
4. Both examples come from the NEGRA corpus (sentences 1618 and 1665).
5. More information at http://www.coli.uni-sb.de/sfb378/negra-corpus/
6. The statistical background knowledge is mainly based on Agresti (1996). All analyses were done with SAS for Windows V8, except for the CART, which was done with the tree package in R 1.8.1.
7. For factors measured on an ordinal scale, the Mantel-Haenszel chi-square statistic is used to test for linear association.
8. The statistic actually tells us that there is a 94% chance that the proportion of word orders is the same for observations with accusative and dative pronouns.
9. Only 8 levels of the accessibility scale appear because the highest level (9), accessible in the immediate speech context, does not apply to written material.
10. To make the model less complex and more understandable, the number of values for each factor was reduced to 2. For semantic role, theme and recipient were grouped together and contrasted with agent. For length difference, we distinguished small differences (≤ 6) and large differences (> 6). For subject givenness, very accessible (≥ 6) and little accessible (< 6) were distinguished.
References

Agresti, Alan
1996  An Introduction to Categorical Data Analysis. John Wiley & Sons, New York.
Ariel, Mira
1990  Accessing NP Antecedents. Routledge, London.
Grondelaers, Stefan
2000  De distributie van niet-anaforisch er buiten de eerste zinsplaats. Sociolexicologische, functionele en psycholinguïstische aspecten van er's status als presentatief signaal. Unpublished PhD thesis, K.U. Leuven.
Hawkins, John
1994  A Performance Theory of Order and Constituency. Cambridge University Press, Cambridge.
Keller, Frank
2000  Gradience in Grammar. Experimental and Computational Aspects of Degrees of Grammaticality. Unpublished PhD thesis, University of Edinburgh.
Kurz, Daniela
2000  Wortstellungsphänomene im Deutschen. Unpublished Master's thesis, Universität Saarbrücken.
Lenerz, Jürgen
1994  Pronomenprobleme. In Brigitta Haftka (ed.), Was determiniert Wortstellungsvariation? Studien zu einem Interaktionsfeld von Grammatik, Pragmatik und Sprachtypologie, pp. 161–173.
Pechmann, Thomas, et al.
1996  Wortstellung im deutschen Mittelfeld. Linguistische Theorie und psycholinguistische Evidenz. In Christoph Habel et al. (eds.), Perspektiven der kognitiven Linguistik. Modelle und Methoden, pp. 258–299.
Poncin, Kristina
2001  Präferierte Satzgliedfolge im Deutschen: Modell und experimentelle Evaluation. Linguistische Berichte, 186: 175–203.
Primus, Beatrice
1994  Grammatik und Performanz: Faktoren der Wortstellungsvariation im Mittelfeld. Sprache und Pragmatik, 32: 39–86.
Prince, Ellen F.
1981  Toward a taxonomy of given-new information. In Peter Cole (ed.), Radical Pragmatics, pp. 223–255.
Reis, Marga
1987  Die Stellung der Verbargumente im Deutschen. Stilübungen zum Grammatik:Pragmatik-Verhältnis. In Inger Rosengren (ed.), Sprache und Pragmatik: Lunder Symposium 1986, pp. 139–177.
Zifonun, Gisela, Ludger Hoffmann, and Bonno Strecker
1997  Grammatik der deutschen Sprache. de Gruyter, Berlin / New York.
Which Statistics Reflect Semantics? Rethinking Synonymy and Word Similarity

Derrick Higgins
1 Overview

A great deal of work has been done of late on the statistical modeling of word similarity relations (cf. Schütze 1992; Lund and Burgess 1996; Landauer and Dumais 1997; Lin 1998; Turney 2001). While this has largely been viewed as an engineering task (with the notable exception of much writing on Latent Semantic Analysis (LSA)), the relative success of different approaches to constructing word similarity measures is highly relevant to issues in theoretical semantics and language acquisition. With this background in mind, this paper has two main aims. First, we will present yet another statistical approach to the calculation of word-similarity scores (LC-IR), which significantly outperforms other methods on standard benchmarks including the 80-question set of TOEFL® synonym test items first employed by Landauer and Dumais (1997).1 Second, we hope to demonstrate that

– various methods for assessing word similarity are based on fundamentally different assumptions about the statistical properties which synonyms can be expected to display,
– the performance of each method can be taken as a judgment on the validity of these assumptions, and
– whether these predictions regarding the statistical distribution of synonyms in a corpus are borne out ought to be taken into account in any consideration of the acquisition of meaning as part of language, and the mental representation of meaning.

2 Statistical approaches to word similarity

Without indulging in too much of a caricature, we can classify different approaches to statistical estimation of word similarity according to the assumptions which they make about the distribution of synonyms (actually, plesionyms; cf. Edmonds and Hirst 2002). The three main assumptions made by existing word similarity measures are the topicality assumption, the proximity assumption, and the parallelism assumption.

2.1 Topicality: LSA et al.
The techniques of Latent Semantic Analysis, Random Indexing, and Lund & Burgess' HAL all collect statistics on the relative frequency with which a word appears "near" other words. Similar words can then be identified as those which have a similar profile of content words that tend to occur near them. These approaches to word similarity are based on the idea of situating each word in a high-dimensional vector space, so that the similarity between words can be measured as the cosine of the angle between their vectors (or a similar metric). Figure 1 represents this graphically for a two-dimensional space. Words similar to ship tend to occupy the same part of the space and have high cosine similarities. For example, the cosine of the angle between the vectors for ship and sail in our toy example is 0.95. Unrelated words, though, will be nearly orthogonal to ship. For example, square has a cosine similarity of only 0.10 with ship.

Latent Semantic Analysis (LSA; Landauer and Dumais 1997) is the most widely cited of these vector-space methods. It involves first constructing a term-by-document matrix based on a training collection, in which each cell of the matrix indicates the number of times a given term occurs in a given document (modulo the term weighting scheme). Given the expectation that similar terms will tend to occur in the same documents, similar terms ought to have similar term vectors in this scheme. Singular-value decomposition (SVD), a dimensionality reduction technique which blurs the distinctions between similar terms and improves generalization, is then applied to this matrix. Typically, around 300 factors are retained. To illustrate, singular-value decomposition of the term-by-document matrix M produces the three matrices T, S, and C, as indicated in Figure 2. S is a diagonal matrix containing the top 300 singular values of M, and T and C allow term and document vectors, respectively, to be mapped into the reduced space. The product T × S × C of these three matrices approximates the original matrix M. Now, in order to find the similarity between any two words, instead of calculating the cosine of the angle between row vectors from M, the vectors are first mapped into the 300-dimensional factor space, and the
[Figure 1. Calculation of word similarity by the cosine metric for vector-based approaches: a two-dimensional vector plot in which ship, boat and sail point in roughly the same direction (cos = .95 between ship and sail) while square, rectangle, tariff and purple point elsewhere (cos = .10 between ship and square). Diagram not reproduced.]

[Figure 2. Singular-value decomposition of the term-by-document matrix: the term-by-document matrix M (n terms × m documents) is decomposed into T (n × 300), the diagonal matrix S (300 × 300) and C (300 × m). Diagram not reproduced.]
cosine similarity metric is calculated on these reduced vectors. Table 1 shows the most similar words to ship in one LSA space, in order to illustrate the sort of word similarity relationships induced by an LSA analysis.

Table 1. Sample similarity scores produced by Latent Semantic Analysis (similarity to the word ship)

  Word      Similarity    Word      Similarity
  ship      1.00          decks      .82
  crew       .90          rigging    .82
  aboard     .89          mast       .82
  captain    .87          sailors    .80
  deck       .85          sails      .80
  masts      .85          hull       .78
  sailor     .83          ships      .78

Schütze (1992) and Lund & Burgess (1996) have also produced vector-based methods of assessing word similarity. The primary differences between these methods and LSA are, first, that they use a sliding text window to calculate co-occurrence, rather than requiring that the text be pre-segmented into documents, and second, that they construct a term-by-term matrix instead of a term-by-document matrix. In this term-by-term matrix, each cell represents the co-occurrence of a term with another term within the text window, rather than the occurrence of a term within a document. The methods remain very similar to LSA, however; in each case, a vector is constructed to represent the meaning of a word based on the content words it occurs with, and the similarity between words is calculated as the cosine of the angle between the term vectors.

A slightly different vector-based word similarity metric is Random Indexing (Kanerva et al. 2000; Sahlgren 2001). Sahlgren's application of this method involves first assigning a label vector to each word in the vocabulary, an 1800-length sparse vector in which a small number of elements have been randomly set to 1 or −1. The index vector for each word is then derived as the sum of the label vectors of all words occurring within a certain distance of the target word in the training corpus (weighted according to their distance from the target word). Sahlgren uses a window size of 2-4 words on each side of the target word. This is similar to the other vector-based approaches mentioned here, but it is more scalable because it does not require a computationally intensive matrix reduction step like SVD.

While the specifics vary between these different approaches to similarity calculation (for example, the proximity required for words to count as "near" one another varies from a distance of 3 words for Random Indexing to as much as 300 words for LSA), these approaches are similar enough that we can say they fundamentally depend on the assumption that similar words tend
to have the same neighboring content words. We will refer to this as the topicality assumption, making the inference that synonyms tend to have the same neighbors because they are in passages which are on the same topic.
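To make the vector-space idea concrete, here is a small self-contained sketch (with toy documents invented for illustration; not code from any of the cited systems) that builds a term-by-document matrix, reduces it with SVD as in LSA, and compares words by the cosine of the angle between their reduced vectors:

    # Term-by-document counts, truncated SVD, and cosine similarity on a toy corpus.
    import numpy as np

    docs = [
        "the ship sailed from the harbour with a full crew",
        "the crew of the boat raised the sails and left the harbour",
        "the square and the rectangle are basic geometric shapes",
        "a square has four sides of equal length like some rectangles",
    ]
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}

    # term-by-document matrix M: M[i, j] = count of term i in document j
    M = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            M[index[w], j] += 1

    # truncated SVD: keep k factors (LSA typically keeps ~300; here k = 2)
    k = 2
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    term_vectors = U[:, :k] * S[:k]        # terms mapped into the factor space

    def cosine(w1, w2):
        v1, v2 = term_vectors[index[w1]], term_vectors[index[w2]]
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

    print(cosine("ship", "crew"))    # related term: relatively high cosine
    print(cosine("ship", "square"))  # unrelated term: noticeably lower cosine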
2.2 Proximity: PMI-IR
On the other hand, PMI-IR also involves the collection of statistics regarding the relative frequency with which words occur in proximity, but the assumption made regarding how this relates to synonymy is quite different. Instead of the assumption that similar words will occur near the same words, the calculation which forms the core of PMI-IR assumes that similar words will tend to occur near each other. In particular, PMI-IR measures the degree to which words tend to occur near one another using pointwise mutual information, as shown in (1–2) (thus the motivation for the acronym PMI-IR: pointwise mutual information–information retrieval).2 The information retrieval aspect of PMI-IR consists in the fact that web search statistics are used to estimate the frequency with which word pairs appear together. (3) illustrates that the pointwise mutual information is proportional to a measure expressed in terms of the expected counts of words and word pairs in some corpus, and (4) shows how this is estimated using web search statistics. Turney (2001) uses the NEAR operator of the AltaVista search engine, which finds words within a ten-word window of one another, to calculate term co-occurrence.3

(1)  Similarity_PMI-IR(w1, w2) = PMI(w1, w2)
(2)    = P(w1 & w2) / (P(w1) × P(w2))
(3)    ∝ Count(w1 & w2) / (Count(w1) × Count(w2))
(4)    ≈ Hits(w1 NEAR w2) / (Hits(w1) × Hits(w2))
Table 2 shows the most similar words to ship, using the PMI-IR similarity score. In fact, the similarity scores in Table 2 are only an approximation to the score used by Turney (2001), because the AltaVista query syntax that paper used is no longer available.4 There is some noise in these results (most likely due to errors in search engine frequency estimates), but in general, nautical terms tend to appear at the top of the list. Compared to the other methods,
PMI-IR tends to perform well on low-frequency terms, because it does not suffer from the same sparse data problems.

Table 2. Sample similarity scores produced by an approximation of PMI-IR (similarity to the word ship). The statistic listed is actually hits(X AND "ship") / hits(X).

  Word          Similarity    Word        Similarity
  ship          1.00          fluffy       .977
  shipboard      .999         wipers       .976
  cruise-ship    .997         frigates     .969
  anti-ship      .997         warship      .964
  halt           .993         muck         .963
  digs           .983         cunard       .953
  fables         .979         lifeboats    .947
The intuitive basis for assuming that similar words will tend to occur near each other is not as clear as the basis for the topicality assumption, but the good results of PMI-IR lend it some empirical credence. We will refer to it as the proximity assumption.

2.3 Grammatical parallelism
Finally, Dekang Lin's work (Lin 1998; Pantel and Lin 2002) could be said to be based on the parallelism assumption: synonyms ought to be found in similar grammatical frames. The primary statistics gathered by Lin's method are the frequencies with which words occur linked by specific grammatical relations with other words. Lin applies a parser to the training corpus to extract triples consisting of two words and the grammatical function by which they are linked, and then constructs an information-theoretic measure on the basis of these triples, which serves as a word similarity score. Since grammatical functions (such as subject-verb and verb-object) are the basic data of this method, these scores are based in large part on the selectional properties of verbs. Table 3 once again shows the most similar words to ship, this time using Dekang Lin's similarity scores. Whereas the LSA space identified words likely to occur in a discourse in which ships are discussed, Lin's method identifies words which are possible substitutes for the word ship.
Table 3. Sample similarity scores produced by Dekang Lin's information-theoretic method (similarity to the word ship)

  Word        Similarity    Word          Similarity
  ship        1.00          freighter      .20
  vessel       .32          plane          .20
  boat         .25          cargo ship     .18
  warship      .23          fishing boat   .17
  submarine    .22          barge          .17
  tanker       .20          helicopter     .17
  aircraft     .20          ferry          .16
3 LC-IR

Adding to this list of approaches, we present LC-IR (local context–information retrieval), a method for constructing word similarity scores which is inspired by PMI-IR, but which differs in its basic assumptions, and produces significantly better results. LC-IR, like PMI-IR, collects counts from the Web on how often words occur near one another, but it uses a smaller window size (requiring absolute adjacency).

As shown in (5–7), LC-IR starts from the same assumption made by PMI-IR, namely that a measure of lexical association (pointwise mutual information) ought to be a good predictor of synonymy. In Equation (8), however, we require absolute adjacency for words to be counted as occurring together.

(5)  Similarity_LC-IR(w1, w2) = PMI(w1, w2)
(6)    = P(w1 & w2) / (P(w1) × P(w2))
(7)    ∝ Count(w1 & w2) / (Count(w1) × Count(w2))
(8)    ≈ Hits(w1 NEXT-TO w2) / (Hits(w1) × Hits(w2))
(9)    = (Hits("w1 w2") + Hits("w2 w1")) / (Hits(w1) × Hits(w2))
At first glance, this would seem to be a minor modification to the basic PMI-IR model, and not one which influences its fundamental assumptions. However, we will show that the small window size is of paramount
importance to the model, and almost guarantees that LC-IR will identify synonyms conforming to the parallelism assumption, whereas standard PMI-IR is based on the more nebulous proximity assumption. One final modification is necessary in order to complete the description of the LC-IR lexical similarity statistic. As stated in (9), the similarity calculation is very sensitive to collocation effects. Because we sum the number of times w1 occurs immediately before w2 and the number of times w2 occurs immediately before w1, a high count of either bigram will suffice to produce a high similarity score for the word pair, even if the word bigram produced by reversing the order does not occur at all. This is particularly troublesome when comparing words which belong to different parts of speech, or are ambiguous as to part of speech. For example, if we wish to evaluate the similarity of the words private and practice, (9) indicates that we should start by summing the frequencies of the bigrams private practice and practice private. Of course, the former is much more frequent, because of the adjectival sense of private, and the fact that private practice is a common expression for an individually owned medical or legal office. Unfortunately, the frequency of this collocation could lead to the prediction that private is more similar to practice than to nouns such as lieutenant or corporal. To mitigate these collocational effects, we replace the sum in (9) with the min function, so that only the less frequent bigram is considered in our calculation:
(10)  Similarity_LC-IR(w1, w2) = min(Hits("w1 w2"), Hits("w2 w1")) / (Hits(w1) × Hits(w2))
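A toy sketch of the score in (10): the hit counts in the paper come from live web search queries, which cannot be reproduced here, so the counts below are faked from a tiny made-up corpus and the hit-count function is an illustrative stand-in.

    # LC-IR as in (10), with pretend "hit counts" from a toy corpus; adjacency
    # of "ship boat" and "boat ship" mimics list-like contexts.
    from collections import Counter

    corpus = ("the ship boat and barge were moored in the harbour . "
              "the boat ship and ferry sailed at dawn . "
              "a square and a rectangle are shapes").split()

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def hits(*words):
        """Stand-in for search-engine hit counts: unigram or adjacent-bigram frequency."""
        return unigrams[words[0]] if len(words) == 1 else bigrams[words]

    def lc_ir(w1, w2):
        # equation (10): min of the two bigram orders, normalized by unigram counts
        numerator = min(hits(w1, w2), hits(w2, w1))
        return numerator / (hits(w1) * hits(w2))

    print(lc_ir("ship", "boat"))    # nonzero: both orders occur adjacently in the toy corpus
    print(lc_ir("ship", "square"))  # zero: the words are never adjacent in either order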
Table 4 shows the words which are found to be most similar to ship using LC-IR. The list does not consist solely of synonyms, but it does include exclusively nautical terms which may occur either to the left or to the right of ship, and near-synonyms such as yacht are found near the top of the list.5

Table 5 compares the performance of LC-IR and a number of other semantic similarity measures on the task of correctly answering multiple-choice synonym test items. The three sets of test items are, first, the 80 questions from the Test of English as a Foreign Language introduced by Landauer and Dumais (1997); second, 50 ESL questions used by Turney (2001); and finally a set of 300 items culled from the Reader's Digest Word Power feature and first used in Jarmasz and Szpakowicz (2003). Each item consists of a stem word and four option words, and the test-taker's task is to identify which of the four is most nearly synonymous with the stem. Since there are four possible answers for each question, random guessing gives us a baseline performance of 25%
Table 4. Sample similarity scores produced by LC-IR (similarity to the word ship). The statistic listed is actually min(hits("X ship"), hits("ship X")) / hits(X).

  Word        Similarity    Word             Similarity
  foundering  .0140         orders           .0032
  wrecked     .0080         yacht            .0030
  sinking     .0046         cruise           .0028
  sailing     .0045         anti-submarine   .0028
  torpedoed   .0045         mayflower        .0028
  capsized    .0037         disembark        .0026
  moored      .0033         foundered        .0026
Table 5. Comparison of word similarity results across three synonym tests

                     TOEFL®   RDWP    ESL     Overall
  Baseline           25%      25%     25%     107.5/430 = 25%
  Random Indexing    67.5%    36.4%   39.2%   182.8/430 = 42.5%
  PMI-IR             80.0%    72.3%   66.0%   314.08/430 = 73.0%
  LC-IR              81.3%    74.8%   78.0%   328.33/430 = 76.4%
  Roget's Thesaurus  78.8%    74.3%   82.0%   327/430 = 76.0%
accuracy. Partial credit is given in case of a tie, in which a model assigns equal similarity scores to two or more options.

Landauer and Dumais (1997) report an accuracy of 64.4% on the 80-question TOEFL® test using Latent Semantic Analysis, but this is omitted from Table 5 because we do not have corresponding test results for the other data sets. It is also not possible to provide a fair comparison of Dekang Lin's similarity model, because many of the words used in the test sets were not included in his analysis. See Jarmasz and Szpakowicz (2003) for a partial evaluation of Lin's model. The results for Random Indexing, another vector-based semantic approach, slightly exceed those reported using LSA on the TOEFL® data set, the only one for which we have results for LSA. (The Random Indexing results in the table are for our own re-implementation of Sahlgren's method, yielding slightly lower performance than reported in his 2001 paper. We use our
results because Sahlgren provides a performance evaluation only on the basis of the 80-question TOEFL® test set.) Table 5 also shows that PMI-IR substantially outperforms Random Indexing, the representative of approaches based on the topicality assumption. (Again, the results reported for PMI-IR are based on our own implementation of Turney's (2001) procedure; this time, our results are slightly higher.) Finally, LC-IR shows an improvement over PMI-IR, which is significant at the .05 level. In fact, the performance of LC-IR in identifying synonyms, as measured by these test sets, is the highest yet recorded, exceeding even the results of systems using lexical resources such as WordNet (Resnik 1995; Hirst and St-Onge 1997; Leacock et al. 1998) and Roget's Thesaurus (Jarmasz and Szpakowicz 2003). For comparison, the final row of Table 5 provides the performance of Jarmasz and Szpakowicz's thesaurus-based system.
Two other systems deserve mention in this summary of performance results on synonym identification. First, Turney, Littman, Bigham, and Shnayder (2003) use a semi-supervised approach to developing an ensemble-based synonym identifier. Their system achieves over 80% accuracy on a test set very similar to the collections described here (and over 97% accuracy on the TOEFL® items). While this result is very encouraging for the engineering task of predicting synonymy, it is not directly comparable to the other systems which we have described, and is not really relevant to the question of what sort of information provides the clearest cue to synonymy in a language-acquisition scenario. First, this system requires supervised training in order to set the model parameters which govern the importance of each submodel in the ensemble. Second, this model is not purely a corpus-based statistical one; some of the submodules it employs use information from dictionaries and thesauri. Both of these conditions are at odds with the situation presented to language learners in the course of lexical acquisition. Typically, there is no supervisory signal which identifies words as synonymous or not, and of course dictionaries and thesauri are not used in the lexical acquisition scenarios we are interested in.
The other paper which reports a high accuracy on this task is Rapp (2003), which applies singular-value decomposition to a word-by-word matrix of local associations to produce an LSA-like vector space method of similarity calculation. This paper reports an accuracy of over 90% on the TOEFL® synonym test. Given the small size of this test set, though, it is not warranted to extrapolate from this result to the other test sets. This is especially true given the sharp degradation in performance which the similar Random Indexing model shows on the other two test sets (cf. Table 5).
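The scoring rule with partial credit for ties, described above, can be sketched as follows; the similarity function and the toy values are placeholders, not the actual system output:

```python
def score_item(stem, options, key, similarity):
    """Credit for one item: 1 for a unique correct choice, 1/k for a k-way tie
    that includes the key, 0 otherwise."""
    scores = [similarity(stem, opt) for opt in options]
    best = max(scores)
    tied = [opt for opt, s in zip(options, scores) if s == best]
    return (1.0 / len(tied)) if key in tied else 0.0

# invented similarity values, purely for illustration
toy_sims = {("ship", "yacht"): 0.003, ("ship", "lieutenant"): 0.0001,
            ("ship", "pocket"): 0.0, ("ship", "doughnut"): 0.0}
credit = score_item("ship", ["yacht", "lieutenant", "pocket", "doughnut"],
                    key="yacht",
                    similarity=lambda a, b: toy_sims.get((a, b), 0.0))
print(credit)  # 1.0; random guessing over four options averages 0.25
```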
3.1 LC-IR and parallelism
We ascribe the good performance of LC-IR across all of these test sets to two main factors. First, LC-IR benefits from the fact that it uses web search statistics, which addresses the problem of data sparsity afflicting other statistical approaches to word similarity. (This is an advantage which it shares with PMI-IR, of course.) Second, LC-IR differs from PMI-IR in that it is based on the parallelism assumption regarding the distribution of synonyms in a corpus, rather than the proximity assumption. As described above, the proximity assumption consists in the prediction that similar words ought to occur near one another; the parallelism assumption predicts that similar words will occur in grammatically parallel constructions.
Given that LC-IR and PMI-IR differ primarily in the size of window used to calculate word co-occurrence, it is not immediately clear why they should rely on fundamentally different assumptions about the distribution of synonyms. The key observation is that AltaVista's indexing format causes punctuation (such as commas) to be ignored in searches. This means that LC-IR tends to rate highly word pairs which often occur in lists of conjoined items (like "w1, w2, and w3") or other equative contexts. Consider, for example, some of the pages rated most highly in an AltaVista search for the word pair "assistance help":

1. Federation Sim Fleet - Assistance & Help
   Sim Fleet - A Star Trek sim group on AOL....
2. EPA: Business Gateway: Assistance, Help and Training
   Environmental Information. Environmental Assistance....
3. SEAL - Assistance/Help
   SEAL is a free 32-bit GUI for DOS....
In each of these results, the pair of words occurs either in a list of items treated in a parallel fashion, or in an implied conjunctive context. In essence, by setting the proximity threshold so low (requiring absolute adjacency between the target words), we are able to isolate word pairs which have a high degree of grammatical parallelism, because the equative uses which we isolate virtually guarantee parallel use of the terms. It stands to reason that a semantic similarity model with a clearer basis in grammatical parallelism should perform better than one which prizes word proximity first and foremost. The relationship between synonymy and similar grammatical behavior is also intuitively easier to grasp than a link based on mere proximity.
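A small illustration of this point (not AltaVista's actual indexing code): if punctuation is stripped before adjacency is counted, the pair assistance/help is counted as an adjacent bigram in each of the snippets above, even where an ampersand, comma, or slash intervenes:

```python
import re
from collections import Counter

def adjacent_pairs(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())  # punctuation is dropped
    return Counter(zip(tokens, tokens[1:]))

snippets = [
    "Federation Sim Fleet - Assistance & Help",
    "EPA: Business Gateway: Assistance, Help and Training",
    "SEAL - Assistance/Help",
]
pairs = sum((adjacent_pairs(s) for s in snippets), Counter())
print(pairs[("assistance", "help")])  # 3: the ampersand, comma, and slash are ignored
```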
4 Implications for a theory of lexical semantics and acquisition
Of course, there is much more to semantics than lexical semantics, and much more to lexical semantics than synonymy. We do not claim that our statistical approach to word similarity is the final word in the computational treatment of semantics, or that it explains the linguistic acquisition of meaning in its entirety. However, we believe that this approach is nevertheless relevant to a discussion about semantic acquisition, for the following reason. LC-IR, which focuses on identifying words with parallel grammatical behavior, produces good results in identifying synonyms, as shown in the previous section. This demonstrates that grammatical parallelism is a strong correlate of semantic similarity, which is a prerequisite for its application in word learning. In this section, we will present an argument that grammatical parallelism is also the best candidate for a cue used by language learners to identify words as semantically similar or synonymous. In two different lexical acquisition scenarios, we argue that constructional parallelism could provide a sufficient basis for the acquisition profile actually observed, and that alternative mechanisms based on topicality or proximity are not workable.
4.1 "One-shot" learning
First, we consider the problem of "one-shot" word learning, and argue that this phenomenon is more easily modeled as a special case of learning from parallel word usage than as any corresponding process involving topicality or simple proximity. One-shot word learning, also known as fast-mapping (Milostan 1995), is characterized by very rapid lexical acquisition, triggered by a single exposure to a word, or at most a very small number of exposures. This may include learning words through definitions, or through hearing prototypical instances of word usage which are sufficient to support lexical acquisition. While fast mapping is more typical of adult word learning, it is common to both adults and children. The idea that a word may be learned from a single prototypical instance was demonstrated by Nelson and Bonvillian (1978) for object names, and by Bates, Bretherton, and Snyder (1988) for action words. To control for the somewhat unnatural experimental setting of these studies, Rice and Woodsmall (1988) investigated children's acquisition of new vocabulary from television viewing, and again found that they were able to learn new words simply from hearing them used a small number of times.
Fast mapping may seem relatively uncontroversial, given its intuitive plausibility. However, this phenomenon has proven very challenging for computational approaches to lexical acquisition (cf. Milostan 1995). In particular, connectionist approaches such as Regier (1992), which depend on a long training procedure such as backpropagation of error, have difficulty accounting for such a rapid learning mechanism. A number of modifications to the basic error-driven connectionist learning schemes have been devised in attempts to permit this kind of immediate generalization (Hinton and Plaut 1987; Mikkulainen 1993; Yip and Sussman 1997). In fact, Latent Semantic Analysis is also a connectionist approach, broadly speaking, albeit not one which requires supervised training. Nevertheless, it does require a computationally expensive training procedure (singular-value decomposition) which makes one-shot word learning hard to model. This is equally true of other vector-based approaches motivated by the topicality assumption, such as Random Indexing; they also require time-consuming procedures for training or dimensionality reduction.
In fact, even if these topicality-based models could be accelerated by some method such as "fast weights" (Hinton and Plaut 1987), they could still not contribute to a realistic model of one-shot word learning, because the data which they use to describe a word is not well-suited to learning in such data-poor situations. The calculation of a word's representation in these models is determined only by the relative frequency with which it occurs near other words in the training corpus. When only a few instances of the target word are available, this kind of loose topical data cannot provide a specific enough semantic representation. (Consider a document containing five new words; LSA would assign all five words the same representation.) Of course, this observation holds equally for a semantic similarity metric based on the proximity assumption, such as PMI-IR. When the goal is to learn a word's meaning based on a single observation, the data will simply be too sparse to use such a method. (Given a new word, we cannot estimate an association statistic such as pointwise mutual information, because we have only observed it co-occurring with a handful of other words.)
The problem is not as acute, however, when we turn to models based on the parallelism assumption. The crucial point is that the primary datum for parallelism-based word similarity is a linguistic construction whose attestation is often definitive, whereas the requisite data for the other approaches is typically much too sparse for one-shot learning. Of course, the exact method used in LC-IR is not available in the case of one-shot word learning; language learners cannot consult the web to look for words which occur in parallel
contexts with the target word. However, the grammatical construction in which a word is used, including the other lexical heads with which it is associated, may well be sufficient data to identify the word's meaning at once. It is just this sort of parallelism of grammatical relations on which LC-IR is based.
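To make the idea concrete, here is a hypothetical sketch (not a model proposed in this paper) of how a single observation's grammatical contexts could be matched against an existing lexicon; the relation labels and the tiny lexicon are invented for illustration:

```python
known_contexts = {                      # invented toy lexicon of grammatical contexts
    "yacht":    {("object-of", "sail"), ("object-of", "moor"), ("subject-of", "capsize")},
    "truck":    {("object-of", "park"), ("object-of", "drive")},
    "doughnut": {("object-of", "eat")},
}

def one_shot_candidates(observed_contexts, lexicon):
    """Rank known words by how many grammatical contexts they share with the
    single observation of the new word."""
    overlap = {w: len(ctx & observed_contexts) for w, ctx in lexicon.items()}
    return sorted(overlap.items(), key=lambda kv: kv[1], reverse=True)

# one sentence containing the new word "sloop": "They moored the sloop."
print(one_shot_candidates({("object-of", "moor")}, known_contexts))
# [('yacht', 1), ('truck', 0), ('doughnut', 0)] -- a single parallel use suffices
```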
4.2 Vocabulary learning by children
Second, we consider the more general issue of children's gradual acquisition of vocabulary, addressed by Landauer and Dumais (1997). In this domain as well, we argue that parallelism is a better cue to word similarity than either topicality or proximity, in part due to the nature of the primary linguistic data with which the child is confronted. In addition to data sparsity issues, which are in play here as well as in adult word learning, approaches based on topicality or proximity would find it difficult to deal with language data which consists of relatively short utterances and can be lacking in topical coherence. To illustrate this fact, consider the discourse in Figure 3, an excerpt from the CHILDES database (MacWhinney 2000) of children's conversational interactions. The discourse rapidly shifts from a recap of a trip the child took recently, to a discussion of when one should and should not have chewing gum, to an exchange about the child's shirt, and finally to a request to play with a dog. See Spooren (2004) and references cited therein for an analysis of children's acquisition of discourse coherence relations. This is a far cry from the kind of strong topical coherence required of a collection used to train an LSA space. In fact, the best sort of text to use for training an LSA space for general use is an encyclopedia, precisely because each article is so sharply focused on a particular subject. In much of the language to which children are exposed, then, loose topical connections cannot provide a strong cue for word learning.
4.3 Reassessing LC-IR
In addition to drawing a fruitful assumption from computational treatments of semantics (the parallelism assumption), and evaluating its plausibility in a language acquisition context, we can use these language acquisition scenarios to re-evaluate the parallelism assumption as applied computationally. From this reconsideration, it becomes clear that, in fact, LC-IR is by no means an ideal implementation of the parallelism assumption (understood in its broadest and most useful form for language acquisition). Perhaps future research will succeed in producing a similarity metric which represents a truer reflection of the parallelism assumption, with superior performance.
*MOT: where did you meet Minoru ?
*CHI: where ?
*MOT: we came to your school the other day # didn't we ?
*MOT: didn't he take you somewhere ?
*CHI: where ?
*MOT: where did he take you ?
*CHI: where did he take me ?
*MOT: where did we go with him # remember ?
*MOT: we came to pick you up at school ?
*CHI: we go to a doughnut store .
*MOT: and what did you do there ?
*CHI: eat a doughnut .
*MOT: and what else ?
*CHI: we didn't have gum .
*MOT: no .
*MOT: why didn't we have gum ?
*CHI: cause +...
*CHI: that was dessert .
*MOT: and it was breakfast .
*CHI: yeah [= yes] .
*MOT: does one eat gum after breakfast ?
*CHI: no .
*MOT: let me see that .
*MOT: oh # that looks nice on you # Nina .
*MOT: do you like that shirt ?
*MOT: what's this ?
*CHI: a pocket .
*CHI: I wanna play +...
*CHI: this big honey doggy .
*MOT: he's cute .
*CHI: I wanna play with him .

Figure 3. Sample mother-child interaction from the CHILDES database
First, it is clearly the case that much of the parallelism which aids in word learning is non-verbal. Words used in similar situations, with similar objects being attended to, or with similar gestures, may be associated in the same way. LC-IR, as well as every other computational method of semantic association, operates on text alone, divorced from the extralinguistic situation to which it pertains.
Second, while one great advantage of parallelism as a cue for word learning is that it allows learning from sparse data ("one-shot" learning), LC-IR as it stands does not learn from sparse data. In fact, it leverages the largest text database currently available, the World-Wide Web. A question for future work is whether the parallelism assumption can be employed computationally in an on-line fashion, associating word meanings as words are encountered.
Finally, LC-IR makes relatively limited use even of grammatical parallelism. Only when words are found adjacent to one another in the source text can any information about their relationship be inferred. Clearly, there is much more information about the parallel use of words in a text than that provided by the contexts in which they are adjacent. For example, nouns which serve commonly as the object of the same verb are clearly being used in a parallel fashion. As technological advances in processing power become available, perhaps it will become feasible to apply an approach like Dekang Lin's to much larger texts, so that these other sorts of parallelism can be evaluated on a large scale as well.

5 Conclusion
This paper has presented a statistical model of word similarity, LC-IR, which is based on web search statistics regarding the frequency with which words appear adjacent to one another. This metric achieves a higher level of overall performance on three major synonym identification test sets than any other purely statistical system has reported. We have argued that this system's performance gains can in part be attributed to the fact that it is based on fundamentally different assumptions about the distribution of synonyms in a text than other models. In particular, this model assumes that words which are similar in meaning will occur in the same grammatical frames (the parallelism assumption). Other models are based on the idea that similar words ought to occur near the same set of other words (the topicality assumption), or that they ought to occur near those words which are most similar to them (the proximity assumption). The parallelism assumption has the best support among the three, both on the grounds of empirical results, and on the basis of theoretical considerations of language acquisition.
Acknowledgements
I would like to thank the conference organizers for providing an open forum for discussion, and my ETS colleagues for their helpful comments on an earlier draft of this work. R. H. Baayen and an anonymous reviewer also provided useful suggestions on this paper. Any opinions expressed here are those of the author, and not necessarily of Educational Testing Service.
Notes
1. The current version of the TOEFL® test does not include synonym items.
2. Terra and Clarke (2003) compare other statistical measures of word association on the synonym identification task, such as log-likelihood ratio and chi-square, but none of the other measures which they investigate performs better than pointwise mutual information.
3. At the time of this writing, AltaVista no longer supports the NEAR operator in its searches. While it is possible to approximate this sort of search using the Google search engine (using, for example, the Google API Proximity Search at http://staggernation.com/cgi-bin/gaps.cgi), this workaround is less than ideal. Unfortunately, corpus linguistics work which depends on web statistics is at the mercy of those few services with the resources to index such a large collection.
4. Instead of AltaVista's search engine, we used Microsoft's development search engine (http://beta.search.microsoft.com). Most search engines do not allow automated query submission, and while the Google engine does, it limits users to 1000 queries per day. By contrast, the construction of Table 2 required over 100,000 queries. Co-occurrence calculation for Table 2 used the AND operator instead of the NEAR operator, because the latter is not supported by Microsoft's search engine. Finally, we have removed a couple of inappropriate words from the list in Table 2. A bit of experimentation suggests that the frequency estimates provided by the Microsoft search engine are less reliable than those given by Google, or formerly by AltaVista. While some of the resulting noise has simply been passed through, and is reflected in the table, for some reason these bad frequency estimates seemed to boost the similarity scores for a certain set of common keywords for internet search which are unsuitable for inclusion in an academic paper.
5. As in Table 2, we used Microsoft's development search engine to calculate the values for this table. For some reason, the search engine seems to have greater problems in accurately estimating hit counts for AND searches than for searches with exact string matches. This had the result that the LC-IR results appear much less noisy than the corresponding estimated results for PMI-IR in Table 2.
References
Bates, E., I. Bretherton, and L. Snyder. 1988. From First Words to Grammar: Individual Differences and Dissociable Mechanisms. Cambridge University Press, Cambridge, NY.
Edmonds, Philip and Graeme Hirst. 2002. Near-synonymy and lexical choice. Computational Linguistics 28(2): 105–144.
Hinton, G. E. and D. C. Plaut. 1987. Using fast weights to deblur old memories. In Proceedings of the 9th Annual Conference of the Cognitive Science Society, pp. 177–186. Seattle, WA.
Hirst, Graeme and David St-Onge. 1997. Lexical chains as representation of context for the detection and correction of malapropisms. In C. Fellbaum (ed.), WordNet: An electronic lexical database and some of its applications. The MIT Press, Cambridge, MA.
Jarmasz, Mario and Stan Szpakowicz. 2003. Roget's thesaurus and semantic similarity. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pp. 212–219. Borovets, Bulgaria.
Kanerva, P., J. Kristoferson, and A. Holst. 2000. Random indexing of text samples for latent semantic analysis. In L. R. Gleitman and A. K. Joshi (eds.), Proceedings of the 22nd Annual Conference of the Cognitive Science Society.
Landauer, Thomas K. and Susan T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104: 211–240.
Leacock, Claudia, Martin Chodorow, and George Miller. 1998. Using corpus statistics and WordNet relations for sense identification. Computational Linguistics 24(1): 147–165.
Lin, Dekang. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pp. 296–304. Morgan Kaufmann, San Francisco, CA.
Lund, Kevin and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavioral Research Methods, Instruments and Computers 28(2): 203–208.
MacWhinney, Brian. 2000. The CHILDES project: Tools for analyzing talk. Third edition. Lawrence Erlbaum Associates, Mahwah, NJ.
Mikkulainen, Risto. 1993. Subsymbolic Natural Language Processing: An Integrated Model of Scripts, Lexicon, and Memory. MIT Press.
Milostan, Jeanne. 1995. Connectionist Modeling of the Fast Mapping Phenomenon. UCSD ms.
Nelson, K. E. and J. D. Bonvillian. 1978. Early semantic development: Conceptual growth and related processes between 2 and 4 1/2 years of age. In K. E. Nelson (ed.), Children's Language, volume 1, pp. 467–556. Gardner Press, New York.
Pantel, Patrick and Dekang Lin. 2002. Document clustering with committees. In Proceedings of SIGIR-02. Tampere, Finland.
Rapp, Reinhard. 2003. Word sense discovery based on sense descriptor dissimilarity. In Proceedings of Machine Translation Summit IX.
Regier, Terry. 1992. The Acquisition of Lexical Semantics for Spatial Terms: A Connectionist Model of Perceptual Categorization. Ph.D. thesis, University of California, Berkeley.
Resnik, Philip. 1995. Using information content to evaluate semantic similarity in a taxonomy. In IJCAI, pp. 448–453.
Rice, M. L. and L. Woodsmall. 1988. Lessons from television: Children's word learning when viewing. Child Development 59: 420–429.
Sahlgren, Magnus. 2001. Vector based semantic analysis: Representing word meanings based on random labels. In Proceedings of the ESSLLI 2001 Workshop on Semantic Knowledge Acquisition and Categorisation. Helsinki, Finland.
Schütze, Hinrich. 1992. Dimensions of meaning. In Proceedings of Supercomputing '92, Minneapolis, pp. 787–796.
Spooren, W. 2004. On the use of discourse data in language use research. In H. Aertsen, M. Hannay, and R. Lyall (eds.), Words in their places: A Festschrift for J. Lachlan Mackenzie, pp. 381–393. Faculty of Arts, VU, Amsterdam.
Terra, Egidio L. and Charles L. A. Clarke. 2003. Frequency estimates for statistical word similarity measures. In Marti Hearst and Mari Ostendorf (eds.), HLT-NAACL 2003: Main Proceedings, pp. 244–251. Association for Computational Linguistics, Edmonton, Alberta, Canada.
Turney, Peter D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning, pp. 491–502.
Turney, Peter D., Michael Littman, Jeffrey Bigham, and Victor Shnayder. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pp. 482–489. Borovets, Bulgaria.
Yip, Kenneth and Gerald Jay Sussman. 1997. Sparse representations for fast, one-shot learning. In AAAI/IAAI 1997, pp. 521–527.
Language Production Errors as Evidence for Language Production Processes – The Frankfurt Corpora Annette Hohenberger and Eva-Maria Waleschkowski
1 The history of slip research: caveats, corpora, and crossings
Since the very beginnings of modern psycholinguistic research (Flores d'Arcais and Levelt 1970), language production errors, mostly speech errors, have played a major role as a rich source of evidence for processes and models of language production. Numerous researchers have appreciated the epistemological value of production errors as a "window to the mind" (Fromkin 1973) and taken regularities found in the error patterns as evidence for the normal functioning of the language processor (Berg 2003). 1 Caveats against the usefulness of speech errors have been raised on two grounds. First, there is a reasonable concern about the monolithic use of a single data class in the face of methodical and methodological advances in psycholinguistics which have led to a plurality of research methods, most notably due to the rise of the reaction-time method and the word-picture interference paradigm (Meyer 1992, Pechmann 2003), as well as neurolinguistic methods such as fMRI, PET, and recently, ERP. Second, there is an equally motivated concern for the satisfaction of standards of psychological test theory, i.e., objectivity, reliability, and validity (Cutler 1982, Ferber 1995). In the light of these caveats, speech errors should not be considered privileged data per se, but also not dismissed a priori. As we will show below, in the course of the history of slip research, a plurality of empirical methods and data collections, along with a rich cross-linguistic and cross-modal perspective, has developed within the field which can avoid the above-mentioned methodological pitfalls. We owe the first speech-error corpus to Meringer (1908, Meringer and Mayer 1895). Many of the corpora to follow were of the same kind, so-called pen-and-paper corpora (see Poulisse 1999: 6, for an overview). They comprise a huge number of items, often many thousands, collected by multiple contributors, from multiple subjects, in spontaneous speech situations, over many years. The strength of these corpora lies in the naturalness of the sampling situation and therefore in the validity of the inferences
drawn from them, their huge item numbers, and the ease of collection and subsequent storage and analysis. The weakness of these corpora lies in various observer biases, among them attentional biases, slips of the ear, perceptual confusion and biases, and detectability (Cutler 1982). One alternative is to analyze tape-recorded spontaneous speech and filter out all slips. In fact, a couple of such audio-taped corpora have been accumulated (see Poulisse 1999: 6). Interestingly, no major discrepancies between them and the pen-and-paper corpora were found, which suggests that the majority of the caveats against the latter were unwarranted (but see our findings on the rate of exchanges below). Another alternative, however, earned more attention, namely experimentally induced slips. The most famous one among 'a dozen competing-plans techniques for inducing predictable slips in speech' (Baars 1992) is the SLIP-technique (Spoonerisms of Laboratory-Induced Predispositions) by Motley and colleagues (Baars, Motley, and MacKay 1975). 2 With these techniques the notorious problem of language production research – the lack of control over the dependent variable – can be attenuated to a certain extent, though not completely overcome. 3 The weakness of this technique lies in its limited range of application. Only a subset of speech errors (syntagmatic ones, i.e., exchanges, anticipations, and perseverations) can be induced, and only a subset of units of processing can be manipulated, most prominently phonemes, and to a lesser extent also morphemes and words. Experimental techniques are, however, very well suited for testing a specific hypothesis and focusing only on a sub-sample of slips. Obviously, there are strengths and weaknesses in each of the sketched methods. Optimally, one should choose a research tool which maximizes the advantages and minimizes the disadvantages. Contrary to what Meyer (1992) claims, slip research is not yet outdated. If the scope of the various techniques is taken into account properly and if they are combined as tools to answer novel research questions, this can open up new vistas for psycholinguistic research.
2 The Frankfurt DFG-project on slips of the hand and tongue
In her research program on the impact of modality on language production, 4 Helen Leuninger succeeded in such a broadening of slip research. She introduced the following three novelties:
(i) inter-modal comparison of slips of the tongue and hand,
(ii) creation of two extensive and 'objective' slip corpora, and
(iii) combination of the corpus technique with an experimental slip technique.
The leading research question in our project was which aspects of language production and monitoring are modality-dependent and which are not. Our assumption was that the language processor is amodal but the content is modality-dependent; the same holds for monitoring. 5 Theories and models of language production have almost exclusively relied on evidence from spoken language. Vitally important for any such model, however, is not so much cross-linguistic research within the same, i.e., oral, modality, but the investigation of languages which are produced in a different modality, i.e., sign languages (Hohenberger, Happ, and Leuninger 2002, Keller, Hohenberger, and Leuninger 2003, Leuninger, Hohenberger, Waleschkowski, Happ, and Menges 2004). Only then can universalist claims of language production models such as Garrett's (1975, 1980, 1988) and, more recently, Levelt's model (Levelt 1989, 1999, Levelt, Roelofs, and Meyer 1999) 6 really be substantiated. The two languages to be compared in our project are German Sign Language (Deutsche Gebärdensprache, DGS) and Spoken German.
2.1 Inter-modal comparison of signed and spoken language
Whereas the history of speech error research dates back more than a hundred years (see section 1), research into slips of the hand (and sign language psycholinguistics in general) is still in its infancy. The first corpus of slips of the hand in American Sign Language (ASL) was presented in Klima and Bellugi (1979) and Newkirk, Klima, Pedersen, and Bellugi (1980). They found evidence for the known slip categories and affected units, which was taken as further evidence for sign language being a fully-fledged natural language system. In particular, the prevalence of sub-lexical errors involving one of the three main phonological features of a sign – handshape, movement, and place of articulation – was clear evidence against a holistic conception of signs as mere pantomimes without any internal linguistic structure. Thanks to the advances in sign language linguistics (since the seminal work of Stokoe 1960) and psycholinguistics (Emmorey 2002), no one seriously doubts the status of sign languages as natural languages with a full-blown phonology, morphology, and syntax. Our research enterprise took the generally established universality assumption as its point of departure and asked what impact the different modality has on language production processes. Table 1 summarizes the main differences between spoken and signed languages with respect to processing:
Table 1. Processing characteristics of sign and spoken languages

                              German Sign Language (DGS)       Spoken German
Modality                      visual-gestural                  aural-oral
Articulators                  manual, non-manual               vocal tract
Dimension of processing       vertical                         horizontal
Seriality                     low                              high
Simultaneity                  high                             low
Morphological type            simultaneous/fusional            serial/concatenative
Production characteristics    high information load on         low information load on
                              few big chunks                   many small chunks
Sign languages, according to Brentari (2002), are characterized by vertical processing, i.e., linguistic information is predominantly organized in a fusional/simultaneous fashion, with phonological, morphological, and syntactic information being distributed over various manual and non-manual articulators, including the body and the face. Signs, however, have a low production speed: the ratio of spoken words to signs is 2:1 (Klima and Bellugi 1979, Gee and Goodhart 1988). This is because the articulators in sign language are gross-motor as compared to the fine-motor articulators of spoken language. On the propositional level, however, the production ratio is equal again, because much more information can be transmitted simultaneously in sign language. Thus, in sign language, few big chunks carry a lot of information. This specific information packaging strategy is a result of a dynamic adaptation to the processing requirements of signed languages at the Phonetic Form (PF) interface. Spoken languages, on the other hand, are characterized by horizontal processing, i.e., linguistic information is predominantly organized in a linear fashion, with many small chunks carrying little information. For spoken language, the information packaging strategy is thus different. However, both languages are equally able to satisfy the requirements on any efficient language processor, namely to process language in real time (Slobin 1977, Gee and Goodhart 1988, Hohenberger et al. 2002, Leuninger et al. 2004). Typologically, sign languages form a class unto themselves with respect to canonical word shape: signs are typically mono-syllabic and poly-morphemic. They all seem to fall into this typological class, whereas spoken languages seem to be freer to choose which of the possible combinations of syllable status (mono- or poly-syllabic) and morpheme status (mono- or
poly-morphemic) they instantiate. Obviously, the processing constraints on all sign languages are uniformly high, making this solution the only really feasible one.
2.2 The two corpora
As pointed out above, we used a combination of methods: the corpus method, which will be outlined here, and the experimental method, which will be outlined in section 3. As for the corpus method, we used an elicitation technique for spontaneous speech and sign, respectively. Hearing and deaf subjects were asked to tell 14 picture-stories under various cognitive stress conditions (such as time-pressure, unordered pictures, and cumulative repetition of pictures), in order to raise the number of slips. 7

Slips of the Hand: Data Base, FILE-MAKER
Name: EM   Clip-Nr.: 40   Nr.: 44   Condition: rep, l   Story: 14
Message error: no   Appropriateness: no   Sentence plan: no   Slip: yes
Movie:
Text & Translation: XXX PARENTS HAVE-NO-IDEA (XXX ← HIS)
'His parents have no idea'
Type of slip: Anticipation   Affected unit: Phonology (handshape)   Domain: DP
Repair: no   Locus of repair: —   Editing: no   Conduite: no
Comments: Handshape anticipation.

Figure 1. Exemplary entry of the slip data base (DGS)
16 signers and 23 speakers were audio- and/or video-taped throughout sessions of 30-45 minutes. Speech and sign 8 were transcribed, and the audio and video tapes were scrutinized for slips. As for slips of the hand, two deaf colleagues, Daniela Happ and Elke Menges, were mainly in charge of the classification. Slip (and repair) sequences were digitized as audio and
video clips and fed into two extensive data bases. The slips were categorized according to various psycholinguistic parameters, as illustrated in Figure 1. Figure 1 mimics one page of the slips of the hand data base, which, in the original, is in FILE-MAKER format. In this exemplary slip of the hand, the signer intends to sign SEINE ELTERN 9 ('his parents') but commits a phonological anticipation: she anticipates the (Y-)handshape of the sign for ELTERN ('parents') on the possessive pronoun SEINE ('his'), leading to the slip XXX. SEINE ('his') is normally signed with the flat hand.
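The structure of such an entry can be sketched as a simple record type; the field names follow Figure 1, while the concrete representation below is merely illustrative and not the project's FILE-MAKER schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SlipRecord:
    name: str                  # contributor code, e.g. "EM"
    clip_nr: int               # video clip number
    nr: int                    # running number of the slip
    condition: str             # elicitation condition code, e.g. "rep"
    story: int                 # picture story number
    slip: bool                 # genuine slip (vs. message error etc.)?
    gloss: str                 # glossed text of the utterance
    translation: str
    slip_type: str             # e.g. "anticipation"
    affected_unit: str         # e.g. "phonology (handshape)"
    domain: str                # syntactic domain of the error, e.g. "DP"
    repair: bool = False
    locus_of_repair: Optional[str] = None
    comments: str = ""

entry = SlipRecord("EM", 40, 44, "rep", 14, True,
                   "XXX PARENTS HAVE-NO-IDEA", "His parents have no idea",
                   "anticipation", "phonology (handshape)", "DP",
                   comments="Handshape anticipation.")
print(entry.slip_type, entry.affected_unit)
```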
2.3 Results of the corpus study
We obtained n=640 slips of the hand from 16 signers and n=944 slips of the tongue from 23 speakers. Table 2 shows the distribution of slip types and affected units for both languages, Spoken German and DGS (in italics). Evidence for modality-independence of processing is provided by the fact that (i) all slip categories (anticipation, perseveration, substitution, blend, exchange, fusion, etc.) and (ii) all units of processing (phonological features/segments, morphemes, words, phrases) known from speech error research are also instantiated in DGS. Evidence for modality-dependence of processing is provided by the fact that the distribution of errors across (i) slip categories and (ii) affected units differs between the spoken and the signed language. The latter difference in particular shows that the information packaging in the two languages differs, which can readily be explained by the contrasting processing characteristics of both languages discussed above (see section 2.1). In DGS, the word is a more prominent unit of processing than in spoken language (50% vs. 34%). As pointed out above, a single sign can carry information which, in a spoken language, would be distributed over various words (Leuninger et al. 2004). Therefore, the syntagmatic axis, or horizontal processing, is more prominent in spoken than in sign language. Phrasal errors, which are quite numerous in Spoken German (16%), therefore rarely occur in DGS (1%). Likewise, the non-concatenative morphology makes it very hard for signed morphemes to be detached and manipulated separately by the processor. Morpheme errors, therefore, are less frequent in DGS than in Spoken German (6% vs. 18%). The difference in the occurrence of slip types is not as pronounced. Again, this is in accordance with our hypothesis, as slip types reflect language production processes and levels of processing which are expected to be the same for both languages.
Table 2. Slip types and affected units in Spoken German and DGS (in italics)
The two broad slip classes – paradigmatic errors of selection and syntagmatic errors of linearization (Bierwisch 1970) – are equally well attested in both languages. The only remarkable difference we found concerns 'fusions', a rare category in spoken language (<1%) which, however, has a medium rate of incidence in DGS (8%). Fusions occur when – at a late, surface-near stage of processing – two words are fused into a single word frame, as in "Gib mir mal den Stuhl, Ulrich" → "Gib mir mal den Stuhlrich" ('Give me the chair, Ulrich' → 'Give me the chulrich'). Sign languages, with their fusional and non-concatenative morphology, tend to fuse not only in on-line processing but also in regular and productive morphological processes such as compounding. In compounds, two independent signs are fused into the frame of a single sign word (Happ and Hohenberger 2000, Leuninger et al. (in press)) – obeying the constraints on the size of the phonological word in sign language (Brentari 1998, 2002).
There is one result, however, which was contrary to our expectations: the rate of exchanges in general and that of morphological exchanges in particular. Exchanges are a major slip category with a rate of occurrence of about 20% in many pen-and-paper slip corpora (see section 1). In our corpora, however, we rarely found any (around 1% in both corpora). Given the abundance of anticipations and perseverations in both corpora, there must be other explanations for their almost complete absence in our data (see Hohenberger et al. 2002, Leuninger et al. 2004). There are two possible explanations. First, our more objective sampling method cancelled out observers' tendency to report salient slips such as exchanges more frequently. In fact, our findings are more consistent with Poulisse's (1999) claim that exchanges are quite rare indeed. Second, the rate of exchanges is confounded with monitoring. If monitoring is fast, the exchange will only surface as an anticipation; the perseveratory part will not be visible because the monitor will have had enough time to catch and repair it. This is a very likely explanation for sign language, where the signing rate is slow but the monitor is as fast as in spoken language (Hohenberger and Keller 2002), but not a good explanation for spoken language with its high speech rate. Thus, there was a gap of unknown origin in our data which called for further inquiry and a new methodology.
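For readers who want to quantify the distributional differences discussed in this section, the following illustrative check (not part of the original study) runs a chi-square test on counts approximated from the percentages and corpus totals reported in the text; it assumes scipy is available:

```python
from scipy.stats import chi2_contingency

totals = {"Spoken German": 944, "DGS": 640}
shares = {                     # word / morpheme / phrase shares as reported in the text
    "Spoken German": [0.34, 0.18, 0.16],
    "DGS":           [0.50, 0.06, 0.01],
}
table = []
for language, props in shares.items():
    counts = [round(p * totals[language]) for p in props]
    counts.append(totals[language] - sum(counts))   # remainder (mostly sub-lexical errors)
    table.append(counts)

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2g}")
# the tiny p-value is consistent with the modality-dependent distribution of
# affected units discussed in this section
```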
3 The slip experiment
Given the fact that the corpus data did not yield evidence for exchange errors, we decided to supplement the corpus study with an additional experimental study for the purpose of eliciting exchanges. We chose the
'repeat-reverse paradigm' (Baars 1992, Humphreys 2003) rather than the SLIP-technique, as the latter elicits exchanges only indirectly, whereas the former elicits them directly. We focused only on morphological exchanges, as morphology is the domain in which Spoken German and DGS (and sign languages in general) differ most clearly typologically, in terms of concatenativity (see section 2). A morphological exchange is also called a 'stranding error' because typically the roots of two poly-morphemic words exchange whereas the bound inflectional/derivational morpheme(s) stay in situ – they 'strand' – as in 'I thought the park was truck-ed ← the truck was park-ed' (Garrett 1980: 188). As for DGS, we were especially interested in whether the distribution of morphemes over manual and non-manual articulators as well as the fusional character of sign language morphology had an impact on the detachability of morphemes, which is crucial for errors to happen. As for Spoken German, we expected detachability to be high for concatenative morphemes but lower for non-concatenative morphemes as in Ablaut and Umlaut. Our hypotheses are as follows:
H1 (only for DGS): Morphemes distributed over different articulators (manual vs. non-manual) are easier to detach and should produce more root exchanges.
H2: Concatenative morphemes in Spoken German and DGS should detach more easily than non-concatenative ones. Since sign language morphology is less concatenative, we expect fewer morphological exchanges in DGS.
H3: Morphological information can be processed on different levels. On the first level, abstract morphological information is processed; on the second level, the morpho-phonological form is processed. Non-concatenative morphemes can only be manipulated on the former level, concatenative ones on both.
3.1 The slip experiment for Spoken German
3.1.1 Method
A total of n=26 subjects participated in the study. Subjects were required to exchange poly-morphemic words. First, they had to learn by heart and memorize two short phrases containing a mono- and a poly-morphemic target word, e.g. das Auto[SG] (‘the car[SG]’) and die Lok-s[PL] (‘the locomotive-s[PL]’). This pair was followed by a ‘priming list’ of two pairs of phrases priming root exchanges and two filler items, interspersed randomly between pairs of phrases. Root exchanges were primed by always keeping
the order of the root-affix combination the same as in the target pair (e.g., sg-pl for the above example), so that subjects would establish firm morphological frames in which only the roots should be mobile, as in 'stranding errors'. After the priming list, subjects were requested to reverse or to repeat the critical poly-morphemic word from the target pair. 10 The order of the reverse/repeat condition was randomized. The experiment was organized as a semi-self-paced procedure. The stimuli were presented on a SONY VAIO PCG-SR1K notebook as audio sequences. The subjects were allowed to listen to the target pairs as often as necessary to remember them later. The priming list had to be shadowed silently. It was directly followed by the request to reverse/repeat the critical items. On average, the duration of the experiments amounted to 60 minutes, during which subjects were audio-taped. We were primarily interested in and primed for root exchanges, the classical cases of morpheme exchanges. There are, however, several other possible outputs such as whole word exchanges (das Auto[SG] → die Lok-s[PL]) and affix exchanges (das Auto[SG] → die Auto-s[PL]). The experimental material comprises different types of morphological processes, namely inflection (e.g. tense, number), derivation (e.g. diminutive, nominalization, aspect), and compounds. Concerning the different susceptibility of the various morphological processes to decomposition, we had no specific hypotheses.
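The structure of a single trial can be sketched as follows; only the target pair is taken from the text, while the priming and filler phrases, and the concrete realization of the procedure, are invented for illustration:

```python
import random

def build_trial(target_pair, priming_pairs, fillers, rng=random):
    items = list(priming_pairs)
    for filler in fillers:                       # fillers interspersed randomly
        items.insert(rng.randrange(len(items) + 1), filler)
    cue = rng.choice(["repeat", "reverse"])      # condition order randomized
    return {"memorize": target_pair, "shadow": items, "cue": cue}

trial = build_trial(
    target_pair=("das Auto", "die Lok-s"),                  # sg - pl, from the text
    priming_pairs=[("der Baum", "die Blume-n"),             # sg - pl (invented)
                   ("das Kind", "die Katze-n")],            # sg - pl (invented)
    fillers=["ein roter Ball", "sie lacht laut"],           # invented fillers
)
print(trial)
```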
3.1.2 Results for Spoken German
The results, as summarized in Table 3, show that the reverse/repeat paradigm is an appropriate method to elicit morphological errors in Spoken German. In the reverse condition, most of the errors are word exchanges (67.2%). As expected, the reverse condition elicits more root exchanges than affix exchanges. 11 In all, we obtained 119 root exchanges (10%) and 31 affix exchanges (2.6%). In 113 cases the condition was not met. In the repeat condition, most cases are word repetitions – as expected. Interestingly, this condition also caused exchanges, in particular more affix exchanges than root exchanges. We obtained 32 root exchanges and more than twice as many affix exchanges (n=76). Adding up the root and affix exchanges within each condition, exchanges amount to about 9% of responses in the repeat condition and about 13% in the reverse condition.
Table 3. Distribution of error types under the repeat/reverse condition for Spoken German

                   Reverse        Repeat
Type               n      %       n      %
word repetition    113    9.4     981    82
word exchange      804    67.2    26     2.2
root exchange      119    10      32     2.7
affix exchange     31     2.6     76     6.4
other              87     7.3     66     5.5
omission           42     3.5     15     1.2
Sum                1196   100     1196   100
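The combined exchange rates quoted above follow directly from the counts in Table 3, as this small check shows:

```python
table3 = {
    "reverse": {"root exchange": 119, "affix exchange": 31, "total": 1196},
    "repeat":  {"root exchange": 32,  "affix exchange": 76, "total": 1196},
}
for condition, row in table3.items():
    exchanges = row["root exchange"] + row["affix exchange"]
    print(f"{condition}: {exchanges}/{row['total']} = {exchanges / row['total']:.1%}")
# reverse: 150/1196 = 12.5% (about 13%, as reported)
# repeat:  108/1196 = 9.0%
```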
As predicted, most cases show root exchanges in serial morphemes. Here, the diminutive "Bäum-chen/Blüm-lein" ('tree-DIM/flower-DIM') is affected most frequently (19 cases), followed by the nominal derivation (12 cases) and the regular -s plural inflection (11 cases). Four serial morpheme types, comprising semi-regular plural inflection, adjective derivation, nominal composition, and regular participle inflection, occur eight times each. The non-concatenative target pairs such as the irregular tense inflection and the irregular plural inflection can also be separated, though less frequently than the concatenative morphemes. Note that there is only a small difference in the number of exchanges between the concatenative regular tense inflection (n=6, "er lach-t – sie brüll-te", 'he laughs – she cried') and the non-concatenative irregular tense inflection (n=5, "er läuf-t – sie sprang", 'he walks – she jumped').
3.1.3 Evaluation of hypotheses for Spoken German
The main goal of this part of the study was to investigate the decomposability of concatenative and non-concatenative poly-morphemic word forms in Spoken German. In German, many of the irregular forms are non-concatenative, whereas regular forms tend to be concatenative. As hypothesized, non-concatenative irregular forms detach less frequently than concatenative regular forms. Hence, our second hypothesis can be confirmed. The fact that irregular forms can be manipulated, too,
confirms our third hypothesis, namely that abstract morphological vs. concrete morpho-phonological information is processed on different levels. Otherwise an exchange such as "Vater-SG – Mütter-PL → Mutter-SG – Väter-PL" ('father-SG – mothers-PL → mother-SG – fathers-PL') could not have taken place. Tense and plural inflection show a striking pattern in terms of decomposability. Surprisingly, there was no significant difference between regular (concatenative) and irregular (non-concatenative) tense inflection (n=6 vs. 5). Further research on regular vs. irregular tense inflection is needed in order to separate the relevant factors here. However, the results obtained from regular and irregular plural inflection provide clear evidence for their different status. The regular concatenative plural inflection -s is affected twice as often as the irregular non-concatenative plural inflection (n=11 vs. 5) – in accord with our hypothesis 2. In German, the regular -s plural is considered the default plural form (Marcus, Brinkmann, Clahsen, Wiese, and Pinker 1995). In processing, it undergoes decomposition according to the dual-route model (Pinker and Prince 1994, among many others). As a consequence, this plural form is expected to be affected more frequently than irregular plural forms, which have separate lexical entries and do not undergo decomposition. A crucial point in our findings is that irregular forms can indeed be separated. One of the advantages of the reverse-repeat paradigm is that it addresses decomposition directly, whereas different reaction times obtained in priming experiments (e.g., Sonnenstuhl et al. 1999) bear only indirectly on the issue of decomposability of regular and irregular forms. There are, however, also other explanations apart from concatenativity, having to do with the similarity in meaning of the roots and affixes, with the imageability of the items, and with differences in responding to the two conditions – repeat and reverse. For a detailed discussion of these aspects, see Waleschkowski (2004).
3.2 The slip experiment for German Sign Language
Following the same logic as in the experiment described for Spoken German, morphological stimulus material was generated for DGS. In DGS, too, there are serial as well as non-serial morphemes (see Leuninger et al. 2005 for an extensive overview). The aspect morpheme 'habitually' or 'repeatedly', for example, is expressed by reduplication of the sign or the root, respectively. The 'distributive' agreement morpheme, as in GEBEN-JEDEM ('to give each of them'), is also expressed by repeating the sign (between the spatially localized source and goal argument(s) of the
predicate). Plural, as overtly marked in some signs, is conveyed by reduplication. The bulk of sign language morphology, however, is non-concatenative, i.e. simultaneous. Morphemes are expressed either by phonological alternations in handshape (as in classifier morphemes) or by movement. Moreover, morphemes can be distributed over manual and non-manual articulators, to which our first hypothesis relates. Thus, some signs are obligatorily accompanied by facial expressions, e.g. furrowed eyebrows, as in (i):
(i)  facial expression:   search
     manual:              SUCHEN ('to search'),
ÜBERLEGEN ('to ponder'), and many others. In the DGS slip corpus, there were a few examples of this type, involving facial expressions, mouth gestures, and body leans (see Leuninger et al. 2004, 2005). In order to find out to what extent these different morpheme types can be manipulated by the processor, we included +/- bound morphemes, +/- serial morphemes, and +/- distributed morphemes in the stimulus material. Specifically, we included aspect (iterative, habituative aspect), agreement (to VERB each-of-them vs. all-of-them), plural, negation (either non-manual negation, i.e., head-shake, or manual negation, i.e., α-negation, see below), and adverbial modification (to VERB slowly; facial expression).
3.2.1 Method
A total of n=16 deaf signers participated in the study. Essentially, the same method and procedure was used as described for Spoken German (see 3.1.1). Video clips of the target phrase pairs and the priming list were presented on the laptop. The deaf signers were told to 'shadow' these phrases, either covertly or overtly, as in 'sign whispering'. The signers were videotaped throughout the experiment. An example of a target pair is given below (see Fig. 2):
Target pair: verb, affirmative vs. negative, non-concatenative morpheme (α-negation) 12
HELFEN BRAUCH-AFF 13
help need-AFF
'You have to help.'

ARBEIT MUSS-αNEG
work must-αNEG
'You don't have to work.'

Fig. 2a: MUSS-αNEG 'need-not'
Fig. 2b: BRAUCH 'need'

3.2.2 Results for German Sign Language
As in the Spoken German data, root as well as affix exchanges were obtained under both the repeat and the reverse condition. The error distribution is given in Table 4:

Table 4. Distribution of error types under the repeat/reverse condition for German Sign Language

                   Reverse        Repeat
Type               n      %       n      %
word repetition    31     13.8    185    82.6
word exchange      166    74.1    16     7.1
root exchange      5      2.2     11     4.9
affix exchange     7      3.1     3      1.3
other              14     6.25    9      4
omission           1      0.45    0      0
Sum                224    100     224    100
As can be seen from Table 4, there are considerably fewer morpheme exchanges in DGS (2.2 - 4.9% root exchanges; 1.3 - 3.1% affix exchanges) as
compared to Spoken German (2.7 - 10% root exchanges; 2.6 - 6.4% affix exchanges). Roughly, there are twice as many of them in Spoken German as in DGS. This result mirrors the result we had obtained in the corpus study (see section 2.3).

Table 5. Ranking of morpheme types in DGS

morpheme type / process      +/- serial           root exchanges   affix exchanges
α-negation, derivation       *serial, bound       12               6
Agreement, inflection        +/- serial, bound    3                1
Aspect, inflection           serial, bound        1                2
All others                   mixed                0                0
How is the set of morphological exchanges in DGS composed with respect to concatenative and non-concatenative morphemes? Table 5 shows the distribution of errors with respect to morpheme types. The α-negation was most error-prone. Figure 3 depicts a complete affix exchange for the input pair ARBEIT MUSS-αNEG ('work must-αNEG') / HELFEN BRAUCH-AFF ('help need-AFF') from section 3.2.1:

ARBEIT MUSS-AFF
work must-AFF
'You have to work.'

HELFEN BRAUCH-αNEG
help need-αNEG
'You don't have to help.'

Fig. 3a. MUSS-AFF 'must'
Fig. 3b. BRAUCH-αNEG 'need-not'
Although initially characterized as a non-concatenative morpheme expressed by a movement alternation (the straight movement of the affirmative morpheme becomes an α-movement), the high number of exchanges casts doubt on this analysis. A close inspection of the signers' productions
showed that both morphemes, the root and the affix, might actually be processed serially, with the affix sometimes being set off temporally in very subtle ways (hence the asterisk on *serial in Table 5). 14 Also, as pointed out for the Spoken German data (see section 3.1.3), other considerations such as similarity in meaning might play a role in the easy permutability of the roots MUSS- ('must') and BRAUCH- ('need') in this particular target pair. The other morphemes, which figured less prominently in the error array, were also mostly serial morphemes (the aspectual inflection and the agreement inflection), thus confirming our second hypothesis. Interestingly, no decomposition of the non-manual morphemes referred to by hypothesis 1 occurred. Thus, the facial expression accompanying SUCHEN ('to search') never stranded. Contrary to our predictions, which were derived from a couple of such errors in the corpus study, it seems that vertically-aligned non-manual morphemes are indeed closely connected to their manual roots. Thus, hypothesis 1 has to be rejected.
4 Conclusions
We would like to draw the following conclusions from the research reported here. First, with respect to the content of our research, we have shown that language processing is basically modality-independent. The prevalence of the same slip categories and affected units provides evidence that producing speech and sign proceeds through the same planning stages and involves the same computational vocabulary. Modality differences are witnessed especially by the different distribution of affected units. They are related to the alternative information packaging strategies due to the different constraints which the two modalities impose on the level of Phonetic Form (PF). Second, with respect to methodological issues, naturalistic and experimental data, such as we obtained from the broad corpus study and the focused slip experiment, can supplement each other and yield converging evidence for the same phenomena from different perspectives. Thus, while the slip experiment roughly reproduced the ratio of morphological errors found in the corpus study, it also provided insight into the specifics of morphological decomposition during on-line processing in both modalities. Third, in situating our research within the history of slip research, we want to emphasize the progress that has been made since the seminal work of Rudolf Meringer by so many psycholinguists in this tradition and the
potential of slip research which is still to be developed further, in combination with the growing canon of psycholinguistic methods.
Acknowledgement With this paper, we want to honour Helen Leuninger for her ceaseless commitment to the advancement of sign language research and thank her for promoting our scientific research skills through the originality of her ideas, the broadness of her linguistic experience and her exceptional interpersonal teaching skills.
Notes
1. For a justification of deducing normal functioning from erroneous functioning in psycho- and neurolinguistics, see Caramazza (1984, among others).
2. In the SLIP technique, subjects have to read lists of word pairs which are set up so as to prime certain onsets, e.g., in the list flat-freight; flag-fraud; flash-front, all pairs begin with fl-fr. If now the critical item fruit-fly is presented, subjects slip with a certain probability (around 10-14%) and produce the spoonerism flute-fry (or partial spoonerisms).
3. For a discussion of this problem, see Butterworth (1980) and Pechmann (2003).
4. The research reported here was carried out in Helen Leuninger’s DFG-project ‘Language production errors and their repairs in dependence on the modality: German Sign Language vs. Spoken German (LE 596/6, 1-3)’ from 1997 – 2003, in the scope of the Schwerpunkt program ‘Language Production’ of the German Research Foundation DFG (Deutsche Forschungsgemeinschaft). The collaborators were Annette Hohenberger, Daniela Happ, Elke Menges, and Eva Waleschkowski.
5. We will not discuss monitoring in this paper, but see Hohenberger, Happ, and Leuninger (2002), Leuninger, Hohenberger, Waleschkowski, Happ, and Menges (2004), and Hohenberger and Keller (2002).
6. This desideratum holds true for any language production model, be it connectionist, cascading, or serial modular. For an overview of various approaches to language production, see Jescheniak (1999).
7. For details of the method, the reader is referred to Hohenberger et al. (2002).
8. The signers’ output was only partially transcribed because of the immense temporal demands of sign language transcription. Slips of the hand, of course, were completely transcribed.
9. Signs are glossed with capital letters, as is the convention in the literature. XXX stands for a slip which has a possible but unattested form, i.e., which is not a lexical sign itself.
10. Subjects were not instructed what element(s) to reverse in order to allow for whole word, root, or affix exchanges.
11. Exchanges were either complete or partial, i.e., only one member of the pair was affected. This does not corrupt the logic or the goal of the experiment. Morphological decomposition is required in either case.
12. In DGS, some predicates (mostly modals) can be negated by changing the straight movement into an alpha-movement, i.e., the hands move on a path which looks like an alpha.
13. The respective affirmative morpheme, i.e., the unmarked case, is not overtly expressed.
14. We might be witnessing a process of creolization here. Some signers might still keep an active record of the morphological composition of those polymorphemic signs which undergo derivation with DNEG. For others, this process is already completed and not available in on-line processing anymore.
References Baars, Bernard 1992 A dozen competing-plans techniques for inducing predictable slips in speech and action. In B.J. Baars (ed.), Experimental slips and human error: Exploring the architecture of volition, pp. 129-150. Plenum Press, New York. Baars, Bernard, Michael T. Motley, and Donald G. MacKay 1975 Output Editing for Lexical Status from Artificially Elicited Slips of the Tongue. Journal of Verbal Learning and Verbal Behavior, 14: 382-391. Berg, Thomas 2003 Die Analyse von Versprechern. In Theo Herrmann and Joachim Grabowski (eds.), Enzyklopädie der Psychologie. Sprache. Band 1: Sprachproduktion, pp. 247-264. Göttingen: Hogrefe. Bierwisch, Manfred 1970 Fehler-Linguistik. Linguistic Inquiry, 1: 397-414. Brentari, Diane 1998 A Prosodic Model of Sign Language Phonology. MIT Press, Cambridge, MA. 2002 Modality differences in sign language phonology and morphophonemics. In Richard P. Meier, Kearsy Cormier, and David Quinto-Pozos (eds.), Modality and structure In signed and spoken languages, pp. 35-64. Cambridge University Press, Cambridge. Butterworth, Brian 1980 Introduction. In B. Butterworth (ed.), Language Production, Vol. 1, pp. 1-17. Academic Press, London.
Caramazza, Alfonso 1984 The Logic of Neuropsychological Research and the Problem of Patient Classification in Aphasia. Brain and Language, 21: 9-20. Cutler, Anne 1982 The reliability of speech error data. In Anne Cutler (ed.), Slips of the tongue, pp. 7-28. Mouton, Amsterdam. Emmorey, Karen 2002 Language, cognition, and the brain. Insights from sign language research. Erlbaum, Mahwah, NJ. Ferber, Rosa 1995 Reliability and Validity of Slip-of-the-Tongue Corpora: A Methodological Note. Linguistics, 33: 1169-1190. Flores d’Arcais, G.B. and Willem J.M. Levelt 1970 Advances in Psycholinguistics. North Holland, Amsterdam. Fromkin, Victoria A. 1973 Speech Errors as Linguistic Evidence. Mouton, The Hague. Garrett, Merrill F. 1975 The Analysis of Sentence Production. In Roger Wales and E. Walker (eds.), New Approaches to Language Mechanisms, pp. 231-256. North Holland Publishing Company, Amsterdam. 1980 Levels of Proceeding in Sentence Production. In Brian Butterworth (ed.), Language Production. Volume 1. Speech and Talk, pp. 177-220. Academic Press, New York. 1988 Processes in language production. In Frederick Newmeyer (ed.), Linguistics: The Cambridge survey. Vol. III, Psychological and biological aspects, pp. 69-96. Cambridge University Press, Cambridge. Gee, James and Wendy Goodhart 1988 American Sign Language and the human biological capacity for language. In Michael Strong (ed.), Language learning and deafness. Cambridge University Press, Cambridge. Happ, Daniela and Annette Hohenberger 2000 Phonologische und morphologische Aspekte der Sprachproduktion in Deutscher Gebärdensprache (DGS). In: Helen Leuninger and Karin Wempe (eds.), Gebärdensprachlinguistik 2000: Theorie und Anwendung. Signum, Hamburg. Hohenberger, Annette, Daniela Happ, and Helen Leuninger 2002 Modality-dependent aspects of sign language production: Evidence from slips of the hands and their repairs in German Sign Language. In Richard P. Meier, Kearsy Cormier, and David Quinto-Pozos (eds.), Modality and structure in signed and spoken languages, pp. 112-142. Cambridge University Press, Cambridge. Hohenberger, Annette and Jörg Keller 2002 On the amodal nature of the monitor: Sign vs. spoken language processing. Poster presented at the 15th CUNY Conference on Human Sentence Processing. New York, March 21-23, 2002.
Humphreys, K.R. 2003 The production of inflectional and derivational morphology: Evidence from elicited speech errors. Unpublished PhD thesis. University of Illinois at Urbana-Champaign. Jescheniak, Jörg 1999 Accessing words in speaking: Models, simulations, and data. In Ralf Klabunde and Christiane v. Stutterheim (eds.), Representations and processes in language production, pp. 237-257. Deutscher Universitäts Verlag, Wiesbaden. Keller, Jörg, Annette Hohenberger, and Helen Leuninger 2003 Sign language production: Slips of the hand and their repairs in German Sign Language. In Anne Baker, Beppie van den Bogaerde, and Onno Crasborn (eds.), Cross-linguistic perspectives in sign language research. Selected papers from TISLR 2000, pp. 307-333. Signum, Hamburg. Klima, Edward S. and Ursula Bellugi 1979 The signs of language. Harvard University Press, Cambridge, MA. Leuninger, Helen, Annette Hohenberger, Eva-Maria Waleschkowski, DanielaHapp, and Elke Menges 2004 The impact of modality on language production: Evidence from slips of the tongue and hand. In Thomas Pechmann and Christopher Habel (eds.), Multidisciplinary approaches to language production, pp. 219277. Mouton de Gruyter, Berlin, New York, Amsterdam. Leuninger, Helen, Annette Hohenberger, Eva-Maria Waleschkowski, and Elke Menges 2005 Zur Verarbeitung morphologischer Informationen in der Deutschen Gebärdensprache (DGS). In Helen Leuninger und Daniela Happ (eds.), Gebärdensprachen: Struktur, Erwerb, Verwendung. Linguistische Berichte, Sonderheft 13, pp. 325-358. Helmut Buske Verlag, Hamburg. Levelt, Willem J.M. 1989 Speaking. From intention to articulation. Cambridge, MA: MIT Press. 1999 Producing spoken language: A blueprint of the speaker. In Colin M. Brown and Peter Hagoort (eds.), The neurocognition of language, pp. 83-122. Oxford: Oxford University Press. Levelt, Willem J.M., Ardi Roelofs, and Antje S. Meyer 1999 A theory of lexical access in speech production. Behavioral and Brain Sciences, 22: 1-75. Marcus, Gary F., U. Brinkmann, Harald Clahsen, Richard Wiese, and Stephen Pinker 1995 German inflection: The exception that proves the rule. Cognitive Psychology, 29: 189-256. Meringer, Rudolf 1908 Aus dem Leben der Sprache: Versprechen, Kindersprache, Nachahmungstrieb. Behr’s Verlag, Berlin.
Meringer, Rudolf and Karl Mayer 1895 Vesprechen und Verlesen: Eine Psychologisch-Linguistische Studie. Göschense Verlagsbuchhandlung, Stuttgart. Meyer, Antje S. 1992 Investigation of phonological encoding through speech error analyses: Achievements, limitations, and alternatives. Cognition, 42: 181-211. Newkirk, Don, Edward S. Klima, Carlene C. Pedersen, and Ursula Bellugi 1980 Linguistic evidence from slips of the hand. In Victoria A. Fromkin (ed.), Errors in linguistic performance: Slips of the tongue, ear, pen, and hand, pp. 165-197. Academic Press, New York. Pechmann, Thomas 2003 Experimentelle Methoden. In Theo Herrmann and Joachim Grabowski (eds.), Enzyklopädie der Psychologie, Sprache, Band 1: Sprachproduktion, pp. 27-49. Göttingen: Hogrefe. Pinker, Stephen and Alan Prince 1994 The reality of linguistic rules. Regular and irregular morphology and the psychological stats of rules of grammar. In Susan D. Lima, Roberta L. Corrigan, and Gregory K. Iverson (eds.): The reality of linguistic rules. John Benjamins, Philadelphia. Poulisse, Nanda 1999 Slips of the tongue. Speech errors in first and second language production. Studies in bilingualism, Vol. 20. John Benjamins, Amsterdam, Philadelphia. Sonnenstuhl, Ingrid, Sonja Eisenbeiss, and Harald Clahsen 1999 Morphological priming in the German mental lexicon. Cognition, 72: 203-236. Slobin, Dan I. 1977 Language change in childhood and in history. In John MacNamara (ed.), Language Learning and Thought, pp. 185-214. Academic Press, New York. Stokoe, William C. 1960 Sign language structure: An outline of the visual communication system of the American deaf. Studies in Linguistics, Occasional Papers 8. Linstok Press, Silver Spring, MD. Waleschkowski, Eva 2004 Morphological decomposability of concatenative and nonconcatenative word forms. Evidence from slip experiments. In Proceedings of the International Graduate Conference ConSOLE XII (2003). Online available at URL: http://www.sole.leidenuniv.nl/content_docs/ConsoleXII2003pdfs/wale schowski-2003.pdf
A Multi-Evidence Study of European and Brazilian Portuguese wh-Questions1 Mary Aizawa Kato and Carlos Mioto
1
The aims
Using the Principles and Parameters (PP) model of language change (Lightfoot 1999; Roberts 1993; Kroch 1994; among others), diachronic studies of Brazilian Portuguese (BP) have shown some major changes in the syntax of wh-questions since the 18th century (M.E. Duarte 1992; Lopes-Rossi 1993). On the other hand, formal synchronic analyses of European Portuguese (EP) (Ambar 1988; Barbosa 2001; I. Duarte 2000) lead us to assume that this variety has apparently preserved the properties of the 19th century. The aim of this paper is to compare contemporary EP and BP wh-questions using equivalent written corpora and speakers’ intuition of both varieties. We focus on the aspects that were observed to have changed in BP, namely the VS order, the appearance of é que (‘is that’) questions with the drop of the copula, and the increase of wh-in-situ constructions. The ultimate goal is to provide a theoretical interpretation of the generalizations found, using as our framework the PP model. The questions to be answered are the following: a) since BP wh-questions have drawn from diachronic corpora and EP patterns have been postulated using the linguists’ introspection, how do the two varieties compare when similar corpora are used? b) what reaction do we get when constructed data are tested with speakers of both varieties? c) what theoretical account can we give for eventual qualitative and/or quantitative differences between the two varieties? The empirical, quantitative analysis was based on a corpus found in the website http://acdc.linguateca.pt/acesso/ for EP, the subcorpus NaturaPúblico and for BP the NILC-São Carlos. As the Brazilian diachronic studies used only plays, we conducted a separate analysis using two Portuguese plays to see if genre could explain eventual differences.2 The
need to form certain minimal pairs and the absence of more complex data in the corpus led us also to use tests of judgement with native speakers.
2
Previous studies
Adams’ (1987) study of Old French and the similarities between Modern French and BP provided Brazilian linguists with hypotheses to investigate whether similar changes have occurred in this language, namely loss of V2 and of the null subject (NS). Ribeiro (1995) discovers that Old Portuguese exhibited the properties of a V2 language. But the changes that occurred in Old French took place in BP much later. M.E. Duarte (2000) shows that NSs had an incidence of 80% in the middle of the 19th century. The situation is found reversed at the end of the 20th century, with a little more than 20% of NS, a clear indication of a change in progress. As for the VS order, Torres Morais (1993) and Berlinck (1995) distinguish two types in the 19th century: the Romance VXS and the Germanic VSX. Torres Morais finds that from the 19th century to the present date the VSX order, which was already rare (6%), becomes totally absent, and the VXS form in declaratives decreases from 20% to %. Berlinck still finds a few cases of VS(X), but her conclusion is that the VS(X) cases are with unaccusative verbs. The studies on contemporary EP, on the other hand, show that the NS and all the orders licensed in the 19th century BP are still very much alive (Barbosa 2001; Costa 1998). What we saw happen to declarative sentences in BP is mirrored in whquestions. M.E. Duarte (1992) found that the VS order in wh-interrogatives was the rule in the 19th century with bare wh-expressions, becoming restricted to mono-argumental verbs, in the last quarter of the 20th century: (1)
a. Mas que tenho eu a temer? but what have I to fear ‘But what do I have to fear?’ b. Onde andará a Neiva? where will-walk the Neiva? ‘Where will Neiva be?’ c. E onde está o resto?. and where is the rest ‘And where is the rest?’
BP (19th) BP (20th) BP (20th)
Kato and M.E. Duarte (forth) observe that sentences with monoargumental verbs, like (1b) and (1c), are ambiguous between the Germanic and the Romance orders. But they show that V2 languages allow a pronominal subject to appear postposed, while Romance VS does not, and that postposition of pronominal subjects is lost earlier. The NS is shown to have survived this loss. Ambar (1988) and (2000) show that both types of VS are possible in contemporary EP. (2)
a. Que ofereceu o Pedro à Ana? what offered the Pedro to the Ana ‘What did Peter offer to Ana?’ b. Que ofereceu à Ana o Pedro what offered to the Ana the Pedro
EP (20th) EP (20th)
In contemporary BP the order with pronominal subjects is always SV, a pattern considered ungrammatical by the Portuguese. (3)
Onde você andou? where you walked ‘Where have you been?’
*EP
BP (20th)
The pattern in (3) increases as the wh-questions with NS decrease. EP, on the other hand, has retained the NS in corresponding sentences: (4)
Em que ficamos? in what stand-1pl ‘Where do we stand?’
EP (20th)
M.E. Duarte (1992) relates the loss of VS to the appearance of é que questions, which is later analyzed by Lopes-Rossi (1993) as resulting from wh-extraction of cleft constructions, which licenses the SV order. (5)
O que é que eu represento? what is that I represent ‘What is it that I represent?’
BP (19th)
However, Lopes-Rossi (1993) shows that the appearance of é que (‘is that’) questions did not cause the loss of V2 in European Portuguese, though this variety also implemented the cleft-interrogatives productively. (6)
O que é que o corvo comeu? what is that the crow ate ‘What is it that the crow ate?’
EP (20th)
M.E. Duarte and Lopes-Rossi observed in the Brazilian corpora the appearance of wh-questions with the complementizer que in the 20th century. Kato and Raposo (1996) analyze this form as resulting from the erasure of the unmarked copula of é que,3 while the SV form would be the result of further erasure of the remaining complementizer. In EP, on the other hand, que alone is banned, and so is the order SV (Ambar 2000).

(7)  a. Quem é que o Pedro viu?            EP    BP
        who is that the Pedro saw
        ‘Who was it that Peter saw?’
     b. Quem que o Pedro viu?              *EP   BP
     c. Quem o Pedro viu?                  *EP   BP
Another correlation observed in Lopes-Rossi’s work is the decrease of VS order in BP and a substantial increase of wh-in-situ questions, with the in-situ wh also appearing in embedded clauses. EP also licenses wh-in-situ in root and embedded clauses, but it is restricted to echo-questions, according to I. Duarte (2000). (8)
(9)
a. E viram o quê, patetinha? and saw-3pl what, little fool ‘And they saw what, little fool?’ b. E a senhora acha que eu devo fazer o quê? and the lady thinks that I must do the what ‘And what do you think that I must do?’
BP (20th)
a. O Pedro comprou o quê? the Pedro bought the what ‘What did Peter buy?’ b. O Pedro disse que a Ana foi onde? the Pedro said that the Ana went where ‘Where did Peter say that Ana went?’
EP (20th)
BP (20th)
EP (20th)
The direct correlation between loss of VSX and the NS is difficult to capture, as Romance has also the VXS type4. This pattern has been called stylistic inversion (Kayne and Pollock 1998). A wh-question like (10) is ambiguous between a V2 type and a stylistic type: (10) Que lhe disse o Honorato? what you-said the Honorato? ‘What did Honorato say to you?’
BP (20th)
Though there does not seem to be any disagreement regarding what is grammatical and ungrammatical in EP, the technical treatments of the authors who worked with wh-questions in EP, or comparatively with EP and BP, are different. The differences reside mainly in the WHVS pattern and the way the authors treat the adjacency between the wh-element and the verb. For Ambar (1988) and I. Duarte (2000), the wh-element is in Spec of CP and the verb in C; for Raposo (1994) and Kato and Raposo (1996) the former is in Spec of FP and the verb in I; for Ambar (2000) the wh-element is in a WHP, inside a split periphery, with the verb in WH. For Barbosa (2001) the wh-element is in Spec of IP and the verb in I. Details of their contributions that are relevant for the comparison will be discussed here. In our work, we will be using a slightly modified version of Kato and Raposo’s (1996) analysis as the base for comparison.
3
The results of the synchronic corpus research
3.1
Interpretation criteria
This work involves data from a written corpus, where we assumed the presence of strong prescriptive rules interacting with the real I-language of the authors. The following criteria were used to interpret the data:
a) inexistence or insignificant occurrence (below 1.0%) of a form in the corpus means its inexistence in the core grammar of the speakers;
b) low percentage of a form (above 1.0%) means it is licensed in I-language, but still banned by prescriptive rules;
c) clear complementary distribution according to the genre of the text shows the competition of a new form vs. a form in extinction.
The threshold of 1% was fixed using speakers’ judgement of the relevant data. Subjects tend to reject what is below 1%, while accepting what appears above this threshold.
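Stated as a procedure, the frequency-based part of these criteria is a simple threshold test. The following sketch is purely illustrative – the function name and the sample values are ours, not part of the study – but it makes the decision rule explicit:

    def interpret_corpus_share(percentage):
        # Criterion (a): below 1.0% counts as absent from the core grammar.
        if percentage < 1.0:
            return "not part of the core grammar"
        # Criterion (b): attested above the threshold counts as licensed in
        # I-language, even if prescriptive rules ban it from written texts.
        return "licensed in I-language (possibly banned by prescriptive rules)"

    # Illustration with figures discussed below: EP wh-in-situ (1.22%) vs. a
    # pattern that never occurs in the EP data (0%).
    print(interpret_corpus_share(1.22))
    print(interpret_corpus_share(0.0))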
3.2
Results
The expectations listed in the previous section are mostly confirmed in the present corpus, namely:

a) unambiguous VSX order is found productively in EP, as expected, but BP presented one unexpected VS token of the same type:

(11)  Que trouxe ele de novo para a construção romanesca?          EP
      what brought he of new for the romance construction
      ‘What did he bring as novelty for the romance construction?’
(12)  Com quem debateu o relator suas propostas?                   BP
      with whom discussed the reporter your proposals
      ‘With whom did the reporter discuss your proposals?’

b) we found only one unambiguous case of stylistic inversion:

(13)  De que está à espera o Presidente?                           EP
      of what is at wait the president
      ‘What is the president waiting for?’

c) ambiguous cases of VS with unergatives and transitives are also found only in EP:

(14)  a. De que ri o Diamantino?                                   EP
         of what laughs the Diamantino
         ‘What does Diamantino laugh at?’
      b. O que fará o terceiro?                                    EP
         what do+fut+3sg the third
         ‘What will the third one do?’

d) cases of VS with Aux+V preceding the subject are found only in EP:

(15)  Como irá entendê-lo o público lisboeta?                      EP
      how will understand him the public of-Lisbon
      ‘How will the Lisbon people understand him?’

e) VS order with unaccusative verbs or the copula is found in both varieties, as predicted in Berlinck (1995):

(16)  a. Onde morreram as esperanças e as ilusões?                 EP
         where died the hopes and the illusions
         ‘Where have the hopes and the illusions died?’
      b. Onde estariam forma e teoria?                             EP
         where would-be-3pl form and theory
         ‘Where would form and theory be?’
(17)  a. Com quem surgiu esse conceito?                            BP
         with whom appeared this concept
         ‘With whom did this concept appear?’
      b. O que é adoção biológica?                                 BP
         what is adoption biological
         ‘What is biological adoption?’
f) in non-cleft questions, strict SV order was the main pattern in BP, but two SV tokens were also found in EP in contexts with no marked interpretation of an echo question (cf. Ambar and Veloso 1999):

(18)  Com quem ele governará?                                      BP
      with whom he will govern
      ‘Who will he govern with?’
(19)  a. Em que os escudos se sobrepõem à cultura?                 EP
         in what the escudos se-overlap with culture
         ‘In what does the “escudos” overlap with culture?’
      b. Em que essas críticas podem ser feitas?                   EP
         on what these criticisms can be made
         ‘On what can these criticisms be made?’

g) both varieties exhibit wh-questions with NS:

(20)  Com quem quer falar?                                         EP
      with whom want-3sg to talk
      ‘Who does he want to talk with?’
(21)  Em quem estão votando?                                       BP
      in whom are-3pl voting
      ‘Who are you voting for?’

h) both EP and BP exhibit é que-questions, a type inexistent before the 19th century; the SV/VS variation can be found in EP, while in BP VS is found only with unaccusatives:

(22)  O que é que ela representa?                                  EP
      what is that she represents
      ‘What is it that she represents?’
(23)  a. O que é que o colunista tem contra a orelha da LWF?       BP
         what is that the journalist has against the ear of the LWF
         ‘What does the journalist have against LWF’s ears?’
      b. Quando é que acaba Antônio Alves taxista?                 BP
         when is that finish Antônio Alves taxidriver
         ‘When does Antonio Alves, the taxidriver, finish?’
i) in addition to é que wh-constructions, the form without the copula appears with SV order or VS with unaccusatives, in BP:

(24)  a. O que que eu posso fazer?                                 BP
         what that I can do
         ‘What can I do?’
      b. E quanto que tá a inflação?                               BP
         and how-much that is the inflation
         ‘And how much is the inflation?’

j) wh-in-situ as ordinary questions are found in BP and in EP, despite I. Duarte’s claim that wh-in-situ in EP is always an echo-question:

(25)  a. Os jovens terão seu bacharelado para quê?                 EP
         the young will have their BA for what
         ‘What will the young ones have their BA for?’
      b. Você votou em quem em 1989?                               BP
         you voted for who in 1989
         ‘Who did you vote for in 1989?’
Comparing data from journalistic texts – Público (EP) and Folha de São Paulo (BP) – on the website, the following results were obtained.

Table 1. Types of wh-questions in EP and BP

types of         with wh-movement   wh in situ     cleft-wh with é que   reduced cleft with que   Total
wh-questions     n      %           n     %        n      %              n      %                 n      %
EP               252    76.83        4    01.22    72     21.95          –      –                 328    100
BP               238    65.75       32    08.84    88     24.31          4      01.10             362    100
Table 1 shows that, disregarding the SV/VS order, all types are found in the two varieties, except the reduced cleft, which is found only in BP. In the other cases the differences are quantitative. In both EP and BP the non-cleft, wh-moved type is the preferred one. The in-situ type shows a significant difference, with BP presenting eight times the number of EP, which still seems to ban it from written language. Cleft questions are quite productive, despite their innovative character. Though the reduced cleft is part of the vernacular in BP, it is still banned from written language. In the following table, we will examine only the questions with wh-movement from Table 1. We have computed the two types of VS together, as most cases were ambiguous between the V2 and the stylistic type. The only type that appears separated is questions with the unaccusative verb.
Table 2. SV/VS order in wh-moved questions

With moved wh    VS              SV              NS              unaccus. VS     Total
                 n      %        n      %        n      %        n      %        n      %
EP               62     24.60     2     0.79     151    59.92    37     14.68    252    100
BP               11     04.62    97     40.76     70    29.41    60     25.21    238    100
Table 2 shows no empty cell, which means that the questions with whmovement can have SV, VS or a NS in the two varieties. However, there is a strong quantitative difference: SV order in EP is insignificant (0.79%), and the lower incidence of NSs in BP reflects another change in this variety, namely the loss of referential NSs. Unaccusative VS order appears with a higher incidence in BP than in EP, but the non-unaccusative VS with 4.62% raised our curiosity. Looking at the examples, we found that they are all cases of wh-adjuncts selected by the verb. There is no single case of wharguments, which are easily found in EP: (26) Afinal, quanto vale a vida? after-all, how-much is worth the life ‘After all, how much is life worth?’ (27) De que se queixam os médicos? of what se -complain the doctors ‘What do the doctors complain about?’
BP EP
As the changes in BP wh-questions were observed in the diachronic analysis of plays (Lopes-Rossi 1993), a separate analysis of EP plays was conducted, in order to check whether the appearance of é que-questions was connected with the decrease of VS with non-unaccusative verbs. Similarly to BP, bare wh-questions in two contemporary EP plays (total: 129) have revealed the inexistence of VS order with unergative and transitive verbs, and a productive use of cleft interrogatives. But differently from BP, no SV order is attested in non-cleft questions. Moreover, the whin-situ questions have only 3% even in a genre that simulated spoken language. The counting of wh-types gave us the results in Table 3:
Table 3. Types of wh-questions in EP plays

+wh-movement, non-cleft:   VS –unacc. 0%    VS +unacc. 12%    SV 0%    NS 24%
+wh-movement, cleft:       é que 61%    que 0%
–wh-movement (in situ):    3%

3.3
Empirical generalization
Using the interpretive criteria we postulated in 3.1, we can conclude this section with the following generalizations:
a) The ungrammatical patterns in EP are the ones with que (0% in all the data) and the non-clefts with SV order (only 0.79% in Table 2).
b) The patterns that are part of the core grammar of EP, but that are still banned in written language, are the wh-in-situ questions (1.22% in Table 1 and 3% in the two contemporary plays).
c) The patterns that are part of the core grammar of BP, but still banned from the written language, are the cleft with que (1.10% in Table 1).
d) The SV order is part of the core grammar of BP (40.76% in Table 2), which licenses, however, VS order with unaccusative verbs (25.21% in Table 2) and with wh-adjuncts of a special type (4.62% in Table 2).
e) The cleft questions are part of the core grammar of both BP and EP.
f) The Portuguese speakers/writers have two grammars in competition: the one with the VS order in questions, and the one with a massive introduction of cleft questions.
The genre differences suggest that spoken EP is losing the V-to-C type of wh-question (I. Duarte 2000), but maintaining the stylistic type. However, its difference from BP lies in the fact that SV order is not licensed in the core grammar of EP, and no difference has been attested regarding the loss of the NS. Comparing the empirical results with what has been found in previous studies, our study revealed two facts worthy of mention: a) spoken EP does not exhibit VS order in non-cleft questions, and b) BP VS order in non-cleft questions is not restricted to unaccusative verbs.
4
Theoretical analysis
4.1
Working Hypotheses
Languages have a way to distinguish categorical sentences from thetic sentences (Kuroda 1972). For instance, in Japanese, where the distinction is made morphologically, the subject of a categorical judgement is marked -wa (Neko-wa nemute-iru (‘The cat is sleeping’), while the subject of a thetic sentence appears with -ga (Neko-ga nemute-iru (‘The/a cat is sleeping’). We assume that wh-questions are thetic-like sentences as no subject is asserted, independent of a predication (Dare-ga nemute-iru-no? (‘who is sleeping?’). Thus, when the wh-word is the subject it is necessarily marked with -ga. This leads us to expect that the wh-questions will mirror the form of a thetic sentence in both EP and BP. Martins (1996) claims that EP shows the thetic/categorical distinction through word order: SV for categorical sentences and VS for the thetic sentences. (28) a. O gato está a dormir. the cat is to sleep b. Está o gato a dormir. is the cat to sleep ‘The cat is sleeping’
(categorical)
EP
(thetic)
EP
(for a and b)
Britto (2000) proposes that as BP lost VS, the distinction is shown in a different form: SV for a thetic, and Top SV for a categorical judgement5: (29) a. O gato, ele está dormindo. (categorical) the cat he is sleeping b. O gato está dormindo.(thetic) the cat is sleeping ‘The cat is sleeping.’ (for a and b)
BP BP
As wh-questions are thetic sentences, in our extended notion, we will see the same contrast between EP and BP: (30) a. Onde está o gato a dormir? where is the cat to sleep b. Onde o gato está dormindo? where the cat is sleeping ‘Where is the cat sleeping?’ (for a and b )
EP *EP
*BP BP
One of our working hypotheses, therefore, is that the wh-questions in EP and BP have the same derivation as thetic declarative sentences. The structure of sentences in any language has layers of projection with
318 Mary Aizawa Kato und Carlos Mioto designated functions (Rizzi 1997): the VP codifies theta roles, the IP tense and agreement relations and the CP, in its split form, the interface functions with discourse or with the superordinate structure: (31) [ForceP [TopP [FocusP [TopP [FinP [IP ...]]]]]] We will assume the FocusP (FP) projection, in whose Spec we can have a focalized constituent and also the lower and the higher TopP in (31). Our second hypothesis, based on Kato and Raposo (1996), is that: a) both EP and BP activate TopP for the categorical sentences; b) EP always projects FP but BP only projects it when there is some XP to check the +F feature, as we will show later. (28)' a. b. (29)' a. b.
[TopP o gato [FP está [IP testá [VP testá a dormir]]]] (categorical) [FP [está [IP testá [VP o gato testá a dormir]]]] (thetic) [TopP o gato [IP ele está [VP tele testá dormindo ]]] (categorical) (thetic) [IP o gato está [VP to gato testá dormindo]]
EP EP BP BP
The IP structure of languages differs, depending on whether they are a NS language or a non-NS language. For Kato (1999)6 verbs in non-NS languages appear inflected for tense and agreement in the numeration (Chomsky 1995), while in NS languages they appear inflected only for tense. The agreement morpheme, which is pronominal in NS languages, appears as an independent morpheme in the numeration. This agreement morpheme can be merged as the external argument (a minimal and maximal projection), which raises to T, dispensing with the projection of Spec of TP. Morphology rearranges the order of morphemes. (32) Comeste o bolo. EP ate+2sg the cake ‘You ate the cake.’ (32)' [IP Agr + [V+T] [VP [ tAgr [V’....]]]] b.[IP [V+T] + Agr [VP tAgr ...]] The preposed subject in NS languages is assumed to be in an A'position, and its A-position is assumed to be the VP-internal subject (Martins 1996; Raposo 1994; Barbosa 1995; Kato 1999). EP is assumed to have the two properties: pronominal agreement and subject in A'-position.7 BP, on the other hand, has the verb inflected for tense and agreement from the numeration, projecting the Spec of I, which has strong +N features: (33) As crianças comeram o bolo (categorical in EP and thetic in BP) the children ate+m(=3pl) the cake (33)' [TopP as crianças [FP comeram [IP tcomera+ m [VP tm tcomera o bolo]]] EP
(33)" [IP as crianças [I' comeram [VP tas crianças tcomeram o bolo]]]
BP
Our third working hypothesis is that while the domain of thetic sentences in NS languages is FP, in non-NS languages8 thetic sentences can have FP or only IP. Our analysis will show that the form of whinterrogatives is just a consequence of its thetic nature. 4.2
Questions with wh-movement
Two types of sentences were computed as belonging to the VS pattern: the WHVSX, or the V2 type, and the WHV(X)S, or stylistic inversion. 4.2.1 The WHVSX type and its loss in BP The VS order in EP and the SV form in BP can be preceded not only by a wh-element, but also by an affective operator (Raposo 1994): (34) Em nenhuma parte está o gato a dormir. in no part is the cat to sleep (35) Em nenhuma parte o gato está dormindo. in no part the cat is sleeping ‘Nowhere is the cat sleeping.’ (for both a and b)
EP BP
Wh-elements and affective operators are assumed to be in FP projection in both varieties contra Barbosa’s hypothesis that in EP they are in Spec of IP. We assume that parametrization is an effect of the strength of features on I. There are two claims regarding the position of the verb in wh-questions: the Comp (or F) position (Ambar 1988,2000; Lobato 1988; Lopes-Rossi 1993; I. Duarte 2000); and the I position (Raposo 1994; Martins 1996; Barbosa 2001). In the former analysis, the subject is sitting in Spec of IP. In the latter, the subject is internal to VP. There is, however, a third possibility in which the verb is sitting in F and the subject inside VP. (36) O que ofereceu o Pedro à Maria? what offered the Pedro to the Maria ‘What did Peter offer to Maria?’ (36)' (i) [FP/CP O que [F’ ofereceu [IP o Pedro tofereceu [VP ... à Maria]]]] (ii) [IP O que [I’ ofereceu [VP o Pedro tofereceu to que à Maria ]]] (iii) [FP O que [F’ ofereceu [IP tofereceu [VP o Pedro ... à Maria]]]]
320 Mary Aizawa Kato und Carlos Mioto The adjacency between the V and the subject has been conceived as structural adjacency, but we may propose that V and the subject are heard to be adjacent because the structural element between them is a trace, like in (36)'iii. We may keep, in this analysis, the affective element and the wh– element in Spec of F, and the inflected verb in F, unifying the structure to languages like English, which does not fit into the IP analysis in (36)'ii. (37) [FP What [F’ did [IP Peter tdid [VP tPeter offer twhat to Maria]]]] This proposal would also account for a WHAuxVSX sentence like (38), where the wh-element and the subject are interrupted by two verbs, a argument in favor of (36)'iii, but not of (36)'i, since in (36)'i only one verb is allowed between the wh-element and the subject. (38) O que pode ter ele feito aos documentos? what can have he done to the documents ‘What can he have done to the documents?’ (38)' [FP O que pode [ IP tpode [AspP ter [vP tter [VP ele feito aos documentos]]]]] Thus, we have kept our assumptions with regard to the position of the wh-element (in Spec of F) and the position of the subject as VP-internal. Another factor that has to be taken into consideration is the observed fact that the presence of affective operators in FP triggers proclisis in EP. In Martins’ (1996) analysis, proclisis is formed in I, and there stays if an affective operator appears in a higher projection, say FP. Enclisis results from V-movement to the head of TopP, with stranding of the clitic in I. (39) a. Ninguém te ofereceu este CD. nobody you-clitic offered this CD. ‘Nobody offered you this CD.’ b. O Pedro ofereceu-me este CD. the Pedro offered me-clitic this CD. ‘Peter offered me this CD.’ (39)' a. [FP Ninguém [IP te ofereceu [VP ........ este CD]]]] b. [TopP O Pedro [Top’ ofereceu [IP me tofereceu [VP .... este CD]]]]
EP EP
We propose a change in Martins’ analysis, moving the [cl+V] formed in I upwards to the head of FP. In both enclisis and proclisis, V-movement would be triggered by the strong T-features of the head in F. Before spellout, both (39)a and (39)b would exhibit proclisis in F: (39)" a. [FP Ninguémi [F’ te ofereceu [IP tme ofereceu [VP ... este CD]]]] b. [TopP O Pedro [FP [F’ me ofereceu [IPtme tofereceu [VP ...este CD]]]]]
What happens is that the morpho-phonological directionality of clitics in EP is to the left, unlike BP or Spanish where cliticization is to the right (Nunes 1993). In (39)''a the clitic te cliticizes to ninguém, which is in the Spec of FP. As TopP is an independent prosodic unity, elements inside it do not count as hosts for a clitic inside FP. As a consequence, morphology will reorder the clitic and the verb to make the latter its proper left host. (39)''' a. [FP Quem me] ofereceu este CD b. [TopP O Pedro [FP ofereceu me] este CD As for wh-questions, which have obligatory proclisis, their predicted derivation is: (40) O que te ofereceu o Pedro? EP what you-clit offered the Pedro ‘What has Peter offered you?’ (40)' [FP O que [F te ofereceu] [IP tte ofereceu [VP o Pedro..............]]] We want to analyze now why BP has lost this pattern, while EP has not. Recall that the diachronic research and our comparative analysis revealed a new type inexistent in BP before the 20th century and also in contemporary EP: the wh-question with a complementizer que. In our analysis this complementizer cannot co-exist with the V to Comp because it is occupying the head of FP blocking verb movement. This complementizer is merged in F and it has the same function of the moved verb in EP: check +F features. The difference lies exactly in the grammaticalization of é que into que, which happened only in BP. Recall also that BP projects the Spec of IP obligatorily, triggering the raising of the VP-internal subject to it. (41) O que que você quer fazer? what that you want do ‘What do you want to do?’ [IP você quer [VP tvocê tquer fazer to que ]]]] (41)' [FP O que [F' que
BP
We also propose an optional erasure of que at PF, a stylistic rule, which derives the WHSV form9. (41)" [FP O que [F' que
[IP você quer [VP tvocê tquer fazer to que ]]]]
4.2.2 Stylistic inversion The analysis of stylistic inversion that we endorse here is the one proposed in Kayne and Pollock (1998), according to which a sentence like (40) has a
322 Mary Aizawa Kato und Carlos Mioto source where the postposed subject appears in TopP.10 The derivation that conciliates our analysis of the position of the inflected verb and the whelement is presented in (42'): (42) Por que telefonaram ontem os meninos? why called-3pl yesterday the children ‘Why did the children call yesterday?’ (42)' (i) [IP telefonaram [VP ttelefonar ontem por que]] (ii) [FP por que telefonaram [IP ttelefonaram [VP ttelefonar ontem tpor que]]] (iii) [TopP os meninos [FP por que telefonaram [IP ...[VP ...ontem ]]]] (iv) [FP por que telefonaram [IP ...[VP...ontem ]]][TopP os meninos [tFP]] The problem with such analysis is that a wh-question should have a categorical interpretation since TopP is projected. The solution for such cases is that while the referential subject goes to the lower TopP in Kayne’s system, the postposed “subject” in our system derives from the higher TopP. We may propose that such elements are independent chunks, which do not affect the LF reading of sentences. The wh-question continues to be a thetic sentence and the interpretation of the postposed “subject” is the same as that of right dislocation. What creates the thetic reading of (42) is the FP movement over TopP, placing TopP under the scope of FP. 4.2.3 WHVS with Adjunct wh-elements Most cases of VS with unaccusative verbs in BP can be analyzed as stylistic inversion, but there are cases of indefinite subjects that cannot be analyzed as being in TopP position. All examples of VS in the corpus are cases with adjunct wh-expressions – quanto (‘how much’), onde (‘where’), como (‘how’). This kind of inversion is due to the property of verbs that license predicate raising (Moro 1991). The subject and the adjunct predicate start merged in a small clause structure and the predicate raises to Spec of FP when it is a wh-word: (43) Quanto custa um carro novo? how-much costs a new car ‘How much does a new car cost?’ Let us start by exemplifying predicate raising in a declarative clause in both EP and BP. Though the sentences are identical, the structures are different:
(44) a. Um carro novo custa uma fortuna. a car new costs a fortune b. Uma fortuna custa um carro novo a fortune costs a new car ‘A new car costs a fortune’ (for both a and b) (44)' (i) [SC um carro novo [ uma fortuna ]] (ii) [VP custa [SC um carro novo [ uma fortuna ]]] a1. [IP um carro novo custa [VP tcusta [SC ... [ uma fortuna ]]]] a2. [TopP um carro novo [FP custa [IP...[VP... [SC ...uma fortuna ]]]]] b1. [FP uma fortuna (que) [IP custa [VP ... [SC um carro novo... ]]]] b2. [FP uma fortuna custa [IP tcusta [VP ...[SC um carro novo ...]]]]]
BP EP BP EP
The derivation of (43a) for EP and BP would be as follows: (43)' a1. [FP quanto [F’custa [IP tcusta [VP ...[SC um carro novo ...]]]]] a2. [FP quanto [que [IP tquanto custa [VP [SC um carro novo ...]]]]]
EP BP
The difference between the two varieties is again the position of the verb: F in EP and I in BP. The adjunct-wh-cases are extra-evidence that whquestions mirror what happens in thetic declarative clauses. 4.2.4 The cleft questions and the NS questions EP and BP share the é que-questions pattern that appeared in the 19th century. Both EP and BP shared the representation in (45)': (45) O que é que os amigos te ofereceram? EP BP (19th) what is that the friends [to you]-offered ‘What is it that your freinds offered you?’ (45)' [FP O que é [IP té [CP que [TopP os amigos [FP te ofereceram ]]]]] BP can also have just the complementizer que that is merged in F: (46) O que (que) os amigos te ofereceram? (46)' [FP O que (que) [IP os amigos te ofereceram ]]
*EP
BP
V+I always moves to F in EP, but not in BP. So (45)' is the analysis for contemporary EP, but not for the contemporary BP: for BP the copula remains in I. An evidence for this is (47), with unerased que: (47) O que que é que os amigos te ofereceram ? what that is that the friends [to you]-offered
*EP
BP
324 Mary Aizawa Kato und Carlos Mioto (45) in contemporary BP is produced by the erasure of the complementizer, as seen in (45''): (45)'' [FP O que que [IP é [CP que [IP os amigos te ofereceram]]]] BP(20th) Another pattern shared by EP and BP are wh-questions with NSs: (48) Com quem quer falar? with who wants to-talk ‘With whom do he want to talk?’ Again, we propose that the similarity is only apparent: first, the verb in EP is sitting in F, while in BP it is in I; further, NS in EP is simply an affixal subject, while in BP the full pronoun is merged in spec of IP and later erased by a stylistic rule. EP (48)' a. [FP com quem quer [IP tquer [VP tquer falar tcom quem]]] b. [FP com quem [F’ que [IP ele quer [VP tele tquer falar tcom quem ]]]] BP 5
Conclusions
Though the empirical research showed a good intersection of licensed patterns, the theoretical analysis showed that the similarities of EP and BP are only apparent. The most important contribution provided by the empirical research was the absence of V+I to F in plays. This tells us that in EP F is also losing its strength to attract V+I, while it still resorts to the special conservative feature of the copula to retain F occupied by a verb. The most important theoretical contribution was to show that VS order in EP wh-questions simply reflects the derivation of thetic sentences in general, a correlation that is not made in previous studies.

Notes
1. We thank Marcello Marcellino Rosa for his revision of English. We thank the audience at the International Conference on Linguistic Evidence held in Tübingen and an anonymous reviewer for their valuable comments.
2. Jacinto Lucas Pires: Arranha-céus. Lisboa: Cotovia, 1999. Carlos Alberto Machado: Transporte e mudanças. Lisboa: Frenesi, 2000.
3.
The form before the 20th obeyed the consecutio temporum, as in (i): (i) Quem foi que o Pedro viu? who was that the Pedro saw? ‘who was it that Pedro saw?’ 4. Roberts (1993) shows that certain Italian dialects underwent a change similar to Old French, namely loss of V2, but nevertheless retained the NS property. 5. In written language, this difference is neutralized as Left Dislocation is banned by stylistic rules. Like in English both categorical and thetic sentences are codifies as SVO. However, in the Brazilian vernacular the distinction is very sharp even among educated speakers. 6. See also Barbosa (1995), who has similar points, but who maintains pro instead of postulating the affix as subject. 7. In a work in progress, we claim that EP can optionally choose to have the verb inflected for person in contexts where it allows the variation of the order VS or SV. The order is determined by the scope of focalization. If the whole subordinate clause is focalized, the order is SV with proclisis, as in (i) where (ib) answers the question (ia); if only the subject of the subordinate clause is the focus, the order is V(X)S, as in (ii) where (iib) answers the question (iia): (i) a. O que disse o menino? what said the boy ‘What did the boy say?’ b. Disse que um gato lhe roubou o bolo. said-3sg that the cat 3pscl stole the cake. ‘The boy said that a cat stole the cake from him.’ (ii) a. Quem disse o menino que lhe roubou o bolo? who said the boy that [from him]-stole-3pl the cake ‘Who did the boy say stole the cake from him?’ b. Disse que lhe roubou o bolo um gato . said-3sg that stole [from him] the cake a cat It seems, moreover, that this possibility can be extended to root clauses for some speakers (maybe the younger generation), who accept thetic sentences with definite preposed subjects, and doubling of the sort found in BP for categorical sentences (cf Costa 2001). 8. BP is a subtype of NS language, but in this respect it behaves exactly like a non-NS language. 9. The rule is a norm for literate people. 10. While for Kayne and Pollock the element in TopP is moved from the subject position, in our analysis it is merged in Spec TopP, with the pronominal affix in EP or the pronoun in BP doubling it.
326 Mary Aizawa Kato und Carlos Mioto References Adams, Marianne 1987 Old French, Null Subjects, and Verb Second Phenomena. UCLA: Ph. D. Dissertation. Ambar, Manuela 1988 Para uma Sintaxe da Inversão Sujeito-Verbo em Português. Lisboa: Ed. Colibri. 2000 Wh-Assimetries. Going Romance. Ambar, Manuela and Rita Veloso 1999 On the nature of wh-phrases – wh-in-situ and word order”. In: Johan Rooryck (org), John Benjamins, in press. Barbosa, Maria Pilar 1995 Null Subjects. MIT: Ph.D. Dissertation. 2001 On inversion in wh-questions in Romance. In: A. Hulke & J-Y Pollock (eds) Romance Inversion. New York: OUP, 2-59. Berlinck, Rosane A. 1995 La Position du Sujet en Portugais: étude diachronique des varietés brésilienne et européenne. Katholieke Univeritei te Leuven: Ph.D. Dissertation. Britto, Helena 2000 Syntactic codification of categorical and thetic judgments in BP. In Mary A. Kato & Esmeralda V. Negrão (eds) Brazilian Portuguese and the Null Subject Parameter. Frankfurt am Main: Vervuert Verlag,,195-222. Chomsky, Noam A. 1995 The Minimalist Program. Cambridge, Mass: MIT Press. Costa, João 1998 Order Variation. A constraint-based approach. Doctoral dissertation. HIL/Leiden University 2001 Spec, IP ou deslocado? Prós e contras das duas análises dos sujeitos pré-verbais. D.E.L.T.A., 17(2), 283-303. Duarte, Inês Silva 2000 Português europeu e Português brasileiro: 500 anos depois, a sintaxe. Paper presented at Congresso Internacional dos 500 Anos de Língua Portuguesa, Évora, Portugal. Duarte, Maria Eugênia 1992 A perda da ordem V(erbo) S(ujeito) em interrogativas qu- no português do Brasil. D.E.L.T.A., Número Especial, 37-52. 2000 The loss of the Avoid Pronoun Principle in Brazilian Portuguese. In: Mary A.Kato & Esmeralda V. Negrão. (orgs.) Brazilian Portuguese and the Null Subject. Frankfurt am Main: Vervuert Verlag, 17-36.
Kato, Mary A. 1999 Strong pronouns, weak pronominals and the null subject parameter. PROBUS, 11 (1), 1-37. Kato, Mary A.and Eduardo P. Raposo 1996 European and Brazilian word order: questions, focus and topic constructions. In Claudia. Parodi, Antônio C. Quicoli, Mario Saltarelli & Maria L. Zubizarreta (eds) Aspects of Romance Linguistics. Washington: Georgetown U.Press, 267-277. Kato, Mary A. and Maria Eugênia Duarte in press A diachronic analysis of Brazilian Portuguese wh-questions. In: Eduardo P. Raposo (ed) Romance Philology. Santa Barbara: U of California. Kayne, Richard and Jean-Yves Pollock 1998 New thoughts on stylistic inversion Paper presented at the conference Word Order in Romance, Amsterdam. Kroch, Anthony 1994 Morphosyntactic variation. In: K. Beals et allii (eds) Papers from the XXXth Regional Meeting of the Chicago Linguistic Society: Parassession on Variation and Linguistic Theory, 180-201. Kuroda, S-Yuki 1972 The categorical and the thetic judgment. Foundations of Language, 9:153-185. Lightfoot, David 1999 The Development of Language: acquisition, change and evolution. Oxford: Blackwell. Lobato, Lucia.M.P. 1988 Sobre a regra de anteposição do verbo no português do Brasil. D.E.L.T.A., 4,121-147. Lopes-Rossi, M.Aparecida 1993 Estudo diacrônico sobre as interrogativas do português do Brasil. In Ian Roberts & Mary A. Kato (eds.) Português Brasileiro: uma viagem diacrônica. Campinas: Editora da UNICAMP, 307-342. 1996 As Orações Interrogativas-Q no Português do Brasil : um estudo diacrônico. UNICAMP: Ph. D.Dissertation. Martins, Ana M. 1996 Clíticos na História do Português. Universidade de Lisboa: Ph. D. Dissertation. Moro, Andrea 1991 The raising of predicates: copula, expletives and existence. In: Lisa S. Cheng & H. Demirdache. MITWPL, 15, 183-218.
328 Mary Aizawa Kato und Carlos Mioto Nunes, Jairo M. 1993 Direção de cliticização, objeto nulo e pronome tônico na posição de objeto em português brasileiro. In: Ian Roberts & Mary A. Kato (eds) Português Brasileiro: uma viagem diacrônica. Campinas: Editora da UNICAMP, 223-262. Raposo, Eduardo P. 1994 Affective operators and clausal structure in European Portuguese and European Spanish. Ms. UCSB, CA. Ribeiro, Ilza 1995 Evidence for a Verb-Second Phase in Old Portuguese. In: Adrian Battye & Ian Roberts (eds.). Clause Structure and Language Change. New York: Oxford University Press, 110-139 Rizzi, Luigi 1996 Residual verb-second and the wh-criterion. In: Adriana Belletti and Luigi Rizzi (eds). Parameters and functional heads: essays in comparative syntax. Oxford/New York: OUP, 63-90. 1997 The fine structure of the left periphery.In: Liliana Haegeman (ed.) Elements of Grammar. Handbook of Generative Syntax. Dordrecht: Kluwer, 281-337. Roberts, Ian 1993 Verbs and Diachronic Syntax. Dordrecht: Kluwer. Torres Morais, Maria A. 1993 Aspectos diacrônicos do movimento do verbo, estrutura da frase e caso nominativo no português do Brasil.In: Ian Roberts & Mary A. Kato (eds) Português Brasileiro: uma viagem diacrônica. Campinas: Editora da UNICAMP, 263-306. Zubizarreta, Maria Luiza 1998 Prosody, Focus and Word Order. Cambridge, Massachusetts/ London: The MIT Press.
The Relationship between Grammaticality Ratings and Corpus Frequencies: A Case Study into Word Order Variability in the Midfield of German Clauses Gerard Kempen and Karin Harbusch
1
Introduction
It is almost a commonplace that word order in the midfield of German clauses is flexible. Although statements to this effect do not claim that “anything goes” (cf. Eisenberg (1994, Ch. 12); Rambow (1994); Müller (1999)), they suggest that word-order variability in German clauses is considerably greater than, for example, in Dutch and English ones. The following three constraints on the ordering of argument NPs have figured prominently in the linguistic and psycholinguistic literature (cf. Uszkoreit 1987; Pechmann et al. 1996; Keller 2000):
C1: Pronominal ≺ Non-pronominal
C2: Nominative ≺ Non-nominative, and
C3: Dative ≺ Accusative
(where “≺” means “precedes”). Recently, intuitions about word order in German have been probed in a systematic fashion by Keller (2000). From linguistically untrained native speakers he elicited “graded grammaticality” judgments via a novel technique based on the psychophysical method of Magnitude Estimation (Bard et al. 1996). In one of his experiments, he determined the strengths of C1, C2 and C3 in ditransitive subordinate clauses. None of the three “precedence” constraints turned out to be “absolute” in the sense that their violation gave rise to extremely low acceptability/grammaticality judgments (as low as violation of the absolute verb-final constraint in subordinate clauses did). In fact, Keller found C1 and C2 to have about equal strength and both to be considerably stronger than C3, which was very weak. If linear order constraints such as C1 through C3 and their relative strengths are “psychologically real”, they are expected to affect the linearization of argument NPs in actual language performance, i.e. during speaking and writing. More precisely, they yield predictions concerning the relative
frequencies of argument orderings in written and spoken texts. For example:
– Linear orders that agree with a given constraint will be more frequent than orders violating it.
– Stronger constraints will give rise to a lower proportion of violations than weak constraints.
– Orders that violate more than one constraint will be rarer than violations of single constraints.
In earlier work (Kempen and Harbusch 2003), we verified that this approach is viable in principle. We hypothesized that the strength of a precedence constraint reflects the likelihood that it would actually be respected in the course of the incremental production of a sentence. On this basis, and with a somewhat different set of precedence constraints than the above, we developed a probabilistic model that predicts relative frequencies of occurrence of argument NP orderings in real texts. We assumed, furthermore, that the predicted frequency of an argument ordering corresponds directly to its rated grammaticality. The model we thus obtained yielded a satisfactory fit with Keller’s ratings. However, this proof of concept hinges on the assumption of a close correspondence between frequency and rated grammaticality. As no frequency counts are available of argument NP orderings in clauses of the same type as those presented to the participants in Keller’s experiments, we decided to supply this want. In the course of this corpus work, a systematic discrepancy emerged between the frequency counts and the grammaticality ratings. We had expected argument orderings in the middle range of the grammaticality spectrum to occur in the corpus with moderate frequencies, but they were conspicuously absent. The level of flexibility emerging from the frequency counts thus was considerably lower than grammaticality judgments suggest. The discovery of this frequency-grammaticality discrepancy spawned the investigation we report here.
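The general idea behind such a model can be illustrated with a small sketch. The code below is not the model of Kempen and Harbusch (2003); it only shows the kind of computation involved, with an invented constraint inventory and invented strengths, where each strength is read as the probability that a constraint is respected and an ordering’s predicted relative frequency is the product of the corresponding factors:

    from itertools import permutations

    # Illustrative strengths only (probability that a constraint is respected).
    STRENGTHS = {
        "pronominal_before_full": 0.9,    # cf. C1
        "nominative_first": 0.9,          # cf. C2
        "dative_before_accusative": 0.6,  # cf. C3
    }

    def satisfies(order, constraint):
        # order: tuple of (grammatical function, is_pronominal) in surface order
        pos = {f: i for i, (f, _) in enumerate(order)}
        pron = dict(order)
        if constraint == "pronominal_before_full":
            return all(pos[a] < pos[b]
                       for a in pos for b in pos
                       if pron[a] and not pron[b])
        if constraint == "nominative_first":
            return "S" not in pos or pos["S"] == 0
        if constraint == "dative_before_accusative":
            return not ("I" in pos and "O" in pos) or pos["I"] < pos["O"]
        raise ValueError(constraint)

    def predicted_weight(order):
        # Multiply the strength of each respected constraint with the
        # complement of each violated one.
        w = 1.0
        for constraint, strength in STRENGTHS.items():
            w *= strength if satisfies(order, constraint) else 1.0 - strength
        return w

    # Rank the six orderings of a full subject, indirect and direct object.
    arguments = (("S", False), ("I", False), ("O", False))
    for order in sorted(permutations(arguments), key=predicted_weight, reverse=True):
        print([f for f, _ in order], round(predicted_weight(order), 3))

Under such a scheme, an ordering violating only the weak constraint (e.g. S–O–I) receives a higher predicted frequency than one violating a strong constraint, which is the qualitative pattern the predictions above require.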
ings and shows that the latter license more word order freedom than the former. Section 6 proposes a theoretical account of the observed frequency-grammaticality discrepancy. Section 7 introduces a recent argument ordering model from the linguistic literature (Müller 1999) which is almost as strict as our linearization rule although it is based entirely on grammaticality judgments, and explains why it does not square well with Keller's data. Section 8, finally, summarizes the line of reasoning.
2 The corpus study
We conducted a corpus study into the frequencies of linear orders of pronominal and non-pronominal Subject (S, nominative), Indirect Object (I, dative) and Direct Object (O, accusative) NPs in German subordinate clauses. These clause types include those used by Keller (2000) in his grammaticality rating studies. We needed these data in order to determine to what extent graded grammaticality ratings are mirrored by the frequency of argument ordering patterns in sentence materials generated outside the laboratory. We sought an answer to this question in written as well as spoken texts.
2.1 Written language: The NEGRA II and TIGER treebanks
2.1.1 Method
Recently, the NEGRA II (Skut et al. 1997) and TIGER corpora (Brants et al. 2002) have become available – German treebanks that together contain about 60,000 newspaper sentences annotated in full syntactic detail. The data we report here have been aggregated over both corpora. Using version 2.1 of TIGERSearch (K¨onig and Lezius 2000), we extracted all finite clauses introduced by a subordinating conjunction and containing an (S,I) and/or an (S,O) pair, possibly with an additional (I,O) pair (with the members of a pair occurring in any order). As for terminology, clauses containing only an (S,I) pair are labeled intransitive; clauses with only an (S,O) pair are monotransitive; a clause with an (S,O) as well as an (I,O) pair is ditransitive; both latter types of clauses are called transitive. We found 2578 monotransitive, 287 intransitive, and 199 ditransitive subclauses meeting the requirements (3064 sentences in all). We distinguished six types of NPs: three pronominal (labeled (Sp, Ip, Op) and three non-pronominal or “full” (S, I, O). An NP is pronominal iff it consists of a personal or a reflexive pronoun. Because a clause contains at most one token of each of the three types of grammatical function, there are 12 possible unordered pairs of NPs: three combinations of grammatical functions ((S,I), (S,O) and (I,O)) times
four combinations of NP shapes (all pronominal, one member full, the other member full, all full). For each of these, we determined the frequency of the two possible orderings (i.e., of 24 ordered pairs). For additional methodological details, we refer to Kempen and Harbusch (2004a).
2.1.2 Results
There were 3462 ordered pairs: one from each mono- or intransitive clause; three from every ditransitive clause (Table 1). Table 1. Frequency of the 24 ordered pairs of argument NPs in subordinate clauses extracted from the NEGRA II and TIGER corpora. Dark-gray cells represent syntactically inadmissible constituent pairs.
First NP      Second NP
              Sp     Op     Ip      S      I      O    Total
Sp             –    146     49      –     72    654      921
Op             0      –      5    302     23      –      330
Ip             0      0      –     89      –     93      182
S              –    195     41      –    182   1476     1894
I              0      2      –     53      –     67      122
O              0      –      0      4      9      –       13
Total          0    343     95    448    286   2290     3462
(Cells marked – are the syntactically inadmissible constituent pairs.)
What is immediately striking is the high proportion of (almost) empty cells. This suggests a level of flexibility that is not very high. The ordering of the pronominal constituents is invariably Sp–Op–Ip, and S is the only full NP that may precede a pronominal NP. Variability within full NPs is somewhat greater: While S–I–O is the predominant order, inversions do occur regularly. Closer inspection of the data reveals, however, that the inverted order I–S is restricted to clauses with intransitive verbs (more precisely, to “experiencer–object” verbs as in dass [dem Jungen] I [etwas]S widerf¨ahrt; that [the boy] something happens-to; ‘that something happens to the boy’), and that the sequence O–I only occurs as standard order licensed by special ditransitive verbs (cf. [jemanden]O [einer Pr¨ufung]I unterziehen; someone [a test] subject-to; ‘subject someone to a test’). See Section 5 for additional data. Before going into the reasons for the restricted variability, we need to assess whether it generalizes to other text types.
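For readers who want to replicate this kind of tally, the sketch below shows one way counts like those in Table 1 could be accumulated, assuming the relevant clauses have already been extracted and each argument NP is available as a label Sp/Op/Ip (pronominal) or S/I/O (full) in surface order. The record format and variable names are our own illustration, not the authors' TIGERSearch pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one record per extracted clause, listing its argument NPs
# in surface order. These example records are invented.
clauses = [
    ["Sp", "O"],       # monotransitive with pronominal subject
    ["S", "I", "O"],   # ditransitive with three full NPs
    ["Sp", "Ip", "O"],
]

pair_counts = Counter()
for args in clauses:
    # every pair of clausemate arguments contributes one ordered pair,
    # in the order in which the two NPs occur in the clause
    for first, second in combinations(args, 2):
        pair_counts[(first, second)] += 1

for (first, second), n in sorted(pair_counts.items()):
    print(f"{first} before {second}: {n}")
```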
2.2 Spoken language: The VERBMOBIL dialogue corpus
The NEGRA II and TIGER corpora consist entirely of written newspaper texts. These texts have probably been heavily edited by authors and editors. In the course of this process, many argument orderings that embodied low-frequency patterns may have been eliminated and replaced by alternatives in the high-frequency range. We therefore deemed it necessary to explore a text type that has undergone no off-line editing. The VERBMOBIL corpus of spoken dialogues (Burger et al. 2000) lent itself very well to this purpose.
2.2.1 Method
Having extracted all relevant subordinate clauses from the transcribed VERBMOBIL dialogues,1 we classified the grammatical functions (S, I, O) by hand.
2.2.2 Results
We found 2711 monotransitive, 296 intransitive, and 168 ditransitive subordinate clauses meeting the requirements (3175 clauses in all). See Table 2 for the 3511 ordered argument pairs. Section 5 gives additional details. Table 2. Frequency of the 24 ordered pairs of argument NPs in subordinate clauses extracted from the VERBMOBIL corpus. Dark-gray cells represent syntactically inadmissible constituent pairs.
First NP      Second NP
              Sp     Op     Ip      S      I      O    Total
Sp             –    687    263      –      7   2003     2960
Op             2      –      4     40      1      –       47
Ip             0      2      –    174      –    151      327
S              –     13     18      –      2    132      165
I              0      0      –      0      –      4        4
O              1      –      5      1      1      –        8
Total          3    702    290    215     11   2290     3511
(Cells marked – are the syntactically inadmissible constituent pairs.)
Inspection of Table 2 shows that the frequency pattern is very similar to that in Table 1. The number of empty or almost empty cells is again rather high, and they occupy the same positions in the two Tables. We conclude that limited flexibility of argument NP orderings is a rather widespread phenomenon occurring in spoken as well as written language.
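The observation that the two tables share the same (almost) empty cells can be made concrete with a small helper such as the one below; it is our own illustration, the nested-dict representation and the threshold of five tokens are arbitrary choices, and only a handful of cells from Tables 1 and 2 are shown.

```python
# Illustrative sketch: report which admissible first/second-NP combinations are
# (almost) unattested in a frequency table of ordered argument pairs.
def sparse_cells(table, threshold=5):
    """table: dict mapping (first, second) -> frequency for admissible pairs."""
    return sorted(pair for pair, freq in table.items() if freq < threshold)

# A few cells from Table 1 (written) and Table 2 (spoken), for illustration only.
subordinate_written = {("Sp", "Op"): 146, ("Op", "Sp"): 0, ("S", "Op"): 195, ("Op", "S"): 302}
subordinate_spoken  = {("Sp", "Op"): 687, ("Op", "Sp"): 2, ("S", "Op"): 13,  ("Op", "S"): 40}

print(sparse_cells(subordinate_written))   # [('Op', 'Sp')]
print(sparse_cells(subordinate_spoken))    # [('Op', 'Sp')] -- same near-empty cell
```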
In the next Section, we propose a linearization rule that fits the observed pattern rather accurately.
3 A production-based linearization rule
The frequency data can be accounted for by the rather strict "production-based linearization rule" in Figure 1 (see Note 2).
[Figure 1 diagram: the primary order Sp — Op — Ip — S — I — O, with labeled arrows indicating the secondary, more anterior placement options of the full NPs S, I and O, conditional on clause type (transitive, intransitive, ditransitive); see the caption and the following paragraph.]
Figure 1. “Production-based linearization rule” representing the linearization options observed in the written and spoken corpora in clauses headed by a mono-, di-, or intransitive head verb. Transitive clauses include both mono- and ditransitive ones.
To each individual argument NP, the rule assigns a standard ("primary") position among its clausemates. The pronominal NPs have a fixed position in the anterior region of the midfield. (This region is traditionally called "Wackernagel position".) The primary position of full NPs is posterior to that of the pronouns. However, each full NP has an additional "secondary", more anterior placement option, which is indicated by the labeled arrows. This "freedom in restraint" is conditional upon mono-, di-, or intransitivity of the head verb. Mild conceptual factors such as animacy (Kempen and Harbusch 2004a), definiteness (Kurz 2000; Kempen and Harbusch 2004b) and referential ease (yielding "light" NPs; cf. Hawkins 2004; Wasow 2002; Kempen and Harbusch 2004b) enable full argument NPs to occupy the more leftward position. Of the 6239 written and spoken subordinate clauses extracted from the corpora, only 17 exemplars violate the production-based rule (0.27%). We list these exceptions in extenso in Table 3 (see Note 3).
Table 3. Exceptions to the production-based linearization rule in the NEGRA II, TIGER, and VERBMOBIL corpora. The source of each clause has been added to the reference number (N=NEGRA II, T=TIGER, V=VERBMOBIL).
Monotransitives
(1-N) da [diese Aufgabe]O [der Landrat]S selbst übernehmen wolle
(2-T) Seit [das]O [die Ärzte]S in der Heimat wissen
(3-T) sofern [dies]O [eine der Seiten]S wünscht
(4-T) Wenn [Freund wie Feind]O nun [eine Frage]S bewegte
(5-V) weil [das]O ja [die Firma]S trägt, die Kosten
(6-V) weil [den Totensonntag]O [ich]Sp nicht für sehr geeignet halte
(7-V) nachdem [sich]Op [es]Sp ja nur um eine kurze Besprechung handelt
(8-V) weil [sich]Op [es]Sp um ein fünftägiges Arbeitstreffen handelt
Ditransitives
(9-T) Wenn [er]Sp [der Pest]I nicht schleunig [uns]Op entreißt
(10-T) daß [es]Sp [den Menschen]O [sich]Ip selbst entfremde
(11-V) damit [wir]Sp [den Flug]O [uns]Ip danach einrichten können
(12-V) daß [wir]Sp vielleicht auch [ein paar Sehenswürdigkeiten]O [uns]Ip anschauen
(13-V) daß [wir]Sp [die achtunddreißigste Woche]O [uns]Ip dafür ausgucken
(14-V) daß [wir]Sp [Donnerstag vormittag]O einfach [uns]Ip vornehmen
(15-V) wenn [wir]Sp schon [so viel]O [uns]Ip anschauen
(16-V) wenn [ich]Sp [mir]Ip [es]Op so recht überlege
(17-V) wenn [Sie]Sp [uns]Ip [sie]Op zuschicken könnten
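To make the rule's core restrictions concrete, here is a rough sketch of how one might check an argument ordering against them. This is our own illustration, not the authors' implementation: it encodes only the fixed Sp–Op–Ip order and the ban on a full I or O preceding a pronominal argument, and it ignores the clause-type-conditioned secondary placements of full NPs.

```python
# Simplified checker for the core restrictions of the production-based
# linearization rule (our own sketch; secondary placements not modelled).
PRIMARY = ["Sp", "Op", "Ip", "S", "I", "O"]

def respects_core_rule(ordering):
    """ordering: list of argument labels in surface order, e.g. ['Sp', 'O', 'I']."""
    rank = {label: i for i, label in enumerate(PRIMARY)}
    # (a) pronominal arguments must keep their fixed relative order Sp < Op < Ip
    pronouns = [a for a in ordering if a in ("Sp", "Op", "Ip")]
    if pronouns != sorted(pronouns, key=rank.get):
        return False
    # (b) apart from a full subject S, no full NP may precede a pronominal NP
    for i, first in enumerate(ordering):
        for second in ordering[i + 1:]:
            if first in ("I", "O") and second in ("Sp", "Op", "Ip"):
                return False
    return True

print(respects_core_rule(["Sp", "Op", "I", "O"]))   # True
print(respects_core_rule(["I", "Sp", "O"]))          # False: full I precedes pronominal subject
```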
4 Generalizing the production-based linearization rule to main clauses
If the production-based rule really determines the order of argument NPs in the midfield, it should generalize to main clauses with inversion of Subject and finite verb. In order to test this prediction, we checked the ordering patterns in mono- and ditransitive main clauses in the NEGRA II and TIGER treebanks. We found 5025 ordered pairs: one from each mono- or intransitive clause; three from every ditransitive clause. Comparison of Tables 4 and 1 reveals a very similar frequency distribution over the cells. The most salient exceptions to the linearization rule involve extraposed full Subject NPs. We found 20 (out of 5025) O–S pairs, which is somewhat higher than in the subordinate clauses (4 out of 3462 pairs). In a high proportion of these cases, the Subject NP is extremely long. An example:
– Für die neue Konzertsaison, die am 25. September beginnt, erwarten [das Publikum]O wieder [einige Höhepunkte, die der musikalische Leiter der
Table 4. Frequency of the 24 ordered pairs of argument NPs in the midfield of main clauses extracted from the NEGRA II and TIGER corpora. Dark-gray cells represent syntactically inadmissible constituent pairs.
First NP      Second NP
              Sp     Op     Ip      S      I      O    Total
Sp             –    153     59      –    118    937     1267
Op             0      –      4    894     16      –      914
Ip             0      2      –    171      –     99      272
S              –     37     19      –    201   2134     2391
I              0      0      –     79      –     75      154
O              0      –      0     20      7      –       27
Total          0    192     82   1164    342   3245     5025
(Cells marked – are the syntactically inadmissible constituent pairs.)
Konzerte, Michael Schneider, unter genauen Vorgaben zusammengestellt hat]S.
This also applies to four rule-violating Op–I–S main clauses (see the corresponding bar in the bottom panel of Figure 3 below). This is one of them:
– Hier bot [sich]Op [den 6 bis 12jährigen]I [die Gelegenheit, die großen Düsenflugzeuge auf dem Rollfeld zu beobachten und die Flughafenfeuerwehr zu besuchen]S.
We assume that these orderings are due to the overriding influence of "end weight" (cf. Hawkins 2004; Wasow 2002). Whether they are real exceptions is disputable, though, because postposed constituents often end up in the endfield of a clause. In the absence of a nonfinite verb separating the midfield from the endfield ("rechte Satzklammer"), one cannot tell which field the extraposed constituent has selected. In sum, the NEGRA II and TIGER data justify generalization of the production-based linearization rule to argument orderings in main clauses. The fact that clauses independently selected from the three corpora appear to respect the rule may reinforce trust in it and compensate for the disadvantage of the relatively small corpus sizes.
5 Comparing grammaticality judgments and frequency data
In his Experiment 6, Keller (2000) elicited grammaticality ratings for ditransitive subordinate clauses where at most one constituent is pronominal. This yields four (what we will call) "families" of ditransitive clauses: one family
with all NPs full (S,I,O) and three families with one pronominal NP ((Sp,I,O), (S,Ip,O) and (S,I,Op)). Each member of a family represents one possible ordering of the NPs. Hence, every ditransitive family comprises six members (i.e., six argument permutations). Experiment 10 dealt with monotransitive subordinate clauses, but here both S and O could be pronominal. Thus there are four monotransitive families — (S,O), (Sp,O), (S,Op) and (Sp,Op) — with two members each. It is important to keep the clause families separate because a family represents a unique combination of grammatical functions and (non-)pronominal NP shapes. This allows a view of linear order preferences uncontaminated by production factors that control the choice between a pronominal or non-pronominal NP shape. The ratings for the ditransitive clause families are shown in Figures 2 and 3, together with the corresponding corpus frequencies; Figure 4 displays ratings and frequencies for the monotransitive families. The grammaticality ratings of the NP orderings decrease from left to right. (The exact values are listed in the third column of Tables 5 and 6.) 4 The Figures reveal a substantial correlation between the grammaticality rating and the corpus frequency of the members of a clause family: in all mono- and ditransitive clause families, the most frequent members are the ones that received the highest ratings; and members with very low ratings are absent from the corpus. However, quite a few orderings that are rated at least average in grammatical quality also have zero corpus frequencies. Consider, in particular, the ditransitives to the immediate right of the thick vertical lines in Figures 2 and 3. In other words, the grammaticality judgments tend to be more lenient than the corpus data. 5 In the next Section we propose an explanation for this discrepancy. 6 Severity of rule violation affects corpus frequencies and grammaticality ratings differently Let us assume that the internal grammar of native speakers of German includes our production-based linearization rule — in other words, that the rule is “psychologically real”. This implies that under normal conditions the grammatical encoding mechanisms operative in these speakers of German will function according to this rule, and sentences violating it will rarely occur. When performing the grammaticality judgment task and detecting an argument ordering that conflicts with the rule, the informants can react in two different ways: – They reject any ill-formed linear order and assign the same low grammaticality score to all clauses embodying a deviant order.
Figure 2. Grammaticality ratings for the (S,I,O) and (Sp,I,O) families in the first and the third panel, respectively. Data from Keller’s (2000) Experiment 6. The corresponding bar diagrams display the relative corpus frequencies for the argument orderings, expressed as percentages of the total number of ditransitive clauses in their family. In the topmost bar diagram, the bar for the (S,I,O) family in the VERBMOBIL corpus is missing due to the nonoccurrence of such clauses in that corpus.
Figure 3. Grammaticality ratings for the (S,Ip,O) and (S,I,Op) families in the first and third panel, respectively. Data from Keller's (2000) Experiment 6. The corresponding bar diagrams display the relative corpus frequencies for the argument orderings, expressed as percentages of the total number of ditransitive clauses in their family.
Figure 4. Upper panel: grammaticality ratings for all families of monotransitive subordinate clauses, from Keller’s (2000) Experiment 10. Lower panel: relative corpus frequencies for the same families of clauses, expressed as percentages of the total number of monotransitive clauses in their family.
– To ill-formed clauses they assign a grammaticality score commensurate with the severity of rule violation, i.e. with number and seriousness of the deviations. In the latter case, the grammaticality score an argument order receives is predicted to be a function of the similarity between this order and the order(s) licensed by the production-based linearization rule. In order to evaluate the viability of this hypothesis, we derived from it a simple similarity metric. Arguments that prefer an early position in the clause are the Subject NP (full or pronominal) and the pronominal NPs. Mono- and ditransitive orderings where these constituents indeed occupy early positions are therefore more similar to the rule (closer to the “prototype”) than orderings where these constituents have moved to the right. Calculation of the average ranking number of the Subject and pronominal NPs in an argument ordering yields an “anteri-
Table 5. Predicting the grammaticality ratings for ditransitive clauses. Columns 2 and 3: grammaticality rank order and values (from Keller's Experiment 6). Columns 4 and 5: corpus frequencies. Columns 6 and 7: anteriority ranks. Column 8: anteriority score (mean rank).

Ordering pattern  Rank  Grammaticality  Freq. (written)  Freq. (spoken)  Subject  Pronoun  Mean
S–I–O               1       .2083             54               0            1        –       1
S–O–I               2       .0994              5               0            1        –       1
I–S–O               3      -.0716              0               0            2        –       2
O–S–I               4      -.2038              0               0            2        –       2
I–O–S               5      -.2667              0               0            3        –       3
O–I–S               6      -.2736              0               0            3        –       3
Sp–O–I              1       .1519              4               1            –        1       1
Sp–I–O              2       .1386             13               4            –        1       1
I–Sp–O              3      -.1463              0               0            –        2       2
O–Sp–I              4      -.2081              0               0            –        2       2
I–O–Sp              5      -.2936              0               0            –        3       3
O–I–Sp              6      -.3471              0               0            –        3       3
S–Ip–O              1       .1471             30               6            1        2       1.5
Ip–S–O              2       .1144             29               4            2        1       1.5
S–O–Ip              3      -.0516              0               0            1        3       2
O–Ip–S *            4      -.2164              0               0            3        2       2.5
Ip–O–S *            5      -.2612              0               0            3        1       2
O–S–Ip              6      -.2810              0               0            2        3       2.5
S–Op–I              1       .1938              3               1            1        2       1.5
Op–S–I              2       .1235             12               0            2        1       1.5
S–I–Op              3      -.1876              0               0            1        3       2
Op–I–S              4      -.2247              0               0            3        1       2
I–S–Op              5      -.2694              0               0            2        3       2.5
I–Op–S              6      -.3550              0               0            3        2       2.5
* The one pair of rows where the grammaticality rank order and the anteriority rank order are reversed (shaded gray in the source table).
ority” score for each member of a family of clauses. We predict high negative correlations between anteriority and grammaticality: the lower the anteriority score of an ordering (that is, Subject and pronominal NPs in early positions), the higher the grammaticality score. Anteriority values can be calculated for both the mono- and the ditransitive clauses. Comparison of the resulting scores with the grammaticality ratings for each member of a family of clauses — see Table 5 and Table 6, columns 2 and 8 — exhibits a high rank correlation. In only one case (marked in gray),
Table 6. Predicting the grammaticality ratings for monotransitive clauses. Columns 2 and 3: grammaticality rank order and values (from Keller's Experiment 10). Columns 4 and 5: corpus frequencies. Columns 6 and 7: anteriority ranks. Column 8: anteriority score (mean rank).

Ordering pattern  Rank  Grammaticality  Freq. (written)  Freq. (spoken)  Subject  Pronoun  Mean
S–O                 1       .3818           1358             122            1        –       1
O–S                 2       .1078              4               1            2        –       2
Sp–O                1       .4180            603            1852            1        1       1
O–Sp                2      -.0887              0               1            2        2       2
S–Op                1       .2482            189              12            1        2       1.5
Op–S                2       .2412            290              40            2        1       1.5
Sp–Op               1       .3024            134             681            1        1       1
Op–Sp               2      -.1071              0               2            2        1       1.5
the rank orders are reversed. We conclude that the average similarity between to-be-judged argument orderings and orderings licensed by the production-based rule is a good predictor of the grammaticality ratings. Stated differently, the grammaticality ratings appear sensitive to the number and seriousness of violations of the rule. Argument orderings that embody mild violations of the rule, receive medium-range grammaticality scores due to this sensitivity but are virtually absent from the corpora because the grammatical encoding mechanism in speakers/writers does not (or hardly ever) produce them. 7 Graded grammaticality versus “graded ungrammaticality” In the foregoing we have tried to model the data we extracted from the spoken and written corpora as faithfully and concisely as possible, with the “production-based linearization rule” as the central outcome. We arrived at this rule independently from M¨uller’s (1999) system of word order constraints which resembles ours in many respects. The following properties of his model represent points of similarity: – The midfield includes an early region where only pronominal arguments can go — the so-called Wackernagel position. – Pronouns in the Wackernagel position do not scramble (i.e., their order is fixed).
– Full NPs occupy a region posterior to the Wackernagel position. – In the posterior region, scrambling is allowed. – Of the full NPs, only the Subject is allowed to precede the Wackernagel position. There are only a few differences: – A full Subject that precedes one or more pronominal NPs, is placed to the left of the Wackernagel position, whereas in our rule it goes to the same position as pronominal Subjects. – Scrambling of full NPs is slightly more liberal in M¨uller’s model. For example, full Direct Objects may precede full Subjects in the postWackernagel region. The similarity between the two rule systems is surprising because the empirical evidence M¨uller adduces in support of his model is not production- or corpus-based but consists entirely of grammaticality judgments. Given the frequency-grammaticality discrepancies we observed above, one would expect a rather poor fit between a model based on judgments and one based on production data. Keller (2000), too, notes that his ratings are at variance with M¨uller’s model. So, how to account for the convergence of a “performancebased” and a “competence-based” linearization component? Our answer presupposes that, somewhere on the grammaticality continuum ranging from “perfectly well-formed” to “seriously ill-formed”, there is a critical value called the “production threshold”. Syntactic structures with grammaticality values above this threshold will occur in corpora with moderate-to-high frequencies. Structures whose grammaticality scores lie slightly above or slightly below the production threshold, will have zero or, at best, very low frequencies — they are “marked”. Structures with even lower grammaticality ratings are only delivered in case of a malfunctioning production mechanism or deliberate output distortion. The argument orderings licensed by our linearization rule or by M¨uller’s grammar presumably have grammaticality values that exceed the production threshold or are in its vicinity; they all represent unmarked or marked cases. Our performance-based rule converges with M¨uller’s competencebased grammar because both aim to model linear orders above and around the production threshold. Where on the grammaticality continuum should we place the clauses of Keller’s experiments? While high-grammaticality clauses indubitably exceed
344 Gerard Kempen and Karin Harbusch the production threshold, those with medium-range or low grammaticality ratings most probably are sub-threshold. If so, they are expected to have zero corpus frequencies. At any rate, a positive correlation between grammaticality and frequency may exist only for structures with grammaticality ratings above the production threshold. Apparently, “graded grammaticality” should be distinguished from “graded ungrammaticality”. Keller’s rating data probably stem in part from the latter domain, and the model he proposes aims to account not only for graded grammaticality but also for graded ungrammaticality. In contrast, M¨uller’s grammar, like our production-based rule, only covers graded grammaticality, i.e. the range of grammaticality ratings above and around the production threshold. The domain of grammaticality judgments that constitutes the empirical basis for M¨uller’s grammar, therefore differs from that investigated by Keller. This, in turn, implies that Keller’s data and the conclusions drawn from it (e.g., the relative strengths of constraints C1 through C3 in Section 1) should not be used directly to evaluate M¨uller’s grammar. The two empirical domains overlap only in part. We conclude this section with two remarks relating to M¨uller’s grammar. First, while our approach is purely descriptive, his grammar can explain important facts. For instance, we did no more than observe that the primary position of the full Subject NP follows rather than precedes pronouns in the Wackernagel position. M¨uller’s grammar is superior in that it links this property to other parts of the grammar. (For details we refer to his article.) Second, we propose to follow M¨uller (1999; endnote 11) in the treatment of “strong” pronominal arguments, that is, those carrying sentence accent or preceded by adverbs such as auch ‘also’, selbst ‘even’, nur ‘only’, etc. They function as full NPs and can occupy positions in the post-Wackernagel region of the midfield and be subject to scrambling. 8 Summary and conclusion We presented the results of several corpus studies into the frequencies of argument NP orders in the midfield of subordinate and main clauses of German. In both the written and the spoken corpora, the amount of argument order variation appeared to be relatively small. We devised a rather restrictive “production-based linearization rule” that describes the actually observed (“prototypical”) orderings. Comparison of the corpus frequencies and the grammaticality values from Keller’s (2000) study revealed a systematic discrepancy: the assignment of medium-range grammaticality scores to quite a few argument orderings with zero frequencies in the corpora.
In order to explain the frequency-grammaticality discrepancy, we posited three hypotheses: – The strict production-based linearization rule (or a mechanism yielding equivalent output) is causally involved in, and constrains, grammatical encoding during spoken and written sentence production. – The grammaticality raters in Keller’s study estimated the average similarity between the to-be-judged argument ordering and the order(s) licensed by the strict linearization rule. – The grammaticality continuum specifies a critical value called the “production threshold”. Structures with grammaticality values above this threshold will occur in corpora with moderate-to-high frequencies. Structures with grammaticality scores in the neighborhood of the production threshold, will have zero or, at best, very low frequencies — they are “marked”. Structures with even lower grammaticality ratings are only generated in case of malfunction of the grammatical encoder. We show that these assumptions, in addition to accounting for the frequency-grammaticality discrepancy, also help to resolve the seeming contradiction between strict and more lenient linguistic (judgment-based) rule/constraint systems for the linear order of argument NPs in the midfield of German main and subordinate clauses, in particular the one between M¨uller (1999) and Keller (2000). Notes 1. The corpus was accessed at URL http://www.ims.uni-stuttgart.de/projekte/verbmobil/Dialogs/ and explored by means of the on-line search tool made available at the website. In writing the search patterns, we heavily used the part-of-speech (PoS) tags included in the transcriptions. No other grammatical annotations are available. As the dialogues had been recorded with several different types of microphones, many dialogue turns occur more than once, although with different codes. We eliminated such duplications before estimating the corpus frequencies. 2. The rule allows only 15 out of 48 logically possible ditransitive permutations of a full or pronominal subject, a ditto direct object and a ditto indirect object. In addition, it licenses 5 monotransitive and 6 intransitive argument NP orders. 3. In order to enable the reader to judge the (un-)markedness of the rule-violating orderings in Table 3, we list here the complete sentences together with some preceding context. (In the VERBMOBIL corpus, the accessible context is restricted
346 Gerard Kempen and Karin Harbusch to the current dialogue turn.) (1-N) In den Ferien bietet sich dazu die Gelegenheit. Die Leiterin der Limesschule, Karola Kofler, stellte auf Anfrage klar, daß sie gebeten worden sei, die Eltern nicht zu informieren, da diese Aufgabe der Landrat selbst u¨ bernehmen wolle. ¨ (2-T) Der Andrang ist betr¨achtlich. Seit das die Arzte in der Heimat wissen, ist dem einen oder anderen Befragungsbogen, den die Behinderten vor Beginn der Reise u¨ ber ihren Gesundheitszustand ausf¨ullen lassen m¨ussen, nicht mehr 100prozentig zu trauen. ¨ (3-T) Die Parteien haben sich wie folgt geeinigt: 1. Die Ubergangszeit dauert zw¨olf Monate und kann einmalig um ein weiteres Jahre verl¨angert werden, sofern dies eine der Seiten w¨unscht. (4-T) 1967, als er im iraelisch-arabischen ‘Blitzkrieg” die pal¨astinensische Westh¨alfte seines “Reiches” einschließlich der Altstadt von Jerusalem verlor, und im “Schwarzen September” 1970, w¨ahrend des beinahe t¨odlichen Sturmes der PLO-Fedajiin auf die jordanische Monarchie. Wenn Freund wie Feind nun eine Frage bewegte, dann war es die um den Gesundheitszustand des Monarchen und um das Schicksal Jordaniens, wenn er einmal abgetreten sein wird. (5-V) ja, dem steht nichts entgegen, weil das ja die Firma tr¨agt, die Kosten. (6-V) da w¨urde ich sagen, entscheiden wir uns doch f¨ur den ersten Advent, weil den Totensonntag ich nicht f¨ur sehr geeignet halte. (7-V) gut, nachdem sich es ja nur um eine kurze Besprechung handelt, denke ich, daß uns eine Stunde reichen wird. also machen wir Montag, von elf bis zw¨olf sowas. Montag, der f¨unfte April dann. (8-V) guten Tag, Frau Diesner. ich habe Sie vorhin angerufen, weil ich wollte, daß wir Angesicht zu Angesicht einen Termin ausmachen. und weil sich es um ein f¨unft¨agiges Arbeitstreffen in der Filiale AGT R in Bonn handelt, dachte ich mir, ist am besten, wir sitzen uns hier gegen¨uber. und ich habe meinen Terminkalender mitgebracht und vielleicht k¨onnten Sie dann kucken, wie es bei Ihnen aussieht . (9-T) Hatte Fabian als Textgrundlage seiner nur schwer zu dechiffrierenden Bilderwelt einen eigenen Text mit dem vielsagenden Titel Das unsichtbare Genie geschrieben, so blieben in Thomas Roths Experiment nur diese Textsplitter aus dem Original u¨ brig: “Wenn er der Pest nicht schleunig uns entreißt, ... so steigt der Leiche eines ganzen Volkes dies Land , ein Grabesh¨ugel, aus der See”. (10-T) Der Megatrend der Medienentwicklung besch¨aftigte die meisten Referenten und Teilnehmer. Hauke Brunkhorst warnte vor kulturpessimistischer Aufgeregtheit – angefangen mit Musik und Theater bis zum Internet werde jedem Medium unterstellt, daß es den Menschen sich selbst entfremde. (11-V) ja , sollen wir uns dann vielleicht dar¨uber jetzt unterhalten, damit wir den Flug uns danach einrichten k¨onnen? (12-V) ja, auf jeden Fall. m¨ochte auch so ein bißchen was von Hannover sehen,
Grammaticality Ratings and Corpus Frequencies 347 daß wir vielleicht auch ein paar Sehensw¨urdigkeiten uns anschauen, w¨urde ich sagen? wenn Sie da Interesse dran haben? (13-V) ja, also, eine Woche m¨ußte drin sein. ich w¨urde vorschlagen daß wir also, daß wir die achtunddreißigste Woche uns daf¨ur ausgucken. (14-V) daß wir Donnerstag vormittag einfach uns vornehmen. ja, das das ist prima. (15-V) wenn wir schon so viel uns anschauen. (16-V) ach, wenn ich mir es so recht u¨ berlege, w¨are der Samstag lieber. (17-V) obwohl, wenn Sie uns sie zuschicken k¨onnten, w¨are auch nicht so schlecht. 4. Note that only a subset of the ditransitive clauses considered in Section 2 (cf. Tables 1 and 2) were included in Figures 2 and 3: especially in the VERBMOBIL dialogues, quite a few clauses contain more than one pronominal argument NP. 5. A second, less salient discrepancy concerns the ditransitive family members whose ordering pattern complies with the production-based linearization rule, i.e., the pairs to the left of the thick vertical lines in Figures 2 and 3: the member with the highest grammaticality rating does not always have the highest frequency. Instead, the most frequent one tends to be the “primary” ordering in the production-based rule. The only exception seems to be the (S,Ip,O) family, where the two orderings left of the vertical line have nearly identical frequency scores. However, in the midfield of main clauses in the written corpora, the highest frequency score is obtained by the ordering whose grammaticality rating is second best, i.e. by Ip–S–O — also the “primary” one. We will not discuss this phenomenon any further here, except for noting that this frequency pattern is a second reason, in addition to the one mentioned at the end of Section 3, why the ordering Sp–Op–Ip–S–I–O deserves the appellation “primary” for transitive clauses. For the sake of completeness, we list here the frequencies of the ordering patterns in the main clauses from NEGRA II and TIGER: Ditransitives: SIO: 56; SOI: 4; SpOI: 1; SpIO: 18; SIpO 8; IpSO: 59; SOpI: 2; OpSI: 12; OpIS: 4. This sums up to 164 clauses for the four ditransitive families. In addition, we found 39 clauses with two or three pronominal arguments. The mono-transitives yielded the following frequencies: SO: 1415; OS: 14; SpO: 487; SOp: 31; OpS: 580; SpOp: 89.
References
Bard, Ellen G., Dan Robertson, and Antonella Sorace 1996 Magnitude estimation of linguistic acceptability. Language, 72: 32–68.
Brants, Sabine, Stefanie Dipper, Sylvia Hansen, Wolfgang Lezius, and George Smith 2002 The TIGER Treebank. In Erhard Hinrichs and Kiril Simov (eds.), Proceedings of the Workshop on Treebanks and Linguistic Theories (TLT), pp. 24–41. Sozopol, Bulgaria.
Burger, Susanne, Karl Weilhammer, Florian Schiel, and Hans G. Tillmann 2000 Verbmobil data collection and annotation. In Wolfgang Wahlster (ed.), Verbmobil: Foundations of Speech-to-Speech Translation, pp. 537–549. Springer, Berlin, Germany.
Eisenberg, Peter 1994 Grundriss der Deutschen Grammatik (3rd revised edition). J.B. Metzler Verlag, Stuttgart, Germany.
Hawkins, John A. 2004 Efficiency and complexity in grammars. Oxford University Press, Oxford, U.K.
Keller, Frank 2000 Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Ph.D. thesis, University of Edinburgh, Edinburgh, UK.
Kempen, Gerard and Karin Harbusch 2003 Word order scrambling as a consequence of incremental sentence production. In Holden Härtl and Heike Tappe (eds.), Mediating between concepts and language — Processing structures, pp. 141–164. Mouton De Gruyter, Berlin, Germany.
2004a A corpus study into word order variation in German subordinate clauses: Animacy affects linearization independently of grammatical function assignment. In Thomas Pechmann and Christopher Habel (eds.), Multidisciplinary approaches to language production, pp. 173–181. Mouton De Gruyter, Berlin, Germany.
2004b Generating natural word orders in a semi-free word order language: Treebank-based linearization preferences for argument NPs in subordinate clauses of German. In Alexander Gelbukh (ed.), Proceedings of the Fifth International Conference on Intelligent Text Processing and Computational Linguistics (CICLING), Seoul, Korea, pp. 350–354. Springer, Lecture Notes in Computer Science, Berlin, Germany.
König, Esther and Wolfgang Lezius 2000 A description language for syntactically annotated corpora. In Proceedings of the International Conference on Computational Linguistics (COLING), Saarbrücken, Germany, pp. 1056–1060.
Kurz, Daniela 2000 A statistical account on word order variation in German. In Anne Abeillé, Thorsten Brants, and Hans Uszkoreit (eds.), Proceedings of the COLING Workshop on Linguistically Interpreted Corpora (LINC), Luxembourg.
Müller, Gereon 1999 Optimality, markedness, and word order in German. Linguistics, 37: 777–815.
Pechmann, Thomas, Hans Uszkoreit, Johannes Engelkamp, and Dieter Zerbst 1996 Wortstellung im deutschen Mittelfeld. Linguistische Theorie und psycholinguistische Evidenz. In Perspektiven der Kognitiven Linguistik, pp. 257–299. Westdeutscher Verlag, Opladen, Germany.
Rambow, Owen 1994 Formal and Computational Aspects of Natural Language Syntax. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, USA.
Skut, Wojciech, Brigitte Krenn, Thorsten Brants, and Hans Uszkoreit 1997 An annotation scheme for free word order languages. In Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLP), pp. 27–28. Washington D.C., USA.
Uszkoreit, Hans 1987 Word Order and Constituent Structure in German. CSLI Publications, Stanford, CA, USA.
Wasow, Thomas 2002 Postverbal behavior. CSLI Publications, Lecture Notes 145, Stanford, CA, USA.
The Emergence of Productive Non-Medical -itis: Corpus Evidence and Qualitative Analysis
Anke Lüdeling and Stefan Evert
1 Introduction
No natural language has a closed vocabulary (Kornai 2002). In addition to mechanisms that add to the base vocabulary, like borrowing, shortening, creativity etc., new complex words can be formed through the productivity of morphological processes. Some word formation processes can be used to form new words more easily than others. This fact, called morphological productivity, has been recognized for a long time and discussed from many points of view (see for example Aronoff 1976; Booij 1977; Baayen and Lieber 1991; Baayen 1992; Plag 1999; Bauer 2001; Baayen 2001; Nishimoto 2004). This paper is concerned with evidence for different aspects of morphological productivity. Our claim is that the problem of productivity can only be understood when different kinds of evidence – quantitative and qualitative – are combined. We will try to understand more about the interaction of qualitative and quantitative aspects of morphological productivity. We illustrate our claim by looking at a morphological element that has not received much attention in morphological descriptions yet: German -itis.1
1.1 Qualitative productivity
In this section we want to introduce two different ways of looking at the qualitative aspects of productivity: categorial models and similarity-based models. In generative models for linguistic competence, every rule 2 is either valid or not, i.e. a rule produces the ‘grammatical’ expressions – complex words in our case – of a language. A rule states whether a process is available (Bauer 2001) in a given language. In our model, rules can refer to every linguistic property of a lexical entry – a consequence of this are, of course, complex lexical entries where information on all linguistic levels can be associated with each word. For the sake of simplicity we assume a basic item-andarrangement model for German word formation here. 3
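To make the notion of a categorial rule more concrete, here is a minimal sketch, entirely our own illustration rather than the authors' formalism, in which a lexical entry is a bundle of linguistic properties and a word-formation rule is a predicate over such entries plus a constructor for the complex word.

```python
# Minimal item-and-arrangement sketch (invented names and features): a rule
# checks categorial properties of the base and, if they hold, builds the
# complex lexical entry.
def can_apply(rule, entry):
    return all(entry.get(feature) == value for feature, value in rule["conditions"].items())

def apply_rule(rule, entry):
    return {"form": entry["form"] + rule["suffix"], "pos": rule["output_pos"]}

medical_itis = {
    "suffix": "itis",
    "output_pos": "N",
    "conditions": {"type": "formative", "stratum": "neoclassical", "meaning": "body-part"},
}

arthr = {"form": "Arthr", "type": "formative", "stratum": "neoclassical", "meaning": "body-part"}
if can_apply(medical_itis, arthr):
    print(apply_rule(medical_itis, arthr))   # {'form': 'Arthritis', 'pos': 'N'}
```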
352 Anke L¨udeling and Stefan Evert Rules refer to linguistic categories and are thus categorial. In a competence model every rule is 100% productive, i.e. every item that belongs to a category given in a rule can be inserted. The rules do not refer to linguistic experience or frequencies of complex words or the like. In a lexical component that goes with a competence model, only irregular complex items are stored.4 Next to rule- or constraint-based competence systems, there are morphological models that are based on similarity: existing words are grouped according to some similarity criterion. The proportional formula introduced by Greek grammarians defines one instance of a similarity measure, analogy. The term analogy has been used in morphology in at least two different ways (for an overview see Becker 1990). For the Neogrammarians, analogy was a regularization process in the formation of new groups and elements, and thus in language change, be it in syntax or in morphology (compare Paul 1920: Chapter 5).5 A different view is given in Pinker (1999), where analogy is a process that is used for exceptions only and is a process totally different from rules. In Pinker’s model, regularly formed words are not stored. In contrast to competence models, analogical models in the sense of Paul, which we want to adopt here, are based on linguistic experience: we assume that instances even of regularly formed types are stored and grouped, and these groups can serve as examples after which new elements can be moulded. Rule-based and similarity-based models for morphology have in common that they are based only on the different types of complex words produced by a morphological process. 1.2
Quantitative productivity
As stated above, some processes form new words more easily than others. In a generative competence model, this notion is not expressible. Nonetheless, many authors have tried to associate quantitative terms such as ‘highly productive’ or ‘semi-productive’ with generative word formation rules, without specifying how they would fit into a generative model. See Plag (1999: 12) for an overview. The intuition is that the different ‘degrees’ of productivity are due to the number of restrictions for a word formation process and also to the number of possible bases. Some authors have even given formulae for measuring productivity: Aronoff (1976: 36), for example, states that “There is a simple way to take such restrictions into account: we count up the number of words which we feel could occur as the output of a given WFR [word formation rule, AL&SE] (which we can do by counting the number of pos-
sible bases for that rule), count up the number of actually occurring words formed by that rule, take a ratio of the two, and compare this with the same ratio for another WFR. In fact, by this method we could arrive at a simple index of productivity for every WFR: the ratio of possible to actual words.” The formula he suggests is thus
(1)  I = (number of attested words) / (number of possible words)
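For concreteness, the index is just a ratio; a minimal computation with invented counts (matching the 38% example discussed below) looks like this:

```python
# Aronoff's productivity index I = attested words / possible words (invented counts).
def aronoff_index(attested_words, possible_words):
    return attested_words / possible_words

print(aronoff_index(38, 100))   # 0.38, i.e. 38% of the possible words are attested
```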
Even if it were feasible to count the number of possible bases and the number of actual bases, this formula would not yield the intended result: what we would get is a static number. But what does it mean to state that, say, 38% of all words that can possibly be formed by a rule have already been formed? This number would not tell us whether the rule will ever form a new word, i.e., it will not allow a statement on the productivity of a rule and certainly makes no predictions. This issue is also discussed by Baayen (1989) who states that Aronoff’s measure I can be seen as expressing the unproductivity of the word formation process for an unproductive affix (Baayen 1989: 30). Booij (1977: 5) suggests a different way of computing productivity: “The degree of productivity of a WF-rule can be seen as inversely proportional to the amount of competence restrictions on that WF-rule.” To realize this idea, one would have to come up with a theory of how to count and rank restrictions. (Does the restriction “X combines only with verbs” have the same status as the restriction “X combines only with bisyllabic elements”? etc.) If we had such a theory, the formula would again yield a static number. To circumvent such problems, more sophisticated quantitative models have been proposed, which take both the number of types of complex words and the number of tokens of these words into account - counted on a given corpus (see below). The most influential ones are the statistical models developed by Baayen (see, among others, Baayen 1989; Baayen and Lieber 1991; Baayen 1992, 1993a,b, 1994, 2001, 2003). Here, the quantity of interest is the readiness with which a morphological rule will form a new complex word. It can be operationalized by the concept of vocabulary growth, i.e. how often new word types are encountered when an increasing amount of text is sampled. We will return to measures of vocabulary growth and productivity models in Section 3. Before that, we describe the properties of the element -itis and explain why this qualitative analysis needs to be complemented with quantitative information.
2 -itis
We will now briefly describe the properties of our test case -itis. We chose -itis because it is part of two very different word-formation processes: the rule-based, or categorial, medical -itis and the similarity-based non-medical -itis.
2.1 Medical -itis
The German morphological element -itis is originally used in medical contexts with the meaning 'inflammation (of)'. It is always bound and combines productively with neoclassical elements denoting body parts, e.g. Arthritis 'inflammation of the joints' or Appendizitis 'inflammation of the appendix'. Most of the elements it combines with occur only in bound form (often called a formative); it is therefore difficult to assign them a part of speech. However, from their semantics, it could be argued that they are nominal elements. A rule for medical -itis could look like (2):

(2)  N ← Formative_neoclassical [[body-part]] + -itis

2.2 Non-medical -itis
The suffix -itis can be used in non-medical contexts in a different function. Well-known examples of this ‘non-medical -itis’ are Telefonitis ‘excessive use of the telephone’ or Subventionitis ‘excessive subsidizing’. In contrast to medical -itis it is difficult to characterize non-medical -itis in categorial terms. It combines mostly with neoclassical elements but (in recent years, see below) more and more also with native elements, cf. Fresseritis ‘eating too much’, names as in Wehneritis ‘being too much like Wehner (a German politician in the 1960s and 1970s)’ or English elements Bestselleritis. Categorially, the non-head can be a noun, as in Zitatitis ‘citing too much’, a verb as in Aufschieberitis ‘procrastinating too much’, or an adjective as in Exklusivitis ‘wanting exclusive interviews, articles, etc. too often (journalistic context)’ or even phrases as in Vielzuvielitis, lit.: much-too-much-itis ‘wanting too much’. The suffix -itis attracts and bears stress and wants to follow an unstressed syllable. Where the non-head ends in a stressed syllable, sometimes the allomorph -eritis is used, cf. Filmeritis ‘watching too many movies’. Where the non-head ends in a vowel, a linking element is inserted, as in Tangolitis ‘playing too many tangos’. Semantically, non-medical -itis is rather vague – its meaning can be described as ‘doing too much of X’ where
‘X’ is some activity related to the meaning of the non-head. This vague paraphrase shows already that the non-head is interpreted ‘verbally’ rather than ‘nominally’ independent of its actual part of speech. Note that the meaning of non-medical -itis is, of course, not independent of the meaning of medical -itis: we suspect that medical -itis was generalized to mean ‘illness’ (instead of referring specifically to an inflammation). One indication for this is the fact that non-medical -itis collocates with words such as akut ‘acute’, chronisch ‘chronic’ or leiden an ‘suffer from’. It is not easy to write a categorial rule for non-medical -itis like the one above for medical -itis. We believe that non-medical -itis is a good case of a similarity-based process. One piece of evidence is that non-medical -itis words are to a certain extent stylistically marked, which medical -itis words are not. 2.3
Goals of the quantitative analysis
The qualitative analysis of -itis shows that we have evidence for two morphological processes with different properties. Qualitative evidence does not suffice, however, to explain their productivity. We want to look at two aspects of productivity: (1) do both processes differ with respect to productivity and (2) (how) does the productivity of each process change over time? It has been argued that categorial and similarity-based morphological processes exist next to each other. If so, can we see differences in their quantitative behaviour? As stated above, in a competence model every rule is fully productive. The rule we formulated says that all neoclassical formatives that denote body parts can be inserted. This cannot be directly compared to a similarity-based process where one can calculate type-token relationships. In the remainder of this paper we will therefore use the same model, based on type-token statistics, for both processes (see Section 3.2). This means that we will only look at the output – the complex words – of the two processes. If the two processes are really fundamentally different, we would expect to see quantitative differences in their output: the productivity for the rule-based process should be higher and more constant. The statistical analysis assumes a homogeneous model – we would therefore expect to get better goodnessof-fit values for the rule-based process than for the much more heterogeneous similarity-based process. Another issue of interest is the short-term diachronic change of productivity. The hypothesis would be that the established medical rule-based use of -itis does not change over time, but non-medical -itis, which is similarity-
356 Anke L¨udeling and Stefan Evert based and therefore dependent on the stored examples, can show short-term qualitative changes as well as changes in productivity. Again, this cannot be expressed in a competence model. We will suggest different ways of looking at what could be called ‘diachronic productivity’ below. Our quantitative analysis of -itis is based on the full 980 million word corpus “Textbasis f¨ur das digitale W¨orterbuch der deutschen Sprache” (henceforth Textbasis) collected by the Berlin-Brandenburgische Akademie der Wissenschaften. This corpus is an opportunistic collection of newspaper data, literature, informative texts, scientific texts and spoken language from the 20th century.6 The theoretical problems in using an opportunistic corpus of this sort are addressed below. In addition, there are a number of practical problems, which are described by Evert and L¨udeling (2001). The data have been manually cleaned up according to the guidelines given there. It is important to keep in mind that quantitative measures of productivity are closely tied to the corpus on which they are based. The precise question to which they provide an answer can be paraphrased in the following way: how likely is it that previously unseen word types (formed by the process being studied) will appear when additional (similar) text is sampled? Our interest in the phenomenon of productivity, however, is at its core a cognitive one – we want to understand how a speaker of a language knows that she can use a morphological rule to form a new word or phrase. Quantitative productivity is an observable reflex of this knowledge, namely the readiness of speakers to form new words, but it is also influenced by many other factors. In particular, our results apply only to the particular situation that is represented by Textbasis (mostly journalistic writing). However, it is also possible to give the corpus data a cognitive interpretation: we assume a model of word formation that incorporates qualitative and quantitative knowledge about word formation processes. This model is based on the idea that knowledge about the productivity of a morphological process depends on a speaker’s linguistic experience. This implies that both qualitative and quantitative aspects of productivity change with the change of experience. Corpus data – in particular the number of different words already formed by a given process and the apparent readiness of forming new words – can then be seen as a model for the speaker’s linguistic experience. Such an assumption is problematic in many respects, of course: no existing corpus comes close to representing the experience of a native speaker, let alone an opportunistic collection such as Textbasis or the recently very popular “World Wide Web as a corpus”. In this paper we therefore only measure and compare the productivity of the two processes involving -itis within Textbasis without claiming to provide a corpus-based model for linguistic experience.
3 Measuring morphological productivity
3.1 Vocabulary growth
The statistical models of Baayen (1989, 1992, 2001, 2003) link the degree of productivity of a morphological process to the rate of vocabulary growth, i.e., to how frequently new word types that are formed by the process are encountered when an increasing amount of text is sampled. If the degree of productivity changes over time, there should be a corresponding change in the vocabulary growth rate. For a corpus with a publication date for each document (as in the case of Textbasis7 ), a natural approach is to scan the corpus in chronological order. The vocabulary size V of a given word-formation process at a given time t, given as V (t), is the number of different word types (formed by the process) found in the part of the corpus up to the time t. Figure 1 shows vocabulary growth curves, graphs of V (t) against t, for medical (left panel) and nonmedical (right panel) -itis nouns in Textbasis. The slope of these vocabulary growth curves represents the rate at which new types appear in the corpus.
Figure 1. Vocabulary growth of -itis throughout the 20th century (left: medical -itis, right: non-medical -itis)
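A sketch of how such a diachronic growth curve can be computed, assuming each document carries a publication date and the relevant -itis tokens have already been identified; the data and names below are our own illustration:

```python
# Sketch: cumulative vocabulary size V(t) over publication years.
# Input: (year, word_type) pairs for every relevant token, assumed already extracted.
def vocabulary_growth(dated_tokens):
    seen, curve = set(), []
    for year, word_type in sorted(dated_tokens):      # scan in chronological order
        if word_type not in seen:
            seen.add(word_type)
            curve.append((year, len(seen)))           # a new type first appears here
    return curve

dated_tokens = [(1906, "Arthritis"), (1906, "Gastritis"), (1955, "Arthritis"),
                (1992, "Telefonitis"), (1995, "Subventionitis")]
print(vocabulary_growth(dated_tokens))
# [(1906, 1), (1906, 2), (1992, 3), (1995, 4)]
```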
Taken at face value, the steep rise of both vocabulary growth curves towards the end of the century seems to indicate that both medical and nonmedical -itis have become much more productive in the 1990’s. There is also a startling jump in the left graph, where more than 100 new medical -itis words suddenly appear in the data. A closer inspection reveals that Textbasis comprises a substantial part of the 1906 edition of the German Brockhaus
Encyclopedia, including definitions of a large number of medical terms. At first, one may be inclined to dismiss this as a quirk in the composition of the corpus and discard the dictionary data. The situation reveals a fundamental problem of the vocabulary growth approach to productivity, though. Obviously, all -itis words listed in the encyclopedia must have been in use at the time of publication and deserve to be included in V (t), at least when the latter is given a strict interpretation as the number of different words formed up to the time t. In a corpus of ‘ordinary’ text, on the other hand, many of these medical terms would have been encountered at a much later time, or perhaps not at all, giving a smooth growth curve similar to that of non-medical -itis. This shows that one cannot know whether a ‘new’ word was actually formed at time t or whether it had already been established in the language and just happened not to occur in the corpus data from the preceding time period (in this case, it would have occurred if more or different text had been sampled from this time period).8 As a consequence of the stochastic nature of growth curves, the number of new types encountered in a given time period depends crucially on the amount of text sampled. Figure 2 shows the number of instances of -itis nouns in each five-year period (left panel: medical use; right panel: nonmedical use). Almost all tokens occur in the last decade of the century (with the exception of the medical -itis nouns in the Brockhaus Encyclopedia). The large number of new types found during this period may simply be a correlate of the large number of tokens and need not imply a change in the degree of productivity.9
Figure 2. Number of instances found in Textbasis for five-year periods in the 20th century (left: medical -itis, right: non-medical -itis)
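The per-period token counts underlying Figure 2 amount to a simple binning step; a sketch with invented dates is:

```python
from collections import Counter

# Sketch: number of -itis tokens per five-year period (invented example dates).
years = [1906, 1906, 1907, 1955, 1991, 1992, 1993, 1996, 1998, 1999]
per_period = Counter(year - year % 5 for year in years)
for start in sorted(per_period):
    print(f"{start}-{start + 4}: {per_period[start]} tokens")
```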
Vocabulary growth curves as shown in Figure 1 mix up two different effects: (i) how new types are encountered when more text is sampled (synchronic vocabulary growth, cf. Section 3.2), and (ii) how easily new types are formed by speakers of the language (changes in the degree of productivity, which may lead to diachronic vocabulary growth when complex words are formed that were previously impossible or at least highly unusual). In order to obtain meaningful results from a statistical analysis, it is necessary to separate these two effects. We suggest that the following procedure be employed. First, determine the synchronic productivity of the process at a given point in time (Section 3.2), using a statistical model that takes the stochastic nature of (synchronic) vocabulary growth into account. The resulting measure of productivity must be independent of the amount of text sampled. Second, study the diachronic aspect of productivity by comparing the degree of synchronic productivity at two (or more) different points in time (Section 3.3). In order to make this comparison possible, the source corpus must satisfy certain criteria, which are also summarized there.

3.2 Synchronic productivity
Synchronic productivity captures the behaviour of a single speaker or a community of speakers at a fixed point in time. The standard models interpret the observed corpus data as a random sample from the potential output of the speaker(s). More precisely, the relevant -itis tokens (either medical or non-medical) in the observed data are treated as a random subset of the -itis tokens in the speakers’ output; all other tokens are discarded. In order to obtain a fully synchronic measure, the time span covered by the corpus should be as short as possible. However, a sufficient amount of data (both a sufficient number of tokens and a sufficient number of different types) is necessary for the statistical analysis. Otherwise, the inherent uncertainty of statistical estimates (such as the ones introduced in this section) would become too high to allow a meaningful interpretation. The following examples are based on Textbasis data from the years 1990–1999, although a shorter time span would be desirable (cf. Section 4). Vocabulary growth curves, albeit of a different kind, provide an intuitive visual approach to synchronic productivity (Baayen 2003: 236–242). Here, vocabulary growth is measured in text time, i.e. with respect to the number of -itis tokens encountered as an increasing amount of text is sampled. Figure 3 displays synchronic vocabulary growth curves for -itis nouns (left panel: medical use, right panel: non-medical use). Note that both graphs are drawn
to the same relative scale, with 10 units on the x-axis corresponding to 3 units on the y-axis. However, the sample size N is vastly different for the two processes (N = 1707 for medical vs. N = 242 for non-medical -itis). For direct comparison, the growth curve of non-medical -itis is shown as a thin dotted line in the left panel, and that of medical -itis is shown as a thin dotted line in the right panel.
Figure 3. Synchronic vocabulary growth curves for -itis in the 1990’s, showing the number of different types among the first N instances of -itis words in the corpus (left: medical -itis, right: non-medical -itis)
The slope of a vocabulary growth curve, which can be interpreted as the probability that the next -itis token will be a previously unseen one, provides a natural measure of productivity. It is sometimes referred to as the category-conditioned degree of productivity P (Baayen 2003: 240). Obviously, the jagged growth curves would need to be smoothed in some way before their slope can be computed. These irregularities are a stochastic effect of sampling, depending on the particular order in which the tokens are arranged in the sample. Under the random sample model, the precise arrangement is irrelevant: all re-orderings of the sample are equally likely. An ‘average’ value for the growth rate P is thus obtained by averaging over all possible re-orderings. It can easily be estimated from the sample size N and the number V1 of hapax legomena (word types that occur just once in the sample): P ≈ V1/N (Baayen 2001: 50). From the growth curves in Figure 3, we obtain P ≈ .0217 for medical -itis and P ≈ .248 for non-medical -itis. On this scale, the productivity of non-medical -itis seems to exceed that of medical -itis by more than a factor
of eleven. Such a ‘naive’ interpretation of P is problematic, though, mostly because the growth rate depends critically on the sample size. When samples of identical size N = 200 are compared for the two processes (cf. the right panel of Figure 3), the difference in the degree of productivity is less striking: P ≈ .075 vs. P ≈ .265, only a factor of 3.5. This example shows that despite its intuitive interpretation, it is difficult – if not impossible – to use P as a measure for the degree of productivity of a word formation process. P is much more an extrapolation of the observed sample than an absolute (i.e. size-independent) measure of productivity.
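The estimate P ≈ V1/N is easy to compute once the relevant tokens have been extracted. The following minimal sketch assumes the tokens are available as plain Python lists of strings; the variable names are hypothetical, and the size dependence discussed above is made visible by truncating both samples to the same N.

    from collections import Counter

    def growth_rate(tokens):
        # Category-conditioned productivity P = V1/N: the proportion of
        # hapax legomena (types occurring once) among the N observed tokens.
        counts = Counter(tokens)
        n = len(tokens)
        v1 = sum(1 for c in counts.values() if c == 1)
        return v1 / n if n else 0.0

    # Hypothetical usage: medical and nonmedical are lists of -itis tokens
    # (strings) from the 1990s portion of the corpus.
    # growth_rate(medical)          # full sample, N = 1707 in the paper
    # growth_rate(nonmedical)       # full sample, N = 242 in the paper
    # growth_rate(medical[:200])    # fairer comparison at identical N = 200
    # growth_rate(nonmedical[:200])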
The measure P focuses entirely on the number of hapax legomena in the sample. Intuitively, this approach makes sense: after all, nonce formations, created as they are needed in a specific situation to express a certain concept, are the hallmark of a productive process. Such a need may arise again on a similar occasion, though, so that the same word will once more be productively formed by the same or a different speaker. When a sufficient amount of text is sampled, many types will be seen more than once even for a highly productive process. It is therefore necessary to look at all low-frequency types, not just the hapax legomena. Figure 4 shows the number Vm of -itis types that occur exactly m times in the sample, for m = 1, . . . , 10 (left panel: medical use, right panel: non-medical use). Such a bar graph (or a corresponding table of m and Vm) is referred to as the frequency spectrum (Baayen 2001: 8) of a word formation process with respect to the observed corpus.
Figure 4. Frequency spectrum for -itis nouns in the 1990’s, showing the number Vm of types that occur exactly m times in the sample (left: medical -itis, right: non-medical -itis)
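Such a spectrum can be tabulated directly from the token list. The sketch below makes the same illustrative assumptions as the previous one (hypothetical token lists, not the original tooling).

    from collections import Counter

    def frequency_spectrum(tokens, max_m=10):
        # Frequency spectrum: V_m = number of types occurring exactly m times
        # in the sample, here tabulated for m = 1 .. max_m.
        type_freqs = Counter(tokens)               # type -> token frequency
        spectrum = Counter(type_freqs.values())    # frequency m -> V_m
        return {m: spectrum.get(m, 0) for m in range(1, max_m + 1)}

    # Hypothetical usage, with nonmedical as above:
    # spc = frequency_spectrum(nonmedical)
    # spc[1] is V1 (the hapax legomena), spc[2] the dis legomena, and so on.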
Although productively formed types may occur more than once, they will in general be less frequent than well-established words. This reasoning implies that a productive process should be characterized by a frequency spectrum that is skewed towards the lower end. The stronger the skew, the more productive the process is. The frequency spectra in Figure 4 confirm the impression given by the growth curves, with the spectrum of non-medical -itis being dominated by hapax and dis legomena (types occurring twice).

Baayen (2001: Chapter 3) describes statistical models that abstract away from the stochastic irregularities of a sample-based frequency spectrum and estimate how much the full output of the speaker(s) is skewed towards low-frequency words (cf. the remarks at the beginning of this section). He refers to them as LNRE models, where LNRE stands for “large number of rare events” (after Khmaladze 1987). It is not obvious which one of several possible LNRE models should be used. These models differ in their flexibility and accuracy, but also in their computational complexity. None of them has a theoretical foundation rooted in the theory of morphological productivity. Therefore, a multivariate goodness-of-fit test is applied to find out how well the predictions of the model agree with the observed spectrum (Baayen 2001: 118–122). It is only appropriate to draw further inferences from an LNRE model when it has been confirmed by the goodness-of-fit test as a plausible explanation for the observed data.

For the experiments reported in this paper, we used the finite Zipf-Mandelbrot (fZM) LNRE model introduced by Evert (2004), which is based on the Zipf-Mandelbrot law (Zipf 1949; Mandelbrot 1962). The fZM model is both computationally efficient and flexible, and it is reported to achieve better goodness-of-fit than many other LNRE models (Evert 2004: 420–421). Figure 5 compares the observed frequency spectra of medical and non-medical -itis with the predictions of the fZM models. The multivariate goodness-of-fit test shows an acceptable fit for medical -itis (χ2 = 22.59, df = 13, p = .047) and an excellent fit for non-medical -itis (χ2 = 13.91, df = 13, p = .380). The overall shape of the frequency spectrum predicted by the fZM model is mainly determined by the model parameter α. Its values range from α = 0 (indicating a balanced spectrum, where the number of hapax legomena is not much larger than the number of types in higher frequency ranks) to α = 1 (indicating a highly skewed spectrum that is entirely dominated by the hapax legomena).10 We can thus tentatively use α as a quantitative measure for the degree of productivity: when α is close to 0, the morphological process in question may not be productive at all; when α ≈ 0.5, it is moderately productive; and when α is close to 1, it has a very high degree of productivity.
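The multivariate test itself is part of the LNRE toolkit described by Baayen (2001) and is not reproduced here. The following sketch only illustrates the underlying idea with a simplified univariate chi-squared comparison of an observed spectrum with hypothetical model predictions; it should not be mistaken for the actual test used in this paper, and the data names are assumptions.

    from scipy.stats import chi2

    def simple_spectrum_gof(observed_vm, expected_vm, df):
        # Simplified univariate chi-squared comparison of an observed frequency
        # spectrum with model-predicted values E[V_m]. The test actually used in
        # the paper (Baayen 2001: 118-122) is multivariate and also takes the
        # covariance of the spectrum elements into account.
        x2 = sum((o - e) ** 2 / e for o, e in zip(observed_vm, expected_vm))
        p = chi2.sf(x2, df)    # upper-tail probability; a small p means a poor fit
        return x2, p

    # Hypothetical usage: obs_vm holds the observed V_m values and fzm_vm the
    # corresponding fZM predictions (both plain lists of numbers); df = 13
    # would mirror the degrees of freedom reported in the paper.
    # x2, p = simple_spectrum_gof(obs_vm, fzm_vm, df=13)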
Figure 5. Frequency spectrum for -itis nouns in the 1990’s together with the predictions of the finite Zipf-Mandelbrot LNRE model (left: medical -itis, right: non-medical -itis)
For medical -itis, the shape parameter is α ≈ 0.565; for non-medical -itis, it is α ≈ 1, indicating that the latter is indeed much more productive. The finite Zipf-Mandelbrot model also provides an estimate for the total number of complex -itis types that can be formed by the two processes, which is S ≈ 183 for medical use and S ≈ 435 for non-medical use (see Evert 2004: 417–418). Such estimates must not be taken too literally, though, because the fZM model and similar LNRE models gloss over many of the complexities of word frequency distributions, concentrating on the more ‘regular’ lower end of the frequency spectrum (cf. Baayen 2001: chapter 4).11 Moreover, the relatively small size of our samples implies that many different classes of LNRE models (besides the fZM model used in our experiments) will be consistent with the observed data (as measured by their goodness-of-fit), some of which may predict a much larger or even infinite value for S. One way of testing the plausibility of our estimates is to compare the value S ≈ 183 with the number of established -itis terms in medical jargon. Manual counts on randomly selected pages from a German medical dictionary (Ahlheim and Lichtenstern 1968) indicate a minimum of 220 such terms (and probably even more than 300 terms).12 One possible explanation for the substantial underestimation of S by the fZM model is the composition of the Textbasis, which contains little technical writing from the domain of medicine. Therefore, statistical models applied to these data estimate the number of -itis nouns that are used in general language rather than the possibly much greater number available
to a medical expert. Despite these reservations, the estimated values of S are useful as an intuitive and readily interpretable way of comparing the productivity of different processes in our experiments. The comparison is valid because both processes are analyzed with the same class of statistical models (namely, the fZM model), so that differences in the estimated parameters reflect actual differences between the frequency distributions of the two processes (rather than resulting from the assumptions underlying different statistical models).

3.3 Diachronic productivity
Our approach to diachronic productivity, changes in the readiness with which a morphological process forms new words, is based on the measures of synchronic productivity developed in Section 3.2. We compute the degree of synchronic productivity for a given process at two points in time, t1 and t2. By comparing e.g. the shape parameters α(t1) and α(t2) (or the estimated total number of types, S(t1) and S(t2)), we can detect an increase or decrease in productivity. For a precise description of diachronic trends, it would be necessary to consider further points in time, t1, . . . , tn, and formulate a mathematical model t → α(t) for the development of the shape parameter. This model could take the form of a logistic function, for instance, which is often used in research on language change (see e.g. Zuraw 2003: 148–149).

In order to make this comparison, we need text samples from the time points t1 and t2, or short time spans containing those points. The statistical models ensure that we need not worry about different sample sizes. However, some requirements remain, which unfortunately are not met by the Textbasis corpus.

First, we have already pointed out at the beginning of Section 3.2 that a certain minimal amount of data is needed in order to carry out a meaningful statistical analysis, both with respect to the number of tokens and the number of types. This means that even a corpus containing millions of words may not be large enough when words formed by the process of interest are rare in the language. Moreover, a process with a low degree of productivity might require even larger samples in order to have a sufficient number of different types. In Textbasis, a sufficient number of -itis tokens is only found from 1993 onwards, when several hundred million words of newspaper text are included in the corpus. During the earlier decades, there are only isolated instances of -itis words, both medical and non-medical – far too little data for the application of an LNRE model (e.g., there are only N = 16 instances of non-medical -itis in Textbasis before the year 1990).
Second, the text samples from t1 and t2 must have similar composition (with respect to modality, text type, domain, etc.) in order to allow a direct comparison of the productivity measures. For instance, it is quite plausible to assume that non-medical -itis is more productive in fashionable journalistic writing than in literary or scientific texts. Even if we had a sufficient amount of data in Textbasis for the earlier decades, the prevalence of newspaper text in the 1990’s might be responsible for a significantly higher degree of productivity. Finally, the individual text samples must be taken from a short time span in order to measure short-term developments. While it is not clear yet whether a morphological process can become productive (or unproductive) within a few years, such rapid changes are commonplace at the level of individual types. Figure 6 illustrates this claim with the example of non-medical -itis. The bar graph shows the relative frequencies of the four most frequent word types in the years 1993–1999. While Fusionitis ‘too many mergers’ rapidly becomes popular towards the end of the century, Subventionitis ‘too much subsidizing’ has its heyday in the years 1994–1995, and seems to fall out of use afterwards.
Figure 6. The relative frequencies of the four most frequent non-medical -itis words (Fusionitis, Subventionitis, Telefonitis, Festivalitis) in the years 1993–1999.
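Per-year relative frequencies of this kind are straightforward to tabulate once the tokens are dated. The following sketch again assumes an illustrative list of (year, word) pairs rather than the original tooling.

    from collections import Counter, defaultdict

    def yearly_relative_frequencies(dated_tokens, top_n=4):
        # For each year, the relative frequency of the overall top_n most
        # frequent types; all remaining types are pooled under 'other'.
        overall = Counter(word for _, word in dated_tokens)
        top = {word for word, _ in overall.most_common(top_n)}
        by_year = defaultdict(Counter)
        for year, word in dated_tokens:
            by_year[year][word if word in top else "other"] += 1
        return {year: {w: c / sum(counts.values()) for w, c in counts.items()}
                for year, counts in sorted(by_year.items())}

    # Hypothetical usage: dated_nonmedical is a list of (year, word) pairs for
    # the non-medical -itis tokens from 1993-1999.
    # table = yearly_relative_frequencies(dated_nonmedical)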
With all its limitations, the Textbasis corpus still has great value for the qualitative description of productivity, showing that non-medical -itis existed
before the 1990’s. A new type is encountered every few years, starting with the first occurrence of Spionitis ‘excessive fear of spies’ in 1915.

4 Conclusion

To sum up, we have discussed the productivity of two morphological processes with different qualitative properties, categorial or rule-based medical -itis and similarity-based non-medical -itis. Since qualitative evidence alone is not sufficient to explain productivity, we have also used quantitative evidence from a German text corpus. We have argued that a theoretical distinction between rule-based and similarity-based processes should be reflected in their quantitative behaviour: rule-based processes should be more productive, lead to frequency distributions that can accurately be described by statistical LNRE models, and their degree of productivity should not change over time. We have then shown that the quantitative properties of the two processes in question do not confirm our hypotheses. Although this surprising result may well be due to the nature of our data, one might also come to the (at this point very tentative) conclusion that morphological theory does not need to make a distinction between rule-based and similarity-based processes.

Acknowledgements

We would like to thank Alexander Geyken and Gerald Neumann who provided the Textbasis data on which this study is based. We are grateful to the audience at the Tübingen conference on Linguistic Evidence, and especially to Peter Bosch, Harald Baayen and an anonymous reviewer, for much helpful advice.
Notes
1. Note that we focus on German -itis, which differs in some respects from English -itis. For a discussion of the morphological status of -itis see Lüdeling et al. (2002).
2. We will speak of rules and use a simple rule-based model for the sake of simplicity here, but our arguments carry over to constraint-based systems.
3. In an IA model there is no need to distinguish between derivation and compounding. (We restrict ourselves to concatenative processes here.) This means that not only is every stem associated with its word formation stem forms (as assumed in Eisenberg 1998; Fuhrhop 1998) but also every bound entry (see Lüdeling and Fitschen (2002) and Fitschen (2004) for a discussion).
4. The idea that only irregular words are stored in a lexicon while all regular words can be derived via rules is, of course, older than generative linguistics (see for example Bloomfield 1933). In psycholinguistics the question of what needs to be stored has been discussed for a long time, resulting in models like that of Pinker (1999). We cannot go into the psycholinguistic debate on the storage of complex items. We only want to say here that there has been recent evidence that even regularly inflected words seem to be stored in the mental lexicon (Baayen et al. 1997).
5. Paul assumes that words are combined into groups according to phonological or semantic similarity: “[...] attrahieren sich die einzelnen Wörter in der Seele, und es entstehen dadurch eine Menge größerer oder kleinerer Gruppen. Die gegenseitige Attraktion beruht immer auf einer partiellen Übereinstimmung des Lautes oder der Bedeutung oder des Lautes und der Bedeutung zugleich.” (Paul 1920: 106) [Roughly: the individual words attract one another in the mind, and a number of larger or smaller groups thereby arise. The mutual attraction always rests on a partial agreement of the sound or of the meaning, or of sound and meaning at once.]
6. For more information about the Textbasis, see http://www.dwds-corpus.de/.
7. We ignore the fact that the publication date is not necessarily the date of production.
8. Still, the linguistic experience of a particular speaker may in fact show a development just as it happens to be documented in Textbasis.
9. One might speculate whether the larger number of tokens observed in the last decade of the century is connected to intensity of use, which is a different aspect of morphological productivity. A more likely explanation is found in the opportunistic nature of Textbasis. Since the early 1990’s, entire volumes of newspapers have become conveniently available in machine-readable form. Textbasis includes a large amount of such newspaper text, which skews the data in two ways: (i) there is much more text from the 1990’s than from earlier decades, and (ii) this text is dominated by journalistic writing. All instances of non-medical -itis in the 1990’s are from newspaper sources, with the single exception of Aufschieberitis (from Kellner 1998).
10. It has to be noted at this point that the finite Zipf-Mandelbrot model, like most other LNRE models, is only suitable for productive processes with a skewed frequency spectrum. It will not achieve a satisfactory goodness-of-fit for a completely unproductive process.
11. As an example, Grigorij Martynenko estimated from the Brown corpus (Kučera and Francis 1967) that the total vocabulary of American English comprises only S = 112,500 words (Martynenko 2000: Table 3).
12. A random selection of 77 out of 1277 half-page columns from Ahlheim and Lichtenstern (1968) was inspected manually for -itis headwords, which were found in 17 columns. This gives a maximum-likelihood estimate of 362 -itis terms in the dictionary, with a two-sided 95% confidence interval ranging from 226 to 525 terms (hypergeometric test). Note that a further subclassification of -itis terms is often expressed by combination with Latinate words or phrases, e.g.
Dermatitis ab acribus ‘dermatitis caused by chemical irritants’. Since this highly productive process is different from the affixation of -itis, the subclassified terms were not included in the counts.
References
Ahlheim, Karl-Heinz and Hermann Lichtenstern (eds.)
1968 DUDEN Wörterbuch medizinischer Fachausdrücke. Bibliographisches Institut and Georg Thieme Verlag, Mannheim, Stuttgart.
Aronoff, Mark
1976 Word Formation in Generative Grammar. The MIT Press, Cambridge, MA.
Baayen, R. Harald
1989 A Corpus-Based Approach to Morphological Productivity. Ph.D. thesis, Vrije Universiteit Amsterdam.
1992 Quantitative aspects of morphological productivity. In Geert Booij and Jaap van Marle (eds.), Yearbook of Morphology 1991, pp. 109–150. Foris, Dordrecht.
1993a On frequency, transparency and productivity. Yearbook of Morphology 1992, pp. 181–208.
1993b Statistical models for word frequency distributions. Computers and the Humanities, 26: 347–363.
1994 Productivity in language production. In Dominiek Sandra and Marcus Taft (eds.), Morphological Structure, Lexical Representation and Lexical Access, Special Issue of Language and Cognitive Processes, pp. 447–496.
2001 Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht.
2003 Probabilistic approaches to morphology. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, chapter 7, pp. 229–287. MIT Press, Cambridge.
Baayen, R. Harald, T. Dijkstra, and Robert Schreuder
1997 Singulars and plurals in Dutch: Evidence for a parallel dual route model. Journal of Memory and Language, 36: 94–117.
Baayen, R. Harald and Rochelle Lieber
1991 Productivity and English derivation: a corpus-based study. Linguistics, 29: 801–843.
Bauer, Laurie
2001 Morphological Productivity. Cambridge University Press, Cambridge.
Becker, Thomas
1990 Analogie und morphologische Theorie. Fink, München.
Bloomfield, Leonard
1933 Language. Holt, Rinehart and Winston, New York.
Booij, Geert
1977 Dutch Morphology. A Study of Word Formation in Generative Grammar. de Ridder, Lisse.
Eisenberg, Peter
1998 Grundriß der deutschen Grammatik. Band 1: Das Wort. J.B. Metzler, Stuttgart.
Evert, Stefan
2004 A simple LNRE model for random character sequences. In Proceedings of the 7èmes Journées Internationales d'Analyse Statistique des Données Textuelles, pp. 411–422. Louvain-la-Neuve, Belgium.
Evert, Stefan and Anke Lüdeling
2001 Measuring morphological productivity: Is automatic preprocessing sufficient? In Corpus Linguistics 2001. Lancaster.
Fitschen, Arne
2004 Lexikon als komplexes System. Doctoral dissertation, Universität Stuttgart.
Fuhrhop, Nanna
1998 Grenzfälle morphologischer Einheiten. Stauffenburg-Verlag, Tübingen.
Kellner, Hedwig
1998 Das geheime Wissen der Personalchefs. Eichborn, Frankfurt a. M.
Khmaladze, E. V.
1987 The statistical analysis of large number of rare events. Technical Report MS-R8804, Department of Mathematical Statistics, CWI, Amsterdam, Netherlands.
Kornai, András
2002 How many words are there? Glottometrics, 4: 61–86.
Kučera, H. and W. N. Francis (eds.)
1967 Computational Analysis of Present-Day American English. Brown University Press, Providence.
Lüdeling, Anke and Arne Fitschen
2002 An integrated lexicon for the analysis of complex words. In Proceedings of EURALEX 2002, pp. 145–152. Copenhagen.
Lüdeling, Anke, Tanja Schmid, and Sawwas Kiokpasoglou
2002 Neoclassical word formation in German. Yearbook of Morphology 2001, pp. 253–283.
Mandelbrot, Benoit
1962 On the theory of word frequencies and on related Markovian models of discourse. In R. Jakobson (ed.), Structure of Language and its Mathematical Aspects, pp. 190–219. American Mathematical Society, Providence, RI.
Martynenko, Grigorij
2000 Statistical consistency of keywords dictionary parameters. Technical report, Department of Computational Linguistics, St. Petersburg State University. Available from citeseer.ist.psu.edu/407553.html.
Nishimoto, Eiji
2004 A Corpus-Based Delimitation of New Words: Cross-Segment Comparison and Morphological Productivity. Ph.D. thesis, Graduate Faculty in Linguistics, The City University of New York.
Paul, Hermann
1920 Deutsche Grammatik. Band V. Teil IV: Wortbildungslehre. Verlag von Max Niemeyer, Halle a.S.
Pinker, Steven
1999 Words and Rules. The Ingredients of Language. Basic Books (Perseus Book Group), New York.
Plag, Ingo
1999 Morphological Productivity. Structural Constraints in English Derivation. Mouton de Gruyter, Berlin.
Zipf, George Kingsley
1949 Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.
Zuraw, Kie
2003 Probability in language change. In Rens Bod, Jennifer Hay, and Stefanie Jannedy (eds.), Probabilistic Linguistics, chapter 5, pp. 139–176. MIT Press, Cambridge.
Experimental Data vs. Diachronic Typological Data: Two Types of Evidence for Linguistic Relativity

Wiltrud Mihatsch
1 Linguistic relativity and lexical change: a new perspective
Does language influence thought? The radical version of linguistic relativity as proposed by Sapir, Whorf, Humboldt and Boas assumes that the linguistic structures of a language determine the world view of its speakers. From the sixties onwards, the idea of linguistic relativity was supplanted by the search for universals in the cognitive sciences (cf. Gumperz and Levinson 1996 for an overview), in generative linguistics as well as in cognitive linguistics. However, recent contributions to a moderate version of linguistic relativity (see Gumperz and Levinson 1996) show that the truth must lie somewhere in between these two views. Besides many linguistic universals that reflect cognitive universals, there exist certain types of categorization processes which are undoubtedly influenced by mostly grammatical linguistic routines. Does the creation of new lexical units also reflect linguistically biased conceptualization strategies, and is the lexicon therefore partly shaped by grammatical typological factors? Most lexical items are derived from existing ones via meaning change, word formation, borrowing and combinations of these. In many cases, the creation of new lexical items thus reflects the conceptualization of one concept in terms of another. Lexical change can then “fossilize” conceptual categorization. This paper compares data taken from the psycholinguistic literature in the domain of object classification with data from lexical change. The latter are mainly taken from the database of the research project B6 (LexiTypeDia), which has collected paths of lexical change in the domain of body parts in a sample of over 30 languages of the world.
2 Language type and object classification

2.1 Noun systems in the languages of the world
The grammatical category of number inflection on nouns1 can reflect the referents’ perceptual properties. In English, we observe strong correlations between mass nouns and substances, count nouns and discrete objects. In many Indo-European languages such as English, count nouns with an obligatory singular/plural distinction prevail. (Even in these languages, not all nouns can be inflected for number, see Corbett 2000: 88.) This is also true for languages such as Estonian and, typically, for Bantu languages such as Northern Sotho and Swahili.2 In many languages, however, only nouns of the higher segments of the animacy or definiteness hierarchy are inflected for plural or the languages have no (obligatory) plural at all (see Corbett 2000: 56–66 for a good overview), for instance Bahasa Indonesia, Japanese, Lahu, Mandarin, Nahuatl, Tamil, Tzeltal and Yir Yoront. In Hungarian, the singular form of paired body parts3 is ambiguous as to a singular or collective (plural) interpretation (Behrens 2000; Corbett 2000: 80–81, 211).4 Of course, plural marking is subject to change. In Classical Nahuatl, only animates had (optional) plural marking. Today the plural is more widespread (Hill and Hill 2000: 246, Law 1958, Suárez 1983: 86). In Hopi, the plural has expanded from nouns denoting animates to nouns that refer to entities with prominent shapes such as NOSE, EYE, MOUTH, HEAD in the domain of body parts (Hill and Hill 2000: 249–257). Many languages such as Hopi are borderline cases as to number marking. In Bambara, the plural marker is only obligatory if individual nouns are used to refer, i.e. not in generic or negative contexts and not together with numerals (Bazin [1906]1965: 5, Brauner 1974: 26, Friederike Lüpke, p.c.). The plural marker can also be omitted in (colloquial) Nepali if there is enough contextual information, especially with inanimate nouns (see Turnbull ³1982: 12, 16, Garry and Rubino 2001: 512, Balthasar Bickel, p.c.). Hausa has many plural morphemes, but in particular inanimate nouns prefer the singular if the word is modified by a numeral or quantifiers (Newman 2000: 464). In Quechua the plural is optional and serves to disambiguate in unclear cases (Itier 1997: 52, Liliana Sánchez, p.c.). The use of plural markers is also very restricted in Tibetan (Beyer 1992: 230). When we consider plural marking in the domain of (paired) body parts, we may distinguish three clusters of languages, by combining the morphological property of a majority of inanimate nouns to inflect for plural with the semantics of the plural, i.e. the obligatoriness vs. optionality of plural use with these nouns.5 The first group (see appendix, Table 1) comprises
languages with obligatory plural marking in the inanimate domain. Here we find mostly languages from Western and Northern Eurasia and Africa (cf. Haspelmath 2005: 142–145 and Nichols 1992: 148), such as Albanian, English, Estonian, Gaelic (Scottish), German, Northern Sotho, Russian, Swahili and Swedish. The second group (see appendix, Table 2) consists of languages without obligatory plural marking of inanimate nouns, which include body-part nouns (Nichols 1992: 146). Here we find languages of the New World, the Pacific as well as South and Southeast Asia (cf. Haspelmath 2005: 142–145 and Nichols 1992: 148) such as Bahasa Indonesia, Japanese, Lahu, Mandarin, Nahuatl, Tamil, Tzeltal and Yir Yoront. I have included Hungarian in this group since body-part nouns are not usually marked for plural (Behrens 2000: 29–30). The third group consists of borderline cases such as Bambara, Hausa, Hopi, Nepali, Quechua and Tibetan (see appendix, Table 3).
2.2 The influence of the noun system on object-categorization tasks
The distinction between languages with and without obligatory number marking is the starting point of recent psycholinguistic experiments which prove that speakers of these two groups of languages manifest important differences in object-classification tasks. Children show an innate capacity for distinguishing between objects and substances before they acquire language (Soja, Carey and Spelke 1991). Nouns for discrete objects are among the earliest acquired words in such typologically different languages as English and Mandarin (Bloom 2000: 91–92). However, Lucy (1992) and Imai and Gentner (1997) show that as speakers grow up, the shape bias is mitigated by linguistic influence in certain domains, i.e. the number marking system may influence the categorization of entities that are ambiguous between a classification according to shape and one according to substance. In an experiment carried out by Imai and Gentner (1997), speakers of English and of Japanese have to sort given entities according to shape or substance. In a forced-choice triad task a novel label is given to an entity and subjects have to decide which of the two other entities could share the label. For instance, a cork pyramid has to be classified with respect to one entity of the same shape, such as a white plastic pyramid, and to one of the same substance, such as a chunk of cork. Imai and Gentner (1997) distinguish three types of entities and show that the type of entity determines the responses:
a) The classification of so-called complex objects with a complex internal structure and shape, such as a wood whisk, is based on shape for all subjects.
b) The classification of simple objects consisting of solid homogeneous material with a simple shape, e.g. a cork pyramid, is mostly based on shape in the case of adult speakers of English, and on material in the case of the speakers of Japanese.
c) Substances, e.g. sand arranged in an S-shape, are clearly classified according to the material by most adult Japanese; about half of the adult speakers of English choose the substance, half of them the shape.
Figure 1. Proportions of shape vs. substance-based categorization by adult speakers (adapted from Imai and Gentner 1997: 182, 187)
The explanation of the striking differences between the speakers of the two languages is that count nouns, which prevail in English, routinely draw attention to shape (see Lucy 1992: 89) since counting usually presupposes discrete objects. In other languages such as Japanese or Mandarin, all nouns are transnumeral. Nouns without (obligatory) plural forms conceptually seem to correspond to masses (Lucy 1992: 89) and tend to direct the attention to substance. Thus, the noun system shifts the boundary between shape- and material-based classification in the middle ground between strong individuals and substances. Lucy’s results for speakers of Yucatec and English show the same tendency (Lucy 1992).
3 Lexico-semantic change: fossilized categorization processes

3.1 Lexical innovation and change
The experiments done by Imai and Gentner (1997) and Lucy (1992) study online object-categorization processes. What do they have in common with processes of semantic change? In most cases, a concept is conceptualized via existing labels for other concepts, either via meaning change or word-formation processes, borrowing, or combinations of all three procedures. Thus, the new label usually points to a categorization process. The conventionalization of an innovative label takes place if it is plausible and/or prestigious enough. Therefore, lexical change reflects fossilized categorization processes. So far, known factors that determine the choice of a new label are cognitive (perceptual) universals, socio-cultural aspects and areal influences. This study will show that grammatical typological factors such as number marking also have to be taken into account.

3.2 A comparison of experimental data and diachronic typological data
Obviously, data from experimental classification tasks and lexical change are two very distinct data types, although both are based on conceptualization processes. In the case of experiments, researchers create a carefully controlled laboratory setting and choose a limited number of subjects, controlled for factors such as age and education. For instance, the adult subjects that take part in Imai and Gentner's study are 18 university students for each language. The subjects have a limited amount of time for the forced-choice triad task and a limited choice, since for each decision only two objects, which are specially designed for the experiments in the case of the substances and the simple objects, are presented. The nature of lexical change differs fundamentally from these experiments. The data of lexical change are the fossilized results of innovation processes, and so we usually cannot observe this process online. If enough written records are available, we may have access to incipient lexical change and intermediate stages of conventionalization, as an anonymous reviewer pointed out. Usually, however, the data from lexical change only give us access to the results of the underlying categorization processes. The creators of new lexical units are of course not controlled for any sociolinguistic factors, and we do not know who invented a new label. Very importantly, the categorization targets are completely different. Languages only lexicalize concepts that are familiar, well-known and important enough to be worth being labelled.
Artificial objects of the kind Imai and Gentner (1997) presented to their subjects are thus not labelled by lexical items. Therefore, the search for the different types of objects distinguished by Imai and Gentner (1997) that are also regularly expressed by lexemes is tricky. And of course, in semantic change, there are always a large number of possible source concepts; speakers are not limited to a binary choice as in the experiment done by Imai and Gentner. Besides perceptually based classification, we also expect areal and cultural as well as pragmatic influences. We also have to be aware that a linguistic label associated with a certain concept is chosen to designate a certain set of referents; therefore, linguistic factors might interfere with the conceptualization process, and linguistic classification might differ from nonverbal classification. For instance, the word-formation processes available in a language may influence the creation of new labels (see Koch 2001, Mihatsch and Dvořák 2004). Furthermore, the lexicon contains many traces of lexical change at different periods. Thus a change of the system of number marking at some stage might have to be taken into account if we analyse lexical changes that took place a long time ago. On a methodological level, categorization processes underlying lexical change are not always accessible. Many lexical units are very stable or borrowed without any changes of meaning, and diachronic information is therefore not available for these items. Therefore, the project LexiTypeDia has resorted to alternative data types that allow indirect access to lexical change, such as morphological transparency and comparative data (cf. Koch 2004). Comparative data are cognates and polysemous senses of a lexical item that can point to semantic changes. However, this data type gives no information as to the directionality of change between two concepts. All in all, diachronic data seem to be less straightforward and more complex than the experimental data presented in 2.2. However, diachronic data have important advantages, too. In the case of lexical change, a whole speech community has adopted a new conceptualization path; thus paths of semantic change tend to guarantee that a certain innovation seems plausible to a critical number of speakers. And of course, data of lexical change do not have to be elicited in sophisticated experiments, but are readily available for examination. Unlike the experiments, this study will therefore be able to take into account a larger sample of languages, altogether 24.
3.3 Finding equivalent data in lexico-semantic change
In order to compare experimental and diachronic data, concepts which are equivalent to the three object types distinguished by Imai and Gentner (1997) have to be found in natural languages. The project LexiTypeDia has at its disposal a large amount of data in the domain of body-part nouns. The advantage of body parts is the fact that they are universally named. However, they designate parts, not discrete objects, although many of them are probably conceptualized as discrete objects (Bloom 2000: 109) or at least resemble discrete objects in possessing a clear outline. The salience and universality of body-part nouns explains why many of them are extremely stable; for instance, we find a large percentage of body-part nouns in the Swadesh list (Swadesh 1955), e.g. those denoting HAIR, EYE, EAR or SKIN. They usually do not reveal any diachronic paths. Thus we have to analyse less stable body-part nouns such as nouns denoting EYEBALL, EYEBROW, EYELASH and EYELID instead of EYE, HAIR etc. Among these nouns, substance-like parts such as SKIN and HAIR figure in the Swadesh list and are ruled out for this analysis because they are presumably more stable. Interestingly, simple objects, or rather parts, that are ambiguous as to the classification in terms of shape or of substance, such as EYELASH, EYEBROW and EYELID, do not figure in the Swadesh list. Maybe nouns for simple objects tend to be lexically more unstable than those denoting substances and more complex, internally structured objects. For these methodological reasons, this analysis will investigate sources and comparative data of the concept EYEBALL, which corresponds to a complex object, and EYELID, EYEBROW and EYELASH, which correspond to simple objects.

3.4 Results from lexical change
In almost all analysed languages (see appendix) EYEBALL is named on the basis of round objects, for instance:

(1) Bambara nyè-kili EYE-EGG
(2) Estonian silma-muna EYE-EGG
(3) Japanese me-damá EYE-BALL
(4) Tibetan mig 'bras EYE FRUIT
(5) Yir Yoront mel-pir EYE-FULL MOON
Data for the usually stable concept EYE also point to such a strategy, as the world-wide observable EYE/SEED polysemy shows (Brown and Witkowski 1983). A similar universal shape-based naming strategy is probably found in the domain of HEAD or SKULL (see Blank/Koch 1999). Size also plays a role, since here we tend to find ‘larger’ source concepts such as BOWL or PUMPKIN rather than SEED as in the case of EYE. The shape-based shifts are mostly metaphorical. However, in the case of EYELID, EYEBROW and EYELASH we observe more naming strategies, but essentially two paths. They can either be conceptualized via their substance or via their shape. EYELASH and EYEBROW can be conceptualized on the basis of HAIR or WOOL, EYELID on the basis of substance-like concepts such as SKIN, FLESH, or BARK. Alternatively, all of them can be conceptualized via their elongated, arc-like shape. Interestingly, the distinction between form- and substance-based semantic changes cuts across the distinction between metaphor and metonymy.
Figure 2. Two conceptualization strategies in the domain of the eye
Very remarkably, the paths in languages with number marking in Table 1 (see appendix) are generally based on shape. For instance, we find the following source for EYELID:

(6) Swedish ögon-lock EYE-LID/HATCH
The shape may not be too obvious in the case of the source concepts LID/COVER for EYELID, and the function of covering might be just as important as the shape. Interestingly, however, the sources based on the concepts LID/COVER are almost exclusively found in languages with number marking. For EYEBROW we find paths such as:

(7) Albanian vetull < Latin vittula SMALL RIBBON
An example of shape-based conceptualization of EYELASH is:

(8) English eye-lash EYE-FLEXIBLE PART OF A WHIP
In languages without obligatory number marking of inanimate nouns in Table 2, paths of lexical change reflect conceptualization in terms of substance. Here the naming tendency is clearer and the results are more uniform for the concepts EYELID, EYELASH and EYEBROW. Nouns denoting the EYELID usually derive from nouns denoting SKIN, BARK etc., for instance:

(9) Mandarin yan-pí EYE-SKIN
Those denoting EYEBROW usually derive from nouns meaning HAIR:

(10) Yir Yoront mel-thorrchn EYE-HAIR

The same is true for nouns meaning EYELASH such as:

(11) Bahasa Indonesia bulu mata HAIR EYE

In many cases, nouns meaning HAIR are even added to nouns that already mean EYELASH or EYEBROW, such as Mandarin meí-mao EYEBROW-HAIR or jié-máo EYELASH-HAIR, whereas the redundant compounds in Table 1 are only based on EYE, as in English eyebrow, where brow on its own also means EYEBROW. The naming tendency based on substance also prevails in Table 3, although less clearly than in Table 2. In Table 3, which contains data from languages that mark the plural to some extent, both paths are present, sometimes even within one noun, as in Bambara nyè konkon-na-si EYE ANGLE-AT-HAIR ‘eyebrow’.

Another important source for nouns denoting EYELID, EYELASH and EYEBROW is shifts among these concepts based on the spatial contiguity of these three concepts, as in Northern Sotho, where ntšhi means EYELID, EYELASH and EYEBROW. Arguably, these are not pure metonymies, since they are usually restricted to these concepts and hardly ever extend to EYEBALL (an exception is found in Northern Sotho), so their arc-like shapes are probably involved, too, and therefore these shifts might also reflect shape-based categorization. Remarkably, these shifts are mainly found in languages with number marking, as we can see in Table 1. This shift involving EYELID, EYELASH and EYEBROW is also very frequent in Romance languages (Koch 2003: 97), which also belong to the group of languages with obligatory plural marking. These shifts also seem to occur in languages which mark number to a certain degree such as Hausa and Hopi (see appendix, Table 3). For Hausa we find cognates of gira EYEBROW meaning EYELASH and EYELID that point to internal shifts, which may indicate quite early changes, perhaps related to an earlier stage with more pervasive number marking in the Chadic
language family (Ekkehard Wolff, p.c.). Apparent shifts, i.e. those between nouns designating EYEBROW and EYELASH which are created on the basis of HAIR (found in Tables 2 and 3), are more likely to be parallel lexicalization strategies, as in Quechua ñahui millma EYE WOOL, which means both EYEBROW and EYELASH. Thus, if we assume that the shifts between EYELID, EYELASH and EYEBROW also point to a shape-based categorization, the differences between the language groups are even clearer. However, even if we do not consider the latter data, the result matches the observations made by Imai and Gentner (1997) and Lucy (1992). The data in Table 3 are not presented in Figure 3, since these borderline cases require further analyses, for example of the diachronic development and the exact synchronic use of number marking as well as the rough dates of the lexical changes observed. If there is more than one solution per concept and language reflecting the same path, only one of them is counted.
Figure 3. Proportions of shape vs. substance-based categorization in lexical change (languages with obligatory plural marking, data from Table 1, vs. languages without obligatory plural marking, data from Table 2)
4 Conclusion
This study explores for the first time the interaction of perceptual and typological factors in lexical change and compares diachronic data with experimental data. Remarkably, there is a very strong interaction between noun type and the conceptualization reflected in lexical change for certain body parts, comparable with results from nonverbal classification tasks as
shown by Imai and Gentner (1997) and Lucy (1992).6 Complex objects or parts such as EYEBALL are universally conceptualized via shape. The greatest diversity of paths can be found for simple objects such as EYELASH, EYELID and EYEBROW, clearly in correlation with the noun system. Grammatical typological factors influence naming strategies. Although experimental data and diachronic data are very distinct, the results clearly converge in support of a moderate version of linguistic relativity.
Acknowledgements

The data are mostly taken from the database of the project B6 (LexiTypeDia) (http://www.sfb441.uni-tuebingen.de/b6/). The data have been collected by Angela Dorn, Boštjan Dvořák, Paul Gévaudan, Isabelle Hiller, Genc Lafe, Wiltrud Mihatsch and in particular Reinhild Steinberg. Thanks to Assibi Amidu, Michael Betsch, Balthasar Bickel, Uschi Drolc, Friederike Lüpke, James A. Matisoff, Liliana Sánchez, Mutsumi Takahashi, Bernhard Wälchli, Ekkehard Wolff, Bettina Zeisler and Ningchuan Zhu for valuable advice. I would also like to thank two anonymous reviewers for their helpful suggestions as well as Véronique and Sam Featherston for the stylistic revision of an earlier version of this paper.
Appendix

In Tables 1, 2 and 3 we find data from lexical change, mainly taken from the project's database. In the case of available data from semantic change, the source concept is marked by “<”. Morphologically transparent nouns are glossed. If borrowing is accompanied by a change of meaning, the meaning of the source is given. In many cases only cognate meanings (marked by COG) and polysemous senses (marked by POL) are available. Here only those meanings which are plausible sources are indicated. In the case of cognates, the frequency of the cognate meanings is also taken into account; less frequent cognates are excluded, since we can assume that they are more likely derived from the target sense. Source data based on shape are found in the white cells, those based on substance in the grey cells. Internal shifts among EYELID and EYEBROW or EYELASH, which point to a shape-based conceptualization (not the ones just between EYEBROW or EYELASH), are marked by a black square xx. Other sources and stable lexical units are crossed out.
Swedish
Swahili
Russian Northern Sotho
German Gaelic (Scot- Estonian English tish)
Albanian
Table 1. Lexical change in languages with obligatory plural marking of inanimate count nouns eyeball
eyelid
eyebrow
eyelash
kokërdhok < GRAIN
qepallë Slavic loanword meaning LID (THAT
vetull
qerpik Turkish loanword qepallë (probably from EYELID)xxx eye-lash EYE-FLEXIBLE PART
SHUTS WITH A BANG)
eye-ball EYE-BALL
eye-lid EYE-EYELID (
silma-muna EYE-EGG
silma-laug EYE-BOARD
clach na sùla
sgàile sùla
STONE VEIL EYE.GEN DEF.ART.GEN.SG.FEM fabhra EYE.GEN < EYELASH xxx rosg < EYE
Aug-apfel EYE-APPLE
SMALL RIBBON
eye-brow xxx EYE-EYEBROW (brow meant EYE- OF A WHIP LID, EYELASH, EYEBROW in OE, maybe rel. to LOG/ BRIDGE) silma-kulm silma-ripse EYE-EYEHILL EYE-EYELASH (
Augen-lid EYE-EYELID (
Japanese Hungarian
Bahasa Indonesia
Table 2. Lexical change in languages without obligatory plural marking of inanimate count nouns eyeball
eyelid
eyebrow
eyelash
bola mata
kelopak mata WRAPPER/THIN
bulu kening
bulu mata
HAIR EYEBROW
HAIR EYE
BALL EYE
COVERING EYE
szem-golyó EYE-BALL/MARBLE
szem-héj EYE-SKIN
szem-öldök EYE-SUFFIX
szempilla EYE-EYELASH
me-damá EYE-BALL
má-buta EYE-LID/COVER
máyu-ge EYEBROW-HAIR
má-tsu-ge EYE-(OLD) GENHAIR
mİɹȚ-qha=phu mİɹȚ-gǮɊ EYE-MPFX CEREAL- EYE-SKIN
Yir Tzeltal Yoront
Tamil
Nahuatl (Istmo- Mandarin Lahu Mecayapan)
MPFX ROUND OBJECT
mˢɹȚqú=mu mİɹȚmu EYE-OUTER COVER- EYE-MPFX HAIR ING-MPFX HAIR PˢɹȚFǮɊ EYE-[HAIR] STUCK ONTO
yan-zhnj EYE-PEARL
yan-pí EYE-SKIN
meí-mao EYEBROW-HAIR
jié-máo
iyi ʈ x yoʈloj POSS-EYE SEED-
i ʈ x -cajlo' EYE-SKIN
i ʈ x-tzojmi' EYE-HAIR
i ʈ x-tzojmi' EYE-HAIR
puruvam Sanskrit loanword
ka૽૽-imai mayir
s-bak’ sitil POSS-SEED EYE
ka૽૽-imai, ka૽ imai EYE-EYELID s-nuhkulel sitil POSS-SKIN EYE
s-tsotsel sit POSS-HAIR EYE
mel-pir EYE-FULL MOON
mel-pertn EYE-SKIN
macabil, smacab, smatzab COG: EYELASH mel-thorrchn EYE-HAIR
EYELASH-HAIR
POSS
ka૽-ma૽i EYE-PRECIOUS STONE
EYELID HAIR
eyeball
eyelid
eyebrow
nyè-kili EYE-EGG nyè-kisè EYE-GRAIN
nyè-wolo EYE-SKIN, HIDE OR
nyè konkon-na-si nyè-si EYE ANGLE-AT-HAIR EYE-HAIR
śwƗʨya-r ʪ idଇ
fata-r ido SKIN-GEN EYE
pos-völö EYE-BALL
püwüapi pos-talawƾöl-hömi puvùwpi(’at)
Hopi
Hausa
Bambara
Table 3. Lexical change in languages with restricted plural marking of inanimate count nouns
Tibetan
Quechua, Highland Chimborazo
Nepali
Ɨɪkha koJKHUL EYES GEN CIRCLE
eyelash
BELT
nyè-fara EYE-BARK, PEEL gira, girƗ COG EYELASH (most), EYELID xxx
gashi-n ido HAIR-GEN EYE
EYES GEN STRIP OF LEATHER
EYE EYEBROW
ñahui lulun
ñahui cara
ñahui millma
ñahui millma
EYE EGG
EYE SKIN
EYE WOOL
EYE WOOL
ñahui pata EYE WALL
mig 'bras
mig lpags
mig spu
EYE FRUIT
EYE SKIN
EYE HAIR
mig rdog
mig Ğa
rdzi ma/gzi ma <EYELASH smin ma
mig spu <EYEBROW mig gĞog EYE WING/FEATHER rdzi ma/ gzi ma
EYEBROW SUFFIX
EYELASH SUFFIX
EYE GRAIN
EYE FLESH
Notes 1.
2. 3. 4.
5.
6.
This analysis does not take into account verbal plural agreement (cf. Corbett 2000: 136-137 and Nichols 1992: 143-146). Further analyses may reveal whether plural agreement on verbs in languages which do not inflect number on nouns has the same conceptual consequences as plural markers on nouns. In Bantu languages the singular/ plural distinction is expressed by noun classes. The plural of nouns designating paired body parts might be less marked than the singular. Baayen, Burani and Schreuder (1997) have shown that in Italian nouns designating paired items are semantically unmarked in the plural. In Hungarian and Turkic languages, number words combine with singular nouns, as an anonymous reviewer reported. This is a wide-spread pattern, even in languages with an otherwise obligatory plural. These languages are not treated as languages with an optional plural (see also Haspelmath in press). An anonymous reviewer pointed out that reference should be made to the existence of dual number. All minor numbers such as the dual are here subsumed under the plural, since the existence of a dual in a language implies the existence of a plural, furthermore the plural is always more frequent than the dual (Corbett 2000: 38-39). The same reviewer also rightly mentioned the productive derivation of singular nouns by suffixation from morphologically unmarked collective or transnumeral nouns, which can then be pluralized in languages such as Breton and Arabic (also see Corbett 2000: 32, 36). This sample does not contain such languages, Scottish Gaelic only contains fossilized singulatives (MacAulay 1992: 207-208). It would be interesting to investigate whether the categorization reflected by grammatical categories such as noun classes and numeral classifiers differs fundamentally from lexicalization strategies.
Lexicographical sources Alpher, Barry 1991 Yir-Yoront Lexicon: sketch and dictionary of an Australian language. Trends in linguistics: Documentation 6) Berlin, de Gruyter. Awde, Nicholas 1996 Hausa-English, English-Hausa dictionary. (Hippocrene practical dictionary) New York, Hippocrene. Bailleul, Père Charles 1981 Petit dictionnaire bambara-français, français-bambara. Amersham, Avebury.
Barnhart, Robert K. 1988 The Barnhart dictionary of etymology: [the core vocabulary of standard English]. New York, Wilson. Bazin, Hippolyte 1965 Reprint. Dictionnaire bambara-français. précédé d'un abrégé de grammaire bambara. Paris, Imprimerie Nationale/ Farnborough, Hants, Gregg Press, 1906. Benkö, Loránd, and Béla Büky 1993–1997 Etymologisches Wörterbuch des Ungarischen, 3 vols. Budapest, Akad. Kiadó. Bergman, Peter M. 1980 The basic English – Chinese/ Chinese – English Dictionary. New York, NAL/ Dutton. Bielmeier, Roland 2004 Lexikalische Variation und lexikalischer Wandel im Tibetischen am Beispiel einiger Körperteilbezeichnungen. In Wiltrud Mihatsch and Reinhild Steinberg (eds.), Lexical Data and Universals of Semantic Change. (Stauffenburg Linguistik 35) Tübingen, Stauffenburg, 167– 202. Dalin, Anders Frederik 1964 Reprint. Ordbok över svenska språket [Dictionary of the Swedish language], 2 vols. Stockholm, 1850–1853. Das neue chinesisch-deutsche Wörterbuch = Xin de han cidian 2000 Xiuding ben. Shanghai, Yiwen Chubanshe. Dienhart, John M. 1989 The Mayan Languages. A Comparative Vocabulary, 3 vols. Odense, Odense University Press. Echols, John M. and Hassan Shadily 1989 An Indonesian English dictionary. Rev. and ed. by John U. Wolff, 3rd ed., Ithaca, Cornell University Press. Halász, Elod, Csaba Földes, and Pál Uzonyi 1998 Magyar-német nagyszótar: új német helyesírással = Ungarischdeutsches Grosswoerterbuch: mit neuer Rechtschreibung. (Klasszikus nagyszótárak) Budapest, Akad. Kiadó. 1989 Német-magyar nagyszótár: új német helyesírással. DeutschUngarisches Großwörterbuch mit neuer Rechtschreibung. (Klasszikus nagyszótárak) Berlin, Langenscheidt. Hellquist, Elof 1957 Svensk etymologisk ordbok. [Swedish Etymological Dictionary] 2 vols. 3rd ed., Lund, Gleerup. Herms, Irmtraud 1987 Wörterbuch Hausa-Deutsch. Leipzig,Verlag Enzyklopädie.
Hill, Kenneth C. (ed.) 1998 Hopi dictionary: a Hopi-English dictionary of the third Mesa dialect; with an English-Hopi finder list and a sketch of Hopi grammar = Hopìikwa lavàytutuveni. Tuscon, University of Arizona Press. Höftmann, Hildegard 1989 Wörterbuch Swahili-Deutsch. Unter Mitarbeit v. Irmtraud Herms. 4th ed., Leipzig, Verlag Enzyklopädie. Huld, Martin E. 1984 Basic Albanian Etymologies. Los Angeles, Slavica Publishers California State University. Joffe, David 2003 Sesotho sa Leboa (Northern Sotho) – English Dictionary. (http://africanlanguages.com/sdp/) Johnson, Frederick 1945 A standard Swahili-English dictionary. (founded on Madan's SwahiliEnglish dictionary) Oxford, Oxford University Press. Kahlo, Gerhard, and Rosemarie Simon-Bärwinkel 1985 Wörterbuch Deutsch-Indonesisch. München, Hueber. Kamusi project (http://research.yale.edu/swahili/kamusi/browse/s/K/c3.htm) Kann, Kallista, Elisabeth Kibbermann, Felix Kibbermann, and Salme Kirotar 1970 Eesti-saksa sõnaraamat. [Estonian-German Dictionary], 2nd ed., Tallinn, Valgus. Kirkeby, Willy A. 2000 English Swahili Dictionary. Dar es Salaam: Kakepela. Kluge, Friedrich 1999 Etymologisches Wörterbuch der deutschen Sprache. Ed. by Elmar Seebold, 23rd ed., Berlin, de Gruyter. Köbler, Gerhard 1993 Wörterbuch des althochdeutschen Sprachschatzes. Paderborn, Schöningh. Krapf, Johann Ludwig 1964 Dictionary of the Suahili Language. Farnborough, Gregg Press. MacLennan, Malcolm 1997 Reprint. Gaelic Dictionary – A pronouncing and etymological dictionary of the Gaelic language. Edinburgh: MacMillan, 1925. Mägiste, Julius 1983 Estnisches etymologisches Wörterbuch. Helsinki, Finnisch-ugrische Gesellschaft. Martin, Samuel E. 1987 The Japanese Language Through Time. (Yale language series) New Haven, Yale University Press.
388
Wiltrud Mihatsch
Matisoff, James A. 1988 The Dictionary of Lahu. (University of California Publications in Linguistics 111) Berkeley et al., University of California Press. Meyer, Gustav 1891 Etymologisches Wörterbuch der albanesischen Sprache, (Sammlung indogermanischer Wörterbücher 3) Straßburg, Trübner. Miller, Wick R. 1967 Uto-Aztecan Cognate Sets. (Univ. of California Publications in Linguistics 48) Berkeley, University of California Press. Moreno Mora, Manuel 1955 Diccionario etimológico y comparado de kichua del Ecuador. Cuenca. Newman, Roxana Ma 1990 An English-Hausa Dictionary. (Yale language series) New Haven, Yale University Press. Orel, Vladimir E. 1998 Albanian etymological dictionary, Leiden/Boston/Köln, Brill. Orr, Carolin, and Betsy Wrisley 1965 Vocabulario Quichua del Oriente del Ecuador. (Serie de vocabularios y diccionarios indígenas "Mariano Silva y Aceves" 11) Quito, Instituto Lingüístico de Verano. Parnwell, Eric C., and K. Venkatasubramaniam 1979 Oxford Picture Dictionary: English-Tamil. 3rd ed., Madras, Oxford University Press. Pfeifer, Wolfgang (ed.) 1993 Etymologisches Wörterbuch des Deutschen. 2 vols. Berlin, Akademischer Verlag. Pickett, Joseph P. et al. 2000 The American Heritage Dictionary of the English Language. 4th ed., Boston, Houghton Mifflin (www.bartleby.com/61/). Sacleux, Charles 1939–1941Dictionnaire Swahili-Français. 2 vols. (Travaux et mémoires de l'Institut d'Ethnologie) Paris, Institut d'Ethnologie. Sagart, Laurent 1999 The Roots of Old Chinese. (Amsterdam studies in the theory and history of linguistic science: Current issues in linguistic theory 184) Amsterdam/ Philadelphia, Benjamins. Schön, James Frederick 1968 Reprint. Dictionary of the Hausa language. Farnborough, Hants, Gregg Press, 1876. Seaman, P. David 1996 Hopi dictionary: Hopi - English, English - Hopi, grammatical appendix. (Northern Arizona University Anthropological Papers 2) Flag-
Two Types of Evidence for Linguistic Relativity
389
staff, Ariz., Northern Arizona University Department of Anthropology. Silvet, Johannes 1989 Inglise-Eesti sõnaraamat [English-Estonian Dictionary], 2 vols. Tallin, Valgus. Simpson, John A., and Edmund Weiner 2000 The Oxford English dictionary. OED Online. Oxford, Oxford University Press (http://oed.com/). Skinner, Neil 1996 Hausa comparative dictionary. (Westafrikanische Studien 11) Köln, Köppe. Stark, Luisa and Pieter Muysken 1977 Diccionario español-quichua quichua español. (Publicaciones de los museos del Banco Central del Ecuador 1) Quito. Stross, Brian 1976 Tzeltal anatomical terminology: semantic processes. Mayan Linguistics, l: 243–267. Svane, Gunnar 1992 Slavische Lehnwörter im Albanischen (Acta Jutlandica LXVIII. Humanist. Reihe 67) Aarhus, Aarhus University Press. Swanson, Richard, and Stanley Witkowski 1977 Hopi Ethnoanatomy: A Comparative Treatment. Proceedings of the American Philosophical Society, 121, 320–337. Thomson, Derick 1986 The New English-Gaelic Dictionary. Glasgow, Gairm Publications. Turnbull, Archibald 1982 Nepali grammar & vocabulary. 3rd ed., New Delhi, Asian Educational services. Turner, Ralph L. 1931 A comparative and etymological dictionary of the Nepali language, London, Kegan Paul, Trench, Trubner & Co. 1966–1985 A comparative dictionary of the Indo-Aryan languages, 4 vols. London, Oxford University Press. Unger, Ulrich 1989 Glossar des klassischen Chinesisch. Wiesbaden, Harrassowitz. University of Madras 1982 Reprint. Tamil Lexicon, 6 vols. Madras, University of Madras, 1924– 1936. Vasmer, Max 1953–1958 Russisches etymologisches Wörterbuch. 3 vols. (Indogermanische Bibliothek: Reihe 2, Wörterbücher) Heidelberg,Winter.
390
Wiltrud Mihatsch
Vendryes, Joseph 1959–1978 Lexique étymologique de l'irlandais ancien. 7 vols. Dublin, Dublin Institute for Advanced Studies/ Paris, CNRS. Voegelin, Carl F., Florence M. Voegelin, and Kenneth L. Hale 1962 Typological and Comparative Grammar of Uto-Aztecan: I (Phonology). (Publications in Anthropology and Linguistics, Memoir 17, Suppl. to IJAL 28.1) Indiana University. Wilkinson, Richard James 1959 A Malay-English dictionary. 2 vols. London, Macmillan. Wolgemuth, Carl, Marilyn Wolgemuth, Plácido Hernández P., Esteban Pérez R. and Christopher Hurst 2000 Diccionario náhuatl de los municipios Mecayapan y Tatahuicapan de Juárez, Veracruz. Coyoacán, D.F., Instituto Lingüístico de Verano. Zhao, Tangshou 1994 Handwörterbuch der Gegenwartssprache Deutsch-Chinesisch, Chinesisch-Deutsch. 2nd ed., Beijing, Peking University Press.
References Baayen, Harald, Christina Burani and Robert Schreuder 1997 Effects of Semantic Markedness in the Processing of Regular Nominal Singulars and Plurals in Italian. In Geert Booij and Jaap van Marle (eds.), Yearbook of Morphology 1996. Dordrecht, Kluwer Academic Publishers, 13–33. Beyer, Stephan V. 1992 The Classical Tibetan Language. (SUNY series in Buddhist studies) Albany, State University of New York Press. Brauner, Siegmund 1974 Lehrbuch des Bambara. Leipzig, Verlag Enzyklopädie. Behrens, Leila 2000 Semantics and typology. STUF Berlin 53, 21–38. Blank, Andreas and Peter Koch 1999 Onomasiologie et étymologie cognitive: l’exemple de la TETE. In Mário Vilela and Fátima Silva (eds.), Atas do 1° Encontro de Linguística Cognitiva. Porto, Faculdade de Letras do Porto, 49–71. Bloom, Paul 2000 How Children Learn the Meanings of Words. (Learning, development, and conceptual change) Cambridge, Mass., The MIT Press. Brown, Cecil H and Stanley R. Witkowski 1985 Polysemy, lexical change and cultural importance. Man, 18: 72–89.
Two Types of Evidence for Linguistic Relativity
391
Corbett, Greville G. 2000 Number. (Cambridge textbooks in linguistics) Cambridge, Cambridge University Press. Garry, Jane and Carl Rubino (eds.) 2001 Facts about the World's Languages. New York and Dublin, Wilson. Gumperz, John J. and Stephen C. Levinson 1996 Introduction: Linguistic Relativity Re-examined. In John J. Gumperz and Stephen C. Levinson (eds.), Rethinking linguistic relativity. Cambridge et al.: Cambridge University Press (Studies in the social and cultural foundations of language 17), 1–20. Haspelmath, Martin 2005 Occurrence of Nominal Plurality. In Martin Haspelmath, Matthew Dryer, David Gil and Bernard Comrie (eds.), The World Atlas of Language Structures. (Book with interactive CD-ROM), pp. 142í145, Oxford, Oxford University Press. Hill, Jane and Kenneth C. Hill 2000 Marked and Unmarked Plural nouns in Uto-Aztecan. In Eugene H. Casad and Thomas L. Willett (eds.), Uto-Aztecan: Structural, Temporal, and Geographic Perspectives. Papers in Memory of Wick R. Miller by the Friends of Uto-Aztecan. Universidad de Sonora, Unison, 241–275. Imai, Mutsumi and Derdre Gentner 1997 A Cross-Linguistic Study of Early Word Meaning: Universal Ontology and Linguistic Influence. Cognition, 62, 169–200. Itier, César 1997 Parlons Quechua. La langue du Cuzco. (Collection: Parlons) Paris/ Montréal, L’Harmattan. Koch, Peter 2001 Lexical typology from a cognitive and linguistic point of view. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher and Wolfgang Raible (eds.), Language Typology and Language Universals. An International Handbook. (Handbücher der Sprach- und Kommunikationswissenschaft, 20.2) Berlin/New York, de Gruyter, 1142–1178. 2003 Qu’est-ce que le cognitif? In Peter Blumenthal and Jean-Emmanuel Tyvaert (eds.), La cognition dans le temps. Etudes cognitives dans le champ historique des langues et des textes. (Linguistische Arbeiten, 476) Niemeyer, Tübingen, 85–100. 2004 Diachronic onomasiology and semantic reconstruction. In Wiltrud Mihatsch and Reinhild Steinberg (eds.), Lexical Data and Universals of Semantic Change, (Stauffenburg Linguistik 35) Stauffenburg, Tübingen, 79–106.
392
Wiltrud Mihatsch
Law, Howard W. 1958 Morphological Structure of Isthmus Nahuat. IJAL 24, 108–129. Lucy, John A. 1992 Grammatical Categories and Cognition: A Case Study of the Linguistic Relativity Hypothesis. (Studies in the social and cultural foundations of language 13) Cambridge, Cambridge University Press. MacAulay, Donald 1992 The Scottish Gaelic language. In Donald MacAulay (ed.), The Celtic Languages. (Cambridge language surveys) Cambridge, Cambridge University Press, 137–248. Mihatsch, Wiltrud and Boštjan DvoĜák 2004 The concept FACE: paths of lexical change. In Wiltrud Mihatsch and Reinhild Steinberg (eds.), Lexical Data and Universals of Semantic Change. (Stauffenburg Linguistik 35) Tübingen, Stauffenburg, 221– 254. Newman, Paul 2000 The Hausa language. An Encyclopedic Reference Grammar. (Yale language series) New Haven/ London, Yale University Press. Nichols, Johanna 1992 Linguistic Diversity in Space and Time. Chicago/ London, The University of Chicago Press. Soja, Nancy N., Susan Carey and Elizabeth S. Spelke 1991 Ontological categories guide young children's inductions of word meaning: Object terms and substance terms. Cognition 38, 179–211. Suárez, Jorge A. 1983 The Mesoamerican Indian Languages. (Cambridge language surveys) Cambridge et al.: Cambridge University Press. Swadesh, Morris 1955 Towards greater accuracy in lexico-statistic dating. IJAL 21, 121–137.
Reflexives and Pronouns in Picture Noun Phrases: Using Eye Movements as a Source of Linguistic Evidence
Jeffrey T. Runner, Rachel S. Sussman, and Michael K. Tanenhaus
1 Background

1.1 Binding theory
In English, reflexives (e.g., himself) and pronouns (e.g., him) have a nearly complementary distribution. Judgments on these simple examples are quite solid and remain the same regardless of context:

(1) a. Ken_i saw himself_{i/*j}.
    b. Ken_i saw him_{j/*i}.
Binding Theory is the set of structural constraints on the relationship between different types of NPs and their (potential) antecedents. It was designed to account for the complementarity of pronouns and reflexives illustrated in (1). A (somewhat simplified) version based on Chomsky (1981) appears in (2):

(2) A: a reflexive must be bound within a local domain (e.g., a sentence).
    B: a pronoun must be free (= not bound) within that same local domain.
In (1a) the reflexive satisfies Binding Theory because it is bound by an NP within its sentence; in (1b) the pronoun satisfies Binding Theory because it is not bound by any NP within its sentence.
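As a toy illustration only (not part of the original paper), the simplified principles in (2) can be stated as a small check over coindexation within the local domain. The function name and the representation of indices below are assumptions introduced here for illustration, and the sketch abstracts away from c-command and from the precise definition of the local domain.

def satisfies_binding(np_type: str, index: str, binder_indices: set) -> bool:
    """Simplified Principles A/B from (2). `binder_indices` holds the
    indices of potential antecedents within the same local domain."""
    bound = index in binder_indices        # coindexed with a local antecedent?
    if np_type == "reflexive":
        return bound                       # Principle A: must be locally bound
    if np_type == "pronoun":
        return not bound                   # Principle B: must be locally free
    raise ValueError(np_type)

# (1a) Ken_i saw himself_i   vs.   (1b) Ken_i saw him_i
print(satisfies_binding("reflexive", "i", {"i"}))  # True: (1a) is well-formed
print(satisfies_binding("pronoun", "i", {"i"}))    # False: him_i bound by Ken_i is out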
1.2 Picture noun phrases
Picture Noun Phrases are noun phrases headed by a "representational" noun such as picture, film, photograph, novel, etc. The head N itself may have several "arguments", e.g., Harry and Joe in (3a); the "possessor" and PP are both optional, cf. (3b) and (3c):

(3) a. Harry's picture of Joe
    b. a picture of Joe
    c. Harry's picture, etc.
1.2.1 Picture Noun Phrases – no possessor

Picture noun phrases have received attention in the binding literature because reflexives in picture NPs lacking a possessor phrase may violate Binding Theory (from Pollard & Sag, 1992):

(4) a. John said that [S there was [a picture of himself] in the post office]
    b. John was going to get even with Mary. That picture of himself in the paper would really annoy her, as would the other stunts he had planned.
Additionally, the judgments on the binding in this construction are less solid and are influenced by context in some cases. Compare (5a) and (5b), where the difference seems to be in the discourse role of the sentence-external potential antecedent (from Pollard & Sag 1992):

(5) a. John_i was going to get even with Mary. That picture of himself_i in the paper would really annoy her, as would the other stunts he had planned.
    b. Mary was quite taken aback by the publicity John_i was receiving. *That picture of himself_i in the paper would really annoy her, as would the other stunts he had planned.
Kuno (1987) identified several pragmatic factors influencing picture NP reflexives, including what he termed "awareness", (6a) vs. (6b), and "indirect agenthood", (7a) vs. (7b):

(6) a. John knows that there is a picture of himself in the morning paper.
    b. *John still doesn't know that there is a picture of himself in the morning paper.

(7) a. I hate the story about himself that John always tells.
    b. *I hate the story about himself that John likes to hear.
The approach taken by Pollard & Sag (1992) and Reinhart & Reuland (1993) to account for these Binding Theory violations is what we will call the Logophor Analysis, which treats reflexives in picture NPs lacking a possessor as "logophors", reflexive noun phrases which are not strictly structurally controlled. They are not subject to structural Binding Theory, but rather are constrained at least in part by discourse variables. The term logophor was first used by Hagege (1974) to describe a particular type of anaphor which refers to someone whose thoughts are being reported (see also Clements 1975); since then the term has been used more broadly to describe instances of "long-distance" reflexives and the picture NP reflexives discussed here (see Reinhart & Reuland 1993). We will follow this usage, but remain agnostic on how similar the reflexives in picture NPs in English are to the logophors found in the languages examined by Hagege and Clements.

1.2.2 Picture NPs with possessors

In contrast to those in picture NPs without possessor phrases, reflexives and pronouns in picture NPs with possessors appear to show the complementary distribution typically found in the standard examples. In this case the local domain for Binding Theory is the NP. The standard judgments found in the literature are that the reflexive in (8a) is obligatorily bound by the possessor NP (Harry, here), and that the pronoun in (8b) must be free from that possessor and thus can take the subject or some sentence-external referent as its antecedent.
(8) a. Joe saw [NP Harry's picture of himself]
    b. Joe saw [NP Harry's picture of him]
These judgments support the standard claim that pronouns and reflexives in picture NPs containing possessors are constrained by Binding Theory. This is the claim of Chomsky (1981, 1986) as well as of the proponents of the logophor analysis for reflexives in picture NPs lacking possessors, Pollard & Sag (1992) and Reinhart & Reuland (1993). Thus, we find two different accounts for reflexives in picture NPs. In picture NPs lacking possessors they are logophors and are pragmatically controlled; in picture NPs with possessors they are regular structural reflexives, subject to Binding Theory.

However, several authors have noted that reflexives in picture NPs containing a possessor may not always take the possessor as antecedent. Consider examples (9a) and (9b), from Kuno (1987, p. 169) and Reinhart & Reuland (1993, p. 683), respectively. The array of stars and question marks indicates that the judgments are not solid and may vary from speaker to speaker, and perhaps from context to context:
(9) a. ok/?/?? Mary_i isn't interested in anybody's opinion of herself_i.
    b. */? Lucie_i liked your picture of herself_i.
In addition, it seems clear that there is at least a contrast between examples like (10a) and (10b). All speakers will agree that (10a) is not possible, while (10b) is certainly improved:

(10) a. *John_i said that Bill likes himself_i.
     b. ??John_i liked Bill's photograph of himself_i.

This contrast is unexpected if Binding Theory constrains reflexives in possessed picture NPs as it does those in regular object position.

Finally, Keller & Asudeh (2001) and Asudeh & Keller (2001) presented data collected in a web-based study using the magnitude estimation technique. They found that participants accepted reflexives and pronouns bound to the subject of the sentence equally well in examples like (11):

(11) Hanna found Peter's picture of herself/her.

1.3 Summary: some concerns
As we have seen, native speaker judgments on binding in picture NPs are not solid. The shakiness of the judgments suggests the need for a better way to collect data. In particular, it is not clear what influence context has on the data. For example, are different consultants imagining different contexts, leading to different judgments? Or are the data not categorical enough to support a solid judgment? In the next section we introduce an approach that tries to address these concerns.

2 Binding theory and picture NPs with possessors
We investigated the interpretation of pronouns and reflexives in possessed picture NPs like (12).

(12) a. Joe's picture of him
     b. Joe's picture of himself
Native English speaker participants followed spoken instructions to manipulate an array of dolls and pictures belonging to those dolls. As they did this we monitored their eye movements. See the display picture in Figure 1.
Figure 1. Display
The display contained three dolls, Ken, Harry and Joe; behind each doll was a column of pictures of each doll, belonging to that doll. For example, behind Ken was a column of pictures belonging to him, and that column included a picture of Joe, a picture of Harry and a picture of Ken as well. Participants wore a light-weight headband-mounted eye-tracker. The scene was recorded by a camera on the eye-tracker. The output was video of everything the participant saw, with cross-hairs indicating the place of fixation. In addition to the experimental trials with pronouns and reflexives, participants also heard filler trials containing similar sentences but with full NPs (e.g., Have Joe touch Ken's picture of Harry.). This set-up provided a visual and linguistic context (a kind of backdrop) against which participants interpreted the reflexives and pronouns they heard. Sample instructions included sentences like (13):

(13) a. Have Joe touch Ken's picture of him
     b. Have Joe touch Ken's picture of himself
Which picture participants have the doll touch indicates how they interpreted the pronoun or reflexive. Thus, participants' target choice provides a kind of "judgment". If participants choose a picture indicating a particular reading, this means that reading is acceptable or possible. This judgment is collected without having to ask participants whether a particular sentence can have a particular interpretation. That is, there is no extra-linguistic task involved. In addition to target choice, information about which potential referents are being considered, and when, comes from the pattern and timing of the eye movements.

In recent years, a body of psycholinguistic research has established that eye movements can provide important insights into how listeners comprehend spoken language (for reviews see chapters in Henderson & Ferreira, 2004; Tanenhaus & Trueswell, 2004). When listeners hear spoken utterances in task-relevant visual contexts, eye movements to potential referents, and to objects relevant to establishing reference, are closely time-locked to the input (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy 1995). The time course of looks to potential referents in the scene is closely time-locked to the point in the utterance where the word, phrase or anaphor denoting the referent is disambiguated (Chambers, Tanenhaus, Eberhard, Filip & Carlson, 2002; Eberhard, Spivey-Knowlton, Sedivy & Tanenhaus, 1995; Hanna, Tanenhaus & Trueswell, 2003). Importantly, potential referents are fixated in proportion to the likelihood of that referent being the intended target of the spoken materials (Allopenna, Magnuson & Tanenhaus, 1998). Thus it becomes possible to use the proportion of looks to potential referents to infer the likelihood that the listener was considering that referent (Tanenhaus, Magnuson, Chambers & Dahan, 2000). Given this, the timing and pattern of looks to potential referents provides information about which alternative referents were considered as the utterance unfolded.

In addition, because the action performed by the participant reveals the interpretation assigned to an anaphor on each trial, we can conduct action-contingent analyses (cf. Runner, Sussman & Tanenhaus, 2003) in which we can divide up our data on the basis of the binding compatibility of the participant's response. In this way, we can examine the earliest moments of processing for particular interpretations, and determine which potential referents were considered on trials in which participants did or did not perform actions consistent with Binding Theory.

In principle, the approach we have developed allows us to manipulate both visual context and linguistic context. However, the results presented here involve fairly simple manipulations, focusing on linguistic context. Further work extending the paradigm to more complex linguistic and visual context manipulations is also under way.
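To make the proportion-of-looks measure and the action-contingent split concrete, here is a minimal sketch of such an analysis. The data format, field names, region labels and bin size are illustrative assumptions rather than the actual analysis code used in these studies; the point is only how fixations aligned to the onset of the anaphor are binned, and how trials are filtered by the action performed.

from collections import defaultdict

REGIONS = ("lead-in", "possessor", "subject")
BIN_MS = 200    # width of each analysis window (assumed)
N_BINS = 15     # covers 0-3000 ms after anaphor onset (assumed)

# Hypothetical trial records: onset of the anaphor (ms from trial start),
# fixations as (start_ms, end_ms, region), the anaphor type, and which
# picture the participant had the doll touch.
trials = [
    {"onset": 1200, "anaphor": "reflexive", "target": "possessor",
     "fixations": [(1250, 1500, "possessor"), (1500, 1900, "subject")]},
    {"onset": 1150, "anaphor": "reflexive", "target": "subject",
     "fixations": [(1200, 1600, "subject")]},
]

def looks_by_bin(trial_set):
    """Proportion of trials with a fixation on each region in each time bin."""
    counts = defaultdict(lambda: [0] * N_BINS)
    for trial in trial_set:
        for b in range(N_BINS):
            lo, hi = b * BIN_MS, (b + 1) * BIN_MS
            seen = set()
            for start, end, region in trial["fixations"]:
                # align fixation times to the onset of the anaphor
                s, e = start - trial["onset"], end - trial["onset"]
                if s < hi and e > lo:          # fixation overlaps this bin
                    seen.add(region)
            for region in seen:
                counts[region][b] += 1
    n = len(trial_set)
    return {r: [c / n for c in counts[r]] for r in REGIONS}

# Overall pattern of looks over time ...
overall = looks_by_bin(trials)

# ... versus an action-contingent analysis: restrict the trial set to
# reflexive trials on which the Binding Theory compatible possessor
# picture was chosen, then recompute the proportions.
possessor_trials = [t for t in trials
                    if t["anaphor"] == "reflexive" and t["target"] == "possessor"]
possessor_looks = looks_by_bin(possessor_trials)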
The data presented in what follows are drawn from a series of experiments. In all cases we looked at picture NPs with possessors. All instructions contained either reflexives or pronouns. The first question we addressed was whether reflexives in picture NPs with possessors were constrained by Binding Theory. Recall that while the standard literature claims that they are, there have been some indications that this assumption may be incorrect (Kuno 1987, Reinhart & Reuland 1993, Asudeh & Keller 2001, Keller & Asudeh 2001).

Seated in front of the display pictured in Figure 1, our participants heard instructions containing two parts. First there was a "lead-in" phrase, directing the listener's attention towards one of the dolls; then there was an "action" sentence, instructing them to manipulate one of the dolls. For example, the set of instructions in (14):

(14) a. Look at Ken.
     b. Have Joe touch Harry's picture of him/himself.

The predictions of Binding Theory are the following. The pronoun should take the noun phrase mentioned in the lead-in (Ken) or the subject NP (Joe) as its antecedent, not the possessor NP (Harry). That means Binding Theory predicts that in the pronoun condition participants should have the doll Joe touch Harry's picture of Ken or Harry's picture of Joe. Additionally, the reflexive should take the possessor (Harry), not the lead-in (Ken) or subject (Joe), so in the reflexive condition participants should have Joe touch Harry's picture of Harry.

The overall target choice results are in Figure 2 (from Runner, Sussman & Tanenhaus, 2003). On the pronoun trials participants chose a Binding Theory compatible referent on about 95% of trials. However, on the reflexive trials participants chose a BT compatible referent on only about 75% of trials. Participants were more likely to violate Binding Theory with reflexives than with pronouns (see Runner et al., 2003 for discussion).
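The predictions for (14) amount to a simple scoring rule for each trial's target choice. The following sketch is illustrative only; the labels for the three potential referents and the function name are assumptions and not part of the original study materials.

def bt_compatible(anaphor: str, target: str) -> bool:
    """Score one trial of (14): which picture (lead-in, subject or
    possessor referent) did the participant have the doll touch?"""
    if anaphor == "reflexive":
        # Principle A: the reflexive should be bound by the possessor.
        return target == "possessor"
    if anaphor == "pronoun":
        # Principle B: the pronoun must be free within the picture NP,
        # so the lead-in referent or the sentence subject qualify.
        return target in ("lead-in", "subject")
    raise ValueError(f"unknown anaphor type: {anaphor}")

# "Have Joe touch Harry's picture of himself", doll made to touch
# Harry's picture of Joe (the subject): incompatible with Binding Theory.
print(bt_compatible("reflexive", "subject"))  # False
print(bt_compatible("pronoun", "subject"))    # True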
Figure 2. Proportion of Binding Theory compatible and incompatible target choices on the pronoun and reflexive conditions.
Figure 3. Proportion of lead-in, subject and possessor target choices for pronouns and reflexives.
Figure 3 provides the detail of which referents were actually chosen as targets in the pronoun and reflexive conditions (from Runner, Sussman & Tanenhaus, 2004). We see that the pronouns took the NP in the lead-in phrase as antecedent on about 75% of trials and the subject on about 22% of trials. The reflexive took the possessor on about 75% of trials and the subject on about 23% of trials. This illustrates that both pronouns and reflexives can take the subject as antecedent. Such a breakdown in complementarity is not expected if reflexives and pronouns are both constrained by Binding Theory.

In addition to the target choice data, we analyzed the participants' timing and pattern of eye movements. This type of data can indicate which potential referents are under consideration at what points during processing. Eye movements are closely time-locked to the unfolding speech, though there is about a 200 ms lag to program an eye movement, so saccades beginning sometime after that are the most informative (Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995).
Figure 4. Proportion of looks over time to the lead-in, possessor and subject on the reflexive trials.
Figure 4 illustrates the proportion of looks to the relevant pictures in the display over time (from Runner et al., 2003). What the graph shows is that as the word himself unfolds, looks to all three pictures increase. Looks to the lead-in drop off first, as participants hear the reflexive. However, for an additional 200 ms or so looks to both the possessor and the subject increase together. This suggests that both the BT compatible possessor and the BT incompatible subject are being considered as potential referents from the earliest moments of processing. Note that by the end of the trial participants' pattern of looks mirrors their overall target choices. The pattern and timing of participants' eye movements suggests that both the subject and the possessor are possible referents for the reflexive in the picture NP containing a possessor (see Runner et al., 2004 for discussion).

Eye movements can help us understand to what degree there is indeed "competition" from the subject NP, that is, how seriously participants consider the subject as a potential antecedent. The strongest case for the claim that participants do consider the subject as a potential antecedent comes from trials in which they chose the possessor as the target. On these trials, their target choices were consistent with Binding Theory. Did participants still consider the subject on trials where they chose the possessor as target?
Figure 5. Proportion of looks over time to the lead-in, possessor and subject on the possessor target reflexive trials.
In Figure 5 we isolated the reflexive trials in which participants chose the possessor (from Runner et al., 2003). The graph illustrates the same pattern found in the overall results: from the earliest moments of processing participants considered both the BT compatible possessor and the BT incompatible subject. And this was the case even on trials where they ultimately chose the BT compatible interpretation.

To summarize, participants' target choices suggested that Binding Theory did constrain pronouns in picture NPs containing possessors (Joe's picture of him). However, Binding Theory did not seem to constrain reflexives in the same construction (Joe's picture of himself); alongside the BT compatible possessor, the subject appeared to be an alternative antecedent. The timing and pattern of eye movements suggested that overall the subject competed with the possessor as potential antecedent. This was the case even on trials in which the possessor was the target choice. The next section will sketch an analysis of the data patterns found here.

3 Logophor analysis
What we have found is that pronouns in picture NPs with possessors are constrained by Binding Theory and that reflexives are not. What we will pursue is an extension of the logophor analysis, originally proposed for reflexives in picture NPs lacking possessors (Pollard & Sag 1992, Reinhart & Reuland 1993). We will explore the possibility that reflexives in picture NPs with possessors are logophors. By doing this, we can unify the treatment of reflexives in picture NPs: all are logophors, not just those in picture NPs lacking possessors.

To test the logophor analysis we must find ways in which logophors differ from structural reflexives. One such way is their behavior under ellipsis. Reinhart & Reuland (1993) and Grodzinsky & Reinhart (1993) argue that structural reflexives are bound variables under ellipsis, which means they receive the "sloppy" reading only. Consider (15). In this example the elided herself can only refer to Lili.

(15) Lucie likes herself, and Lili does [e], too

However, logophors (like pronouns) can be coreferential under ellipsis, which means both a "strict" and a sloppy reading are available to them, as (16) illustrates. Here the elided herself can refer to either Lili or Lucie.

(16) Lucie liked the picture of herself, and Lili did [e], too

We can test the logophor analysis on our picture NPs containing possessors by applying NP ellipsis and observing what readings are available for the elided reflexive. Consider a context in which a museum is going out of business and is selling its portraits of the Kennedys. In that context, the elided himself in (17) can be JFK, which involves the coreferential reading (from Runner, Sussman & Tanenhaus, 2002). This provides some preliminary support for the logophor analysis.

(17) Jimmy bought JFK's portrait of himself for $500, not realizing he could've bought the museum's [e] for just $100 in its going-out-of-business sale.

In order to test the logophor analysis experimentally, we used the same display as in Figure 1. Now participants heard a series of three instructions, illustrated in (18). The first had them pick up the doll they would be using. The second provided an action sentence similar to those studied in the previous experiment. And the next instruction provided a second action sentence. This sentence contained either a full picture NP containing a possessor or an elided one, indicated in the example with angled brackets. That is, half of the trials contained the material in brackets and half of them did not.

(18) Pick up Joe.
     A: Have Joe touch Ken's picture of him/himself.
     B: Now have Joe touch Harry's