Roots
Studies in Generative Grammar 96
Editors
Henk van Riemsdijk, Jan Koster, Harry van der Hulst
Mouton de Gruyter Berlin · New York
Roots: Linguistics in Search of its Evidential Base
Edited by
Sam Featherston, Wolfgang Sternefeld
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Roots : linguistics in search of its evidential base / edited by Sam Featherston, Wolfgang Sternefeld. p. cm. – (Studies in generative grammar ; 96) Includes bibliographical references and index. ISBN 978-3-11-019315-2 (cloth : alk. paper) 1. Linguistic analysis (Linguistics) 2. Linguistics – Research – Methodology. 3. Corpora (Linguistics) 4. Computational linguistics. I. Featherston, Sam. II. Sternefeld, Wolfgang, 1953– P126.R665 2007 410–dc22 2007044365
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de.
ISBN 978-3-11-019315-2 ISSN 0167-4331
© Copyright 2007 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents

Introduction: The evidential base of linguistics – Work in progress
    Sam Featherston and Wolfgang Sternefeld
Portuguese: Corpora, coordination and agreement
    Doug Arnold, Louisa Sadler and Aline Villavicencio
Contributing to the extraction/parenthesis debate: Judgement studies and historical data
    Katrin Axel and Tanja Kiziak
Quantifying quantifier scope: A cross-methodological comparison
    Oliver Bott and Janina Radó
Is syntactic knowledge probabilistic? Experiments with the English dative alternation
    Joan Bresnan
Psycholinguistic perspectives on grammatical representations
    Harald Clahsen
Early language separation: A longitudinal study of a Russian-German bilingual child
    Elena Dieser
‘I need data which I can rely on’: Corroborating empirical evidence on preposition placement in English relative clauses
    Thomas Hoffmann
Locality and accessibility in wh-questions
    Philip Hofmeister, T. Florian Jaeger, Ivan A. Sag, Inbal Arnon and Neal Snider
Eye Tracking as a tool to investigate the comprehension of referential expressions
    Anke Karabanov, Peter Bosch and Peter König
Corpus data and experimental results as prosodic evidence: On the case of stressed auch in German
    Denisa Lenertová and Stefan Sudhoff
The retrieval and classification of Negative Polarity Items using statistical profiles
    Timm Lichte and Jan-Philipp Soehn
Geographic distributions of linguistic variation reflect dynamics of differentiation
    John Nerbonne and Wilbert Heeringa
Focus and verb order in Early New High German: Historical and contemporary evidence
    Christopher D. Sapp
Contrastive topics in pairing answers: A cross-linguistic production study
    Stavros Skopeteas and Caroline Féry
Coordinate structures: On the relationship between parsing preferences and corpus frequencies
    Ilona Steiner
Adverbs and sentence topics in processing English
    Britta Stolterfoht, Lyn Frazier and Charles Clifton, Jr.
List of contributors
Index
The evidential base of linguistics: Work in progress
Sam Featherston and Wolfgang Sternefeld
A range of factors have led to a remarkable revival of interest in issues of the empirical base of linguistic theory in general, and the status of different kinds of linguistic evidence in particular. Amongst these we must count the technological, which has made many sorts of linguistic data more available and more analyzable, and the theoretical, which has revealed itself in a spreading realization that more attention to detail in data can lead to both broader understanding of wider issues and increased insights into detailed questions.

There are at least two aspects to the technological changes which have brought about an increased interest in more empirically founded linguistic work. Both of these are driven by the rise of the personal computer. It is difficult to express the growth in storage capacity, interconnectedness, and computational power of the computer without illustrating them from personal experience. In 1983 Sam Featherston wrote a program to help solve crossword clues, but could only load small parts of the dictionary into his computer’s 48 kilobytes of memory at a time. Neither of us currently knows how many gigabytes of memory sit under our desks, but we do not expect ever to fill them. In fact we suspect that our chances of filling them are receding, since bulkier language resources are perfectly accessible remotely, and do not need to be kept locally. There is thus hardly any upper bound any more to the feasible size of collections of samples of real language use, nor any limitation imposed by their physical location. But not only size and access count: the ability of today’s computers to process and search these large quantities of data means that they can be used at great speed. This has allowed the field of corpus linguistics to grow and mature, but has also brought corpora to the non-specialist linguist.
Not only the size of collections but also the sophistication of their architectures, annotation schemes, and search possibilities have made corpus data a central tool in language study. The availability of computers has had another effect too: it has supported the extension of the experimental approach to more areas of linguistics. Although controlled studies can be carried out without computers, the monitor
as display device and keyboard as input device lend themselves readily to experiments. And only computers can integrate the stimulus and input in different modalities, such as is required by eye-tracking for example, and measure time delays in milliseconds, which forms a central part of many methodologies. One might add that the statistical calculations necessary on the detailed results of these quantitative approaches are also much less forbidding on a computer, as anyone who has worked out even means and standard deviations with pencil and paper will know.

Computer technology has thus made data-orientated approaches to linguistics more attractive. But the availability of these data types has coincided with an increased theoretical interest in the findings. In the study of language at the sentence level, the 1980s and perhaps early 1990s were a period in which the data basis was granted relatively little attention. It is necessary to re-read some works from the 1960s and 1970s to become aware of the extent to which this preference for idealization from the primary data was a change from previous practice. The tipping point of rationalist linguistics, as we might call it, may have come roughly at the appearance of Chomsky’s Minimalist Program (Chomsky 1993). While this work was perhaps the high tide mark of the deductive approach to grammar, avowedly using as the starting point for its grammar architecture that which is ‘necessary’, it too contains certain features which appear to be motivated by descriptive requirements. There is the introduction of competition into the definition of well-formedness (Sternefeld 1996; Müller & Sternefeld 2001), which seems to be a response to the pressure of the evidence, and the promotion of ‘economy’, which can be seen as an attempt to reduce the use of abstract functional projections to account for everything and anything in syntax over the previous ten years.
We thus already see signs that calling something ‘an empirical question’ was ceasing to be a way of dismissing it as a matter of little importance. Other papers from the period make the change clear too. We might refer to Pollard & Sag (1994), in which the authors use as their reference point the ‘empirical domain’, and, while they respect criteria such as ‘simplicity and conceptual clarity’, place greater weight on ‘conformity with the facts’ (p. 4). That the authors felt obliged to spell these priorities out reveals that they were not mainstream assumptions at the time. Schütze’s (1996) seminal review of judgements and methodology and Cowart’s (1997) practical applications are examples of how the wind direction had changed.

The results of these technological developments and academic paradigm shift in our field are papers such as the ones in this volume. These articles
illustrate the progress which has been made but also reveal where practical, analytical, and interpretational problems have been encountered. Researchers have found that it is not sufficient to collect more data in order to gain greater insight – in this way they may sympathize with their predecessors in the 1960s, whose sometimes disappointed hopes of the clear confirmation of their theoretical positions in experimental data may have led to the reduction in interest in the collection of data which we are now once more reacting against. Nevertheless there are clear gains in understanding and individual successes, which these papers testify to.

There are a number of aspects of the work reported in these papers which will be of interest to those who aspire to empirical adequacy in their research. Most of the papers report work using more exact and controlled methods of data gathering and data analysis, or at least the use of such methods to clarify questions which have not previously been addressed in this way; indeed we may say that this was a criterion for inclusion. The questions addressed and the approaches taken in the papers are varied, but they have the common theme of exploring what the new data implies for the field of linguistics. Rather than merely describing the data, the ambition is to interpret the findings either within existing theoretical models or in contrast to existing models, so as to gain new insights into language structures and advances in linguistic theory.

A wide range of data types are used and discussed, often experimental (e.g. Clahsen; Karabanov et al.), or frequency-based (e.g. Bresnan; Steiner; Arnold et al.), but also those from other sources such as questionnaires and observation (e.g. Dieser; Sapp). Many use familiar methods in new ways or combine sorts of data in order to gain insights (e.g. Axel & Kiziak; Hofmeister et al.).
Several also include explicit reflection on the necessary conditions for more controlled data types to provide added value for linguistic theory (Clahsen; Bott & Radó). Several papers (e.g. Lichte & Soehn; Nerbonne & Heeringa) provide innovative analytical approaches to their data: often these papers reveal insights into successful techniques, but also perils and pitfalls. Many of the papers consider data from more than one language or variety, a relatively simple way of improving the generalizability of results by broadening the data base and gathering what Clahsen calls converging evidence. While such evidence does not exclude the possibility of the findings being due to an artifact of the methodology or the assumptions underlying the approach, cross-linguistic generalizations are particularly valuable to linguists who place a high value on seeking grammatical universals.
A fine example of more sophisticated data being put to use for linguistic ends is the paper by Karabanov, Bosch & König. They use an eye-tracker to follow subjects’ visual attention shift in response to different types of noun phrases in auditory input. Such behavioural data has a strong claim as direct evidence of the process of comprehension: actions speak louder than words. The results provide evidence that pronouns are processed very similarly to full noun phrases, but that a distinction may need to be made between referential and non-referential pronouns.

The studies by Lenertová & Sudhoff provide another good example of the wealth of information which is made accessible by experimental techniques. These researchers carried out a triple investigation on the prosodic correlates of the identification of the constituent that the German additive particle auch (‘also’) is associated with. The data reveals a complex picture with no one-to-one correspondence of prosody and interpretation, but in a complex descriptive field such as this, detailed data is required to get beyond vague generalizations.

Stolterfoht, Frazier & Clifton gather self-paced reading times on clause-sized text chunks which similarly reveal detailed evidence which only tightly controlled quantitative measures can provide. The focus of their study is the hypothesis that a language such as English, which has a fairly rigid word order, will reveal evidence of being sensitive to information structural constraints in the same way that languages with looser word order do. The additional effort involved in gathering evidence with finer definition allows stronger conclusions to be drawn about the structure studied. While the effects of information structure are not so readily visible in English, closer data reveals that speakers are nevertheless sensitive to them.

Other contributions focus more on measures of occurrence.
Ilona Steiner takes a close look at frequency data to test whether parsing preferences as revealed in reading time studies correspond to production preferences as expressed in corpus data. This turns out to require several stages of discounting irrelevant information, but the careful work is rewarded with the result that the two measures are indeed correlated. This would suggest that corpus data, if carefully analysed, can be used to draw conclusions about models of sentence processing. This is an important piece of work in cross-data-type correlation with implications for grammars and processing models too.

Lichte & Soehn report their work on extracting lexical candidates as negative polarity items (NPIs) from a German corpus. The paper makes interesting reading as a case study in the care required and the multiple parameters which need to be addressed in order to maximize both the quantity and the quality of the output. The authors are cautious enough to underline that the data can really only produce negative evidence that a lexical item does not belong to one of the groups of interest, not positive evidence that it does. Here too there is no such thing as a free lunch, but hard work pays off.

Many other papers report work using more than one data type. For example Hofmeister, Jaeger, Sag, Arnon & Snider report the results of four separate studies aiming to clarify the role played by factors such as locality and accessibility in the realisation possibilities of multiple wh-questions. The results from their magnitude estimation and self-paced reading studies show how complex such issues are, and that multiple factors must be taken into consideration. This is important work in the direction of teasing apart variables of processing and putative grammar.

Thomas Hoffmann first looks at preposition placement with English relative pronouns in corpus data and finds that certain structures do not occur. Why not? Are these accidental gaps or grammar-driven exclusions? He therefore carries out experimental studies to clarify these questions, the results of which allow him to group the corpus data for statistical tests. This is a nice example of very different data types complementing each other and contributing to the larger picture with their specific strengths.

Other papers too demonstrate conclusions and predictions derivable across phenomena and data types. Joan Bresnan takes corpus frequencies enriched with fairly detailed information on context and derives continuation probabilities from these. These predictions are tested and found to be accessible to informants in continuation production tests, which shows that these probabilities must be represented in the mind of the speaker. Models of syntactic knowledge need to account for this apparent statistical information.
Two papers combine experimentally obtained judgements of the contemporary language with historical data to provide an additional perspective. Christopher Sapp looks at clause-final verb clusters in Early New High German and compares them with contemporary regional varieties. The question whether object focus affected the relative positioning of verbs in clusters is difficult to answer conclusively on the basis of historical textual evidence alone, because of data sparseness and fuzziness in the corpus. The statistical analysis of the historical data is supported by interview data with speakers of Swabian and Viennese dialects and by a judgement experiment with speakers of Austrian standard German. The evidence from each source is
fairly fine, but in sum it makes a convincing case, a significant achievement for a subtle effect such as focus in a historical variety.

The other paper employing historical data, that of Axel & Kiziak, does so to answer a question about the modern language: whether apparent long wh-extraction in German should not rather be analyzed as a single clause with a parenthetical comment inserted. Two data types complement each other here. First, the evidence from predicate restrictions on extraction obtained from judgement experiments using the pattern matching technique demonstrates differences between unambiguous long extractions and the controversial construction, which no confounding factors are able to account for. Second, historical analysis of Old High German shows that, while the controversial construction occurs in texts from this period, the simple verb-second complement clause, from which it is derived, is not clearly attested at all. In this way two very different sources of evidence would seem to converge in supporting the parenthetical analysis of this German structure.

Nerbonne & Heeringa test the gravity model of linguistic dynamism using quantified data on dialectal differentiation from the Netherlands. While the data (all gathered during the twentieth century) is basically synchronic and directionless, these measures of linguistic similarity nevertheless allow conclusions to be drawn about dynamic processes of dialectal change and the factors favouring and inhibiting them, since the patterns produced by diachronic processes should be visible. This paper takes perhaps the most sophisticated approach to quantified data in this volume.

A very different approach towards stiffening the evidential base of linguistics is to take a cross-linguistic perspective.
Skopeteas & Féry describe their on-going work on the encoding of information structure, using questionnaires and elicitation of prosodic forms to look at single and double questions in English, German, Georgian, and Greek. The work reveals clear generalizations about the range and mix of word order options and prosodic features that languages use to represent information structural content. Other papers include explicit reflection on what contribution data can make to the advancement of theory. Harald Clahsen, for instance, identifies three conditions for psycholinguistic evidence to be relevant to those interested in the representation of language systems. First, potential confounds must be excluded, so that we can be sure that the data addresses the issue at hand; next, there should be converging evidence, that is, corroboration from multiple sources; and third, the data should demonstrate its appropriateness by confirming or falsifying existing linguistic theories. He goes on to show in three programmes of research, involving child language, morphological
processing, and the language of speakers with learning impairments, how these criteria are applied in practice and what insights can be gained.

Many papers critically discuss problems with the availability and interpretation of data types and report the solutions to these limitations. The work using frequencies collected from web searches by Arnold, Sadler & Villavicencio is particularly interesting for those linguists who see the web as the single unique corpus of the future. Their study of agreement patterns in Portuguese noun phrases clearly suggests that web data can provide linguistically interesting and valid information at a level of reliability which allows us to construct linguistic analyses on the basis of the differential frequencies found. This is a useful step in the validation of the web as a linguistic resource.

Other authors too reflect on the value of their evidence. Elena Dieser’s paper reveals an interesting case where the data which has traditionally been advanced for a hypothesis can be seen on closer inspection to be inadequate. Her study of a child growing up bilingually differs from most other studies in not using the one-person-one-language pattern of bilingual child care. She demonstrates effectively that the child recognizes and distinguishes between monolingual and multilingual speakers long before he has translation equivalents in the two languages. His differential production thus reveals evidence of language separation long before his absolute production does. The traditional data criterion is thus shown to be inadequate.

We shall last mention Bott & Radó, whose paper addresses a very central question for a volume such as this. They report their work comparing methodologies for the elicitation of intuitions for semantic theory, aiming to measure the relative availability of readings of structures with two quantifiers.
The studies prove to be well worthwhile, since the results show real differences which any single study would have implied were systematic. There are clearly pitfalls to avoid, and since theory construction requires a firm empirical base, there is more work to be done in this field.

We hope that this volume gives the reader a taste of the visible advances but also the hard work still necessary in building a more empirical linguistics, particularly in work at the sentence level, and thus succeeds in the same way that the conference Linguistic Evidence 2006 did. We would like to thank and congratulate those many members of the Sonderforschungsbereich 441 here in Tübingen who willingly gave their time and effort to make the conference a success, but above all Beate Starke and Marga Reis. We thank also the chair of the Programme Committee Ewald Lang, and the
colleagues who reviewed papers: Steven Bird, Joan Bresnan, Greg Carlson, Harald Clahsen, Anette Frank, Jost Gippert, Georg Kaiser, John Nerbonne, Karel Oliva, Janet Pierrehumbert, Mark Steedman, Shravan Vasishth, Tilman Berger, Veronika Ehrich, Erhard Hinrichs, Johannes Kabatek, Stephan Kepser, Claudia Maienborn, Uwe Mönnich, Frank Richter, and Hubert Truckenbrodt.
References

Chomsky, Noam 1993. A minimalist program for linguistic theory. In The View from Building 20, K. Hale & S. Keyser (eds.). Cambridge, MA: MIT Press.
Cowart, Wayne 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks, CA: Sage.
Müller, Gereon & Wolfgang Sternefeld 2001. The Rise of Competition in Syntax. A Synopsis. In Competition in Syntax, G. Müller & W. Sternefeld (eds.), 1–68. Berlin/New York: Mouton de Gruyter.
Schütze, Carson T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.
Sternefeld, Wolfgang 1996. Comparing Reference Sets. In The Role of Economy Principles in Linguistic Theory, C. Wilder, H.-M. Gärtner & M. Bierwisch (eds.), 81–114. Berlin: Akademie-Verlag.
Portuguese: Corpora, coordination and agreement
Doug Arnold, Louisa Sadler and Aline Villavicencio
1. Introduction

This paper reports some results from a corpus study of Portuguese, and explores their implications for the analysis of agreement processes involving coordinate structures (CSs), especially as regards gender agreement within noun phrases (NPs).1 Agreement phenomena have received considerable attention in recent years, but agreement involving CSs, and NP-internal agreement processes have received less attention. As will appear, this cannot be taken as a reflection of inherent theoretical interest. Some of the data discussed here appear to be novel, and to pose a serious challenge for existing analyses of coordinate structures. One goal of this paper is to suggest how this challenge can be overcome. More generally, the study demonstrates the value of corpus data in challenging existing analyses, requiring a more sophisticated view of phenomena. It also raises some interesting methodological issues.

The paper is structured as follows. Section 2 introduces some basic ideas about agreement in general, and what is standardly assumed about Portuguese. Section 3 describes the corpus study itself, and the results. The key conclusion is that Portuguese agreement is more complex than has generally been assumed hitherto. Section 4 discusses the theoretical implications, and provides a relatively theory-neutral and intuitive analysis of the facts about Portuguese agreement as they emerge. The main point is that, contrary to what is assumed in most approaches to agreement, CSs must make several kinds of agreement information available at the same time. Section 5 summarises the discussion and provides some brief comments of a methodological nature.
2. Background

In general terms, ‘agreement’ refers to the phenomenon where the form of one element (the ‘agreement target’) varies depending on properties of another (the ‘agreement controller’). For example, the following show that
Portuguese nominals control agreement for number and gender on determiners and adjectives.

(1) o teto colorido
    the.MSG ceiling.MSG coloured.MSG
    ‘the coloured ceiling’

(2) *os/a/as teto coloridos/colorida/coloridas
    the.MPL/the.FSG/the.FPL ceiling.MSG coloured.MPL/coloured.FSG/coloured.FPL
    ‘the coloured ceiling’

Agreement phenomena in general have received considerable attention in recent years. However, the main focus has been on subject-predicate agreement, at the expense of other forms of agreement, notably head-modifier agreement. In particular, there has been relatively little work on the problems posed by head-modifier agreement when the agreement controller is a Coordinate Structure (CS). It turns out that extending analyses based on non-coordinate structures to deal with CSs raises non-trivial problems. In particular, CSs appear to be able to control agreement in a variety of different ways.

The two agreement strategies which are most widely attested cross-linguistically involve (syntactic or semantic) resolution and ‘closest conjunct agreement’ (CCA). Resolution strategies are familiar from many languages (for discussion and references see e.g. Corbett 1991; Dalrymple & Kaplan 2000; Wechsler & Zlatić 2003). Intuitively, under a resolution strategy agreement involves properties of the CS as a whole – more precisely, the agreement properties of a CS are some function of the properties of the conjuncts and the CS as a whole. In the case of Portuguese, this agreement strategy gives rise to examples like (3).

(3) o teto e a parede coloridos
    the.MSG ceiling.MSG and the.FSG wall.FSG coloured.MPL
    ‘the coloured ceiling and wall’
Here, plural agreement has been triggered on the adjective coloridos because the preceding CS is plural (e.g. it denotes a plurality). Masculine gender has been triggered because the CS contains a masculine conjunct (masculine is the default resolution gender in Portuguese – leaving aside
cases of CCA, feminine agreement is only possible if all conjuncts are feminine).

Under a CCA strategy, by contrast, rather than agreeing with the CS as a whole, agreement targets agree with just the closest conjunct. CCA is perhaps less familiar than resolution, but it is nevertheless widely attested. It has been observed in, inter alia, Irish, Welsh, Spanish, Arabic, and Ndebele (e.g. McCloskey 1986; Corbett 1991; Sadler 1999; Camacho 2003; Moosally 1999; Yatabe 2004). Though it does not seem to have been much discussed in the theoretical literature on Portuguese, the existence of this strategy has been noted in descriptive grammars of Portuguese. de Almeida Torres (1981) gives examples like (4):

(4) no povo e gente hebreia
    on the.MSG population.MSG and people.FSG hebrew.FSG
    ‘on the hebrew people’ (de Almeida Torres, 1981)

Here we see that the postnominal adjective is feminine and singular, like the last conjunct, even though it semantically modifies the whole preceding CS (which contains a masculine noun, and so might be expected to trigger masculine agreement).

These examples involve postnominal agreement, which is what we focus on here. However, a few words about the behaviour of prenominal adjectives and determiners are in order. As regards gender, it seems that in Portuguese CCA is required for prenominal modifiers and determiners modifying coordinated nominals. For example, in (5) the presence of a masculine conjunct in the CS is not sufficient to permit masculine agreement on the prenominal adjective and noun, which must agree with the closest conjunct in gender.

(5) suas/*seus próprias reações ou julgamentos
    his.FPL/*his.MPL own.FPL reactions.FPL or judgements.MPL
    ‘his own reactions or judgements’
As regards number, matters are less clear, and proper discussion would take us too far from the focus of this paper. Part of this complexity arises from the existence of ‘single entity’ readings of CSs (as in examples like my friend and colleague) which are semantically singular. Even leaving cases like this, there seems to be evidence of both CCA and resolution for number
in Portuguese. Example (6) shows resolved number – a plural determiner and adjective with a CS which is semantically plural, though it consists of singular nominals (prováveis (‘probable’) is plural, but is not marked morphologically for gender); and (7) shows CCA for number – a singular determiner with a CS that is again semantically plural.

(6) Os prováveis diretor e ator principal são Gus van Sant e Johnny Depp, respectivamente.
    the.MPL probable.PL director.MSG and actor.MSG principal.MSG are Gus van Sant and Johnny Depp respectively
    ‘The likely director and main actor are, respectively, Gus van Sant and Johnny Depp.’

(7) O presidente e amigo comeram juntos.
    the.MSG president.MSG and friend.MSG ate.3PL together
    ‘The president and (his) friend ate together.’
However, the issue is complex and somewhat controversial, and not essential to the main point of this paper, and we will not pursue it.2 To summarise: NP internally, Portuguese shows clear evidence of two agreement strategies involving CSs: CCA (postnominally, and prenominally as regards gender), and resolution (postnominally, and perhaps also prenominally for number). Leaving aside the matter of prenominal number, these might be represented schematically as in (8) and (9), respectively. (8)
CCA for number and gender:
    DET_{NUM,GEN} [N_{NUM,GEN} [N_{NUM,GEN}] [N_{NUM,GEN}]] AP_{NUM,GEN}

(9) Resolved number and gender:
    DET_{NUM,GEN} [N_{NUM,GEN} [N_{NUM,GEN}] [N_{NUM,GEN}]] AP_{NUM,GEN}

The existence of two patterns raises an obvious question about their relative frequency. As we have noted, CCA in Portuguese has not been much discussed in the literature, and one might wonder if this is because it is rare or marginal. In order to investigate this, a corpus study was undertaken, which will be described below, and whose quantitative results give a clear answer to this question (CCA is not rare or marginal). As it turns out, this study also raises (and answers) an interesting qualitative question, which has not previously been considered: are these the only patterns of agreement that are found? As will appear, some of the examples produced by the study seem to show the existence of ‘mixed’ agreement strategies, whose existence has not been previously noticed, and which have significant implications for the analysis of agreement with CS.
3. Corpus study

This section reports the results of a corpus based study into the agreement strategies used for NP internal agreement involving CSs, focusing especially on gender agreement for post-nominal dependents. In order to estimate the approximate frequencies with which the agreement strategies are used, a Web based corpus investigation was performed by means of searches using the Google API service.3 Occurrences of coordinated nominals followed by adjectives were found by posing Google queries of the general form (10).

(10) "ART * e * ADJ"

Here ART stands for instances of the Portuguese (definite and indefinite) articles, ADJ stands for instances of Portuguese plural adjectives, and e is the Portuguese conjunction e (‘and’). The adjectives were extracted from the 1,528,590 entry NILC Lexicon.4 Because we were interested in the correlation between the gender of each of the nominals and the gender of the adjective, only adjectives that overtly reflect gender distinctions were used (9,915 masculine and 9,811 feminine adjectives). The results returned by the queries were manually inspected to remove noise – in cases of putative CCA this entailed removing all cases where, in the judgement of a Portuguese native speaker, the adjective should be interpreted as modifying only the closest nominal, rather than the CS as a whole.
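
The query template in (10) lends itself to mechanical expansion. The following is a minimal sketch only, assuming that the template is a quoted phrase of the form "ART * e * ADJ" and that one query is built per pair of article and adjective. The function name, the article list as spelled out here, and the two sample adjectives are illustrative assumptions rather than details of the study, and no search service is actually called.

```python
from itertools import product

# Portuguese definite and indefinite articles (assumed inventory for ART).
ARTICLES = ["o", "a", "os", "as", "um", "uma", "uns", "umas"]


def build_queries(adjectives, articles=ARTICLES, conjunction="e"):
    """Return quoted phrase queries of the assumed form "ART * e * ADJ".

    The asterisks are wildcards standing for the two conjoined nominals.
    """
    return [
        f'"{art} * {conjunction} * {adj}"'
        for art, adj in product(articles, adjectives)
    ]


# Hypothetical sample of plural adjectives that mark gender overtly.
sample_adjectives = ["sofridas", "modernos"]
queries = build_queries(sample_adjectives)

for query in queries[:4]:
    print(query)
```

With the full adjective list from a lexicon, the same function would simply be given a longer `adjectives` argument; the manual filtering step described above is not modelled here.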
The overall results found are displayed in Table 1, where ‘Frequency’ indicates the number of hits returned by Google for the searches, and ‘N1’, ‘N2’ and ‘ADJ’ refer to the gender of the first conjunct, second conjunct, and adjective, respectively.5

Table 1. Frequency of Masc vs. Fem Adjectives Modifying Mixed Gender Coordinations of Nominals.

        Frequency   N1   N2   ADJ   Interpretation
(a)             0   f    m    f     (Resolve to f)
(b)          4054   m    f    m     (Resolve to m)
(c)           626   f    m    m     (CCA/Resolve to m)
(d)           550   m    f    f     (CCA)
total        5230

The first thing to notice here is that there are no instances of a feminine nominal conjoined with a masculine triggering feminine agreement (row (a)). That is, there are no instances of the form (11), which would be instances of resolution to feminine, or perhaps ‘furthest conjunct agreement’. This is not particularly surprising, but it supports our implicit assumption that cases of feminine gender agreement where a CS contains a masculine conjunct are indeed cases of CCA, and not some special ‘resolution to feminine’ strategy.

(11) [N_F conj N_M] ADJ_F

Similarly, row (b) is unsurprising. This row reports the count of cases which are schematically of the form in (12), where a conjoined masculine and feminine trigger masculine agreement. Leaving aside the possibility of ‘furthest conjunct agreement’, these are unambiguously cases of resolution to masculine, and they are very frequent (4054 of 5230 cases, or about 78%).

(12) [N_M conj N_F] ADJ_M

The cases counted in row (c), which are schematically like (13), are ambiguous – they might be either cases of resolution to masculine, or CCA with the masculine conjunct.

(13) [N_F conj N_M] ADJ_M
The most interesting case is row (d), which gives the number of cases of the form in (14). These are unambiguously cases of CCA (resolution would produce masculine agreement on the adjective).

(14) [N_M conj N_F] ADJ_F

The interesting point is that they are not at all infrequent. Even on the narrowest interpretation, disregarding all ambiguous cases from row (c), CCA for gender is evidently widespread: the ratio of (d) cases to the total is 550/5230, or slightly over 10%. If these data are representative, the odds on speakers using CCA are better than 1 in 10. We can conclude that while resolution is the dominant strategy for postnominal gender agreement, CCA is by no means rare or marginal.

Apart from this quantitative finding, the study also threw up some unexpected qualitative results. Among these results were examples such as (15), which is schematically something like (16).

(15) Esta canção anima os corações e mentes brasileiras.
     This song animate the.MPL hearts.MPL and minds.FPL Brazilian.FPL

(16) DET_M [N_M conj N_F] ADJ_F

What this shows is CCA for gender both prenominally and postnominally, with different effects. In this example, prenominal CCA has produced masculine agreement (recall that prenominal CCA for gender appears to be obligatory in Portuguese, so this cannot be a case of resolution to masculine on the determiner); at the same time, postnominal CCA has produced feminine agreement (resolved agreement would have made the adjective masculine). Given that a language exhibits CCA, and has both prenominal and postnominal dependents, it is perhaps not surprising that this should occur. However, the possibility seems not to have been previously considered, and its existence is a significant result, with important theoretical implications, which we will take up below.

A second kind of case which appears not to have been previously noticed is exemplified in (17) to (21), which are schematically of the form (22).
(17) todo o constrangimento e a dor sofridas
     all.MSG the.MSG embarrassment.MSG and the.FSG pain.FSG suffered.FPL
     ‘all the embarrassment and pain suffered’

(18) o drama e a loucura vividas
     the.MSG drama.MSG and the.FSG madness.FSG lived.FPL
     ‘the drama and the madness experienced’

(19) o aprendizado e a experiência vividas
     the.MSG learning.MSG and the.FSG experience.FSG lived.FPL
     ‘the accumulated learning and experience’

(20) o romantismo e a morbidez profundas da alma alemã
     the.MSG romanticism.MSG and the.FSG morbidity.FSG deep.FPL of the soul German
     ‘The deep romanticism and morbidity of the German soul’

(21) uma relação entre sobrecarga do organismo e envelhecimento e morte prematuras
     a relation between overload of the organism and aging.MSG and death.FSG premature.FPL
     ‘A relation between overload of the organism and premature aging and death’

(22) [N_MSG conj N_FSG] ADJ_FPL

What these examples seem to show is postnominal CCA for gender (the adjective is feminine, like the last conjunct, even though the CS contains a masculine nominal) combined, simultaneously, with resolution for number (the individual conjuncts are singular, but the adjective is plural). These cases raise an interesting theoretical issue, because not only has the existence of such cases not been previously noticed, it seems not even to have been considered as a possibility. We will look at the theoretical implications of this in Section 4, below. Such cases also raise an interesting methodological issue, because though these are all attested examples, some native speakers of Portuguese are uncomfortable with them (though not the one of the present authors who is a native speaker of Brazilian Portuguese).

In this context, it is worth looking at some other quantitative results. Table 2 summarises the number of examples found which involved coordinations of singular nominals (that is, a strict subset of the examples summarised in Table 1). Since all results feature plural adjectives, these are all cases of number resolution. The cases showing CCA for gender – that is, at least the cases in row (d) – thus show this ‘mixed’ agreement strategy of resolved number and CCA for gender. As the table shows, this strategy appears in 90 cases, which is approximately 4.6% of all the cases counted in Table 2, and about 4.9% of all the cases that could show this effect (i.e. all the cases where the final conjunct is feminine, i.e. rows (b) and (d)). This seems to us to be an interestingly large number, which, combined with the acceptability of such examples to some speakers, means that the phenomenon deserves theoretical attention, and should not be dismissed out of hand (however, we will say a little more about the methodological issue raised here in Section 5).

Table 2. Frequency of Masc vs. Fem Adjectives Modifying Mixed Gender Coordinations of Singular Nominals.

        Frequency   N1   N2   ADJ   Interpretation
(a)             0   f    m    f     (Resolve to f)
(b)          1737   m    f    m     (Resolve to m)
(c)           137   f    m    m     (CCA/Resolve to m)
(d)            90   m    f    f     (CCA)
total        1964

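
The percentages quoted for Tables 1 and 2 can be recomputed directly from the row counts. The sketch below is offered only as a check on the arithmetic; the dictionary and function names are illustrative and not part of the study.

```python
# Row counts as reported in Table 1 (all coordinations) and
# Table 2 (coordinations of singular nominals only).
TABLE_1 = {"a": 0, "b": 4054, "c": 626, "d": 550}
TABLE_2 = {"a": 0, "b": 1737, "c": 137, "d": 90}


def share(counts, rows, base_rows=None):
    """Proportion of hits in `rows` relative to `base_rows` (default: all rows)."""
    if base_rows is None:
        base_rows = list(counts)
    return sum(counts[r] for r in rows) / sum(counts[r] for r in base_rows)


# Unambiguous resolution to masculine, row (b) of Table 1.
resolve_m_all = share(TABLE_1, ["b"])
# Unambiguous CCA, row (d) of Table 1.
cca_all = share(TABLE_1, ["d"])
# Mixed strategy (CCA for gender, resolved number), row (d) of Table 2.
mixed_of_total = share(TABLE_2, ["d"])
# Same count relative to the rows with a feminine final conjunct, (b) and (d).
mixed_of_eligible = share(TABLE_2, ["d"], base_rows=["b", "d"])

for label, value in [
    ("Table 1, row (b)", resolve_m_all),
    ("Table 1, row (d)", cca_all),
    ("Table 2, row (d) of total", mixed_of_total),
    ("Table 2, row (d) of rows (b)+(d)", mixed_of_eligible),
]:
    print(f"{label}: {value:.1%}")
```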
To summarise the results of this section: we have shown (a) that while gender resolution is the dominant agreement strategy postnominally, CCA is by no means infrequent or marginal, and (b) that Portuguese agreement is more complex than has been previously assumed. In particular, in addition to ‘pure’ resolution with prenominal CCA for gender, we also see prenominal and postnominal CCA operating independently, and a mixed postnominal strategy that involves CCA for gender with resolved number. Schematically, these strategies may be represented as in (23) to (25). The following section will consider the theoretical implications of this.

(23) Resolved number and gender:
     DET_{NUM,GEN} [N_{NUM,GEN} [N_{NUM,GEN}] [N_{NUM,GEN}]] AP_{NUM,GEN}

(24) CCA for number and gender:
     DET_{NUM,GEN} [N_{NUM,GEN} [N_{NUM,GEN}] [N_{NUM,GEN}]] AP_{NUM,GEN}

(25) CCA for gender, resolved number:
     DET_{NUM,GEN} [N_{NUM,GEN} [N_{NUM,GEN}] [N_{NUM,GEN}]] AP_{NUM,GEN}

4. Linguistic analysis and theoretical implications

In this section, we will consider some of the theoretical implications of the Portuguese data presented above, showing how an account of the data can be formulated. In the interests of generality, we will keep the presentation as intuitive and framework-neutral as possible.6

We will begin with resolution. In general, resolution can be modelled by a grammatical mechanism which ‘calculates’ the set of resolved agreement features to be associated with the coordinate structure as a whole: this set of resolved features then controls agreement on agreement targets (e.g. Dalrymple & Kaplan 2000). So far as we can see, it is reasonable to assume that number resolution in Portuguese is simply a matter of semantics: CSs are plural just in case they denote a plurality or group of some kind. This is expressed in (26).7

(26) The number value on a CS resolves to plural just in case the CS denotes a plurality.

As regards gender, it seems safe to assume that masculine is the default resolution gender, or to put it another way, the resolved gender value on a CS is masculine if it contains one or more masculine conjuncts, and feminine only if all conjuncts are feminine:8

(27) The gender value on a CS resolves to feminine iff all conjuncts are feminine, otherwise it is masculine.
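
Principles (26) and (27) are simple enough to state as functions. The following is an informal sketch, not a formalisation in any particular grammatical framework; the function names and the string encodings "m", "f", "sg" and "pl" are assumptions made for illustration, and whether a CS denotes a plurality is supplied by hand rather than derived from any semantics.

```python
def resolve_number(denotes_plurality):
    """(26): the number value resolves to plural just in case
    the CS denotes a plurality."""
    return "pl" if denotes_plurality else "sg"


def resolve_gender(conjunct_genders):
    """(27): the gender value resolves to feminine iff all conjuncts
    are feminine; otherwise it is masculine."""
    genders = list(conjunct_genders)
    if not genders:
        raise ValueError("a coordinate structure needs at least one conjunct")
    return "f" if all(g == "f" for g in genders) else "m"


# A masculine and a feminine conjunct denoting two things: resolved m, pl.
print(resolve_gender(["m", "f"]), resolve_number(True))
# Two feminine conjuncts: resolved f.
print(resolve_gender(["f", "f"]))
# A 'single entity' reading does not denote a plurality: resolved sg.
print(resolve_number(False))
```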
In principle, one might try to treat CCA in a similar fashion to resolution – a CS would have a single set of agreement properties calculated from properties of the conjuncts, but rather than involving calculations reflecting principles like (26) and (27), the calculation would simply return values from one designated conjunct (the last one, say). Such an approach might be unproblematic in a language which has only CCA. In a case like Portuguese which has both resolution and CCA, one might try to give every CS a single number and gender value, but allow the values to be calculated in one of two ways: either (a) by a resolution method, or (b) by a CCA method copying the value from (say) the last conjunct. Most existing approaches to agreement involve some kind of ‘single feature’ approach like this. Notice that such an approach predicts that all agreement processes will involve the same set of features.9

The Portuguese data clearly indicate that this sort of approach cannot be correct in general. First, the fact that prenominal CCA for gender is obligatory, while postnominally either CCA or resolution is possible, indicates that CSs cannot be assigned a single agreement value: they need at least two sets of values, one for CCA, and one for resolution based agreement. Moreover, as we have seen, examples such as (15), repeated here, show CCA operating both prenominally and postnominally, with different effects. So we cannot manage with just one set of ‘CCA agreement’ features – we need two, one for prenominal CCA, and one for postnominal CCA.

(28) Esta canção anima os corações e mentes brasileiras.
     This song animate the.MPL hearts.MPL and minds.FPL Brazilian.FPL

It seems the simplest way to approach this is to assume three sets of number/gender features: one reflecting resolved values, one for ‘leftwards’ CCA (i.e. CCA on prenominals), and one for ‘rightwards’ CCA (CCA on postnominals). Let us call these ‘RESOL’, ‘LAGR’ and ‘RAGR’.
The behaviour of these features will be governed by principles such as the following:

(29) The RESOL values of a CS are calculated from the features of the conjuncts, according to principles such as (26) and (27).
(30) The LAGR values of a CS come from its leftmost conjunct.
(31) The RAGR values of a CS come from its rightmost conjunct.

The existence of such features on CSs raises the question of what agreement features the conjuncts have, when they are not themselves CSs. One can imagine two approaches. The first would define features like LAGR, RAGR and RESOL only on CSs – ‘normal’ nominals would have only normal agreement features. But this is unattractive: it would complicate the statement of normal agreement principles, which would have to be different depending on whether the agreement controller was a CS or not. It would also complicate the statement of the agreement percolation principles inside CSs. If instead, we assume that these features are defined on nominals of all kinds, a much simpler picture emerges. To begin with, we need some principle like the following, to capture the fact that non-coordinate structures exhibit only one kind of agreement behaviour:

(32) In non-coordinate nominal structures the values of RAGR, LAGR and RESOL are identical.

(One way of implementing this would be to make it a lexical requirement of nouns, which is inherited by nominal projections; if noun phrases are analysed as DPs, it would be stated as a requirement on Ds and their associated Ns that is inherited by DPs). Now, (30) and (31) can be stated more precisely, and with complete generality, as (33) and (34):

(33) The LAGR values of a CS are the LAGR values of its leftmost conjunct.
(34) The RAGR values of a CS are the RAGR values of its rightmost conjunct.

These principles can be seen at work in (36), representing the CS in (35).10

(35) o aprendizado e a experiência
     the.MSG learning.MSG and the.FSG experience.FSG
(36)
NP [RESOL mpl, RAGR fs, LAGR ms]
    NP [RESOL ms, RAGR ms, LAGR ms]
        DET  o
        N [RESOL ms, RAGR ms, LAGR ms]  aprendizado
    NP [RESOL fs, RAGR fs, LAGR fs]
        CONJ  e
        NP [RESOL fs, RAGR fs, LAGR fs]
            DET  a
            N [RESOL fs, RAGR fs, LAGR fs]  experiência

Briefly, the lexical nouns aprendizado (‘learning’) and experiência (‘experience’) are lexically specified as masculine singular and feminine singular, and these values appear for all the agreement features. These same values appear on the non-CS nodes that dominate them, as required by (32). The mother node of the CS has LAGR ms from its left daughter and RAGR fs from its right daughter (as required by (33) and (34)). Its RESOL value is masculine because one of the daughters is masculine, reflecting (27); its RESOL number is plural because it denotes a plurality, reflecting (26).

Precisely how agreement is handled, given structures like (36), will depend on assumptions about the mechanics of determiner-noun and adjective-noun agreement, but the underlying principles will be roughly as follows:11

(37) Post-head modifiers must share either:
     a. their agreement controller’s RESOL values (resolved agreement); or
     b. their agreement controller’s RAGR values (‘full’ CCA); or
     c. their agreement controller’s RESOL.NUMBER and RAGR.GENDER values (‘mixed’ CCA/resolution).
(38) Determiners and pre-head modifiers must share their agreement controller’s LAGR.GENDER (CCA for gender).

The adjective modernos in (39) exemplifies (37a); the adjective monástica in (40) exemplifies (37b); sofridas in (41) exemplifies (37c); and próprias (‘own’) and suas (‘his/her’) in (42) exemplify (38).

(39) o homem e a mulher modernos
     [ the.MSG man.MSG and the.FSG woman.FSG ] modern.MPL
     ‘the modern man and woman’

(40) estudos e profissão monástica
     [ studies.MPL and profession.FSG ] monastic.FSG
     ‘monastic studies and profession’

(41) o constrangimento e a dor sofridas
     [ the.MSG embarrassment.MSG and the.FSG pain.FSG ] suffered.FPL
     ‘the embarrassment and pain suffered’

(42) suas próprias reações ou julgamentos
     his.FPL own.FPL [ reactions.FPL or judgements.MPL ]
     ‘his own reactions or judgements’

Notice that these principles also apply equally, and unproblematically, in the case of non-coordinate agreement controllers (the account of agreement is thus uniform for CSs and non-coordinate structures, as one would wish). For example, with a noun like teto (‘ceiling’) in (43) all the principles in (37) produce exactly the same effect, because in a non-CS all the agreement features have the same values.

(43) o teto colorido
     the.MSG ceiling.MSG coloured.MSG
     ‘the coloured ceiling’

Notice also that, as well as being ‘uniform’ in this sense, this account is consistent with a very standard idea of locality for agreement processes: percolation of features means that agreement can always be stated as a relation between an agreement controller and its sister(s).

Principles like those above appear to account for the data, and from a descriptive point of view they are attractive – they provide a simple conceptual and descriptive vocabulary for the analysis of Portuguese and other complex agreement systems. But there is clearly a theoretical cost in terms of the introduction of features which might not be otherwise required. However, it is worth pointing out that some proliferation of features seems to be required independently, because somewhat different features are required for handling NP-internal agreement processes, like those we have examined here, and NP-external agreement processes like subject-predicate agreement. Familiar examples of this involve so-called ‘hybrid nouns’ (Corbett 1991), which can trigger different kinds of agreement on different targets. For example, in Spanish the title Majestad (‘Majesty’) is feminine, so it triggers feminine agreement on attributive adjectives and determiners. However, if it refers to a male individual, it triggers masculine agreement on a predicative adjective (cf. e.g. Corbett 1991; Kathol 1999; Wechsler & Zlatić 2003):

(44) Su Majestad_i Suprema está contento.
     Pron.F Majesty Supreme.F is happy.M
     ‘His Supreme Majesty is happy.’

In this context, it is interesting to ask whether Portuguese CCA might be ‘NP-bounded’ – a purely NP-internal process, which might limit the number of features required. Examples like the following suggest that it is not.

(45) …(que) o travestismo e a copulação ritual são realizadas para expressar o propósito…
     …(that) the.MSG transvestism.MSG and the.FSG copulation.FSG ritual be.PL realized.FPL to express the goal…
     ‘…(that) the transvestism and the ritual copulation are produced to express the goal…’

Here we see the CS o travestismo e a copulação ritual (‘the transvestism and the ritual copulation’) triggering plural agreement on the predicate (the verb são (‘be’) and the participle realizadas (‘realized’)), which would be consistent with a resolution strategy for number. However, we also see that realizadas is marked feminine – i.e. apparently agreeing with the closest conjunct copulação ritual. That is, subject-predicate agreement may sometimes involve CCA.
An obvious objection to the analysis we have described is that it is stipulative, and does not really capture the fact that CCA is closest conjunct agreement. That is, the principles we have given could equally well be phrased so as to yield furthest conjunct agreement, which is not observed in Portuguese. However, furthest conjunct agreement is observed in some languages (e.g. Slovene; Corbett 1983). Moreover, notice that any account which tries to express CCA directly, as ‘closest’ conjunct agreement, will be in danger of losing one of the attractions of our account – the fact that it is consistent with standard ideas of locality. For example, an attempt to formulate such an account using any kind of conventional phrase structure will require agreement relations to hold between aunts and nieces, as well as sisters. Moreover, attempts to deal with ‘closeness’ in terms of purely linear adjacency of agreement controllers and targets appear problematic: several of the examples we have given above involve CCA between determiners and nouns which are not adjacent (see, e.g. (5), (6), and (17)).12
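
The feature percolation in (32) to (34) and the agreement options in (37) and (38) can be illustrated with a small executable sketch. It is deliberately informal and framework-neutral, and should not be read as the formal treatment cited in the notes: the class and function names are illustrative assumptions, genders and numbers are encoded as plain strings, and whether a CS denotes a plurality is passed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agr:
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"


@dataclass(frozen=True)
class Nominal:
    resol: Agr  # resolved values
    lagr: Agr   # values visible to pre-head (leftward) agreement
    ragr: Agr   # values visible to post-head (rightward) agreement


def noun(gender, number):
    """(32): in a non-coordinate nominal all three feature sets coincide."""
    agr = Agr(gender, number)
    return Nominal(agr, agr, agr)


def coordinate(conjuncts, denotes_plurality=True):
    """Build the feature sets of a CS from its conjuncts."""
    gender = "f" if all(c.resol.gender == "f" for c in conjuncts) else "m"  # (27)
    number = "pl" if denotes_plurality else "sg"                            # (26)
    return Nominal(
        resol=Agr(gender, number),
        lagr=conjuncts[0].lagr,    # (33)
        ragr=conjuncts[-1].ragr,   # (34)
    )


def post_head_ok(controller, target):
    """(37): a post-head modifier may match RESOL, RAGR, or the mixed set."""
    options = {
        controller.resol,                                      # (37a)
        controller.ragr,                                       # (37b)
        Agr(controller.ragr.gender, controller.resol.number),  # (37c)
    }
    return target in options


def pre_head_ok(controller, target_gender):
    """(38): determiners and pre-head modifiers match LAGR.GENDER."""
    return target_gender == controller.lagr.gender


# The CS in (35): o aprendizado e a experiência.
cs = coordinate([noun("m", "sg"), noun("f", "sg")])
print(cs.resol, cs.lagr, cs.ragr)
print(post_head_ok(cs, Agr("f", "pl")))  # mixed pattern, as with 'vividas' in (19)
print(post_head_ok(cs, Agr("m", "sg")))  # matches none of (37a-c)
```

On these assumptions the mother node of (35) comes out with RESOL m/pl, LAGR m/sg and RAGR f/sg, as in (36), and a simple noun such as teto in (43) admits only one post-head pattern because its three feature sets are identical.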
5. Conclusion

The foregoing has presented some novel data and conclusions about Portuguese agreement. In particular, we have presented data which suggest that CCA is more widespread than has generally been assumed. We have also presented data which suggest that agreement involving CSs is more complex than has been assumed, in ways that challenge existing analyses of agreement. In particular, we have argued that CSs do not possess a single set of agreement features (because both ‘resolved’ and ‘closest conjunct’ features are needed, and because information about the conjuncts at both ends of a CS may be needed for CCA). We have presented an analysis which captures these facts, and is consistent with a uniform treatment of agreement involving CS and non-coordinate structures.

The discussion involves what we take to be an interesting mix of ‘empirical’ (e.g. corpus based) and more traditional ‘theoretical’ linguistic investigation and analysis, a mix which is increasingly common, and productive. It also raises a number of methodological issues which deserve brief attention. One relatively straightforward methodological point is that this study is of necessity based on interpreted corpus data: it is not enough to find appropriate sequences of CSs and modifiers in corpora, it is essential to limit attention to cases where the interpretation makes it clear that the modifier scopes over the whole CS. Not only is there no conflict between corpus methods and methods based on ‘native speaker intuition’ here, both are actually necessary.
A second, rather obvious, methodological point involves the value and limitations of corpus data. On the one hand, the value of corpus data comes out clearly: the existence of examples like those above in corpora forces one to consider the possibility of CCA operating differently in different directions, which one might not have expected, a priori. On the other hand, getting relevant data can be extremely difficult due to various complicating factors – notably of course the fact that even large corpora do not typically show all possible variations and combinations of the phenomena one is interested in. Here one is naturally drawn to constructing examples. But this is not straightforward, because native speakers are often uncertain about the status of some examples. In particular, it seems that some speakers reject examples involving postnominal CCA for gender with resolved number (i.e. examples like (17) to (21)).13 Of course speakers’ acceptability judgements are notoriously unreliable (cf. e.g. Schütze, 1996), especially judgements of unacceptability. And in fact, experience indicates that this sort of conflict between corpus data and intuition is rather rare. It is much more common for exposure to corpus data to persuade speakers that their intuitions are over-restrictive.14 But this just makes the problem harder to deal with when it arises. In the case of a web-based study such as ours one cannot appeal to any pre-existing quality control (e.g. that the texts have been authored and proof-read by native speakers). One may observe, as we did above, that one has many examples of the relevant kind (in our case, 90). But how many is ‘many’? In the case of web-based queries, there is no useful estimate of the total number of words in the corpus, but we found that 4.9% of cases that could have shown the relevant pattern did show it (cf. Table 2). Is this a significant number?
We are inclined to think that it is rather a large number to be just the result of ‘noise’ – that is, simple mistakes and the like. On the other hand, we note that the normal standard for statistical significance is 0.05, or 5%, so one could argue that it is statistically non-significant.
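
One way to put the 4.9% figure in context is an interval estimate for the share of eligible cases showing the mixed pattern. The sketch below computes a 95% Wilson score interval for 90 cases out of the 1827 in rows (b) and (d) of Table 2. It rests on an assumption the study itself does not make, namely that the hits can be treated as independent draws, and it says nothing about how many of the hits might be noise; the function name is illustrative.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return centre - half_width, centre + half_width


# 90 mixed-strategy cases out of 1737 + 90 cases with a feminine final conjunct.
low, high = wilson_interval(90, 1737 + 90)
print(f"{low:.3f} to {high:.3f}")
```

Under the independence assumption the interval runs from roughly 4% to roughly 6%.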
Notes

1. The research was supported by the AHRB Project Noun Phrase Agreement and Coordination, MRGAN10939/APN17606. We are grateful for useful comments from many people, including: the anonymous referees for, and participants at, the LingEvid2006 conference held in Tübingen in February 2006; participants at HPSG05 in Lisbon; participants at the ‘Alliance 05 Project’ Workshop held at Paris 7 in Oct 2005; numerous colleagues at Essex, and Mary Dalrymple and Irina Nikolaeva. See http://privatewww.essex.ac.uk/~louisa/agr/NPagreement.html for more information.
2. For example, King & Dalrymple (2004) claim that a singular determiner can only modify a CS with a ‘single entity’ (‘boolean’ or ‘joint’) interpretation, as in o presidente e diretor da Air France ‘the.MSG president.MSG and director.MSG of Air France’, where it is assumed that the president and director are one and the same individual. On the face of it, (7) is a counter-example to this claim. Another complicating issue is that the presence of material between the determiner and nominal may exert an influence on acceptability. In the (acceptable) example (6) the subject noun phrase is Os prováveis diretor e ator principal (‘the probable director and main actor’) with the adjective prováveis (‘probable’) intervening between the determiner and noun. Omitting it seems to have a deleterious effect, so *Os diretor e ator principal… (‘the director and main actor…’) is judged unacceptable.
3. See http://www.google.com/apis.
4. See http://www.nilc.icmc.usp.br/nilc/index.html.
5. One interesting point which we will not pursue is that the figures seem to show a strong bias for masculine conjuncts to precede feminine conjuncts (feminine conjuncts precede in only 626/5230 cases). This is probably a reflection of a prescriptive bias in favour of this ordering of conjuncts.
6. For a fully worked out formal treatment, see Villavicencio et al. (2005).
7. An example of a CS which does not denote a plurality is given in note 2. A counter-example to our assumption would be a CS containing a plural nominal that triggered singular agreement, where this could not be analysed as a case of CCA. In fact, we have not found any examples of CSs involving plural nominals triggering singular agreement at all.
8. A counter-example would be a CS which contains a masculine conjunct, but triggers feminine agreement, where this cannot be attributed to CCA. As noted in Section 3, no such cases were found in our study.
9. In fact, the ‘single feature’ approach is already known to be inadequate in other languages. Sadler (2003, 1999) shows that it will not work in Welsh, where different agreement processes can target the resolved and the CCA features at the same time, indicating that a CS must be able to have both resolved and CCA features simultaneously. However, Sadler suggests that any one agreement process can only access one kind of feature. The Portuguese data suggest that this is over-restrictive.
10. This representation makes a number of assumptions about the analysis of CSs, e.g. that the conjunction forms a constituent with the final daughter, and that the CS is an NP, rather than (say) a CONJP; none of these assumptions is critical.
11. This formulation evades the issue of number agreement for prenominal adjuncts – we leave open the question of whether they show resolution or CCA for number (or indeed both). Nothing we say hangs on this.
12. It is true that what intervenes may be an adjective which also agrees with the noun, but this is irrelevant: the adjective is not the agreement controller for the determiner.
13. Notice that this is the only case that is problematic in this way. All speakers seem happy with cases of prenominal CCA with postnominal resolution, and cases where pre- and post-nominal CCA give different effects. Thus, the main theoretical claims of the paper are not affected by this issue about data.
14. The following is a simple and uncontroversial example of this. It has sometimes been claimed that alternately cannot be used with or, cf. John was alternately hot and/*or cold. Many speakers accept this judgement at first glance. However, a search of the British National Corpus yields several examples of alternately… or which seem to be fully acceptable to all speakers, and which lead them to revise their judgement – e.g. [they] spent almost three hours in each other’s arms, alternately making love or talking in low whispers.
References

Camacho, José
  2003  The Structure of Coordination: Conjunction and Agreement Phenomena in Spanish and Other Languages. Dordrecht: Kluwer.
Corbett, Greville G.
  1983  Hierarchies, Targets and Controllers: Agreement Patterns in Slavic. London: Croom Helm.
  1991  Gender. Cambridge: Cambridge University Press.
Dalrymple, Mary & Ronald M. Kaplan
  2000  Feature indeterminacy and feature resolution in description-based syntax. Language 76 (4): 759–798.
de Almeida Torres, Artur
  1981  Moderna gramática expositiva da Língua Portuguesa. São Paulo: Martins Fontes.
Kathol, Andreas
  1999  Agreement and the syntax-morphology interface in HPSG. In Studies in Contemporary Phrase Structure Grammar, R. Levine & G. Green (eds.), 209–260. Cambridge/New York: Cambridge University Press.
King, Tracy H. & Mary Dalrymple
  2004  Determiner agreement and noun conjunction. Journal of Linguistics 40: 69–104.
McCloskey, James
  1986  Inflection and conjunction in Modern Irish. Natural Language and Linguistic Theory 4 (2): 245–282.
Moosally, Michelle J.
  1999  Subject and object coordination in Ndebele: an HPSG analysis. In Proceedings of the WCCFL 18 Conference, S. Bird, A. Carnie, J. D. Haugen & P. Norquest (eds.). Somerville, MA: Cascadilla Press.
Sadler, Louisa
  1999  Non-distributive features in Welsh coordination. In Proceedings of LFG 1999, M. Butt & T. H. King (eds.). Stanford, CA: CSLI Publications.
  2003  Coordination and asymmetric agreement in Welsh. In Nominals: Inside and Out, M. Butt & T. H. King (eds.), 85–118. Stanford, CA: CSLI Publications.
Schütze, Carson T.
  1996  The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. Chicago, IL: University of Chicago Press.
Villavicencio, Aline, Louisa Sadler & Doug Arnold
  2005  An HPSG account of closest conjunct agreement in NP coordination in Portuguese. In Proceedings of the 12th International Conference on Head-Driven Phrase Structure Grammar, Lisbon, S. Müller (ed.). Stanford, CA: CSLI Publications.
Wechsler, Stephen & Larisa Zlatić
  2003  The Many Faces of Agreement. Stanford, CA: CSLI Publications.
Yatabe, Shuichi
  2004  A comprehensive theory of coordination of unlikes. In Proceedings of the HPSG04 Conference, S. Müller (ed.), 335–355. CSLI Publications, Katholieke Universiteit Leuven.
Contributing to the extraction/parenthesis debate: Judgement studies and historical data
Katrin Axel and Tanja Kiziak
1. Introduction*
For German constructions as in (1), henceforward ‘controversial construction’, two analyses have been discussed in the theoretical literature.
(1) Wen denkst du hat Ede angerufen?
    whom think you has Ede called
    ‘Whom do you think Ede has called?’
Some linguists have analysed the controversial construction as a long wh-extraction from an embedded verb-second clause (e.g. Thiersch 1978; Grewendorf 1988; Staudacher 1990; Haider 1993). Others have proposed that it is a monoclausal extraction with a verb-first parenthetical insert (e.g. Andersson/Kvam 1984, Reis 2002).
(2) Extraction analysis
    Wen1 denkst du [CP t1 [C hat Ede angerufen t1]]?
    whom think you has Ede called
(3) Parenthetical analysis
    Wen1 [denkst du] hat Ede angerufen t1?
    whom think you has Ede called
The extraction analysis is feasible because it can be shown that German permits (i) dependent verb-second clauses and (ii) long extractions in other contexts. Regarding (i), some matrix predicates optionally select dependent V2 clauses (V2-clauses) without the complementizer dass ‘that’, (4b), besides the standard verb-final dass-clauses, (4a). As to (ii), German allows the extraction of an XP (in particular of wh-phrases) from a complement clause with the overt complementizer dass, cf. (5).1 The extraction is assumed to take place via the SpecCP-position of the embedded clause. Accordingly, in the extraction analysis for the controversial construction there is an intermediate trace in the SpecCP of the dependent V2-clause as in (2).
(4) a. Du denkst, dass Ede Tim angerufen hat.
       you think that Ede Tim called has
       ‘You think that Ede has called Tim.’
    b. Du denkst, Ede hat Tim angerufen.
       you think Ede has Tim called
       ‘You think Ede has called Tim.’
(5) a. Wen denkst du dass Ede angerufen hat?
       whom think you that Ede called has
       ‘Whom do you think that Ede has called?’
    b. Weni denkst du [CP ti [C dass Ede ti angerufen hat]]
On the other hand, it is also possible to envisage a parenthetical analysis for the controversial construction, i.e. a verb-first parenthetical (V1-parenthetical) in the prefinite insertion slot. This is supported by the following facts (see Reis 1996): Not only does exactly this type of parenthetical occur in other insertion slots, e.g. in post-subject and clause-final position as in (6), but it is also the case that other types of parentheticals, e.g. so-parentheticals, occur in exactly this prefinite insertion position, see (7). So both the prefinite insertion slot and this type of verb-first parenthetical are attested independently of the controversial construction.
(6) a. Wen hat Ede denkst du angerufen?
       whom has Ede think you called
    b. Wen hat Ede angerufen denkst du?
       whom has Ede called think you
       ‘Whom has Ede (do you think) called, (do you think)?’
(7) Bei Freunden, so denkt Ede, sollte man oft anrufen.
    at friends so thinks Ede should one often call
    ‘One should often call one’s friends, thinks Ede.’
The controversial construction has attracted the attention of linguists for more than two decades mainly because of the far-reaching implications its analysis has both for models of German sentence structure in general (cf. Reis 1996: 51) and for the status of dependent V2-clauses. Extractions have
been claimed to be possible only from strictly governed domains (Huang 1982). Thus if an ‘extraction from V2’ analysis is assumed for the controversial construction, dependent V2-clauses have to be analysed as syntactic complement clauses in the same sense as dass-complement clauses. This view has been challenged by Reis (1997), who claims that dependent V2-clauses are not syntactically embedded and that V2 in general is a root phenomenon. It is thus of great theoretical interest to settle the parenthesis/extraction debate. Up to now it has remained unresolved as it is difficult to find any clear evidence which distinguishes between the two accounts. In this paper we will present two types of empirical evidence in order to contribute to this long-standing discussion: evidence from judgement studies of present-day German (section 2) and evidence from historical corpus data of Old High German (section 3). Both these data types provide new insights in themselves, but crucially, their combination allows for even stronger conclusions concerning the controversy at hand, because, as we shall see, they both point in the same direction.
2. Evidence from judgement studies
The overall aim of our judgement studies was to compare the controversial construction to an uncontroversial extraction structure on the one hand and to an uncontroversial parenthetical construction on the other. More specifically, we compared the controversial construction to extractions from a dass-clause (dass-extractions) as in (5) and to V1-parentheticals in post-subject position as in (6a). We assume that constructions of the same structural type receive the same or at least parallel acceptability judgements across contexts. Thus, the finding that the controversial construction patterns with the unambiguous structure A and not with the unambiguous structure B would count as evidence in favour of analysis A.
Reis (1996, 2002) systematically compared (her own judgements of) the controversial construction to the two clear structure types for a number of phenomena. In our study we focussed on one of her core objects of investigation, namely predicate restrictions. Reis claims that a number of predicates can appear as matrix predicates in extractions but that they are impossible in prosodically integrated parentheses. Thus they appear as bridge predicates in dass-extractions as in (5), but not inside V1-parentheticals as in (6). The relevant predicates are strong factive predicates, negative or negated predicates, preference predicates, and adjectival predicates in general. The crucial question is whether these predicates occur in the controversial construction. Reis denies this, i.e. she rates the controversial construction with these predicates as unacceptable and thus on a par with the V1-parentheticals and differently from the dass-extraction. In our study we elicited judgements for the pertinent constructions with a range of predicates in order to test for similarities and differences between the structures across predicates.
2.1. Magnitude estimation methodology and pattern matching technique
In all experiments, we applied the magnitude estimation methodology (Bard, Robertson & Sorace 1996) to elicit judgements. In magnitude estimation, informants judge how good or bad sentences are in comparison both to a reference item and to their own previous judgements, i.e. all judgements are relative. Our informants were instructed to rate the naturalness of our example sentences. Subjects can use all positive numbers including decimals to state their judgements, i.e. they can always introduce a score which is better or worse than all of their previous scores. Since the results are numerical and form an interval scale, standard statistical tests can be applied. We made the experiments available on the web using WebExp (Keller et al. 1998).
For evaluation of our data we applied the pattern matching technique (Featherston 2004). The basic idea is that if two constructions are structurally alike, their judgements will be identical or behave in a parallel way across conditions, here a hierarchy of predicates. One structure may be better than the other, but their response to the continuum of predicates is the same. Put differently, there is no interaction of structure and predicate type. If, on the other hand, two constructions are structurally different, we expect an interaction of structure and predicate type and no parallel pattern.
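The logic of pattern matching can be illustrated with a minimal Python sketch. It is an illustration only and not the statistical test applied to the data: the function names and all scores are hypothetical. The idea is that two parallel judgement profiles differ by a roughly constant amount across predicates, so the spread of the per-predicate differences stays close to zero, whereas profiles that cross produce a clearly larger spread.

```python
from statistics import pstdev


def difference_profile(scores_a, scores_b):
    """Return the per-predicate difference between two structures."""
    if len(scores_a) != len(scores_b):
        raise ValueError("need one score per predicate for each structure")
    return [a - b for a, b in zip(scores_a, scores_b)]


def spread_of_differences(scores_a, scores_b):
    """Population SD of the differences; a value near 0 means parallel."""
    return pstdev(difference_profile(scores_a, scores_b))


# Hypothetical mean judgements over four predicates (not data from any study).
structure_a = [1.0, 0.6, 0.1, -0.4]
parallel_b = [0.7, 0.3, -0.2, -0.7]  # same shape as A, shifted down
crossing_b = [1.3, 0.8, -0.3, -1.0]  # better than A at first, then worse

print(spread_of_differences(structure_a, parallel_b))
print(spread_of_differences(structure_a, crossing_b))
```

A constant offset between the two structures leaves the spread near zero, which corresponds to the case where one structure is simply better than the other without any interaction.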
2.2. The studies series
Our research program was guided by three questions: (i) Does the controversial construction respond to the predicate restrictions like the dass-extractions (see 2.2.1)? (ii) If not, could there be a confounding factor which is responsible for the perceived differences (see 2.2.2)? (iii) Does the controversial construction behave like clear parentheticals (see 2.2.3)?
2.2.1. Experiment I: dass-extraction vs. controversial construction
Recall Reis’ claim that factive, negative/negated, adjectival and preference predicates are acceptable in the dass-extraction but not in the controversial construction. This is striking as the two constructions are otherwise generally assumed to allow the same predicate classes, i.e. mainly predicates of thought and speech. For the parenthesis/extraction debate it is more revealing to focus on those predicates for which the two constructions are supposed to diverge, but one needs to consider their behaviour with the predicates of thought and speech in order to understand their overall pattern. Featherston (2004) carried out a judgement study testing eight predicates from this group with a number of structures, among them the dass-extraction and the controversial construction. Featherston adheres to the standard generative analysis of the controversial construction as an extraction construction. We thus have to be careful about his conclusions, but the data as such is very interesting: Featherston detects a strikingly similar pattern for the two constructions with the predicates of thought and speech, with the controversial construction being consistently judged better than the dass-extraction, but in a parallel fashion. We took Featherston’s results as a starting point for our own investigation: knowing that the dass-extraction and the controversial construction behave alike for some predicates, will we detect a difference between them for negative/negated, adjectival and preference predicates?
2.2.1.1. Design: Our own experiment was deliberately designed as a follow-up study of Featherston (2004) with some of the conditions overlapping, and we will present Featherston’s and our own study jointly in this paper. In (8) we spell out the constructions tested in both experiments, i.e. the controversial construction in (8a) and the dass-extraction in (8b).2
(8) a. Welchen Bewerber glaubst/hoffst/bevorzugst du stellt das Projekt ein?
       which applicant believe/hope/prefer you employs the project PARTICLE
    b. Welchen Bewerber glaubst/hoffst/bevorzugst du dass das Projekt einstellt?
       which applicant believe/hope/prefer you that the project employs
       ‘Which applicant do you believe/hope/prefer (that) the project will employ?’
Both experiments included the verbs glauben (believe) and hoffen (hope). Featherston additionally tested the reporting predicates sagen (say), behaupten (claim), fürchten (fear), erzählen (tell), erklären (explain), and the negative predicate bezweifeln (doubt). Our follow-up experiment focussed on preference and adjectival predicates: wollen (want), wünschen (wish), vorziehen (prefer), bevorzugen (prefer), lieber sein (be preferable), ratsam sein (be advisable), das Beste sein (be the best), besser finden (find better), klar sein (be clear) and bekannt sein (be known). In sum, a total of 18 different predicates were covered in the two studies, eight in Featherston’s and twelve in the follow-up study with an overlap of two verbs (glauben and hoffen). Featherston used er (he) as the subject of these predicates, the follow-up study du (you, singular) unless the predicate was an adjective, which only combines with es (it).3 The interrogative constituent was in the accusative in both experiments. Featherston’s study contained ten, the follow-up experiment eleven lexical variants of the experimental material. The lexis was controlled for length, lemma frequency and semantic plausibility. In each experiment 28 subjects were recruited by flyer. We refer the reader to Featherston (2004) and Kiziak (2004) for further details and separate discussions of the studies.
2.2.1.2. Results and discussion: For graphical representation, we normalize the data from all subjects by conversion to z-scores. This unifies the different scales that individual informants used, allowing for visual inspection of the results. The data sets from both experiments are combined in figure 1 by using glauben and hoffen as overlap and tool for unification. The vertical scale represents perceived wellformedness, with higher scores indicating better judgements. We ordered the predicates on the horizontal axis according to the scores in the dass-extraction condition.
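The conversion to z-scores can be sketched as follows. This is a minimal illustration, assuming that each informant's ratings are standardized by that informant's own mean and standard deviation; the use of the population rather than the sample standard deviation, the function name and the ratings are assumptions made for the example, not details reported for the studies.

```python
from statistics import mean, pstdev


def z_scores(ratings):
    """Standardize one informant's ratings to mean 0 and SD 1."""
    centre = mean(ratings)
    spread = pstdev(ratings)  # assumption: population SD
    if spread == 0:
        raise ValueError("ratings must not all be identical")
    return [(r - centre) / spread for r in ratings]


# Two hypothetical informants who rank three sentences identically
# but use very different numerical scales.
informant_a = [10.0, 20.0, 30.0]
informant_b = [1.0, 2.0, 3.0]

print(z_scores(informant_a))
print(z_scores(informant_b))
```

After normalization the two informants yield the same values, which is the sense in which the conversion unifies individual scales.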
Figure 1. dass-extraction versus controversial construction
Figure 1 shows that the best predicates for both constructions are the typical predicates of thought and speech. Judgements of the dass-extraction decline fairly evenly as the predicates become worse bridge predicates. In contrast to this, the controversial construction starts off better than the dass-extractions with the reporting predicates, but declines more steeply with the negative, preference and adjectival predicates, plunging past the dass-extractions to become worse than them. Our judgement data thus reveals a division of the predicates into two subgroups as claimed by Reis: on the one hand we have the reporting predicates for which the controversial construction is consistently rated better than the dass-extraction, and on the other hand, we have the negative, preference and adjectival predicates, for which the pattern is reversed.4 On the face of it, this finding disfavours an extraction analysis of the controversial construction since a more parallel behaviour would be expected on this analysis. Yet, the next section addresses a possible objection to this conclusion.
2.2.2. Experiment II: V2-subordination as a confounding factor?
On the extraction analysis, the controversial construction is an extraction from a dependent V2-clause. It is noteworthy that dependent V2-clauses are more restricted in their occurrence than dass-clauses, i.e. only a subset of the predicates which select a dass-clause can also select a V2-clause. Our reasoning is therefore as follows: Possibly the controversial construction is an extraction, just like the dass-extraction, but it might be influenced by an additional factor which is irrelevant for the dass-extraction: the acceptability of V2-subordination with certain predicates. If this proved true, the observed dissimilarities between dass-extraction and controversial construction would be due only to the V2-factor, in which case they would be no counterargument to the extraction analysis of the controversial construction.
2.2.2.1. Design: As the primary aim of this study was to understand whether ratings of the controversial construction directly correlate with acceptability of simple, declarative dependent V2-clauses, we tested both constructions with a subset of the predicates from the previous studies. The secondary goal was to see whether we could replicate our earlier findings concerning the dass-extraction and the controversial construction. We thus included sentences as in (8) for replication and declarative dependent V2-clauses as in (9) for comparison with the controversial construction.
(9) Er glaubt/hofft/bevorzugt, die Firma wählt diesen Standort aus.
    he believes/hopes/prefers the company chooses this location PARTICLE
    ‘He believes/hopes/prefers the company chooses this location.’
Note that we were careful to include both predicates for which the controversial construction had scored better as well as such for which it had scored worse than the dass-extraction. Our experiment contained four predicates of thought and speech (glauben, hoffen, fürchten and erzählen), six preference predicates, among them two adjectival predicates (wollen, wünschen, bevorzugen, vorziehen, lieber sein and ratsam sein), one adjective of certainty (klar sein) and one negative predicate (bezweifeln). We used third person singular pronouns as subjects for these predicates. We provided 12 versions of the material and included 15 filler items. The 31 subjects were recruited by flyer.
2.2.2.2. Results and discussion:5 The previous results for the dass-extraction and the controversial construction were replicated. For lack of space we cannot discuss this in detail but refer the reader to Kiziak (2007). The repeated measures analysis of variance confirms the replication. The important measure is the interaction of the factors Predicate and Structure. It is significant by subjects and by items (F1 (11,330) = 4.23, p1 < 0.001; F2 (11,121) = 3.07, p2 = 0.003), thus confirming that the two constructions do not respond in a parallel way to the range of predicates. We omit the comparison from figure 2 for reasons of clarity.
Figure 2. Dependent V2-clauses versus controversial construction. Partial correspondence, but also clear differences
Consider figure 2. Since we seek to understand whether the controversial construction only reflects the quality of dependent V2-clauses with the tested predicates, we aligned the predicates in such a way that the two constructions are as parallel as possible. We find a consistent pattern of dependent V2-clauses and controversial construction over eight of the predicates on the left of the graph. However, the parallelism breaks down for the four predicates on the right. For these, the dependent V2-clauses score disproportionally higher than the controversial construction. It is the word ‘disproportionally’ that should be emphasized, as the dependent V2-clauses are rated better throughout, i.e. even where the two constructions are judged in a parallel fashion. The repeated measures analysis of variance supports the view that the V2-clauses and the controversial construction are rated differently across the continuum of predicates: The interaction of Verb and Structure is highly significant both by subjects and by items (F1 (11,330) = 6.66, p1 < 0.001; F2 (11,121) = 4.73, p2 < 0.001). In summary, the pattern of the controversial construction cannot be explained by attributing it to the quality of dependent V2-clauses. This in turn
means that the differences we repeatedly found between the dass-extraction and the controversial construction cannot be accounted for by the factor ‘V2-subordination’, i.e. we have not found an explanation for the perceived differences between dass-extraction and controversial construction on an extraction analysis of the latter.
2.2.3. Experiment III: clear parenthetical vs. controversial construction
With respect to the parenthesis/extraction debate, the results so far must be considered as evidence against the extraction analysis. This is however not equivalent to providing positive evidence in favour of the parenthetical analysis. In our third experiment we therefore compared the controversial construction to clear parentheticals, i.e. V1-parentheticals in post-subject position. These have generally received a parenthetical analysis despite some slifting accounts along the lines of Ross (1973).
2.2.3.1. Design: Apart from the controversial construction and the post-subject V1-parentheticals as in (10), we again included the dass-extraction to test for replication of our earlier findings. We used the same predicates as in experiment II. To ensure that we get clause-internal rather than clause-final V1-parentheticals, we inserted an adverbial in all versions. Apart from this we retained the previously used material. We recruited 27 participants.
(10) Welchen Vorschlag setzt der Vorstand glaubt/hofft/bevorzugt er im Frühjahr um?
     which proposal implement the board believes/hopes/prefers he in spring PARTICLE
     ‘Which proposal will the board implement in spring, does he believe/hope/prefer?’
2.2.3.2. Results and discussion: Let us briefly state that the predicate-class dependent contrast between dass-extraction and controversial construction was again replicated in this third experiment. In the repeated measures analysis of variance, the interaction of the factors Predicate and Structure is again highly significant both by subjects and by items (F1 (11,286) = 5.51, p1 < 0.001; F2 (11,121) = 4.489, p2 < 0.001).
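The degrees of freedom reported for the interaction can be checked with a short sketch. It assumes a fully repeated-measures design in which every subject (in the by-subjects analysis) or every version of the material (in the by-items analysis) contributes to each cell of Predicate (12 levels) by Structure (2 levels); the function name is hypothetical.

```python
def interaction_df(n_predicates, n_structures, n_units):
    """Degrees of freedom for a two-factor within-units interaction.

    Returns (effect_df, error_df), where n_units is the number of
    subjects (F1) or of item versions (F2).
    """
    effect_df = (n_predicates - 1) * (n_structures - 1)
    error_df = effect_df * (n_units - 1)
    return effect_df, error_df


print(interaction_df(12, 2, 31))  # by subjects, experiment II
print(interaction_df(12, 2, 27))  # by subjects, experiment III
print(interaction_df(12, 2, 12))  # by items, 12 versions of the material
```

Under these assumptions 31 subjects give (11, 330), 27 participants give (11, 286) and 12 versions give (11, 121), in line with the values reported for F1 and F2.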
Figure 3. Post-subject V1-parentheticals versus controversial construction. No clear contrasts.
Figure 3 displays the results for the controversial construction and the V1-parentheticals with the predicates ordered according to their scores in the controversial construction. Although the two constructions are not judged perfectly equally, we do not see any sign of the clear contrast we have repeatedly found between the dass-extraction and the controversial construction. That the controversial construction is in general rated slightly better than the V1-parentheticals can be explained by the dispreferred post-subject insertion slot. The repeated measures ANOVA is compatible with the similar response to the continuum of predicates: there is no significant interaction of the factors Predicate and Structure (F1 (11,286) = 1.786, p1 = 0.056; F2 (11,121) = 1.447, p2 = 0.218) – this is all the more apparent when we recall the 0.001-values above. As we are testing for similarity rather than difference, we also considered the correlation coefficients. The dass-extraction and the controversial construction showed a Pearson’s correlation of 0.727. It is of course natural that both constructions respond to the continuum of predicate quality. The controversial construction and the clear parentheticals, however, display a much stronger Pearson’s correlation of 0.928. The statistical analysis thus makes the very close relationship between the clear parenthetical and the controversial construction quite apparent.
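For readers who wish to see how such a coefficient is obtained, the following sketch computes Pearson's correlation between two judgement profiles. The profiles are hypothetical and are not the data behind the coefficients reported above; the function name is likewise an assumption made for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation between two equally long score profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two profiles of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical mean judgements over four predicates.
profile_a = [1.0, 0.5, 0.0, -0.5]
shifted_b = [0.8, 0.3, -0.2, -0.7]    # same shape, constant offset
diverging_c = [1.2, 0.9, -0.6, -1.0]  # similar trend, different shape

print(pearson_r(profile_a, shifted_b))
print(pearson_r(profile_a, diverging_c))
```

Note that the coefficient ignores both a constant offset and a difference in overall slope between two profiles, so it complements rather than replaces the test for an interaction.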
2.3. Tentative conclusions
We gathered three pieces of evidence: the controversial construction patterns differently from dass-extractions, differently from dependent V2-clauses, but similarly to post-subject V1-parentheticals. With regard to predicate restrictions, the controversial construction thus resembles the clear parentheticals more closely than it resembles clear extractions. Taken at face value, this favours a parenthetical rather than an extraction analysis. Accordingly, the parenthetical analysis should at the very least not be disregarded as has been done by many generative linguists. As we do not reject a dual analysis of a particular surface string out of hand, we would however hesitate to follow Reis (2002) in claiming that the correctness of the parenthetical analysis necessarily renders the extraction analysis impossible.
There are a number of possible objections to our conclusions, e.g. that the effects might be due to lexical rather than structural factors (bearing in mind work on preference predicates by e.g. Frank 1998). We cannot discuss the objections in detail here but refer the reader to Kiziak (2007) for an explanation of why we do not take them to compromise our argumentation. We therefore maintain our tentative conclusion that the data from our judgement studies of present-day German weigh in favour of giving the controversial construction a parenthetical analysis.
3. Evidence from historical corpus data
In our historical study, we wanted to see whether evidence from the oldest attested period of the German language, i.e. from Old High German (OHG), could help to decide which analysis – the extraction or the parenthetical analysis – is more plausible for our controversial construction. We investigated a number of OHG prose texts, namely the Isidor translation, the so-called Monsee Fragments (both c. 800 A.D.) and a number of translations by the 11th century writer Notker Labeo (i.e. Boethius, Martianus Capella and the Psalter).6
3.1. The controversial construction in Old High German
In Notker’s work the historical precursor of the controversial construction is already attested:7
(11) uuéderêr déro uuânest tu gemág mêr?
     which thoseGEN believe you is-capableIND more?
     ‘Which one of those, do you believe, is capable of more?’
     quemnam horum ualentiorem esse decernis? (N IV 189, 18)
(12) Fone uuiû chist dû nâhent?
     From whatINSTR say you approach3.PL
     ‘From where, do you say, they are approaching?’ (N Ps 54 189, 11)
As we have amply discussed, in modern German the controversial construction has received two analyses, the extraction and the parenthetical analysis. In principle, both these analyses could also be proposed for the OHG equivalent to this construction. However, we have seen in the discussion on modern German that both analyses critically depend on the presence of other constructions. In the following we will therefore examine whether these component constructions were available in OHG as they are in present-day German.
3.2. Constructions relevant for the extraction analysis
3.2.1. Long extraction
The extraction analysis is, of course, only plausible if we can show that long extraction of wh-phrases was possible in OHG. In fact, we do have compelling evidence that long extraction of wh-phrases out of complement clauses introduced by thaz ‘that’ was already in use in OHG times. Interestingly, the relevant examples are not only attested in Notker’s late OHG texts, (13), but also already in the Monsee Fragments, cf. (14), which suggests that this construction is quite old.8
(13) [uuélên uuéhsel]i múgen uuír chéden . [ti dáz tîe ti lîdênCONJ]?
     which changeACC can we say that these undergo
     ‘Which change can we say that these undergo?’ (N IV 216,1)
(14) [huuaz]i uuellet ir [ti daz ih · iu ti tuoe] ·
     whatACC want you that I you doCONJ
     ‘What do you want me to do for you?’
     quid uultis ut faciam uobis? (MF XIV,24 (= St. Matthew 20: 32))
Note that example (13) from the Consolatio is part of the commentary that was added by Notker and does not translate a Latin sentence, which shows that wh-extraction was not merely a syntactic loan.9
3.2.2. Dependent argument clauses without complementizers
The second precondition for the extraction analysis is the availability of dependent argument clauses without complementizers and with V2 word order. In modern German it is uncontroversial that certain verbs/predicates optionally select dependent V2-clauses without the complementizer dass ‘that’ besides the standard verb-final dass-clauses. But was this also the case in OHG? In Notker’s text we do indeed find complement clauses which are not introduced by the complementizer thaz, but these clauses exhibit verb-final word order as can be seen in the examples in (15)–(17).10 In fact, Diels (1907: 194) already came to the conclusion that the earliest attestations of unintroduced complement clauses in OHG did not exhibit main, but subordinate clause word order.
(15) Uuânest11 [tu dehéin mûot keuéstenôtez . mít rédo ába stéte eruuékkêst] …?
     believe2.SG you any soul firm with reason from position softenCONJ
     ‘Do you believe that you can remove such a firm soul settled in reason from its state?’
     Num mentem coherentem sibi firma ratione amouebis a statu proprie̹ quietis? (N II 90,25)
(16) Uuânest tû [ dáz nîehtes túrftîg néist . máhte dúrftîg sî ]?
     believe you what nothingGEN needy NEG-is power needy isCONJ
     ‘Do you believe that which needs nothing needs power?’
     An tu arbitraris . quod nihilo indigeat . egere potentia? (N III 142,30)
(17) Íh chído [ iz tempus pezéichenne ]
     I say it ‘tempus’ signifiesCONJ
     ‘I tell you that it signifies time (‘tempus’)’
     Dico autem quoniam consignificat tempus. (N DeInt I 10,12)
In contrast to Notker’s late OHG works, the major OHG prose texts from the 8th and 9th centuries hardly contain any object or subject clauses without
overt complementizers. As Diels (1907: 195) argued, this is probably due to the fact that these texts were more loyal to the Latin sources, where a complementizer was generally present.12 We do, however, find earlier examples in the poetry, for example in the Hildebrandslied (cf. Schrodt 2004: 148) and in Otfrid’s Gospel Harmony.13 In the majority of complementizerless V-final clauses in OHG the matrix verbs are quedan ‘to speak/say’ and uuânen ‘to believe/think’. In contrast to our modern dependent V2-clauses (cf. Reis 1997: 122f.), however, the complementizerless verb-final clauses in OHG also occur after volitional matrix verbs such as wellen ‘want’ in (18) and after negated or negative predicates (e.g. neuuâno, (19), neíst nehéin zuîuel ‘(there) is no doubt’, cf. N II 75,1):
(18) Táz uuólta íh [tû dar míte châdîst]
     that wanted I you that with sayCONJ
     ‘I wanted you to say this with that’ (N III 147,27)
(19) Nóh íh neuuâno [mír mûoza sî …]
     nor I NEG-believe me permission isCONJ
     ‘nor do I believe that I am allowed …’
     Nec arbitror mihi fas esse. … (N I 25,2)
This suggests that these clauses are not the historical precursors of our modern dependent V2-clauses. Rather we could hypothesize that they are just a variant of the ‘normal’ complement clauses with overt thaz where the complementizer has either been deleted via ‘Comp-drop’ or just lacks a phonological matrix as is indicated in (20).14
(20) ih chido, [ ø iz tempus bezeichenne ] (=17)
Note that such an analysis with a silent complementizer has also been suggested for unintroduced (‘asyndetic’) relative clauses in OHG (see Lenerz 1985: 109). Most attestations of asyndetic relative clauses are in the early texts (e.g. Behaghel 1928: 742–745), but there are also some residual examples in Notker’s works, cf. Behaghel (ibid.: 745). So we have seen that there were already complement clauses without overt complementizers and V-final order in OHG. However, the extraction analysis of our controversial construction is only convincing if there are also examples with V2 order attested. Yet the evidence for this construction is very poor.15 In Notker’s texts two classes of putative V2-complement clauses do occur, but both may receive alternative analyses.
First, we find some examples such as (21) which are in fact ambiguous between a V2- and a V-final analysis:
(21) Ér chît [ter scriptor uuólti . (dáz man …)]
     he said the scribe wantedCONJ that one
     ‘He said that the scribe wanted (that one …)’ (N I 56,23)
Second, in the Consolatio, three putative examples are attested after the predicates wânen and thunken with complement clauses in the indicative mood: N I 26,27, cf. (22), N I 65,15 (Íh uuâno …) and N I 13,14 (Míh túnchet …).
(22) Íh uuâno [dû gehúgest uuóla . (dáz…)]
     I believe you rememberIND well that
     ‘I believe you remember well (that …)’
     Meministi ut opinor… (N I 26,27)
These examples are exceptional since both verbs usually select conjunctive complement clauses in OHG. This is true not only for the dependent thaz-clauses, but also for the complementizer-less V-final clauses, as can be seen in (15), (16) and (19).16 Behaghel (1928: 505) proposes to analyse such examples with unexpected indicatives as instances of a colon interpretation (Íh uuâno: “dû gehúgest uuóla”).17 Note that (22) and the other two examples have first person matrix subjects. With a third person matrix subject the colon construction and the construction with a true dependent clause would differ in wording, since the personal pronouns of the dependent clause would have to be third person instead of second person.

To sum up, there is no compelling evidence for the type of dependent V2-clause that is a precondition for the extraction analysis. Some classes of alleged examples should presumably receive alternative analyses. It could be objected that even if we do not have positive evidence for this construction, we cannot really be sure that it was not licensed by the OHG grammar. On the other hand, we have ample evidence that there was a type of thaz-less complement clause with V-final order. We thus may conclude that our data does not support the extraction analysis for the controversial construction in OHG, but we cannot reject it with absolute certainty.
3.2.3. Constructions relevant for the parenthetical analysis

According to the parenthetical hypothesis, the OHG attestations of the controversial construction contain parentheticals with V1 order which are inserted after the wh-phrase in the prefield and before the fronted finite verb. This assumption seems to be unproblematic since the occurrence of V1-parentheticals with verbs of speech, thought and belief is a typical property of historical German texts (Maurer 1924; Behaghel 1928: 537). V1-parentheticals are attested in various insertion slots in OHG texts, as is illustrated in (23) and (24) with examples from Notker (see also Wunder 1965: 244f. on Otfrid).18 First, they are found in different places in the middle field. In (23), for example, the V1-parenthetical uuâno ih has been inserted between the adverbial nû, which presumably is adjoined to the VP, and the direct object. Second, V1-parentheticals often occur at the end of their host clause as in (24), where uuâno ih is found after the verbal complex.

(23) uuánda dû nû [uuâno íh] uuácherôren óugen hábest . (N III 174, 16)
     because you now believe I more-watchful eyes haveIND
     ‘because you have now, I believe, more vigilant eyes’
     iam enim ut arbitror uigilantius deducis oculos

(24) Ír uuéllent iz sô bríngen [uuâno íh ] . táz (N II 80, 26)
     you wantIND it so bring believe I that
     íu nîehtes nebréste.
     you nothingGEN lackCONJ
     ‘You want to bring it so about, I believe, that you do not lack anything’
     Fugare credo indigentiam opibus que̹ritis.

We do not have any direct evidence whether the V1-parentheticals were prosodically integrated like their modern counterparts. Notker usually signalled small breath pauses within clauses by the use of a special punctuation mark (a raised full stop: ‘·’). As can be seen in the examples (23) to (26), the V1-parentheticals are often not surrounded by a punctuation mark, and this may be interpreted as a sign of prosodic integration.
Since V1-parentheticals could show up in various positions in their host clause, it is very plausible to assume that they could also be inserted directly after the prefield. If this is correct, we can analyse examples with our controversial construction as in (25) and (26) as containing a V1-parenthetical insert after the fronted wh-phrase.
(25) uues sun [uuânint ir] ist er (N Ps 77 276,3)
     whose son believe you1.PL isIND he
     ‘Whose son is he, do you believe?’
     CV/IVS FILIVS EST?

(26) uuéderêr déro [uuânest tu] gemág mêr ? (=11)
So the parenthetical analysis turns out to be unproblematic for our controversial construction in OHG.
4. General conclusion

We considered a particular syntactic construction of German for which the theoretical literature has suggested two competing analyses, the extraction and the parenthetical analysis. To date it has not been conclusively settled which of the two analyses should be preferred. In this paper we therefore sought to supplement the so far rather theoretical argumentation with new types of empirical evidence: judgement data of present-day German and corpus data of Old High German.

In the judgement study we have repeatedly seen that uncontroversial extractions respond differently from the controversial construction with respect to predicate restrictions. This negative evidence against the extraction analysis is supplemented by positive evidence in favour of the parenthetical analysis: for the predicates considered, the controversial construction behaves similarly to clear parentheticals.

In the historical corpus study it turned out that the controversial construction is attested at a time when we do not find any compelling evidence for V2-complement clauses. So we may conclude that the parenthetical analysis is better supported than the extraction analysis for our OHG data.

Both data types thus point in the same direction, towards the parenthetical analysis. That two entirely independent pieces of evidence reach the same tentative conclusions gives more weight to our overall conclusion, i.e. that a parenthetical analysis can at the very least not be excluded for the controversial construction.
Notes

*   This work was supported by the Deutsche Forschungsgemeinschaft within the SFB 441 ‘Linguistische Datenstrukturen’. The experimental work was partly carried out in cooperation with our colleague Sam Featherston, whom we thank for many interesting discussions. We are also much obliged to our project leaders Marga Reis, Wolfgang Sternefeld and Hubert Truckenbrodt as well as many other members of the SFB and an anonymous reviewer for helpful comments. All remaining errors are our own.

1.  Some speakers, however, judge these structures to be marginal.

2.  We used extractions from dass-clauses throughout our experiments, although most preference predicates normally select wenn-clauses. In the present indicative, the wenn-clause can however be replaced by a dass-clause without incurring changes in the semantics. In order to avoid complementizer variation, we set all clauses in the present indicative and used dass-clauses with them. The only exception is the predicate wollen, which needs subjunctive.

3.  The impersonal pronoun es is a correlative with the adjectival predicates and with most preference predicates, e.g. Ich ziehe es vor, wenn du kommst. (I prefer for you to come). It can be omitted, depending largely on verbal idiosyncrasies. As an informal pilot study did not produce clear preferences, we included an es in brackets where at least some informants had opted for it. We thus left it to each subject to include or omit the es according to his/her preferences. We were careful to insert the es with a particular predicate either in both or in neither of the two constructions, so that no irrelevant distinction was made between them due to the (es). In our later experiments, we omitted the es in all cases. The overall tendencies remain the same.

4.  For the predicates on the very right, the two constructions converge. It is the dass-extraction which is judged relatively poor with these predicates. A likely explanation for this is the weak factivity of bekannt sein and klar sein. Factivity is generally assumed to have a negative influence on extractions. Why das Beste sein receives low acceptability in the dass-extraction is not entirely clear. These scores do not contradict the overall pattern we found. The two constructions still respond differently: taking glauben as a reference point, we see that the drop in acceptability is much larger for the controversial construction than for the dass-extraction with the predicates on the right-hand side.

5.  The omnibus ANOVA showed two significant main effects and a significant interaction for both experiment II and III, which justifies a pairwise comparison as done in the following sections.

6.  The Monsee Fragments [= MF] are cited by folio and line numbers, the OHG Isidor [=I] is cited by line numbers, Otfrid’s Gospel Harmony [=O] is cited by book, chapter and line numbers and Notker’s texts [N…] are cited by book, page and line numbers according to the following editions: The Monsee Fragments. G. A. Hench (ed.). Straßburg 1890 – Der althochdeutsche Isidor. H. Eggers (ed.). Tübingen 1964 – Otfrids Evangelienbuch, Oskar Erdmann (ed.), 6th edition by Ludwig Wolff, Tübingen 1973 – Die Werke Notkers des Deutschen. Neue Ausgabe. Begonnen von E. H. Sehrt und T. Starck. Fortgesetzt von J. C. King und P. W. Tax. Band 1–3: Notker der Deutsche, Boethius, »De consolatione Philosophiæ«, Buch I–V. P. W. Tax (ed.). Tübingen 1986, 1988, 1990 [= N]. Band 4: Notker der Deutsche, Martianus Capella, »De Nuptiis Philologiae et Mercurii«, J. C. King (ed.), Tübingen 1979 [=N MC], Band 5: Notker der Deutsche, Boethius’ Bearbeitung der Categoriae des Aristoteles, James C. King (ed.), Tübingen 1972 [=N Cat]. Band 6: Boethius’ Bearbeitung von Aristoteles’ Schrift »De Interpretatione«, Tübingen 1975 [=N DeInt], Band 9: Der Psalter. Psalm 51–100, P. W. Tax (ed.), Tübingen 1981 [= N Ps]. – The line numbers only indicate where the OHG sentence begins. – In those cases where there is a corresponding Latin sentence present, it is also cited (without line numbers). In some examples, bracketing or traces have been added.

7.  Cf. also N II 72,3.

8.  See also Erdmann (1874: 181) and Lenerz (1985: 112–114) for examples from Otfrid’s Gospel Harmony. Further examples from early and late OHG texts are cited in Erdmann (1874: 181) and Behaghel (1928: 547–552).

9.  The earlier example from the Monsee St. Matthew, (14), is a relatively close rendering of the Latin construction that arguably involves a wh-extraction as well. We also find the relevant examples in Otfrid’s Gospel Harmony (ca. 870 A.D.), and here crucially also in sentences that do not go back to Bible verses. Consider, for example, O III 12,8: wer quédent sie theih sculi sín …? (who say they that-I should be/ ‘Who am I, do they say?’).

10. Further examples: N II 48,27; N II 44,17; N II 77,3; N III 136,6; NMC 45,12; N Cat 53,8; N Cat 87,12. See also Diels (1907) for examples from further OHG prose and poetical texts – The majority of cases involve object clauses, but some subject clauses with missing complementizers (and V-final order) are also attested (e.g. N III 142,18).

11. The subject pronoun has been omitted in this case. See Axel (2005b) on the pro-drop properties of OHG.

12. In the Monsee St. Matthew there is one putative example after the matrix verb suohhen ‘to seek’ (MF XV,1).

13. Otfrid is the OHG text which has the highest number of thaz-less complement clauses. See Erdmann (1874: 177 f., 186, 190–195), Wunder (1965: 239–254) and Lenerz (1985: 112–114) for examples. The most frequent matrix verbs are wânen (ca. 30 examples according to Erdmann 1874: 177, 191) and quedan (ca. 50 examples, cf. ibd.). See fn. 15 on verb placement in these cases.

14. Interestingly, there is some evidence that it was possible to extract an XP out of V-final complement clauses without thaz. In (18) from Notker’s Consolatio the direct object táz has arguably been extracted from the subordinate clause and moved to the prefield-position of the matrix clause: Táz_i uuólta íh [t_i tû dar míte t_i châdîst ]. In Otfrid, such examples are also found with extracted wh-phrases (Lenerz 1985: 112). In contrast to our controversial construction, cases such as (18) are not ambiguous between an extraction and a parenthetical analysis: If we analysed uuólta ih as a parenthetical insert, we would have to argue that the host clause would be táz tû dar míte châdîst with V-final word order. This is, however, not convincing since there was already generalized V-movement in (unintroduced) main clauses in OHG (cf. Axel 2005a) and V-final order was very rare, especially in late OHG texts (cf. Näf 1979: 143–146 on Notker’s Consolatio).

15. The only OHG text where there are quite a few examples with putative V2-order in unintroduced complement clauses is Otfrid. Most of these cases are ambiguous between a V2- and V-final analysis (note that even sentences with the finite verb in base position are often not verb-final at the surface due to the possibility of liberal extraposition in OHG (cf. Axel 2005a)): e.g. O I 9,20; O I 19,21; O I 22,11; O IV 8,13; O IV 18,5. Otfrid is a poetical text, which exhibits word-order patterns that are not found in the prose texts and that have clearly been influenced by rhyme and metre. So sentences such as (i) (see also e.g. O III 12,13) may well be V-final clauses where the adverb réhto has been extraposed so that it rhymes with knéhto.

    (i) Ih wánu thu sis réhto / thésses mannes knéhto (O IV 18,7)
        I believe you be rightful thisGEN man’s servant

    However, even in Otfrid the vast majority of unintroduced complement clauses have uncontroversial V-final order (cf. Wunder 1965: 483).

16. The verbal mood of complement clauses selected by thunken is discussed in Schrodt (2004: 190), Behaghel (1928: 588) and Erdmann (1874: 191), and that of clauses selected by wânen in Furrer (1971: 145) for Notker’s Consolatio, and in Erdmann (1874: 191) and Wunder (1965: 240, 259) for Otfrid.

17. See also Reis (1997: 123) for the colon construction in modern German.

18. In OHG texts, V1-parentheticals (or inquit formulae) also often occur within sequences of direct speech which are usually not marked by punctuation marks. Thus, examples such as (i) are presumably not instances of our controversial construction, but direct speech sequences with inserted parentheticals: “Uuér”, chád ih, “mág tés kélougenen”.

    (i) Uuér chád íh mág tés kélougenen? (N III 121,14)
        who said I can thisGEN deny
        Quis id inquam neget?
References

Anderson, Sven-Gunnar & Sigmund Kvam
  1984  Satzverschränkung im heutigen Deutsch. Tübingen: Narr.
Axel, Katrin
  2005a Studien zur althochdeutschen Syntax. Linke Satzperipherie, Verbstellung und Verb-zweit, doctoral dissertation, University of Tübingen. (To appear 2007 as Studies on Old High German Syntax: Left Sentence Periphery, Verb Placement and Verb-Second. Amsterdam / Philadelphia: Benjamins.)
  2005b Null subjects and verb placement in Old High German. In Linguistic Evidence. Empirical, Theoretical and Computational Perspectives, S. Kepser & M. Reis (eds.), 27–48. Berlin / New York: Mouton de Gruyter.
Bard, Ellen G., Dan Robertson & Antonella Sorace
  1996  Magnitude estimation of linguistic acceptability. Language 72 (1): 32–68.
Behaghel, Otto
  1928  Deutsche Syntax. Eine geschichtliche Darstellung. Vol. III. Heidelberg: Winter.
Diels, P.
  1907  Entstehung der indirekten Rede im Deutschen. Zeitschrift für vergleichende Sprachforschung 41: 194–198.
Erdmann, Oskar
  1874  Untersuchungen über die Syntax der Sprache Otfrids. Erster Teil. Halle: Buchhandlung des Waisenhauses.
Featherston, Sam
  2004  Bridge verbs and V2 verbs – the same thing in spades? Zeitschrift für Sprachwissenschaft 23: 181–209.
Frank, Nicola
  1998  Präferenzprädikate und abhängige Verbzweitsätze. Stuttgart/Tübingen: Arbeitspapiere des SFB 340: 128.
Furrer, Dieter
  1971  Modusprobleme bei Notker. Die modalen Werte in den Nebensätzen der Consolatio-Übersetzung. Berlin / New York: Mouton de Gruyter.
Grewendorf, Günther
  1988  Aspekte der deutschen Syntax. Tübingen: Narr.
Haider, H.
  1993  ECP-Etüden: Anmerkungen zur Extraktion aus eingebetteten Verb-Zweit-Sätzen. Linguistische Berichte 145: 185–203.
Huang, C.-T. James
  1982  Logical relations in Chinese and the theory of grammar. PhD dissertation, MIT.
Keller, Frank, Martin Corley, Steffan Corley, Lars Konieczny & Amalia Todirascu
  1998  WebExp: A Java Toolbox for Web-Based Psychological Experiments. Technical Report HCRC/TR-99. Human Communication Research Centre, University of Edinburgh.
Kiziak, Tanja
  2004  Einschub oder Bewegung? Empirische Evidenz zur Parenthese-Hypothese. Master’s thesis, University of Tübingen. http://www.sfb441.uni-tuebingen.de/~kiziak/papers/magister.pdf
  2007  Long extraction or parenthetical insertion? Evidence from judgement studies. In Parentheticals, N. Dehé & Y. Kavalova (eds.), 121–144. Amsterdam: Benjamins.
Lenerz, Jürgen
  1985  Diachronic syntax: verb position and COMP in German. In Studies in German Grammar, J. Toman (ed.), 103–132. Dordrecht: Foris.
Maurer, Friedrich
  1924  Zur Anfangsstellung des Verbs im Deutschen. In Beiträge zur germanischen Sprachwissenschaft. Festschrift für O. Behaghel, W. Horn (ed.), 141–184. Heidelberg: Winter.
Näf, Anton
  1979  Die Wortstellung in Notkers Consolatio. Untersuchungen zur Syntax und Übersetzungstechnik. Berlin / New York: Walter de Gruyter.
Reis, Marga
  1996  Extractions from Verb-Second Clauses in German? In On Extraction and Extraposition in German, U. Lutz & J. Pafel (eds.), 45–88. Amsterdam: Benjamins.
  1997  Zum syntaktischen Status unselbständiger Verbzweit-Sätze. In Sprache im Fokus. Festschrift für Heinz Vater zum 65. Geburtstag, C. Dürscheid & K.-H. Ramers (eds.), 121–144. Tübingen: Niemeyer.
  2002  Wh-movement and integrated parenthetical constructions. In Proceedings from the 15th Workshop on Comparative Germanic Syntax, J.-W. Zwart & W. Abraham (eds.), 3–40. Amsterdam: Benjamins.
Ross, John Robert
  1973  Slifting. In The Formal Analysis of Natural Languages. Proceedings of the First International Conference, M. Gross et al. (eds.), 133–169. The Hague/Paris: Mouton.
Schrodt, Richard
  2004  Althochdeutsche Grammatik II. Syntax. Tübingen: Niemeyer.
Staudacher, Peter
  1990  Long movement from verb-second complements in German. In Scrambling and Barriers, G. Grewendorf & W. Sternefeld (eds.), 319–339. Amsterdam: Benjamins.
Thiersch, Craig L.
  1978  Topics in German Syntax. Ph.D. dissertation, MIT.
Wunder, Dieter
  1965  Der Nebensatz bei Otfrid. Heidelberg: Winter.
Quantifying quantifier scope: A cross-methodological comparison Oliver Bott and Janina Radó
Good linguistic theory can only be developed on the basis of solid data. Linguists have long recognized that it is not sufficient to trust the researcher’s intuitions – broader data sets need to be considered to make valid generalizations. In some cases the relevant data can be found in corpora. However, often the best solution is to systematically collect judgments from naive speakers, for instance by constructing questionnaires with multiple examples of the construction type under consideration. Subjects (naive informants) then rate these items on a (usually 5- or 7-point) scale or relative to a reference item (magnitude estimation). Such questionnaires have increasingly been used in syntax to buttress theoretical arguments with statistically quantifiable results.

Gathering quantifiable data may be advantageous in semantics as well, as semantic theories also tend to turn on subtle distinctions. Unfortunately, though, the questionnaires used in syntax cannot simply be “imported” into semantics, since the questions asked in the two fields are quite different. In judging syntactic well-formedness, subjects need to decide “Is the sentence grammatical/acceptable?”, which can be done on the basis of the sentence alone. In semantics, however, what we usually want to find out is how many (and what) readings a given sentence has. Thus we would need subjects to examine pairs of sentences and judge “Is the second sentence a good paraphrase of the first one?” To do that, subjects would have to compute the available meaning(s) for sentence 1, interpret sentence 2 and compare the two – a hopelessly complex task for naive informants.

Clearly, a more suitable method is needed if we are to collect judgments from untrained native speakers. In this paper we will develop some criteria for identifying methods that may be appropriate for collecting semantic judgments, and demonstrate the use of these criteria by comparing three candidate methods.1
1. Finding suitable methods
1.1. Sensitivity of the methods

Ease of use for untrained subjects is just one of the criteria a suitable method will have to fulfill. It is equally important to make certain that the method itself does not introduce any bias in the answers that subjects provide, and that it is capable of detecting subtle differences in judgment across construction types. We will start with the last criterion, and come back to the other two below. In order to see whether the methods under consideration are sensitive to subtle differences, we need to test them on constructions where such differences are expected. For our comparison we have selected German versions of doubly-quantified sentences like (1):

(1) Everyone loves someone.
Sentences of this type typically permit two scope readings, depending on whether the universal or the existential quantifier is interpreted as having wide scope (the ∀∃- and the ∃∀-reading, respectively). The relative preference for these readings depends on a number of factors, such as the choice of quantifiers (for instance each vs. every), their linear order2, and the syntactic role of the quantified expressions (see Beghelli & Stowell 2002, Pafel 2005, Tunstall 1998, among others). German is particularly interesting in this respect, as it allows the manipulation of linear (or hierarchical) order independently of syntactic role. By varying linear order and quantifier type, we generated a spectrum ranging from fully ambiguous to practically unambiguous sentences. Comparing preference judgments for sentences from different positions on the spectrum, we can test the sensitivity of our methods.

Doubly-quantified sentences are interesting for another reason as well: the available readings reported in the literature are sometimes quite controversial. For instance, (1) has been claimed both to be fully ambiguous (May 1977, 1985; Hornstein 1984; Higginbotham 1985), and to only allow the ∀∃-reading (Reinhart 1976, 1983; Hornstein 1995; Beghelli & Stowell 2002). It thus seems desirable to obtain more solid evidence based on naive speakers’ judgments.
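The two scope readings of a sentence like (1) can be made concrete by evaluating them against a small situation. The following is only an illustrative sketch: the individuals, the two situations and the function names are hypothetical and are not taken from the experimental materials.

```python
def forall_exists(lovers, loved, relation):
    """Wide-scope universal reading: every x is related to some y."""
    return all(any((x, y) in relation for y in loved) for x in lovers)


def exists_forall(lovers, loved, relation):
    """Wide-scope existential reading: one y is related to by every x."""
    return any(all((x, y) in relation for x in lovers) for y in loved)


# Hypothetical individuals (assumption, for illustration only).
lovers = {"ann", "bob"}
loved = {"cem", "dana"}

# Situation 1: each person loves a different individual.
distributed = {("ann", "cem"), ("bob", "dana")}
# Situation 2: both people love the same individual.
shared = {("ann", "cem"), ("bob", "cem")}

for name, situation in [("distributed", distributed), ("shared", shared)]:
    print(name,
          "| wide-scope universal:", forall_exists(lovers, loved, situation),
          "| wide-scope existential:", exists_forall(lovers, loved, situation))
```

In the "shared" situation both readings come out true, since any situation that verifies the wide-scope existential reading of (1) also verifies the wide-scope universal one; only the "distributed" situation separates the two.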
1.2. Disambiguation

As we are interested in assessing the relative availability of meanings, we need to pair potentially ambiguous sentences with appropriate disambiguation. The three methods we examine share the property that the disambiguating information is provided as a kind of context in which to evaluate the sentence. This is intended to simplify the subjects’ task, as no explicit comparison of meanings is necessary, only the judgment of how well the sentence fits the context. The candidate methods differ in how the context is provided: using linguistic means, that is, a preceding question, or visual means: set diagrams or natural-looking scenarios. We will describe the methods in detail in section 2, but first let us turn to some psycholinguistic motivation for the present study.
1.3. Do we need a methodological comparison?

The practice of using experimental methods to assess scope interpretations is in fact not new. Both linguistic context and diagrams have been employed to study scope preferences (Gillen 1991; Jackson & Lewis 2005). Is there any need for a new offline method, or is there anything else a methodological comparison can tell us? We think there is. First, the linguistic context used in a number of psycholinguistic studies turns out not to provide adequate disambiguation. As an example let us take the discourses used in Kurtzman & MacDonald (1993):

(2) Every boy climbed a tree.

(3) a. The tree was full of apples.
    b. The trees were full of apples.
While (3b) is felicitous only if the wide scope universal interpretation of (2) is taken, (3a) is consistent with both scope readings3. Thus if subjects accept (3a) as a good continuation we cannot tell what reading(s) they computed. This is a serious problem: it may turn out that in most cases when subjects select the singular continuation they in fact have the other reading in mind. Yet different variants of this type of disambiguation have been widely used both in online studies of processing and to assess offline preferences.

Furthermore, in order to test whether the (linguistic or visual) contexts we consider here fully disambiguate scope ambiguous sentences, we need to compare them under standardized conditions. This is made obvious by some of the conflicting results in the psycholinguistic literature. For instance, using Kurtzman & MacDonald-style disambiguation, Filik et al. (2004) found a significant effect of quantifier type (every vs. a) both offline and online, while Anderson (2004) detected no such influence. The discrepancy may have arisen from the different constructions and different tasks used in the two sets of experiments. Thus if we want to identify suitable methods to assess scope readings we must make the methods themselves the object of our study.
2. Pretesting the methods
For our methodological comparison we constructed 24 test items with a universal and an existential quantifier, taking care to keep them equally plausible under both scope interpretations. Our intuitions concerning plausibility were further confirmed in a norming study.4 To ensure a fair comparison, an effort was made to keep the test sentences (and the distractors) maximally similar across the candidate methods.

Testing of the methods took place in two steps. The purpose of the first stage was to make sure that the methods under investigation are suitable for scope disambiguation in the first place. We then compared the same methods on the basis of their performance on ambiguous quantified sentences. Apart from the different means of disambiguation, which constituted the point of comparison, the design and procedure used in the pretest were the same in all three methods. We first describe the general features, then turn to the particulars of the individual methods.
2.1. Materials and design

The aim of the pretest was to find out whether the chosen methods provide appropriate disambiguation. For this purpose we used constructions where the universal and existential quantifiers were placed in different clauses in a sentence, making it scopally unambiguous. The quantifiers appeared in two possible orders, as in (4):

(4) a. Für genau einen Professor gilt, dass jede Studentin ihn verehrt.
       ‘Exactly one professor is such that every student adores him.’
    b. Für jede Studentin gilt, dass sie genau einen Professor verehrt.
       ‘Every student is such that she adores exactly one professor.’
As the glosses indicate, in this construction the linear order of the quantifiers determines their scope order, due to the clause boundary between them. The 24 items all appeared with the two quantifier orders in (4). For each sentence we constructed two possible disambiguating contexts in each method under investigation, one for the ∀∃- and one for the ∃∀-interpretation. Each sentence version was paired with each context version, yielding a total of four conditions per item. Four counterbalanced lists were created, each containing six items per condition, in such a way that each item appeared only once on a list and, across lists, all items appeared in all four conditions.

Thirty-six distractor sentences were constructed as well to keep subjects from guessing the aim of the study, and to check whether they had followed the instructions. These sentences also included two quantifiers, representing a range of quantifier types, some of which were negated. A clause boundary between the two quantifiers made these sentences unambiguous as well. The distractors were paired with disambiguating contexts and were included in all lists.
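The counterbalancing described above corresponds to a standard Latin-square rotation. The sketch below shows one way such lists could be generated; it is an assumption about the general technique, not a record of how the lists were actually built, and the condition labels, function names and seed are hypothetical.

```python
import random

# Four conditions: sentence version crossed with context version
# (labels are hypothetical shorthand).
CONDITIONS = [
    ("sentence:exists>forall", "context:exists>forall"),
    ("sentence:exists>forall", "context:forall>exists"),
    ("sentence:forall>exists", "context:exists>forall"),
    ("sentence:forall>exists", "context:forall>exists"),
]


def build_lists(n_items, conditions):
    """Rotate conditions over items so that each list holds every item once
    and every item occurs in every condition across the lists."""
    n = len(conditions)
    return [
        {item: conditions[(item + shift) % n] for item in range(n_items)}
        for shift in range(n)
    ]


def build_questionnaire(assignment, n_fillers, seed):
    """Combine test items and fillers and shuffle them for one subject."""
    trials = [("item", i, cond) for i, cond in assignment.items()]
    trials += [("filler", j, None) for j in range(n_fillers)]
    random.Random(seed).shuffle(trials)  # individually randomized order
    return trials


lists = build_lists(24, CONDITIONS)
questionnaire = build_questionnaire(lists[0], n_fillers=36, seed=1)
print(len(lists), "lists;", len(questionnaire), "trials per questionnaire")
```

With 24 items and four conditions, the rotation places exactly six items in each condition on every list, which matches the design described in the text.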
2.2. Subjects and procedure

Subjects were recruited at the University of Tübingen and were tested individually. They received written instructions followed by the questionnaire containing the 24 test items and 36 fillers in an individually randomized order. They were asked to provide yes-no answers to the question “Does the sentence fit the context?”. A total of 68 subjects participated in the pretest (question-answer pairs: 24, set diagrams: 24, scenarios: 20). They were all native speakers of German and naive to the purpose of the study. No subject completed more than one questionnaire. Testing took approximately 20 minutes, and subjects were paid €5.
2.3. Question-answer pairs

In the first method we examined, preceding questions provided a disambiguating context for the target sentence. We selected this method as it is a fairly intuitive means of disambiguation, often employed by linguists to determine possible readings. A further advantage of using question-answer pairs is that only the linguistic modality is needed for making judgments, which may aid in keeping the task simple.
The questions contained a universal and an existential quantifier separated by a clause boundary, just like in the target sentences. This resulted in the quantifier in the matrix clause taking scope over the one in the embedded clause. Two types of contexts were created (for the ∀∃- and the ∃∀-reading) by placing either the universal or the existential quantifier in the matrix clause. An example is given in (5):

(5) a. Kann man von genau einem Professor sagen, dass jede Studentin ihn verehrt?
       ‘Can it be said of exactly one professor that every student adores him?’
    b. Kann man von jeder Studentin sagen, dass sie genau einen Professor verehrt?
       ‘Can it be said of every student that they adore exactly one professor?’
Each version of the context question was paired with each version of the target sentence, repeated in (6), preceded by Ja, stimmt “Yes, that’s right”.

(6) a. Für genau einen Professor gilt, dass jede Studentin ihn verehrt.
       ‘Exactly one professor is such that every student adores him.’
    b. Für jede Studentin gilt, dass sie genau einen Professor verehrt.
       ‘Every student is such that she adores exactly one professor.’
Subjects were asked to provide yes-no answers to the question “Does the sentence match the question?”. As the example shows, the pairings that were expected to be good fits given the scope relations ((5a–6a) and (5b–6b)) are also the ones where the order of the quantifiers in the question is the same as in the answer. This raises the concern that higher acceptance rates in the matching conditions might be due to a preference for word order parallelism, rather than to scope interpretation. To rule out this possibility, in the second test phase we tested similar question-answer pairs where the quantified expressions were replaced with definite NPs with diese/r “this”, as in (7) and (8), and found no evidence for a word order effect.

(7) a. Kann man von diesem Professor sagen, dass diese Studentin ihn verehrt?
    b. Kann man von dieser Studentin sagen, dass sie diesen Professor verehrt?
(8) a. Diesen Professor hat diese Studentin verehrt.
    b. Diese Studentin hat diesen Professor verehrt.

2.3.1. Results and discussion

The leftmost panel in Figure 1 shows the percentage of “yes” answers. The conditions where the context question and the target sentence allowed the same scope reading were overwhelmingly judged as “matching” (∃∀-sentences: 98.6%, ∀∃-sentences: 97.9%), whereas conditions where they were in conflict were for the most part judged non-matching (∃∀: 38.2%, ∀∃: 34.9%). Repeated measures ANOVAs5 revealed no main effects (all F < 1), but, as expected, a significant interaction was found between sentence and context (F1(1,23) = 95.98, p < 0.01; F2(1,23) = 267.539, p < 0.01).
Figure 1. % “yes” answers in the pretests: question-answer sequences (left), set diagrams (middle) and scenarios (right)
These results show that, overall, context questions can disambiguate quantifier scope in a target sentence quite well and are suitable for use with naive subjects. Still, the relatively high percentage of false “yes” answers for non-matching pairs raises the question of how reliable the judgments gathered by means of question-answer sequences really are. We will come back to this point in section 2.6.

2.4. Set diagrams

Another mode of disambiguation that psycholinguists have employed to study quantifier scope is set diagrams like those in (9) below (see Gillen 1991; Jackson & Lewis 2005). This method seems attractive since the diagrams are quite simple and easy to work with.
(9) a. Für genau einen Professor gilt, dass jede Studentin ihn verehrt.
       ‘Exactly one professor is such that every student adores him.’
    b. Für jede Studentin gilt, dass sie genau einen Professor verehrt.
       ‘Every student is such that she adores exactly one professor.’
Figure 2. Diagram for ∀∃-reading
Figure 3. Diagram for ∃∀-reading
In the ∀∃-diagram (Figure 2), each element of the “student” set depicted on the right 6 is connected to some element of the “professor” set (indicating the “adore” relation) – different “professors” connected to different “students”, consistent with the wide scope universal reading. In the ∃∀-diagram, on the other hand, there is one single professor that is admired by all students. To fully disambiguate the diagram another instance of the relation was included as well, one between a different professor and one of the students. In this form the diagram is no longer consistent with a ∀∃-reading where all students admire exactly one professor, who happens to be the same one in each case. This extra line apparently made the ∃∀-diagrams more complex, reflected in comments we received from the subjects. However, as the results below indicate, this did not have a negative effect on subjects’ performance.
2.4.1. Results and discussion

Results of the set diagram pretest are given in the middle panel in Figure 1. Again, diagram-sentence pairs indicating the same scope reading were rated as matching (∃∀-sentences: 95.1%, ∀∃-sentences: 88.9%), while conflicting conditions were overwhelmingly rated as non-matching (∃∀-sentences: 6.3%, ∀∃-sentences: 14.6%). ANOVAs revealed a significant interaction of sentence and context (F1(1,23) = 267.349; p < 0.01; F2(1,23) = 1103.530; p < 0.01), but no significant main effects.

Thus it appears that set diagrams are easily used by naive subjects and are capable of disambiguating the scope readings. Clear disambiguation of the ∃∀-reading is particularly important in light of the problems in psycholinguistic experiments discussed above. Further, the non-matching conditions were judged correctly just as often as their matching counterparts, in contrast to the question-answer pairs (see section 2.3).
2.5. Natural-looking scenarios

We also tested another kind of visual context, viz. severely simplified representations of two children playing with geometrical figures in distinct but partially overlapping “corners”, as in Figures 4–5 below. In this method the test sentences were also slightly modified to fit the scenarios:
(10) a. Für genau ein Dreieck gilt, dass jedes Kind es in seiner Ecke hat.
        ‘Exactly one triangle is such that every child has it in his corner.’
     b. Für jedes Kind gilt, dass es genau ein Dreieck in seiner Ecke hat.
        ‘Every child is such that he has exactly one triangle in his corner.’
Figure 4. Diagram for ∀∃-reading
Figure 5. Diagram for ∃∀-reading
One advantage of these scenarios is that they depict relatively natural situations. Moreover, the possibility of lexical content influencing plausibility is eliminated, since it is not any more likely that a child has a particular geometrical object than any other state of affairs.
2.5.1. Results and discussion

The distribution of “yes”-answers is displayed in the rightmost panel in Figure 1. Again, there is a clear difference in the percentage of “yes” answers between the matching (∃∀: 81.9%, ∀∃: 86.8%) and non-matching conditions (∃∀: 20.8%, ∀∃: 15.3%). In ANOVAs, this resulted in a significant interaction between sentence and context (F1(1,23) = 116.943; p < 0.01; F2(1,23) = 201.813; p < 0.01), while the main effects did not reach significance. This shows that the diagrams were only consistent with the reading they were intended to depict.

In sum, the pretests show that all three methods of disambiguation were successful in indicating the intended scope readings. However, a closer inspection of the results reveals sizeable differences among the candidate methods, especially in the case of false “yes” answers. This raises the question whether some of our methods may have performed “better” than the others.

2.6. Reliability of the three methods

Rating methods have often been considered unsuitable for scientific purposes. However, the application of test theoretic standards in psychology has shown that methods involving judgments do in fact produce highly consistent and valid results7. We treated our pretest results as a kind of psychological test in order to ask the following questions: to what extent are our methods able to provide consistent data? Are there any substantial differences between the methods in terms of consistency/reliability? To answer these questions we have to determine how consistent the participants were contingent on the applied method8. This can be done by computing the interrater reliability. Intraclass correlation coefficients (ICCs) are statistics for measuring homogeneity, not only for pairs of measurements but for larger sets of measurements as well (for an overview see McGraw & Wong 1996; Wirtz & Caspar 2002). Perfect agreement among participants would be reflected by an ICC of 1, while total disagreement would yield a coefficient of 0. Although there are no guidelines concerning the interpretation of ICCs (see the discussion in Wirtz & Caspar 2002), tests yielding values of 0.7 or greater are often considered to be sufficiently reliable.
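As an illustration of how such coefficients can be obtained, the sketch below computes absolute-agreement ICCs for single and average raters from the mean squares of a two-way layout, following the standard formulas ICC(A,1) and ICC(A,k) in McGraw & Wong (1996). It assumes that rows are the rated objects (for example, conditions) and columns are the raters (participants). The function name and the toy scores are hypothetical and are not the pretest data.

```python
def icc_absolute_agreement(scores):
    """Return (single-rater ICC, average-rater ICC) for absolute agreement.

    scores[i][j] is the score given to object i (row) by rater j (column).
    """
    n = len(scores)      # number of rated objects (rows)
    k = len(scores[0])   # number of raters (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares of the two-way decomposition without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average

# Hypothetical relative frequencies of "yes" answers:
# four conditions (rows) rated by three participants (columns).
toy = [
    [1.00, 0.83, 1.00],
    [0.17, 0.33, 0.00],
    [0.33, 0.17, 0.17],
    [1.00, 1.00, 0.83],
]
single, average = icc_absolute_agreement(toy)
print(f"single rater: {single:.2f}, average of raters: {average:.2f}")
```

The average-rater coefficient is never smaller than the single-rater one when agreement is positive, which is why pooling over many participants yields much higher values than the single-rater figure.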
In the pretests each participant saw six items in each condition according to a Latin square design. We computed the relative frequency of “yes”-answers for each subject in each condition (ranging from zero to six out of six answers). Using these we computed two-way mixed effects intraclass correlation coefficients (interaction absent) for each method based on absolute agreement among participants. For single raters we obtained ICCs of 0.69 for question-answer pairs, 0.85 for set diagrams and 0.78 for scenarios. Although question-answer pairs received the smallest ICC score, the numerical differences between the candidate methods were not significant. Furthermore, for average raters all three methods produced coefficients greater than 0.95, showing that experimental results obtained by any of our candidate methods and using a large sample of participants should yield highly consistent semantic judgment data.

3. Comparing the methods
All three candidate methods proved to be reliable when used with scope-unambiguous sentences. This qualifies all of them for the real test case. Are they also suited to detect subtle scope preferences in potentially ambiguous sentences? This is a question about the validity of the methods. We will investigate two aspects of validity. Construct validity indicates whether the results the methods produce correspond to the consensus in the theoretical literature on quantifier scope, whereas criterion-related validity shows how well the results mirror actual scope preferences in ambiguous sentences found in corpora. We will first introduce these two criteria, then discuss the validity of each of our methods.

3.1. Construct validity: hypotheses from the literature

Although some theories on quantifier scope state that a sentence containing two or more quantifiers has all combinatorily possible scope readings (for example May 1977, 1985), two factors have repeatedly been claimed to limit the scope potential of multiply quantified sentences. First, it is commonly assumed that a linear reading is preferred over an inverse reading9. Second, distributive quantifiers like each have a stronger tendency to take wide scope than non-distributive quantifiers like all, see for instance Beghelli & Stowell (2002). In (11) both of these features are manipulated.
(11) a. Genau einen dieser Professoren haben alle Studentinnen verehrt.
        exactly one of-these professors have all students adored
        ‘All students adored exactly one of these professors.’
     b. Genau einen dieser Professoren hat jede Studentin verehrt.
        exactly one of-these professors has every student adored
        ‘Every student adored exactly one of these professors.’
     c. Alle Studentinnen haben genau einen dieser Professoren verehrt.
        ‘All students adored exactly one of these professors.’
     d. Jede Studentin hat genau einen dieser Professoren verehrt.
        ‘Every student adored exactly one of these professors.’

Pafel (2005) develops a theory on quantifier scope in German with a broad empirical coverage. In this account scope preferences are computed on the basis of an additive linear model of a number of individually weighted scope-determining factors including linear order and distributivity. For doubly quantified sentences like the ones in (11) Pafel’s theory makes the predictions in Figure 6.10
Figure 6. Predictions based on Pafel (2005)
For every combination of the manipulated factors linear order and distributivity each quantifier received a scope value. The difference between the scope values of the quantifiers within a sentence reflects the scope preferences for that sentence: the ein … jede cases are predicted to be fairly scope ambiguous, while the jede … ein and the alle … ein cases should exhibit a preference for the wide scope universal reading, and the ein … alle cases should prefer a wide scope existential interpretation. Furthermore, the influence of the two factors linear precedence and distributivity is predicted to be independent, which should give rise to purely additive effects.
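The logic of these predictions can be made concrete with a toy additive model. The sketch below is only an illustration: the two unit weights are hypothetical and are not the weights of Pafel (2005), whose model includes further factors. It shows how independent contributions of linear order and distributivity yield the predicted ordering of the four conditions and a zero interaction between the two factors.

```python
# Hypothetical weights, chosen only for illustration.
W_ORDER = 1.0  # contribution for the quantifier that comes first
W_DISTR = 1.0  # contribution for a distributive quantifier (jede)

def universal_advantage(universal_first, distributive):
    """Scope value of the universal quantifier minus that of the existential.

    Positive values favour the wide scope universal reading, negative values
    the wide scope existential reading, and zero means full ambiguity.
    """
    universal = W_ORDER * universal_first + W_DISTR * distributive
    existential = W_ORDER * (not universal_first)
    return universal - existential

predictions = {
    "ein ... alle": universal_advantage(False, False),
    "ein ... jede": universal_advantage(False, True),
    "alle ... ein": universal_advantage(True, False),
    "jede ... ein": universal_advantage(True, True),
}

# In a purely additive model the effect of distributivity is the same in
# both orders, so the order x distributivity interaction contrast is zero.
interaction = (predictions["jede ... ein"] - predictions["alle ... ein"]) - (
    predictions["ein ... jede"] - predictions["ein ... alle"]
)

for condition, value in predictions.items():
    print(f"{condition}: {value:+.1f}")
print("interaction contrast:", interaction)
```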
3.2. Criterion-related validity: results from a corpus study

We conducted a corpus study on all public written corpora of German in the Cosmas Corpus. Our aim was to extract constructions that are maximally similar to the sentences in (11). We queried sentences without any embedding containing two quantifiers: ein as direct object and jede or alle as subject of a simple transitive verb, where one of the quantifiers occurred in sentence-initial position. The maximum distance between the quantifiers was four intervening words. To rule out any indefinite use of ein we excluded those occurrences where it could be phonetically reduced to n. Most importantly, we chose only sentences where the context clearly indicated the intended scope. The distribution of readings obtained in the corpus study is shown in Figure 7. Although sparseness of data in some of the conditions made a quantitative analysis impossible, the results of the corpus study fit quite nicely with the predictions of Pafel’s theory.
Figure 7. Distribution of scope readings: Corpus study
Constructions with a universal quantifier (=subject) preceding the existential quantifier (=direct object), which were predicted to preferentially have a wide scope universal reading, are in fact used mainly with that interpretation. This tendency is stronger for distributive jede than non-distributive alle which also had some wide scope existential uses. The six occurrences of ein … alle all received a wide scope existential interpretation. Only the results for the ein … jede-cases are somewhat surprising: although predicted to be fully scope-ambiguous, they were preferably used with a wide scope universal interpretation. Given the very few corpus instances of this construction this apparent difference cannot be interpreted.11 Taken together, the theoretical predictions and the results from the corpus study provide a consistent background against which we can validate the three disambiguation methods.
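The comparison reported in note 11 for the ein … jede cases (14 wide scope universal vs. 7 wide scope existential uses) can be reproduced with a one-sample t-test against a proportion of 0.5. The sketch below assumes that each corpus instance is coded as 1 (∀∃) or 0 (∃∀); this coding and the function name are assumptions made for illustration, although the resulting statistic agrees with the reported t(20) = 1.58.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, mu):
    """One-sample t statistic and degrees of freedom against the mean mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / sqrt(n))
    return t, n - 1

# Assumed coding: 1 = wide scope universal use, 0 = wide scope existential use.
readings = [1] * 14 + [0] * 7
t_value, df = one_sample_t(readings, 0.5)
print(f"t({df}) = {t_value:.2f}")  # prints t(20) = 1.58
```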
3.3. Design and procedure

As in the pretest, we made an effort to keep the materials and procedure maximally similar across the methods. The test sentences were modified versions of the items used in the pretest in that the two quantifiers were clause-mates, to make both scope readings possible. In addition to varying the linear order of the quantifiers, we also added the factor distributivity, yielding the four conditions given in (11) above12. They were paired with the same two versions of the disambiguating context that we had used in the pretest, yielding a total of eight conditions. Fillers from the pretest were modified the same way as the test items.

In this design, the different scope potentials of jede and alle predicted by Pafel (2005) should be reflected by an interaction between distributivity and context; thus for instance the ein … jede cases are expected to be more compatible with the ∀∃-interpretation, and less compatible with the ∃∀-interpretation, than the ein … alle cases. Similarly, the order of quantifiers is predicted to show its influence on scope in an interaction between order of quantifiers and context: the jede … ein order should be more compatible with the ∀∃-reading, and less compatible with the ∃∀-reading, than the ein … jede order. Finally, since in Pafel’s theory these effects are independent, there should be no three-way interaction of distributivity, order, and context (see also section 3.1 and Figure 6).

The simple yes-no answers we collected in the pretest are not sufficient to capture the subtle scope differences we expected in this part of the study. Therefore we decided to use magnitude estimation, where each test item is compared to a reference item (constant throughout the test). Subjects provide judgments on an individual open-ended scale, expressing as many distinctions among the items as they feel necessary (see Bard et al. (1996) for details of the method).
For all tests we will report z-transformed ratings, computed on the basis of all judgments for experimental items plus filler trials. Positive values indicate “yes”-judgments, negative values indicate “no”. The resulting scores were subjected to three-way repeated measures ANOVAs with order of quantifiers, distributivity (jede vs. alle) and disambiguating context (∃∀ vs. ∀∃) as within-factors. Subjects were tested individually. They first received detailed written instructions and then completed a short practice session to familiarize themselves with the magnitude estimation technique. The actual test session consisted of 60 items (24 test items and 36 distractors), and took approximately 30 minutes to complete.
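A z-transformation of this kind puts each subject's open-ended magnitude estimates on a common scale. The sketch below is a minimal illustration under stated assumptions: ratings are standardized within each subject over all of that subject's judgments (test items and fillers together), the population standard deviation is used, and no log transformation is applied beforehand. The text does not specify these details, and the function name and toy ratings are hypothetical.

```python
from statistics import mean, pstdev

def z_transform(ratings_by_subject):
    """Standardize each subject's ratings to mean 0 and standard deviation 1."""
    z_scores = {}
    for subject, ratings in ratings_by_subject.items():
        m = mean(ratings)
        sd = pstdev(ratings)  # assumption: population standard deviation
        z_scores[subject] = [(r - m) / sd for r in ratings]
    return z_scores

# Two hypothetical subjects using very different numerical scales.
raw = {"s01": [10, 20, 30], "s02": [1, 5, 9]}
z = z_transform(raw)
for subject, values in z.items():
    print(subject, [round(v, 2) for v in values])
```

After the transformation, the two toy subjects receive identical scores although their raw scales differ, which is what makes the ratings comparable across subjects in a joint analysis.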
Subjects were recruited in the same way as for the pretest. They were all native speakers of German and naive to the purpose of the study. A total of 96 subjects participated in the tests (question-answer pairs: 32, set diagrams: 24, scenarios: 40) for a payment of €5. No subject completed more than one questionnaire or had participated in any of the pretests.

3.4. Question-answer pairs

In this test one version of the scope-disambiguated questions in (5) was paired with one of the four item versions given in (11) above, preceded by Ja, stimmt “Yes, that’s right”. The distribution of readings in the question-answer test is shown in Figure 8. While the ein … jede-cases were considered scope ambiguous, the jede … ein- and alle … ein-cases preferably received a wide scope universal interpretation, and the ein … alle-cases a wide scope existential interpretation.
Figure 8. Question-answer pairs: Results
ANOVAs revealed significant interactions between order and context (F1(1,31) = 14.176; p < 0.01; F2(1,23) = 36.674; p < 0.01) and between distributivity and context (F1(1,31) = 10.787; p < 0.01; F2(1,23) = 12.032; p < 0.01). The former is due to a general preference for linear scope readings, the latter to distributive jede taking wide scope more easily than alle. Crucially, the three-way interaction between order, distributivity and context was far from significant (Fs < 1). No other effects were significant.

Thus question-answer pairs appear to be sensitive to subtle distinctions in scope preferences. Both manipulated factors showed an effect in the expected direction. Furthermore, the effects were purely additive, as predicted by Pafel’s theory. In terms of validity, question-answer pairs thus elicit good semantic judgements, at least as long as standards used in psycholinguistics (lots of items, randomized presentation, the use of distractors etc.) are adhered to.
3.5. Set diagrams

For this test the four sentence versions in (11) were paired with diagrams of the type shown in Figures 2–3 above. The distribution of readings is depicted in Figure 9. Again, ANOVAs revealed significant interactions between order and context (F1(1,55) = 20.705; p < 0.01; F2(1,23) = 41.729; p < 0.01) and between distributivity and context (F1(1,55) = 80.059; p < 0.01; F2(1,23) = 54.146; p < 0.01). These interactions are due to two independent factors: first, the leftmost quantifier receives wide scope and second, distributive quantifiers receive wide scope more easily than non-distributive quantifiers. There was only one other significant effect, the interaction between order and distributivity (F1(1,55) = 5.398; p < 0.05; F2(1,23) = 4.722; p < 0.05). This interaction results from the jede … ein-conditions receiving higher scores on average than all other constructions.

Thus this method also allowed us to capture fine distinctions concerning scope ambiguous sentences. Again, the results are fully consistent with our validation criteria.
Figure 9. Set diagrams: Results
3.6. Scenarios

The four sentence versions in (12) were paired with diagrams of the type given in Figures 4–5 above.
(12) a. Genau eines dieser Dreiecke haben alle Kinder in ihrer Ecke.
        exactly one of-these triangles have all children in their corners
        ‘All children have exactly one of these triangles in their corners.’
     b. Genau ein dieser Dreiecke hat jedes Kind in seiner Ecke.
        exactly one of-these triangles has every child in his corner
        ‘Every child has exactly one of these triangles in his corner.’
     c. Alle Kinder haben genau eines dieser Dreiecke in ihrer Ecke.
        ‘All children have exactly one of these triangles in their corners.’
     d. Jedes Kind hat genau ein dieser Dreiecke in seiner Ecke.
        ‘Every child has exactly one of these triangles in his corner.’

The pattern of results obtained with the scenarios was quite different from the previous ones: there was an across-the-board preference for the wide scope universal interpretation (see Figure 10). In ANOVAs, this preference was reflected by a significant main effect of disambiguation (F1(1,39) = 88.254; p < 0.01; F2(1,23) = 97.014; p < 0.01). No other effects reached significance.
Figure 10. Scenarios: Results
The overwhelming preference for the wide scope universal reading irrespective of the linguistic manipulation is quite puzzling, particularly in light of our pretest results. In the pretest, where we used disambiguated versions of the same sentences and exactly the same diagrams, both readings were equally available. Thus the method seems to show a different behavior when applied to scope ambiguous and to scope disambiguated sentences.

One possible explanation for this strange behavior may have to do with the relative semantic complexity of the two readings. Closer scrutiny reveals that the quantified expression genau ein “exactly one x” corresponds to not one, but two quantifiers: like definite NPs, it must be analyzed as ∃xφ(x) ∧ ∀y(φ(y) → y = x). Moreover, the same “exactly one x” expression is hidden inside the predicate “have something in one’s corner” as well, which can be paraphrased as “there is exactly one x (x = corner) belonging to y and y has something in x”. When the two “exactly one” quantifiers are adjacent as in (12c–d), it is possible to simplify them as “exactly one corner with one triangle”. The two scope versions of the sentence could thus be rendered roughly as “Every child is such that there is exactly one corner with one triangle such that it belongs to the child” (the ∀∃-reading) and “There is exactly one triangle such that for every child it is the case that there is exactly one corner belonging to that child such that the triangle is in that corner” (the ∃∀-reading). There is a clear intuitive difference in complexity between the two scope versions; moreover, this difference can be formalized in terms of the arithmetical hierarchy in recursion theory. It is thus possible that the wide scope existential reading was avoided by the subjects because it is considerably more difficult to process.13

Another possibility has to do with pragmatics. A sentence like “Every child has a triangle in his corner” is typically considered felicitous in a situation where each child is the exclusive owner of a triangle, but not if two children share one.
This clearly favors the ∀∃-diagrams as long as the corresponding sentences allow this interpretation. Thus ∀∃-diagrams are expected to receive high ratings even when paired with a sentence with the genau ein … alle order, where the wide scope existential reading is predicted to be the preferred, but not the only interpretation (recall that this was not the case in the ∃∀-conditions in the pretest). What happens when the same sentence appears together with a ∃∀-diagram? It is conceivable that subjects settle on one interpretation of the sentence first, without examining the scenario. The wide scope existential reading, bolstered by the semantic factors, is rejected in favor of the wide scope universal one, and the latter is the only reading available when the subject inspects the diagram. In this case we would expect the ∃∀-diagram to be rated low, which is exactly what we found.
4. General discussion

Our methodological comparison turned up two suitable methods for studying quantifier scope: question-answer pairs and set diagrams. They both provided valid and reliable data, although the former did not perform quite as well as the latter. By contrast, our third method did not live up to the expectations: the simple and attractive natural-looking scenarios had to be paired with sentences that are clearly not appropriate for testing naive speakers.

A linguistically relevant result of the present study is that it provides evidence for subtle distinctions in judgments across the different constructions we tested. This lends support to semantic theories that rely on the interaction of multiple factors to determine scope relations. The methods that did well in the comparison can also be used to assess the influence of other factors on quantifier scope (e.g. intonation). In addition, they can be applied to related phenomena such as the interaction of modals with negation or collective vs. distributive readings.

Moreover, the present comparison has implications for psycholinguistic research. First, our “winner” methods are suitable to determine offline preferences that are needed to interpret online processing results: they can replace those “disambiguating” contexts which, as we argued in section 1.3, fail to identify the reading subjects have in mind. Of course knowing what reading was eventually computed is crucial for the correct interpretation of processing data as well. The methods that passed our tests can also be employed as comprehension questions in online experiments to assess this reading.

However, the central contribution of the present study goes beyond possible applications of the question-answer pairs or set diagrams. More importantly, we have demonstrated the use of test-theoretical standards for finding sound methods to collect data in linguistic or psycholinguistic research.
Appendix

Below are the sentences used in the question-answer and set diagram tests. Each sentence began with Genau einen dieser and appeared in the four conditions given in (11) in the main text.

Genau einen dieser…
1. Gesetzentwürfe wollte jede Oppositionspartei blockieren.
2. Sachmittelanträge hat jede Expertengruppe vehement abgelehnt.
3. Verdächtigen konnte jede Person eindeutig identifizieren.
4. Zuschüsse hat jede Familie beim Finanzamt beantragt.
5. Professoren hat jede Studentin über die Maßen angehimmelt.
6. wilden Buben hat jede Erzieherin etwas zu hart bestraft.
7. Theologiedozenten hat jede Prüfungskandidatin sehr gefürchtet.
8. Fußballspieler hat jede Zuschauerin lautstark angefeuert.
9. Volkstänze hat jede Tanzschule ins Programm aufgenommen.
10. Verbrecher hat jede Regierung lange und intensiv gesucht.
11. Politiker hat jede Wählerin als üblen Sexist beschimpft.
12. Stoffe hat jedes Labor für total unbedenklich erklärt.
13. Rotweine hat jede Weinhandlung aus dem Angebot genommen.
14. Wachdienste hat jede Bau-Firma längerfristig engagiert.
15. Operntexte hat jede Sopranistin außerordentlich gemocht.
16. Arbeitsverträge hat jede Gesellschaft als Vorlage genommen.
17. epischen Texte hat jedes Jurymitglied überragend gefunden.
18. Naturbildbände hat jede Buchhändlerin überaus gern empfohlen.
19. Filmschauspieler hat jede Journalistin hartnäckig belagert.
20. Pilze hat jede Klasse überwiegend richtig bestimmt.
21. Riesenwasserfälle hat jede Reisegesellschaft eifrig fotografiert.
22. Therapieansätze hätte jede Psychotherapeutin gerne ausprobiert.
23. Comics hat jedes Kind geradezu gierig verschlungen.
24. Fremdsprachenkurse hat jede Krankenschwester billiger bekommen.
Notes

1. Strictly speaking, our discussion in this paper concerns subtypes of the questionnaire method used e.g. in psycholinguistic research. We will nevertheless refer to these subtypes as separate “methods”, in order to keep the presentation simple.
2. Or c-command, although we prefer the more theory-neutral linear order.
3. This becomes clear if (3a) is combined with disambiguated paraphrases of (2):
   (i) a. Every boy is such that he climbed a tree. The tree was full of apples.
       b. There was a tree that every boy climbed. The tree was full of apples.
4. All test sentences are given in the Appendix in the scope-ambiguous version, see section 3.3.
5. For all pretests we will report ANOVAs with two within factors: scope in the target sentence (∃∀ vs. ∀∃) and scope in disambiguating context (∃∀ vs. ∀∃).
6. The set denoted by the subject phrase always appeared on the right-hand side of the diagram.
7. Validity will be taken up in section 3.
8. In all pretests the ratings were very consistent across items, indicated by the items analyses. The variance across pretest results is thus mainly due to differences between subjects.
9. We call a reading linear if the order of quantifiers in a sentence corresponds to their scope, and inverse otherwise.
10. For our predictions we used a model without quantitative thresholds, unlike Pafel (2005). This was because in a whole series of experiments on quantifier scope we never found evidence for such thresholds.
11. Note that the actual distribution (with ∀∃:∃∀ = 14:7) does not differ significantly from an assumed perfectly ambiguous one (t(20) = 1.58; p = 0.13).
12. All items are given in the Appendix.
13. The prefixes for the two readings in prenex normal form are ∀x∃!y∃!z and ∃!y∀x∃!z, respectively. Since two adjacent quantifiers of the same type can be contracted into a new quantifier over pairs (see Odifreddi 1999), after permuting the universal and the existential quantifiers we get ∀x∃!y,z vs. ∃!y∀x∃!z, which means a difference of at least one quantifier on the arithmetical hierarchy.
References

Anderson, Catherine
2004 The structure and real-time comprehension of quantifier scope ambiguity. Ph.D. thesis, Northwestern University, Evanston, IL.
Bard, Ellen Gurman, Dan Robertson & Antonella Sorace
1996 Magnitude estimation of linguistic acceptability. Language 72: 32–68.
Beghelli, Filippo & Tim Stowell
2002 Distributivity and negation: The syntax of each and every. In Ways of Scope Taking, Anna Szabolcsi (ed.), 65–75. Dordrecht: Kluwer.
Filik, Ruth, Kevin B. Paterson & Simon P. Liversedge
2004 Processing doubly quantified sentences: Evidence from eye movements. Psychonomic Bulletin & Review 11 (5): 953–959.
Gillen, Kathryn
1991 The comprehension of doubly quantified sentences. Ph.D. thesis, University of Durham.
Higginbotham, James
1985 On semantics. Linguistic Inquiry 16: 547–593.
Hornstein, Norbert
1984 Logic as grammar. Cambridge, MA: MIT Press.
1995 Logical form. Oxford: Blackwell.
Jackson, Scott & Will Lewis
2005 The relation between prosody and logical scope varies by the operator. Poster presented at the CUNY Conference, Tucson, Arizona.
Kurtzman, Howard S. & Maryellen C. MacDonald
1993 Resolution of quantifier scope ambiguities. Cognition 8: 243–79.
May, Robert
1977 The grammar of quantification. Ph.D. thesis, MIT, Cambridge, MA.
1985 Logical form: Its structure and derivation. Cambridge, MA: MIT Press.
McGraw, Kenneth O. & Seok P. Wong
1996 Forming inferences on some intraclass correlation coefficients. Psychological Methods 1 (1): 30–46.
Odifreddi, Piergiorgio
1999 Classical recursion theory. Amsterdam: Elsevier.
Pafel, Jürgen
2005 Quantifier scope in German. An investigation into the relation between syntax and semantics. Amsterdam: Benjamins.
Reinhart, Tanya
1976 The syntactic domain of anaphora. Ph.D. thesis, MIT, Cambridge.
1983 Anaphora and semantic interpretation. London: Croom Helm.
Tunstall, Susanne L.
1998 The interpretation of quantifiers: Semantics and processing. Ph.D. thesis, University of Massachusetts, Amherst.
Wirtz, Markus & Franz Caspar
2002 Beurteilerübereinstimmung und Beurteilerreliabilität. Freiburg: Hogrefe.
Is syntactic knowledge probabilistic? Experiments with the English dative alternation

Joan Bresnan

Theoretical linguistics traditionally relies on linguistic intuitions such as grammaticality judgments for data. But the massive growth of language technologies has made the spontaneous use of language in natural settings a rich and easily accessible alternative source of data. Moreover, studies of usage as well as intuitive judgments have shown that linguistic intuitions of grammaticality are deeply flawed, because (1) they seriously underestimate the space of grammatical possibility by ignoring the effects of multiple conflicting formal, semantic, and contextual constraints, and (2) they may reflect probability instead of grammaticality. Both of these points are richly exemplified by studies of the English dative alternation (Green 1971; Gries 2003, 2005; Fellbaum 2005; Bresnan & Nikitina 2003; Bresnan, Cueni, Nikitina & Baayen 2007; Lapata 1999; Bresnan & Hay forthc.; Hay & Bresnan 2006), which is the linguistic domain of the present study.

The present study discusses two experiments following up Bresnan et al. (in press). The first indicates that the “soft” generalizations found in corpus studies of the dative alternation reappear in subjects’ intuitions of grammaticality in context, and that language users have substantial knowledge on the basis of these generalizations of what others are going to say (meaning here the choice of syntactic structure to convey the message). The second experiment shows that rare constructions that have been considered ungrammatical by many linguistic theorists are judged natural by speakers when the appropriate soft conditions are met. Intuitive contrasts in grammaticality that many linguists have reported seem to reflect probabilities rather than categorical constraints.

Background

The English dative alternation is illustrated in (1):
(1) a. Who gave you that wonderful watch? ← double object construction
    b. Who gave that wonderful watch to you? ← prepositional dative
Although alternative forms often have different meanings (Pinker 1989; Levin 1993; Rappaport-Hovav & Levin 2005), frequently explained in terms of “the principle of contrast” (E. Clark 1987), the alternatives in (1a,b) are very close paraphrases, and the flexibility afforded by their violation of the principle of contrast appears to have functional advantages in sentence production (V. Ferreira 1996). Moreover, subtle intuitions of fine-grained semantic differences between syntactic constructions have turned out in many cases to be inconsistent and unreliable (Fellbaum 2005; Bresnan & Nikitina 2003; Bresnan, Cueni, Nikitina & Baayen 2007; Bresnan 2007). We therefore view the prepositional dative and double object constructions as having overlapping meanings which permit them to be used as alternative expressions or paraphrases.

Previous studies have shown that the probability of using one or the other of these two alternatives – the double object construction or the prepositional dative – is associated with the verb and its semantic class (Lapata 1999; Gries 2005) and is respectively increased/decreased when the first phrase following the verb is a pronoun/lexical noun phrase, is definite/indefinite, refers to a highly accessible referent/a referent not previously mentioned, refers to a human/non-human, or is shorter/longer (Bock & Irwin 1980; Thompson 1990; Bock, Loebell & Morey 1992; Hawkins 1994; Collins 1995; Prat-Sala & Branigan 2000; Arnold et al. 2000; Snyder 2003; Wasow 2002; Gries 2003). From these and other variables such as the previous occurrence of a parallel structure (Bock 1986; Pickering, Branigan & McLean 2002; Gries 2005; Szmrecsányi 2005), it is possible to predict the dative alternation (that is, predict which alternative is used: (1a) or (1b)) in spoken English with 94% accuracy (Bresnan, Cueni, Nikitina & Baayen 2007). Bresnan et al. show that their model generalizes beyond the contingencies of the particular collection of telephone conversations that constitutes their spoken dative database and predicts statistical differences in a very different written corpus of edited reportage. The generalizability of the model raises the question of whether it represents some aspects of the implicit knowledge of English language users.

Experiment 1

If the probabilistic model of Bresnan et al. captures the implicit knowledge of English language users, then theoretically language users could predict the dative syntax choices that speakers make, as a function of the same
kinds of variables – just as the model does. Where the corpus model predicts high or low probabilities, subjects should also do so, and where the model predicts middle-range probabilities (underdetermining dative syntax choices), subjects should do so as well.

Figure 1. Sample probabilities from the corpus model of Bresnan et al. (Plot title: Sample Model Probabilities for Dative PP (1) vs. NP (0); x axis: Index.)

Figure 1 shows the model probabilities of a prepositional dative construction for a random sample of one hundred observations of the alternating verbs from the Bresnan et al. spoken corpus dataset of 2360 observations. The data points at the top of the vertical y axis scale have probabilities near 1 of being a prepositional construction, those at the bottom have probabilities near 0. In this model of the binary choice between the two alternative
dative paraphrases illustrated in (1a,b), low probability of being a prepositional dative construction is equivalent to high probability of being a double object construction, so the points at the bottom are almost always realized in the double object construction. The prevalence of data points near the zero end of the scale (the bottom of the y axis) reflects the overall skewedness of the data toward double object constructions, which constitute 79% of the total observations. The data points in the middle of the y axis scale are cases where both of the alternative constructions have substantial probability – 50/50, 60/40, and the like.

Hypothesis

The specific hypothesis investigated in Experiment 1 is this: given the same multivariable information as the corpus model, including contextual information from the original dialogues, subjects will make ratings of alternative dative constructions like (1a,b) that correspond to the corpus model probabilities.

Method

The task was inspired by Rosenbach’s (2003) experiment on the genitive alternation, which required subjects to choose between alternative possessive constructions as continuations of edited passages excerpted from a novel. The present experiment introduces several differences in method. First, the items are built from randomly sampled transcriptions of spoken dialogue passages, rather than selected literary passages in accordance with a factorial design. Second, subjects are given a scalar instead of a binary rating task. And third, subjects’ responses are analyzed as a function of the original corpus model predictor variables by using mixed effects regression (Pinheiro & Bates 2000; Bates & Sarkar 2006). This type of regression can model the responses as a function of the linguistic predictors while simultaneously taking into account the clusters of data caused by multiple observations from both of the randomly sampled elements – the experimental subjects and the dative verbs.

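The kind of mixed effects regression described here can be written out schematically. The formula below is only a sketch in notation introduced for illustration (it is not taken from the original analysis), and it assumes that the clustering by subject and by verb is handled with simple random intercepts:

```latex
% Schematic linear mixed effects model with crossed random intercepts.
% All symbols are illustrative assumptions, not the author's notation.
\[
  y_{ij} \;=\; \beta_0 + \sum_{k} \beta_k\, x_{jk} \;+\; s_i \;+\; v_{c(j)} \;+\; \varepsilon_{ij},
  \qquad
  s_i \sim \mathcal{N}(0, \sigma_s^2),\quad
  v_c \sim \mathcal{N}(0, \sigma_v^2),\quad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{ij}$ stands for the score that subject $i$ gives to item $j$, the $x_{jk}$ are the linguistic predictors of that item (the fixed effects), $s_i$ is the adjustment for subject $i$, and $v_{c(j)}$ is the adjustment for the verb that occurs in item $j$.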
The experimental items were chosen by randomly sampling observations in the dative corpus data from the centers of five equal probability bins defined by the corpus model, ranging from very low probability of being a prepositional dative to very high. Potentially ambiguous items were excluded. The item probabilities are shown in Figure 2. For each sampled observation an alternative paraphrase was constructed, and both were presented as choices in the original dialogue context, which was edited for readability by shortening and by removing disfluencies. Items were pseudo-randomized and construction choices were alternated to make up a questionnaire.

The subjects were nineteen paid Stanford undergraduates of both genders who reported that they were monolingual and had not taken a syntax course. Each subject received the same questionnaire, with the same order of items and construction choices. Figure 3 displays a sample item. Subjects were asked to rate the naturalness of the alternatives in the given context by distributing 100 rating points over the two alternatives in accordance with their own intuitions. Any pair of scores summing to 100 was permitted, including 0–100, 63–37, 50–50, etc.

Figure 2. Probability bins of items for Experiment 1 (Axis labels: Corpus Modelled Probabilities; Sampled Constructions for Experiment 1.)
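The item selection and the rating constraint can be illustrated with a short Python sketch. It is not the procedure actually used: the equal-width reading of "equal probability bins", the tolerance `half_width` around each bin center, and all function names are assumptions made for illustration.

```python
import random

N_BINS = 5  # five probability bins, assumed here to be of equal width


def bin_index(p, n_bins=N_BINS):
    """Return the 0-based bin of a model probability p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # p == 1.0 is placed in the top bin rather than in a sixth bin
    return min(int(p * n_bins), n_bins - 1)


def bin_center(i, n_bins=N_BINS):
    """Center of bin i, e.g. 0.1, 0.3, 0.5, 0.7, 0.9 for five bins."""
    return (i + 0.5) / n_bins


def sample_items(probs, per_bin, half_width=0.05, seed=0):
    """Randomly pick up to per_bin observation indices near each bin center.

    half_width is a hypothetical tolerance around the center.
    """
    rng = random.Random(seed)
    chosen = []
    for b in range(N_BINS):
        center = bin_center(b)
        pool = [i for i, p in enumerate(probs) if abs(p - center) <= half_width]
        chosen.extend(rng.sample(pool, min(per_bin, len(pool))))
    return chosen


def valid_rating(score_a, score_b):
    """A rating is any pair of non-negative scores summing to 100."""
    return score_a >= 0 and score_b >= 0 and score_a + score_b == 100
```

For example, `valid_rating(63, 37)` accepts one of the score pairs mentioned in the text, while a pair such as 60 and 30 would be rejected.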
Speaker: About twenty-five, twenty-six years ago, my brother-in-law showed up in my front yard pulling a trailer. And in this trailer he had a pony, which I didn’t know he was bringing. And so over the weekend I had to go out and find some wood and put up some kind of a structure to house that pony,

(1) because he brought the pony to my children.
(2) because he brought my children the pony.

Figure 3. Sample item for Experiment 1
Results

Plots of the data suggest that subjects’ scores of the naturalness of the alternative syntactic paraphrases correlate with the corpus model probabilities. Figure 4 shows the mean subject scores for each item plotted against the corpus model probability of the item. The line is a nonparametric smoother which indicates the trend of the data by averaging local values; it shows a roughly linear correspondence between the corpus model probabilities and the mean item scores. Note that the items in the middle probability bins overlap far more in average ratings than those in the extreme bins, indicating that average subjects’ scores are most indecisive where the corpus model is least accurate.

In Figure 5 each panel shows a single subject’s mean scores for the items in each corpus probability bin. (The subject numbers are not contiguous because data from seven of the twenty-six subjects who completed the questionnaire were excluded: they reported that they were either bilingual or had taken a syntax class.) All of the subjects’ mean ratings of items from the lowest probability bin are below their mean ratings of items from the highest probability bin. The ratings of items from the middle bins tend to fall in the middle of each subject’s rating range, though their relative rankings vary quite a bit across subjects, as expected from the original corpus model probabilities (Figure 1).
Figure 4. Mean Experiment 1 Scores of Items by Probability Bin (Plot title: Items: Mean Scores by Probability.)

The results were analyzed using a linear mixed effects regression model (Pinheiro & Bates 2000; Baayen 2004; Bates & Sarkar 2006), which fit the scores using adjustments for both subject and verb sense as random effects and adjustments for fixed effects conditioned on the random effects.¹ The fixed effects were taken from the original corpus model of Bresnan et al. together with the order of items, the order of construction choice and the lemma frequency of the verbs according to the CELEX database (Baayen, Piepenbrock & Gulikers 1995). The last three effects were eliminated from the model because their coefficients were less than their standard errors (Chatterjee, Hadi & Price 2000: 286–288). Table 1 shows that the 95% confidence intervals of all remaining factors except for givenness of recipient exclude 0, indicating a significant effect on the response.
Figure 5. Mean Experiment 1 Scores by Probability Bin for Each Subject (Plot title: Subjects: Mean Scores by Probability Bin.)

Table 1. Model Coefficients for Experiment 1

                                                      95% Conf. Limits
Fixed effects:                            Estimate    Lower     Upper
(Intercept)                                 73.19     45.70    102.22
pronominality of theme = pronoun            16.91     10.48     23.28
definiteness of theme = indefinite         –12.48    –17.57     –7.39
givenness of theme = non-given             –14.77    –19.62     –9.92
pronominality of recipient = pronoun       –22.47    –33.25    –11.85
definiteness of recipient = indefinite      14.13      5.58     22.98
givenness of recipient = non-given          –9.00    –19.43      1.42
animacy of recipient = inanimate           –29.48    –43.75    –15.66
parallelism of PP                           16.70      8.73     24.67
argument length difference (log scale)      –4.77     –9.37     –0.12

Number of observations: 570, groups: subject 19; verb sense 11.
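The significance criterion applied to Table 1 (a 95% confidence interval that excludes 0) can be checked mechanically. The following Python sketch simply re-encodes the table values; the function names are hypothetical, and reading a positive coefficient as favoring the prepositional dative is an interpretation chosen to match the discussion of the coefficients in the text.

```python
# (estimate, lower, upper) copied from Table 1
TABLE_1 = {
    "(Intercept)": (73.19, 45.70, 102.22),
    "pronominality of theme = pronoun": (16.91, 10.48, 23.28),
    "definiteness of theme = indefinite": (-12.48, -17.57, -7.39),
    "givenness of theme = non-given": (-14.77, -19.62, -9.92),
    "pronominality of recipient = pronoun": (-22.47, -33.25, -11.85),
    "definiteness of recipient = indefinite": (14.13, 5.58, 22.98),
    "givenness of recipient = non-given": (-9.00, -19.43, 1.42),
    "animacy of recipient = inanimate": (-29.48, -43.75, -15.66),
    "parallelism of PP": (16.70, 8.73, 24.67),
    "argument length difference (log scale)": (-4.77, -9.37, -0.12),
}


def excludes_zero(lower, upper):
    """True if the confidence interval lies entirely on one side of 0."""
    return lower > 0 or upper < 0


def favored_construction(estimate):
    """Assumed reading: positive estimates favor V NP PP, negative V NP NP."""
    return "V NP PP" if estimate > 0 else "V NP NP"


# Predictors whose interval includes 0 (no significant effect on this criterion)
not_significant = [
    name for name, (_, lower, upper) in TABLE_1.items()
    if not excludes_zero(lower, upper)
]
```

On these values, `not_significant` contains only the givenness of the recipient, in line with the statement about Table 1.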
By examining the model coefficients in Table 1 we can interpret the results. The coefficients show the magnitudes and directions of the effects: these are consistent with the harmonic alignment effects in the original corpus model (Bresnan et al. 2007), which have been observed in many previous corpus studies (Thompson 1990; Collins 1995): nongiven or indefinite themes and pronominal recipients favor V NP NP, pronominal themes and indefinite recipients favor V NP PP. Contrary to Bresnan et al.’s model, inanimate recipients favor V NP NP, but there are only two such items in the sample used in Experiment 1 and both occur with abstract senses of verbs, which strongly favor the double object construction (Bresnan & Nikitina 2003).
Figure 6. Fit of linear mixed effects model to Experiment 1 scores (Plot title: Scores as a Function of Model Linguistic Predictors; x axis: Fitted; y axis: Observed.)

Finally, the fit of the experimental model is displayed in Figure 6, a trellis graph with nonparametric smoothing lines to facilitate visualization of the data (Cleveland 1979). Each panel of the trellis plot is a scatterplot of the data from a single subject, showing all thirty scores (represented on the y axis) plotted against the fitted model values (represented on the x axis). A roughly linear relation appears in each panel, indicating a good fit of the model variables to the score data.

These results show that subjects’ scores of the naturalness of the alternative syntactic structures correlate very well with the corpus model probabilities and can be substantially explained as a function of the same predictors as the original corpus model. In fact, as shown in Table 2, the subjects’ preferred choices, which were made according to their own intuitions, reliably tended to pick out the same choices made by the original dialogue participants in the corpus transcriptions. If they had invariably preferred the double object construction in every item, their responses would have matched 57% of the original speakers’ choices; this is the baseline in Table 2. In actuality, their responses matched the original choices well over the baseline. Their ratings are thus good predictors of what the speakers would say.

Table 2. Proportions of subjects’ ratings favoring actual corpus choices

0.63  0.80  0.73  0.80  0.73
0.83  0.80  0.83  0.77  0.87
0.80  0.67  0.80  0.77  0.67
0.70  0.77  0.77  0.73

Baseline = 0.57
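The comparison with the baseline can be made explicit with a few lines of Python. The proportions are copied from Table 2; how a subject's "preferred choice" is derived from a rating pair is not spelled out in the text, so the rule in `match_proportion` (more than 50 points on the alternative the original speaker used, with ties counted as non-matches) is an assumption.

```python
# Per-subject proportions copied from Table 2 (nineteen subjects)
PROPORTIONS = [
    0.63, 0.80, 0.73, 0.80, 0.73,
    0.83, 0.80, 0.83, 0.77, 0.87,
    0.80, 0.67, 0.80, 0.77, 0.67,
    0.70, 0.77, 0.77, 0.73,
]
BASELINE = 0.57  # always preferring the double object construction

# Number of subjects whose proportion exceeds the baseline
subjects_above_baseline = sum(p > BASELINE for p in PROPORTIONS)
mean_proportion = sum(PROPORTIONS) / len(PROPORTIONS)


def match_proportion(points_for_corpus_choice):
    """Share of items where the subject gave the corpus choice over 50 points.

    Assumed rule: a 50-50 rating does not count as favoring the corpus choice.
    """
    matches = sum(points > 50 for points in points_for_corpus_choice)
    return matches / len(points_for_corpus_choice)
```

Every value in the table lies above 0.57, so `subjects_above_baseline` equals the number of subjects.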
Experiment 2

Experiment 1 suggests that language users’ implicit knowledge of the dative alternation in context reflects the usage probabilities of the construction. In Experiment 2 we ask whether linguistic manipulations that raise or lower probabilities influence grammaticality judgments. Mismatches between grammaticality judgments reported by linguists and the actual language use of speakers and writers are surprisingly common, particularly in areas of theoretical syntax and semantics where subtle contrasts are invoked. A variety of cases are discussed in Bresnan (2007). The English dative alternation provides one such case, illustrated in (2) and (3),
where the double object constructions reported by linguists to be ungrammatical with verbs like drag and whisper are found in actual usage (Bresnan & Nikitina 2003; Bresnan et al. 2007). (In example (2a), Sumomo is the name of a small robot servant.)

(2) a. …while Sumomo dragged him a can of beer.    ← attested example
    b. *I dragged John the box.                    ← reported grammaticality judgment

(3) a. She came back and whispered me the price.   ← attested example
    b. *Susan whispered Rachel the news.           ← reported grammaticality judgment

Although we lack specific probability estimates for all of the relevant verbs, we know that differing alternation classes of dative verbs correspond to differing frequencies of use in internet samples (Lapata 1999), and that different argument types are more likely to occur in different syntactic positions following dative verbs (Thompson 1990; Collins 1995; Bresnan et al. 2007). In particular, double object constructions in which a pronoun precedes a lexical NP are far more frequent than those in which two lexical NPs occur, as shown in Table 3, and it is in the more frequent contexts that reportedly non-alternating dative verbs can most readily be found in actual use.

Table 3. Frequency of Dative Double Object Constructions in SWITCHBOARD

V […Pronoun…] NP    1530
V […Noun…] NP        178

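To make the size of the asymmetry in Table 3 concrete, the two counts can be turned into a proportion and a ratio. This is a simple worked calculation on the table values, not a figure reported in the study:

```latex
\[
  \frac{1530}{1530 + 178} \;=\; \frac{1530}{1708} \;\approx\; 0.90,
  \qquad
  \frac{1530}{178} \;\approx\; 8.6
\]
```

That is, roughly nine out of ten double object constructions in the table have a pronoun as the first object, and this type is between eight and nine times as frequent as the type with two lexical NPs.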
Thus drag and whisper are reported to be ungrammatical in the double object construction, but Google queries yield examples in the more frequent construction types (2a) and (3a), along with dragged the body to the king and whisper the password to the fat lady. The reportedly ungrammatical examples constructed by linguists, as in (2b) and (3b), tend to utilize the far less frequent positionings of argument types, like drag the king the body and whisper the fat lady the answer.

Hypothesis

Subjects’ ratings of the reportedly ungrammatical dative constructions will indicate grammaticality when the probability of the syntactic context is higher.
Method

Experiment 2 used the same task as Experiment 1. Six alternating and eight reportedly non-alternating verbs were sampled from the internet. There were three alternating verbs of communication by instrument (‘a_cm’) – phone, text, IM – and three alternating verbs of instantaneous transfer (‘a_tr’) – flip, throw, toss. There were four reportedly non-alternating verbs of manner of communication (‘n_cm’) – whisper, mutter, mumble, yell – and four reportedly non-alternating verbs of continuous transfer (‘n_tr’) – carry, push, drag, lower (Pinker 1989). The verb types are summarized in Table 4. All of the verbs were sampled in the construction types found to be most frequent in corpus studies – the double object construction with pronoun recipient preceding lexical NP theme or the dative construction with a lexical NP as prepositional object.

Table 4. Verbs used in Experiment 2

Communication                       Transfer
Alternating    Non-Alternating      Alternating    Non-Alternating
‘a_cm’         ‘n_cm’               ‘a_tr’         ‘n_tr’
phone          whisper              flip           carry
text           mutter               throw          push
IM             mumble               toss           drag
               yell                                lower

Table 5. Contexts for each verb

V […Pronoun…] NP       (sampled)
V NP to […Pronoun…]    (constructed)
V NP to […Noun…]       (sampled)
V […Noun…] NP          (constructed)

Money in the pot is dead money. It does not belong to anyone until the hand is over

(1) and the dealer pushes someone the pot.
(2) and the dealer pushes the pot to someone.

Figure 7. Sample item for Experiment 2
With each sample the preceding context was obtained for discourse cohesion and the presence of any parallel structures, which are known to influence syntactic choices (see Szmrecsányi 2005 for references). An alternative to each sampled sentence was created in the opposite construction type. For example, for the sample sentence containing whisper me the price the alternative whisper the price to me was created; and for a sample containing whisper the password to the fat lady, an alternative whisper the fat lady the password was created. Similarly, for the sample sentence containing toss the ball to Worthy, the alternative toss Worthy the ball was constructed; and for toss me the socks, toss the socks to me was constructed. Thus, each verb in each semantic class occurred in the four conditions shown in Table 5. (The data also included two instances of someone sampled in the prepositional dative construction and one instance of someone sampled in the double object construction.)

The same method of creating a questionnaire was used as in Experiment 1. Figure 7 displays a sample item for the reportedly non-alternating verb push. The subjects were twenty paid Stanford undergraduates of both genders who reported that they were monolingual and had not taken a syntax course. They were given the same forced-choice scalar rating task as in Experiment 1.

Results

A plot of the data is given in Figure 8. In this and subsequent plots the high end of the vertical axis now corresponds to the top rating for the double object construction, because this is precisely the construction at issue: it is found in actual usage but judged ungrammatical by linguists. Figure 8 shows that the ranges of subjects’ mean scores of the double object constructions appear to differ by both semantic class of the verb and pronominality of the recipient.
The columns represent the verb classes shown in Table 4: in each panel, the first and third classes are alternating (‘a_cm’, ‘a_tr’), while the second and fourth are reportedly non-alternating (‘n_cm’ and ‘n_tr’). The black dots designate the middles of the ranges of mean scores in each verb class, the boxes are the interquartile ranges, and circled points falling outside of the dashed lines are potential outliers. The panel labeled ‘V […Noun…] NP’ on the left represents the less frequent type in which both objects are lexical NPs; the panel labeled ‘V […Pron…] NP’ on the right represents the very frequent type in which a pronoun object precedes a lexical NP object.
Figure 8. Ranges of subjects’ mean scores for double object constructions by semantic class of verb and pronominality of recipient (x axis: Verb Alternation Class; y axis: Score.)

Looking within each panel of Figure 8, we see that given a particular structure type, V […Pronoun…] NP or V […Noun…] NP, the median scores (black dots) for the alternating verbs appear higher than those for the nonalternating verbs. Looking across the two panels, we see that all the median scores appear higher for double object constructions of the more frequent argument type (V […Pronoun…] NP) than for the less frequent type (V […Noun…] NP), regardless of verb class. Strikingly, the median scores (black dots) for the reportedly non-alternating verb classes in the V […Pronoun…] NP structure appear as high as or higher than those for the alternating verb classes in the V […Noun…] NP structure. This means that the reportedly ungrammatical verb classes appear to be rated as highly in the frequent context as the grammatical verb classes in the infrequent context. (The latter are supposed to be fully grammatical
by definition as alternating verbs.) In other words, the relatively frequent argument types seem to override and reverse linguists’ reported classifications of relative grammaticality.

To analyze the significance of the results, a linear mixed effects regression model was fit with both verb and subject as random effects and with the fixed effects of pronominality of recipient, semantic class, and item order. An interaction between the random effect of verb and pronominality of recipient was also included to take account of possible individual differences between verbs in their selectivity for the recipient type (pronoun or lexical noun head) – whether by prosodic, stylistic, or other differences. Such a term allows for variable adjustments to the verb estimates for both recipient types and it significantly improved the overall log-likelihood of the model, Pr(>Chisq) = 3.358e-06. Construction order and verb lemma frequency were not significant and were dropped from the final model because their coefficients were less than their standard errors.

As seen in Tables 3 and 5, the least frequent syntactic contexts for dative verbs – prepositional dative pronouns and lexical noun objects – were constructed, because they were non-occurring in the usage samples for the non-alternating double object verbs. This introduces a possible confound between the syntactic context types and the naturalness of the discourse passage. To measure the influence of the specific context on the choice of syntactic construction, all of the stimuli were annotated for discourse givenness of recipient and theme and the presence of a parallel construction – double object or prepositional dative – in the preceding context. Then these four factors were tested in the model: givenness of recipient and theme in the discourse context and the existence of a prior parallel double object or dative prepositional construction. None of the four was significant: all had coefficients less than their standard errors, and they were dropped from the final model.² All of the recipients were animate and all of the themes inanimate, so these factors were not included in any of the models.

Table 6 shows the 95% confidence intervals of the remaining variables of semantic class, pronominality of recipient, and item order. The intercept is the estimate for the nonalternating communication semantic class (‘n_cm’) of verbs (whisper, mutter, mumble, yell) with lexical noun recipients; these constitute the reference set against which the other predictor values are contrasted. These verbs are also by intuitive judgments the lowest-rated class of verbs in the double object construction, as we see from Figure 8: the top of the interquartile range of mean scores (represented by the vertical rectangle) is lower than all the others.
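The model comparison reported above as Pr(>Chisq) is a likelihood-ratio test. As a rough illustration of how such a p-value arises, the Python sketch below computes the test statistic and its chi-square tail probability. It rests on assumptions not stated in the text: that the added term contributes two parameters (so that the reference distribution has 2 degrees of freedom, for which the tail probability has the closed form exp(-x/2)), and the log-likelihood values in the usage note are invented placeholders.

```python
import math


def lrt_df2(loglik_reduced, loglik_full):
    """Likelihood-ratio test assuming 2 extra parameters in the full model.

    Returns (statistic, p_value). For a chi-square distribution with
    2 degrees of freedom the upper tail probability is exp(-x / 2).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the nested model")
    return statistic, math.exp(-statistic / 2.0)
```

With hypothetical log-likelihoods, `lrt_df2(-105.0, -100.0)` gives a statistic of 10.0 and a p-value of exp(-5), about 0.0067; a smaller p-value, as in the study, corresponds to a larger gain in log-likelihood.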
Table 6. Model Coefficients for Experiment 2

                                                      95% Conf. Limits
Fixed effects:                            Estimate    Lower     Upper
(Intercept)                                 14.50      4.65     24.45
semantic class=n_tr                          6.93     –5.64     14.62
semantic class=a_tr                         16.86      1.76     25.47
semantic class=a_cm                         11.84      0.46     22.24
pronominality of recipient=pronoun          13.89      4.65     22.58
item order                                   0.42      0.13      0.85

Number of observations: 600, groups: subject 20; verb 14.
We can interpret Table 6 as follows. Both the pronominality of recipient and item order coefficients are positive and both of their 95% confidence intervals exclude 0, indicating that they significantly improve ratings above the intercept reference values. All three semantic class coefficients shown are also positive, increasing the rating level from that of the intercept (which is the nonalternating communication semantic class in the least frequent context of the double object construction, that of the noun recipients). Because its confidence interval includes 0, the coefficient of the other supposedly nonalternating transfer class (the verbs carry, push, drag, lower) does not differ significantly from the intercept – which is not surprising, since both nonalternating classes are rated lowest in double object constructions. The other two semantic classes contrast significantly with the reference class: their coefficients (11.84 and 16.86) indicate a positive increase in rating. However, the coefficient of pronominality of recipient (13.89) is even greater than one of these semantic class coefficients and near the center of the confidence interval of the other, meaning that in the pronoun recipient condition, the scores of the nonalternating classes do not differ significantly from those of alternating classes in the noun recipient condition. This provides confirmation of our observation in Figure 8 that the reportedly ungrammatical verb classes appear to be rated as highly in the more frequent context (the pronoun recipient condition) as the theoretically grammatical verb classes in the infrequent context (the noun recipient condition). Thus generalizations observed in Figure 8 are significant after adjusting for the experimental subject, verb, item order, and the interaction of individual verbs with pronominality of recipient.
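The comparison drawn from Table 6 can be followed by adding up the relevant coefficients. The Python sketch below computes the fixed-effect part of the fitted score for a verb class and recipient type; it deliberately leaves out item order and the random adjustments for subject and verb, so the numbers are illustrative sums of table values rather than model predictions reported in the study, and the function name is hypothetical.

```python
# Estimates copied from Table 6; 'n_cm' with a noun recipient is the reference
INTERCEPT = 14.50
SEMANTIC_CLASS = {"n_cm": 0.0, "n_tr": 6.93, "a_tr": 16.86, "a_cm": 11.84}
PRONOUN_RECIPIENT = 13.89


def fixed_effect_score(verb_class, pronoun_recipient):
    """Sum of fixed effects for one condition (item order and random effects ignored)."""
    score = INTERCEPT + SEMANTIC_CLASS[verb_class]
    if pronoun_recipient:
        score += PRONOUN_RECIPIENT
    return round(score, 2)


# Reportedly non-alternating class in the frequent (pronoun) context ...
nonalternating_pronoun = fixed_effect_score("n_cm", True)
# ... versus alternating classes in the infrequent (noun) context
alternating_noun = {c: fixed_effect_score(c, False) for c in ("a_cm", "a_tr")}
```

On these sums the non-alternating communication verbs with a pronoun recipient (28.39) fall between the two alternating classes with a noun recipient (26.34 and 31.36), which is the pattern the paragraph above describes.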
Discussion

As observed in Bresnan (2007), experimental work on grammaticality judgments has been advanced by improved techniques for eliciting judgments (Schütze 1996; Cowart 1997; Bard, Robertson & Sorace 1996), but the constructed sentences used in many controlled psycholinguistic experiments are often highly artificial, isolated from connected discourse and subject to assumptions about default referents (Roland & Jurafsky 2002). Contextual information about referents should not be ignored because it influences syntactic preferences in production and comprehension (Bock 1977, 1996; Bock, Loebell & Morey 1992; Bock & Warren 1985; Kelly, Bock & Keil 1986; Prat-Sala & Branigan 2000; Thompson 1990; Collins 1995; Ferreira 1996; Rosenbach 2003, 2005; Bresnan et al. 2007). Accordingly, the experimental items of the present study are built from samples of actual usage of syntactic structures in their natural contexts. Modern statistical models provide controls (Baayen 2004).

This approach has two benefits in addition to the provision of essential contextual information. In the first experiment a statistical model of the usage data from the corpus study (Bresnan et al. 2007) is used to measure subjects’ predictive capacities. In the second experiment subjects’ judgments are used to test and validate usage data drawn from the internet. In this way convergent corpus and experimental methods are brought to bear on ecologically natural linguistic materials.

What can be learned from studying this natural usage data? From Experiment 2 we see that linguistic manipulations that raise or lower probabilities influence grammaticality judgments, which have traditionally been the primary and privileged data for categorical grammatical models.
The experiment points to ways of establishing sounder empirical foundations for syntactic and semantic theory and suggests why the older ways of doing syntax – by generalizing from linguistic intuitions about decontextualized constructions and ignoring research on actual usage, especially quantitative corpus work – produce unreliable and inconsistent findings. From Experiment 1 we see that language users’ implicit knowledge of their language is more powerful than has been recognized under the idealizations of categorical models of grammaticality: language users can in effect make accurate probabilistic predictions of the syntactic choices of others.³

The present study is the first to our knowledge to measure the predictive capacities of language users in a syntactic domain by means of a sophisticated statistical model of usage data. This approach opens up a variety of questions for further research, with potential applications in many areas of linguistics and the cognitive sciences more generally.
Acknowledgements

Thanks to Daniel Casasanto, Jeff Elman, Marilyn Ford, Florian Jaeger, Steve Pinker, Anette Rosenbach, and Shravan Vasishth for helpful discussion and comments. None of them can be blamed for the use I have made of their advice. Thanks also to Nick Romero for creative internet sampling and research assistance. Graphics and models were made using R (R Development Core Team 2006, Bates & Sarkar 2006).

Notes

1. The use of verb senses follows Bresnan et al. (2007). Up to five possible senses of any verb were distinguished based on broad semantic classes of their uses in context. For example, the ‘transfer’ sense of give in give an armband is distinguished from the ‘communicative’ sense of give in give your name.
2. In a more extensive study, both the discourse and the syntactic type could be separately manipulated.
3. A subsequent still unpublished study by the author shows that similar results are obtained when subjects are simply asked to guess which alternative the original dialogue participant used and to give a numerical estimate of the likelihood of their guess being correct.
References Arnold, Jennifer, Thomas Wasow, Anthony Losongco and Ryan Ginstrom 2000 Heaviness vs. newness: The effects of complexity and information structure on constituent ordering. Language 76: 28–55. Baayen, R. Harald 2004 Statistics in psycholinguistics: A critique of some current gold standards. Mental Lexicon Working Papers, Edmonton 1: 1–45. Baayen, R. Harald, Richard Piepenbrock & Leon Gulikers 1995 The CELEX lexical database (Release 2) CD-ROM. Linguistic Data Consortium, University of Pennsylvania (distributor).
Is syntactic knowledge probabilistic?
93
Bard, Ellen G., Dan Robertson & Antonella Sorace 1996 Magnitude estimation of linguistic acceptability. Language 72: 32–68. Bates, Douglas & Deepayan Sarkar 2006 lme4: Linear mixed-effects models using S4 classes. R package version 0.995-2. http://www.R-project.org. Bock, J. Kathryn 1977 Accessibility theory: An overview. Journal of Verbal Learning and Verbal Behavior 16: 723–734. 1986 Syntactic persistence in language production. Cognitive Psychology 18: 355–387. 1996 Language production: Methods and methodologies. Psychonomic Bulletin & Review 3: 395–421. Bock, J. Kathryn & David Irwin 1980 Syntactic effects of information availability in sentence production. Journal of Verbal Learning and Verbal Behavior 19: 467–484. Bock, J. Kathryn, Helga Loebell & Randal Morey 1992 From conceptual roles to structural relations: Bridging the syntactic cleft. Psychological Review 99: 150–171. Bock, J. Kathryn & Richard K. Warren 1985 Conceptual accessibility and syntactic structure in sentence formulation. Cognition 21: 47–67. Bresnan, Joan 2007 A few lessons from typology. Linguistic Typology 11: 297–306. Bresnan, Joan, Anna Cueni, Tatiana Nikitina & R. Harald Baayen 2007 Predicting the dative alternation. In Cognitive Foundations of Interpretation, G. Boume, I. Kraemer & J. Zwarts (eds.), 69–94. Amsterdam: Royal Netherlands Academy of Science. Bresnan, Joan & Jennifer Hay forthc. Gradient grammar: An effect of animacy on the syntax of give in varieties of English. Lingua. Special issue on animacy invited submission). Bresnan, Joan & Tatiana Nikitina 2003 On the gradience of the dative alternation. Stanford University: http://www-lfg.stanford.edu/bresnan/download.html. Chatterjee, Samprit, Ali S. Hadi & Bertram Price 2000 Regression Analysis by Example. 3rd edition. New York: Wiley. Clark, Eve V. 1987 The principle of contrast: A constraint on language acquisition. In Mechanisms of language acquisition, B. MacWhinney (ed.), 1–33. Mahwah, NJ: Erlbaum. Cleveland, William S. 
1979 Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74: 829–836.
Collins, Peter 1995 The indirect object construction in English: An informational approach. Linguistics 33: 35–49. Cowart, Wayne 1997 Experimental Syntax: Applying Objective Methods to Sentence Judgments. Thousand Oaks, CA: Sage. Fellbaum, Christiane 2005 Examining the constraints on the benefactive alternation by using the World Wide Web as a corpus. In Evidence in Linguistics: Empirical, Theoretical, and Computational Perspectives, M. Reis & S. Kepser (eds.), 207–236. Berlin/New York: Mouton de Gruyter. Ferreira, Victor S. 1996 Is it better to give than to donate? Syntactic flexibility in language production. Journal of Memory and Language 35: 724–755. Godfrey, John J., Edward C. Holliman & Jane McDaniel 1992 Telephone speech corpus for research and development. In Proceedings of ICASSP-92, pp. 517–520. Green, Georgia 1971 Some implications of an interaction among constraints. CLS 7: 85–100. Gries, Stefan Th. 2003 Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics 1: 1–27. 2005 Syntactic priming: A corpus-based approach. Journal of Psycholinguistic Research 34: 365–399. Hawkins, John A. 1994 A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press. Hay, Jennifer & Joan Bresnan 2006 Spoken syntax: The phonetics of giving a hand in New Zealand English. The Linguistic Review: Special Issue on Exemplar-Based Models in Linguistics 23: 321–349. Kelly, Michael H., J. Kathryn Bock & Frank C. Keil 1986 Prototypicality in a linguistic context: Effects on sentence structure. Journal of Memory and Language 25: 59–74. Lapata, Maria 1999 Acquiring lexical generalizations from corpora: A case study for diathesis alternations. In Proceedings of the 37th Meeting of the North American Chapter of the Association for Computational Linguistics, pp. 397–404. College Park, Maryland. Levin, Beth 1993 English verb classes and alternations: A preliminary investigation. Chicago/London: University of Chicago Press.
Pickering, Martin J., Holly P. Branigan & Janet F. McLean 2002 Constituent structure is formulated in one stage. Journal of Memory and Language 46(3): 586–605. Pinheiro, José C. & Douglas M. Bates 2000 Mixed-Effects Models in S and S-PLUS. New York: Springer. Pinker, Steven 1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press. Prat-Sala, Mercè & Holly P. Branigan 2000 Discourse constraints on syntactic processing in language production: A cross-linguistic study in English and Spanish. Journal of Memory and Language 42: 168–182. R Development Core Team 2006 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org. Rappaport Hovav, Malka & Beth Levin 2005 All dative verbs are not created equal. Unpublished ms., Hebrew University of Jerusalem and Stanford University, Jerusalem, Israel and Stanford, California. Roland, Douglas & Daniel Jurafsky 2002 Verb sense and verb subcategorization probabilities. In The Lexical Basis of Sentence Processing: Formal, Computational, and Experimental Issues, P. Merlo & S. Stevenson (eds.), 325–346. Amsterdam: Benjamins. Rosenbach, Anette 2003 Aspects of iconicity and economy in the choice between the s-genitive and the of-genitive in English. In Determinants of Grammatical Variation in English (Topics in English Linguistics/[TiEL], G. Rohdenburg & B. Mondorf (eds.), 379–411. Berlin/New York: Mouton de Gruyter. 2005 Animacy versus weight as determinants of grammatical variation in English. Language 81: 613–644. Schütze, Carson T. 1996 The Empirical Base of Linguistics: Grammaticality Judgments and Linguistics Methodology. Chicago: University of Chicago Press. Snyder, Kieran M. 2003 The relationship between form and function in ditransitive constructions. Ph.D. thesis, University of Pennsylvania. 
Szmrecsányi, Benedikt 2005 Language users as creatures of habit: A corpus-based analysis of persistence in spoken English. Corpus Linguistics and Linguistics Theory 1: 113–149.
Thompson, S. A. 1990 Information flow and dative shift in English discourse. In Development and Diversity: Language Variation across Time and Space, J. A. Edmondson, C. Feagin & P. Mühlhausler (eds.), 239–253. Dallas: Summer Institute of Linguistics and University of Texas at Arlington. Wasow, Thomas 2002 Postverbal Behavior. Stanford: CSLI Publications.
Psycholinguistic perspectives on grammatical representations
Harald Clahsen
1. Introduction

This chapter discusses the potential use of psycholinguistic evidence for the theoretical linguist. Psycholinguistic evidence may come from the study of language acquisition, language disorders, and real-time processing of language. Our specific question will be how the theory of grammar might benefit from psycholinguistic evidence of this kind. For example, psycholinguistic findings may favour one theory of grammar over its alternatives and thus help to resolve theoretical linguistic controversies.

Looking at the vast majority of studies on grammar, one gets the impression that psycholinguistic evidence is of little use. It is true that theoretical linguists, including those working from the perspective of generative grammar, often pay lip service to the potential relevance of psycholinguistic evidence. Chomsky (1981: 9) noted, for example, that psycholinguistic evidence from language acquisition, experimentation on language processing, and evidence from language deficits is relevant to determining the properties of both Universal Grammar and of particular grammars, but at the same time, he observed that evidence from these sources is, for some unspecified reason, ‘insufficient to provide much insight concerning these problems’, and that, therefore, the theoretical linguist is compelled to rely on grammar-internal considerations. Indeed, casual inspection of the major journals in theoretical linguistics reveals hardly any reference to results and findings from psycholinguistic studies.

Against this background, I will make some proposals for how to bridge the gap between psycholinguistic research and theories of grammatical knowledge. Firstly, a set of criteria will be established that psycholinguistic evidence should meet to be relevant for theories of grammar. I will then present three case studies, one from language acquisition, one from language processing, and one from language disorders, to illustrate what the theoretical linguist can learn from psycholinguistic studies about the nature of grammars.
2. A framework for employing psycholinguistic evidence

One reason that may have led Chomsky (1981) to conclude that psycholinguistic research is not particularly informative is that many psycholinguistic studies deal with issues linguists do not really care about, e.g. the details of developmental sequences in language acquisition, the intricacies of language impairments, and the precise time-course of language processing. It is also true that psycholinguistic studies often fail to explicate the potential implications of their results for theories of language. Some common ground is required for psycholinguistic findings to bear on linguistic theories. I suggest that this can be achieved by taking seriously the generative view of language. Generative grammar regards human language as a cognitive system that is represented in a speaker’s mind/brain with the mental grammar as its core element. The ultimate aim of generative research is to discover the most appropriate mental representations for language, and this encompasses both linguistic and psycholinguistic studies. From this perspective, a linguist examining the grammar of a particular language deals with a mental structure consisting of grammatical representations which are somehow manifested in a person’s brain, and which describe what it means to know a language. Research into language processing investigates how grammatical representations are constructed in real time, during the comprehension and production of language. We may conceive of these processes as a sequence of operations, each of which transforms a mental representation of a linguistic stimulus into a mental representation of a different form. Language acquisition research is concerned with changes of grammatical representations over time. From this perspective, studies of the acquisition of grammar posit a sequence of transitional grammars, i.e. changes to the mental representations of language over time.
Studies of language disorders may provide insight into what has been called subtractivity (Saffran 1982), transparency (Caramazza 1984) or residual normality, i.e. selective representational deficits of an otherwise intact system of grammatical representations. Clearly, each domain – language acquisition, language processing, and language impairments – requires its own theories, but if results from psycholinguistic studies are interpreted with respect to the nature of mental representations of grammar, then the theory of grammar can potentially draw on evidence from all these sources. But how should linguists go about employing psycholinguistic findings for evaluating grammatical analyses and theories of grammar? Simply scanning the psycholinguistic literature for confirming evidence, i.e. for findings
that appear to support the preferred analysis or theory, is insufficient, as one may overlook potential counter-evidence. Instead, a more systematic approach is required. Here, I will make some suggestions for how this could be achieved by setting out some criteria against which results from psycholinguistic studies should be evaluated before they are taken as evidence for particular grammatical analyses or theories.

One important consideration is whether there are any confounding factors or alternative explanations for a given psycholinguistic finding. All linguistic data are performance data and are affected by a range of nonlinguistic factors (Schütze 1996, 2004). It is possible, for example, that a particular experimental result, e.g. longer response times for condition X than for Y, is due to the fact that X is more demanding in terms of working memory or more general cognitive resources than Y. The role of such factors needs to be assessed before the result is taken as evidence for positing any linguistic difference between X and Y. Below I will discuss a case in point, experimental results on word order and their potential implications for the analysis of clause structure in German.

Another relevant consideration is whether a given psycholinguistic finding is supported by converging evidence from other sources. Any data set or experiment is in danger of producing artifacts, e.g. due to an experiment’s specific task demands, weaknesses of individual techniques, or gaps in particular data sets. This holds not only for the standard techniques employed by linguists but also for all kinds of psycholinguistic studies. One way around this problem is to look for converging evidence from different sources. Thus, in the same way in which linguists do not, for example, rely on just one test for determining constituent-hood, psycholinguistic findings should only be used as evidence if they are replicable, ideally across different experimental techniques and data sources.
A final consideration is whether a given psycholinguistic finding confirms or disconfirms a specific linguistic theory or analysis, or whether it is compatible with different theoretical treatments. Experimental findings may be consistent with a given linguistic analysis, but this by itself is a relatively weak case as the same findings may also be consistent with alternative linguistic analyses. Demonstrating that some experimental findings favour one analysis over its alternatives represents a much stronger case because in this way psycholinguistic evidence may help to adjudicate between competing linguistic analyses or theories. However, as will be shown below, it is rarely the case that experimental evidence uniquely favours one particular linguistic analysis while at the same time disconfirming all available alternatives.
3. Studies of child language acquisition and morphological representation

Much research in child language acquisition takes the so-called continuity assumption as a null hypothesis. According to continuity, the child’s grammar learning device does not change over time, and all developmental changes are attributed to increases in the child’s lexicon, the child’s semantic and pragmatic knowledge, and increases in cognitive resources in general (Pinker 1984, Clahsen 1990). The continuity assumption is taken as a null hypothesis, which means that it is only rejected if empirical evidence leaves no other choice. By assuming continuity, the acquisition researcher avoids positing distinct grammatical systems for child language in cases in which this is unnecessary and, instead, ensures that child language is analyzed in terms of the same categories and constraints that are required for adult languages throughout all stages of development. Thus, we can conceive of language acquisition as a sequence of transitional grammars, and each of these grammars may provide insights that are not immediately accessible to observation in adult systems. In this way, the continuity assumption ensures that developmental evidence will bear on the object of inquiry that linguists care about, namely the study of systems constrained by the human language faculty (Rizzi 2000: 269).

The specific case I will examine concerns contrasts between regular and irregular inflection; see Eisenbeiss (2002) and Rizzi (2000) for examples of how child language acquisition data might bear on theories of syntax. Broadly speaking, we can think of three ways of representing inflected word forms. One possibility is that all inflected words are formed by morphophonological rules, and memorization of inflected words is avoided as much as possible.
Halle & Mohanan (1985) and related work in Generative Phonology in which minor rules are proposed, for example ‘Lowering Ablaut’ deriving the past-tense form sang from the stem sing, are representative of this approach; see Yang (2003) for a recent treatment of the English past tense in this framework. A radical alternative is offered by associative models of various kinds, which claim that all inflected words are stored and processed within a single associative system using distributed representations (see e.g. Rumelhart & McClelland 1986; Bybee 1995; Sereno & Jongman 1997, among others). The morphological structure of an inflected word is not explicitly represented in these models; instead, these models implement networks that represent the mapping relationship between the stem of a word and its inflected form through associatively linked orthographic, phonological and
semantic codes. A third possibility is represented by a family of so-called dual-mechanism models which hold that morphologically complex word forms can be processed both associatively, i.e. through stored full-form representations, and by rules that decompose or parse inflected word forms into morphological constituents (Chialant & Caramazza 1995; Schreuder & Baayen 1995; Clahsen 1999; Pinker 1999). In Pinker’s (1999) words-and-rules model, for example, these two mechanisms are claimed to be responsible for contrasts between regular and irregular inflection, as, for example, in the English past tense (see also Pinker & Ullman 2002). Regular -ed inflection is said to invoke a combinatorial rule (Add -ed), which makes -ed forms predictable in form, readily applicable to novel items and to the outputs of other morphological processes (derivation, compounding). Irregular past-tense inflection (e.g. sing-sang), on the other hand, cannot be perfectly predicted by the form of the stem or root, and only tentatively extends to new forms, and is therefore said to be based on stored forms.

How can data from child language acquisition contribute to deciding between these conflicting views of morphological representation? Here I will report results from two domains of the development of German verb inflection: (i) past participle formation, and (ii) present tense stem formation.

Consider first participle formation. (Past) participle formation in German involves two endings, -n which appears on all participle forms of so-called strong (= irregular) verbs, and -t which appears on the participle forms of all other verbs. Irregular verbs undergo (phonologically unpredictable) stem changes in the preterit and at times also in the participle, e.g. stehlen (infinitive) – gestohlen (participle) – stahl (preterit) ‘to steal’ – ‘stolen’ – ‘stole’, ziehen – gezogen – zog ‘to pull’ – ‘pulled’ – ‘pulled’, vergraben – vergraben – vergrub ‘to bury’ – ‘buried’ – ‘buried’.
There are about 160 simplex verbs that fall into the strong (= irregular) class. According to its linguistic properties, the participle suffix -t behaves like the English past tense suffix -ed: -t suffixation on regular verbs does not involve any stem alternations and is applied under default circumstances in the sense of Marcus, Brinkmann, Clahsen, Wiese & Pinker (1995), i.e. to words for which lexical entries are not readily available, e.g. low-frequency verbs (schroten ‘to crush (grain)’ – geschrotet), verbs derived from adjectives or nouns (sauber (adj.) ‘clean’ – säubern (verb) ‘to clean’ – gesäubert), onomatopoeia (brummen ‘to buzz’ – gebrummt), and nonsense words (flauden – geflaudet). In contrast, -n suffixation co-occurs with phonologically unpredictable stem changes and -n participle formation does not generalize to novel words which do not rhyme with existing strong
verbs (Clahsen 1997). Hence, -n participle forms behave like English irregular past-tense forms.¹

The development of participle formation has been investigated in a large number of children and across a wide age range (Clahsen & Rothweiler 1993; Weyerts & Clahsen 1994; Weyerts 1997). Table 1 presents an overview of the results of these studies.

Table 1. Participle formation in German child language

Age range      Number of children   Total errors   -t errors    -n errors
Existing verbs
1;4 – 3;9       9                   116            108 (93%)     8 (7%)
3;6 – 6;11     51                    88             77 (88%)    11 (12%)
7;2 – 8;11     19                    64             59 (92%)     5 (8%)
Nonce words
3;10 – 8;10    41                   454            420 (93%)    34 (7%)
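The percentages in Table 1 follow directly from the raw error counts. As a quick arithmetic check, the following Python sketch recomputes the share of -t and -n errors for each group. The counts are copied from Table 1; the names TABLE_1 and error_shares are introduced here purely for illustration.

```python
# Raw counts from Table 1: (group, age range, number of children, -t errors, -n errors).
TABLE_1 = [
    ("Existing verbs", "1;4-3;9", 9, 108, 8),
    ("Existing verbs", "3;6-6;11", 51, 77, 11),
    ("Existing verbs", "7;2-8;11", 19, 59, 5),
    ("Nonce words", "3;10-8;10", 41, 420, 34),
]


def error_shares(t_errors, n_errors):
    """Return (total errors, % -t errors, % -n errors), rounded to whole percent."""
    total = t_errors + n_errors
    t_pct = round(100 * t_errors / total)
    # The -n share is taken as the complement so that both shares sum to 100.
    return total, t_pct, 100 - t_pct


for group, ages, children, t_err, n_err in TABLE_1:
    total, t_pct, n_pct = error_shares(t_err, n_err)
    print(f"{group:15} {ages:10} n={children:2}  total={total:3}  "
          f"-t: {t_err} ({t_pct}%)  -n: {n_err} ({n_pct}%)")
```

In every row the -t share lies between 88% and 93%, which is the basis for the figure of around 90% overregularization given in the text.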
The data summarized in Table 1 revealed that children across all age groups overapply the participle -t to strong verbs, but that they rarely overapply the participle -n to other verbs. Thus, children typically produce errors such as *gekommt ‘come, participle’ instead of gekommen, but rarely *geschneien ‘snow, participle’ instead of geschneit ‘snowed’. High percentages of -t overregularizations (of around 90%) were found for both existing and nonce verbs. In addition, a frequency effect was found for irregular, but not for regular verbs. Strong verbs that have low frequencies (of less than 30 per million) elicited significantly more -t overregularization errors than strong verbs with high token frequencies (of more than 200 per million), see Weyerts & Clahsen (1994). Verb stem formation in German child language was investigated by Clahsen et al. (2002a) using 73 samples of spontaneous speech from 7 children and in an elicited production task with 26 children. Two types of error were found in the spontaneous speech data. The most common ones were overapplications of the unmarked (non-alternated) stem in cases in which strong stem forms are required in the adult language (= 88.4% of all stem errors); see (1a) and (1b). The second type were paradigmatic errors (= 11.5%) such as those in (1c), (1d), and (1e) in which an existing marked stem form of a given verb was used in a 1st sg. or 3rd pl. context that does
not require a marked stem in the adult language. Note also that stem irregularizations (i.e. strong stems overapplied to weak verbs such as tanzen) were non-existent.

(1) a. er lauft              ‘he runs’                     (correct: läuft)
    b. sie lest              ‘she reads’                   (correct: liest)
    c. alle fäll da runter   ‘everybody fall down there’   (correct: fall-en)
    d. ich gib dir das       ‘I give you that’             (correct: geb(e))
    e. ich sieh              ‘I see’                       (correct: seh(e))
In the elicited production task, the children first listened to a sentence containing an infinitive form of a verb as the final word, e.g. sehen in (2a). They then heard a second sentence in which the appropriate finite verb form of the previously presented verb was replaced by a beep and were asked to produce the correct verb form, e.g. sieht in (2b).

(2) a. Martin will unbedingt den neuen Pokemon-Film sehen.
       ‘M. definitely wants to see the new pokemon movie’.
    b. Also gibt ihm seine Mutter Geld und Martin beep den Film.
       ‘Hence his mother gives him some money, and Martin ___ the movie’
This task elicited 168 stem-formation errors (out of a total of 555 stem forms), all of which were overapplications of the unmarked stem. Moreover, low-frequency (irregular) verbs elicited significantly more stem errors than high-frequency ones. Finally, a significant age effect was seen in the stem-formation errors in that stem overregularizations were found to gradually decrease with age.

Summarizing the results of these studies, it was found that the regular -t participle suffix and the unmarked stem are overapplied to irregular verbs, whereas overapplications of irregular patterns (= the participle -n and marked stems) are rare or non-existent, indicating regular/irregular contrasts in children’s grammars. There were also frequency and age effects in children’s inflectional errors. Children produced more overregularizations for irregular verbs with low frequencies than for those with high frequencies, and overregularization errors decreased with age.

To see how these results from language acquisition studies might bear on theories of morphological representation, consider the three criteria for evaluating psycholinguistic evidence mentioned in the previous section.
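The logic by which a rules-plus-entries architecture could give rise to this error pattern can be illustrated with a toy sketch. It assumes, purely for illustration, that a stored irregular participle is retrieved only if its memory strength (here simply frequency multiplied by an exposure factor standing in for age) reaches a threshold, and that the default -t rule applies otherwise. The mini-lexicon, the frequency values, the threshold and all function names are hypothetical and are not taken from the studies reported above.

```python
from dataclasses import dataclass


@dataclass
class StoredEntry:
    participle: str   # stored full form of a strong (irregular) participle
    frequency: float  # hypothetical frequency per million, for illustration only


# Hypothetical mini-lexicon of stored irregular participles.
LEXICON = {
    "kommen": StoredEntry("gekommen", 500.0),
    "stehlen": StoredEntry("gestohlen", 10.0),
}


def default_participle(infinitive: str) -> str:
    """Default rule: ge- + stem + -t (simplified; ignores verbs that take no ge-)."""
    if infinitive.endswith("en"):
        stem = infinitive[:-2]
    elif infinitive.endswith("n"):
        stem = infinitive[:-1]
    else:
        stem = infinitive
    # Stems ending in -t or -d take -et (schroten -> geschrotet).
    suffix = "et" if stem.endswith(("t", "d")) else "t"
    return "ge" + stem + suffix


def produce_participle(infinitive: str, exposure: float, threshold: float = 100.0) -> str:
    """Use the stored entry if it is strong enough to be retrieved, else the default rule."""
    entry = LEXICON.get(infinitive)
    if entry is not None and entry.frequency * exposure >= threshold:
        return entry.participle
    return default_participle(infinitive)


for exposure in (1.0, 20.0):
    print(exposure, produce_participle("kommen", exposure),
          produce_participle("stehlen", exposure))
```

Under these assumptions the high-frequency entry is retrieved at both exposure levels, whereas the low-frequency one is overregularized at low exposure and produced correctly once exposure has increased. The default rule never produces an -n form, so irregularizations cannot arise. A deterministic threshold is of course a strong simplification of what would be a probabilistic retrieval process.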
The first consideration was whether there are any potentially confounding factors that may account for the findings obtained. One important factor that has been claimed to be responsible for regular/irregular contrasts in language acquisition is the frequency of occurrence in the input. It has been noted (Bybee 1995, 1999; Stemberger 1999; among many others) that regular rules of morphology usually have high type frequency and apply to a large number of different forms. Bybee (1995, 1999) argued that this also holds for the case we have considered. The type frequency of the German -t participle is said to be much higher than that of irregular ones, and the same applies to unmarked stems which have higher frequencies than irregular strong stem forms. Thus, children may overapply -t participle forms and unmarked stems simply because these forms are most common, and not because of some representational difference between regulars and irregulars. Note, however, that frequency counts that put -t participles in the majority require collapsing linguistically distinct types of words into one category, they require counting types (rather than tokens, for which regulars are not in the majority by any criteria), and they require huge corpora containing many obscure words. All these points are problematic. German has many families of particle and prefix verbs, such as ankommen ‘arrive’, bekommen ‘receive’, aufkommen ‘support, pay’, which have non-compositional meanings, which orthographically and phonologically behave like single verbs, and which in their participle form always appear as a single verb. By collapsing all verbs that share a root, as for example in the frequency counts provided by Bybee (1995), these properties are ignored. Instead, Clahsen & Rothweiler (1993) counted verbs such as those mentioned above separately and this yielded similar frequencies for regulars and irregulars in several frequency databases.
It is true that in frequency counts that are based on the entire German verb lexicon or a huge lexical database such as CELEX with a total of nearly 6 million words (Baayen et al. 1993), weak verbs do indeed outnumber strong verbs. But these frequency counts contain many obscure words that are unlikely to occur in the input to a child in the relevant age period. Weyerts & Clahsen (1994) therefore determined type (and token) frequencies in child-directed speech on the basis of spontaneous speech corpora covering the age range of 1;5 to 8;7. Similar type (and token) frequencies were found for -t and -n participle forms, 46.6% for the former and 53.4% for the latter. Thus, the differences between the participle endings -t and -n in children’s overregularization errors do not appear to be correlated with frequency differences in their input. Our next concern is whether there is any converging evidence for the results on German verb inflection from acquisition studies examining different
domains of inflection within the same language and across different languages. Evidence for regular/irregular contrasts similar to the ones reported above is indeed available from different acquisition studies using a range of methods. Here I will only mention a few examples. The results on children’s participle formation were recently replicated with 40 children in two age groups (5-to-7-year olds, 11-to-12-year olds) using a speeded production task (Clahsen et al. 2004). Again, children were seen to overapply -t participle forms and unmarked stems to forms that are irregular in the adult language, but not vice versa. Plural formation in German child language was also investigated using similar methods and large data sets (see Clahsen 1999 for review). The results were parallel to those on verb inflection. Children prefer -s plurals for non-rhymes and for non-canonical words (proper names) indicating that -s plurals just like -t participles are applied under default circumstances, i.e. when similarity-driven analogies fail (Bartke et al. 1996). This was found despite the fact that -s plurals are extremely infrequent in the German language showing that rule-like behavior in children’s inflectional errors cannot be explained in terms of the frequency distribution in the input. There is also evidence from the acquisition of verb inflection in English (Marcus et al. 1992), Italian (Say & Clahsen 2002), and Spanish (Clahsen et al. 2002b) that is parallel to the results reported here for German. In these studies, regular inflectional forms and unmarked stems were found to be overapplied to irregular verbs, whereas overapplications of irregular patterns were rare or non-existent. Moreover, frequency and age effects were seen for irregular verbs with low frequency verbs attracting more overregularization errors than high-frequency ones, and overregularization errors decreasing with age. 
Thus, we can take these findings to be relatively robust across different inflectional domains and across different languages. The third consideration mentioned above for evaluating psycholinguistic evidence is whether a given set of results confirms or disconfirms any of the theoretical accounts in the relevant domain. Recall that there are three broad representational models for morphologically complex words: (a) an all-rules account for both regular and irregular inflected word forms, (b) associative network models with stored associatively-linked entries for both regular and irregular forms, and (c) a rules-plus-entries account according to which regular inflected words involve combinatorial (stem +affix) representations and irregular inflected forms have full-form representations stored in lexical memory. The acquisition data indicate sharp regular/irregular contrasts that are most directly compatible with rules-and-entries accounts. For irregulars, we found frequency and age effects which are indicative of
memory storage. Memory traces get stronger with additional exposure; consequently high-frequency entries can be more readily accessed than low-frequency ones, and overregularizations decrease with age. These results challenge an all-rules account in which both regular and irregular inflection are derived by combinatorial rules (see e.g. Halle & Mohanan 1985; Halle & Marantz 1993), unless one adopts Yang’s (2003) proposal that the rules for irregulars are each assigned a probability, or weight, depending on their frequency, whereas the ‘default rule’ (for regular inflection) is applied elsewhere and is not assigned any particular probability. In this way, the frequency effects seen for irregulars in children’s overregularization errors can be explained in an all-rules framework. On the other hand, the findings reported above are difficult to reconcile with associative models of morphological representation in which the same kinds of representation are posited for regular and irregular inflected words. Children were found to overapply regular inflectional patterns (rather than irregular ones), even in cases in which the regular pattern is not the most common form in the children’s input, as, for example, in the case of German participles and plurals. This finding is incompatible with associative accounts such as Bybee (1995, 1999) and most connectionist models (see Clahsen 1999 for discussion) which directly conflate regularity with frequency of occurrence.

While the acquisition data reported in this section support the fundamental distinction between regular and irregular inflection, it should be noted that there are different ways of deriving this distinction from an explicit theory of morphological representation. Wunderlich (1996) and Jackendoff (1997) provide a direct implementation of this distinction in terms of an opposition between rules and entries. Realization-based models of morphology (see e.g.
Booij 2002; Blevins 2003a) offer an alternative way of implementing the contrast between regular and irregular inflection within an expanded rule inventory that includes traditional entries as a degenerate, highly-specific, rule type. Consider, for example, the contrast between the regular 3sg -s in English and a fixed rule with a constant output for the lexical item BE, illustrated in (3) in terms of Aronoff’s (1994) realization pair format.

(3) a. <[V, 3sg, pres, ind], X+s>
    b. <[V, 3sg, pres, ind, be], is>
The first element of each pair of the two inflectional rules (3a) and (3b) identifies the features to be realized, while the second element indicates the formal spell-out. The regular 3sg rule in (3a) spells out the bracketed features by adding the exponent ‘s’ to the base form represented by the variable ‘X’. Applied to the base of the regular verb WALK, this rule defines the regular 3sg form walks. The pair in (3b) realizes the 3sg present indicative features of the lexeme BE by the constant (irregular) form is. In this way, the regular/irregular contrast can be reconstructed within an expanded rule inventory in terms of a contrast between regular inflectional rules such as (3a) that contain variables and degenerate, highly-specific, rules such as (3b) that derive a fixed output.

It is difficult to see how language acquisition data of the kind mentioned in this section should decide between these different implementations. One crucial difference between these accounts is whether irregular inflected forms are cashed out in lexical entries or whether they are derived by inflectional rules. Evidence that might bear on this question comes from studies that more directly tap memory storage, e.g. from lexical decision and priming experiments. Results from such experiments are available for adults (see Penke 2006 for review). It was found, for example, that regular (but not irregular) inflected word forms yield stem priming effects in lexical priming experiments, indicating that regular forms are decomposed into their morphological constituents, whereas irregulars are stored as wholes (see e.g. Sonnenstuhl et al. 1999). These findings provide evidence against any rule-based account for irregulars. However, the kinds of off-line data available from language acquisition studies do not normally provide such measures (but see Clahsen et al. 2004, 2007). Thus, while language acquisition data provide evidence against purely associative models of morphological representation, the kinds of acquisition data currently available do not decide between the different implementations of rules-and-entries and rules-only accounts of morphological representation.
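The realization pairs in (3) can be made concrete with a small sketch. It encodes each rule as a pair of a feature set and a spell-out function, where the rule in (3a) contains a variable (the base X) and the rule in (3b) returns a constant. How competing rules are ranked is not spelled out in the text above; the sketch assumes that the most specific applicable rule wins, and the class name RealizationRule and the function realize are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class RealizationRule:
    features: FrozenSet[str]          # first element of the pair: features to be realized
    spell_out: Callable[[str], str]   # second element: maps the base X to a form


RULES = [
    # (3a): <[V, 3sg, pres, ind], X+s>, a regular rule containing the variable X
    RealizationRule(frozenset({"V", "3sg", "pres", "ind"}), lambda base: base + "s"),
    # (3b): <[V, 3sg, pres, ind, be], is>, a degenerate rule with a constant output
    RealizationRule(frozenset({"V", "3sg", "pres", "ind", "be"}), lambda base: "is"),
]


def realize(base: str, features) -> str:
    """Spell out `features` for `base` using the most specific applicable rule.

    Ranking by specificity is an assumption made for this sketch.
    """
    features = frozenset(features)
    applicable = [rule for rule in RULES if rule.features <= features]
    if not applicable:
        raise ValueError("no rule realizes the requested features")
    most_specific = max(applicable, key=lambda rule: len(rule.features))
    return most_specific.spell_out(base)


print(realize("walk", {"V", "3sg", "pres", "ind"}))
print(realize("be", {"V", "3sg", "pres", "ind", "be"}))
```

Both rules are applicable to BE, since the feature set of (3a) is a subset of that of (3b); under the specificity assumption the constant rule takes precedence, so that the regular form in -s is not produced for this lexeme.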
4. Studies of sentence processing and the analysis of German clause structure

Sentence processing research addresses the question of how the mental grammar is employed in the production and comprehension of sentences. The most direct way to approach this relationship is to adopt the correspondence hypothesis (originally proposed by Miller & Chomsky 1963), which takes the mental grammar to be directly involved in how we understand and produce sentences in real time. Thus, when producing or comprehending a sentence, the speaker/hearer is said to make use of essentially the same processing units and operations that are used in linguistic analysis. The appeal
of the correspondence hypothesis is that it provides a parsimonious and straightforward account of how grammatical knowledge and processing are related in that the parser is said to make basically the same distinctions as the grammar (see Jackendoff 1997, Phillips 1996, among others for discussion). The correspondence hypothesis prevents experimental psycholinguists from positing any specialized parsing and production strategies for sentence processing that have nothing much to do with the units and operations of the mental grammar. Instead, it provides accounts of sentence processing using the normal structures and operations of the grammar. Thus, the correspondence hypothesis is a sensible starting point for employing sentence processing studies as evidence for theories of syntactic representation. At the same time, however, it should be acknowledged that the theory of grammar does not provide a complete account of sentence processing. Comprehension difficulties arising in garden-path sentences (The soldiers marched across the parade ground are a disgrace) and centre-embeddings (The pen the author the editor liked used was new), for example, indicate that sentence comprehension may be affected by additional factors (e.g. by working-memory limitations), and not just by the grammar (but see Weinberg 1999). These factors need to be considered before any experimental finding can be taken as evidence for or against a particular syntactic theory. The specific case I will discuss here concerns word-order preferences during sentence comprehension and what such preferences might reveal about conflicting syntactic analyses of word-order phenomena; see Farina (2005), Sag & Fodor (1995), Nakano et al. (2002), Featherston et al. (2000) for other cases. For word order, evidence from psycholinguistic experimentation points to a general subject-first preference in on-line sentence comprehension (see e.g. Kaan 1997 for review).
That is, the parser seems to find it easier to comprehend sentences in which the subject precedes the object than sentences with the reverse, O–S order. This even holds for languages such as Dutch and German, in which the O–S order is perfectly grammatical. In contrast to the subject-first preference, psycholinguistic studies have not produced any indication that (S)OV sentences are more difficult to comprehend than (S)VO sentences or vice versa. Instead, a series of cross-modal priming experiments has revealed that Japanese and German head-final VPs (Nakano et al. 2002, Clahsen & Featherston 1999) are as optimal in sentence comprehension as English head-initial VPs (Love & Swinney 1996), indicating that order preferences for VO or OV are language-specific. Another question, however, is whether there are any order preferences for finite verbs in sentence processing. Consider, for example, the positioning of finite verbs relative to subjects and objects in German. Finite
verbs may occur in the initial, the second, or the final position of a clause, depending on the type of clause: yes-no questions and imperatives have the finite verb in first position, declarative main clauses in second position, and embedded clauses in final position. This intralanguage difference raises the question of whether the parser shows a preference for any of these different placement patterns in on-line sentence processing. The analysis of finite verb placement in German is controversial among syntacticians. The best known account is the double-movement analysis illustrated in (1); see e.g. Thiersch (1978), Fanselow (1988), von Stechow & Sternefeld (1988), Grewendorf (1988), Schwartz & Vikner (1996), among others. According to this analysis, verbs first move to a clause-final INFL head (see [ej] in (1)) and in main clauses subsequently raise to the COMP position of a head-initial CP. In addition, some constituent will raise to Spec-CP in declarative main clauses, i.e. in front of the finite verb. This double-movement analysis ensures that the finite verb will always be in second position in this type of clause. Since the COMP-position is filled with lexical complementizers such as dass ‘that’ in embedded clauses, the analysis also ensures that the finite verb only raises in main clauses. In this way the double-movement analysis accounts for all possible verb positions.
(1) [CP [Die Aufgabe]i [C’ [COMP hatj] [IP Pauline mittlerweile [VP [ei] gelöst] [INFL ej]]]]
‘In the meantime, Pauline has solved the task.’
Several syntacticians have pointed out descriptive problems of the double-movement analysis and proposed alternatives. Haider (1993) believes that there is no convincing evidence for a head-final IP projection in German. Travis (1984, 1991) suggests that in German SVfX clauses, finite verbs (Vf) are located in a head-initial IP, so that in sentences with preverbal subjects neither the finite verb nor the subject has to undergo any movement to COMP or Spec-CP. Sentences with postverbal subjects, on the other hand, involve leftward movements of the finite verb and other constituents, as illustrated in (1), and in embedded sentences the finite verb is said to be located within VP in clause-final position; see Reis (1985), Kathol (1990) and Zwart (1993) for similar proposals. These accounts are meant to capture the syntactic and interpretative differences between SVfX sentences with the subject directly preceding the finite verb, sentences with derived main-clause word orders such as (1) above, and clause-final finite verb placement in embedded sentences. In contrast to the double-movement analysis, these latter accounts assign a privileged status to SVfX sentences
in German, and if this is correct, we may expect to find a corresponding order preference in sentence processing. To examine order preferences for finite verbs, Weyerts et al. (2002) investigated the on-line comprehension of correct and incorrect word order in main and embedded clauses in German using different experimental paradigms. The critical items for experiments 1 and 2 are illustrated in (2).
(2) a. Main clause with correct SVf O word order
Es ist Ostern, und die trauernde Witwe opfert Kerzen.
‘It is Easter time, and the mourning widow sacrifices candles.’
b. Main clause with incorrect SOVf word order
*Es ist Ostern, und die trauernde Witwe Kerzen opfert.
‘It is Easter time, and the mourning widow candles sacrifices.’
c. Embedded clause with correct SOVf word order
Der Priester sieht, dass der fromme Novize Kerzen opfert.
‘The priest sees that the pious novice candles sacrifices.’
d. Embedded clause with incorrect SVf O word order
*Der Priester sieht, dass der fromme Novize opfert Kerzen.
‘The priest sees that the pious novice sacrifices candles.’
Experiment 1 was a self-paced reading task with 26 native speakers of German. Self-paced reading times have been shown to reflect the amount of parsing effort required in sentence processing (e.g. Gibson 1998, among others). Given that ungrammatical sentences require more parsing effort than corresponding grammatical ones, a comparison of reading times for sentences with correct and incorrect verb placement should reveal an effect of ungrammaticality. Thus, reading times for sentences with incorrect verb placement (2b, 2d) should be significantly longer than reading times for sentences with correct verb placement (2a, 2c). In addition, if there is an order preference for either SVf O or SOVf , then the preferred order should produce shorter reading times than the dispreferred one on the assumption that the preferred one requires less parsing effort than the dispreferred one. Experiments 2 and 3 measured event-related brain potentials (ERPs) during reading. ERPs are minute voltage fluctuations of the electrical activity produced by the neurons in the brain that are recorded from various points on the scalp while the participant is performing some task. ERPs possess time resolution in the millisecond range and thus provide an excellent online measure of language processing in real time (see Kutas & Schmitt 2003
for review). Several ERP studies have found two different waveforms, an anterior negativity (also labelled LAN, Left Anterior Negativity) and a syntactic positive shift (also labelled P600) that reflect processes involved in sentence comprehension. The anterior negativity is an early negative-going wave with a frontal distribution (sometimes larger over the left than over the right hemisphere) that occurs as a response to phrase structure violations, agreement violations, and less familiar but grammatically well-formed syntactic structures (see e.g. Friederici 2002; Felser et al. 2003). The syntactic positive shift is a late positive-going wave with a centro-parietal distribution that occurs in response to violations of phrase structure, subjacency, subject-verb agreement, and temporarily ambiguous (garden-path) sentences (see Osterhout 2004 for review). Given these findings, ungrammaticality caused by word-order violations should elicit a measurable ERP effect, with the ungrammatical sentences in (2b) and (2d) eliciting a larger anterior negativity and/or P600 than the corresponding grammatical sentences. In addition, if there is a word-order preference for either SVf O or SOVf, then the dispreferred order should produce a larger anterior negativity and/or P600 than the preferred one. Note, furthermore, that the materials used for experiments 1 and 2 included lexical verbs such as opfern ‘sacrifice’ as critical verbs. Thus, it could be the case that any order preference obtained in these experiments is due to the verb’s lexical-semantic properties, rather than to its finiteness features. For example, if SOVf sentences are found to be more difficult to parse, this could be because the thematic verb is encountered later in the clause than in SVfO, causing a higher degree of temporary ambiguity. To address this possibility, Weyerts et al. performed an additional ERP experiment with auxiliaries as the critical items:
(3) a. Embedded clauses in correct S–O–V–AUX order
Der grüne Politiker verspricht, dass der Naturschutz den Wald retten wird.
the green politician promises that the nature-conservation the forest save will
‘The green politician promises that nature conservation will save the forest.’
b. Embedded clauses in incorrect S–AUX–O–V order
*Der grüne Politiker verspricht, dass der Naturschutz wird den Wald retten.
If there is any order preference that is determined by a verb’s finiteness features (rather than by its lexical-semantic properties), then we would expect to find parallel ERP effects in experiments 2 and 3. The results from experiments 1–3 of Weyerts et al. (2002) are summarized in Table 2.

Table 2. Word order preferences in German sentence processing (Weyerts et al. 2002)

Experiment 1
  SOVf (incorrect) vs. control                          45 ms
  SVf O (incorrect) vs. control                         – 4 ms
Experiment 2
  300–500 ms:   SOVf vs. SVf O                          Anterior Negativity
  700–1000 ms:  SOVf (incorrect) vs. SOVf (correct)     P600
                SVf O (incorrect) vs. SVf O (correct)   No effect
Experiment 3
  150–300 ms:   SOVf (correct) vs. SVf O (incorrect)    Anterior Negativity
Experiment 1 revealed longer reading times for the critical region (shown in italics in (2) above) in main clauses with the incorrect SOVf order than with the correct SVf O order, with a significant difference of 45 ms. In contrast, the comparison of incorrect SVf O in embedded clauses to correct SOVf yielded a non-significant difference of 4 ms. These results, particularly the finding that in embedded clauses the SVf O order did not produce longer reading times than the SOVf one, even though SVf O is ungrammatical in such cases, provide the first indication of an SVf O preference in German sentence processing. Experiment 2 showed that the ERPs to the critical penultimate word of each stimulus sentence (e.g. opfert in (2a)) were associated with a significant anterior negativity for SOVf (compared to SVf O) in the 300–500 ms time-window. Moreover, in the later 700–1000 ms time-window, a large P600 with a centro-parietal maximum was found for the incorrect SOVf order, i.e. for SOVf in main clauses, whereas there was no significant effect for incorrect SVf O. These results are in line with those of experiment 1. The anterior negativity for SOVf irrespective of grammaticality and the P600 for ungrammatical SOVf (but not for ungrammatical SVf O) are indicative of a preference for parsing finite verbs in second position, immediately after the subject and before the object. The ERPs to the critical words (shown in italics in (3) above) in experiment 3 revealed an anterior negativity for grammatically well-formed Subject…AUX sentences in the 150–300 ms time-window compared to the incorrect Subject AUX… order. This finding replicates the anterior negativity obtained in experiment 2 and confirms that the SVfO preference is caused by the morpho-syntactic features (finiteness) of verbs rather than by their lexical-semantic properties. Taken together, these results indicate that sentences with the finite verb in second position and immediately following the subject are easier to parse than sentences with the finite verb in final position, and this preference holds even for embedded clauses for which the grammar of German requires clause-final placement of finite verbs.
We now turn to the question of what these results from sentence-processing studies might mean for the conflicting syntactic analyses of German clause structure mentioned above. Consider first whether there is any converging evidence from other psycholinguistic studies for an SVf O order preference. Such evidence is indeed available from studies of German child language and different kinds of language-disordered populations. Several acquisition studies have shown that in early stages of the acquisition of German, finite verbs are almost always placed in second or first position, i.e. before objects (see e.g. Clahsen & Penke 1992; Poeppel & Wexler 1993), indicating that the verb-second construction of German is acquired early. It has also been shown that the verb-second construction is typically not affected by developmental language impairments.
Most children with Specific Language Impairment (SLI), for example, produce fewer finite verb forms than unimpaired control subjects and make inflectional errors, but the finite verb forms they use are correctly placed in second or first position (Clahsen et al. 1997). Thus, SLI children are capable of discovering the placement properties of finite verbs in German main clauses, despite their impairment in forming correctly inflected finite verbs. Acquired language disorders such as Broca’s aphasia show the same picture. German-speaking Broca’s aphasics often produce root infinitives, i.e. main clauses in which a finite verb form is replaced by a nonfinite form such as an infinitive or participle, and these nonfinite verb forms are generally placed clause-finally. However, when the aphasics produce finite verb forms, these are (correctly) placed in second or first position (Penke 1998, 2001; Wenzlaff & Clahsen 2005). These findings show that the (S)Vf O order is not only preferred in comprehension but also early in child language
acquisition and typically not affected in developmental language disorders or in aphasia. The next consideration is whether there are any potentially confounding factors that may account for the findings obtained. Schlesewsky et al. (2002) provided a critique of the Weyerts et al. (2002) study focusing on the results of experiment 2. They argued that the ERP effects seen in this experiment are due to differences between nouns and verbs rather than due to order preferences. Recall that the critical comparison in experiment 2 involved ERP effects on a bare plural noun in the SOVf conditions (2b, 2c) in comparison to a finite verb form in the SVfO conditions (2a, 2d). Thus, Schlesewsky et al. argued that word order is confounded with word category in this experiment. Note, however, that to determine whether the ERP effects seen in experiment 2 can be attributed to word-category differences, Weyerts et al. performed an additional control experiment in which all the critical nouns and verbs from experiment 2 were tested together with pseudo-words in a simple word list. This experiment yielded a significantly larger N400 effect for verbs than for nouns, i.e. a more negative-going waveform for verbs. Experiment 2, however, elicited the opposite pattern, a more negative-going waveform for nouns (appearing in the object position of SOVf sentences). Hence, the anterior negativity in experiment 2 cannot be due to lexical differences between nouns and verbs. Schlesewsky et al. dismissed the results of this control experiment because it tested the critical items in a word list rather than in sentences. Instead, they pointed to an ERP sentence study (Federmeier et al. 2000) in which similar ERP effects to Weyerts et al.’s experiment 2 were seen for nouns versus verbs despite the fact that the sentences they tested were perfectly well-formed in terms of word order.
According to Schlesewsky et al., this indicates that experiment 2 taps a word-category difference, and not a word-order difference. Note, however, that Federmeier et al. (2000) did not test a pure word-category difference between nouns and verbs; instead their materials manipulated word category in relation to contextual appropriateness, and as they noted themselves (p. 2564), their results do not simply reflect a semantic or lexical difference. Thus, due to different kinds of experimental manipulation, a direct comparison of the results of Federmeier et al. with those of Weyerts et al. does not seem to be appropriate. Another reason for rejecting the idea that the brain response is caused by word-category differences comes from the results of experiment 3 of Weyerts et al., in which the SOVf order yielded an anterior negativity similar to the one seen in experiment 2 even though the critical words in experiment 3 were determiners and auxiliaries rather than nouns and verbs. Taken together, word-category differences cannot explain the set of experimental findings reported in Weyerts et al. (2002).
Another potentially important factor is frequency, as it might be the case that a particular word order is preferred simply because it is the most common one. To address this possibility, Weyerts et al. (2002) reported frequency counts that were based on spontaneous speech corpora from 45 native speakers of German (Schmid 2002; corpus size: 186,858 words). It was found that of the 16,292 unambiguous sentences 32% had a simple finite verb in second or first position, 26% had a finite verb in final position, and 41% had a finite verb or auxiliary in second or first position and a nonfinite verbal element in final position. These frequency counts indicate that discontinuous verb placement is the most frequent pattern and that, overall, verbal elements are common both in second or first position and in final position, with similar frequencies. With respect to the placement of finite verb forms, however, there is a clearly dominant pattern: 74% of the finite verb forms appear in second or first position, and only 26% in final position. Thus, the possibility that the SVf O order preference in sentence comprehension is due to frequency cannot be ruled out.
Consider, finally, the possibility that the observed differences between SVf O and SOVf are due to different working-memory demands of these two word orders. In on-line comprehension, sentences are comprehended incrementally, and upon encountering a given constituent the parser makes predictions as to what the next constituent might be. The longer a predicted constituent must be kept in memory before the prediction is satisfied, the higher the memory cost (Gibson 1998).
Note furthermore that in the grammar of German, the subject is closely related to verb finiteness, as reflected by case marking and subject-verb agreement, and that in both the SOVf and SVf O sentences tested by Weyerts et al., the first NP which the parser encounters is the subject NP. It is, therefore, conceivable that once the subject NP is processed, the parser predicts a finite verbal element, i.e. I(nfl) or T(ense), by virtue of the head-dependent relationship between the subject and the finite verb within IP. This prediction is made for all the sentences tested in experiments 1–3, since they all have initial subjects. The important difference between SVf O and SOVf sentences is, however, that only in the former are the finite verb and the subject immediately adjacent, which may lead to higher memory costs for SOVf sentences. In such sentences, a finite verb is predicted once the subject is encountered, and this needs to be retained in working memory while the object is processed. In an SVf O structure, by contrast, the finite verb is also predicted upon encountering the subject, but it does not have to be retained in memory until the end of the clause. Thus, SVf O sentences are likely to consume less memory effort than SOVf structures, and hence the ERP effects and the longer reading times Weyerts et al. found in their experiments.
Summarizing this section, the case discussed has revealed an order preference in German sentence comprehension. Converging evidence comes from studies of child language acquisition and people with language impairments. On the other hand, the possibility that the SVf O preference is due to external factors (frequency, working-memory demands) could not be excluded. Consequently, the experimental results do not decide between the conflicting syntactic analyses of German clause structure mentioned above. It should be emphasized, however, that this conclusion applies to the particular case I discussed, and not to results from sentence processing studies in general. There are indeed results from sentence processing studies that have been argued to provide evidence for or against a particular syntactic analysis. To mention a few examples, Nakano et al. (2002) claimed that their experimental results support configurational analyses of Japanese clause structure. Featherston et al. (2000) argued that their ERP results support a movement-based analysis of raising constructions, and Gibson & Warren’s (2004) results provide experimental evidence for the notion of intermediate gaps or traces. These cases need to be examined with care using the three criteria mentioned above, and it may very well turn out that some of these findings do indeed provide decisive evidence for a specific syntactic analysis.
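The two external factors discussed in this section, frequency and working-memory demands, can be illustrated with a short sketch. The percentages are those reported in the text; everything else is an assumption made for illustration. In particular, `retention_cost` is a hypothetical toy measure that simply counts the constituents processed between the subject and the finite verb, and it is not Gibson's (1998) actual cost metric.

```python
# Shares of the 16,292 unambiguous sentences, as reported in the text (percent).
CORPUS_SHARES = {
    "finite_second_or_first": 32,  # simple finite verb in second or first position
    "finite_final": 26,            # finite verb in final position
    "discontinuous": 41,           # finite verb/auxiliary early, nonfinite element final
}

def finite_verb_position_shares(shares):
    """Share of finite verb forms in second/first vs. final position (rounded percent)."""
    early = shares["finite_second_or_first"] + shares["discontinuous"]
    final = shares["finite_final"]
    total = early + final
    return round(100 * early / total), round(100 * final / total)

def retention_cost(order):
    """Toy measure: constituents intervening between the subject and the finite verb.

    Assumes, as in the text, that a finite verb is predicted at the subject and
    that the prediction is held in memory until the finite verb is encountered.
    """
    subject = order.index("S")
    finite_verb = order.index("Vf")
    return max(0, finite_verb - subject - 1)

print(finite_verb_position_shares(CORPUS_SHARES))
print(retention_cost(["S", "Vf", "O"]), retention_cost(["S", "O", "Vf"]))
```

On these assumptions both factors point in the same direction as the observed SVf O preference, which is why neither can be excluded as an alternative explanation.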
5. Studies of language impairment and the analysis of passives and binding

Research on language impairments investigates pairings of intact and impaired linguistic skills in different kinds of language-disordered populations and specifically asks whether it is possible to explain language impairments in terms of selective deficits within the linguistic system itself. Selective linguistic deficits are also potentially relevant for the theoretical linguist, at least for those who are willing to adopt what Grodzinsky (1990: 111) called the breakdown-compatibility criterion, according to which a linguistic theory or analysis is to be preferred if it can account for patterns of impairment and sparing of linguistic ability in a natural, non-ad-hoc way. The strongest
evidence we may get from studies of language impairments comes from so-called double dissociations, i.e. cases in which for two phenomena A and B, A is impaired in one population (where B is spared), and B is impaired in another population (where A is spared). Double dissociations indicate that the two phenomena in question are supported by different mental representations or mechanisms, and this may provide crucial evidence for evaluating competing linguistic accounts (see also Penke & Rosenbach 2004: 501f.). The specific case I will consider here to illustrate the use of evidence from language impairments concerns two phenomena, passives and anaphoric binding, for which conflicting analyses have been proposed in the theoretical literature. With respect to passives, there are two broad approaches, one based on syntactic movement and the other on lexical derivation. In transformational accounts (Chomsky 1981, 1995), passive participles are claimed to be unable to assign objective case to their internal arguments, resulting in movement of the direct object to subject position where it can be assigned nominative case. As illustrated in (4), object-to-subject movement leaves behind a phonologically silent copy of the object (= trace) that is coindexed with the moved object and is assigned a thematic role by the verb. The moved direct object and its trace form an A(rgument)-chain in that both elements are in argument positions of the same verb (locality), the moved element c-commands its trace, and they share the same syntactic features (chain uniformity); see Chomsky (1995: 270ff.).
(4) [[The fish]i is [[eaten ti]] [by the man]]
In other syntactic frameworks, e.g. Lexical Functional Grammar (LFG, see Bresnan 1982) and Head-driven Phrase Structure Grammar (HPSG, see Pollard & Sag 1994), the derivation of the passive does not involve any syntactic movement. Instead, the passive verb is considered to be lexically derived from the active verb, such that the thematic role assigned to the direct object of the active is assigned to the subject of the passive; see Blevins (2003b) for recent discussion. The notion of ‘binding’ refers to sentence-internal referential dependencies between anaphoric elements (including pronouns and reflexives) and their antecedents. The original version of the Binding Theory (Chomsky 1981) consists of three principles: Principle A states that a reflexive pronoun must be bound by a local antecedent within the same clause, Principle B says that a non-reflexive pronoun may not be syntactically bound by a local antecedent, and Principle C says that names and other referential expressions must not be bound. These principles are specifically stipulated for binding phenomena, which are claimed to be determined at the level of syntactic representations. Alternatively, it has been argued that binding of (non-reflexive) pronouns is based on semantic interpretation whereas binding of reflexives should be defined in syntactic terms (Pollard & Sag 1992; Sells 1991; Reinhart & Reuland 1993). Moreover, attempts have been made to derive the locality conditions on binding from independent syntactic principles rather than stipulating them as separate binding principles (Hornstein 2001; Reuland 2001). Adopting Chomsky’s (1995) feature checking account, Reuland (2001) shows that the dependency between a reflexive pronoun and its antecedent (e.g. John believes that Mary_i likes herself_i) forms an A-chain, as both are in argument positions, have the same syntactic features, and the antecedent c-commands the reflexive. This is not the case for non-reflexive pronouns, as syntactic (phi-) features of the local antecedent and the pronoun may differ (e.g. John_i believes that Mary likes him_i). To show how findings from language impairments might bear on the controversial nature of passives and binding phenomena, I will report the results from a study (Ring & Clahsen 2005a) investigating these phenomena in people with Down’s Syndrome (DS) in comparison to people with Williams Syndrome (WS). DS is the most common identifiable cause of intellectual disability, accounting for approximately 20% of the mentally handicapped population. DS is caused by an extra copy of a segment of Chromosome 21 that is associated with specific physical features and cognitive delay.
Previous studies (see Tager-Flusberg 1999, Ring & Clahsen 2005b for review) have indicated that language abilities are relatively more impaired than other areas of cognition in this population, and that within the language system, morphosyntax is more impaired than other linguistic domains. Several studies have reported asynchronous patterns of linguistic development in DS, e.g. enhanced levels of lexical skill relative to reduced levels of morphosyntax. Finally, there are studies of DS that discovered patterns of morphosyntactic skill that are qualitatively different from those observed in normally developing children (Fabretti et al. 1997; Perovic 2004). These results suggest the possibility of a selective within-language impairment in people with DS. WS is a rare genetic disorder associated with learning difficulties and relative strength in language. It is caused by a microdeletion on the long arm of chromosome 7 at 7q11.23, which affects one allele of the elastin gene and other contiguous genes. Within cognitive skills, there is a
spatial disorder, for example in drawing. The development of language is also uneven but there is dispute about the actual performance on language tasks and the best theoretical interpretation of this performance (Bartke & Siegmüller 2004). Ring & Clahsen (2005a) investigated 8 adolescents diagnosed with DS, 10 adolescents with WS, and control groups of children whose chronological ages were closely matched to the mental ages of the impaired participants but who had no known learning impairments. The impaired participants’ mental ages were derived from full IQ scores on the Wechsler Intelligence Scale for Children (Wechsler 1992). The purpose of matching the participant groups on mental age was to control for their level of intellectual development. Anaphoric binding was tested using the sentence-picture judgment task STOP (Syntactic Test of Pronominal Reference, van der Lely & Stollwerck 1997), in which a picture seen by the participants either matched the contents of a yes-no question spoken by the experimenter (requiring a yes response) or did not match (requiring a no response). For example, the child was presented with a picture of Mowgli and Baloo Bear in which Mowgli was tickling Baloo Bear. An introductory sentence was spoken by the experimenter (This is Mowgli, this is Baloo Bear) which was followed by the experimental sentence Is Mowgli tickling him/himself?, to which the children had to reply ‘yes’/’no’ respectively. The comprehension of active and passive sentences was examined using the sentence-picture matching task TAPS (Test of Active and Passive Sentences, van der Lely 1996), in which participants listened to sentences and were required to indicate for each sentence which one of four pictures most closely matched its contents. The sentences contained action verbs and animate arguments. 
There were four conditions: (i) active transitive (The man eats the fish), (ii) full verbal passive (The fish is eaten by the man), (iii) short progressive passive (The fish is being eaten), (iv) ambiguous (stative or eventive) passive (The fish is eaten). The pictures presented for each sentence depicted four different responses: (i) transitive (a man eating a fish), (ii) reversal (a fish eating a man), (iii) adjectival (an eaten fish on a plate), (iv) semantic distracter (the remains of a man). The results from Ring & Clahsen (2005a) are summarized in Table 3, which shows percentages of correct responses for the different conditions of the two experiments in the three participant groups.
Table 3. Percentages correct in actives/passives experiment and in binding experiment for participants with Down’s Syndrome (DS), Williams Syndrome (WS), and unimpaired controls (CTR)
                              DS      WS      CTR
Active/passive experiment
  Actives                     76.1    90.0    94.4
  Passives                    54.5    81.7    88.9
Binding experiment
  Reflexives                  54.1    92.5    90.6
  Non-reflexives              84.6    96.3    92.1
The WS participants performed almost perfectly in both experiments, and there were no statistically significant differences between the accuracy scores of the WS participants and the unimpaired controls in any condition. Moreover, the types of (occasional) error were also similar to those given by the controls. These results indicate that the grammatical mechanisms for correctly interpreting passives and sentences with reflexive and non-reflexive pronouns are not affected in WS, at least not beyond a general developmental delay (see also Clahsen & Almazán 1998). The results for the DS group were clearly different. The DS participants had significantly higher accuracy scores for non-reflexive pronouns than for reflexive ones, whereas for the control children there was no such difference. Between-group comparisons showed that the DS participants performed significantly worse than the unimpaired controls on the reflexive conditions, whereas there were no statistically reliable differences between the DS and the control participants for non-reflexive pronouns. These results indicate that the interpretation of sentences with reflexive pronouns causes particular difficulties for the DS participants. With respect to active and passive sentences, the DS participants’ accuracy scores for actives were significantly higher than for passives, and the DS participants gave significantly more reversal responses than the controls, i.e., they incorrectly took the first NP they heard to be the agent argument. Thus, taken together, the DS participants experienced difficulty interpreting passives and sentences with reflexive pronouns, while they performed better in active sentences and in sentences with non-reflexive pronouns. By contrast, the WS participants appeared to be unimpaired in these domains.
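The within-group contrasts described above can be checked descriptively against the percentages in Table 3. The following is a minimal sketch in Python, not part of the original study: the dictionary layout and the helper name `gap` are assumptions introduced here for illustration, and the differences it prints are simple percentage-point gaps that say nothing about statistical significance, which is reported only in Ring & Clahsen (2005a).

```python
# Percentages correct as given in Table 3 (Ring & Clahsen 2005a).
# DS = Down's Syndrome, WS = Williams Syndrome, CTR = unimpaired controls.
# The dictionary layout is an illustrative assumption, not the authors' format.
TABLE_3 = {
    "DS":  {"actives": 76.1, "passives": 54.5,
            "reflexives": 54.1, "non_reflexives": 84.6},
    "WS":  {"actives": 90.0, "passives": 81.7,
            "reflexives": 92.5, "non_reflexives": 96.3},
    "CTR": {"actives": 94.4, "passives": 88.9,
            "reflexives": 90.6, "non_reflexives": 92.1},
}


def gap(group, condition_a, condition_b):
    """Percentage-point difference (a minus b) for one group, rounded to 0.1."""
    scores = TABLE_3[group]
    return round(scores[condition_a] - scores[condition_b], 1)


for group in TABLE_3:
    print(
        group,
        "actives - passives:", gap(group, "actives", "passives"),
        "| non-reflexives - reflexives:",
        gap(group, "non_reflexives", "reflexives"),
    )
```

On these figures the DS group shows by far the largest gap in both experiments, while the WS and control gaps stay below ten percentage points, which is the descriptive pattern the text summarizes.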
What are the implications of these findings for the conflicting analyses of binding and passivization outlined above? To address this question, consider first potential confounding factors. One important factor is the low level of general intelligence in DS, which could mean that the difficulties with passives and reflexives are the result of a broader non-linguistic impairment. Note, however, that the WS participants achieved high accuracy scores in both experiments, even though they also had low IQs, in the same range as the DS participants, suggesting that the difficulties with passives and reflexive binding in DS cannot straightforwardly be attributed to their (low levels of) general intelligence.
Another possibly confounding factor is that language development is delayed in DS and that the patterns seen for the DS participants may represent an early stage of normal acquisition. It is true that grammatical development in the DS participants was severely delayed, as revealed by the standardized ‘Test for Reception of Grammar’ (TROG, Bishop 1982), in which the DS participants achieved a score equivalent to that of 4;6-year-old unimpaired children. Note, however, that this developmental delay does not account for the specific patterns of impairment in DS for binding and passives. With respect to binding, many studies have shown that typically developing children display adult-like comprehension of sentences with reflexives from about 3 years of age (McKee 1992), while even 4-year-old children incorrectly take a non-reflexive pronoun to be bound by its local antecedent (Thornton & Wexler 1999). This is the opposite of the pattern seen in DS. Indeed, the contrast found in DS between correct non-reflexive and impaired reflexive pronoun interpretation has not been witnessed before in any study of anaphoric binding with unimpaired children of any age, indicating that, at least in this domain, linguistic development in DS is not simply delayed.
Likewise, studies of passivization in young children have shown that typically developing children comprehend the kinds of passive sentences Ring & Clahsen tested by at least 3;6–4 years of age (Guasti 2002: 269), which is in contrast to the low level of performance seen for the DS participants. Thus, we can rule out the DS participants’ low IQs and their general delay of language development as confounding factors for their specific difficulties with reflexive binding and passives.
Our next concern is whether there is any converging evidence for the findings obtained by Ring & Clahsen (2005a) from other studies of DS. For binding, there is one study (Perovic 2004) testing four young adults with DS (age range: 17;2–20;7 years) in a picture truth-value judgement task similar to the one Ring & Clahsen used. Perovic’s participants achieved near-perfect accuracy scores (> 90%) in sentences containing non-reflexive pronouns and much lower scores (< 60%) in sentences with reflexives, a pattern parallel to the one Ring & Clahsen obtained. Bridges & Smith (1984) tested the comprehension of passives by 24 young adults with DS and 24 nonretarded children matched to the DS children and found accuracy scores of over 80% on actives (similarly to controls) and of around 50% on passives, a score that was significantly lower than the ones for 4;6–5-year-old controls. These results provide converging evidence that reflexive binding and passivization are specifically impaired in DS.
Consider, finally, the syntactic accounts of binding and passives described above in the light of the patterns of impairment seen in DS. The results for binding in DS (compared to younger unimpaired children) revealed a double dissociation. In DS, binding of reflexives (but not of non-reflexives) is impaired; in 3–4-year-old normal children the reverse was found: accurate performance on reflexives and difficulties interpreting sentences with non-reflexive pronouns. Double dissociations are an indication that the two phenomena in question are independent and supported by different mental representations or mechanisms. Thus, the results on binding are not in line with standard Binding Theory, according to which conditions on reflexives (= Principle A) and conditions on non-reflexive pronouns (= Principle B) are both determined at the level of syntactic representations. Instead, the results provide evidence for the view that reflexive and non-reflexive binding involve different kinds of representation (see e.g. Pollard & Sag 1992; Sells 1991; Reinhart & Reuland 1993). The second finding was that in DS, impaired reflexive binding coincides with impairments in passives, i.e. low accuracy scores on passive sentences and many incorrect reversal responses.
This finding is more compatible with theoretical accounts such as Reuland (2001) that posit the same syntactic mechanism (= A-chains) for both passives and reflexive binding than with theories according to which passivization and reflexive binding do not have much in common. From the perspective of Reuland’s theory, the pattern of impairment in DS can be ascribed to a common source (= impaired A-chain formation) which affects both passives and reflexive binding.
In summary, the most important finding reported here is the double dissociation between reflexive and non-reflexive binding. It was also found that impairments in reflexive binding were correlated with impairments in passives. Converging evidence from different studies of DS was reported, and a number of potentially confounding factors for this pattern of impairment (low IQs, developmental delay) could be excluded. While these results are suggestive of a specific impairment of A-chain formation in DS, further empirical studies are required to determine whether the impairment generalizes to other phenomena that involve A-chains, e.g. raising constructions (John seems to be a nice guy), to-infinitives (John is believed to be a nice guy) and unaccusatives (The book arrived yesterday). Moreover, it is possible that the observed pattern of impairment for reflexives and passives is part of a broader deficit extending to A’-dependencies, which needs to be studied through tasks investigating wh-constructions and relative clauses.
6. Conclusion
This chapter addressed the question of the potential use of psycholinguistic evidence for theoretical linguists focusing on the nature of grammatical representations. I argued that some common ground is required for psycholinguistic findings to bear on linguistic theories and analyses, and I suggested that the search for the most appropriate mental representations for language provides such common ground. I also pointed out that results from psycholinguistic studies need to be examined with care before being used as evidence for grammatical representations. Three criteria were set out to evaluate the potential theoretical implications of psycholinguistic findings. We should ask whether there are any confounding factors or alternative explanations for a given psycholinguistic result, whether there is converging evidence from other sources, and whether a given finding confirms or disconfirms a specific linguistic account. Three case studies were presented in which these criteria were applied, one from language acquisition, one from language processing, and one from language disorders. My conclusion from these case studies is that psycholinguistic findings do indeed provide evidence that the theoretical linguist may find useful (along with other sources of evidence) in developing descriptive and theoretical analyses for a given set of phenomena and that psycholinguistic results may even help to adjudicate between competing theoretical accounts.
Acknowledgements
I am grateful to my colleagues Robert Borsley and Claudia Felser for useful comments on an earlier version of this chapter.
Note
1. Moreover, many German participles are prefixed with ge-. This prefixation, however, is prosodically determined depending on the stress pattern of the verbal stems: ge- occurs when the verbal stem is stressed on the first syllable. Since German verbal stems are often stressed on the first syllable, the (unstressed) ge- prefix is highly frequent. Note, furthermore, that the choice of the prefix is irrelevant for the morphological distinction between regular and irregular inflection, as it occurs in both regular (-t) and irregular (-n) participles.
References
Aronoff, Mark 1994 Morphology by Itself. Cambridge, MA: MIT Press.
Baayen, Harald, Richard Piepenbrock & H. van Rijn 1993 The CELEX lexical database (CD ROM). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania.
Bartke, Susanne & Julia Siegmüller (eds.) 2004 Williams Syndrome Across Languages. Amsterdam: Benjamins.
Bartke, Susanne, Gary Marcus & Harald Clahsen 1996 Acquiring German noun plurals. In Proceedings of the 19th Annual Boston University Conference on Language Development, D. MacLaughlin & S. McEwen (eds.), 60–69. Boston: Cascadilla Press.
Bishop, Dorothy 1982 Test for Reception of Grammar (TROG). London: Medical Research Council.
Blevins, James P. 2003a Stems and paradigms. Language 79: 737–767.
Blevins, James P. 2003b Passives and impersonals. Journal of Linguistics 39: 473–520.
Booij, Geert 2002 The Morphology of Dutch. Oxford: Oxford University Press.
Bresnan, Joan 1982 The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press.
Bridges, Allayne & Joanne Smith 1984 Syntactic comprehension in Down’s syndrome children. British Journal of Psychology 75: 187–96.
Bybee, Joan L. 1995 Regular morphology and the lexicon. Language and Cognitive Processes 10: 425–455.
Bybee, Joan L. 1999 Use impacts morphological representation. [Commentary to Clahsen 1999]. Behavioral and Brain Sciences 22: 1016–1017.
Caramazza, Alfonso 1984 The logic of neuropsychological research and the problem of patient classification in aphasia. Brain and Language 21: 9–20.
Chialant, Doriana & Alfonso Caramazza 1995 Where is morphology and how is it processed? The case of written word recognition. In Morphological Aspects of Language Processing, Laurie Beth Feldman (ed.), 55–76. Hillsdale, NJ: Erlbaum.
Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam 1995 The Minimalist Program. Cambridge, MA: MIT Press.
Clahsen, Harald 1990 Constraints on parameter setting: A grammatical analysis of some acquisition stages in German child language. Language Acquisition 1: 361–391.
Clahsen, Harald 1997 The representation of participles in the German mental lexicon: evidence for the dual-mechanism model. Yearbook of Morphology 1996: 73–96.
Clahsen, Harald 1999 Lexical entries and rules of language: a multi-disciplinary study of German inflection. Behavioral and Brain Sciences 22: 991–1060.
Clahsen, Harald & Martina Penke 1992 The acquisition of agreement morphology and its syntactic consequences. In The Acquisition of Verb Placement, Jürgen M. Meisel (ed.), 181–223. Dordrecht: Kluwer.
Clahsen, Harald & Monika Rothweiler 1993 Inflectional rules in children’s grammars: evidence from the development of participles in German. Yearbook of Morphology 1992: 1–34.
Clahsen, Harald, Susanne Bartke & Sandra Göllner 1997 Formal features in impaired grammars: a comparison of English and German SLI children. Journal of Neurolinguistics 10: 151–171.
Clahsen, Harald & Mayella Almazán 1998 Syntax and morphology in children with Williams Syndrome. Cognition 68: 167–198.
Clahsen, Harald & Sam Featherston 1999 Antecedent priming at trace positions: evidence from German scrambling. Journal of Psycholinguistic Research 28: 415–437.
Clahsen, Harald, Peter Prüfert, Sonja Eisenbeiss & Joana Cholin 2002a Strong stems in the German mental lexicon: evidence from child language acquisition and adult processing. In More than Words: A Festschrift for Dieter Wunderlich, Ingrid Kaufmann & Barbara Stiebels (eds.), 91–112. Berlin: Akademie-Verlag.
Clahsen, Harald, Fraibet Aveledo & Iggy Roca 2002b The development of regular and irregular verb inflection in Spanish child language. Journal of Child Language 29: 591–622.
Clahsen, Harald, Meike Hadler & Helga Weyerts 2004 Speeded production of inflected words in children and adults. Journal of Child Language 31: 683–712.
Clahsen, Harald, Monika Lück & Anja Hahne 2007 How children process overregularizations: evidence from event-related brain potentials. Journal of Child Language 34: 601–622.
Eisenbeiss, Sonja 2002 Merkmalsgesteuerter Grammatikerwerb. Unpublished Dissertation. University of Düsseldorf.
Fabretti, D., E. Pizzuto, S. Vicari & V. Volterra 1997 A story description task in children with Down’s syndrome: lexical and morphological abilities. Journal of Intellectual Disability Research 41: 165–179.
Fanselow, Gisbert 1988 German word order and universal grammar. In Natural Language Parsing and Linguistic Theories, Ulrich Reyle & Wolfgang Sternefeld (eds.), 317–355. Dordrecht: Reidel.
Farina, Juan Carlos Acuna 2005 Aspects of the relationship between theories of grammar and theories of processing. Atlantis 27: 11–27.
Featherston, Sam, Thomas Münte, Michael Gross & Harald Clahsen 2000 Brain potentials in the processing of complex sentences: an ERP study of control and raising constructions. Journal of Psycholinguistic Research 29: 141–154.
Federmeier, Kara D., Jessica B. Segal, Tania Lombrozo & Martha Kutas 2000 Brain responses to nouns, verbs and class-ambiguous words in context. Brain 12: 2552–2566.
Felser, Claudia, Thomas Münte & Harald Clahsen 2003 Storage and integration in the processing of filler-gap dependencies: An ERP study of topicalization and wh-movement in German. Brain and Language 87: 345–354.
Friederici, Angela D. 2002 Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences 6: 78–84.
Gibson, Edward 1998 Linguistic complexity: locality of syntactic dependencies. Cognition 68: 1–76.
Gibson, Edward & Tessa Warren 2004 Reading-time evidence for intermediate linguistic structure in long-distance dependencies. Syntax 7: 55–78.
Grewendorf, Günther 1988 Aspekte der deutschen Syntax. Tübingen: Narr.
Grodzinsky, Yosif 1990 Theoretical Perspectives on Language Deficits. Cambridge, MA: MIT Press.
Guasti, Teresa 2002 Language Acquisition: The Growth of Grammar. Cambridge, MA: MIT Press.
Haider, Hubert 1993 Deutsche Syntax – generativ. Tübingen: Narr.
Halle, Morris & Karuvannur P. Mohanan 1985 Segmental phonology of modern English. Linguistic Inquiry 16: 57–116.
Halle, Morris & Alec Marantz 1993 Distributed Morphology and the pieces of inflection. In The view from building 20: essays in linguistics in honor of Sylvain Bromberger, K. Hale & J. Keyser (eds.), 111–176. Cambridge, MA: MIT Press.
Hornstein, Norbert 2001 Move! A Minimalist Theory of Construal. Oxford: Blackwell.
Jackendoff, Ray 1997 The Architecture of the Language Faculty. Cambridge, MA: MIT Press.
Kaan, Edith 1997 Processing Subject-Object Ambiguities in Dutch. PhD. thesis: University of Groningen. (Groningen Dissertations in Linguistics 20.)
Kathol, Andreas 1990 A uniform approach to V2 in German. In Proceedings of the Northeast Linguistic Society 20: 244–254.
Kutas, Marta & Bernadette M. Schmitt 2003 Language in microvolts. In Mind, Brain, and Language, Marie T. Banich & Molly Mack (eds.), 171–209. Mahwah, NJ: Erlbaum.
Love, Tracy & David Swinney 1996 Coreference processing and levels of analysis in object-relative constructions: Demonstration of antecedent reactivation with the cross-modal priming paradigm. Journal of Psycholinguistic Research 25: 5–24.
Marcus, Gary, Ursula Brinkmann, Harald Clahsen, Richard Wiese & Steven Pinker 1995 German inflection: the exception that proves the rule. Cognitive Psychology 29: 189–256.
Marcus, Gary, Steven Pinker, Michael Ullman, Michelle Hollander, John Rosen & Fei Xu 1992 Overregularization in Language Acquisition. Chicago: Monographs of the Society for Research in Child Development.
McKee, Cecile 1992 A comparison of pronouns and anaphors in Italian and English acquisition. Language Acquisition 2: 21–54.
Miller, George A. & Noam Chomsky 1963 Finitary models of language users. In Handbook of Mathematical Psychology. Vol. II, R. D. Luce, R. R. Bush & E. Galanter (eds.), 419–492. New York: Wiley.
Nakano, Yoko, Claudia Felser & Harald Clahsen 2002 Antecedent priming at trace positions in Japanese long-distance scrambling. Journal of Psycholinguistic Research 31: 531–571.
Osterhout, Lee 2004 Sentences in the brain: Event-related potentials as real-time reflections of sentence comprehension and language learning. In The On-line Study of Sentence Comprehension: Eyetracking, ERP, and Beyond, M. Carreiras & C. Clifton (eds.), 271–308. Psychology Press.
Penke, Martina 1998 Die Grammatik des Agrammatismus: Eine linguistische Untersuchung zu Wortstellung und Flexion bei Broca-Aphasie. Tübingen: Niemeyer.
Penke, Martina 2001 Controversies about CP: A comparison of Language Acquisition and Language Impairments in Broca’s Aphasia. Brain and Language 77: 351–363.
Penke, Martina 2006 The representation of inflectional morphology in the mental lexicon. An overview of psycho- and neurolinguistic methods and results. In Advances in the Theory of the Lexicon, D. Wunderlich (ed.), 389–428. Berlin/New York: Mouton de Gruyter.
Penke, Martina & Anette Rosenbach 2004 What counts as evidence in linguistics? Studies in Language 28: 480–526.
Perovic, A. 2004 Knowledge of binding in Down syndrome: Evidence from English and Serbo-Croatian. Unpublished Ph.D. dissertation, University College London.
Phillips, Colin 1996 Order and Structure. Unpublished PhD dissertation, MIT, Cambridge, MA.
Pinker, Steven 1984 Language Learnability and Language Development. Cambridge, MA: Harvard University Press.
Pinker, Steven 1999 Words and Rules. New York, NY: Basic Books.
Pinker, Steven & Michael Ullman 2002 The past and future of the past tense. Trends in Cognitive Sciences 6: 456–462.
Poeppel, David & Ken Wexler 1993 The full competence hypothesis of clause structure in early German. Language 69: 1–33.
Pollard, Carl & Ivan Sag 1992 Anaphors in English and the scope of Binding Theory. Linguistic Inquiry 23: 261–303.
Pollard, Carl & Ivan Sag 1994 Head-Driven Phrase Structure Grammar. Chicago: University of Chicago Press.
Reinhart, Tanya & Eric Reuland 1993 Reflexivity. Linguistic Inquiry 28: 178–187.
Reis, Marga 1985 Satzeinleitende Strukturen im Deutschen. Über COMP, Haupt- und Nebensätze, w-Bewegung und die Doppelkopfanalyse. In Erklärende Syntax des Deutschen, Werner Abraham (ed.), 271–311. Tübingen: Narr.
Reuland, Eric 2001 Primitives of binding. Linguistic Inquiry 32: 439–492.
Ring, Melanie & Harald Clahsen 2005a Distinct patterns of language impairment in Down’s Syndrome and Williams Syndrome: The case of syntactic chains. Journal of Neurolinguistics 18: 479–501.
Ring, Melanie & Harald Clahsen 2005b Morphosyntax in Down’s Syndrome: Is the extended optional infinitive hypothesis an option? Stem- Spraak- En Taalpathologie 13: 3–13.
Rizzi, Luigi 2000 Remarks on early null subjects. In The Acquisition of Syntax, Marc-Ariel Friedemann & Luigi Rizzi (eds.), 269–292. Harlow: Pearson Education.
Rumelhart, David E. & James L. McClelland 1986 On learning the past tenses of English verbs. In Parallel Distributed Processing. Vol. 2, James L. McClelland, David E. Rumelhart & the PDP Research Group (eds.), 216–271. Cambridge, MA: MIT Press.
Saffran, Eleanor 1982 Neuropsychological approaches to the study of language. British Journal of Psychology 73: 317–337.
Sag, Ivan & Janet Fodor 1995 Extraction without traces. In Proceedings of the thirteenth annual meeting of the West Coast Conference on Formal Linguistics, 365–384. Stanford: CSLI Publications.
Say, Tessa & Harald Clahsen 2002 Words, rules and stems in the Italian mental lexicon. In Storage and Computation in the Language Faculty, Sieb Nooteboom, Fred Weerman & Frank Wijnen (eds.), 93–122. Dordrecht: Kluwer.
Schlesewsky, Matthias, Ina Bornkessel & Martin Meyer 2002 Why a ‘word order difference’ is not always a ‘word order’ difference: A reply to Weyerts, Penke, Münte, Heinze & Clahsen. Journal of Psycholinguistic Research 31: 437–445.
Schmid, Monika 2002 First Language Attrition, Use and Maintenance: The case of German Jews in Anglophone Countries. Amsterdam: Benjamins.
Schreuder, Robert & Harald Baayen 1995 Modeling morphological processing. In Morphological Aspects of Language Processing, Laurie Beth Feldman (ed.), 131–154. Hillsdale, NJ: Erlbaum.
Schütze, Carson 1996 The Empirical Base of Linguistics. Grammaticality Judgments and Linguistic Methodology. Chicago: Chicago University Press.
Schütze, Carson 2004 Thinking about what we are asking speakers to do. In Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, Stephan Kepser & Marga Reis (eds.), 457–485. Berlin: Mouton de Gruyter.
Schwartz, Bonnie & Sten Vikner 1996 All verb-second clauses are CPs. In Parameters and Functional Heads, Adriana Belletti & Luigi Rizzi (eds.), 11–61. Oxford: Oxford University Press.
Sells, Peter 1991 Disjoint reference into NP. Linguistics and Philosophy 14: 151–169.
Sereno, Joan & Allard Jongman 1997 Processing of English inflectional morphology. Memory and Cognition 25: 425–437.
Sonnenstuhl, Ingrid, Sonja Eisenbeiss & Harald Clahsen 1999 Morphological priming in the mental lexicon: evidence from German. Cognition 72: 203–236.
Stemberger, Joseph P. 1999 Frequency determines defaults in German: default perfect -t versus irregular plural -s. [Commentary to Clahsen 1999]. Behavioral and Brain Sciences 22: 1040–1041.
Tager-Flusberg, Helen 1999 Language development in atypical children. In The Development of Language, M. Barrett (ed.), 311–348. Hove: Psychology Press.
Thiersch, Craig 1978 Topics in German Syntax. Unpublished Ph.D. dissertation. MIT, Cambridge, MA.
Thornton, Ross & Ken Wexler 1999 Principle B, VP Ellipsis, and Knowledge of Binding. Cambridge, MA: MIT Press.
Travis, Lisa 1984 Parameters and Effects of Word Order Variation. Unpublished Ph.D. dissertation, MIT, Cambridge, MA.
Travis, Lisa 1991 Parameters of phrase structure and V2 phenomena. In Principles and Parameters in Comparative Grammar, Robert Freidin (ed.), 339–364. Cambridge, MA: MIT Press.
van der Lely, Heather 1996 Specifically language impaired and normally developing children: verbal passive vs. adjectival passive sentence interpretation. Lingua 98: 243–272.
van der Lely, Heather & Linda Stollwerck 1997 Binding theory and specifically language impaired children. Cognition 62: 245–290.
von Stechow, Arnim & Wolfgang Sternefeld 1988 Bausteine syntaktischen Wissens. Opladen: Westdeutscher Verlag.
Wechsler, David 1992 Wechsler Intelligence Scale for Children-Third Edition UK. The Psychological Corporation. Sidcup, Kent: Harcourt.
Weinberg, Amy 1999 A minimalist theory of human sentence processing. In Working Minimalism, Sam Epstein & Norbert Hornstein (eds.), 283–315. Cambridge, MA: MIT Press.
Wenzlaff, Michaela & Harald Clahsen 2005 Finiteness and verb-second in German agrammatism. Brain and Language 92: 33–44.
Weyerts, Helga 1997 Reguläre und irreguläre Flexion: psycholinguistische und neurophysiologische Ergebnisse zu Erwerb, Verarbeitung und mentaler Repräsentation. Dissertation, University of Düsseldorf.
Weyerts, Helga & Harald Clahsen 1994 Netzwerke und symbolische Regeln im Spracherwerb: Experimentelle Ergebnisse zur Entwicklung der Flexionsmorphologie. Linguistische Berichte 154: 430–460.
Weyerts, Helga, Martina Penke, Thomas Münte, Hans-Jürgen Heinze & H. Clahsen 2002 Word order in sentence processing: An experimental study on verb placement in German. Journal of Psycholinguistic Research 31: 211–268.
Wunderlich, Dieter 1996 Minimalist Morphology: the role of paradigms. Yearbook of Morphology 1995: 93–114.
Yang, Charles D. 2003 Knowledge and Learning in Natural Language. Oxford: Oxford University Press.
Zwart, Jan-Wouter 1993 Dutch Syntax: A Minimalist Approach. Unpublished Ph.D. dissertation, University of Groningen.
Early language separation: A longitudinal study of a Russian-German bilingual child
Elena Dieser
Mother: Mjachik do glazka doletel (R)1. ‘The ball went as high as the peep-hole [in the door].’
Child: Net (R) eshchjo (R) höher (G). ‘No even higher.’
Child: Vyshe (R). ‘Higher.’
Child: Znaesh’ pochemu ja inogda po-nemecki govorju? Potomu chto ja ne pomnju nekotorye slova po-russki. A po-nemecki ja znaju luchshe (R). ‘Do you know why I sometimes speak German? Because I don’t remember some Russian words. German I know better.’
(CHI, 5;6.15)
1. Introduction
Depending on the method of language input, various types of early childhood bilingualism can be distinguished, ranging from one extreme, where the child is in contact with two separate monolingual worlds, to the other pole, where the child is born into a bilingual world. At the “monolingual” end of the scale is the ‘one person – one language’ model. In this model, the parents are native speakers of two different languages and each parent speaks with the child only in their native language. Since Ronjat (1913), this ‘one person – one language’ method has been considered the best way for children to avoid mixing (Genesee, Nicoladis & Paradis 1995), and has generally been recommended as the most successful model for raising bilingual children. It is especially considered the method of choice for binational couples2.
At the “bilingual” end of the scale there is the globally most common model, called ‘mixed languages’. In this model, both parents speak both languages to the child. In an extreme variant of this type, the parents make no conscious effort to keep the two languages separated. The child’s input contains frequent language-mixing3 in one or both languages, and “sectors of community may also be bilingual” (Romaine 1996: 185). The type ‘mixed languages’ has often been regarded negatively in the literature, because of fears that it will hinder the child’s acquisition of the two languages as separate systems, and because language mixing has been thought to lead to delayed and perhaps incomplete acquisition of one or both languages.
This scale is, however, a continuum. Even in the ‘one person – one language’ model in the strictest sense, meaning that each parent speaks only one language with the child, each parent must at some time also communicate with the other parent or with other monolingual people in the presence of the child. This means that all bilingual children have at least one parent who will occasionally function bilingually, even if the parents are each monolingual4 when speaking to the child. The reverse is also true: most children who grow up according to the ‘mixed languages’ model, where the parents are bilingual, have contact with monolingual people in at least one language.
A lot of work has been done in the last 25 years on early bilingual development. The central question has been whether bilingual children separate the two languages from the very beginning of language acquisition, and especially of language production, or whether there is a one-language phase which precedes the acquisition of both languages as separate languages. An important issue has thus been how to interpret a high percentage of language-mixing at an early age. However, most of this research (Volterra & Taeschner 1978; De Houwer 1990; Paradis & Genesee 1996; van der Linden 2001 and many others) has been concerned with language acquisition using the ‘one person – one language’ model. This model, however, is not always as closely followed as its name implies. In some families, one parent is less consistent in using just one language than the other, and may speak to the child in the other language, or be more open to codeswitching in general.
The present study investigates early language separation (mainly between the ages of 2 years and 3 months and 2 years and 10 months) using data from a child acquiring Russian and German simultaneously (the first such study, to our knowledge). The input pattern from the parents tended towards the bilingual end of the scale. Both spoke to the child in their native language, Russian, and in the language of the country where they lived, German, but were very careful to separate the two languages, thus avoiding language mixing. Up to now, very few studies concerning language development in bilingual children of this particular type have been conducted (Smith 1935, Burling 1959 and Ellul 1978 come close). The main focus of this paper is to examine whether children of this type first work with one language or two, and to what extent the results of studies of the type ‘one person – one language’ differ from the results of this type. This case study provides new data from the language use of the child. The evidence which will be discussed in this paper may be broken down as follows:
1) The evidence of the language use of the child:
   a. Production of the child
      i. With bilingual speakers
      ii. With monolingual speakers
   b. Comprehension of the child
2) The evidence which the child obtains by observing his or her environment:
   a. The language use of the parents
   b. The different phonological systems of the two languages
2. The “One System Theory” vs. the “Two System Theory”
Even bilingual children whose parents consistently separate the languages usually have a high percentage of language mixing in both languages when they first start to speak. This is the reason why most early studies of bilingual language acquisition agreed that all bilingual children have access to only one language system during the early phase of language acquisition (see Leopold 1949), which then serves as the basis from which two separate language systems develop:
    Children acquiring two languages from birth or early infancy pass through a stage in which the two languages are undifferentiated to a gradual separation of the two systems. (Swain & Wesche 1975: 17)
This idea was called the “one system theory” (Redlinger & Park 1980; Vihman 1985) in bilingual research and was further developed into the Three-Phase Model by Volterra & Taeschner (1978). According to this model, during the first phase, children possess a single lexical system which contains lexemes from both languages. This implies that young, simultaneously bilingual children reject cross-language synonyms in their first lexicon. In the second phase, two lexical systems develop, but there is still only one syntactic system. Only in the third phase are the language systems separated, on both the lexical and the syntactic level. According to the one system theory, the child is considered to be ‘truly’ bilingual once he or she first becomes aware of his or her bilingualism, between the ages of 2 and 3.

Genesee (1989), however, re-examined the empirical basis for these claims and proposed that much of the data (e.g. that of Volterra & Taeschner 1978) could just as well be interpreted as consistent with the “two system theory”. He concluded that children are able to differentiate between two language systems from the early stages of language acquisition, “by the age of two, if not earlier” (see Paradis 2001). Genesee (1989: 165) does not see a high proportion of language mixing in early production as sufficient evidence that the child cannot yet distinguish the two languages, all the more so since the linguistic context is often not specified:

   “In order to uphold the unitary-system hypothesis one would need to establish that, all things being equal, bilingual children use items from both languages indiscriminately in all contexts of communication. […] In contrast, support for the differentiated-language systems hypothesis would require evidence that the children use items from their two languages differentially as a function of context”.

Building in part on these criteria, more studies in the last couple of years have endorsed the “two system theory” in different linguistic fields, such as pragmatics (Lanza 1992; Genesee, Nicoladis & Paradis 1995), the lexicon (Quay 1995), morphosyntax (Meisel 1989; Paradis & Genesee 1996) and phonology (Schnitzer & Krasinski 1994). Our study also tends to support the “two system theory”.
However, we have some doubts about the criterion that Genesee (1989: 166) suggests for the differentiated-language systems hypothesis: “…in particular, if the differentiated-language systems hypothesis were true, one would expect to find more frequent use of items from the weaker language in contexts where that language is being used than in contexts where the stronger language is being used, even though items from the stronger language might predominate in both contexts.” We do not intend to account for language acquisition in purely behaviouristic terms, but we assume that the wish to adjust to the language spoken by a conversation partner will no doubt affect the data, all the more so for a child (see the fourth column ‘repeated words’ in Tables 2–4). We should thus expect the child to reflect the immediate input and produce proportionally more of the language in which they are being addressed.

We propose the following alternative criteria as indicators that a child ‘knows’ they are working with two systems:

1) When the child can switch⁵ into monolingual mode. For us, this means that the vocabulary the child chooses when speaking with monolingual people contains a smaller proportion of language mixing than when the child is speaking this language with bilingual people;
2) When the child, in the context of one language, systematically chooses not to make use of expressions which he or she knows in the other language, even when the word is not known in the language being spoken. The urge to communicate is strong, but so is the urge to conform to the language being spoken. If the child prefers to be silent rather than use the ‘wrong’ language, because they do not know the expression in the ‘right’ language, we can conclude that this child distinguishes the two languages. This type of situation was observed regularly in this study.

In order to untangle the evidence about a child’s differentiation of two languages, we must also consider what might cause a child to conclude that he or she is dealing with two separate systems. A child brought up using the ‘one person – one language’ system will quickly differentiate the two systems for functional reasons, in order to communicate with both parents. However, this child is not forced to conclude that the two languages are alternative optional encodings. Perhaps men and women use different styles within the same system; perhaps men just speak more formally than women. Put differently, a child may conclude that there are, for example, physiological or individual reasons for the varying form of the messages; it might be related to other systematic variables. Mother feeds the child and father reads him or her books. The child has no way of distinguishing linguistic variation from a wide range of other systematic differences between family members. The child studied here, however, observed from the very beginning that his parents switched systems whenever a German speaker entered the room. Five minutes earlier, the thing they were reading was called a kniga, then it was suddenly called a Buch.
This would seem to be rather stronger evidence of the wider social reality of separate systems. In this study, the parents effectively demonstrated bilingualism to the child and “taught” him the rules of code-switching. Evidence from the child’s production in conversations with his parents might not reveal evidence of this until some time later, since there was no functional necessity for him to reduce language mixing with his parents: no communicative need to distinguish the languages. This is one of the most important reasons for the very common assumption that in the situation when one and the same person speaks two languages with a young child, language separation of these two languages, and language acquisition in general, is hindered and delayed (Romaine 1996).
2. The case study
2.1. The child

The child studied, Alex, is the son of the author. He has lived in Germany since birth. The parents are native speakers of Russian, but both speak German fluently. They spoke Russian with Alex when they were alone with him or in the presence of other Russian speakers. Whenever German speakers were part of the conversation, the parents, especially the mother, spoke German with the child. At this time, for example, the family lived together in the same home as three monolingual German speakers. The child’s maternal grandparents, who live in Germany, always spoke Russian with the child. In addition to this regular Russian input, Russian alone was spoken during a total of five annual three-week stays in Russian-speaking environments (Belarus, Russia). The child’s first conscious contact with monolingual Russian-speaking persons was at the age of 2;8. Further German input came from German-speaking friends of the child’s parents, as well as from his mother’s work colleagues. (Up to the start of preschool, the child was taken regularly by his mother to work, for about two hours a day.) Additionally, the parents regularly read books to the child from the age of one, about 4–5 times a week in Russian, and 2–3 times a week in German (by the mother only). The parents never insisted on expressions from the current active language being used, but encouraged them and provided translations if they thought the child did not know or could not remember an expression.

The linguistic input was therefore as follows (excluding the stays in a Russian-speaking environment):
– From birth until the age of 2;9: 65–70% Russian and about 30–35% German;
– From the start of preschool (2;9) to the time of writing (5;10), the input ratio has remained more or less constant: 50% Russian and 50% German.
2.2. The data

The data for this child consist of approximately 260 video recordings, each approximately 60 minutes in length, as well as diary entries covering the period from the birth of the child until the age of 5;10 (on-going).
Recordings were made at the following intervals:
– in the first year, about every 3–4 weeks;
– from the second year, approximately every 2 weeks; during the rapid lexical development phase, or “vocabulary spurt” (2;2–3;0), however, once a week or even on two or three consecutive days.

During the recordings, the mother interacted with Alex in German⁶ and Russian (the order in which the two languages were spoken varied from recording to recording); the father in Russian; Russian-speaking adults and children in Russian (from about the age of 3;0) and German-speaking adults and children in German. Additionally, recordings of the child’s “soliloquies” while playing with toys by himself were made at regular intervals from the start of language production.

Diary entries were kept by the child’s mother once a week on average, but daily during the vocabulary spurt. The diary entries supplement the video recordings and serve as an important source for the first appearances of lexical items and for the linguistic and non-linguistic context.

At the time of writing, the data up until the age of 3;10 have been systematically transcribed and analysed. Of the material from the later phase (3;11–5;10), individual German and Russian recordings at time intervals of about 4–5 months, as well as all diary entries, have been analysed. All transcripts were made using the notation model CHAT from the CHILDES project (MacWhinney 1991).
3. Findings
3.1. Comprehension

Tests of comprehension were always seen as an integral part of recordings. Emphasis was placed on comprehension before the start of intensive production. In testing comprehension, Alex was asked to either point to certain objects, especially in picture books, or to complete a certain activity, whereby his comprehension of cross-language synonyms (equivalents) on that particular day was tested. The first of these tests was carried out when the child was 1;1. It was evident from the first test that the child readily accepted the existence of cross-language synonyms. For example, at the age of 1 year 2 months and 17 days (henceforth 1;2.17), the child was asked first by his mother, in German, and then by his father, in Russian, to point
to the mouse and the elephant in the same picture book. This task was completed without any difficulty. In other tests, the child was asked to point to other animals (such as a dog, a cat, or a duck) and objects (such as a ball, a shovel, a swing) in picture books, or to point to various parts of the body, either his own or others’ (e.g. head or leg). He was also asked to take his father’s glasses and give them to his mother, to put a plate on the table, to put stickers on a mirror, and to do other similar tasks.

We may make some generalizations about the tests of comprehension. For all German words that the child understood up until the age of approximately 2 years, he also understood their Russian equivalents; however, the reverse was not true. The tests of comprehension made when the child was 1;6.13, 1;9.26 and 2;0.0 showed that the child’s passive vocabulary was composed of approximately two-thirds Russian and one-third German lexemes, in line with the ratio of linguistic input.

3.2. Production
Language production was systematically analysed from the age of about 1;0, when proto-words were first uttered. By the age of 1;10 the child used a total of 12 proto-words.

3.2.1. One-Word Stage

The child studied began to talk relatively late in both languages (2;0 < 50 words)⁷, although the development of passive vocabulary in both Russian and German had been rapid since before the age of 2;0. The first words which had the meaning of real words were produced by the child between the ages of 1;2 and 2;2. This may thus be referred to as the one-word phase. During the entire one-word phase, in contrast to other children studied whose parents were native speakers of two different languages (e.g. Quay 1995), this child used no cross-language synonyms (cf. ‘the principle of contrast’⁸ of Clark 1993 and van der Linden 2001). Table 1 shows the exact period of time between the first utterance of a particular word and the first utterance of its equivalent in the other language. It took the child from five months to a year and four months to acquire the equivalents of these words. This even included the word ‘yes’, which was said only in Russian for five months, and ‘no’, which was said only in German for five months.
Table 1. List of early words between ages 1;2 and 2;2

Word                        Age word         Age equivalent   Equivalent    Time interval
                            first produced   first produced                 (yrs;mths.dys)
mama                        ca. 1;2
papa                        ca. 1;4
aua (G) ‘ow’                1;5.10           2;10.7           bol’no        1;4.27
Auto (G) ‘car’              1;8.26           2;9.1            mashin(k)a    1;0.5
njam-njam ‘yum-yum’         1;9.6
baba (R) ‘grandma’          1;10.20          2;8.19           Oma           0;9.29
djadja (R) ‘uncle/man’      1;10.28          2;9.6            Mann          0;10.8
wau-wau (G) ‘bowwow’        1;11.19
da (G) ‘there’              2;0.0            2;8.16           tam           0;8.16
ba-ba(x) (R) ‘fall down’    2;0.10
(h)allo (G) ‘hello’         2;0.28           3;0.1            privet        0;11.3
heiß (G) ‘hot’              2;1.8            2;9.25           gorjachij     0;8.7
Antoshka (R) (name)         2;1.8
kartoshka (R) ‘potato’      2;1.28           2;8.16           Kartoffel     0;6.18
uxo (R) ‘ear’               2;2.7            2;11.3           Ohr           0;8.26
nein (G) ‘no’               2;2.12           2;7.13           net           0;5.1
auch (G) ‘also’             2;2.12           2;10.18          tozhe         0;8.6
da (R) ‘yes’                2;2.15           2;7.20           ja            0;5.5
tjotja (R) ‘aunt’           2;2.15           2;10.7           Frau          0;9.22
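
The time intervals in Table 1 follow from subtracting the age at which a word was first produced from the age at which its equivalent was first produced. A minimal sketch of that arithmetic is given here; the function names are hypothetical, and the convention of 12 months per year and 30 days per month is an assumption made for illustration, not something stated in the chapter.

```python
def parse_age(age: str) -> int:
    """Convert an age in 'years;months.days' notation (e.g. '1;5.10') to days.

    Assumption (not from the chapter): 12 months per year, 30 days per month.
    """
    years, rest = age.split(";")
    months, _, days = rest.partition(".")
    return int(years) * 360 + int(months) * 30 + int(days or 0)


def format_age(total_days: int) -> str:
    """Convert a number of days back to 'years;months.days' notation."""
    years, remainder = divmod(total_days, 360)
    months, days = divmod(remainder, 30)
    return f"{years};{months}.{days}"


def interval(first_produced: str, equivalent_produced: str) -> str:
    """Time between the first production of a word and of its equivalent."""
    return format_age(parse_age(equivalent_produced) - parse_age(first_produced))


print(interval("1;5.10", "2;10.7"))  # aua -> bol'no: prints 1;4.27
print(interval("2;2.12", "2;7.13"))  # nein -> net:   prints 0;5.1
```

Under this convention the two rows shown agree with the intervals listed in the table.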
It is interesting to note that although the child did not produce any cross-language synonyms during the early phase of his active lexicon, he did use his first homonym pair, the Russian da ‘yes’ and the German da ‘there’. It seemed to be easy for the child to keep the meanings of these words completely separate from the very beginning (cf. Clark 1993), as can be seen from Example 1, taken from data recorded at a later age.

(1) Age of child: 2;6.3 (Russian matrix situation)
    Mother: a gde slon (R)? ‘Where is the elephant?’
    Child:  da (G). ‘There.’
    ……………………
    Mother: a ruchki my tebe kremom pomazali (R)? ‘Did we put cream on your hands?’
    Child:  da (R). ‘Yes.’

There are two surprising aspects of this analysis of the early child lexicon: first of all, the number of German words equalled the number of Russian words, and secondly, the German words had no Russian equivalents during this time. This was the case despite the fact that Russian equivalents of the German words were much more frequent in the child’s language input during this period. A possible explanation is that Alex, although his parents spoke to him in both languages, had a very strong tendency to orient himself towards the German language, since he had been in the presence of German-speaking monolinguals basically since birth. His German orientation can also be seen in his early syntactic development at a later point in time. Alex was not aware of meeting Russian-speaking monolinguals until the age of 2;8.

The child’s initial choice of one member of an equivalent pair, and his use of this particular equivalent for an extended period of time, was probably due to other factors. Many of the child’s first German words sounded alike. They contained mostly the diphthongs [au] and [ei], as well as the consonant [x]. These elements were more frequent components of later phases of child babble than were some of the proto-words, and may therefore have been easier for the child to pronounce (van der Linden 2001). The Russian equivalents of these words were phonetically much more complex. The idea that the level of difficulty in pronunciation plays a role in the order in which words enter the lexicon is also supported by the first Russian words acquired by the child. As with the group of first German words, these Russian words sounded alike. Four of them contained, for example, syllables with the sound [x], which seemed very easy for the child to pronounce.
The acquisition of the proper name Antoxa, or Antoshka in Standard Russian, for example, seems to have enabled the acquisition of other similar-sounding words. Twenty days after this word was acquired, the child had also acquired the word kartoxa, or kartoshka ‘potato’ in Standard Russian, and nine days later, the word uxo ‘ear’. The phenomenon of the acquisition of similar-sounding words in a short period of time has also been observed in monolingual children.
3.2.2. Two-Word Stage

At the age of 2;2–2;3, the first two-word expressions were produced, e.g.:

(2) Age of child: 2;2.15 (German matrix situation)
    Child: (Mama) Auto aua (G) [% the (toy) car is broken] ‘(Mommy) car ow’.
In terms of equivalents, the two-word stage can be divided into two phases, one with and one without translation equivalents. Between the ages of 2;2 and 2;4.16, the child still had not begun to use any equivalents. This phase, however, was characterized by rapid and spontaneous growth of the lexicon, which at least to some extent was the result of progress in phonetic development (cf. Elsen 1999). Within these two months, 46 words were acquired (of which 20 were German, 19 were Russian, one was both German and Russian, and six were first names). When speaking to his parents at this age, the child used Russian and German words in about equal proportions, independent of the matrix situation, but the child’s favorite, most frequently used words were German: Auto ‘car’, heiß ‘hot’, and auch ‘also’. Just like monolingual German children, he used the word heiß for conceptually linked referents, depending on the context ‘stove’, ‘lamp’, or ‘smoke’ (cf. Rothweiler & Meibauer 1999). The word auch has also been used in various meanings by monolingual German children (Penner, Tracy & Wymann 1999). He used these words not only in the German matrix situation, but also in the Russian matrix situation.

The period of time between 2;3 and 2;10 is therefore central to this study, because of the evidence that it provides about the value of production data as a criterion for assuming two language systems. The recordings from this period are listed individually in the tables, along with the age and the MLU⁹ value. It made sense to start with age 2;3, since the child began to talk relatively late in both languages. We placed the upper limit of the intensive analysis at age 2;10 for two reasons. The first reason was that from this point in time there were very clear tendencies concerning language differentiation, which proved to be characteristic of the child’s later development as well.
The other reason was that this was the age at which the child entered nursery school, meaning that after this point, it was no longer possible to determine with certainty the first utterances of individual words.
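
The MLU (mean length of utterance) values reported with each recording summarize how long the child’s utterances were on average. The sketch below assumes that MLU is counted in words per utterance (it may instead be counted in morphemes, which this chunk does not specify); the function name is hypothetical, and the sample is assembled for illustration from utterances quoted in this chapter, not taken from a single recording.

```python
def mean_length_of_utterance(utterances):
    """Return the mean number of words per utterance.

    Assumption: MLU is measured in words (split on whitespace), not morphemes.
    Empty utterances are ignored.
    """
    lengths = [len(u.split()) for u in utterances if u.split()]
    if not lengths:
        raise ValueError("no non-empty utterances given")
    return sum(lengths) / len(lengths)


# Illustrative sample only: utterances quoted in the chapter, from different ages.
sample = ["Auto aua", "da", "heiss čaj", "dat' dat' dat'"]
print(mean_length_of_utterance(sample))  # (2 + 1 + 2 + 3) / 4 = 2.0
```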
Table 2. Word use in the Russian matrix situation with bilingual speakers.

Age      MLU   Count   Matrix language:  German &    Non-matrix:  Repeated words   Unintelligible
                       Russian           Russian     German       (only Russian)
2;3.10   1.29  Types   8 (20%)           18 (45%)    8 (20%)      3 (8%)           3 (7%)
               Tokens  31 (20%)          72 (46%)    27 (17%)     4 (3%)           21 (13%)
2;3.24   1.03  Types   3 (13%)           4 (14%)     5 (23%)      11 (50%)         0
               Tokens  5 (14%)           11 (31%)    8 (22%)      12 (33%)         0
2;5.10   1.00  Types   6 (60%)           0           2 (20%)      1 (10%)          0
               Tokens  6 (55%)           0           2 (18%)      2 (18%)          0
2;6.10   1.18  Types   41 (39%)          14 (13%)    15 (14%)     25 (24%)         9 (9%)
               Tokens  96 (40%)          42 (17%)    55 (23%)     29 (12%)         17 (7%)
2;6.20   1.29  Types   21 (47%)          2 (4%)      7 (16%)      11 (24%)         3 (7%)
               Tokens  54 (52%)          5 (5%)      18 (17%)     11 (11%)         14 (13%)
2;7.20   1.38  Types   70 (64%)          8 (7%)      18 (17%)     6 (6%)           5 (5%)
               Tokens  132 (50%)         23 (9%)     65 (25%)     8 (3%)           30 (11%)
2;8.9    1.56  Types   6 (32%)           6 (33%)     6 (32%)      0                0
               Tokens  11 (27%)          11 (27%)    16 (39%)     0                0
2;8.16   1.56  Types   29 (49%)          9 (16%)     15 (25%)     3 (5%)           3 (5%)
               Tokens  44 (41%)          17 (16%)    39 (36%)     4 (4%)           4 (3%)
2;9.6    1.48  Types   111 (66%)         22 (14%)    20 (12%)     5 (3%)           8 (5%)
               Tokens  250 (67%)         60 (17%)    41 (11%)     5 (1%)           14 (4%)
2;10.7   2.0   Types   120 (70%)         21 (12%)    9 (5%)       17 (10%)         1 (1%)
               Tokens  314 (72%)         55 (13%)    19 (4%)      17 (4%)          27 (6%)
Tables 2–4 show the change in the number and percentage of Russian words, German words, words in both languages (Russian and German), repeated words (Russian or German) and unintelligible words used by the child between the ages of 2;3 and 2;10 in video recordings. Table 2 shows vocabulary use in Russian matrix situations¹⁰ with bilingual interlocutors (mother). Table 3 shows vocabulary use in German matrix situations with bilingual interlocutors (mother), and Table 4 shows German matrix situations with monolinguals. No individual recordings of the child with monolingual Russian speakers were made before the age of 3;0.
In each table the first column shows the child’s use of the matrix language and the second column words in both languages. The third column shows the number and proportion of lexical insertions from the non-matrix language. The ‘German & Russian’ category consisted mostly of proper names, words such as mama and papa, similar sounding interjections (like mhm and aeh) and onomatopoeia (like muuh and pipi)¹¹ as well as content words that sound alike in German and Russian (like traktor ‘tractor’ and kran ‘crane’). The category ‘repeated words’ contained words which the child repeated after the interlocutor either voluntarily or in a repetition task.

Table 3. Word use in the German matrix situation during conversations with bilingual Russian-German speakers.

Age      MLU   Count   Matrix language:  German &    Non-matrix:  Repeated words   Unintelligible
                       German            Russian     Russian      (only German)
2;7.20   1.35  Types   24 (51%)          2 (4%)      4 (9%)       11 (23%)         5 (11%)
               Tokens  71 (66%)          3 (3%)      11 (10%)     11 (10%)         10 (9%)
2;8.9    1.26  Types   6 (62%)           1 (8%)      2 (15%)      1 (8%)           1 (7%)
               Tokens  16 (70%)          1 (4%)      3 (13%)      1 (4%)           10 (9%)
2;8.16   1.61  Types   43 (74%)          7 (12%)     4 (7%)       2 (3%)           2 (3%)
               Tokens  88 (72%)          16 (10%)    13 (13%)     2 (2%)           4 (3%)
2;9.6    1.60  Types   71 (53%)          19 (14%)    20 (15%)     18 (14%)         6 (4%)
               Tokens  168 (46%)         56 (15%)    98 (27%)     28 (8%)          18 (4%)
2;10.7   2.39  Types   109 (68%)         22 (14%)    10 (6%)      10 (6%)          8 (5%)
               Tokens  301 (73%)         51 (12%)    21 (5%)      17 (4%)          21 (5%)
Table 4. Word use in the German matrix situation during conversations with monolingual German speaking persons.

Age      MLU   Count   Matrix language:  German &    Non-matrix:  Repeated words   Unintelligible
                       German            Russian     Russian      (only German)
2;5.1    1.34  Types   14 (46%)          11 (35%)    1 (3%)       4 (13%)          1 (3%)
               Tokens  32 (49%)          27 (40%)    1 (1%)       4 (6%)           3 (4%)
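
The first criterion proposed in Section 2 (a smaller proportion of language mixing with monolingual than with bilingual interlocutors) can be checked directly against the token counts in Tables 3 and 4. The following is a minimal sketch, not the procedure used in the study: the class and attribute names are hypothetical, the share is computed over all tokens of a recording (an assumption about the denominator), and the two recordings compared were made at different ages (2;7.20 and 2;5.1), so the comparison is only indicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenCounts:
    """Token counts for one recording, in the column order of Tables 2-4."""
    matrix: int
    both: int            # 'German & Russian' column
    non_matrix: int      # lexical insertions from the other language
    repeated: int
    unintelligible: int

    @property
    def total(self) -> int:
        return (self.matrix + self.both + self.non_matrix
                + self.repeated + self.unintelligible)

    @property
    def non_matrix_share(self) -> float:
        """Proportion of all tokens that come from the non-matrix language."""
        return self.non_matrix / self.total


# Token rows copied from the tables (German matrix situation).
bilingual_2_7_20 = TokenCounts(71, 3, 11, 11, 10)   # Table 3, age 2;7.20
monolingual_2_5_1 = TokenCounts(32, 27, 1, 4, 3)    # Table 4, age 2;5.1

for label, counts in [("bilingual", bilingual_2_7_20),
                      ("monolingual", monolingual_2_5_1)]:
    print(f"{label}: {counts.non_matrix}/{counts.total} "
          f"= {counts.non_matrix_share:.1%} non-matrix tokens")
```

On these counts the share of Russian insertions is lower with the monolingual interlocutors than with the bilingual mother, which is the pattern the first criterion asks for.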
Translating was the child’s favorite game from 2;3 to 2;5. This included ‘real’ translation from one language to the other, but often involved the production of a related form in his active vocabulary. This can be demonstrated in the dialogue of the child speaking with his grandmother in Example 3 (below). The grandmother asked the child to repeat the Russian words baba ‘grandmother’, kasha ‘porridge’, dym ‘smoke’, golova ‘head’, mashina ‘car’ and ded(a) ‘grandfather’. If a word was not in the child’s active lexicon at the time, the child used without hesitation either the equivalent from German, as in Kopf ‘head’ and Auto ‘car’, or an associated word, for example heiß ‘hot’ as a reaction to the Russian dym ‘smoke’, or Sasha (the name of the child’s grandfather) as a reaction to the Russian ded(a) ‘grandfather’.

(3) Age of the child: 2;3.24 (Russian matrix situation)
    Grandmother: skazhi baba (R). ‘Say grandmother.’
    Child:       baba (R). ‘Grandmother.’
    Grandmother: skazhi kasha (R). ‘Say porridge.’
    Child:       kasha (R). ‘Porridge.’
    Grandmother: skazhi dym (R). ‘Say smoke.’
    Child:       heiß (G). ‘Hot.’
    Grandmother: skazhi golova (R). ‘Say head.’
    Child:       Kopf (G). ‘Head.’
    Grandmother: skazhi mashina (R). ‘Say car.’
    Child:       Auto (G). ‘Car.’
    Grandmother: skazhi deda (R). ‘Say grandfather.’
    Child:       Sasha (the first name of the child’s grandfather)
The dialogues reflect two characteristics of early child language development. On the one hand, the child’s willingness to name the equivalents – to the extent that they were present in his productive (active) lexicon – indicates (as did the results of the comprehension tests) that the child readily accepted the existence of cross-language synonyms. On the other hand, the unwillingness of the child to actively use both translation equivalents of an equivalent pair at this age suggests a tendency towards language economy. The child always tried to manage with his already acquired vocabulary, which delayed his initial lexical development in general – but especially the development of words with known equivalents.
The start of the acquisition of equivalents

At the age of 2;4, the child began producing equivalents, the first of which was the Russian equivalent of the German word for ‘foot’. From the age of 2;7–2;8, the number of equivalents increased very rapidly. Most words that were acquired during this time period found their equivalent within a month. Many equivalents were acquired within a few days, some within the same day. However, some words which were acquired before the age of 2;2 continued not to have equivalents.

Although the child at the age of 2;4–2;5 still had relatively few equivalents in his active lexicon, this did not mean that his conversations in German and Russian with various people contained the same proportion of code-switching. The diary entries show that, starting from age 2;3, the child only very rarely used any Russian words when speaking to a monolingual German speaker. This was still the case at the age of 2;5.1, as both the diary and the recordings indicate (see Table 4). During this recording, the child used altogether 67 tokens, of which only one was Russian. Beyond this, it was possible to see from the child’s word choice in this recording that he avoided describing some people, objects and events for which he did not yet have German equivalents in his active lexicon. In the recording of 2;5.1, when reading the picture books in German with German-speaking strangers, there was no mention of people, although when reading picture books at this age in Russian, the child often focused on people. This was probably due to the fact that only the child’s active lexicon for Russian contained the words for ‘man’ and ‘woman’ until the age of 2;9 and 2;10, respectively¹².
When the child spoke to his mother in German, he consistently used about 80–85% German and 15–20% Russian words, and when the child spoke to his parents or grandparents in Russian, he used about 70–75% Russian and 25–30% German although the language of the interlocutors in the recordings contained no lexical insertions from the other language. This difference in the child’s language production in speaking with monolingual and bilingual persons shows that the child from the beginning of the intensive phase in language production (ca. 2;3) separated the two languages. The child seemed to know very early on when he was in contact with monolingual speakers and should switch into monolingual mode (cf. Genesee, Nicoladis & Paradis 1995). The necessity of communication only in German with monolingual German speakers is probably the main reason why the acquisition of the active German lexicon progressed just as quickly as the Russian, in some
aspects even faster, although the child’s passive vocabulary was much larger in Russian. The large number of lexical insertions from the other language when speaking with bilinguals is thus not necessarily (cf. van der Linden 2001) the result of the child failing to distinguish the two systems; it can simply be seen as a product of the removal of the suppression of the other language. A second important reason, which is related to the first, was, of course, the lack of equivalents; some language mixing can be interpreted as the filling of lexical gaps. As shown above, however, the strategy of filling lexical gaps in one language with lexical insertions from the other language was used by the child only when speaking to bilingual persons.

The cause of the initially delayed development of words with known equivalents, compared to most studies of the type ‘one person – one language’, was probably the fact that the child was for the most part around bilingual people at this age. They could understand the child in both languages, and in this situation the child had no need to acquire equivalents. We interpreted this above as the consequence of the tendency towards language economy.

The acquisition of translation equivalents by this child shows that although the presence of translation equivalents in the child’s active lexicon could be interpreted as an argument in support of the ‘two system theory’, as has often been the case in past studies (e.g. Quay 1995), the reverse would not be true. The lack of translation equivalents would not in itself be a sufficient criterion for the ‘one system theory’.

Language mixing during this phase was often produced sentence-internally in conversations with the child’s parents, as Examples 4 and 5 demonstrate:

(4) Age of the child: 2;5.19 (Russian matrix situation)
    Child:  heiss (G) čaj (R). ‘Hot tea.’

(5) Age of the child: 2;8.16 (Russian matrix situation)
    Child:  auch (G) knizhka (R). ‘Also book.’ [% like to read one more book].
    Mother: davaj e’tu posmotrim (R). ‘Let’s read this.’
    Mother: posmotri kto zdes’ sidit za stolom (R)? ‘Look who’s sitting at the table now.’
    Child:  ein (G) jozhik (R). ‘A hedgehog.’
    Mother: a e’to kto (R)? ‘And who’s that?’
    Child:  ein (G) zajchik (R). ‘A rabbit.’
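
The difference between types and tokens that underlies Tables 2–4 can be illustrated with the child’s three utterances in Example 5. This is only a sketch: the function name is hypothetical, and it distinguishes just the matrix and non-matrix categories, leaving out the ‘German & Russian’, ‘repeated’ and ‘unintelligible’ categories that the tables also use.

```python
from collections import Counter


def types_and_tokens(tagged_words, matrix_language):
    """Count word types and tokens per category for a set of utterances.

    tagged_words: iterable of (word, language) pairs, language "R" or "G".
    matrix_language: language of the matrix situation, "R" or "G".
    Returns {category: (number_of_types, number_of_tokens)}.
    """
    tokens = Counter()
    types = {"matrix": set(), "non-matrix": set()}
    for word, language in tagged_words:
        category = "matrix" if language == matrix_language else "non-matrix"
        tokens[category] += 1                 # every occurrence is a token
        types[category].add(word.lower())     # each distinct word is one type
    return {category: (len(types[category]), tokens[category])
            for category in types}


# The child's words in Example 5 (Russian matrix situation).
example_5 = [("auch", "G"), ("knizhka", "R"),
             ("ein", "G"), ("jozhik", "R"),
             ("ein", "G"), ("zajchik", "R")]

print(types_and_tokens(example_5, matrix_language="R"))
# {'matrix': (3, 3), 'non-matrix': (2, 3)}
```

The repeated article ein counts as two tokens but only one type, which is why the German insertions amount to three tokens and two types here.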
Dialogues such as that in Example 5 were observed very frequently at this point during conversations with his parents, independent of the matrix situation. It was noted that the two languages did not develop symmetrically during this time: in Russian, the child mainly acquired content words (most of these were nouns) and in German mainly function words such as da ‘there’, auch ‘also’, as well as articles (indefinite articles first), which were then, as can be seen in Example 5, combined with Russian nouns. For this reason, some of the child’s sentences from the earlier phase of syntactic development resembled German structures with some Russian content words (especially nouns), despite the fact that the child heard Russian twice as frequently as German at this age.

Another observation about this phase was that the child often combined Russian verb stems with German infinitive endings, e.g. plak-en [2;5.6] from plak- (Russian ‘to cry’) and -en (the German infinitive ending). Also, towards the end of this phase, the child often used both language equivalents together for emphasis when speaking to his parents, e.g. nein (G) # net (R) ‘no # no’ [2;7.26], or pit’ (R) # trinken (G) ‘drink # drink’ [2;7.26] (where the symbol # in transcriptions indicates a pause). Saunders (1982: 45) and Taeschner (1983: 28) report similar findings.

Example 6 demonstrates a case which could be considered the transition from sentence-internal mixing to self-correction. When the child at this age wanted to have something, he used the German construction (will + N +) haben ‘want + N + to have’ and the Russian construction (N +) dat’ ‘(N +) to give’.

(6) Age of the child: 2;6.17 (Russian matrix situation)
    *CHI: will (G) pechen’e (R) haben (G). ‘want to have some cookies’
    *CHI: dat’ dat’ dat’ (R). ‘Give, give, give’.
The lexical insertions from Russian in the German matrix situation were for the most part only content words (usually nouns, such as knizhka ‘little book’, koshka ‘cat’ etc.), as well as the word da ‘yes’, but these borrowings appeared principally only in conversations with bilingual people. When speaking to monolingual Germans, the child used almost no insertions from Russian. This kind of asymmetrical development in both languages and the constantly increasing use of German, as well as the great discrepancy between the monolingual and bilingual modes concerning language mixing, confirm the assumption that in the situation when one and the same person speaks
two languages with a young child, language separation of these two languages, and language acquisition in general, is delayed and made much more difficult. When the child spoke to monolingual persons, there was almost no language mixing at all. However, when the child spoke to bilingual persons, there was a lot of language mixing, although he did not hear any language mixing from the bilingual persons themselves. That means that when children first start to talk, if parents speak both languages alternately, it matters less whether their speech contains language mixing or not. The crucial factor for the child is that the person is bilingual. As a result, the child’s acquisition of Russian was in many ways slower than German prior to contact with monolingual Russian speakers, despite very frequent Russian input.

On the other hand, the metalinguistic abilities of the child developed earlier than in children of the type ‘one person – one language’. From the age of 2;3–2;4 the child could translate single words from one language to the other, when asked. At the age of 2;6–2;7, the child’s first metalinguistic remarks emerged. When the mother asked in what language the child would like to read a book, the child preferred German more often (see the video recording: 2;7.20). At this time, unless asked, the child did not initiate the language choice himself but more or less adhered to the matrix situation set by the parents. But from 2;10–2;11 the child gladly initiated the choice of language in conversation with his parents, such as when choosing a book to read, referring to the languages with phrases such as auf Deutsch / po-nemecki ‘in German’ or auf Russisch / po-russki ‘in Russian’.13
3.2.3. The Three-Word and Four-Word Stage and Beyond

During the transition to three and four-word expressions (from the age of 2;8–2;9), there were frequent cases of self-correction, especially in cases when Russian words were used during conversations with monolingual German speakers: (7)
Age of the child: 2;8.0 (German matrix situation)
Christiane:14 und wohin gehst du heute Abend (G)? ‘And where are you going tonight?’
Child: k babe s dedom (R) # Oma Opa (G). ‘To grandma and grandpa’s # grandma grandpa’s.’
Early language separation
Self-correction occurred less frequently when Russian words were used in the German matrix situation with bilingual persons. It never occurred when German words were used in the Russian matrix situation during conversations with bilingual persons. The ratio of German words in Russian conversations with bilingual persons at this age was very high: 32% types and 39% tokens in the recording at the age of 2;8.9, and 25% types and 36% tokens in the recording at the age of 2;8.16. For the most part, it was a matter of function words such as da ‘there’, auch ‘also’, the word nein ‘no’, indefinite articles ein and eine, German onomatopoeia and the verbs essen ‘to eat’ and trinken ‘to drink’, as can be seen in Example 8. (8)
Age of the child: 2;8.16 (Russian matrix situation)
Mother: chto utka delaet (R)? ‘What is the duck doing?’
Child: ona (R) trinken (G) +/.15 ‘It’s drinking.’
Mother: da a po-russki (R)? ‘Yes and in Russian?’
Child: +, Saft (G)16. ‘Juice.’
Mother: a po-russki chto ona delaet (R)? ‘And in Russian what is she doing?’
Mother: trinken e’to po-nemecki a po-russki (R)? ‘In German it’s trinken, and in Russian?’
Mother: ty chasto govorish’ kak po-russki (R)? ‘You often say the Russian word?’
Child: pit’ (R). ‘Drink.’

The ratio of Russian words in German conversations with bilingual persons at this age was much lower than the other way around. For example, during the recording of 2;8.9 it was 15% types and 13% tokens, and during the recording of 2;8.16 it was 7% types and 13% tokens. Most examples were instances of the use of the Russian da ‘yes’ instead of the German ja ‘yes’, which the child said in Russian more often, as well as content words such as malen’kij ‘little’, pit’ ‘drink’. At the age of 2;8, the child spent three weeks with his parents in a Russian-speaking environment. During this time and within the month after returning to Germany, the child had acquired the Russian equivalents of his favorite German words. A further result of his three-week stay in Russia was that he used German infinitives with an added Russian infinitive ending e.g. laufen-t’ ‘to run’, essen-t’ ‘to eat’ [2;8.27]. (Before this, the child had combined Russian verb stems with German infinitive endings e.g. plak-en [2;5.6].)
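The type and token percentages reported for lexical insertions can be illustrated with a small calculation. The following Python sketch is only one plausible way of counting, under the assumption that every word in a recording has already been tagged for language; the function name `insertion_rates` and the toy word list are hypothetical and are not taken from the recordings of this study.

```python
def insertion_rates(tagged_words, matrix_lang):
    """Return (type %, token %) of words that are not in the matrix language.

    tagged_words: list of (word, language) pairs, e.g. ("trinken", "G").
    matrix_lang:  language of the matrix situation, "R" or "G".
    Assumption: each word carries exactly one language tag.
    """
    if not tagged_words:
        raise ValueError("no words to count")
    types = set(tagged_words)  # each distinct (word, language) pair counts once
    other_tokens = sum(1 for _, lang in tagged_words if lang != matrix_lang)
    other_types = sum(1 for _, lang in types if lang != matrix_lang)
    return (100 * other_types / len(types),
            100 * other_tokens / len(tagged_words))


# Invented toy data for a Russian matrix situation (not from the corpus).
toy = [("ona", "R"), ("trinken", "G"), ("utka", "R"),
       ("trinken", "G"), ("pit'", "R")]
type_pct, token_pct = insertion_rates(toy, "R")
print(f"{type_pct:.0f}% types, {token_pct:.0f}% tokens")
```

In the toy list, one of four word types and two of five word tokens are German, so a frequently repeated insertion such as trinken raises the token percentage above the type percentage.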
The recording at the age of 2;9.6, which was made immediately following the return from Russia, clearly shows a decrease in the number of German lexical insertions in the Russian matrix situation and an increase in the number of Russian lexical insertions in the German matrix situation (see Tables 2 and 3). One of the reasons for this was that the child had acquired Russian function words, so that his Russian sentences were no longer dominated by German structures, or at least much less so. In a reversal of the previous situation, the child then used the Russian word tam ‘there’, and e’to ‘this’, in German conversations, alongside the German da ‘there’ and das ‘this’. To sum up, it would appear that it is contact with monolinguals which acts as a trigger for the reduction of language mixing. From the age of 2;10, some weeks after entering a German nursery school, most of these phenomena ceased. There was a sharp decrease in the use of Russian words in German contexts and German words in Russian contexts. In comparison with the recording at the age of 2;9.6, the number of lexical insertions from the other language during the recording at the age of 2;10.7 had fallen from 15% types and 27% tokens to 6% types and 5% tokens in the German matrix situation, and from 12% types and 11% tokens to 5% types and 4% tokens in the Russian matrix situation (see Tables 2 and 3). The long stay in a monolingual Russian-speaking environment, as well as the subsequent entering of a German nursery school, in which the child came into close contact with monolingual German speakers, accelerated the separation of the two languages in production. It seemed that from this age, the child could also perceive that the language of his parents, although they spoke to him in both Russian and German, contained practically no language mixing.
The child adopted this and wanted to speak more and more exclusively Russian with his parents, and German only in the presence of monolingual German-speaking persons. It is interesting to note that there was still language mixing at this stage. At the age of three, the child could distinguish between the two languages and consciously tried to avoid mixing them. The percentage of language mixing in conversations with bilinguals (usually his parents and grandparents) in Russian was 0–5%, in German 0–3%, and these were frequently self-corrected. The number of other language insertions in the child’s language production in conversations with monolinguals decreased to almost zero. Insertions from the respective other language when speaking to monolinguals were nearly always followed by self-correction17. A very frequent instance of language mixing, however, was observed until the age of about 3;6 during the child’s soliloquies when playing alone.
The order of the development of language differentiation – first in communication with monolinguals, then in communication with bilinguals and only then in soliloquies – suggests that it is fairly natural for a bilingual child to mix languages when neither the social nor the communicative context of production offer any constraints. This is quite compatible with the ability of the child to distinguish the two languages and restrict his use to just one of them when this is appropriate. It would thus appear that the child avoided language mixing first of all for functional reasons: for communicative success when he was speaking to a person he knew to be monolingual. The child therefore avoided language mixing with his parents only later, no doubt for reasons of social conformity: it is a widely observed phenomenon that speakers assimilate their language use to that of their conversation partner. All of these motivations are missing in soliloquies, which resulted in language mixing persisting here even after language separation in production was well-practised in other contexts. This same pattern was repeated in the distribution of self-corrections. The child’s language production was most self-corrected in conversations with monolingual German-speaking persons but less so in conversations with the members of his family. It might be assumed that when a child is raised using the ‘one person – one language’ method the two reasons for not mixing, social assimilation and communicative necessity, fall together, which should hasten and encourage language separation. However, this is dependent on both parents consistently avoiding language mixing themselves, which is often not the case (Lanza 1992; Juan-Garau & Pérez-Vidal 2001, and others). Since even parents who consistently produce only one language in front of their child must nevertheless each understand the other language (how else could they communicate?) 
even these cases provide the child with clear evidence of bilingualism. We might therefore consider the parental language distribution types points on a continuum, rather than entirely separate experiences for the child (cf. Lanza 1992). These considerations lead us to conclude that many factors affect children’s language mixing in production other than the child’s simple awareness of the existence of two separate systems. In particular, language mixing can be shown to be context-dependent. That a child mixes in one context does not mean that they cannot differentiate in another context. All of these need to be taken into account before language mixing in production can be regarded as evidence for or against language separation at a particular time. We have mentioned phonological evidence, but this only became clear after the age of 3. From the age of about 3;2 the child began to regularly correct native Russian speakers when their German contained vowel reductions which followed the Russian pattern. For example, the child [3;3.26] corrected a native Russian speaker who pronounced the word for car, Auto, with a reduction of the unstressed final ‘o’, so that it sounded like Auta. It is assumed that the child was able to differentiate phonological differences between the two languages before this point.
4. Conclusion

Building in part on the criteria of Genesee (1989) for the differentiation of two languages by young bilingual children, most (if not all) studies since then have endorsed the “two system theory” (Lanza 1992; Genesee, Nicoladis & Paradis 1995; Quay 1995; Meisel 1989; Paradis & Genesee 1996, and many others). The empirical base for these studies is generally children’s production in conversation with their parents and sometimes strangers (Genesee, Nicoladis & Paradis 1995; Genesee, Boivin & Nicoladis 1996). Most of this data has related to bilingual language acquisition using the ‘one person – one language’ model. The present study provides an analysis of language acquisition data of a bilingual child whose parents spoke either their native language or the majority language of the environment to the child, depending on the presence of other monolingual members of the household. Since the language distribution during acquisition was a less usual one, our empirical base includes a new type of production data, in which one and the same person, the mother, first spoke with the child in one language and later switched to the other language. However, we also discussed a recording of the child speaking with a monolingual German-speaking stranger, recordings of the child’s soliloquies, and some comprehension data. These were intended to cover the whole spectrum of situations that the child encountered. This data yielded various significant findings. First, the child was able to provide translation equivalents of vocabulary items on demand well before he had any translation pairs in his active vocabulary. Second, the data reveal the child using different patterns of vocabulary, depending on whether he was speaking with a monolingual or another bilingual. His avoidance of Russian vocabulary when speaking to a German monolingual demonstrates his awareness of two language systems even in very early data.
The learning context may have caused the much stronger difference between monolingual and bilingual modes in our data than in the studies of Genesee,
Nicoladis & Paradis (1995) and Genesee, Boivin & Nicoladis (1996). Third, the child stopped or reduced language mixing in his own production differentially according to the context of use. With monolinguals, he already distinguished at the pre-syntactic stage; with known bilinguals this occurred somewhat later, and in his soliloquies latest of all. Language mixing could thus be seen as depending on communicative requirements. These findings from our case study would tend to support the following conclusions. First, they show the child’s awareness of alternative encodings at the very earliest stage, before he had any translation equivalents in his own active vocabulary. Even if this does not demonstrate that he separated the languages systematically at this stage, he nevertheless had achieved a significant step in that direction, since he did not assume form – concept biuniqueness. We can offer conclusive evidence of his differentiation of the languages from the first recording of him interacting with a known monolingual. Taken together, we may summarize that the child appeared to have two language systems as early as we can obtain any evidence. We see no direct evidence of a stage with a single system. A presumed initial one-system phase would thus terminate very early, certainly during the single-word stage. The second contribution to the debate which we can make is to underline the observation that a large number of lexical insertions from the other language is not necessarily a sign of a child failing to distinguish the two systems (cf. van der Linden 2001). In fact our data from comprehension tests and from the child’s avoidance of Russian when in conversation with German monolinguals provide testimony that he differentiated between systems at stages while he was still mixing languages in production with bilinguals. This must throw increasing doubt on language mixing in production as a criterion for language separation at all.
In particular the child’s language mixing in soliloquies well after he had unequivocally distinguished the two languages demands that we look for the motivation for the avoidance of mixing in external factors. There is a clear functional advantage in not using a language with a conversation partner who does not understand it. This motivates language non-mixing with monolinguals. We can best explain the child’s ceasing to mix languages with his bilingual parents with the person-specific feature of language choice. What is very clear, however, is that language mixing does not directly depend on mental language differentiation. We hope to have provided a contribution to the on-going debate about the early development of language separation in bilingual children and the factors which affect it, both by reporting acquisition data gathered in a less
frequently studied language acquisition situation, and also with our differentiation of the evidence of production in varying contexts. The question has not been definitively answered, however. This would require further, more extensive studies involving in particular recordings of interactions with monolingual speakers in both languages at the earliest stage possible.
Notes

1. The (R) for Russian and (G) for German indicate the language of the preceding text.
2. An equivalent of this type in mononational couples would be of the type ‘Nondominant Home Language without Community Support’ (Romaine 1996). In this model, the parents speak only one language to the child, his or her only contact with the environment language is then from outside the home (e.g. Ruke-Dravina 1967).
3. Following Meisel (1994), “language-mixing” is used here as a generic term for “all instances where features of the two languages are juxtaposed”.
4. The terms monolingual and bilingual here are used from the point of view of the child and refer to persons who spoke only one language (Russian or German) or both with the child.
5. Multilinguals sharing languages often use many more borrowings between each other than they do with monolinguals (see Grosjean 2001). As Lanza (1992) and Genesee, Boivin & Nicoladis (1996) report, children are able to switch from bilingual to monolingual mode at a very young age.
6. In order to make the situation seem natural for the child, a stuffed teddy bear that could only ‘speak’ German was present at all times during the German recordings with the mother.
7. Van der Linden (2000) reports a similar rate of language production development.
8. The principle of contrast states that a meaning can be expressed using only one word form. Clark (1993) claims that it plays an important role in monolingual lexical acquisition, predicting that it blocks the acquisition of absolute synonyms. For bilingual children, this implies that their initial vocabulary lacks translation equivalents (cross-language synonyms). Cross-language homonyms on the other hand, she maintains, do not cause children any difficulties (Clark 1993: 70).
9. Aside from the discussion of whether MLU values (here: the mean length of utterance in words) should ever be used in comparative studies (Genesee, Nicoladis & Paradis 1995), this value was affected here by the specifics of the interactions. In order to be able to create situations which were as similar as possible in both languages, the mother and child often read the same picture books during the recordings. Since the child often answered the mother’s questions with only one word, the MLU value for many recordings is unusually low.
10. The primary language of a conversation will be called here the matrix situation. When the child was addressed by an interlocutor in Russian during a recording, for example, this is referred to as the Russian matrix situation.
11. Interjections and onomatopoeia, which could only be Russian (e.g. oj, kukareku) or German (e.g. aua, wau-wau), were counted as language specific.
12. From the age of 2;10, examples such as these could be found on a regular basis, including in conversations with bilinguals.
13. In conversations with many children at this age being raised according to the ‘one person – one language’ method, their parents still use ‘indirect phrases’ such as ‘How does your daddy say that?’ (Juan-Garau & Pérez-Vidal 2001: 76), instead of naming the other language.
14. A German-speaking friend of the child’s mother.
15. The symbols ‘+/’ and ‘+,’ indicate that an expression was interrupted (‘+/’) or continued (‘+,’).
16. One could, of course, treat the expression trinken Saft ‘to drink juice’ as a kind of chunk, but the child often said this statement in Russian at this age.
17. These results of language differentiation were based on the video recordings made between 3;0 and 5;10 (the time of writing).
References

Burling, Robbins 1959/78 Language development of a Garo and English-speaking child. In Second Language Acquisition: A Book of Readings, E. Hatch (ed.), 54–75. Rowley, MA: Newbury House.
Clark, Eve V. 1993 The Lexicon in Acquisition. Cambridge: Cambridge University Press.
De Houwer, Annick 1990 The Acquisition of Two Languages from Birth: A Case Study. Cambridge: Cambridge University Press.
Ellul, Sonia 1978 A Case Study in Bilingualism. Code-switching between Parents and Their Pre-school Children in Malta. St. Albans: Campfield Press.
Elsen, Hilke 1999 Auswirkungen des Lautsystems auf den Erwerb des Lexikons – Eine funktionalistisch-kognitive Perspektive. In Das Lexikon im Spracherwerb, J. Meibauer & M. Rothweiler (eds.). Tübingen: Francke.
Fantini, Alvino E. 1985 Language Acquisition of a Bilingual Child: A Sociolinguistic Perspective. Clevedon: Multilingual Matters.
Genesee, Fred 1989 Early bilingual development: one language or two? Journal of Child Language 16: 161–179.
Genesee, Fred, Elena Nicoladis & Johanne Paradis 1995 Language differentiation in early bilingual development. Journal of Child Language 22: 611–631.
Genesee, Fred, Isabelle Boivin & Elena Nicoladis 1996 Talking with strangers: A study of bilingual children’s communicative competence. Applied Psycholinguistics 17: 427–442.
Grosjean, François 2001 The bilingual’s language modes. In One Mind, Two Languages. Bilingual Language Processing, Janet L. Nicol (ed.), 1–25. Oxford: Blackwell.
Juan-Garau, Maria & Carmen Pérez-Vidal 2001 Mixing and pragmatic parental strategies in early bilingual acquisition. Journal of Child Language 28: 59–86.
Köppe, Regina 1997 Sprachentrennung im frühen bilingualen Erstspracherwerb Französisch/Deutsch. Tübingen: Narr.
Lanza, Elizabeth 1992 Can bilingual two-year-olds code-switch? Journal of Child Language 19: 633–658.
Leopold, Werner F. 1949/71 Speech Development of a Bilingual Child: A Linguist’s Record. (Vols. 1–4). New York: AMS Press.
Linden, Elisabeth van der 2001 Non-Selective Access and Activation in Child Bilingualism. In Cross-linguistic Structures in Simultaneous Language Acquisition, S. Döpke (ed.), 37–56. Amsterdam: Benjamins.
MacWhinney, Brian 1991 The CHILDES Project. Tools for analyzing talk. Mahwah, NJ: Erlbaum.
Meisel, Jürgen M. 1986 Word order and case marking in early child language. Evidence from simultaneous acquisition of two first languages: French and German. Linguistics 24: 123–183.
Meisel, Jürgen M. 1989 Early differentiation of languages in bilingual children. In Bilingualism across the Life Span. Aspects of Acquisition, Maturity, and Loss, K. Hyltenstam & L. K. Obler (eds.), 13–40. Cambridge: Cambridge University Press.
Meisel, Jürgen M. 1994 Code-switching in young bilingual children. The acquisition of grammatical constraints. Studies in Second Language Acquisition 16: 413–439.
Meisel, Jürgen M. 2004 The Bilingual Child. In The Handbook of Bilingualism, T. K. Bhatia & W. C. Ritchie (eds.), 91–113. Oxford: Blackwell.
Meisel, Jürgen M. (ed.) 1990 Two first languages. Early grammatical development in bilingual children. Dordrecht: Foris.
Nicoladis, Elena & Fred Genesee 1998 Parental Discourse and Codemixing in Bilingual Children. The International Journal of Bilingualism 2: 85–89.
Paradis, Johanne 2001 Beyond ‘One System or Two?’ Degrees of separation between the languages of French-English bilingual children. In Cross-linguistic Structures in Simultaneous Language Acquisition, S. Döpke (ed.), 175–200. Amsterdam: Benjamins.
Paradis, Johanne & Fred Genesee 1996 Syntactic acquisition in bilingual children: autonomous or interdependent? Studies in Second Language Acquisition 18: 1–25.
Penner, Zvi, Rosemarie Tracy & Karin Wymann 1999 Die Rolle der Fokuspartikel AUCH im frühen kindlichen Lexikon. In Das Lexikon im Spracherwerb, J. Meibauer & M. Rothweiler (eds.), 229–251. Tübingen: Francke.
Quay, Sara E. 1995 The bilingual lexicon: Implications for studies of language choice. Journal of Child Language 22: 369–387.
Redlinger, Wendy E. & Tschang-Zin Park 1980 Language mixing in young bilinguals. Journal of Child Language 7: 337–352.
Rescorla, Leslie, Jennifer Mirak & Leher Singh 2000 Vocabulary growth in late talkers: Lexical development from 2;0 to 3;0. Journal of Child Language 27: 293–311.
Romaine, Suzanne 1996 Bilingualism. Oxford: Blackwell.
Ronjat, Jules 1913 Le développement du langage observé chez un enfant bilingue. Paris: Champion.
Rothweiler, Monika & Jörg Meibauer (eds.) 1999 Das Lexikon im Spracherwerb: Ein Überblick. In Das Lexikon im Spracherwerb, J. Meibauer & M. Rothweiler (eds.), 9–31. Tübingen: Francke.
Ruke-Dravina, Velta 1967 Mehrsprachigkeit im Vorschulalter. Lund: Gleerup.
Saunders, George 1982 Bilingual Children: Guidance for the Family. Clevedon: Multilingual Matters.
Schnitzer, Marc & E. Krasinski 1994 The development of segmental phonological production in a bilingual child. Journal of Child Language 21: 585–622.
Smith, Madorah E. 1935 A study of the speech of eight bilingual children of the same family. Child Development 6: 19–25.
Swain, Merrill & Mari Wesche 1975 Linguistic Interaction: case study of a bilingual child. Language Sciences 17: 17–22.
Taeschner, Traute 1983 The Sun Is Feminine. A Study on Language Acquisition in Bilingual Children. Berlin: Springer.
Vihman, Marilyn 1985 Language differentiation by the bilingual infant. Journal of Child Language 12: 297–324.
Volterra, Virginia & Traute Taeschner 1978 The acquisition and development of language by bilingual children. Journal of Child Language 5: 311–326.
‘I need data which I can rely on’: Corroborating empirical evidence on preposition placement in English relative clauses*

Thomas Hoffmann
1. Introduction

We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way. (Sampson 2001: 135)

You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer. (Chomsky in Aarts 2000: 5–6)
If both Sampson’s position on introspection and Chomsky’s views on corpora were correct, there would be no valid data base left for linguists to investigate. Fortunately, however, Sampson and Chomsky only represent extreme positions. Recently, an increasing number of linguists have shown that both introspection and corpus data can yield interesting and valid results, if collected and interpreted via rigorous scientific methods (Bard et al. 1996; Fillmore 1992; Schütze 1996). Yet, when investigating a particular syntactic phenomenon, many linguists still only draw on either corpus or introspection data (notable exceptions can be found in Kepser & Reis 2005 and the present volume). In this article I will illustrate how an approach which treats the two data sources as corroborating empirical evidence allows a far more detailed analysis of a well-known grammatical phenomenon than would have been possible with a single data source. An interesting area of syntactic variation within the English language which is subject to many categorical as well as variable constraints is the placement of prepositions. In relative clauses, e.g., the preposition can either precede the wh-relativiser (preposition pied-piping, 1a) or the relativised gap (preposition stranding, 1b). (1)
a. I need data [on which]i I can rely __i
b. I need data [which]i I can rely on __i
Preposition stranding occurs in various English constructions (e.g., prepositional passives: John was laughed at or wh-questions: Who did they laugh at?). As an analysis of preposition placement in the International Corpus of English (ICE-GB) shows, however, the phenomenon occurs most frequently in relative clauses. In addition to this, there are a number of unique independent factors which call for an in-depth analysis of the phenomenon in relative clauses (e.g., the set of competing non-wh relativisers that and Ø, which do not allow pied piping, cf. the man to whom/*that/*Ø she talked, or the restrictiveness of a relative clause). In the following I will first give an overview of the overall distribution of preposition stranding in the ICE-GB. Then I will provide details on the statistical analysis of the relative clause corpus data. As the analysis showed, many of the relative clause conditions seemed to exhibit a categorical effect on preposition placement. Due to the negative data problem associated with corpus data (i.e. that the absence of a construction in a corpus does not entail its ungrammaticality), a Magnitude Estimation experiment was carried out to separate apparent from accidental categorical effects.
2. Preposition placement in the ICE-GB

As a representative corpus for present-day English, the British English component of the International Corpus of English (ICE-GB; Nelson et al. 2002) was chosen since it is fully tagged for part-of-speech and parsed for syntactic structure. The ICE-GB CD-Rom includes a retrieval program called ICECUP, which allows the researcher to search for individual words as well as abstract syntactic structures (Nelson et al. 2002). It is a one-million-word corpus consisting of spoken (about 637,000 words) as well as written (about 423,000 words) material, and is intended as a representative sample of educated British English. A major advantage for the present study was the fact that all tokens in the ICE-GB containing a preposition which is not immediately followed by a complement were tagged as stranded prepositions (PS) and could thus easily be extracted from the corpus. In addition, the Fuzzy Tree Fragment option, which allows the user to search the corpus for abstract syntactic structures, made it possible to identify all instances of preposition pied piping in the relevant constructions (i.e. by searching for P + which, P + who). As an analysis of the ICE-GB shows, the corpus contains 1192 tokens which are tagged as stranded prepositions. These, however, also include 205 items which were not relevant for the present study (i.e. incomplete
utterances, unintelligible fragments, etc.). Thus there are 987 (= 1192 − 205) tokens in the corpus that actually exhibit a stranded preposition. While the overall distribution of preposition placement across clause types in the ICE-GB definitely requires further investigation (by taking into account all pied piped tokens; Hoffmann in prep.), the analysis showed that the phenomenon is most pertinent in (finite and non-finite) relative clauses: 49.5% (489/987) of all stranded prepositions occur in relative clauses. (The second most frequent context, interrogative clauses, only exhibits 201 stranded prepositions.) This, together with the large number of categorical constraints particular to relative clauses, made it necessary to take a closer look at preposition placement in these constructions.
3. Preposition placement in relative clauses: Corpus evidence I
3.1. Coding decisions

Many factors have been claimed to influence preposition placement (Bergh & Seppänen 2000; Hornstein & Weinberg 1981; Johansson & Geisler 1998; McDaniel, McKee & Bernstein 1998; Pullum & Huddleston 2002; Trotta 2000; Van den Eynden 1996). In order to test the actual influence of these factors, it was decided to code the ICE-GB data for the following factors:

― preposition placement (stranded, pied piped),
― finiteness (finite, nonfinite),
― RC function (restrictive, non-restrictive),
― relativiser (that, zero, who/m, which, whose),
― phrase containing PP (VP, AP, NP),
― complexity (numerical scale, see below for details),
― ICE-GB text type (private dialogue, private correspondence, public dialogue, unscripted speeches, broadcast news, scripted speeches, non-professional writing, business letters, printed edited texts),
― PP function (see Table 1).
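To make the coding scheme concrete, the factor groups listed above can be thought of as one record per corpus token. The Python sketch below is merely an illustration of such a record and not the format actually used in the study: the names `CodedToken` and `Placement` are hypothetical, the PP function values belong to Table 1 and are therefore left as free text, and the example token is invented.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    STRANDED = "stranded"
    PIED_PIPED = "pied piped"


# Factor levels as given in the list of coding decisions.
RELATIVISERS = {"that", "zero", "who/m", "which", "whose"}
CONTAINING_PHRASES = {"VP", "AP", "NP"}
TEXT_TYPES = {
    "private dialogue", "private correspondence", "public dialogue",
    "unscripted speeches", "broadcast news", "scripted speeches",
    "non-professional writing", "business letters", "printed edited texts",
}


@dataclass(frozen=True)
class CodedToken:
    """One relative-clause token coded for the eight factor groups."""
    placement: Placement
    finite: bool            # finiteness: finite vs. nonfinite
    restrictive: bool       # RC function: restrictive vs. non-restrictive
    relativiser: str
    containing_phrase: str
    complexity: float       # numerical scale
    text_type: str
    pp_function: str        # levels are defined in Table 1; free text here

    def __post_init__(self):
        # Reject levels that the coding scheme does not provide for.
        if self.relativiser not in RELATIVISERS:
            raise ValueError(f"unknown relativiser: {self.relativiser}")
        if self.containing_phrase not in CONTAINING_PHRASES:
            raise ValueError(f"unknown phrase type: {self.containing_phrase}")
        if self.text_type not in TEXT_TYPES:
            raise ValueError(f"unknown text type: {self.text_type}")


# Invented example token; the values are placeholders, not corpus data.
example = CodedToken(
    placement=Placement.STRANDED, finite=True, restrictive=True,
    relativiser="which", containing_phrase="VP", complexity=2.4,
    text_type="private dialogue", pp_function="hypothetical label",
)
```

Validating the categorical levels when a record is created is one simple way of catching coding slips before the data enter a statistical analysis.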
Apart from the factor RC function, all the above factors have previously been suggested to influence preposition placement (Hoffmann 2005). Now, while the other coding decisions are straightforward, the factor groups complexity and PP function require some explanation. Increasing complexity between a filler and a gap is sometimes claimed to disfavour preposition stranding (e.g. Gries 2002; Trotta 2000; Johansson & Geisler 1998). In order to systematically investigate the influence of complexity on preposition stranding/pied piping, the complexity of the ICE-GB data was analyzed by adapting Lu’s parsing-orientated “Mean Chunk Number” hypothesis (2002). To illustrate this complexity measure, take the following relative clause: (2)
As much of a threat is the smothering by oil of the seagrass on which the dugongs feed. <W2B-029 #79> [1] [1] [2] [2] [3]
Basically, Lu assumes that in order to reduce the number of units in the working memory, a parser will, whenever possible, combine smaller units into a single larger one, a so-called chunk. In (2), for example, the parser can group on which into a PP and the dugongs into an NP. Thus after processing feed, the parser only has to retain three chunks (the PP, the NP and the verb). Adding up the number of chunks which the parser has to store during processing gives the “Instant Chunk Number (or ICN)”: 1+1+2+2+3 = 9 in (2). In a next step, the ICN is then divided by the number of words that had to be integrated. This formula gives a sentence’s Mean Chunk Number (or MCN): [ICN]/[Σwords] = 9/5 = 1.8 for (2). As (3) shows, the MCN of the stranded alternative is higher, so that stranding is assumed to be more complex than pied piping:

(3) which the dugongs feed on.
    which [1] the [2] dugongs [2] feed [3] on [4]
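The two calculations can be retraced with a short sketch. It assumes a deliberately simplified reading of the measure: chunks are flat and processed left to right, so the number of chunks held at each word equals the position of the chunk that word belongs to. The function names and the chunk lists are illustrative and are not part of Lu's (2002) proposal or of the coding used in the study.

```python
from typing import List


def instant_chunk_numbers(chunks: List[List[str]]) -> List[int]:
    """For each word, the number of chunks held once that word is processed.

    Simplifying assumption: chunks are flat and arrive left to right, so a
    word in the k-th chunk is associated with k stored chunks.
    """
    return [k for k, chunk in enumerate(chunks, start=1) for _ in chunk]


def mean_chunk_number(chunks: List[List[str]]) -> float:
    """Sum of the per-word chunk counts (ICN) divided by the number of words."""
    counts = instant_chunk_numbers(chunks)
    return sum(counts) / len(counts)


# Example (2): "on which" forms a PP chunk, "the dugongs" an NP chunk.
pied_piped = [["on", "which"], ["the", "dugongs"], ["feed"]]
# Example (3): the stranded preposition counts as an extra, final chunk.
stranded = [["which"], ["the", "dugongs"], ["feed"], ["on"]]

print(instant_chunk_numbers(pied_piped), mean_chunk_number(pied_piped))
print(instant_chunk_numbers(stranded), mean_chunk_number(stranded))
```

Under these assumptions the sketch reproduces the per-word values shown under (2) and (3), and it can be applied in the same way to the two structures discussed in note 1.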
Since the stranded preposition in (3) is identified as an extra chunk, the MCN of this sentence would be 2.4 [12/5]. In contrast to other measures of complexity, such as Hawkins’ (1994) IC-to-Non-IC value,¹ the MCN calculation thus implicitly encodes the hypothesis that pied piping is generally less complex than stranding. This feature of the MCN was the reason why it was used as a measure of complexity in the present study. However, when simply calculating MCNs, all clauses with pied piping would as a result be categorized as less complex, and the effect of the underlying complexity would be lost. In order to overcome this problem, it was decided to reconstruct the base position of pied piped prepositions, and to always take the MCN of the stranded alternative as a measure for the complexity of a construction (cf. Hoffmann 2005 for further details).

Finally, PP adjuncts are sometimes claimed to strongly favour pied piping, while PP complements are said to strongly favour stranding (e.g. Hornstein & Weinberg 1981; Johansson & Geisler 1998; Trotta 2000). Yet, since the complement-adjunct distinction is not a clear-cut binary dichotomy (especially for PPs; Hoffmann 2005; Trotta 2000), I decided against a binary complement-adjunct distinction for the present study. Instead, partly based on Quirk et al. (1985: 479–486), the function of a PP was classified according to the more fine-grained categories outlined in Table 1:

Table 1. Factor group PP function

OBLIGATORY COMPLEMENT   V-X-P idioms (make light of, get rid of)
                        prepositional X (subcategorized P: rely on)
                        subcategorized PP (put sth. in/on/over)
                        obligatory complement (be/live in Spain)
OPTIONAL COMPLEMENT     optional complements (talk to)
SPACE                   affected location (sit on the chair)
                        movement (he rushed to the church)
                        direction (he ran along the road)
                        position/location (he killed the cat in the garden)
TIME                    position in time (He died on Saturday)
                        duration/frequency (He slept for seven hours)
PROCESS                 manner (he ate the cake in a disgusting way)
                        means/instrument (He killed him with a knife)
                        accompaniment (He came with Bill)
RESPECT                 respect (For him, something’s always missing)
CONTINGENCY             cause, reason, purpose, result (as a result of which)
DEGREE                  amplification, diminution (the extent to which)
3.2. Exhibit I: Categorical corpus data

A descriptive analysis of the corpus showed that it contained 1177 relative clauses exhibiting the investigated variable: 688 (58%) pied piped tokens and 489 (42%) stranded tokens (see endnote 2 for slight changes from Hoffmann 2005).² Several of the above factor groups exhibited categorical effects. As Pullum & Huddleston (2002) predict, all 350 tokens without wh-relative pronouns (172 with that and 178 with Ø) exhibited stranded prepositions (cf. 4 and 5; in order to avoid the negative data problem, here and in the following I indicate categorical corpus conditions with an “∀” instead of the traditional “*”/“?”):

(4) a. I want a data source [that]i I can rely on __i
    b. ∀ I want a data source [on that]i I can rely __i

(5) a. I want a data source [Ø]i I can rely on __i
    b. ∀ I want a data source [on Ø]i I can rely __i
Furthermore, all 26 wh-nonfinite tokens are pied piped (6a) and the 70 stranded prepositions in nonfinite relative clauses occur exclusively with Ø (6b–d):

(6) a. I want a data source [on which]i to rely __i
    b. I want a data source [Ø]i to rely on __i
    c. ∀ I want a data source [which]i to rely on __i
    d. ∀ I want a data source [that]i to rely on __i
It has often been claimed that that is a finite complementizer and not a relative pronoun (e.g. Huddleston, Pullum & Peterson 2002: 1057; but cf. also Van der Auwera 1985), which would account for the fact that it cannot pied pipe prepositions. Furthermore, the absence of that from nonfinite relative clauses (6d) would also follow directly from this. As noted by Sag (1997), however, the obligatory pied piping with wh-relative pronouns in non-finite relative clauses (6a,c) appears to be a construction-specific constraint (since stranding is a viable option in non-finite interrogative clauses such as I wonder which data to rely on; cf. Hoffmann in prep. for a diachronic construction grammar explanation of this phenomenon). In order to investigate the precise nature of the absence of pied piping with that and ∅, further experimental corroboration was required (i.e. are (4b) and (5b) equally bad?). Nevertheless, I would argue that these negative data provide interesting intra-corpus evidence with which to assess the grammaticality of the categorical effect displayed by certain PP functions:
Table 2. Allegedly categorical pied piping PP functions

                   finite wh-RC              finite non-wh-RC   nonfinite wh-RC   nonfinite non-wh RC
PP type            PPied Piped   PStranded   PStranded          PPied Piped       PStranded             Total
r [respect]        119           0           0                  1                 0                     120
n [manner]         80            0           0                  1                 0                     81
f [frequency]      21            0           0                  2                 0                     23
g [degree]         28            0           0                  1                 0                     29
Total              248           0           0                  5                 0                     253
l [location]       58            0           5                  1                 1                     65
k [affected loc]   21            0           5                  3                 3                     32
s [subcat. PP]     5             0           3                  0                 3                     11
d [direction]      4             0           2                  0                 0                     6
Total              88            0           15                 4                 7                     114
Total (all)        336           0           15                 9                 7                     367
Table 2 gives those PP functions which in the finite wh-RC tokens only occur with a pied piped preposition. As I pointed out earlier, the mere absence of positive data cannot conclusively prove the ungrammaticality of preposition stranding with these PP functions. Nevertheless, taking into account the distribution of the categorical effects in finite that- and Ø-tokens and nonfinite Ø-tokens with such PP functions, an interesting pattern emerges: On the one hand, there is a set of PP types which seem to demand obligatory pied piping: respect (people for whom shopping is a drug), manner (the ways in which they achieved it), frequency/duration (the frequency with which he saw her) and degree adjuncts (the extent to which it is true) not only pied pipe categorically in finite wh-relative clauses, they also do not appear in either nonfinite Ø- or finite that- and Ø-relative clauses (which would trigger stranding). I would argue that there is thus something like mounting negative evidence in the corpus which supports the claim that stranding is impossible with these PP types.³ Subcategorized PPs (the table on which she put it) and location (the road in which the accident occurred), affected location (the bed in which they slept), and direction PP adjuncts (the room into which he walked), on the other hand, appear in that- and (finite and non-finite) Ø-relative clauses in the corpus. The categorical pied piping of prepositions of these PPs in wh-relative clauses thus looks like an accidental gap (cf. Table 2).
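The reasoning behind this distinction can be made explicit in a small sketch. The counts are copied from Table 2; the two labels and the decision rule (any stranded token in a context that forces stranding counts as evidence for an accidental gap) are a simplified restatement for illustration, not a procedure used in the study, and a corpus absence of this kind remains suggestive rather than conclusive.

```python
# Counts per PP type, copied from Table 2:
# (finite wh pied piped, finite wh stranded, finite non-wh stranded,
#  nonfinite wh pied piped, nonfinite non-wh stranded)
TABLE_2 = {
    "respect":       (119, 0, 0, 1, 0),
    "manner":        (80, 0, 0, 1, 0),
    "frequency":     (21, 0, 0, 2, 0),
    "degree":        (28, 0, 0, 1, 0),
    "location":      (58, 0, 5, 1, 1),
    "affected loc":  (21, 0, 5, 3, 3),
    "subcat. PP":    (5, 0, 3, 0, 3),
    "direction":     (4, 0, 2, 0, 0),
}


def classify_gap(counts):
    """Label a PP type that never strands in finite wh-relative clauses.

    Illustrative rule: if the type is attested with a stranded preposition
    in that-/zero-relatives, the missing wh-stranding looks accidental;
    otherwise the gap is a candidate for a systematic constraint.
    """
    _, _, finite_non_wh_stranded, _, nonfinite_non_wh_stranded = counts
    attested_elsewhere = finite_non_wh_stranded + nonfinite_non_wh_stranded
    return "accidental gap" if attested_elsewhere > 0 else "systematic gap candidate"


for pp_type, counts in TABLE_2.items():
    print(f"{pp_type:14s} total={sum(counts):3d}  {classify_gap(counts)}")
```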
Now it might be objected that the categorical effects in Table 2 could also be attributed to two other factors: formality and blocking. Formal contexts are well known to favour pied piping (cf. below). Consequently, if the respect, manner, frequency/duration and degree PP adjunct tokens only occurred in formal text types, then this would explain why they exhibit pied piping only. Yet, this is not the case: In fact, 33% (83/248) of the finite wh-RC tokens with such a PP appear in the text types which the multivariate analysis identified as strongly favouring stranding (private dialogue/private correspondence and public dialogue/unscripted speeches; cf. below). The categorical effect of these PP types can therefore not be reduced to a mere formality effect. In contrast to this, only 17% (15/88) of the finite wh-RC tokens with subcategorized PPs, location, affected location and direction PP adjuncts occur in the text types that strongly favour stranding. Again, the categorical effects of these PP functions look more like an accidental gap. Finally, another potentially confounding factor is the fact that with some of the above PPs it is possible to omit the preposition (i.e. the ways they achieved it or the road the accident occurred). It might therefore be possible that the lack of stranded tokens with the PPs in Table 2 is due to the fact that the preposition is regularly omitted in such contexts. Yet, such a view is far from unproblematic. First of all, Pesetsky (1998) analyses the omission of prepositions as the result of a constraint that allows the deletion of SpecC. In other words, the underlying structure of the ways they achieved it is the ways in which they achieved it. Since the constraint only applies to SpecC, however, it can only affect pied piped prepositions. The lack of stranded prepositions with the PPs in Table 2 cannot be explained by this approach.
Secondly, even if one assumes that stranded prepositions can also be omitted, the problem remains that with many of the above PPs omission is not possible (cf. *people for whom shopping is a drug or *the extent to which it is true). Consequently, the effects in Table 2 cannot simply be accounted for by a potential blocking effect of preposition omission. Instead, in Hoffmann (2005) I claimed that the data in Table 2 indicate that there is a semantic constraint on preposition stranding: in order for a stranded preposition to be interpretable, the PP it heads must add thematic information to a predicate. Since respect, manner, frequency/duration and degree PP adjuncts do not add thematic information to a predicate, I argued that stranding with these PPs should be ungrammatical. Nevertheless, while the mounting negative evidence supports such a conclusion, the data on which it rests are still subject to the negative data problem. Therefore, it was decided that further experimental data was needed to corroborate this hypothesis.
4. Experimental Evidence
4.1. Methodology

In order to corroborate the claims based on the corpus tokens with categorical effects, 36 native speakers of British English (18 female, 18 male; age 17–64) were recruited for an on-line sentence acceptability experiment. The design crossed the following factors: Preposition Placement (stranded or pied piped), Relativiser (wh-, that or Ø) and PP function (prepositional verbs, temporal/locative sentence adjuncts or manner/degree adjuncts). The factor Relativiser thus allowed testing the hypothesis that preposition stranding is equally acceptable with all three types of relativisers. Moreover, it was investigated whether pied piping is equally ungrammatical in that- or Ø-relative clauses. In addition to this, the experiment tested the claim that the distribution of the PP functions in Table 2 makes it possible to distinguish categorical from accidental pied piping effects. Therefore one type of allegedly categorical factors, i.e. manner/degree adjuncts (e.g. I am not concerned with the way in which he achieved his goal), was contrasted with an allegedly accidental factor, locative sentence adjuncts (e.g. Matt retired to an island on which he found gold; due to their similar syntactic function this factor also contained temporal sentence adjuncts such as I forgot the day on which James arrived). Finally, as a point of reference, this factor also included prepositional verbs (e.g. I know the man on whom Jane relied.), with which stranding is perfectly grammatical.

This design resulted in Preposition Placement × Relativiser × PP function = 2 × 3 × 3 = 18 cells. Following Cowart (1997), every subject was exposed to all conditions, but never with the same lexical material. Thus, for each of the three PP function conditions, six different lexicalizations for every Preposition Placement × Relativiser factor combination were used.
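One common way to satisfy both requirements (every subject sees every cell, but no lexicalization twice) is a Latin-square rotation. The sketch below illustrates the counting only: the rotation scheme, the item indices and all names are assumptions made for the example, since the exact assignment procedure is not spelled out here.

```python
from itertools import product

PLACEMENT = ["stranded", "pied piped"]
RELATIVISER = ["wh", "that", "zero"]
PP_FUNCTION = ["prepositional verb", "temporal/locative adjunct", "manner/degree adjunct"]
N_ITEMS = 6  # lexicalizations per PP function (hypothetical item indices 0..5)

# The six Preposition Placement x Relativiser combinations within one PP function.
CONDITIONS = list(product(PLACEMENT, RELATIVISER))


def build_material_sets():
    """Rotate items through conditions so each set pairs every item with one condition."""
    material_sets = []
    for rotation in range(N_ITEMS):
        stimuli = []
        for pp_function in PP_FUNCTION:
            for item in range(N_ITEMS):
                placement, relativiser = CONDITIONS[(item + rotation) % len(CONDITIONS)]
                stimuli.append((pp_function, item, placement, relativiser))
        material_sets.append(stimuli)
    return material_sets


material_sets = build_material_sets()
all_stimuli = {stimulus for stimuli in material_sets for stimulus in stimuli}
print(len(material_sets), [len(s) for s in material_sets], len(all_stimuli))
```

In this scheme each material set contains one stimulus per design cell, and the union of all sets contains every item in every condition.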
The resulting total of 108 stimuli was then divided into six material sets of 18 stimuli by placing the items in Latin squares (Keller 2000; Keller & Alexopoulou 2005). Furthermore, 36 relative clauses from the ICE-GB corpus were included as fillers. 18 of these fillers were manipulated to exhibit the following ungrammatical phenomena: six fillers with word order violations, six subject contact clauses, six with subject-verb agreement errors. The method used in the experiment was based on the experimental paradigm of magnitude estimation (Bard et al. 1996; Keller 2000). Thus subjects were asked to give numerical judgments on sentences proportional to a constant reference sentence. The experiment itself was conducted using the WebExp software (Keller et al. 1998), which includes one practice session estimating line lengths and one judging linguistic items. Moreover, it automatically randomizes the order of presentation of stimuli in the main experiment.

4.2. Results

The experimental data were normalized by transformation to z-scores, which effectively standardizes the scales used by each informant (following Featherston 2005). Repeated measures analyses of variance were then carried out by subjects and by items. These tests yielded significant main effects of Preposition Placement, Relativiser, and PP function (this last only by items). There were also significant interactions of Preposition Placement by Relativiser, and Preposition Placement by PP function. All significant ps < 0.05.⁴

4.3. Discussion

In Figures 1 and 2 below the mean judgments with standard error bars are plotted for pied piping and stranding across relativisers and PP types. As can be seen, the two main effects of Preposition Placement and Relativiser can actually be attributed to a Preposition Placement*Relativiser interaction effect, i.e. the ban on pied piping with that- and Ø-relativisers: while the judgements of pied piping with wh-relativisers are significantly higher than with that- and Ø-relativisers, all three relativisers are equally acceptable in relative clauses with preposition stranding:
Figure 1. Pied Piping across relativisers and PP-types
Figure 2. Stranding across relativisers and PP types
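The z-score normalization mentioned in Section 4.2, on which the plotted means are based, can be illustrated with a minimal sketch. The judgement values are invented for the example, and the use of the sample standard deviation is an assumption; the point is only that two informants who use very different numerical ranges end up on a common scale.

```python
from statistics import mean, stdev


def z_scores(judgements):
    """Standardize one informant's judgements to mean 0 and unit (sample) SD."""
    centre = mean(judgements)
    spread = stdev(judgements)
    return [(value - centre) / spread for value in judgements]


# Hypothetical magnitude estimates from two informants for the same four
# sentences; informant B simply uses numbers ten times as large.
informant_a = [10.0, 20.0, 30.0, 40.0]
informant_b = [100.0, 200.0, 300.0, 400.0]

print(z_scores(informant_a))
print(z_scores(informant_b))
```

After the transformation the two informants' judgements coincide, which is what makes averaging across subjects meaningful.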
The ungrammaticality of pied piping with that and Ø-relativisers was expected in the light of the absence of these constructions in the ICE-GB corpus and native speaker introspection. However, the experiment also supported the hypothesis that across different PP types stranding is equally acceptable for the different relativisers. This consequently supports the use of that and Ø-tokens as intra-corpus corroborating evidence for the evaluation of the categorical effects of the wh-tokens in Table 2. Furthermore, this effect also helps to illustrate another advantage of carefully designed introspection experiments: the absence of a phenomenon in a corpus might indicate its ungrammaticality. Subtle differences in judgements of such phenomena, however, can not only corroborate such findings but, additionally, might reveal degrees of ungrammaticality (Kempen & Harbusch 2005; Sorace & Keller 2005). Now magnitude estimation yields gradient judgements, but this of course does not automatically entail that these gradient differences in acceptability reflect gradient differences in grammaticality. However, since all judgements in magnitude estimation experiments are always relative, contrasting the judgements of various constructions gives extremely insightful results. Take e.g. Figure 3, which plots the mean judgments together with standard error bars of pied piping across relativisers and PP types (just like Figure 1), but also gives the mean judgements for the various types of filler sentences used in the experiment:
Figure 3. Pied Piping across relativisers and PP types compared with fillers
As Figure 3 shows, pied piping with that and Ø-relativisers is judged considerably worse than pied piping with wh-relativisers or the grammatical fillers. Instead, the two constructions pattern at the very end of the acceptability cline along with the ungrammatical fillers. Following Sorace and Keller (2005), this can be taken as an indication of the fact that pied piping with that and Ø-relativisers violates a hard grammatical constraint. While pied piping with that and Ø were both treated as violations of hard grammatical constraints by subjects, Figure 2 already indicates that the Preposition Placement*PP function interaction involves the violation of a soft constraint. As Figure 4 shows, preposition stranding with prepositional verbs is judged better than with the other two PP-type contexts. The stranded temporal/location adjunct PPs in turn are judged better than the manner/degree adjunct tokens:
Figure 4. Stranding means for all relativisers across PP types compared with fillers
Note that the cline of acceptability in Figure 4 corroborates the hypothesis based on the corpus data: a preposition can only be stranded if it heads a PP which contributes interpretable thematic information to the predicate. Therefore it comes as no surprise that stranded manner/degree adjunct tokens are the only PP context which is judged significantly worse than the grammatical fillers (t(35) = -5.905, p < 0.001).⁵ Yet compared with the remaining filler stimuli, it is also important to see that stranding with manner/degree adjuncts receives better judgements than the set of ungrammatical fillers. Preposition stranding with these PPs can thus be considered a violation of a soft grammatical constraint (Sorace & Keller 2005).
5. Preposition placement in relative clauses: Corpus evidence II

In a next step the results of the experiment were then incorporated into the multivariate analysis of the corpus data. The program used for the multivariate analysis of the ICE-GB data was Goldvarb 2001 (Robinson et al. 2001). Goldvarb gives a descriptive summary of the data and carries out an inferential logistic regression analysis. Since it automatically corrects for associations between factors, the program facilitates the evaluation of the actual influence of the various factor groups and helps to find the model which best describes the data (Sigley 1997; Paolillo 2002). Once the best model for the data has been identified by Goldvarb’s “step-up/step-down” analysis, the program reports three central parameters: the weights of the included factors, the significance of the last factor group added to the model, and the fit of the model. Goldvarb reports the factor weights of its logistic regression on the logistic/probability scale. As a result, the neutral value for Goldvarb factors is 0.5, with factors below 0.5 having an inhibiting effect and above 0.5 having a favouring effect on the dependent variable. Values that are equidistant from the neutral value in either direction have the same magnitude of effect (thus factors of 0.1 and 0.9 have the same magnitude of effect, but in opposite directions). Note that all differences in parameter values have a smaller effect around the neutral value (Paolillo 2002). Since factor weights are calculated by maximum likelihood estimation, model selection in Goldvarb employs a G²-test (the standard test for maximum likelihood models, Paolillo 2002): factor groups are added one at a time in the step-up/step-down regression analysis. At each step the log-likelihood of the more complex model (including a new factor) and the less complex model (without the additional factor) are subjected to a G²-test.
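The interpretation of factor weights and the model comparison step can be sketched as follows. The sketch assumes the usual logistic formulation in which weights are combined on the log-odds scale together with an overall input probability; the input probability and the log-likelihood values passed in are placeholders for illustration, since neither is reported here, and the code does not reproduce Goldvarb's own implementation. The chi-square critical values are the standard ones for a significance level of 0.05.

```python
import math


def logit(p):
    """Log-odds of a probability or factor weight."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map log-odds back to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_probability(input_probability, weights):
    """Combine an input probability with one weight per factor group."""
    log_odds = logit(input_probability) + sum(logit(w) for w in weights)
    return inv_logit(log_odds)


def g_squared(loglik_simple, loglik_complex):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_complex - loglik_simple)


# Standard chi-square critical values at alpha = 0.05, by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def improves_model(loglik_simple, loglik_complex, extra_parameters):
    """True if adding the factor group significantly improves the fit."""
    return g_squared(loglik_simple, loglik_complex) > CHI2_CRITICAL_05[extra_parameters]


# Weights of 0.1 and 0.9 are equally strong, in opposite directions.
print(logit(0.1), logit(0.9))
# The same 0.1 step is a smaller change near the neutral value than near the edge.
print(logit(0.6) - logit(0.5), logit(0.9) - logit(0.8))
# Hypothetical log-likelihoods for a model without and with an extra factor group.
print(improves_model(-110.0, -100.0, extra_parameters=2))
```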
The output of this test is given as the significance of a model. New factors are only included in the model if adding them significantly improves the model and adding a different factor does not improve it more. The program also tests the fit of each model. In a second G²-test it computes the absolute goodness of model fit. This is termed “Fit:X-square” (Sigley 1997, 2003). Finally, in addition to these parameters, feeding the best model into a so-called Binomial One-level Analysis, the program gives the actual as well as the expected applications of the dependent variable for each cell created by all factor combinations, together with a chi-square Error value for the differences of the actual and expected realisations and the overall chi-square for all cells. All these parameters allow an in-depth investigation of potential interaction effects (Sigley 2003). Goldvarb cannot compute categorical effects (Young & Bayley 1996: 272–274; Sigley 1997: 240). Consequently, tokens exhibiting such factors either have to be eliminated from the data, or recoded with other non-categorical factors from the same factor group, provided there are sufficient linguistic reasons to do this (Young & Bayley 1996: 272–274; Sigley 1997: 240; Paolillo 2002).⁶ In the present study, those factors which were identified as genuine grammatical constraint violations in the corpus and experimental studies were excluded from the inferential multivariate analysis: i.e. the 350 finite that and Ø tokens as well as all 96 nonfinite tokens had to be removed. For the same reason all 248 PP type tokens which do not add thematic participants (cf. Table 2) were also excluded. Furthermore, as pointed out in Hoffmann (2005), there was a complexity*PP function interaction: accompaniment, movement, temporal and cause/reason/purpose PPs always pied pipe the preposition in less complex (MCN < 3) environments.
While the exact nature of this effect requires further experimental data (is pied piping a preposition over a small number of nodes to the front of the clause more economical with non-obligatory PPs?), including these tokens severely affected the model fit (yielding significant Fit:X-square values). Consequently, these 33 tokens also had to be removed. All in all this left 450 tokens for the multivariate analysis. In contrast to this, the corroborating corpus and experimental evidence supported the conclusion that the absence of stranding in subcategorized PPs, location, affected location and direction PP adjuncts with wh-relativisers in the ICE-GB was in fact an accidental gap. Thus, in contrast to Hoffmann (2005), there was enough evidence to support the recoding of these tokens with other appropriate PP type tokens into new factors (cf. below).
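The token counts reported in this section can be retraced with simple bookkeeping. The sketch below only restates figures given in the text; the dictionary keys are descriptive labels chosen for the example.

```python
# Relative clauses exhibiting the variable (Section 3.2).
pied_piped_total = 688
stranded_total = 489
all_tokens = pied_piped_total + stranded_total

# Tokens removed before the multivariate analysis, as listed in the text.
excluded = {
    "finite that/zero tokens": 350,
    "nonfinite tokens (26 wh pied piped + 70 zero stranded)": 26 + 70,
    "finite wh tokens with PPs adding no thematic participant": 248,
    "tokens in the complexity*PP function interaction": 33,
}

remaining = all_tokens - sum(excluded.values())
print(all_tokens, sum(excluded.values()), remaining)
```

The remaining figure matches the number of tokens stated to have entered the multivariate analysis.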
Subjecting the reduced token set to Goldvarb’s inferential statistical analysis yielded a model for the data which had a good fit (Fit:X-square test: p = 0.5610). This model (cf. Table 3) included the factor groups Text type*Relativiser, PP function, RC function and Phrase containing PP:

Table 3. Independent effects model of significantly contributing factors (factor weights > 0.5 favouring, and factor weights < 0.5 inhibiting pied piping)

Factor group (significance relative to this model)
  Factor                                          Pied piped/Total     Goldvarb weight
                                                  (% pied piping)      (full model)

Text type*Relativiser (p = 0.000)
  private dialogue, private correspondence        5 / 29 (17%)         0.002
  whose (all text types)                          2 / 6 (33%)          0.083
  public dialogue, unscripted speeches            92 / 126 (73%)       0.152
  scripted speeches, non-professional writing,
    business letters, broadcast news              89 / 94 (95%)        0.437
  printed edited texts                            194 / 195 (99%)      0.903

PP function (p = 0.000)
  prepositional X, V-X-P idioms                   55 / 82 (67%)        0.145
  optional complements, movement,
    obligatory complement, subcategorized PP,
    affected location, direction                  179 / 212 (84%)      0.366
  accompaniment, position in time,
    means/instrument,
    cause/reason/purpose/result,
    position/location                             148 / 156 (95%)      0.843

RC function (p = 0.000)
  non-restrictive                                 78 / 110 (71%)       0.200
  restrictive                                     304 / 340 (89%)      0.610

Phrase containing PP (p = 0.001)
  AP/VP                                           352 / 418 (84%)      0.437
  NP                                              30 / 32 (94%)        0.964

Fit:X-square: p = 0.5610 / R² = 0.927196431 / multiple adjusted R² = 0.907112688
Internal estimate of accuracy = 0.92 / Cross-validation estimate of accuracy = 0.916⁷
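Because every factor group partitions the same reduced token set, the counts in Table 3 can be cross-checked mechanically. The following sketch copies the counts and percentages from the table and verifies that each group sums to the same totals and that the reported percentages follow from the counts; the data structure and function names are chosen for illustration only.

```python
# (pied piped, total, reported % pied piping) per factor, copied from Table 3.
TABLE_3 = {
    "Text type*Relativiser": [(5, 29, 17), (2, 6, 33), (92, 126, 73), (89, 94, 95), (194, 195, 99)],
    "PP function": [(55, 82, 67), (179, 212, 84), (148, 156, 95)],
    "RC function": [(78, 110, 71), (304, 340, 89)],
    "Phrase containing PP": [(352, 418, 84), (30, 32, 94)],
}


def group_totals(rows):
    """Return (pied piped tokens, all tokens) summed over a factor group."""
    return sum(pied for pied, _, _ in rows), sum(total for _, total, _ in rows)


def percentages_match(rows):
    """Check that each reported percentage equals the rounded proportion."""
    return all(round(100 * pied / total) == pct for pied, total, pct in rows)


for group, rows in TABLE_3.items():
    print(group, group_totals(rows), percentages_match(rows))
```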
The first factor group identified as significantly influencing preposition placement was an interaction of Text type*Relativiser. While all other wh-tokens (who, whom, which) were sensitive to the level of formality, whose strongly disfavoured pied piping regardless of the text type the token occurred in (factor weight: 0.083).

(7) Pierce Inverarity, whose wealth she is to benefit from <W1A-010 #16>

Tentatively, I would argue that the fact that stranding is strongly preferred with whose is due to the complexity of the construction. Take e.g. (7): whose does not only function as a relativiser, it also occupies a determiner slot within the NP whose wealth. In order to interpret whose correctly the parser must therefore see inside the entire NP, and a pied piped preposition constitutes unnecessary intervening structure. Yet, since the parameter estimation of whose is based on a rather low token number of only six instances, this hypothesis requires further experimental corroboration. As Table 3 shows, the other wh-tokens are clearly affected by the level of formality. The most informal contexts, i.e. private dialogue and private correspondence, are unsurprisingly also the factors which disfavour pied piping the most (with a weight of 0.002). On the other end of the scale, printed texts, which have undergone some sort of editorial process, are by far the most favouring pied piping environment (with a weight of 0.903). The weights of the remaining text types have to be interpreted relative to these extreme endpoints of the level of formality scale.
In contrast to the model presented in Hoffmann (2005), the inclusion of the whose tokens and the recoding of the accidental gap PP type tokens of Table 2 also resulted in a significant effect of the factor group PP function: As expected, prepositions which head PPs which are specified as obligatory or optional by a predicate (optional complements, movement, obligatory complement, subcategorized, affected location, direction PPs) favour stranding (weight: 0.366). If a particular preposition is lexically specified as obligatory (prepositional X, V-X-P idioms PPs), pied piping is even more disfavoured (weight: 0.145). In contrast to this, more adjunct-like PPs which can co-occur with a wide range of predicates (accompaniment, position in time, means/instrument, cause/reason/purpose/result PPs) strongly favour pied piping (weight: 0.843). This part of the statistical analysis thus corroborates the claim about the semantic cline of the factor PP function. The next factor which had a significant independent effect on the choice of the dependent variable was RC function: while restrictive relative clauses were identified as favouring pied piping with a weight of 0.610, non-restrictive clauses were assigned an inhibiting factor weight of 0.200. Now it is important to realize that this effect cannot be attributed to the fact that that and Ø-relativisers do not occur in non-restrictive relative clauses. While the lack of that and Ø-relativisers might result in an increased total number of non-restrictive tokens with wh-relativisers, the preposition placement variable should not be affected by this. In fact, if Van den Eynden was right in claiming that “stranding is not really an option with WH-[…] relatives” (1996: 444), one might expect that non-restrictive relative clauses should only exhibit pied piped prepositions. However, since Goldvarb corrects for associations between factors, and since the good Fit:X-square furthermore indicates the independence of all factors in the model, RC function can in fact be said to have an additional, independent influence on preposition placement. As argued above, one reason why non-restrictive wh-relatives could be identified as a factor favouring stranding has to do with the ban on that/Ø in non-restrictive clauses: since non-restrictive wh-relativisers occur more frequently in contexts which in restrictive relative clauses favour both stranding and that/Ø, the factor non-restrictive itself might become interpreted as favouring stranding. In addition to this, non-restrictive clauses also have weaker semantic ties with their antecedent (see, for instance, Olofsson 1981; Quirk et al. 1985). As is well known, these weaker semantic ties even have prosodic effects, since in speech non-restrictive relative clauses are often separated from their antecedent by a pause (Huddleston, Pullum & Peterson 2002). A pied piped preposition might, however, be interpreted as establishing a closer relationship between the antecedent and the relative clause, fulfilling a kind of connective function.
Note furthermore that the favouring stranding effect of non-restrictive relative clauses can also be seen as a matter of complexity: non-restrictive relative clauses are not necessary for the identification of the reference of the antecedent NP. Therefore the filler-gap identification process in non-restrictive relative clauses is less complex than in restrictive relative clauses (Hawkins 2004), which accounts for the favouring stranding effect of the former. If this interpretation of the corpus results is correct, then an interesting prediction which requires further research arises: preposition stranding should be even more favoured in questions since these filler-gap structures involve even less information which needs to be processed (Hawkins 2004; for data which supports this claim cf. Hoffmann in prep.). Finally, the last factor group chosen as significantly contributing to the distribution of the dependent variable was the factor group Phrase containing PP: while AP- and VP-contained PPs slightly disfavour pied piping, PPs contained in NPs were found to strongly favour pied piping (with a factor weight of 0.964).

(8) And we were supposed to be trying to get it ready to let in that three months before Christmas with a new baby at Christmas which we weren’t going to just take no notice of <S1A-056 #296>
As (8) shows, stranding is possible with NP-contained PPs, but as the Goldvarb analysis indicates, normally the preposition is pied piped. Again, processing complexity seems to play a role here: since the filler-gap identification mechanism in cases where a preposition is stranded in an NP would have to look into a phrase which is embedded in another phrase, the VP, in order to relate the filler to the correct gap site, pied piping is preferred in these structures (also Hoffmann 2006).
6. Conclusion: The verdict

Preposition placement in English relative clauses is a complex phenomenon that is affected by many categorical and variable factors. Throughout this article I have argued that using multiple data sources as corroborating evidence helps linguists to present a much stronger case to the jury of their peers. In an original analysis of the ICE-GB corpus data (Hoffmann 2005) I had already adduced within-corpora corroborating evidence to separate systematic gaps (PPs that do not introduce thematic participants) from apparent gaps (PPs which add thematic participants but did not exhibit a stranded preposition). Yet, due to the negative data problem this hypothesis was certainly indicative but not a hundred percent conclusive. Thus a different type of further evidence was needed, i.e. experimental data. The experimental data corroborated the findings of the corpus study and in addition distinguished ungrammatical constructions violating hard constraints (pied piping with that and Ø) from those violating soft constraints (stranding with PPs that do not introduce thematic participants). In a next step it was then possible to run a much more refined multivariate analysis over the corpus data, since the apparent gap PPs could be recoded and included while the systematic gap tokens could be excluded. This analysis yielded a significant model which revealed the many different factors that influence the placement of prepositions (level of formality for all wh-relativisers except whose, the syntactic function of the PP, the restrictiveness of the relative clause, the Phrase containing PP).
Acknowledgements
This article has greatly profited from the feedback of many colleagues. First, I would like to thank Sam Featherston for introducing me to WebExp and for his continuous help with all questions concerning the statistical analysis of magnitude estimation data. Then I would like to express my gratitude to John Maindonald for his help with the R 2.2.1 software, and to John Paolillo and Robert Sigley for their Goldvarb support. Finally I am especially indebted to Holger Saurenbach for critically reading all my first drafts, which is not always an easy task.
Notes
1. Take, e.g., the two competing structures the man [on [whom]NP]PP [I]NP [relied]VP and [whom]NP [I]NP [relied]VP [on]PP. In both cases four ICs (two NPs, a PP and the VP) are realised by four words yielding an IC-to-Non-IC ratio of 4/4 = 100%. In contrast to this the MCNs for these structures would be [1+1+2+3]/4 = 1.75 and [1+2+3+4]/4 = 2.5.
2. These figures differ slightly from the ones given in Hoffmann 2005 since the present study also included all 8 whose-tokens, which had not been investigated in the earlier study: as a result there are now 662 pied piped wh-finite tokens (comprising the 659 tokens of Hoffmann 2005 plus 3 whose tokens). In addition to that, the 69 stranded wh-finite tokens consist of the original 62 items plus 5 whose and two new tokens which previously had erroneously been excluded (<S1B-033 #102>) or miscoded as a that-token (<S2A-051 #12>). The figure for non-wh-finite tokens is also marginally different from the one given in Hoffmann 2005 (350 vs. 353 tokens) since in addition to the reclassification of <S2A-051 #12> as a wh-token, 5 that-relative-clause tokens were recoded as cleft-sentences (<S2A-060 #8>, <S1B-008 #23>, <S1B-018 #112>, <S2A-029 #14>, <S1A-097 #285>) and one finite ∅-token was recoded as a hollow clause (<W1B-011 #112>) and another as a non-finite token (<S1A-033 #75>). In contrast to this a thorough reinvestigation of the corpus data revealed that two that- (<S1A-002 #156>, <S1B-008 #15>) and three ∅-tokens (<S1A-062 #1>, <S1A-084 #150>, <S1A-088 #217>) had been overlooked in the original analysis.
3. As Wasow, Jaeger & Orr (in prep.) have shown, the choice of head noun seems to affect the presence of a particular relativizer: in their Switchboard corpus data of non-subject relative clauses the antecedent stuff, e.g., clearly favours the presence of a that relativizer (in 62.8% of all cases). It is therefore conceivable that particular head nouns also favour relative clauses which are introduced by a pied piped preposition and a wh-relativizer. Elsewhere I have argued that the antecedent way, e.g., clearly favours relative clauses with in which in SpecC (cf. Hoffmann in prep.). The reason for this appears to be a combination of processing factors and usage-based entrenchment procedures.
4. Here are the full details of the repeated measures analyses by subject (F1) and by item (F2): Preposition Placement (F1(1,33) = 4.536, p < 0.05; F2(1,5) = 32.261, p < 0.01); Relativiser (F1(2,66) = 17.149, p < 0.001; F2(2,10) = 38.783, p < 0.001); PP function (F1(2,66) = 0.997, p > 0.30; F2(2,10) = 30.281, p < 0.001); Preposition Placement*Relativiser (F1(2,66) = 9.740, p < 0.001; F2(2,10) = 78.271, p < 0.001); and Preposition Placement*PP function (F1(2,66) = 4.217, p < 0.02; F2(2,10) = 20.075, p < 0.001).
5. Temporal/location adjuncts are judged as good as the grammatical fillers (t(35) = –1.349, p > 0.18), while prepositional verbs are considered better than the grammatical fillers (t(35) = 3.728, p < 0.005). The latter effect can be explained by the fact that prepositional verbs such as rely on or talk to are stored as complex lexical items, which facilitates the interpretation of such V-P structures.
6. A third possibility would have been to add a fictitious token (Paolillo 2002) coded only for the categorical environment, and for the dependent variant. As pointed out by an anonymous reviewer, however, such a fictitious token may distort model results if the categorical environment is not very frequent.
7. Note that adjusted R² and adjusted multiple R² as well as cross-validation parameters are not automatically calculated by Goldvarb since the standard test of model fit in maximum likelihood models is the G²-test (i.e. Goldvarb’s Fit: X-square test; Paolillo 2002). Since many researchers are likely to be unfamiliar with this model fit parameter, however, the additional model fit parameters were calculated by feeding the final model into the R 2.2.1 software to get the cross-validation parameter. The adjusted R² and adjusted multiple R² were calculated manually using the actual and the expected applications for all cells of the Binomial One-level output.
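The arithmetic in the note on IC-to-Non-IC ratios and MCNs can be re-checked with a short Python sketch. The per-word values are copied from that note; how they are derived from the structures is not restated here, and the function names are hypothetical helpers introduced only for illustration.

```python
def mean_chunk_number(per_word_values):
    """Arithmetic mean of the per-word values given in the note (hypothetical helper)."""
    return sum(per_word_values) / len(per_word_values)


def ic_to_non_ic_ratio(n_ics, n_words):
    """Ratio of immediate constituents to words, expressed as a percentage."""
    return 100 * n_ics / n_words


# Values copied from the note: pied-piped "on whom I relied" vs. stranded "whom I relied on".
pied_piped_mcn = mean_chunk_number([1, 1, 2, 3])
stranded_mcn = mean_chunk_number([1, 2, 3, 4])
# Four ICs realised by four words in both structures.
ratio = ic_to_non_ic_ratio(4, 4)
```

Under these inputs the two structures share the same IC-to-Non-IC ratio but differ in MCN, which is the contrast the note draws.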
References
Aarts, Bas
2000 Corpus linguistics, Chomsky and Fuzzy Tree Fragments. In Corpus Linguistics and Linguistic Theory, Christian Mair & Marianne Hundt (eds.), 5–13. Amsterdam/Atlanta, GA: Rodopi.
Aarts, Jan
1991 Intuition-based and observation-based grammars. In English Corpus Linguistics, Karin Aijmer & Bengt Altenberg (eds.), 44–62. London/New York: Longman.
Bard, Ellen Gurman, Dan Robertson & Antonella Sorace
1996 Magnitude Estimation of Linguistic Acceptability. Language 72: 32–68.
Cowart, Wayne
1997 Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
Featherston, Sam
2004 Bridge verbs and V2 verbs – the same thing in spades? Zeitschrift für Sprachwissenschaft 23 (2): 181–210.
2005 Magnitude estimation and what it can do for your syntax: Some wh-constraints in German. Lingua 115 (11): 1525–1550.
Fillmore, Charles J.
1992 ‘Corpus linguistics’ or ‘Computer aided armchair linguistics’. In Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, Stockholm, 4–8 August 1991, Jan Svartvik (ed.), 35–60. Berlin/New York: Mouton de Gruyter.
Gries, Stefan Th.
2002 Preposition stranding in English: Predicting speakers’ behaviour. In Proceedings of the Western Conference on Linguistics. Vol. 12, Vida Samiian (ed.), 230–241. Fresno, CA: California State University.
Hawkins, John A.
1994 A Performance Theory of Order and Constituency. Cambridge: Cambridge University Press.
2004 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.
Hoffmann, Thomas
2005 Variable vs. categorical Effects: Preposition pied piping and stranding in British English relative clauses. Journal of English Linguistics 33 (3): 257–297.
2006 Corpora and introspection as corroborating evidence: The case of preposition placement in English relative clauses. Corpus Linguistics and Linguistic Theory 2 (2): 165–195.
in prep. English relative clauses and construction grammar: something that preposition placement can shed light on? In Constructional Explanations in English Grammar, Graeme Trousdale & Nikolas Gisborne (eds.). Berlin/New York: Mouton de Gruyter.
Huddleston, Rodney, Geoffrey K. Pullum & Peter Peterson
2002 Relative Constructions and Unbounded Dependencies. In The Cambridge Grammar of the English Language, Geoffrey K. Pullum & Rodney Huddleston (eds.), 1031–1096. Cambridge: Cambridge University Press.
Johansson, Christine & Christer Geisler
1998 Pied piping in spoken English. In Explorations in Corpus Linguistics, Antoinette Renouf (ed.), 67–82. Amsterdam: Rodopi.
Keller, Frank
2000 Gradience in Grammar: Experimental and Computational Aspects of Degrees of Grammaticality. Ph.D. thesis, University of Edinburgh.
Keller, Frank, Martin Corley, Steffan Corley, Lars Konieczny & Amalia Todirascu
1998 WebExp: A Java toolbox for web-based psychological experiments. Technical Report HCRC/TR-99, Human Communication Research Centre, University of Edinburgh.
Keller, Frank & Theodora Alexopoulou
2005 A crosslinguistic, experimental study of resumptive pronouns and that-trace effects. In Proceedings of the 27th Annual Conference of the Cognitive Science Society, Bruno G. Bara, Lawrence Barsalou & Monica Bucciarelli (eds.), 1120–1125.
Kepser, Stephan & Marga Reis (eds.)
2005 Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives. Berlin/New York: Mouton de Gruyter.
Lu, Bingfu
2002 How does language encode performance limitation into its structure. http://www.people.fas.harvard.edu/~whu/China/chunk.doc
Maindonald, John & John Braun
2003 Data Analysis and Graphics Using R: An Example-based Approach. Cambridge: Cambridge University Press.
Nelson, Gerald, Sean Wallis & Bas Aarts
2002 Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam/Philadelphia: Benjamins.
Olofsson, Arne
1981 Relative Junctions in Written American English. Göteborg: ACTA Universitatis Gothoburgensis.
Pesetsky, David
1998 Some Optimality principles of sentence production. In Is the Best Good Enough? Optimality and Competition in Syntax, Pilar Barbosa et al. (eds.), 337–383. Cambridge, MA: MIT Press.
Pullum, Geoffrey K. & Rodney Huddleston
2002 Prepositions and prepositional phrases. In The Cambridge Grammar of the English Language, Geoffrey K. Pullum & Rodney Huddleston (eds.), 597–661. Cambridge: Cambridge University Press.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik
1985 A Comprehensive Grammar of the English Language. London: Longman.
Robinson, John S., Helen R. Lawrence & Sali A. Tagliamonte
2001 GOLDVARB 2001: A multivariate analysis application for Windows. http://www.york.ac.uk/depts/lang/webstuff/goldvarb.
Sag, Ivan A.
1997 English relative clause constructions. Journal of Linguistics 33: 431–484.
Sampson, Geoffrey
2001 Empirical Linguistics. London/New York: Continuum.
Schütze, Carson T.
1996 The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: Chicago University Press.
Sigley, Robert J.
1997 Choosing your Relatives: Relative Clauses in New Zealand English. Ph.D. thesis, Victoria University of Wellington.
2003 The importance of interaction effects. Language Variation and Change 15: 227–253.
Trotta, Joe
2000 Wh-clauses in English: Aspects of Theory and Description. Amsterdam/Atlanta, GA: Rodopi.
Van der Auwera, Johan
1985 Relative that – a Centennial Dispute. Journal of Linguistics 21: 149–179.
Wasow, Thomas, T. Florian Jaeger & David M. Orr
in prep. Lexical Variation in Relativizer Frequency. For the Workshop on Expecting the unexpected: Exceptions in Grammar at the 27th Annual Meeting of the German Linguistic Association, University of Cologne, Germany. http://www.bcs.rochester.edu/people/fjaeger/papers/WasowJaegerOrrDGfSpaper.pdf.
Locality and accessibility in wh-questions
Philip Hofmeister, T. Florian Jaeger, Ivan A. Sag, Inbal Arnon and Neal Snider
1. Competing wh-orders
Even in relatively configurational languages, such as English, speakers frequently have a choice between different constituent orders. Many of these word order variations have been linked to complexity (Hawkins 2005; inter alia). For example, heavy-NP shift is more likely if the shifted NP is more complex than the NP it shifts over (Wasow 1997). Other cases of word order variations, however, have not been considered in these terms. The choice between different wh-phrase orders, as in (1), has been said to be determined by (categorical) grammatical constraints, such as Superiority (Kuno & Robinson 1972; Chomsky 1973; inter alia).
(1) a. Who bought what?        Non-SUV
    b. What did who buy?       Superiority Violation (SUV)
According to such accounts, (1b) is ungrammatical in English. These accounts, however, do not predict the findings of Arnon et al. (2005) and Clifton et al. (2006), both of whom present evidence from corpora, attesting the usage of Superiority-violating examples. Nor can they accommodate the gradient nature of the contrast that has emerged in several studies (Featherston 2005; Fedorenko & Gibson 2006). In Arnon et al. (2005), we examined an alternative account, which we dubbed the Wh-Processing Hypothesis, which treats wh-phrase ordering as being subject to the same type of constraints as other word order variations. The Wh-Processing Hypothesis predicts that speakers disprefer more complex wh-dependencies. Here we examine to what extent factors known to affect the processing of filler-gap dependencies (FGDs) also affect the relative acceptability of different wh-phrase orders. We focus, in particular, on two factors in the processing of wh-questions: locality and accessibility. These factors play significant roles in the processing of FGDs in general, as we discuss below. One of our goals in this paper is to explore the extent to which these factors can explain SUVs.
In the next section, we define and discuss the two factors of locality and accessibility, showing how these factors have been previously related to processing difficulty. In section 2.2, we present the Wh-Processing Hypothesis. In section 3, we present the results of three acceptability surveys and one reading time study which test the effects of the above-mentioned factors on the processing and acceptability of questions. Finally, in section 4, we discuss the implications of these results, other possible factors, and potential problems with this account.
2. Locality and accessibility
The first factor we consider here is the locality of the dependency. Gibson (2000), Hawkins (2005), and many others observe that the distance between the filler and gap strongly affects the processing difficulty and relative acceptability of sentences with FGDs. For example, English object relatives, as compared to the shorter subject relatives, require more resources and increase processing difficulty, as indicated by reading times, question-answer accuracy, and lexical-decision tasks (King & Just 1991; inter alia). Since wh-interrogative dependencies are also non-local, it is reasonable to assume that they are subject to the same processing constraints as relative clauses. In fact, the lack of a specified, identifiable referent associated with a wh-interrogative filler potentially presents an additional cognitive challenge. Hence, we hypothesize that locality is also likely to play an important role in determining the acceptability of multiple wh-phrase (interrogative) constructions. It has also been noticed that the type of the wh-filler (which-NP vs. bare wh-item) influences the acceptability of SUVs. Karttunen (1977) points out that examples like (2) sound better than (3):
(2) Which class of drug will which patient get?
(3) What will who get?
Pesetsky (1987) further notices that the type of the in-situ wh-phrase affects acceptability independently, so that (4) is judged better than (3):
(4) What will which patient get?
Pesetsky ascribes this difference to “D(iscourse)-linking” of the which-NP, which exempts it from the normal conditions on wh-phrase ordering. The proposal, however, that the type of wh-filler and wh-intervener affect grammaticality is both ad hoc and without independent motivation. We propose that the factors explaining SUVs are both more general and independently motivated. We discuss next how wh-order preferences, widely discussed under the label of D-linking, relate to more general processing mechanisms. Specifically, we believe there is a strong relationship between the form and content of an expression and its degree of activation, which has been described in terms of accessibility (Ariel 1990) and that this degree of activation strongly impacts the processing of the FGD. FGDs have been shown to be affected by the referential properties of material intervening between the filler and the gap. For example, in sentences like (5), verbs are read fastest when the relative clause subjects are pronouns, while first or famous names lead to faster reading times than definite descriptions:
(5) The consultant who (we/Donald Trump/the chairman/a chairman) called advised wealthy companies.
Warren & Gibson (2002, 2005) interpret these results in terms of accessibility (Ariel 1990, 2001): the more accessible the intervening referents, the less burden there is on the processor, which is already taxed by maintaining the filler-gap dependency. Accessibility is a measure of activation level, which is partially indicated by the choice of referring expression. The form of an NP acts as a cue to the listener as to how much work is necessary to activate or retrieve the correct antecedent. As information and morphological complexity in the NP increase, the amount of work necessary to retrieve the antecedent also increases. Processing less accessible forms, therefore, requires more work and hence creates an additional processing difficulty while an FGD is being parsed. Interrogative wh-dependencies, like other FGDs, also exhibit sensitivity to the properties of intervening material. Alexopoulou & Keller (2003) show that words associated with a higher cognitive cost appearing between a wh-filler and gap impair the integration of wh-phrases with (the subcategorizer of) the gap. There is also evidence from German that certain intervening wh-phrases improve the acceptability of superiority-violating, multiple wh-questions: German speakers disprefer bare in-situ wh-phrases in SUVs (e.g. wer), as compared to complex wh-phrases (e.g. welcher Mann; Featherston 2005). We interpret these results as reflecting the increased processing difficulty introduced by bare wh-words. Locality and accessibility thus constitute the focus of this study. Before we turn to the predictions we make about these factors and how they influence the processing of wh-dependencies, we first address in detail how accessibility applies to wh-phrases.
2.1. Accessibility: Wh-phrases versus referential NPs
While accessibility has been almost exclusively applied to referential NPs, we propose that the same mechanisms that influence the processing of referential NPs are also at play during the processing of wh-phrases. We dwell on this subject here in order to address the issue of why the explicitness of intervening wh-phrases and referential NPs affects processing difficulty in seemingly different ways. As pointed out above, more explicitness correlates with more processing difficulty for referential NPs, but the opposite seems to be true for wh-phrases. To explain this difference, we consider here some hypotheses about the most important predictors of activation for wh-phrases and referential NPs. For referential NPs, morphologically simple and less informative NPs (e.g. pronouns) are used to refer to entities of higher activation or salience, while morphological complexity and high informativity (e.g. definite descriptions) indicate that the referent is less activated at the time of utterance (Ariel 2001). Thus, the choice between a pronoun or a definite description is conditioned by the salience of that particular individual in the preceding discourse. Notice that it only makes sense to compare the accessibility of two phrases when they have the same intended interpretation (i.e. both phrases have the same referent). In addition to marking a current degree of activation, the form of NPs also partially determines the degree of activation subsequent to their utterance – referred to as future accessibility by Ariel (2001). In short, the more explicit an NP is, the greater the subsequent increase in activation of the corresponding referent(s). Increases in activation not only make subsequent references with higher current accessibility markers more likely, they also facilitate other linguistic operations that involve that information, such as the integration of fillers and gaps.
Thus, all other things being equal, the referent of an expression like the gorilla approaching at breakneck speed, as opposed to it, is more likely to become the discourse topic and have a higher activation level at subsequent points in the utterance. In support of this view, Gernsbacher (1989) presents evidence that proper names reactivate an antecedent more strongly than a pronoun. From this perspective, current activation marking is in an inverse relation to future activation marking. A higher accessibility marker like a personal pronoun indicates high current accessibility, but does relatively little to increase activation. As Ariel (2001: 68) notes, this “can explain why speakers shift to lower accessibility markers from time to time, even when they continue to discuss the same discourse entity.” That is, to maintain topicality, speakers use longer and more explicit forms on occasion to compensate for normal activation decay and interference from other discourse entities. The same reasoning, we hypothesize, applies to wh-phrases: all other things being equal, the concept of politicians is more salient after an utterance of which politician (in context) than after who. Wh-phrases, too, have a range of possible forms from morphologically simple and uninformative (e.g. who) to more complex forms that package more information (e.g. which politician) to ever more complex and informative forms (e.g. which politician from Missouri). Given the greater degree of morphological complexity and explicitness in which-NPs, we categorize them as higher future accessibility markers. Moreover, Frazier & Clifton (2002) provide evidence that which-NPs are better antecedents for pronouns than bare wh-words like who and what. Since high future accessibility phrases encourage the subsequent use of high current accessibility anaphors (i.e. pronouns), the relation between explicitness and future activation is thus the same for anaphoric and wh-expressions. Preliminary results from reading-time experiments conducted by the first author also favor this ranking. In unary, wh-island constructions with supporting contexts (Which employee/Who did Albert learn whether they dismissed after the annual performance reviews?), which-NPs lead to significantly faster reading times than a bare wh-item at the embedded verb and in subsequent regions.
Accordingly, the evidence from Featherston and Frazier & Clifton can all be seen to reflect the fact that which-phrases are more accessible than simple wh-pronouns at the time that fillers and gaps are integrated. If the difficulty of processing a head is a function (among other things) of the activation levels of its arguments, then the form preferences for both wh-questions and referential NPs emerge as a preference for high argument activation at the point when the head is processed. In examples like (6) from Warren & Gibson (2005), variants with highly salient personal pronouns will have the highest argument activation, because activation starts high and hence can withstand more decay and/or interference effects.
(6) It was {you / Patricia / the lawyer} who {we / Dan / the businessman} avoided at the party.
In contrast, argument activation starts low or at zero in multiple wh-questions, but is boosted higher when more information is expressed in the wh-phrases. Therefore, a which-phrase in either argument position of an SUV should satisfy the preference for higher activation at the verbal head. This still leaves a noticeable distinction between the processing of referential NPs and wh-phrases. Recall that highly salient, but less informative NPs serve as the best kind of intervening referential NP (Warren & Gibson 2005). The above-cited data on wh-phrases, however, appears to indicate that more explicit and informative wh-phrases are preferred as interveners. Assuming that processing ease depends upon activation level, this means that wh-phrase interveners are most activated when the wh-phrase is explicit, while referential NP interveners are most activated when the form is not very informative but marks a highly salient referent. One way to account for the apparently different effects of explicitness is to point to the simple fact that interrogative wh-phrases are not anaphoric. Anaphoric NPs are used to refer back to discourse referents previously mentioned. In other words, they evoke information already in the common ground (explicitly or implicitly). Hence, the primary task in processing a referential NP is retrieving the correct antecedent or, failing that, accommodating the existence of an antecedent. This whole process is expedited when the referent or mental entity is highly salient at the point the anaphor is reached.1 A processing benefit for more explicit anaphoric forms is not apparent in the Warren & Gibson (2005) results.2 This does not preclude the possibility of some positive correlation between explicitness and activation boosting with respect to anaphoric NPs; instead, the results permit the view that the effect of activation boosting is obscured by the profound effect of salience in that study.
One way to account for this is to argue that pronouns, proper names, and definites differ too much in their current activation levels (due to the need to express important differences in salience) for boosting to make much difference. On this view, it is the property of being an anaphor that causes activation boosting to be relatively unimportant. In contrast, wh-phrases do not function as anaphors, although parts of their interpretation may derive from the preceding discourse. Rather, wh-phrases are used to construct complex objects – questions – which either seek to gain information (as in main clause interrogatives) or else to make a clausal argument that can be predicated over (as in embedded interrogatives). Questions, therefore, will be more easily understood (and better answered, for that matter) when either a) the context strongly provides the focus of the question or b) the wh-phrase itself explicitly narrows down the scope of the inquiry. Under the assumption that wh-phrases have either low or zero activation prior to their utterance (which follows from their non-anaphoric function), using a more explicit wh-form should facilitate the retrieval and integration process. In other words, because the initial activation is so low, activation strength is largely dependent on activation boosts that are, in turn, dependent on explicitness. This hypothesis is consistent with all the wh-phrase data considered so far.3 In sum, we propose that the apparent differences in the effect (size) of explicitness can be attributed to wh-expressions being non-anaphoric. This proposal makes two interesting predictions for future research: a) an indefinite phrase should lead to faster processing at the verb if the indefinite phrase is more explicit (contains more information); b) in the right context, it may even be possible to observe effects of activation boost for anaphoric expressions (see footnote 3) – in such contexts, more explicit anaphoric NPs should lead to faster processing.
2.2. The Wh-Processing Hypothesis
Based on these observations of how locality and accessibility affect FGDs, we propose the following Wh-Processing Hypothesis to account for the relative rareness of examples like (1b) in English, as compared to non-superiority violating orders like (1a):
(7) The Wh-Processing Hypothesis
    a. Factors that have been shown to burden the processing of referential filler-gap dependencies (e.g. relative clauses) burden the processing of all FGDs, including wh-interrogative constructions.
    b. Many filler-gap sentences that have standardly been analyzed as ungrammatical (violating ‘island’ constraints) are in fact grammatical, but are judged to be less acceptable by speakers because they are harder to process.
The reasoning implicit in (7b) builds on recent proposals to better understand the relation between speaker judgments and processing factors. See, for example, Fanselow & Frisch (2004). This hypothesis entails that speakers faced with a choice between several grammatical wh-orders will disprefer those which (given the context) are associated with a greater processing cost. Combined with existing theories of processing complexity (e.g. Gibson 2000), the Wh-Processing Hypothesis makes the following predictions about wh-questions:
(I) In filler-gap constructions, the greater the distance between the filler and its gap, the less acceptable the sentence.
(II) Less accessible fillers make filler-gap sentences less acceptable.
(III) Less accessible interveners make filler-gap sentences less acceptable.
Note that we make no assumptions about the relative importance of these predictions. That is, we do not conjecture whether the effect of distance is more important than accessibility or vice versa; nor does the Wh-Processing Hypothesis indicate whether the accessibility of the filler is more important than that of the interveners or vice versa.
3. Experimental evidence
3.1. Methods
We present here the results of three surveys eliciting acceptability judgments and one experiment measuring comprehension complexity in wh-questions via self-paced reading.4 Acceptability judgments were elicited over the WWW using magnitude estimation (ME; Bard et al. 1996) with the WebExp software package (Keller et al. 1998). ME lets participants set their own continuous acceptability scale, allowing participants to express as many distinctions as desired. Acceptability judgments are made relative to a reference sentence. Participants’ judgments are subsequently standardized by dividing by the reference sentence’s score. All ME analyses are based on the z-score5 of these (log-transformed) standardized judgments. For the reading time study, residual reading times were used for the analysis. This method reduces variability due to individual differences in reading times.6 All experiments use a Latin-square design: each participant saw each item in exactly one condition, and all conditions occur equally often. All lists include at least as many fillers as experimental items. All results were analyzed using repeated measures analyses of variance (ANOVAs). Participants for the ME experiments were recruited via e-mail lists and online discussion forums. The reading-time study was conducted as part of another reading-time study at MIT’s Tedlab.
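The standardization of the ME judgments described above can be illustrated with a minimal Python sketch. It assumes that z-scores are computed within a single participant's judgments and that the sample standard deviation is used; neither detail is stated in the text, and the function name and the example judgments are invented for illustration.

```python
import math
from statistics import mean, stdev


def standardize_me(raw_scores, reference_score):
    """Z-scores of log-transformed, reference-normalised ME judgments.

    Assumption: z-scoring is done within one participant's judgments,
    using the sample standard deviation.
    """
    # Step 1: express each judgment relative to the reference sentence.
    ratios = [score / reference_score for score in raw_scores]
    # Step 2: log-transform the ratios (the choice of base does not change the z-scores).
    logs = [math.log(r) for r in ratios]
    # Step 3: centre on the mean and scale by the standard deviation.
    mu = mean(logs)
    sigma = stdev(logs)
    if sigma == 0:
        raise ValueError("all judgments are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in logs]


# Invented judgments from one hypothetical participant.
z = standardize_me([50, 100, 200, 400], reference_score=100)
```

Because each participant chooses an individual scale, a participant who gave 5, 10, 20 and 40 against a reference score of 10 would receive the same z-scores as the one above, which is what makes judgments comparable across participants.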
3.2. Locality effects on acceptability (ME1)
3.2.1. Materials
ME1 investigates the effect of locality on the acceptability of wh-questions (Prediction I). Locality-based processing theories (e.g. Gibson 2000) predict that an increase in distance between filler and gap (measured in new discourse referents) makes wh-dependencies harder to process. We manipulated this distance by optionally attaching a six-word PP either to the which-phrase (8c,f) or to the other NP (8b,e). In addition, the which-phrase was either subject-extracted (8a–c) or object-extracted (8d–f):
(8) a. Which man saw the girl?
    b. Which man saw the girl in the bar on California Ave.?
    c. Which man in the bar on California Ave. saw the girl?
    d. Which man did the girl see?
    e. Which man did the girl in the bar on California Ave. see?
    f. Which man in the bar on California Ave. did the girl see?
We hypothesized that longer filler-gap distances would engender higher processing costs, which would result in lower acceptability judgments. For example, the filler in (8d) is separated from the gap by only one new discourse referent, the girl; but in (8e), three new discourse referents intervene between the filler and the gap. Thus, we predict (8e) to be judged less acceptable than (8d). Notice that we further predict a difference between (8b) and (8e) for the same reasons, despite the roughly equivalent lengths of the questions. In general, Prediction I says that subject-extractions should be judged more acceptable than object extractions. The study includes 36 items in six different conditions. In addition, 34 fillers were included in each list. 18 of these came from another multiple-wh experiment. 42 native English speakers completed the survey, but the results from one individual were removed because of incomplete data for that subject. Participation did not result in compensation.
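As a rough illustration of how 36 items in six conditions can be distributed under a Latin-square design, the following Python sketch rotates the conditions over the items. The condition labels combine the abbreviations used for ME1 (SE/OE for extraction; O/NP/WH for attachment) and are assumed here for (8a)–(8f); the function name is hypothetical, fillers are left out, and this is not a description of the lists actually used in the study.

```python
from collections import Counter


def latin_square_lists(n_items, conditions):
    """Rotate conditions over items so that each list shows every item exactly once."""
    n_cond = len(conditions)
    if n_items % n_cond != 0:
        raise ValueError("item count must be a multiple of the condition count")
    return [
        [(item, conditions[(item + offset) % n_cond]) for item in range(n_items)]
        for offset in range(n_cond)
    ]


# Assumed labels for (8a)-(8f): extraction type plus PP attachment site.
CONDITIONS = ["SE.O", "SE.NP", "SE.WH", "OE.O", "OE.NP", "OE.WH"]
lists = latin_square_lists(36, CONDITIONS)

# Per list, count how many items appear in each condition (36 / 6 = 6 each).
per_list_counts = [Counter(cond for _, cond in lst) for lst in lists]
```

Each of the six lists contains every item in exactly one condition, and across the six lists every item appears in all six conditions, so item and condition are not confounded.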
3.2.2. Results As shown in Table 1, object extractions (which have more intervening discourse referents) were judged as less acceptable than subject extractions in the subject but not the item analysis (F1 (1,35) = 4.9, p < .05; non-significant by items, F2 (1,35) = 2.5, p = .12). While the difference between examples like (8a) and (8d) turned out to be non-significant, this is not surprising since neither question involves more than one intervener and the number of interveners differed by only one. Notably though, the object wh-question with three intervening NPs (8e) was judged less acceptable than the subject wh-question (8b) of the same length with zero interveners. Overall, sentences were judged differently from each other if the difference in number of interveners was two or more. This may mean that, for simple unary wh-questions, it takes at least two interveners to invoke any measurable cognitive challenge.
Figure 1. Acceptability ratings from ME1 (OE = object extraction; SE = subject extraction; O = no attachment; WH = PP attached to wh-phrase; NP = PP attached to referential NP)

Table 1. Pairwise comparisons of the six conditions in ME1, including the difference in interveners for each pair (OE = object extracted; SE = subject extracted; NP = six-word PP is attached to referential NP; WH = six-word PP is attached to wh-phrase)

Pairs of Extraction.Attachment    Difference in # of interveners    Subj. analysis    Item analysis
OE.NP (8e) vs. SE.NP (8b)         3                                 p < .01           p < .05
OE.NP (8e) vs. SE.WH (8c)         3                                 p = .17           p = .29
OE.NP (8e) vs. OE.WH (8f)         2                                 p < .1            p < .05
OE.WH (8f) vs. SE.NP (8b)         1                                 p = .52           p = .97
OE.WH (8f) vs. SE.WH (8c)         1                                 p = .38           p = .14
SE.WH (8c) vs. SE.NP (8b)         0                                 p = .15           p = .18
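The differences in the number of interveners listed in Table 1 can be recomputed from per-condition counts of new discourse referents between filler and gap. In the sketch below, the count for (8e) is the one given in the text; the counts for (8b), (8c) and (8f) are our own inference from the example sentences, and the function name is hypothetical.

```python
from itertools import combinations

# New discourse referents intervening between filler and gap, per condition.
# (8e) = 3 is stated in the text; the other counts are inferred from (8).
INTERVENERS = {
    "SE.NP (8b)": 0,
    "SE.WH (8c)": 0,
    "OE.WH (8f)": 1,
    "OE.NP (8e)": 3,
}

def intervener_differences(counts):
    """Absolute difference in intervener count for every pair of conditions."""
    return {(a, b): abs(counts[a] - counts[b]) for a, b in combinations(counts, 2)}

differences = intervener_differences(INTERVENERS)
```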
3.3. Accessibility effects on acceptability (ME2)
3.3.1. Materials In ME2, we addressed the issue of how accessibility affects acceptability. To do this, we manipulated the accessibility of both the object-extracted wh-filler (what vs. which book) and the intervening subject wh-phrase (who vs. which boy). All questions were embedded SUVs, as in (9):
(9) a. Mary wondered what who read.
    b. Mary wondered which book who read.
    c. Mary wondered what which boy read.
    d. Mary wondered which book which boy read.
According to our predictions, examples with higher accessibility fillers and interveners should be preferred to those with low accessibility fillers and interveners. In other words, examples like (9d) should be judged the most acceptable and examples like (9a) the least acceptable. We are agnostic about the possibility of an interaction between filler and intervener accessibility, and so do not make any claims about how cases like (9b) and (9c) will be ordered with respect to each other. However, since one preference is satisfied in both (9b) and (9c), we expect that these cases are more acceptable than SUVs with two low accessibility wh-phrases, but less acceptable than SUVs with two high accessibility wh-phrases. Twenty items with four conditions each appeared in the experiment, as exemplified above. 42 people participated in this experiment over the web without any compensation.
3.3.2. Results The results confirm the prediction that less accessible wh-interveners (the in-situ wh-phrase) decrease acceptability (F1 (1,37) = 64.5, F2 (1,19) = 248.1, Ps < .001). Interveners with a lower activation at the verbal head decreased acceptability: examples like (9a–b) were judged worse than those in (9c–d), as illustrated in Figure 2. We also observed a main effect of filler accessibility (F1 (1,37) = 19.2, F2 (1,19) = 15.7, Ps < .001). This effect is due to an interaction (F1 (1,37) = 9.9, F2 (1,19) = 9.8, Ps < .01): for which-interveners (9c,d), less accessible fillers reduce acceptability, but for bare wh-interveners, we found no effect of filler accessibility. That is, the accessibility of the filler had an effect when the in-situ wh-phrase was a which-NP, but not when the in-situ wh-phrase was who (as represented by the two rightmost bars of Figure 2).
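In a fully within-subjects 2 x 2 design such as this one, each effect has one degree of freedom, and its F ratio equals the square of a one-sample t statistic computed on per-subject (F1) or per-item (F2) contrast scores. The sketch below illustrates this equivalence only; it is not the authors' analysis, the function name is hypothetical, and the cell means are invented.

```python
from math import sqrt
from statistics import mean, stdev

# Contrast weights over the cell means for (9a), (9b), (9c), (9d).
INTERVENER = (-0.5, -0.5, 0.5, 0.5)    # which-intervener minus bare intervener
FILLER = (-0.5, 0.5, -0.5, 0.5)        # which-filler minus bare filler
INTERACTION = (1.0, -1.0, -1.0, 1.0)   # filler effect differs by intervener type

def contrast_f(cell_means, weights):
    """F(1, n - 1) for one contrast; one row of cell means per subject (or item)."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in cell_means]
    n = len(scores)
    t = mean(scores) / (stdev(scores) / sqrt(n))  # one-sample t against zero
    return t * t, (1, n - 1)

# Invented z-scored cell means for three hypothetical subjects.
demo = [
    (-1.0, -1.0, 0.0, 0.0),
    (-1.0, -1.0, 1.0, 1.0),
    (-1.0, -1.0, 2.0, 2.0),
]
f1, df = contrast_f(demo, INTERVENER)
```

Passing one row per item instead of one row per subject gives the corresponding F2 value; the function requires that the contrast scores are not all identical, since the standard deviation would otherwise be zero.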
Figure 2. Acceptability ratings of SUVs in ME2 with 95% confidence intervals shown.
According to the results, therefore, the effect of interveners actually outweighs the effect of fillers. Having a bare wh-intervener caused even the putatively “D-linked” examples like (9b) to be judged as badly as constructions with a bare filler and intervener. The prediction, therefore, that more accessible fillers always improve acceptability was not independently verified in this experiment.
3.4. Effects of filler accessibility on acceptability (ME3)
3.4.1. Materials The lack of an effect for filler accessibility in the presence of bare wh-interveners may seem surprising. ME3 addresses the possibility that this apparent lack of an effect is a spurious null result. The materials for this experiment consequently varied only the type of wh-filler (the intervener was always the bare wh-item who). ME3 also includes one more type of wh-expression, what-NPs, in order to test whether “complex” wh-phrases in general count as high future accessibility markers:
(10) a. Tom revealed what who invented.
     b. Tom revealed what device who invented.
     c. Tom revealed which device who invented.
We did not entertain any predictions about how sentences like (10b) and (10c) should be judged with respect to each other, treating both simply as roughly equally more informative and syntactically more complex than the bare wh-word and hence as increasing future accessibility. ME3 also included non-SUV orders, resulting in 3 x 2 conditions. 18 experimental items were mixed with 52 fillers, of which 36 were items from ME1. 42 native English speakers participated in ME3 without any compensation. Only the results for the SUV condition are relevant here.
3.4.2. Results As per Prediction II, there was an effect of filler accessibility: compared to bare what-fillers, both which-NP and what-NP fillers were preferred (F1 (1,43) = 12.546, p < .001, F2 (1,17) = 5.235, p < .05). As can be seen in Table 2, grouping which-NPs and what-NPs is justified. Post-hoc pairwise comparisons revealed that the acceptability of which-NP and what-NP fillers in SUVs did not differ from each other (subject and item ts < 0.6, Ps > 0.5). The pairwise comparisons of both what-NPs and which-NPs to bare what-fillers reached significance by subjects, but not quite by items.7 In contrast to ME2, we do see an effect of filler accessibility. Note that the stimuli in ME2 and ME3 are both binary embedded wh-questions. In light of this, we tentatively conclude that the lack of a filler accessibility effect for bare interveners in ME2 is a spurious null result.
Figure 3. Acceptability ratings of SUVs and non-SUVs in ME3 (WHAT = bare wh-item; WHAT-NP = what-phrase; WHICH-NP = which-phrase)
Table 2. Pairwise comparisons by subjects and items

SUV condition (F1: subjects)    Mean z-score    t         df    Sig. (2-tailed)
what-NP vs. which-NP            .00312          .051      40    .960
bare what vs. which-NP          –.19764         –3.292    40    .002
bare what vs. what-NP           .20075          3.195     40    .003

SUV condition (F2: items)       Mean z-score    t         df    Sig. (2-tailed)
what-NP vs. which-NP            –.03791         –.563     17    .581
bare what vs. which-NP          –.20969         –1.752    17    .098
bare what vs. what-NP           .17178          1.604     17    .127
Interestingly, we also find that which-phrases are not unique markers of high future accessibility. The equally explicit what-NP fillers did not induce significantly different judgments of acceptability. Compared to bare wh-fillers, what-NPs and which-NPs both have a greater degree of morphological complexity and explicitness (i.e. more information). This greater degree of explicitness leads to higher future activation, expediting linguistic operations which require retrieval and use of that information. The results thus support a view that demarcates multi-word, complex wh-items from less informative, bare wh-items in terms of processing difficulty.
3.5. Accessibility effects on comprehension complexity
3.5.1. Materials So far, we have worked under the assumption that current processing theories make correct predictions about comprehension complexity in wh-questions. The Wh-Processing Hypothesis in (7) allows for the possibility that differences in the acceptability of wh-orders are due to differences in the associated processing complexity. In order to test this assumption about processing complexity, we ran two self-paced, moving window reading time studies (SPR). In SPRs, participants read a sentence word by word at their own speed. To ensure proper comprehension, each experimental stimulus is followed by a true-false question about the participants or events described. Before the main experiment, a short list of practice items was presented to the participant in order to familiarize the participant with the task.
(11) Ashley disclosed {what/which agreement} {who/which diplomat} signed after receiving permission from the president.
The stimuli were adaptations of those used in ME2: embedded SUVs in a 2 (filler accessibility) x 2 (intervener accessibility) design, with slight modifications (i.e. post-verbal PPs were added to control for reading time spillover effects). As in ME2, 20 experimental items were included in the experiment. 41 subjects participated in this experiment, which was conducted at MIT’s TedLab in conjunction with a separate, unrelated reading time experiment. Subjects were paid $10 per hour for their participation. The forms of the wh-filler and wh-intervener were expected to affect reading times at the embedded verb (signed in (11) above). More specifically, we anticipated that the verb would be read fastest in the condition with the high accessibility filler and intervener (both which-NPs). Conversely, the slowest reading times were expected for the condition with the low accessibility filler (what) and intervener (who).
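The residual reading times used as the dependent measure (reading time minus the reading time expected for a word of that length) can be sketched with an ordinary least-squares fit. This is a minimal illustration under stated assumptions: we assume the regression is fitted separately for each participant, the function name is hypothetical, and the word lengths and reading times are invented.

```python
from statistics import mean

def residual_reading_times(word_lengths, reading_times):
    """Residuals from a least-squares fit of reading time on word length.

    Assumption: both lists hold the data of a single participant.
    """
    mx, my = mean(word_lengths), mean(reading_times)
    sxx = sum((x - mx) ** 2 for x in word_lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(word_lengths, reading_times))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual: the word was read more slowly than its length predicts.
    return [y - (intercept + slope * x) for x, y in zip(word_lengths, reading_times)]

# Hypothetical data: word lengths in characters, reading times in ms.
lengths = [3, 5, 9, 6, 4]
times = [310.0, 350.0, 430.0, 420.0, 330.0]
residuals = residual_reading_times(lengths, times)
```

In this invented example the fourth word has a positive residual, which is the kind of slowdown the predictions above expect at the embedded verb in the low accessibility conditions.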
3.5.2. Results As predicted, less accessible fillers result in slower processing at the verb (F1 (1,40) = 17.7, p < .001, F2 (1,19) = 12.3, p < .003), as do less accessible interveners (F1 (1,40) = 10.5, F2 (1,19) = 11.5, Ps < .01). This replicates the main effects found in ME2 and ME3. Unlike the case in ME2, there was no significant interaction between filler and intervener accessibility.
Figure 4. Residual RTs with 95% confidence intervals, indicating type of superiority-violating object phrase (which-NP vs. who) and in-situ wh-phrase (which-NP vs. who).
We find a difference between the two conditions that have a bare wh-intervener (the two rightmost columns in Figure 4), which we did not find in ME2. As in ME3, the more complex and informative which-fillers were appreciably better than the bare wh-items. Interestingly, question-answer accuracy is also affected by accessibility (Figure 5). The results seem to mirror the results of ME2. First, question-answer accuracy was significantly lower for bare wh-interveners (83%) than for which-interveners (92.5%) (F1 (1,40) = 18.6, p < 0.001; F2 (1,19) = 7.6, p < 0.02). We found no main effect for filler accessibility on answer accuracy, but we found an interaction between intervener and filler accessibility (marginal by subject, F1 (1,40) = 3.6, p < 0.07; significant by item, F2 (1,19) = 5.6, p < 0.03). For wh-questions with bare wh-interveners, filler accessibility does not affect accuracy. If the intervener is a which-phrase, however, high accessibility which-fillers result in better question-answer accuracy (95%, SE = 2.5) than low accessibility bare wh-fillers (89.9%, SE = 3.1). Again, this pattern replicates the acceptability results from ME2.
Figure 5. Accessibility effects on question-answer accuracy with 95% confidence intervals.
4. Discussion Cumulatively, the results described above demonstrate that configurations of multiple wh-phrases display gradient acceptability, affected by locality and the accessibility of the filler and intervener. In SUVs, which-NP fillers improve acceptability judgments and reading times, as compared to bare wh-item fillers. Moreover, intervener accessibility impacts the processing of wh-dependencies as much as, or even more than filler accessibility: in-situ
bare wh-items in SUVs decrease acceptability ratings and increase reading times at the verb. A similar dispreference for in-situ bare wh-subjects in multiple wh-questions has also been found for German (Featherston 2005). We conclude that the Wh-Processing Hypothesis can account for a considerable amount of wh-order variation using processing-based factors that have been independently introduced to explain other phenomena in sentence processing (e.g. locality- and accessibility-based effects). One possible interpretation of these results is that mental grammars contain only minimal constraints licensing filler-gap dependencies, without complicated constraints specifying how fillers and gaps can be arranged. Instead, independently motivated processing constraints account for the space of judgments. Perhaps the most attractive aspect of this analysis is that it requires no ad hoc constraints to explain the observed variation. The earliest formulations of Superiority, as well as Pesetsky’s D-linking proposal, lack any generality beyond the sphere of multiple wh-phrases. This research also bears important implications for other types of wh-dependencies that have been labeled as ungrammatical. Indeed, Kluender (1998), while discussing various syntactic islands, suggests that the processing cost of holding a filler in memory and additional referential processing “can interact to yield traditional grammaticality effects.” The proposal made here adds support to this idea and identifies wh-accessibility as a factor that affects language users’ ability to hold a filler in memory. An interesting challenge for extreme versions of the Wh-Processing Hypothesis that attribute all variation in the acceptability of wh-orders to processing comes from cross-linguistic differences in wh-phrase ordering. We refer the reader to Arnon et al. (2006), where we address this challenge.
We argue that, even under the assumption of universal processing strategies, the Wh-Processing Hypothesis is not only compatible with cross-linguistic differences, but also makes predictions as to when they occur.
Our account of wh-ordering is no doubt incomplete. Other relevant factors may include lexical frequency and collocation effects, as well as plausibility or the supportiveness of the context. We saw a considerable amount of item variability in our acceptability surveys, which is responsible for the lack of significance in some cases. This may be partly attributed to how strongly the embedding verb predicts an indirect question, but also to the degree of affinity between the embedded verb and its wh-phrase arguments. As the multiple wh-questions we report on here were all presented without preceding context, the participants were likely faced with the task of imagining a proper context for the question (Fedorenko & Gibson (2006) provide corroboration of our results for English, though, with supporting contexts). In some cases, this may have been particularly challenging and affected the results. Multiple wh-questions seem in general suited to only a very particular kind of discourse setting and pragmatic purpose, and when the specific lexical choices cannot be easily reconciled with this purpose, additional difficulty may arise.
5. Conclusion In this paper, we have identified two major factors that influence wh-ordering and the acceptability and processing of wh-dependencies: accessibility and locality. We have also examined a noticeable difference between the properties of wh-interveners and referential interveners. Our account explains this difference by proposing that explicitness more strongly predicts future activation levels for wh-phrases than it does for anaphoric NPs. Accessibility and locality not only explain the effects observed here, but also motivate them. This is in sharp contrast to the widely held views that a competence grammar must include a constraint like Superiority or Chomsky’s Attract Closest principle, which seem to be both theoretically undesirable and empirically unnecessary.
Acknowledgements This paper has benefited from the comments and input of numerous people including Tom Wasow, Joan Bresnan, Anubha Kothari, Perry Rosenstein and the participants at the 2006 Linguistic Evidence conference in Tübingen. We are also extremely grateful to Ted Gibson and Ev Fedorenko for sharing their knowledge and the resources of TedLab with us, as well as their invaluable expertise in running reading time studies. Any errors are our own.
Notes
1. This dichotomy between anaphoric and non-anaphoric NPs predicts that indefinites and definites used to introduce discourse referents should be easier to process as explicitness increases. We are, however, unaware of any results that reflect this preference. Data on the processing of definites from relevant experiments (e.g. Warren & Gibson 2005) only considers definites which require an anaphoric interpretation.
2. To be clear, Warren & Gibson were not looking for such an effect of activation enhancement. The results they present ultimately cannot say that much about the subject because the NP types they use are not forced to have the same interpretation, viz. we, Dan, and the businessman can each be interpreted differently. A true test of enhancement differences would require contextually situated examples with various NP types that can all be linked to the same referent.
3. Conceivably, some other difference between wh-phrases and referential NPs could explain the contrasting influences of explicitness. For instance, activation boosts may be stronger and therefore more predictive for wh-phrases than referential NPs. Explicitness thus would benefit wh-phrase processing more than referential NP processing. Our best hypothesis for this difference, however, is the functional disparity between the two kinds of NPs.
4. Processing difficulty cannot be ascertained merely by acceptability judgments. Fanselow & Frisch (2004) indeed point out that “processing difficulty (understood as including the need to revise an initial analysis) can thus have both positive and negative influences on acceptability.” We proceed with the hypothesis that increased processing difficulty reduces acceptability in the case at hand, given findings that support this relationship for other wh-island phenomena (Kluender & Kutas 1993). In light of the possible criticisms of acceptability judgments, though, we corroborate the findings with more online data from reading time studies, which provide a more direct measure of processing difficulty.
5. A z-score is a standardization derived by subtracting the sample mean from the individual score and dividing the result by the sample standard deviation.
6. Residual reading times describe differences between the actual reading time and the expected reading time, given the word length (in characters). They are derived using linear regression and are standard in research on sentence processing.
7. The disparity between the results of the omnibus F-test and t-tests derives from a decrease of power in the t-tests.
References

Alexopoulou, Theodora & Frank Keller
2003 Linguistic complexity, locality, and resumption. Proceedings of WCCFL 22. Somerville, MA: Cascadilla Press.

Ariel, Mira
1990 Accessing Noun-Phrase Antecedents. London: Routledge.
2001 Accessibility theory: an overview. In Text Representation: Linguistic and Psycholinguistic Aspects, T. Sanders, J. Schilperoord, W. Spooren (eds.), Amsterdam: Benjamins.

Arnon, Inbal, Bruno Estigarribia, Philip Hofmeister, T. Florian Jaeger, Jeanette Pettibone, Ivan A. Sag & Neal Snider
2005 Long-distance dependencies without island constraints. Poster presented at HOWL 3 (Hopkins Workshop on Language).

Arnon, Inbal, Neal Snider, Philip Hofmeister, T. Florian Jaeger & Ivan A. Sag
2006 Processing accounts for gradience in acceptability: the case of multiple wh-questions. Proceedings of BLS 26, University of California, Berkeley.

Bard, Ellen, Dan Robertson & Antonella Sorace
1996 Magnitude estimation of linguistic acceptability. Language 72 (1): 32–68.

Chomsky, Noam
1973 Conditions on transformations. In A Festschrift for Morris Halle, S. Anderson & P. Kiparsky (eds.). New York: Holt, Rinehart & Winston.

Clifton, Charles, Gisbert Fanselow & Lyn Frazier
2006 Amnestying superiority violations: processing multiple questions. Linguistic Inquiry 37 (1): 51–68.

Fanselow, Gisbert & Stefan Frisch
2004 Effects of processing difficulty on judgments of acceptability. In Gradience in Grammar, G. Fanselow, C. Fery, M. Schlesewsky & R. Vogel (eds.). Oxford: Oxford University Press.

Featherston, Sam
2005 Universals and grammaticality: wh-constraints in German and English. Linguistics 43 (4): 667–711.

Frazier, Lyn & Charles Clifton
2002 Processing ‘D-linked’ phrases. Journal of Psycholinguistic Research 31 (6): 633–659.

Fedorenko, Evelina & Edward Gibson
2006 Syntactic parallelism as an account of cross-linguistic superiority effects. Unpublished ms., MIT.

Gernsbacher, Morton
1989 Mechanisms that improve referential access. Cognition 32: 99–156.

Gibson, Edward
2000 The dependency locality theory: A distance-based theory of linguistic complexity. In Image, Language, Brain, Y. Miyashita, A. Marantz & W. O’Neil (eds.), Cambridge, MA: MIT Press.

Hawkins, John
2005 Efficiency and Complexity in Grammars. Oxford: Oxford University Press.

Karttunen, Lauri
1977 Syntax and semantics of questions. Linguistics and Philosophy 1: 3–44.

Keller, Frank, Martin Corley, Steffan Corley, Lars Konieczny & Amalia Todirascu
1998 Web-Exp: A Java toolbox for web-based psychological experiments (Technical report No. HCRC/TR 99). University of Edinburgh. Human Communication Research Center.

King, Jonathan & Marcel A. Just
1991 Individual differences in syntactic processing: the role of working memory. Journal of Memory and Language 30: 580–602.

Kluender, Robert
1998 On the distinction between strong and weak islands: a processing perspective. In Syntax and Semantics 29: The Limits of Syntax, P. Culicover & L. McNally (eds.), 241–279. San Diego, CA: Academic Press.

Kluender, Robert & Marta Kutas
1993 Bridging the gap: Evidence from ERPs on the processing of unbounded dependencies. Journal of Cognitive Neuroscience 5: 196–214.

Kuno, Susumu & Jane Robinson
1972 Multiple wh-questions. Linguistic Inquiry 3: 463–87.

Pesetsky, David
1987 Wh-in-situ: Movement and unselective binding. In E. Reuland & A. ter Meulen (eds.), The Representation of (In)Definiteness. Cambridge, MA: MIT Press.
2000 Phrasal Movement and Its Kin. Cambridge, MA: MIT Press.

Warren, Tessa & Edward Gibson
2002 The influence of referential processing on sentence complexity. Cognition 85: 9–112.
2005 Effects of NP type in reading cleft sentences in English. Language and Cognitive Processes 20 (6): 751–767.

Wasow, Thomas
1997 Remarks on grammatical weight. Language Variation and Change 9: 81–105.
Eye Tracking as a tool to investigate the comprehension of referential expressions
Anke Karabanov, Peter Bosch and Peter König
1. Introduction In the study reported here we use eye-tracking methods closely related to the “Visual World Paradigm” to examine the processes underlying the comprehension of referential expressions. An investigation of the time course of these processes is of special interest for the controversial question of whether pronouns are understood the same way as full noun phrases. Our results show that both full noun phrases and unambiguous anaphoric pronouns are immediately followed by increased fixations on the corresponding referent in the visual scene and that both reach their fixation peak at about 1000 ms after the onset of the referential expression. This suggests that anaphoric pronouns are referentially interpreted very much like definite full NPs, and that no extra processing time is needed to resolve the anaphoric reference. It is crucial for the language comprehender to be able to track to whom or what different expressions refer. Whereas new discourse referents are often introduced by proper names or full lexical noun phrases, all languages also have a wide variety of different anaphors1 that are used for referring back to referents previously introduced into discourse. Besides the fact that anaphorically used expressions cannot be used entirely “out of the blue”, and although there are certain classes of expressions that are indeed primarily used anaphorically, like pronouns in particular, there is a considerable variety of expressions that allow for anaphoric use and there is considerable variation in their lexical specificity. In most languages, anaphors can range from various forms of zero anaphors through pronouns and definite noun phrases to repeated proper names. Due to these differences in lexical specificity anaphors also vary in the degree to which their interpretation is governed by the surrounding discourse. 
This means that full lexical noun phrases and proper names are typically more limited in their referential options by their lexical content than pronouns, which depend in their interpretation much more on the surrounding context. In this study, we will mainly focus on the interpretation process for full lexical noun phrases (NPs) and for anaphoric pronouns that are unambiguously determined in their reference by the preceding discourse. In order to understand a sentence containing a pronoun, the listener must be able to pick up the interpretation from the relevant antecedents in the text.
(1) The shop assistant told the craftsman that she was angry.
To understand sentence (1) we must know that the word she refers to the shop assistant. While computational linguistics has been struggling for a long time with the complex task of pronoun resolution, humans encountering pronouns in discourse solve this problem with ease and without even being aware of any effort. As speakers of a language we feel that we can immediately relate the pronoun to its referent. This effortlessness is astonishing, especially since it seems to be necessary to consider quite a number of syntactic, semantic, and pragmatic constraints in order to determine the correct antecedent or referent for a pronoun (cf. the listing of factors in Nicol & Swinney 2002). But how do we manage to figure out to whom or what a pronoun refers in a certain context? The mechanisms of this process are not yet clear and have been the issue of controversial discussions in syntactic and semantic theory as well as in psycholinguistics for many years. In the last two decades two opposing hypotheses about pronoun resolution have been held: Gernsbacher (1986) claimed that pronoun resolution happens in a two-step process. First the antecedent of the pronoun is identified and then, in a second step, the connection to the referent of the antecedent is established. Tyler & Marslen-Wilson (1982), however, claimed that pronouns are immediately interpreted referentially, just like full lexical NPs or proper names. However, the debate between these two positions as to the cognitive processes underlying reference phenomena has, until recently, had to rely on empirical evidence of a largely indirect nature, such as reading times, reaction times, or eye movements during reading (cf. surveys in Nicol & Swinney 2002; Rayner & Clifton 2002). To decide the issue, we need to look at the time course of these processes. Eye tracking during spoken sentence comprehension seems to be a method perfectly suited to this purpose.
With a head-mounted eye tracker we are able to record up to four eye fixations per second. Considering that each fixation can be seen as an unconscious decision of the participant about where to direct her attention it becomes clear that eye tracking is a very powerful tool to investigate comprehension processes online. Compared to more conventional psycholinguistic measures such as reading time
or probe verification, eye tracking can not only produce a huge amount of data, it can also monitor cognitive processes, like language comprehension, without any interruption, by directly monitoring the attentional focus of a participant at any time during a task. Two main experimental setups are used in most eye tracking experiments concerned with language processing. In one setup participants’ eye movements are recorded during a reading task. Obviously this type of experiment is mainly suited to investigations of the reading process, specifically the attentional focus in the text. The second has become known under the term “Visual World Paradigm”. In this setup, developed by Tanenhaus and colleagues (Tanenhaus et al. 1995), eye movements are recorded while participants view a visual display that is paired with an accompanying linguistic auditory stimulus. In the last ten years this paradigm has extensively been used to study different aspects of language processing. In the experiment to be reported here we make use of a slightly modified Visual World setup. In contrast to many other studies within the Visual World Paradigm (Runner, Sussman & Tanenhaus 2003; Arnold 2000; Tanenhaus 2000; etc.), we did not use cartoons or line drawings for our visual displays but detailed photographs of scenarios built up with Playmobil™ toy characters. This was motivated by the idea that line drawings may, and usually do, already contain an interpretation of the drawn object. The artist preselects certain features, which are depicted in the drawing, highlights some features and discards others. This may of course influence the focusing behaviour of the viewers. By using photographs of scenes composed from prefabricated objects as visual stimuli we thus hope to increase the naturalness and the general validity of our findings. In the following we will look at two questions, which are essential for the understanding of pronoun resolution. 
First we want to find out if there are any interesting differences in the fixation probabilities for the referents of different referential expressions. Second, we want to investigate the time course of fixations for both full noun phrases and pronouns. Of special interest is the question whether pronouns require additional time for “resolution” or whether they are interpreted immediately, just like proper names, as was argued by Tyler & Marslen-Wilson (1982) and Garrod, Freudenthal & Boyle (1994). By determining if there is additional time needed for the resolution of pronouns, we hope to be able to provide evidence for or against the assumption of a distinct mechanism that establishes a link between pronoun and antecedent. If no temporal difference in the resolution process of pronouns and definite full NPs is found, we have strong support for Tyler & Marslen-Wilson’s assumption that pronouns are interpreted directly with respect to the discourse representation and that no prior linking to antecedent expressions needs to be assumed.
2. Methods The participants, who volunteered for the experiment, were 12 native German speakers (5 male). All participants were students of Cognitive Science at the University of Osnabrück. They were aged between 20 and 25 (mean 21.9), had normal or corrected-to-normal vision, and none reported any speech or hearing deficits which could have influenced their performance. One participant had to be excluded from the experiment because this participant’s gaze remained almost static during the whole experiment, and we had to abort the experiment with another participant due to poor calibration (mean error >0.5°). In total, we were able to include ten participants in the analysis. All participants were naïve about the purpose of the experiment and received either course credits or payment for their participation. All participants were informed of the purpose of the experiment only after it had been completed. Ten photographs were paired with pieces of narrative discourse. The photographs showed pseudo-natural everyday situations built up with Playmobil™ toy characters (see Figure 1). Each of the pictures comprised three objects that were named in the narrative discourse. Two of these objects were human characters, whereas the third one was either an inanimate object or an animal. The three referents were named in the corresponding discourse both by a lexical NP and by pronouns. The distractor objects that were present in each photograph were either inanimate objects or animals that fit the general context of the scene. For each photograph, a corresponding piece of narrative discourse was presented via loudspeakers. The discourses consisted of three German sentences.
The first sentence always described the general scene that was visible in the corresponding photograph, without referring to any specific object in the scene. The second sentence introduced the two human referents. The third sentence referred to each of the human referents at least once with a pronoun and introduced the non-human referent with a full lexical NP as in Table 1. The sentences were pre-recorded and were spoken by a female native German speaker (the first author). All pieces of discourse had the same number of syllables and the duration of the discourse pieces ranged between 13.5 and 13.9 seconds. Each of the discourses had four variants. The first two sentences, introducing the whole scene and the two human referents, did not change. In the third sentence, however, the pronouns were permutated as in Table 1.
Eye Tracking as a tool
Figure 1. Shows an example of the visual stimuli presented to the participants; in this case with the following stimulus text: “Heute ist Markt im Dorf. Die Marktfrau streitet mit dem Arbeiter. Sie sagt jetzt gerade, daß er kein’ Ärger machen und das neue Fahrrad zurückgeben soll, das er sich geliehen hat.” [It’s market day in the village. The market woman is arguing with the worker. She’s just saying that he should not make any trouble and should give the new bike back that he borrowed.]
Table 1. Sample narrative
1. Heute ist Markt im Dorf. [It’s market day in the village.]
2. Die Marktfrau (NP1) streitet mit dem Arbeiter (NP2). [The market woman is arguing with the worker.]
3. Sie (Pro1) sagt jetzt gerade, [She’s just saying]
   A  daß er (Pro2) kein’ Ärger machen und das neue Fahrrad (NP3) zurückgeben soll, das er (Pro4) sich geliehen hat. [that he should not make any trouble and should give the new bike back that he borrowed.]
   B  daß sie (Pro2) kein’ Ärger machen und nur das neue Fahrrad (NP3) zurückhaben will, das er (Pro4) sich geliehen hat. [that she does not want any trouble and only wants to have the new bike back that he borrowed.]
   C  daß er (Pro2) ihr (Pro3) jetzt das neue Fahrrad (NP3) zurückgeben soll, das er (Pro4) sich geliehen hat. [that he should now give her the new bike back that he borrowed.]
   D  daß sie (Pro2) ihm (Pro3) jetzt das neue Fahrrad (NP3) zurückgeben will, das er (Pro4) sich geliehen hat. [that she now wants to give him the new bike back that he borrowed.]
All conditions start with a pronoun (Pro1) that has NP1 as its antecedent. Pro2 has NP2 as its antecedent in conditions A and C and NP1 in conditions B and D. Pro3, which occurs only in conditions C and D, has the antecedent NP1 in Condition C and NP2 in Condition D. All four conditions end with Pro4, which has NP2 as its antecedent.
The idea behind the permutation is to see whether there are differences in fixation probability that might occur due to the relationship between pronoun and antecedent. Linguistic theory distinguishes between pronouns that have their antecedents within the same sentence and those that do not. Whereas pronouns with antecedents in another sentence are referential anaphoric pronouns, pronouns with their antecedent within the same sentence may either also be regular anaphoric pronouns or may be c-commanded and bound by their antecedents. In our material, all pronouns with their antecedent in the same sentence belong to the class of c-commanded pronouns that are bound by their antecedents and, as argued in Bosch (1983), occur non-referentially. In contrast, all pronouns in our material with antecedents in the preceding sentence are ordinary referential pronouns. According to this distinction, we would expect higher fixation probabilities on the matching referents for the referential pronouns. From the arrangement of our four conditions we hope to be able to test whether the distinction between bound and referential pronouns is just a theoretical one, or whether it can also be observed in human language understanding.
Since each narrative had four different conditions, we have a total of 40 stimulus sentences (10 pieces of discourse times 4 conditions). Each of the participants heard either conditions A and C or conditions B and D of each story, plus 30 additional stories that served as fillers. In total, 20 experimental and 30 filler scenarios were presented to each participant.
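The design arithmetic above can be illustrated with a minimal Python sketch. The counts (10 stories, 4 conditions, 30 fillers) come from the text; how the A/C and B/D pairs were distributed over stories and participants is not stated, so the alternation by story and participant parity used in `build_trial_list` is purely an assumption, and all names are hypothetical.

```python
import random

CONDITIONS = ("A", "B", "C", "D")
N_STORIES = 10   # experimental discourses (from the text)
N_FILLERS = 30   # filler discourses (from the text)


def build_trial_list(participant_index, seed=0):
    """Return a shuffled list of (kind, item, condition) trials for one participant.

    Assumption: the A/C vs. B/D pair alternates from story to story and flips
    with participant parity. The chapter does not describe the actual scheme.
    """
    rng = random.Random(seed + participant_index)
    trials = []
    for story in range(N_STORIES):
        pair = ("A", "C") if (story + participant_index) % 2 == 0 else ("B", "D")
        for condition in pair:
            trials.append(("experimental", story, condition))
    for filler in range(N_FILLERS):
        trials.append(("filler", filler, None))
    rng.shuffle(trials)  # presentation order is randomized
    return trials


# 10 stories x 4 conditions = 40 stimulus variants overall
all_stimuli = [(story, cond) for story in range(N_STORIES) for cond in CONDITIONS]
# 20 experimental + 30 filler = 50 trials per participant
trials = build_trial_list(participant_index=0)
```

Under this scheme every participant hears each story twice, in one of the two condition pairs, which reproduces the 20 experimental and 30 filler trials per participant.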
The auditory presentation of the discourses varied between 13.5 s and 13.9 s for the experimental discourses and between 13 s and 17 s for the filler discourses. Even though the auditory stimulus was often shorter, each picture was presented to the participants for 17 seconds. Eye movements were recorded using a binocular eye-tracker (EyeLink II, SR Research, Mississauga, Ontario, Canada, 2003). Three infrared cameras recorded the position of the participant’s head and the movements of both eyes. The two cameras that recorded the eye fixations were placed under the participant’s eyes. The eye-tracker was controlled by a Pentium 4 PC (Dell Inc., Round Rock, TX, USA) that sampled the eye position signal at a rate of 250 Hz. Besides video-based pupil tracking, the eye-tracker included infrared cornea reflection, which reduces susceptibility to headband slips and motion.
At the beginning of the experiment no information about the purpose was given. Both eyes were calibrated using the nine-point grid procedure. During this procedure, participants were asked to fixate on a small point, which appeared randomly at one of nine locations on the monitor. Only calibration values with a mean error < 0.5° were accepted during the validation procedure. Using a standard setting of the EyeLink II, the better eye was selected. Before each stimulus presentation a fixation point was presented. Stimulus presentation was triggered by the experimenter after the participant had stably fixated on the fixation point. This fixation point was used to perform a correction for drifts and slips of the eye-tracker and allowed participants to take a short break between trials. A total of 50 stimuli (20 experimental and 30 filler) was presented to each participant, and the order of presentation was randomized. The experiment lasted about 30 minutes. Participants were instructed to “study the images carefully”.
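The recording parameters given above translate into simple quantities, sketched here in Python. The sampling rate, picture duration and calibration threshold are taken from the text; the function names and the per-point error list are hypothetical, and averaging the nine validation errors is an assumption about how the "mean error" was formed.

```python
SAMPLING_RATE_HZ = 250      # eye position sampling rate (from the text)
PICTURE_DURATION_MS = 17000 # each picture was shown for 17 seconds
MAX_MEAN_ERROR_DEG = 0.5    # calibration acceptance threshold


def samples_in(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Number of eye-position samples recorded in `duration_ms` milliseconds."""
    return duration_ms * rate_hz // 1000


def calibration_accepted(point_errors_deg, threshold=MAX_MEAN_ERROR_DEG):
    """Accept a calibration if the mean validation error stays below the threshold.

    `point_errors_deg` is a hypothetical list with one error value (in degrees)
    per validation point of the nine-point grid.
    """
    mean_error = sum(point_errors_deg) / len(point_errors_deg)
    return mean_error < threshold


samples_per_trial = samples_in(PICTURE_DURATION_MS)  # 4250 samples per picture
samples_per_500ms = samples_in(500)                  # 125 samples per 500 ms
```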
3. Results
As our visual stimuli differed considerably from the stimuli used in earlier visual world studies, we first had to check for large variance in fixation probabilities between the different stimulus pictures and between the different objects within single scenes. By doing this, we could make sure that our stimulus material was valid and worthy of further examination. By looking at the first 2000 ms of our stimulus presentation – in which the scene was introduced without direct reference to any of the objects in the scene – we were able to ascertain that our scenes were perceived as meaningful stimuli with the two human referents in the centre of the scene. The analysis showed that the participants had a preference for the human characters in the scene even before they were mentioned explicitly in the narrative (see Figure 2). This analysis furthermore revealed a relatively high fixation probability for distractor objects and thereby demonstrated that the distractor objects were also perceived and that the participants’ focus did not lie exclusively on the three referents relevant in the stimulus sentences. The relatively high fixation probabilities that could not be assigned to any object in the scene (16.1% Beyond Object Fixations) can be explained by the quite narrow definitions of regions of interest and by the fact that objects often stood so close to each other that it might have been possible to fixate both by looking in the empty space between them.
To check for the general influence of full NPs and pronouns on the fixation probabilities we summed up the fixation probabilities for all participants
214 Anke Karabanov, Peter Bosch and Peter König
Figure 2. Shows the accumulated number of fixations on different objects in the pictures over the first two seconds before the onset of Sentence 2, which introduces the referents. As indicated on the x-axis, the columns stand for the different visual stimuli, with the last column showing the summed fixation probabilities over all pictures. On the y-axis, the fixation probability is depicted. The grey-shading indicates the fixated object.
and all pictures. By this method we were able to obtain a graph that shows the change in fixation probability over the whole time course of the story (see Figure 3). Because this first evaluation of the general fixation probabilities caused by full NPs and pronouns also sums over all conditions, we can include only the first and the last of the pronouns in the analysis, since only these were the same in all conditions. Figure 3 shows that both full noun phrases and pronouns elicit an increase in fixation probabilities for the matching referent. However, the fixation probabilities for full noun phrases (mean 43%) are higher than the fixation probabilities for pronouns (mean 35%). The time frames in which the fixation probabilities on a referent were significantly higher (p < 0.005) than the fixation probabilities on other referents are depicted in the horizontal bars above Figure 3.
Figure 3. Shows the fixation probabilities for the different referents summed up over all conditions, subjects, and stimulus sentences. The x-axis shows the time course of the stimulus presentation and the y-axis shows the fixation probability. The shaded vertical bars in the background represent the duration of the referential expressions and interaction verbs. The horizontal bars at the top of the figure depict the time slots in which the t-tests between the different referents became significant.
In order to compare the peaks in fixation probabilities that were caused by full NP reference with those caused by pronoun reference, we looked at the fixation curve of each referent during both explicit and pronoun naming in a time window beginning 500 ms before the onset of the referential expression and lasting until 2000 ms after the onset of the expression (Figure 4A for pronoun naming and Figure 4B for explicit naming). In Figure 4A we can see the fixation probabilities caused by the two pronouns. Pronoun 1 is the first pronoun in our narrative, standing very close to its antecedent, whereas Pronoun 2 is the last pronoun of our story and has the longest distance to its antecedent. It can be seen in Figure 4A that Pronoun 1 reaches its probability peak at 1000 ms after the onset of the referential expression and decreases directly after that. Pronoun 2 reaches its peak only at about 1500 ms. However, the difference between the two fixation curves did not become significant. In Figure 4B the different fixation curves are shown for the three referents that are referred to by a full NP. It is interesting to note that Referent 2 has already reached its peak 500 ms after the onset of the referential expression and stays at this plateau until 1500 ms after the onset of the expression, whereas Referent 1 only reaches its fixation peak at 1500 ms after the onset of the referential expression. Referent 3 has a fixation plateau from 1000 ms to 1500 ms. The fixation pattern of Referent 2, with its early peak, is significantly different from both other fixation curves (t = 0.0148 for the comparison between Referent 1 and Referent 2 and t = 0.128 for the comparison of Referent 3 vs. Referent 2). The difference between Referent 1 and Referent 3 did not become significant.
Figure 4. Figure 4B shows the fixation probabilities for the three different referents when they were named explicitly. The bold line indicates Referent 1, the dark dashed-dotted line Referent 2, and the light dashed line Referent 3. The vertical dashed line indicates the onset of the referential expression. The time that is depicted on the x-axis ranges from 500 ms before the onset of the referential expression to 2000 ms after its onset. The y-axis shows the fixation probability in percent. Figure 4A shows the fixation probabilities for pronoun naming. In Figure 4C we see the mean fixation probabilities for explicit and pronoun naming. The bold line shows the mean fixation probability for explicit naming and the dashed-dotted line the probability for pronoun naming. The stars located around the word onset and around the fixation peak show the variance.
We merged the explicit fixation curves and the pronoun fixation curves for all referents. In doing so, we obtained one fixation curve for pronoun and one for full NP reference (Figure 4C). Both pronouns and full NPs lead to an increase of fixations on the visual referent. The fixation probability for pronouns reached its peak at 1000 ms after the onset of the pronoun. Until 1500 ms after the pronoun onset the fixation probability decreased only slightly, forming a plateau of highest fixation probability between 1000 and 1500 ms. In the case of full NPs, too, the peak of fixation probabilities was nearly reached within 1000 ms after the onset of the expression. However, fixations following full NP reference kept on increasing until 1500 ms after the onset of the expression, forming a slightly increasing plateau from 1000 to 1500 ms. The peak in fixation probabilities caused by the full NPs reaches up to 43% and is significantly higher (p = 0.00005) than the fixation peak caused by pronouns, which reaches up to 32%.
As already mentioned, each story had four different conditions. To account for differences between the four conditions we calculated the average of fixation probabilities over pictures and participants for each condition. The time frame between 7000 and 10000 ms was of special interest for the comparison between the four conditions, since it was in this time segment that the differences between the four conditions occurred (see Figure 5). To find significant differences between the four conditions, we conducted a series of t-tests (t = 0.05), again for each 500 ms time slot. We tested each referent in Condition A against its counterpart in Condition B, and each referent in Condition C against its counterpart in Condition D.
The only comparison that became significant was the comparison of Referent 2 in conditions A and B in the time slot between 10000 and 10500 ms. This time slot is approximately two seconds after the onset of the crucial pronoun in this condition. During this time interval, the probability to fixate Referent 2 in Condition A was significantly greater than in Condition B. No other time frame and no other referent showed significant differences. The comparison between conditions C and D did not yield any significant results.
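A comparison of this kind, one test per 500 ms time slot, could be sketched as follows. This is only an illustration under stated assumptions: gaze samples are assumed to be already labelled with a region of interest, the test is assumed to be a paired t-test over per-participant values (the text does not say which variant was used), and all names and toy numbers are hypothetical rather than data from the experiment.

```python
from math import sqrt
from statistics import mean, stdev

SAMPLES_PER_BIN = 125  # 500 ms at the 250 Hz sampling rate


def fixation_probability(labels, referent, samples_per_bin=SAMPLES_PER_BIN):
    """Share of samples per time slot whose region-of-interest label is `referent`."""
    probs = []
    for start in range(0, len(labels) - samples_per_bin + 1, samples_per_bin):
        window = labels[start:start + samples_per_bin]
        probs.append(sum(1 for lab in window if lab == referent) / samples_per_bin)
    return probs


def paired_t(xs, ys):
    """Paired t statistic for two equally long lists of per-participant values.

    Raises ZeroDivisionError if all pairwise differences are identical.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# One toy trial of 1 s: 100 samples on Referent 2, 25 on Referent 1, 125 elsewhere
labels = ["ref2"] * 100 + ["ref1"] * 25 + ["other"] * 125
curve = fixation_probability(labels, "ref2")  # [0.8, 0.0]

# Invented fixation probabilities for Referent 2 in one time slot, ten participants
cond_a = [0.40, 0.35, 0.50, 0.45, 0.30, 0.42, 0.38, 0.47, 0.36, 0.44]
cond_b = [0.30, 0.33, 0.41, 0.40, 0.28, 0.35, 0.30, 0.39, 0.35, 0.37]
CRITICAL_T_DF9 = 2.262  # two-sided critical value for alpha = 0.05, df = 9
significant = abs(paired_t(cond_a, cond_b)) > CRITICAL_T_DF9
```

Repeating `paired_t` for every time slot and referent yields one decision per slot; with many slots tested, an isolated significant slot should be read with corresponding caution.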
Figure 5. Shows the time frame (6–11 seconds) in which the differences between the four conditions occur. The only time slot in which there is a significantly different fixation probability between the conditions is between 10000 and 10500 ms in conditions A and B. At that time the probability of fixating Referent 2 is significantly higher in Condition A.
4. Discussion
In this study we looked at the comprehension of spoken text with a special focus on how referential expressions, in particular full NPs and pronouns, are understood. Our results can be summarized by three main findings:
1. Both full noun phrases and pronouns cause increased fixations on the matching referent.
2. The resolution of pronouns in unambiguous texts happens just as fast as the resolution of full NPs.
3. There seem to be some pronouns that do not elicit higher fixation probabilities on the matching referent. These differences in referentiality between different pronouns may be due to the syntactic relation between the pronoun and its antecedent.
In the following discussion, we will first try to interpret each of these three findings, followed by an outlook on how the study presented could be improved in future work. Finally, we will try to place our findings into the existing framework of previous research.
Our results show that both full NPs and pronouns caused increased fixations on the matching visual referents. However, fixation probabilities for full NPs were significantly greater than the fixation probabilities for pronouns. This means that even though both kinds of referential expressions do elicit higher fixation probabilities on their matching referents, we are still able to detect differences between full NPs and pronouns in the number of fixations they cause. Since we only used unambiguous pronouns² in our stimulus material, the differences in fixation probability between NPs and pronouns cannot be attributed to participants having problems in finding the right antecedents for the pronouns. However, even though we were able to detect this difference in fixation probability, this does not have to mean that full noun phrases generally create a “stronger” connection to the matching referent in the visual scene than pronouns do. We suggest, rather, that the difference in fixation probability that we found is due to the fact that, at least in our materials, the full NPs are all cases of “first mention” uses, whereas the pronouns only resume a referent that was already explicitly referred to previously by a full NP. In other words: While the participant’s attentional focus gets directed by a full NP to an object not previously mentioned in the discourse, the pronouns only re-direct the focus back to an object that was mentioned and focussed before. It seems reasonable that a newly introduced object causes more fixations than one that has previously been introduced and examined.
We also found that each full NP produces a fixation probability for the matching referent that is significantly higher than the fixation probabilities for all other referents and objects at this time point. The fixation probabilities caused by pronouns were also tested for significance. It is interesting to note that of the two pronouns tested³ only the one with the greater distance to its antecedent caused a significant difference in fixation probability compared to the other human referent. The pronoun with the shorter distance to its antecedent did not produce a significant difference compared to the fixation curve for the competing human referent, even though the magnitude of the fixation peak in percent was higher than for the second, significant pronoun. This could be explained by the fact that the attentional focus was directed to both of the human referents at the time when the first pronoun occurred, since this pronoun directly followed the explicit introduction of both human referents by a full NP.
Another interesting aspect is the temporal duration of the significant intervals for each referential expression. While the significant interval following the full NP referring to Referent 1 starts at the offset of the word and the one following the NP referring to Referent 3 starts 500 ms after the offset of the word, the significant interval following the NP referring to Referent 2 already starts 200 ms after the onset of the referential expression. The fact that the significant interval for Referent 2 starts so shortly after the onset of the expression could be due to the fact that Referent 2 is always preceded by an interaction verb like talk, fight, speak that requires a second human character as an object. This might enable the listener to anticipate the continuation of the story, since – apart from Referent 1, already referred to by the subject expression – Referent 2 is the only human object present in the visual scene and therefore the only one that allows a plausible unfolding of the story. The significant interval following the second pronoun starts 500 ms after the offset of the referential expression and is thus similar to the pattern of explicit naming for Referents 1 and 3.
Besides the significant fixation probabilities caused by referential expressions, we were also interested in the temporal resolution of different referential expressions in general. As a first step, we compared the temporal pattern of the fixation curves caused by the three NPs. This comparison gives results very similar to those obtained by the comparison of the temporal duration of the significant intervals caused by full NPs. It shows that the NP referring to Referent 2 reaches its fixation peak already 500 ms after the onset of the referential expression. This is of special interest since it means that the peak is reached even before the end of this NP. As already mentioned, we assume that the interaction verb triggered an anticipation effect that may have caused this early fixation peak.
The non-human Referent 3 reached its fixation peak 1000 ms after the onset of the referential phrase, whereas Referent 1 took 1500 ms until the highest fixation peak was reached. That Referent 3 reached its fixation peak 500 ms earlier than Referent 1 may partly be caused by the fact that the expressions referring to Referent 3 were on average 600 ms shorter than those referring to Referent 1. We further compared the temporal pattern of the fixation curves caused by the pronouns. As already mentioned, this part of the analysis included only those pronouns that are the same in all four conditions. Whereas the first pronoun that we were looking at occurred in the sentence directly following the sentence containing its antecedent, the second pronoun had a full sentence distance to its matching antecedent. Comparing these two pronouns, we found that Pronoun 1, having a short antecedent distance, reaches its fixation peak already after 1000 ms, whereas Pronoun 2 needs 1500 ms until
it reaches its fixation peak. However, the difference in percentage between the fixation probabilities caused by the two pronouns is not statistically significant. Comparing the mean temporal fixation pattern of full NPs with the mean fixation pattern for pronouns, we saw that the fixation curves look very similar. Both full NPs and pronouns reach the highest fixation probability between 1000 and 1500 ms after the onset of the referential expression. The fact that both have their highest fixation probability in the temporal interval between 1000 and 1500 ms after the onset of the expression indicates that there is no temporal delay for the resolution of pronouns in unambiguous texts.
One possible objection to this interpretation of our findings is that we compared the fixation curves from the onset of the referential expressions and not from their offset. Due to the longer duration of full NPs, their offset is much later than for pronouns. This means that even though the temporal resolution for both kinds of expressions is equal with respect to their onset, the temporal delay measured from the offset is much bigger for the pronouns. However, we decided to take the onset of the expression as a fixed point since we assume that even as the referential expression unfolds, participants may well already anticipate the matching referent. This assumption is supported by the eye-tracking experiment of Hartmann (2004), which examined gender effects in sentence processing in German. Her results show very convincingly that the gender information carried by the determiner of a full NP has an early effect on fixation probability for a matching referent, and they support an interactive view of language understanding, which claims that the comprehension process already starts during word recognition and not only at the offset of the expression (Gernsbacher, 1989, MacDonald & MacWhinney, 1990, Tyler & Marslen-Wilson, 1982).
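The difference between onset-based and offset-based latencies can be made concrete with a small sketch. It assumes a fixation curve stored as one probability per 500 ms time slot; the curve values and the two expression durations are invented for illustration and are not measurements from the experiment, and all function names are hypothetical.

```python
BIN_MS = 500  # width of one time slot


def event_locked(curve, onset_bin, bins_before=1, bins_after=3):
    """Cut the slots from 500 ms before to 2000 ms after the expression onset."""
    return curve[onset_bin - bins_before:onset_bin + bins_after + 1]


def peak_latency_ms(window, bins_before=1, bin_ms=BIN_MS):
    """Start time of the slot with the highest probability, relative to onset."""
    peak_index = max(range(len(window)), key=window.__getitem__)
    return (peak_index - bins_before) * bin_ms


def latency_from_offset(latency_from_onset_ms, expression_duration_ms):
    """Re-express an onset-based latency relative to the end of the expression."""
    return latency_from_onset_ms - expression_duration_ms


# Invented curve: one fixation probability per 500 ms slot, onset in slot 2
curve = [0.20, 0.20, 0.25, 0.30, 0.43, 0.40, 0.30]
window = event_locked(curve, onset_bin=2)   # [0.2, 0.25, 0.3, 0.43, 0.4]
onset_latency = peak_latency_ms(window)     # 1000

# Hypothetical durations: a full NP of 600 ms and a pronoun of 200 ms
np_offset_latency = latency_from_offset(onset_latency, 600)       # 400
pronoun_offset_latency = latency_from_offset(onset_latency, 200)  # 800
```

With identical onset-based latencies, the shorter expression necessarily shows the longer offset-based latency, which is the pattern the objection above describes.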
We should add, perhaps, that although we believe that our results contribute to a better understanding of the differences in the comprehension of full lexical NPs as opposed to pronouns, we obviously did not systematically vary all potentially relevant parameters. That would require a far more comprehensive study. The current experiment should rather be seen as a more modest preliminary to such a systematic comparison, because first of all we had to establish that, in the comprehension of spoken text vis-a-vis a relevant visual scene, referential expressions do in fact reliably cause a focussing behaviour that is correlated to the hypothetical comprehension processes.
The third main finding of this study was that there seem to be some pronouns that do not elicit higher fixation probabilities for the matching referent. We obtained this finding by comparing the permutations in pronoun order in the four different conditions. The four conditions differed with respect to the syntactic relation that the pronoun Pro2 has to its antecedent (see Table 1 and Figure 5 above). Comparing all four conditions, just one single time slot is shown to be significant: The probability for fixation on Referent 2 is significantly greater in Condition A than in Condition B in the time slot between 1500 and 2000 ms after the onset of PronounA⁴ and PronounB respectively. According to our first claim that both full NPs and pronouns elicit higher fixation probabilities on the matching referent, it was expected that fixation probability in Condition A would become greater for Referent 2 than in Condition B, since PronounA refers to Referent 2 whereas PronounB refers to Referent 1 instead. So, the greater fixation probability in Condition A is not surprising. However, since PronounB refers to Referent 1 we would expect an increase in fixations on Referent 1 in Condition B compared to Condition A. In our data, however, we find no indication of any such increase. On the contrary, after PronounB the fixation curve for Referent 1 keeps on decreasing.
What could be the reason for this strange asymmetry in referential strength between the pronouns in conditions A and B? There are several possible answers to this question. The first one takes the attentional focus of the participants as a possible explanation for the missing fixation increase in Condition B. As already mentioned, the differences between pronouns start after the first pronoun, which is the same in all four conditions. This first pronoun refers to Referent 1 in all of the conditions. This means that Referent 1 is in the centre of attention when the variation between the conditions starts.
Since the subsequent PronounA refers to Referent 2, listeners have to change their centre of attention to Referent 2 when the pronoun occurs. In Condition B however, the focus of attention stays with Referent 1 since PronounB just re-refers to Referent 1. In this condition no new information is added and the attention stays the same over a longer time period. The fact that Condition B does not require a shift in attention to another referent can account for the missing fixation increase. Another possible explanation for the asymmetry between the pronouns in conditions A and B comes from linguistic theory. It could be that the differences in fixation probability are caused by a difference in the referential properties of the pronouns. Bosch (1983) proposed that pronouns can be divided in two main groups: The regular anaphoric pronouns, which function referentially, and “syntactic” pronouns that do not function referentially, but are c-commanded and syntactically bound by their antecedents. The difference as far as German personal and possessive pronouns are concerned is
not a difference in form: The same forms occur in either use. Whereas the anaphoric pronouns occur referentially and the relation to their antecedents is mediated literally by co-reference, the c-commanded pronouns just link up to their antecedents by syntactic agreement. Their relation to their antecedents is free of reference and purely syntactic in nature. According to this theory, the use of the pronoun Pro2 in Condition A would be anaphoric; PronounA is in a new sentence and thus cannot be c-commanded by its antecedent. The pronoun in Condition B, however, is c-commanded by its antecedent, the subject of the sentence. PronounB would therefore be interpreted syntactically rather than referentially. If this interpretation of our results is correct, it would mean that the distinction between referential and bound pronouns is not just a theoretical one but a difference that is implemented in human language understanding. However, on the basis of the current data we are not able to decide which of the two alternatives is the correct account of the differences in fixation probability between conditions A and B. Further investigation will be required.
The comparison between conditions C and D did not yield any significant results. This may be mainly due to a mistake in the construction of the stimuli. As can be seen in Table 1, we tried to permute the two pronouns in conditions C and D. However, the temporal distance between these two pronouns, Pro2 and Pro3, was so small that a comparison was just not possible with an approach based on saccadic movements⁵.
5. Conclusions
What conclusions can we draw from the results obtained in this study, and how can we relate them to the existing body of knowledge about the resolution of referential expressions? We were able to confirm that both full NPs and pronouns elicit eye movements but that the peak fixation probability is generally smaller for pronouns than for full NPs. As far as we know, the literature does not contain an explicit comparison between the fixation probabilities of full noun phrases and pronouns. While Cooper (1974) includes pronouns in the same class as full NPs in his experiment, the experiments of Runner (2003) and Arnold et al. (2000) focus on the fixation probabilities for pronouns and do not comment on the fixation probabilities for the full NPs in their stimuli.
Concerning the fixation probabilities for full NPs that we obtained in our experiment, we were also able to detect strong anticipation effects caused either by the word preceding the actual NP (such as an interaction verb indicating that another human referent would follow) or by the unfolding NP itself. These results confirm the findings of Hartmann (2004) and Dahan, Swingley, Tanenhaus & Magnuson (2001), suggesting that anticipation plays an important role in understanding and that fixation probabilities are already influenced by anticipation effects as the stimulus expression unfolds.
Concerning the temporal resolution of the comprehension process for different referential expressions, we were able to contribute to a long-running discussion in which two main conflicting hypotheses have been around since the middle of the 1980s. As far as we know, eye tracking had not until now been used to investigate this issue. We hope to have shown that eye tracking methods provide a highly suitable online measure for the investigation of the corresponding comprehension processes. As our results show no temporal difference in the comprehension of full NPs and pronouns in unambiguous sentences, they can be interpreted as supporting the hypothesis of Tyler & Marslen-Wilson (1982), who claimed that pronouns are interpreted immediately and referentially, in the same way as proper names or full NPs. Tyler & Marslen-Wilson’s view seems to be more consistent with our findings than the hypothesis of Gernsbacher (1989), who claimed that pronouns are resolved in two stages, first a search for antecedents and then, via the antecedent, linking up to the referent. According to this theory, the resolution of pronouns must take longer than the resolution of full NPs since it requires the additional process of identifying an antecedent.
Finally, we found a first indication that the distinction between referential and nonreferential pronouns as proposed by Bosch (1983) might be not just theoretical but may have empirical consequences in language comprehension.
However, as mentioned, the current data are not conclusive in this respect and certainly here more work is required.
Notes

1. Here and in the following text we will use the term anaphor in the wide, classical sense (“expression used anaphorically”) and not in the more technical sense that is found in Binding Theory.
2. Gender agreement allowed only for one possible antecedent for each of the two pronouns.
3. Only two of the four pronouns in each stimulus text were included in this analysis since only these two were the same in all four conditions. A detailed analysis of the pronoun differences in the different conditions will be discussed later.
4. Indices behind the pronouns refer to the conditions in which the pronouns occur, e.g. PronounA = the pronoun in Condition A.
5. While the two pronouns in Condition C refer first to Referent 2 and then to Referent 1, the arrangement of the pronouns was exactly the opposite in Condition D. However, in both conditions the two pronouns fell into the same time slot of 500 ms, which made it impossible to analyse any differences caused by the arrangement of the pronouns.
References

Arnold, Jennifer, Janet G. Eisenband, Sarah Brown-Schmidt & John C. Trueswell. 2000. The Rapid Use of Gender Information: Evidence of the Time Course of Pronoun Resolution from Eye Tracking. Cognition 76: B13–B26.
Bosch, Peter. 1983. Agreement and Anaphora – A Study of the Roles of Pronouns in Discourse and Syntax. London/New York: Academic Press.
Cooper, Roger M. 1974. The Control of Eye Fixation by the Meaning of Spoken Language. Cognitive Psychology 6: 84–107.
Dahan, Delphine, Daniel Swingley, Michael K. Tanenhaus & James S. Magnuson. 2001. Time Course of Frequency Effects in Spoken Word Recognition: Evidence from Eye Tracking. Cognitive Psychology 34: 317–367.
Garrod, Simon, Daniel Freudenthal & Elizabeth Boyle. 1994. The role of different types of anaphor in the online resolution of sentences in a discourse. Journal of Memory and Language 33: 39–68.
Gernsbacher, Morton A. 1989. Mechanisms that improve referential access. Cognition 32: 99–156.
Gernsbacher, Morton A. & David J. Hargreaves. 1988. Accessing sentence participants: The advantage of first mention. Journal of Memory and Language 27: 699–717.
Hartmann, Nadine. 2004. Processing Grammatical Gender in German. Bachelor Thesis, unpubl. Univ. of Osnabrück, Cognitive Science, http://www.cogsci.uniosnabrueck.de/~CL/download/Hartmann_GramGender.pdf.
MacDonald, Maryellen C. & Brian MacWhinney. 1990. Measuring inhibition and facilitation from pronouns. Journal of Memory and Language 29: 469–492.
Nicol, Janet & David A. Swinney. 2002. The psycholinguistics of anaphora. In Anaphora, A. Barss (ed.), 72–104. Oxford: Blackwell.
Rayner, Keith & Charles Clifton, Jr. 2002. Language processing. In Stevens Handbook of Experimental Psychology, 3rd Edition, Vol. 2, Memory and Cognitive Processes, D. Medin (ed.), 261–316. New York: John Wiley & Sons.
Runner, Jeffrey T., Rachel S. Sussman & Michael K. Tanenhaus. 2003. Assignment of Reference to Reflexives and Pronouns in Picture Noun Phrases: Evidence from Eye Movements. Cognition 81 (1): B1–B13.
Tanenhaus, Michael K., Michael J. Spivey-Knowlton, Kathleen M. Eberhard & Julie C. Sedivy. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268: 1632–1634.
Tanenhaus, Michael K., James S. Magnuson, Delphine Dahan & Craig Chambers. 2000. Eye Movements and Lexical Access in Spoken Language Comprehension: Evaluating a Linking Hypothesis between Fixations and Linguistic Processing. Journal of Psycholinguistic Research 29 (6): 557–580.
Tyler, Lorraine K. & William Marslen-Wilson. 1982. Processing utterances in discourse context: Online resolution of anaphors. Journal of Semantics 1: 297–314.
Corpus data and experimental results as prosodic evidence: On the case of stressed auch in German

Denisa Lenertová and Stefan Sudhoff
1. Introduction

Additive particles (like German auch, English also, Dutch ook, etc.) belong to the class of focus particles, as they typically associate with a focused element in their c-command domain. As illustrated in (1a), the associated focus carries the nuclear accent, whereas the particle itself remains unstressed. However, additive particles – in contrast to exclusive or scalar particles like nur/only and sogar/even, respectively – can also relate to a preceding constituent and carry the nuclear accent themselves, cf. (1b). The associated constituent (AC; indicated by square brackets throughout this paper) is added to a contextually given set by means of the particle. In (1a), the relevant set includes places where Cornelius used to study, and in (1b), it is the set of people that studied at the place under discussion. Crucially, only in (1a) is the AC marked by the nuclear accent (indicated by capitals).

(1) a. Cornelius hat auch [HIER] studiert.
       Cornelius has also here studied
       ‘Cornelius also studied here.’ (= additionally here)
    b. [Cornelius] hat AUCH hier studiert.
       ‘Cornelius studied here, too.’ (= additionally Cornelius)
In German, the AC of stressed auch can be located either in the prefield (i.e., preceding the finite verb in verb-second clauses), cf. (1b), or in the middlefield (i.e., following the finite verb in verb-second clauses or the complementizer in verb-final embedded clauses), cf. (3) below. Constructions like (1b) pose a serious problem for existing theories of focus particle constructions (cf. Büring & Hartmann 2001 and references therein), as two requirements assumed to be crucial are not fulfilled in the case of stressed auch: on the one hand, the particle does not c-command the AC; on the other hand, the AC does not bear the focus accent. For this reason, the concept of association with focus has been questioned (cf. Reis &
Rosengren 1997). Krifka (1999) presents a different approach, complementing the original concept of focus sensitivity with the Contrastive Topic Hypothesis given in (2).

(2) The associated constituent of stressed postposed particles is the contrastive topic of the clause in which they occur. (Krifka 1999: 113)
Krifka argues that sentences like (3), taken from Reis & Rosengren (1997: 253), are compatible with his proposal, as multiple topics are possible if they are not equally ranked. In (3), both einen Gauguin and Peter are assumed to be topics, the former having scope over the latter.

(3) Mensch, Paul besitzt einen Gauguin!
    ‘Boy, Paul possesses a Gauguin.’
    Einen Gauguin besitzt [Peter] auch (, aber ihm fehlen andere Impressionisten).
    a Gauguin possesses Peter also
    ‘Peter possesses a Gauguin, too (, but he doesn’t have other impressionists).’
Another crucial assumption is that, although they have the semantic properties of contrastive topics, ACs of stressed auch are not necessarily marked as such prosodically. This is in contrast with the extensive literature on the prosodic properties of contrastive topic constructions in German, which relates them to the so-called hat pattern intonation involving a rising accent on the topic and a falling nuclear accent (cf. Féry 1993, among others). More specifically, Frascarelli & Hinterhölzl (2007) claim that German contrastive topics are marked by L*H accents. Braun (2005), on the other hand, stresses the importance of gradual phonetic parameters such as peak height instead of categorical distinctions between accent types. This paper deals with (i) the specific prosodic realization of constituents associated with stressed auch, (ii) possible generalizations about the optionality of the prosodic marking, and (iii) its perceptual relevance. We will address evidence from two different sources, a spoken language corpus (Section 2) and several controlled production and perception experiments (Section 3), and discuss the relation between the results as well as their theoretical consequences (Section 4). We will present supportive evidence for the prosodic marking of ACs of stressed auch and show that there is no 1:1 mapping between the status of being associated with the particle and accentuation. ACs must be accented
under certain circumstances, but rather than a particular accent type, gradual phonetic parameters – as proposed by Braun (2005) – are decisive for the marking.
2. Corpus study
We analyzed a corpus of 225 utterances with stressed auch extracted from 9 movies and 12 episodes of a TV series.1 The constituents associated with the focus particle were determined on the basis of context information, and their prosodic properties were analyzed using the software Praat. Finally, we annotated all occurring accents in terms of GToBI (Grice et al. 2005). The corpus is heterogeneous with respect to the syntactic structure of the utterances, the syntactic function of the ACs, and their location. More specifically, it contains verb-first (9.8%; imperatives and polar interrogatives), verb-second (69.7%; declaratives), verb-final (4.9%; embedded argument and adjunct clauses), and elliptical (verbless) structures (15.6%). The ACs serve as subjects, objects, adverbials, and predicatives, and they are located in the prefield (47.1%) or in the middlefield (32%).2 In most utterances, auch is associated with a single constituent (93.3%). However, there is a small group of utterances (6.7%) containing two ACs distributed over the clause. We will first discuss the former type of construction and briefly return to the cases with ‘conjoined’ ACs in Section 2.2.
2.1. Utterances with a single AC

The overall results of the accent annotation in the group of utterances with a single AC (N = 210) show that only 52.4% of ACs are accented, carrying LH* (25.7%), L*H (16.7%),3 or H* (10%) accents, whereas the remaining ACs are either unaccented (45.7%) or deleted due to ellipsis (1.9%). Note, however, that the corpus contains many more pronominal (76.7%) than non-pronominal ACs (23.3%). Because we suspected that this factor influences the overall accent distribution, we looked at the two groups separately. As shown in Figure 1, unaccented ACs exclusively fall into the pronominal category, whereas non-pronominal ACs always carry an accent. We observed that most accented pronominal ACs are located in the prefield; the pronominal ACs in the middlefield are mostly unaccented.
Figure 1. Accent distribution for non-pronominal ACs (left panel, N = 49) and pronominal ACs (right panel, N = 161)
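As a rough consistency check, the percentages reported for the single-AC utterances can be converted back into counts. The sketch below assumes that each percentage is a share of the N = 210 utterances, rounded to one decimal place; the variable names are illustrative and not part of the original analysis.

```python
# Recover approximate counts from the percentages reported for the
# N = 210 utterances with a single AC (assumption: every figure is a
# share of 210, rounded to one decimal place).
N_SINGLE_AC = 210

reported_pct = {
    "LH*": 25.7,
    "L*H": 16.7,
    "H*": 10.0,
    "unaccented": 45.7,
    "elided": 1.9,
}

counts = {label: round(N_SINGLE_AC * pct / 100) for label, pct in reported_pct.items()}
accented = counts["LH*"] + counts["L*H"] + counts["H*"]

# Split into pronominal (76.7%) and non-pronominal (23.3%) ACs.
pronominal = round(N_SINGLE_AC * 76.7 / 100)
non_pronominal = round(N_SINGLE_AC * 23.3 / 100)

# Non-pronominal ACs are always accented, so the remaining accented
# ACs must be pronominal.
accented_pronominal = accented - non_pronominal

print(counts)
print(accented, pronominal, non_pronominal, accented_pronominal)
```

Under these assumptions the recovered counts add up to 210 and agree with the panel sizes given for Figure 1 (49 non-pronominal and 161 pronominal ACs).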
A comparison of the accent distribution between the groups of accented non-pronominal (N = 49) and pronominal (N = 61) ACs reveals that LH* is the most frequent accent in both groups, its proportions being quite similar (51% of the non-pronominal and 47.5% of the accented pronominal ACs). L*H is the second most frequent accent, having, however, a higher proportion in the group of non-pronominal ACs (36.7%) compared to the pronominal ones (27.9%). On the other hand, the high accent H* is more frequent among the pronominal (24.6%) than among the non-pronominal ACs (12.2%). One might speculate that the frequency differences are partly due to phonetic factors. For example, some cases of H* on pronouns might actually be realizations of underlying LH* accents that are difficult to produce on short pronouns, especially at a high speech rate.

Before we continue with the discussion of possible factors influencing the distribution of the accent types, a few examples will illustrate them. Examples (4), (5), and (6) and the corresponding f0-tracks in Figures 2 and 3 show, respectively, the accents LH*, H*, and L*H on non-pronominal ACs. Both in (4) and (5), the f0-peak is reached on the stressed syllable, but the production of (4) has an additional distinctive f0-rise preceding the peak. In contrast, the stressed syllable in (6) sounds low, and the f0-maximum is delayed into the following auxiliary.

(4) [Einen Sprachfehler] haben Sie auch.
    a speech_defect have you also
    ‘And you have a speech defect, too.’

(5) … einen mit Himbeergeschmack? … [Kirsch] wär auch ok.
    one with raspberry_taste cherry would_be also ok
    ‘One with raspberry taste? … Cherry would do as well.’
Figure 2. f0-tracks for examples (4) – left panel and (5) – right panel
(6) Aber [ein Auto] hab’ ich auch nicht gehört.
    but a car have I also not heard
    ‘But I haven’t heard a car either.’
Figure 3. f0-track for example (6)
The counterparts of (4) and (6) with pronominal ACs carrying LH* and L*H accents are illustrated in (7) and (8), respectively (cf. also Figure 4).

(7) [Wir] wissen auch nicht, wie das Ritual genau abläuft.
    we know also not how the ritual exactly proceeds
    ‘We don’t know either what the exact procedure for the ritual is.’

(8) Hören sie, [ich] hab’ auch Angst.
    listen you I have also fear
    ‘Listen, I’m also scared.’
Figure 4. f0-track for examples (7) – left panel and (8) – right panel
Although the relative frequencies of the individual accents in the groups with pronominal and non-pronominal ACs differ, all three types of high or rising accents are obviously available for both groups. An inspection of the contexts did not reveal any information-structural or semantic regularities underlying the accent distribution. Rather, paralinguistic meaning components (irony, emotion, etc.) seem to be among the relevant factors.

An example of an utterance with an unaccented pronominal AC is given in (9a). If we take into account that such deictic/anaphoric ACs may be dropped in pro-drop languages, cf. the Czech counterpart of (9a) in (9b), the missing accent is not surprising. Even German allows ‘topic drop’ of anaphoric ACs in the prefield, cf. (10).

(9) a. Bist [du] auch bei Mr. Chomsky in Geschichte?
       are you also at Mr. Chomsky in history_class
       ‘Do you also attend Mr. Chomsky’s history lectures?’
    b. Nechodíš [Ø] TAKY k Chomskému do dějepisu?
       not-attend-2SG also to Chomsky-DAT in history_class-GEN

(10) A: Und was ist mit Monica?
        ‘And what about Monica?’
     B: [Ø] Wird auch da sein.
        will also there be
        ‘She will be there, too.’
In some cases with unaccented ACs, a non-associated element in the prefield bears a rising accent. Moreover, in cases like (11) and (12), this accent does not evoke alternatives of the prefield element, but again rather signals paralinguistic meaning. Note that this prosodic pattern is also compatible with a context in which the accented element is interpreted as the AC of the particle.

(11) A: Ich hab ‘ne lebhafte Fantasie.
        ‘I have a rich imagination.’
     B: Den Eindruck hab [ich] auch.
        this impression have I also
        ‘I have this impression, too.’

(12) A: Wo sind deine anderen Kleider?
        ‘Where are your other clothes?’
     B: Auf diese Frage hätt [ich] auch gern ne Antwort.
        to this question would I also like an answer
        ‘I too wish I had an answer to this question.’
Figure 5. f0-tracks for examples (11) – left panel and (12) – right panel
2.2. Conjoined ACs

A small group of utterances (N = 15) illustrated in (13) and (14) represents a phenomenon not discussed in the theoretical literature. Here the focus particle associates with a pair of elements analogous to ‘complex foci’ (i.e., association of one operator with a pair of foci, cf. Krifka 1992). As Krifka’s (1999) view excludes multiple topics that are equally ranked (see the discussion of (3) above), ‘conjoined ACs’ represent a complication that is yet to be accounted for.

(13) Wir brauchen dich, und [du] brauchst [uns] doch auch.
     we need you and you need us PRT also
     ‘We need you and you need us as well.’

(14) Ich wünschte [ich] hätte [mein Geschenk] auch dabei.
     I wish I had my present also here
     ‘I wish I had brought my present with me, too.’

The first AC is usually located in the prefield, but there are also cases with both ACs in the middlefield and elliptical structures without a finite verb. Various accentuation patterns can be found: both ACs accented or unaccented, only the first or the second one accented.
2.3. Conclusions

In the utterances of our corpus, ACs of stressed auch are either unaccented or marked by high or rising pitch accents. We found no 1:1 correspondence between accentuation and association with stressed auch. The accented ACs are not characterized by a specific accent type – high and rising accents with early or late peaks were found appropriate for both pronominal and non-pronominal ACs in the prefield and middlefield. A substantial proportion of the ACs do not carry an accent at all; however, the precondition for leaving an AC unaccented seems to be its pronominal status. Under certain circumstances, the AC can even be dropped, and there can be prenuclear accents on non-associated elements. In many cases, prosodic information is thus not sufficient for the identification of the AC, and context information must be taken into account.
3. Experimental studies
The exploratory corpus analysis cannot go beyond a categorical classification of pitch accents. A laboratory experiment, on the other hand, allows us to investigate the relevance of both categorical and gradual factors. A speech production study with balanced materials controlled with respect to their segmental properties and two speech perception studies were carried
out, facilitating a detailed quantitative examination of the gradual phonetic parameters involved in the prosodic marking of constituents associated with stressed auch. For reasons of space, we will concentrate on the basic ideas of the experimental setup and the results here; for a more detailed report see Sudhoff & Lenertová (2006).
3.1. Speech production study

In our speech production study, we examined ambiguous constructions with two potential ACs to the left of auch, cf. (15).

(15) [Der Rudi] hat [im Juni] wahrscheinlich auch einen Vortrag gehalten.
     the Rudi has in June probably also a talk given
     ‘In June, Rudi probably gave a talk, too.’

Here, both the subject der Rudi and the adverbial im Juni ‘in June’ can be associated with the particle. Our expectation was that such constructions should call for a disambiguation in terms of prosodic marking, depending on which element serves as the AC. We elicited minimal pairs of utterances by embedding the sentences in two different contexts, cf. the English translation of the contexts for (15) in (16). The first context triggers association of the particle with the subject of (15), the second one with the adverbial.

(16) a. Can you tell me which of the PhD students gave a talk in June? I heard that only Martin gave one at that time.
     b. Can you tell me when Rudi gave talks this term? I only know of the one in May.

The independent variables were the position of the intended AC (prefield – PF or middlefield – MF), and its syntactic function (subject or temporal adverbial). Seven female speakers each received 20 critical items (5 lexicalizations in 4 conditions) for production, which were randomized and interspersed with fillers. A total of 107 utterances entered the analysis.4

From the qualitative analysis of the target utterances, we obtained the following results: ACs of stressed auch are consistently marked by rising pitch accents, either L*H (81.3%) or LH*. However, the corresponding non-associated elements are frequently accented, too (80.8% in the PF; 29.1% in the MF), carrying L*H, LH*, or H* accents. Moreover, the AC and the corresponding non-AC in a given utterance are often characterized by the same accent type.
A comparison of the f0-contours revealed a high degree of consistency within the experimental conditions. From the mean contours displayed in Figure 6, it follows that the syntactic function of the ACs does not influence the intonation. On the other hand, the contour shapes differ considerably between PF and MF association. In the conditions with PF association (upper panel), there is a steep rise on the AC and only a very small rise on the non-AC in the middlefield. The conditions with MF association (lower panel) are also characterized by a steep rise on the AC, but here, the non-AC in the prefield shows a rise of almost the same extent.
Figure 6. Mean contours for PF association (upper panel) and MF association (lower panel); gray line: AC = subject; black line: AC = adverbial
Statistical comparisons between the potential ACs were made with respect to the following dependent variables: f0-minimum (f0-min), f0-maximum (f0-max), f0-excursion (df0), duration of the stressed and post-stressed syllable (dur-s23), and temporal alignment of the f0-minimum (al-min) and f0-maximum (al-max). The absence of an effect of the syntactic function enabled us to pool the data with associated subjects and adverbials in identical
positions. As the potential ACs were carefully controlled with respect to their segmental properties, we were able to compare them not only between the utterances of one minimal pair (produced by the same speaker), but also within utterances. The comparison of ACs and non-ACs in the same position of lexically identical utterances (comparison between utterances) revealed that ACs are characterized by a higher f0-maximum and a lower f0-minimum than their non-associated counterparts, resulting in a greater f0-excursion. In addition, ACs show a later peak alignment and a longer duration than non-ACs. These differences are statistically significant for both association positions (PF and MF). There is no significant effect of association status on the alignment of the f0-minimum. The mean values of the dependent variables as well as the statistical results are given in Table 1.5

Table 1. Comparison between utterances: paired t-tests (one-tailed); N = 48; α = .0042 (Bonferroni adjustment)

variable        pos.   assoc.   non-ass.   p
f0-min (ERB)    PF     5.30     5.42       < .001
                MF     5.42     5.74       < .001
f0-max (ERB)    PF     7.13     6.75       < .001
                MF     6.93     6.44       < .001
df0 (ERB)       PF     1.83     1.33       < .001
                MF     1.51     0.70       < .001
dur-s23 (ms)    PF     346      300        < .001
                MF     340      281        < .001
al-min (ms)     PF     –47      –41        n.s.
                MF     –48      –44        n.s.
al-max (ms)     PF     193      173        < .003
                MF     185      139        < .001
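The relation between the f0 variables in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the f0-excursion (df0) is the f0-maximum minus the f0-minimum of each item, so that the mean excursion equals the difference of the two means; the dictionary layout is an illustrative choice.

```python
# Mean values copied from Table 1 (ERB); keys are (position, status).
f0_min = {("PF", "assoc"): 5.30, ("PF", "non-assoc"): 5.42,
          ("MF", "assoc"): 5.42, ("MF", "non-assoc"): 5.74}
f0_max = {("PF", "assoc"): 7.13, ("PF", "non-assoc"): 6.75,
          ("MF", "assoc"): 6.93, ("MF", "non-assoc"): 6.44}
df0_reported = {("PF", "assoc"): 1.83, ("PF", "non-assoc"): 1.33,
                ("MF", "assoc"): 1.51, ("MF", "non-assoc"): 0.70}

# Assumption: excursion = maximum - minimum, computed per item, so the
# mean excursion is the difference between the mean maximum and minimum.
df0_derived = {key: round(f0_max[key] - f0_min[key], 2) for key in f0_max}

for key in sorted(df0_derived):
    print(key, df0_derived[key], df0_reported[key])
```

Under this assumption the derived excursions coincide with the df0 means given in the table.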
The comparison of ACs and non-ACs within utterances substantially confirmed these results. Due to the fact that the local minimum is mostly located in the pre-stressed syllable, which could not sufficiently be controlled, f0-min and al-min were left out of consideration. With the exception of al-max for MF association, all comparisons show significant effects in the expected direction, cf. Table 2. The comparison within utterances also confirmed that the differences between ACs and non-ACs are significantly greater for PF association than for MF association.6
Table 2. Comparison within utterances: paired t-tests (one-tailed); α = .0063 (Bonferroni adjustment); left: PF association (N = 55); right: MF association (N = 52)

                PF association                  MF association
variable        ass. PF   non-a. MF   p         non-a. PF   ass. MF   p
f0-max (ERB)    7.19      6.45        < .001    6.71        6.91      < .001
df0 (ERB)       1.92      0.70        < .001    1.33        1.52      < .003
dur-s23 (ms)    352       277         < .001    300         343       < .001
al-max (ms)     196       137         < .001    173         185       n.s.
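The adjusted significance levels quoted for Tables 1 and 2 are in line with a standard Bonferroni adjustment. The sketch below assumes a family-wise level of .05, which is not stated in the text, and counts one test per variable and association position; the function name is hypothetical.

```python
# Assumption: a family-wise alpha of .05, which the tables do not state.
FAMILY_ALPHA = 0.05


def bonferroni_alpha(n_tests: int, family_alpha: float = FAMILY_ALPHA) -> float:
    """Per-test significance threshold under a Bonferroni adjustment."""
    return family_alpha / n_tests


# Table 1: 6 variables x 2 positions = 12 tests.
alpha_between = bonferroni_alpha(6 * 2)
# Table 2: 4 variables x 2 positions = 8 tests.
alpha_within = bonferroni_alpha(4 * 2)

print(f"{alpha_between:.5f}")
print(f"{alpha_within:.5f}")
```

With these test counts, the thresholds round to the reported values of .0042 and .0063.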
The accent types found in the speech production experiment correspond to the ones observed in the corpus study: ACs of stressed auch are marked by rising pitch accents (L*H and LH*). However, the accent type cannot be sufficient for their identification, as non-associated elements often carry the same accents. Rather, continuous prosodic parameters seem to be decisive, confirming the findings of Braun (2005). Moreover, the position of the AC is relevant for the magnitude of the prosodic marking: although the differences between ACs and non-ACs are significant for both association positions, they are much greater for PF association. Nevertheless, we expect the cases with MF association to be interpreted correctly, as the accent on the associated MF element should be perceived as more salient. Most of the differences in f0, duration, and alignment between ACs and non-ACs observed in the production data cannot be expressed in terms of different GToBI labels. Whether they are nevertheless perceptually relevant was tested in two speech perception experiments.
3.2. Speech perception study I (original stimuli)

The first perception study was based on the original utterances from the production experiment.7 For each utterance, we calculated four parameters expressing the clearness of the prosodic marking: The differences between ACs and corresponding non-ACs with respect to the variables f0-max, df0, dur-s23, and al-max were transformed into scales ranging from 0 to 1, where 0 corresponds to the smallest difference and 1 to the greatest. Due to the different prosodic patterns observed, PF and MF association were considered separately. Owing to this procedure, we were in a position to relate the listeners’ judgements both to the GToBI annotation and to the gradual phonetic properties of the utterances.
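The transformation into a 0-to-1 scale described above can be sketched as a min-max rescaling of the AC/non-AC differences. This is one plausible reading of the procedure, not necessarily the authors' exact computation; the data values and function name below are hypothetical.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]: smallest -> 0, greatest -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All differences identical: no spread to rescale, return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical f0-max values (ERB) for the AC and the non-AC of four
# utterances from one association position; not data from the study.
ac_f0_max = [7.2, 6.9, 7.5, 7.0]
non_ac_f0_max = [6.8, 6.8, 6.5, 6.9]

# Difference between AC and non-AC within each utterance.
differences = [ac - non for ac, non in zip(ac_f0_max, non_ac_f0_max)]

# Derived parameter on the 0-1 scale (called p-f0-max in Table 3).
p_f0_max = min_max_scale(differences)
print([round(p, 2) for p in p_f0_max])
```

Scaling PF and MF utterances separately, as the text describes, would simply mean calling the function once per association position.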
Thirty-two subjects listened to the stimulus utterances (without context, randomized, and interspersed with fillers) and had to select one of two possible continuations given on a computer screen. As each continuation is compatible with only one interpretation of an utterance, the selection indicates which element was interpreted as the AC of auch by the subjects. For the example in (15) above, for instance, the choice was between ‘… and not only in May’ (association with the adverbial) and ‘… and not only Martin’ (association with the subject).

In 72.4% of the cases, the listeners chose the continuation corresponding to the original context of the utterance. The percentage of these matching answers, however, clearly differs between the stimuli with intended PF association (84.5%) and MF association (58.3%). It is not immediately clear why this difference should occur. We will return to this point in Section 3.3.

The relation between the listeners’ judgements and the categorization of the accents on the potential ACs does not show a consistent picture. As expected, utterances with an accented AC and an unaccented non-AC were assigned the intended interpretation more often than utterances with accents on both potential ACs. Within the latter group, however, the results cannot be explained on the basis of the accent distribution. For utterances with intended PF association and identical accents on the AC and the corresponding non-AC (either L*H or LH*), the matching results are clearly above chance level (75.9%). As for the utterances with intended MF association, we find matching results above chance level (66.4%) for cases with an L*H accent on the non-AC and an LH* accent on the AC. Clearly, factors other than accent type must be relevant for the disambiguation of the utterances.

Table 3. Correlations between the percentage of correct responses and the derived parameters for PF association (N = 49) and MF association (N = 42)

              PF association      MF association
parameter     r       p           r       p
p-f0-max      .502    < .001      .525    < .001
p-df0         .626    < .001      .385    < .007
p-dur-s23     .253    < .05       .387    < .007
p-al-max      .223    n.s.        -.092   n.s.
Now consider the relation between the judgements and the gradual phonetic properties of the stimulus utterances, which are not expressed by the GToBI labeling. As shown in Table 3, we found significant correlations (Pearson’s r)
between the percentage of correct responses and the derived parameters for f0-max, df0, and dur-s23, but not for al-max. We conclude that the identification rate for a given utterance in the perception task crucially depends on the clearness of the prosodic marking characterizing the utterance (expressed in terms of gradual phonetic variables). Of course, the proportion of the contribution might differ between the parameters with significant effects, especially since the parameters are probably interrelated.8 A comprehensive examination of the individual parameters’ influences lies beyond the scope of this study.

Two questions could not be answered on the basis of this perception experiment. First, what is the source of the different proportions of correct responses for PF and MF association? More specifically, are they due to the unbalanced stimulus materials or other effects of the experimental design, or is there a general tendency for association of stressed auch with the prefield element? Second, is it possible to establish perceptual categories on the basis of the prosodic variables examined in the production study? This would be desirable, as the GToBI accents have proved not to be suitable categories for the AC/non-AC distinction. These questions will be addressed in the next section by means of a second perception experiment, making use of manipulated stimulus materials.
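For readers who want to retrace the correlation analysis, the following sketch computes Pearson's r from first principles, together with the t statistic commonly used to test r against zero. The per-utterance values are hypothetical, and the way the significance levels in Table 3 were actually obtained is an assumption here.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_from_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical per-utterance values, for illustration only.
p_df0 = [0.0, 0.2, 0.4, 0.5, 0.8, 1.0]
pct_correct = [55.0, 60.0, 72.0, 70.0, 88.0, 90.0]
r = pearson_r(p_df0, pct_correct)
print(round(r, 3))

# t statistics implied by two coefficients reported in Table 3 (PF, N = 49).
print(round(t_from_r(0.253, 49), 2))
print(round(t_from_r(0.223, 49), 2))
```

With 47 degrees of freedom, the one-tailed critical value at the .05 level is roughly 1.68, so the two t values fall on opposite sides of it, which would fit the "< .05" and "n.s." entries for p-dur-s23 and p-al-max if one-tailed tests were used.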
3.3. Speech perception study II (manipulated stimuli)

The stimuli of the second perception study are based on the sentence in (17), taken from the materials of the production experiment. It was produced by a female speaker with a neutral intonation and similar prominence on both potential ACs. By the joint manipulation of f0-min, f0-max, dur-s23, and al-max followed by a resynthesis using Praat’s PSOLA algorithm, we created 11 stimulus versions.9 The parameter settings for the two extreme versions – Stimulus 1 is supposed to be a clear case of PF association and Stimulus 11 of MF association – are based on prototypical utterances from the production study. Their f0-tracks resemble the mean curves for PF and MF association given in Figure 6 above.10 The values for the 9 intermediate stimulus versions represent equal steps on the scale between the extremes, cf. the plotted f0-tracks in Figure 7.

(17) [Der Wiener] hat [um sieben] wahrsch. auch einen Anruf bekommen.
     the Viennese has at seven probably also a call received
     ‘The Viennese fellow probably received a call at seven, too.’
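The construction of the intermediate versions described above can be sketched as a linear interpolation between the two extreme parameter settings. The sketch assumes that "equal steps" means equal increments on each parameter's own scale; the parameter values are hypothetical, and the resynthesis in Praat is not modelled.

```python
def continuum(start, end, n_steps=11):
    """Return n_steps parameter settings in equal steps from start to end."""
    settings = []
    for i in range(n_steps):
        w = i / (n_steps - 1)  # 0.0 for the first stimulus, 1.0 for the last
        settings.append({k: start[k] + w * (end[k] - start[k]) for k in start})
    return settings


# Hypothetical settings for the prefield constituent in the two extreme
# versions (not the values used in the study).
stimulus_1 = {"f0_min": 5.3, "f0_max": 7.2, "dur_s23": 350.0, "al_max": 195.0}
stimulus_11 = {"f0_min": 5.4, "f0_max": 6.7, "dur_s23": 300.0, "al_max": 175.0}

versions = continuum(stimulus_1, stimulus_11)
for number, setting in enumerate(versions, start=1):
    print(number, {k: round(v, 2) for k, v in setting.items()})
```

In a full stimulus set, the same interpolation would be applied in parallel to the middlefield constituent, running in the opposite direction.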
Forty-four subjects took part in the experiment. The task and mode of presentation were the same as in the first perception study. Each stimulus version was included 6 times, and two critical items were always separated by a filler item preventing the subjects from making direct comparisons between the different versions.
Figure 7. f0-tracks of the 11 resynthesized stimulus versions
A manual inspection of the individual subjects’ decisions revealed two different patterns. About 1/3 of the subjects (group A) turned out to be insensitive to the prosodic variation in the data. All 11 stimulus versions received about the same percentage of choices for PF and MF association from these subjects. The remaining 2/3 of the subjects (group B) made their decisions on the basis of the prosodic information; their judgements clearly differ across the stimulus versions. The aggregated results of both groups are given in Figure 8. For the majority of the subjects (group B), the importance of prosodic cues for the interpretation of ambiguous utterances containing stressed auch could thus be confirmed. Interestingly, Stimuli 1 and 11 (the clear cases of PF and MF association) were interpreted in accordance with the expectations in similarly high proportions (87.4% and 84.5%, respectively). Contrary to what one might expect, the results of group B (lower panel of Figure 8) do not show an s-shaped curve, but a rather linear relationship between prosodic realization and interpretation. Subjects did not assign the gradually differing stimuli to distinct perceptual categories.
Figure 8. Percentage of decisions for PF association across the 11 stimulus versions; upper panel: group A (15 subjects); lower panel: group B (29 subjects)
3.4. Conclusions In the production study, we found gradual differences between ACs and non-ACs rather than consistent differences in accent type. ACs as well as the corresponding non-ACs were often characterized by rising pitch accents in the construction type examined, but the former show a higher f0-maximum, lower f0-minimum, greater f0-excursion, later peak alignment and longer duration than the latter. These findings support a description of the AC/non-AC distinction in terms of continuous phonetic parameters instead of categorical accent labels.
The perception studies showed that the listeners’ interpretive preferences crucially depend on the prosodic realization of the utterances. More specifically, we found a correlation between the strength of the prosodic marking (defined in terms of the phonetic variables discussed above) and the percentage of decisions for a particular association position. For the majority of the subjects, the identification of the ACs seems to be governed by the relative magnitude of the prosodic parameters characterizing the candidates. As shown by the second perception experiment, there is no preference for the association of auch with the prefield element if prosody is used as a clue (group B). On the other hand, if prosodic information is neglected, a preference for PF association emerges (group A).11
4. General discussion and summary In this paper, the prosodic marking of constituents associated with the stressed variant of the focus particle auch was examined by means of a corpus study and several speech production and perception experiments. The different methodical approaches with their specific data types facilitated an investigation of the phenomenon from various perspectives, providing insights relevant both for the grammar of stressed auch and for general theories of the mapping between prosodic form and meaning. Concerning the former issue, we interpret the results as consistent with Krifka’s Contrastive Topic Hypothesis: ACs of stressed auch are often, although not always, characterized by prosodic properties typical for contrastive topics, and this prosodic marking is relevant for the interpretation of the utterances.12 However, the data does not support a straightforward relation between (prosodic) form and function, contrary to the view assuming a direct mapping from accent types to semantic or information structural categories. The corpus study and the speech production experiment show that ACs of stressed auch, if accented, carry high or rising pitch accents. As the nuclear accent on the focus particle itself is a falling accent (H*L or HL*), the overall intonational pattern is that of a bridge contour. Due to the nature of the data, the actual accent distribution differs between corpus and experimental study. The corpus data suggests the distinction between non-pronominal and pronominal ACs, only the latter showing the optionality of the prosodic marking proposed by Krifka. In both sources, different accents could be observed on the elements associated with auch. However, their relative frequencies differ between the data types. The corpus contains a greater proportion of LH* and H* accents and a lower proportion of L*H
accents than the experimental data. This can be partly attributed to the lexical properties of the ACs (pronominal vs. non-pronominal status) and to the ambiguous character of the materials used in the production study. The experimental studies showed that ACs of stressed auch are characterized by a number of additional prosodic properties, expressed in terms of gradual phonetic variables, and that these parameters are perceptually relevant: at least in the absence of contextual information, the listeners’ judgements crucially depend on prosodic cues. The question arises whether the prosodic properties of elements associated with stressed auch should be described in terms of discrete phonological categories or gradual phonetic parameters. The first possibility was explored in the corpus study and the qualitative analysis of the production data, the relevant phonological categories being the pitch accent types defined in the GToBI system. It could be shown that there is no 1:1 correspondence between association with auch and a particular accent: on the one hand, we found that various types of accents are appropriate for ACs; on the other hand, the same accents can be observed on non-associated elements, and the AC and some other element can even carry the same accent in one utterance. Thus, the mostly successful performance of the subjects in the perception study cannot be interpreted as resulting from the accent distribution. This is corroborated by the evaluation of the relation between the matching results and the accent distribution on the ACs and non-ACs in the stimulus utterances, which did not yield a coherent picture. Thus, a description of the prosodic properties involved in the marking of ACs in terms of GToBI accents seems to miss the point.
The second possibility – the description of the prosodic properties of constituents associated with stressed auch in terms of continuous phonetic parameters – seems to be more adequate. This method captures the relevant differences between ACs and non-ACs in the production data and allows more reliable predictions about the subjects’ interpretation of a given utterance in the perception experiment. Prosodic variation with respect to variables such as f0-peak, f0-rise, or duration is not captured by the GToBI annotation at all. The findings presented in this paper are important for our understanding of the relation between prosody and semantics / information structure. On the one hand, the different pitch accents carried by the ACs of auch in the corpus and production data do not correspond to different semantic or information structural categories, which casts doubt on the appropriateness of the GToBI categories for this purpose. The various prosodic realizations observed in the data seem to convey paralinguistic meaning; however, it has to be clarified what the relevant categories of this meaning component are
and by what means they can be prosodically expressed. On the other hand, it has been shown that the variation of continuous phonetic parameters can be a decisive factor for truth-conditionally and information structurally relevant meaning components. A theoretical model of the relation between prosody and meaning accounting for the status of gradual phonetic parameters is still to be developed. However, there is no doubt that quantitative data from controlled experiments should be taken as evidence in investigating prosodic effects. Finally, the interdependence of prosodic structure and meaning cannot be seen independently of other factors such as syntactic structure or (linguistic) context. In the utterances elicited in our production study, the syntactic position of the ACs influences the magnitude of the prosodic differences between ACs and non-associated elements, and the speech perception experiments revealed that hearers use a default strategy based on word order if they cannot access prosodic information for the identification of the AC. As for the context, it can override preferences for association with a particular element based on the prosodic structure (as in the corpus study), and it can encourage a clear prosodic marking of the ACs (as in the speech production study).
Acknowledgements This research was funded by the DFG research group Linguistic Foundations of Cognitive Science (FOR 349, D1). We wish to thank Stefan Baumann, Bettina Braun, D. Robert Ladd, and Roland Meyer for stimulating discussion at various stages of the experimental work as well as Sam Featherston and the anonymous reviewer for helpful comments on this paper.
Notes
1. This specific type of data was chosen for practical reasons: existing transcripts facilitated an easy search for the relevant utterances. We did not find a comparable source for truly spontaneous speech. Crucially, the circumstances under which the utterances were produced substantially differ from a laboratory setting. The examples used in this paper stem from the following sources: Kleine Haie (D 1992, ex. (1b), (4)); Das merkwürdige Verhalten geschlechtsreifer Großstädter zur Paarungszeit (D 1998, ex. (5), (11)); Buffy – Im Bann der Dämonen (Season 1, USA 1997, ex. (6) – (9), (12)); Lügen und Geheimnisse (F/UK 1996, ex. (10), (14)); Gloomy Sunday – Ein Lied von Liebe und Tod (D 1999, ex. (13)).
2. In the remaining 20.9% of the utterances (elliptical utterances and the structures discussed in Section 2.2), the position of the AC could not reliably be determined.
3. Following the GToBI annotation rules in Grice et al. (2005), we based the distinction between the accent types L*H and LH* on the timing of the peak: if it is reached within the stressed syllable, we annotated LH*, if it is delayed into the post-stressed syllable, we annotated L*H.
4. The utterances of one speaker were excluded as it turned out that she was not naïve with respect to the purpose of the study. Of the remaining 120 utterances, 13 had to be excluded for various reasons, including the production of the unstressed variant of auch, hesitations, and a defective recording.
5. Only 96 utterances could be used for the comparison between utterances, because in case one utterance of a minimal pair was excluded, the other one had to be excluded as well, resulting in 48 minimal pairs.
6. t-test for unrelated measures, two-tailed, p < .001 for f0-max, df0, and al-max, and p < .009 for dur-s23.
7. To keep the experiment at a reasonable length, we only used the utterances of 5 speakers (except the defective or unacceptable utterances that had not been included in the analysis), the total number of stimuli being 91.
8. We also cannot conclude that peak alignment is perceptually irrelevant. Its effects could be outweighed by the effects of the other parameters.
9. We varied all parameters with significant effects in the production study. The variation of f0-min and f0-max is automatically accompanied by a variation of df0.
10. Note that the accents on the ACs and non-ACs in all stimulus versions fall into the L*H category of the GToBI system.
11. Presumably, the two groups of subjects following different strategies in the perception task were also present in the first study, explaining the overall preference for PF association.
12. For a proposal how the Contrastive Topic Hypothesis can be utilized to integrate constructions with stressed auch into a general syntactic theory of the grammar of focus particles in German, see Sudhoff (to appear).
References
Braun, Bettina 2005 Production and Perception of Thematic Contrast in German. Oxford: Peter Lang.
Büring, Daniel & Katharina Hartmann 2001 The Syntax and Semantics of Focus-Sensitive Particles in German. Natural Language & Linguistic Theory 19: 229–281.
Féry, Caroline 1993 German Intonational Patterns. Tübingen: Niemeyer.
Frascarelli, Mara & Roland Hinterhölzl 2007 Types of Topics in German and Italian. In On Information Structure, Meaning and Form. Generalizations across languages, Kerstin Schwabe & Susanne Winkler (eds.), 87–116. Amsterdam/Philadelphia: Benjamins.
Grice, Martine, Stefan Baumann & Ralf Benzmüller 2005 German Intonation in Autosegmental-Metrical Phonology. In Prosodic Typology: The Phonology of Intonation and Phrasing, Sun-Ah Jun (ed.), 55–83. Oxford: Oxford University Press.
Krifka, Manfred 1992 A Compositional Semantics for Multiple Focus Constructions. In Informationsstruktur und Grammatik, Joachim Jacobs (ed.), 17–53. Opladen: Westdeutscher Verlag.
Krifka, Manfred 1999 Additive Particles under Stress. In Proceedings of SALT 8, 111–128. Cornell: CLC Publications.
Reis, Marga & Inger Rosengren 1997 A Modular Approach to the Grammar of Additive Particles: the Case of German Auch. Journal of Semantics 14: 237–309.
Sudhoff, Stefan to appear Focus Particles in the German Middlefield. In The Discourse Potential of Underspecified Structures: Event Structures and Information Structure, Anita Steube (ed.). Berlin/New York: De Gruyter.
Sudhoff, Stefan & Denisa Lenertová 2006 Prosodic Properties of Constituents Associated with Stressed auch in German. In Proceedings of Speech Prosody 2006, Dresden, Vol. 1, Rüdiger Hoffmann & Hansjörg Mixdorff (eds.), 390–393. Dresden: TUDpress.
The retrieval and classification of negative polarity items using statistical profiles Timm Lichte and Jan-Philipp Soehn
1. Introduction In this contribution we will address a special group of lexical elements which show a particular affinity for negative contexts. Such elements, usually referred to as negative polarity items (NPI), have been widely studied in linguistic literature since Klima (1964). The classical example of an NPI is the English indefinite determiner any. As demonstrated in (1) a sentence containing any and negation is grammatical; without the negation the sentence is ungrammatical. Following standard terminology, we will refer to the negation as the licenser or trigger of the NPI. We will underline NPIs and print the licensers in bold face.
(1) a. He hasn’t seen any students.
    b. *He has seen any students.
Since NPIs may occur both in the scope of negation as well as in a variety of other semantically or pragmatically related environments (such as interrogatives, antecedents of conditionals, modifiers of superlative and universal NPs, complements of adversative predicates, to name a few), one very active and controversial research area is the detailed description of possible licensing contexts. The investigation of these polarity items and their distribution provides insight into the architecture of grammar. It is an idiosyncrasy that a given word is an NPI and a semantically similar word is not (e. g. sonderlich vs. besonders, ‘particularly’). However, there may be parts of a word’s meaning which make it sensitive to polar environments. On the trigger side, generalizations such as downward-entailingness (Fauconnier 1975; Ladusaw 1980) tend to hide other triggers which are not as easy to explain. It is this interplay between idiosyncrasies and regular behavior that is interesting to investigate, ultimately enabling us to better understand language.
However, in order to make some advance in this research field, sufficient empirical data is necessary. For English and Dutch, the inventory of NPIs has been documented fairly well. Hoeksema (2005) for instance presents about 700 Dutch NPIs. For German, the state of documentation is less ideal. There is only one relatively extensive list in Kürschner (1983), which, however, does not even come close to the data collected by Hoeksema. The aim of this contribution is to show the use of statistics (i) to automatically retrieve a list of NPI candidates from a partially parsed corpus of written German, and (ii) to classify NPIs. Yet we do not claim that a validated and exhaustive list of German NPIs can be obtained with our method. In order to achieve that, the candidate list must be scrutinized with a detailed corpus study and psycholinguistic experiments.
2. Theory of NPIs
2.1. Negative Polarity Items Polarity items can be found in every part of speech. We give examples of verbal NPIs (in (2), (3), and (4)), an adjectival NPI (in (5)) and a nominal NPI (in (6)).
(2) a. Er hat es nicht wahrhaben wollen.
       he has it not accept to be true want
       ‘He did not want to accept it as true.’
    b. *Er hat es wahrhaben wollen.
(3) a. Es schert ihn nicht.
       it bothers him not
       ‘He doesn’t give a damn about it.’
    b. *Es schert ihn.
(4) a. Du brauchst diese Bücher nicht zu lesen.
       you need these books not to read
       ‘You need not read these books.’
    b. *Du brauchst diese Bücher zu lesen.
(5) a. Hans war nicht sonderlich zufrieden mit seiner Arbeit.
       Hans was not very happy with his work
    b. *Hans war sonderlich zufrieden mit seiner Arbeit.
(6) a. Um acht war noch keine Menschenseele da.
       At eight was yet no men’s soul there
       ‘By eight o’clock no one had arrived yet.’
    b. *Um acht war schon eine Menschenseele da.
(7) a. Niemand hat auch nur einen roten Heller gespendet.
       nobody has even one red heller donated
       ‘Nobody has donated a red cent.’
    b. *Auch nur einen roten Heller hat niemand gespendet.
Verbal and adjectival NPIs must be in the direct scope of a negative element at LF, i.e. no regular quantifier may intervene. Nominal NPIs are often used as minimizers, e.g. Heller in (7), which impose an additional (syntactic) constraint, namely that the licenser c-command the minimizer. They are likely to be accompanied by even, the German auch nur or the widely discussed Dutch equivalent ook maar (cf. Zwarts 1997; Hoeksema & Rullmann 2001). For a detailed crosslinguistic discussion of minimizers, cf. Vallduví (1994).
2.2. Licensers and the Licensing Property Items and constructions that allow the occurrence of NPIs within their scope are listed in Fig. 1.
· n-words (negative particles, negative quantifiers)
· antecedent of conditionals
· questions
· restrictor of universal quantifiers and superlatives
· non-affirmative verbs (doubt)
· neg raising verbs (believe)
· negative conjunctions (ohne dass (without))
· comparative than-sentence
· too-comparatives
· negative predicates (unlikely)
· other (endlich (finally), only)
Figure 1. Licensers for NPIs
It is important to note that not all NPIs are necessarily licensed by all of these contexts. The distributional pattern of a given NPI may differ from that of another. This leads to the classification of NPIs (see next section). One of the first steps towards an NPI licensing theory in order to explain the licensing properties of the above contexts was taken by Ladusaw (1980), who established that NPIs can only occur in downward-entailing (DE) contexts, building on an idea from Fauconnier (1975). In the face of a number of open questions concerning the standard Fauconnier-Ladusaw theory of NPIs, there has been further elaboration on this, as well as alternative analyses. In critique of the standard DE theory, Giannakidou (1997) proposes the idea of non-veridicality as the basic property of NPI licensers. However, although her approach elaborates on some unresolved issues (e. g. questions are not DE but non-veridical), her theory is less restrictive than required. According to the theories proposed in Kadmon & Landman (1993), Krifka (1995), Chierchia (2005), NPIs have the semantic properties of domain widening and strengthening. They may introduce alternatives to the foreground information which induce an ordering relation of specificity. The NPI itself denotes the most specific element on this scale. That is why NPIs are banned from semantically non-licensing contexts such as affirmative or upward-entailing contexts. In these semantic approaches the focus lies on the meaning of NPIs and not on the nature of licensing contexts. A problem for these purely semantic characterizations of NPI licensing domains arises from what Linebarger (1987) calls an “immediate scope constraint”, forbidding any quantifier to intervene between an NPI and its licensing (negative) quantifier. It is not obvious how to implement this essentially syntactic constraint into these approaches. 
There is another set of licensing theories in which pragmatic factors are taken into account to a greater extent, although Krifka (1995) already included pragmatic information in his analysis. For example, de Swart (1998) argues that the possibility or impossibility of inverse scope configurations in which an NPI precedes its negative licenser can be explained by considering the pragmatic implicatures triggered by the NPI.
2.3. NPI classification Zwarts (1997) argues for a classification of NPIs by means of their licensing requirements. He distinguishes between different, logically defined categories of licensers which exhibit different grades of negativity. Adopting
the notion in van der Wouden (1997), we differentiate between minimal (e.g. few), regular (e.g. nobody) and classical (e.g. not) negation. One can classify NPIs into superstrong NPIs (licensed by classical negation), strong NPIs (licensed also by regular negation), and weak NPIs (licensed in all three contexts). The NPI classes and their distributional pattern along the grades of negation are depicted in the table in Fig. 2.
NPI          Negation
             classic   regular   minimal
weak         +         +         +
strong       +         +         –
superstrong  +         –         –
Figure 2. NPI classes
Zwarts gives as an example the Dutch NPI ook maar iets (anything) which is compatible with regular negation, but is excluded from minimal negation. Therefore, it can be classified as a strong NPI.
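The classification in Fig. 2 amounts to a lookup from an item's licensing profile to its class. The following minimal sketch spells this out; the function name and the representation of a profile as a set of grade labels are illustrative assumptions, not part of the proposals cited above.

```python
# Grades of negation in the column order of Fig. 2.
GRADES = ("classic", "regular", "minimal")

# Licensing profiles from Fig. 2: (classic, regular, minimal) -> NPI class.
NPI_CLASSES = {
    (True, True, True): "weak",
    (True, True, False): "strong",
    (True, False, False): "superstrong",
}


def classify_npi(licensed_by):
    """Return the NPI class for a set of licensing grades, or None if the
    profile does not match any row of Fig. 2."""
    profile = tuple(grade in licensed_by for grade in GRADES)
    return NPI_CLASSES.get(profile)


# Dutch "ook maar iets": compatible with regular (and classical) negation,
# but excluded from minimal negation.
ook_maar_iets = classify_npi({"classic", "regular"})
```

Profiles that do not occur in Fig. 2, such as licensing by minimal negation alone, yield `None` here rather than being forced into one of the three classes.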
3. Retrieval of NPIs
The basic motivating idea behind the corpus-based retrieval mechanism described here is to treat the relation between an NPI and its licenser as similar to the relation between a collocate and its collocator. This idea, going back to van der Wouden (1992) and then pursued in van der Wouden (1997), allows us to apply regular collocation retrieval techniques in order to obtain a list of NPI candidates. The aim of our efforts is not to validate items that are assumed to be negative polar. We aim rather for a list of NPI candidates, i.e. a rich source for collecting NPIs. In other words, it is meant to serve as input for the NPI-seeking linguist.
3.1. The Algorithm NPI extraction proceeds in three steps: conversion of the corpus, lemmata counting and evaluation of frequency data, resulting in a lemma ranking.
3.1.1. The Corpus and its Conversion The extraction algorithm is performed on a part of the TüPP-D/Z corpus (Tübingen Partially Parsed Corpus of Written German)1. TüPP-D/Z is based on the electronic version of the German newspaper die tageszeitung (taz). It is annotated with lemmatization, part-of-speech tagging, chunking and clause boundaries. The section of TüPP-D/Z that we used in this study consisted of about 5.8 million sentences. The size of the underlying corpus is of importance because many NPIs occur only sparsely, while the provided annotation is crucial for identifying the licensers and their scope. First, we converted the corpus such that it contained only lemmatized words and the clause structure. Concurrently, licensers were identified and annotated with the aid of POS tags and chunking. The corpus, then, consisted of sentence strings such as the following:
(8) CLstart1 von Friede können also bei alle Optimismus noch lange DEINT die Rede sein. CLend1
    Von Frieden kann also bei allem Optimismus noch lange nicht die Rede sein.
    ‘Even with all optimism, we won’t yet be able to talk about peace for a long time.’
CLstart1 and CLend1 represent the clause boundaries, and the licenser nicht (not) is replaced by the licensing marker DEINT (‘downward entailing and interrogative’). Unfortunately, some licensers in Fig. 1 cannot be clearly identified in the corpus, if at all.2 For this reason, we tried to avoid ambiguous cases and preferred to annotate fewer licensers rather than risk incorrect annotation. Nevertheless, NPIs should still show a significant distributional pattern, while statistical noise should be suppressed.3 To give an example, restrictors of superlatives and universal quantifiers are only marked up if they are relative clauses. If these relative clauses, however, are moved to the right border of the sentence (into the so-called Nachfeld) they cannot be related to the superlative and universal quantifier, respectively, in a direct way. Instead of annotating all relative clauses in the Nachfeld of a sentence where these licensers occur, we dropped cases where the relative clause did not immediately follow the superlative or universal quantifier.
Analogously, than-clauses which correspond to a comparative expression were ignored since they mostly occurred in an extraposed position. Furthermore, the corresponding comparative constructions are in general hard to detect. We also ignored licensers labelled by ‘others’ in Fig. 1, since their status and their licensing behavior remain to be further elucidated.
3.1.2. Lemma Counting After converting the corpus we extracted for each lemma in the corpus the number of total occurrences and the number of occurrences in clauses which contained a licensing marker (e.g. DEINT). The scope of a licensing marker was assumed to be exactly the clause in which it occurred. It did not comprise embedded clauses. In cases where an item is able to license NPIs in an embedded sentence, e.g. with neg raising verbs and inherent negative verbs, the licensing marker was added to the embedded sentence.
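The counting step can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be sentence strings in the format of (8), nested clauses are assumed to be bracketed by their own CLstart/CLend pair (the second test sentence is invented), and punctuation handling is simplified. The function name `count_lemmata` is hypothetical.

```python
import re
from collections import Counter

MARKER = "DEINT"  # licensing marker as in example (8)
CL_START = re.compile(r"^CLstart\d+$")
CL_END = re.compile(r"^CLend\d+$")


def count_lemmata(sentences):
    """Count, per lemma, all occurrences and the occurrences in clauses that
    contain a licensing marker. A marker only has scope over the tokens of
    its own clause, not over clauses embedded in it."""
    total, licensed = Counter(), Counter()
    for sentence in sentences:
        stack = []  # one token list per currently open clause
        for token in sentence.split():
            if CL_START.match(token):
                stack.append([])
            elif CL_END.match(token):
                if not stack:
                    continue  # unbalanced markup: ignore the stray end tag
                clause = stack.pop()
                has_licenser = MARKER in clause
                for lemma in clause:
                    if lemma == MARKER:
                        continue
                    total[lemma] += 1
                    if has_licenser:
                        licensed[lemma] += 1
            elif stack:
                lemma = token.strip(".,;:!?")  # simplified punctuation handling
                if lemma:
                    stack[-1].append(lemma)  # token belongs to innermost clause
    return total, licensed


sentences = [
    "CLstart1 von Friede können also bei alle Optimismus noch lange DEINT die Rede sein. CLend1",
    # Invented example with an embedded clause outside the marker's scope.
    "CLstart1 er sagen DEINT CLstart2 dass sie kommen CLend2 CLend1",
]
total, licensed = count_lemmata(sentences)
```

Because tokens are always attached to the innermost open clause, the lemmata of the embedded clause in the second sentence are counted as unlicensed, in line with the scope assumption stated above.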
3.1.3. Evaluation of Frequency Data In order to derive a list of NPI candidates, we compiled a lemma ranking based on a very straightforward association measure that we will call the context ratio (CR). It is computed for a lemma $l$ using its overall frequency $N$ and the frequency $N_{lic}$ of configurations where $l$ is in the scope of a licenser:
(9) $CR = \frac{N_{lic}}{N}$
In other words, CR reflects the fraction of the licensed occurrences of a lemma relative to its overall occurrences. We expect NPIs to have a significantly high CR value. CR does not differ from association measures such as mutual information (MI, Church & Hanks 1990) in our setting. The reason for this is the semi-fixed nature of our bigrams and the fact that we are not interested in the actual values of an association measure, but in the broad lemma ranking based on it. Due to the well-known weakness of MI and CR against low frequencies, we integrated a cut-off at a frequency of 30, such that the size of the lemma list was reduced to 64,867 lemmata (from ca. 1,058,462 lemmata). To overcome this tradeoff one could consider using χ² or log-likelihood (see Manning & Schütze 1999) as association measures. However, these are biased in favor of lemmata with high frequency counts, such that association measures based on them must be scaled by the overall frequency in order to get reasonable rankings. By doing this, we ended up with rankings almost congruent with those of the much simpler MI/CR. Therefore we adhered to the latter measure.
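As a minimal sketch, the ranking by context ratio with a frequency cut-off could be computed as shown below. The counts are invented toy values, the function name is hypothetical, and treating the cut-off as "keep lemmata with a frequency of at least 30" is an assumption, since the text does not say whether the threshold is inclusive.

```python
def rank_by_context_ratio(total, licensed, min_freq=30):
    """Rank lemmata by CR = N_lic / N, dropping lemmata below the cut-off."""
    ranking = []
    for lemma, n in total.items():
        if n < min_freq:
            continue  # CR is unreliable for low-frequency lemmata
        ranking.append((lemma, licensed.get(lemma, 0) / n))
    # Highest CR first; ties are broken alphabetically for reproducibility.
    ranking.sort(key=lambda entry: (-entry[1], entry[0]))
    return ranking


# Invented toy counts, for illustration only.
total_counts = {"sonderlich": 40, "Haus": 1000, "selten": 10}
licensed_counts = {"sonderlich": 38, "Haus": 120, "selten": 10}

ranking = rank_by_context_ratio(total_counts, licensed_counts)
```

In this toy data the low-frequency lemma would reach a CR of 1 but is removed by the cut-off, which is the situation the threshold is meant to guard against.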
3.1.4. Enhancement for Complex NPIs So far, we have only considered single lemmata. We know, however, that many NPIs are complex and that some only show negative polarity as a complex entity, i.e. the combined lemmata are not inherently negative polar, but the combination of the lemmata (e.g. nicht alle Tassen im Schrank haben ‘to have lost one’s marbles’) is. Therefore, we enhanced the extraction algorithm to also include complex NPIs. The schema for the enhancement is shown in Fig. 3.
Figure 3. Schema of the enhancement for the retrieval of complex NPIs (diagram nodes: lemma list, list of lemma chains, negation test, collocation test, list of collocates, new lemma list)
The starting point is the list of lemmata and their context ratios. We performed a collocation test for every lemma to identify other lemmata that significantly co-occur (i) in the same clause and (ii) in negative contexts. As a collocation measure we integrated the G² score, a derivative of log-likelihood (Rayson & Garside 2000). We let those collocates pass that had a G² value of ≥ 250 and also met a minimal co-occurrence frequency threshold (N ≥ 10). This yielded a list of collocates for each of the lemmata. Next we asked whether the distribution pattern of lemma and collocate showed higher or equal affinity for negative contexts compared to the lemma individually (negation test). If that was the case we repeated the procedure on the lemma-collocate pair, which was now handled the way we handled single lemmata. In doing this, we obtained chains of lemmata as new NPI candidates, which could not be expanded further because they lacked either collocates or an increased affinity for negation. These new lemma chains were added to the original lemma ranking in accordance with their context ratio. Starting with the lemma Tasse (cup) at rank 15,221, for instance, the enhanced acquisition method compiles the lemma chain Tasse Schrank (cup cupboard), which corresponds to the negative-polar expression nicht alle Tassen im Schrank haben (to have lost one’s marbles) and appears at position 433 of the ranking. Thus, the enhancement not only generated lemma chains that are easier to map onto complex NPI candidates; it also moved those complex NPIs whose parts are rather non-polar and hidden at lower rankings to a more prominent position on the scale.
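The two tests applied to each lemma-collocate pair can be sketched as follows. The sketch uses the standard log-likelihood ratio statistic over a 2×2 contingency table of co-occurrence counts; how exactly the counts were laid out in the original implementation is not stated in the text, so the table layout, the function names and the example counts are all assumptions made for illustration.

```python
import math


def g2_score(k11, k12, k21, k22):
    """Log-likelihood ratio G2 for a 2x2 contingency table of counts.

    Assumed layout: k11 = clauses with lemma and collocate, k12 = lemma
    without collocate, k21 = collocate without lemma, k22 = neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def passes_collocation_test(k11, k12, k21, k22, min_g2=250.0, min_cooc=10):
    """Thresholds from the text: G2 >= 250 and at least 10 co-occurrences."""
    return k11 >= min_cooc and g2_score(k11, k12, k21, k22) >= min_g2


def passes_negation_test(cr_pair, cr_lemma):
    """The pair must show a higher or equal affinity for licensed contexts."""
    return cr_pair >= cr_lemma


# Invented counts for a strongly associated pair.
strong_pair = passes_collocation_test(100, 5, 5, 10000)
```

A pair that passes both tests would then be treated like a single lemma and tested again, which is how longer chains arise.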
3.2. Results: The Candidate List With the method described in the previous section, a ranking consisting of single lemmata and lemma chains was generated. The linguist who is interested in collecting NPIs will go through these by hand, expecting good NPI candidates to have accumulated among the higher ranked entries. Since the ranking comprises ca. 65,000 entries, the question may arise how far one should go down. This decision rests with the researcher. In the following, however, we used a shortlist of NPI candidates in order to evaluate the retrieval method. The shortlist was derived with the aid of z-values (Moore & McCabe 2006) that standardized the CR values and permitted us to determine significantly deviating CR values.4 In our setting, the derived shortlist comprised ca. 2000 single lemmata and lemma chains (at p ≤ 0.01). The evaluation of the shortlist of NPI candidates consists of two parts: first, we take a direct look at a small section of the shortlist; then, we compare the shortlist with that of Kürschner in a quantitative way. Among the 50 lemma chains with the highest scores, most of the items can indeed be connected to negative polarity in one way or another. For example, one finds lemma chains such as in (10) that can be mapped onto complex NPIs:
(10) a. unversucht lassen
        etw. unversucht lassen ‘to leave sth. undone’
     b. Staunen heraus
        aus dem Staunen herauskommen ‘to be able to believe’
     c. reichen vorne hinten
        vorne und hinten reichen ‘to be sufficient’
Interestingly, some NPIs appear to be somewhat hidden in different lemma chains. For example, brauchen (need), which is known to be negative polar when used as an auxiliary verb, can only be found as a part of five lemma chains. In this case, however, it does not seem to be a serious problem, since these lemma chains reflect regular complex expressions that allow the recovery of the negative polar brauchen. Something similar can be observed for the intensifier gar (at all), mehr (more) and the particle noch ([not] yet). In order to show the position of NPIs generally discussed in the literature, (11) lists some of them:
(11) 268: eine Menschenseele ‘a soul’
     284: sonderlich ‘particularly’
     449: einen Hehl machen aus ‘to make no secret of sth.’
     646: sich scheren um ‘to give a damn about sth.’
     784: jemals ‘ever’
One has to carefully distinguish complex NPIs from idiomatic expressions of which a certain negative element is a part, see e.g. the advertising slogan in (12) with a rather fixed n-word in the object position:
(12) sonst ja gönnen
     Man gönnt sich ja sonst nichts.
     ‘It’s my one and only treat.’
Yet not all of the lemma chains are connected to negative polarity in a pure sense, i.e. many show an affinity for negation that is triggered by the style of the newspaper text. Nevertheless these items can still occur outside licensed contexts and therefore they are called pseudo NPIs following Hoeksema (1997). See (13) for examples:
(13) a. hinter Ofen hervor locken
        hinter dem Ofen hervorlocken
        ‘to get someone excited about sth.’
The retrieval and classification of negative polarity items
     b. entbehren gewiß Komik
        einer gewissen Komik entbehren
        ‘to be lacking in humor’
     c. Redaktionsschluss fest stehen noch bei
        bei Redaktionsschluss feststehen
        ‘be available at press date’
Finally, there are also seven instances of statistical noise, i.e. lemma chains which undoubtedly have nothing to do with negative polarity or which the authors fail to map onto any reasonable complex expression. They can be filtered out quite easily because of their CR value of 1 and their noticeable length. In (14) we present such a lemma chain, which arises from a recurring statement in the weekly ‘Letters to the editor’ section of the corpus newspaper.
(14) notwendigerweise Meinung Seite erscheinend geben wieder auf die
     Die auf dieser Seite erscheinenden Leserbriefe geben nicht notwendigerweise die Meinung der taz wieder.
     ‘The readers’ letters on this page don’t necessarily reflect the opinion of the taz.’
Note that only a very small part of the shortlist consists of lemma chains with a CR value of 1, the value that the theory predicts for NPIs. Yet in reality, we have to accept the following inevitable shortcomings: firstly, we simply do not know all the possible licensers. Secondly, as mentioned in 3.1.1, not all licensers can be unambiguously identified and annotated in the corpus. Thirdly, “polysemy is rampant among polarity items” (Hoeksema 1997), that is, many NPIs have non-polar counterparts that leave their marks on the CR value.⁵ More generally, complex NPIs and their lemma chains are also affected by this. While most instances of a lemma chain in the corpus may be attributed to a complex NPI, e.g. the chain to the complex NPI alle Tassen im Schrank haben, it is possible that some instances reflect co-occurrence without the presence of the corresponding complex NPI. As mentioned in section 2.1, the most extensive listing of German NPIs to our knowledge is found in Kürschner (1983), comprising 344 items.⁶ We use this as a reference list for quantitatively evaluating our shortlist in terms of recall. As a result, we found 112 (32.6%) of Kürschner’s items.
However, it needs to be pointed out that Kürschner’s listing is by no means exhaustive, and that we are not in complete agreement with Kürschner’s selection. In fact, we have some doubts as to the status of ca. 200 items with respect to negative polarity. Needless to say, these shortcomings were part of the motivation for conducting this study.
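The two evaluation steps of this section, deriving a shortlist from standardized CR values and measuring recall against a reference list, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy CR values, the use of the population standard deviation, and the one-sided cut-off z ≥ 2.58 (the value Fig. 4 associates with p ≤ 0.01) are all assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: z = (x - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


def shortlist(cr_by_item, z_threshold=2.58):
    """Keep the items whose CR value lies significantly above the mean."""
    items = list(cr_by_item)
    zs = z_scores([cr_by_item[item] for item in items])
    return {item for item, z in zip(items, zs) if z >= z_threshold}


def recall(candidates, reference):
    """Share of the reference items that also occur among the candidates."""
    return len(candidates & reference) / len(reference)


# Hypothetical toy ranking: 19 unremarkable lemmata and one with CR = 1.
toy_cr = {f"filler{i}": 0.1 for i in range(19)}
toy_cr["sonderlich"] = 1.0
toy_shortlist = shortlist(toy_cr)

# The recall figure reported in the text: 112 of 344 reference items.
reported_recall = round(112 / 344 * 100, 1)
```

With heavily skewed CR values (see note 4), a z-based cut-off is only a rough filter, which fits its use here as a shortlisting device rather than as a test.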
4. Classification of NPIs
Our method can also be used for the subclassification of NPIs. Or more precisely, it offers a classification ex negativo by stating the NPI class to which a certain NPI does not belong due to statistical counter evidence.
4.1. Method
In principle, classification is an elaboration of the retrieval method, since we performed a refinement of the distributional patterns that the retrieval method makes use of. For this, we simply split the set of negative contexts into subsets according to minimal, regular and classical negation.⁷ Note that questions counted among minimal negation. The distributional pattern that we obtained for each NPI then separates the three subclasses of negative contexts. In this way, we were able to investigate which degree of negation a given NPI candidate was most strongly associated with and test a classification hypothesis. How can the association of an NPI with a subclass of negative contexts be measured? We compared the observed with the expected frequency count of an NPI within a context subclass by computing the deviance ratio:⁸
(15) Deviance ratio (DR): Given a subclass of negative contexts sc, the observed frequency $O_{sc}$ and expected frequency $E_{sc}$ of an NPI in sc, its total frequency in negative contexts $N_{neg}$, and furthermore the fraction $R_{sc}$ of sc with respect to the overall frequency of negative contexts, then we calculate:
\[ DR = \frac{O_{sc} - E_{sc}}{N_{neg}}, \qquad E_{sc} = R_{sc} \cdot N_{neg} \]
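As a small worked illustration of definition (15), the deviance ratio can be computed as below. The frequency counts and the shares of the three context subclasses are invented for the example; the source does not report the actual R values, so none of the numbers should be read as results.

```python
def deviance_ratio(observed, share, n_neg):
    """DR = (O_sc - E_sc) / N_neg, with E_sc = R_sc * N_neg."""
    expected = share * n_neg
    return (observed - expected) / n_neg


# Hypothetical shares R_sc of each subclass among all negative contexts.
shares = {"classic": 0.5, "regular": 0.3, "minimal": 0.2}

# Hypothetical observed counts O_sc of one NPI candidate.
counts = {"classic": 70, "regular": 25, "minimal": 5}

n_neg = sum(counts.values())  # total frequency in negative contexts
dr = {sc: deviance_ratio(counts[sc], shares[sc], n_neg) for sc in counts}
```

Because the observed counts add up to the total and the shares add up to 1, the three DR values of an item always sum to zero: an excess in one subclass is matched by a deficit elsewhere.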
Positive values express the fact that the frequency count is higher than expected, while negative values express the opposite. In order to evaluate the significance of DR values we again standardize them by computing z-values. Note that when computing the mean and standard deviation of DR, we only include lemmata with an overall frequency in negative contexts equal to or higher than 20. This is reminiscent of the frequency cut-off applied to the lemma list when using CR. In fact, both DR and CR are biased toward low frequencies. Fig. 4 depicts the confidence levels and significance statements we included. For clarity we indicate the significance statements by symbols that will be used throughout this section. While ‘○’ denotes no significance, there are two commonly used significance levels. Note that ‘−−’ entails ‘−’ and ‘++’ entails ‘+’.

p > 0.05 (not significant, |z| < 1.96)   →   ○
p ≤ 0.05 (significant, |z| ≥ 1.96)       →   −/+
p ≤ 0.01 (significant, |z| ≥ 2.58)       →   −−/++

Figure 4. Significance levels and their symbols
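The standardization step and the symbol mapping of Fig. 4 might be sketched as follows. The frequency cut-off of 20 and the thresholds 1.96 and 2.58 are taken from the text; the function names, the toy values and the choice of the population standard deviation are assumptions, and the source does not say whether items below the cut-off receive a z-value at all (here they do).

```python
from statistics import mean, pstdev

MIN_NEG_FREQ = 20  # cut-off stated in the text


def standardize(dr_values, neg_freqs, min_freq=MIN_NEG_FREQ):
    """Return z-values for all DR values.

    Mean and standard deviation are estimated only from items whose
    frequency in negative contexts reaches the cut-off.
    """
    base = [dr for dr, n in zip(dr_values, neg_freqs) if n >= min_freq]
    mu, sigma = mean(base), pstdev(base)
    return [(dr - mu) / sigma for dr in dr_values]


def symbol(z):
    """Map a z-value to the significance symbols of Fig. 4."""
    if abs(z) >= 2.58:
        return "++" if z > 0 else "--"
    if abs(z) >= 1.96:
        return "+" if z > 0 else "-"
    return "○"


# Hypothetical DR values and negative-context frequencies of four items;
# the last item is too rare to contribute to mean and standard deviation.
toy_dr = [0.0, 0.2, -0.2, 0.9]
toy_freq = [50, 30, 40, 5]
toy_symbols = [symbol(z) for z in standardize(toy_dr, toy_freq)]
```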
Evenly distributed NPIs should exhibit a non-significant DR for every subclass of negative contexts. In particular, weak NPIs can be (but don’t have to be) of this kind. On the other hand, NPIs that are strong and superstrong, respectively, are predicted to reveal a systematic deviance from this pattern, as can be seen in Fig. 5.

NPI            Negation
               classic   regular   minimal
weak           ×         ×         ×
strong         ×         ×         –
superstrong    ×         –         –

Figure 5. NPI classes and their predicted patterns of deviance
We allow weak NPIs to have any deviance pattern. In fact, it is not possible to find counterevidence for weak polarity by using this method. The reason for this is that this method only allows one to state whether an NPI does not belong to the class of strong and superstrong NPIs, respectively. Strong and superstrong NPIs have to conform to a more specific deviance pattern, namely that a strong NPI has to show at least a significant negative deviance with respect to minimal negation, while a superstrong NPI has to exhibit at least a significant negative deviance with respect to minimal and regular contexts. Note that from a point of view based on corpus statistics, these are minimal conditions. If these are not met, we obtain counterevidence against a certain classification. In view of the relative complexity of our method, one may ask why the frequency counts are not evaluated directly. Following the theory for NPIs sketched above, it can be assumed that strong and superstrong NPIs reveal an easily observable pattern, that is to say that there is no occurrence with certain licensers (e.g. weak ones). However, the nature of the data we are dealing with must be respected. Even a small frequency count in weakly negative contexts can still reflect the distributional pattern of strong or superstrong NPIs according to the theory. Instead of strictly demanding no occurrence, we therefore chose to define a more flexible threshold with the aid of statistics.
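The classification ex negativo described above can be expressed as a small decision rule. The sketch below only encodes the minimal conditions stated in the text (a significant negative deviance for minimal negation for strong NPIs, and additionally for regular negation for superstrong NPIs); the function names and the ASCII symbols '-' and '--' standing in for the minus symbols of Fig. 4 are assumptions.

```python
def is_negative(sym):
    """True for a significant negative deviance ('-' or '--')."""
    return sym in ("-", "--")


def compatible_classes(regular, minimal):
    """Return the NPI classes that the deviance pattern does not rule out.

    'weak' is always included, since the method cannot produce
    counterevidence against weak polarity.
    """
    classes = ["weak"]
    if is_negative(minimal):
        classes.append("strong")
        if is_negative(regular):
            classes.append("superstrong")
    return classes


# Hypothetical deviance patterns (regular, minimal):
only_weak = compatible_classes("○", "+")
up_to_strong = compatible_classes("○", "--")
up_to_superstrong = compatible_classes("-", "-")
```

Note that the result lists classes that remain possible, not a positive classification: an item whose pattern is compatible with 'superstrong' has merely escaped counterevidence.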
4.2. Results
A set of known NPIs, their distributional patterns (in terms of frequency counts) and the corresponding deviance patterns (in terms of significance statements) are given in Fig. 6.

Lemma chain              Classic      Regular     Minimal
sonderlich (878)         ++ (782)     ○ (92)      −− (4)
brauchen VVIZU (2359)    ○ (1660)     ○ (625)     − (74)
jemals (1077)            − (314)      ○ (202)     + (561)
Tasse Schrank (28)       ○ (10)       ○ (2)       ++ (16)
jedermanns Sache (66)    ++ (64)      − (0)       − (2)
Menschenseele (28)       −− (4)       ++ (22)     − (2)
sonst ja gönnen (27)     −− (0)       ++ (27)     −− (0)

Figure 6. Some NPIs and their deviance patterns
The obtained deviance patterns conform to the introspective classification of the considered NPIs. Strong NPIs such as sonderlich and the auxiliary brauchen⁹ reveal a significant negative deviance in weak negative contexts. Note that this holds despite their relatively rare occurrence here, which is marginal from a statistical point of view. On the other hand, jemals and Tasse Schrank (i.e. alle Tassen im Schrank haben) show a significant positive deviance in weak negative contexts, hence there is statistical counterevidence against classifying them as strong. However, this is no big surprise, since it is beyond controversy that these items are weakly negative polar. What about superstrong NPIs? Since no items of this class have been described so far for German (and there is substantial doubt whether NPIs of this class exist at all), we cannot test them here. Yet we do mention an NPI that shows the characteristic deviance pattern of superstrong NPIs, namely jedermanns Sache [sein] (to be everyone’s cup of tea). Of course, this does not indicate that jedermanns Sache sein is necessarily superstrong; it only shows the lack of statistical evidence against such a classification. To show the deviance pattern of a minimizer, we also include Menschenseele in Fig. 6. Its distribution clearly concentrates on regular negation, with a significantly low occurrence with classical and minimal negation. This peak can be explained by the natural way of negating minimizers, namely by kein, as in keine Menschenseele (not a soul), which is a regular negation. While the deviance pattern does not rule out a classification of Menschenseele as strong, we regard Menschenseele as licensable by the minimal negative licenser kaum (scarcely) and hence as a weak NPI. Like minimizers such as Menschenseele, idiomatic expressions with fixed negative elements should also be biased in favour of only one class of negation.
If we examine the lemma chain sonst ja gönnen which corresponds to the saying sich ja sonst nichts gönnen as mentioned above, this pattern is observed very clearly.
5. Summary A solid database for research and theory development of NPIs, e. g. for German, remains a desideratum. In this contribution, we proposed a method for automatically extracting NPI candidates from a partially annotated corpus. The core idea was to obtain for every lemma of the corpus the context ratio with respect to licensing expressions and to derive a ranking based on these context ratios. As many NPIs consist of more than one word, we enhanced
our algorithm to extract not only simple words but also complex expressions. In line with our expectations, we found promising NPI candidates among the topmost lemmata of this ranking. In addition, our system provides a fine-grained distributional pattern for each NPI, which treats the three subclasses of negative contexts separately. Thus, the degree of negation with which a given NPI candidate is most strongly or most weakly associated can be investigated. This is an aid in the process of subclassification. In particular, statistical evidence shows to which class a certain NPI does not belong. While both the retrieval and the classification method play a supportive role, their expressiveness with respect to obtaining cast-iron NPIs is rather restricted. This is mainly due to the nature of a text corpus and due to the difficulty in correctly identifying and annotating licensers. The methods presented are useful in establishing a descriptive database for NPIs. Despite the fact that some hurdles have been overcome, there still remains a considerable amount of work for the linguist as far as the evaluation of candidates and the compilation of a comprehensive list of NPIs for a given language is concerned.
Notes
1. See http://www.sfs.uni-tuebingen.de/tupp
2. As Hoeksema (1997) points out, conditionals can be covertly realized, as in: You say anything, and I’ll kill you. We think that it is virtually impossible to reliably identify these licensers in an automated process.
3. We ignore cases of double negation, which we assume to be relatively rare. By doing this, we accept a small number of wrongly annotated licensing markers and hence a minor portion of additional statistical noise.
4. Note that z-values are intended for data sets with normal distribution. However, the CR values of our single lemmata are skewed towards a CR score of 1, the mean being at 0.1178. Since we only use z-values for deriving a shortlist, we ignore this.
5. For example, the verb angehen has at least four readings: (1) das Licht geht an (the light is turned on), (2) wir gehen das folgendermaßen an (we’ll tackle this in the following way), (3) es geht dich nichts an (it’s none of your business), (4) es kann nicht angehen, dass… (it can’t be true that…). Only (3) and (4) can be considered as negative polar.
6. We ignore 13 proverbs and sayings where certain negative elements participate.
7. When determining the class of negation for a licenser, one has to take into account expressions in which the licenser can be embedded, and which exhibit a different negation. To give an example, the expression nicht so viel (not so much) contains nicht, an instance of classic negation, but the expression as a whole entails regular negation. There are also licensers of regular negativity which can appear in expressions of classical negativity, e.g. auf keinen Fall (by no means).
8. The formula of deviance ratio resembles that for χ² in Manning & Schütze (1999). However, χ² operates with contingency tables and provides only unsigned values.
9. We assume the auxiliary brauchen has a clausemate VVIZU marking that is a preserved POS tag from the original corpus and indicates a to-infinitive.
References
Chierchia, Gennaro 2006 Broaden your views. Implicatures of domain widening and the ‘logicality’ of language. Linguistic Inquiry 37 (4): 535–590.
Church, Kenneth Ward & Patrick Hanks 1990 Word Association Norms, Mutual Information, and Lexicography. Computational Linguistics 16 (1): 22–29.
Fauconnier, Gilles 1975 Polarity and the Scale Principle. Papers from the 11th Regional Meeting of the Chicago Linguistic Society: 188–199.
Giannakidou, Anastasia 1997 The Landscape of Polarity Items. Ph.D. thesis, Rijksuniversiteit Groningen.
Hoeksema, Jack 1997 Corpus Study of Negative Polarity Items. HTML version of a paper which appeared in the IV–V Jornades de corpus linguistics 1996–1997, Universitat Pompeu Fabra, Barcelona. http://odur.let.rug.nl/~hoeksema/docs/barcelona.html.
Hoeksema, Jack & Hotze Rullmann 2001 Scalarity and Polarity: A Study of Scalar Adverbs as Polarity Items. In Perspectives on Negation and Polarity Items, Jack Hoeksema, Hotze Rullmann, Víctor Sánchez-Valencia & Ton van der Wouden (eds.), 129–171. Amsterdam: Benjamins.
Hoeksema, Jacob 2005 De negatief-polaire uitdrukkingen van het Nederlands. Inleiding en lexicon. Ms., Rijksuniversiteit Groningen.
Kadmon, Nirit & Fred Landman 1993 Any. Linguistics and Philosophy 16 (4): 353–422.
Klima, Edward 1964 Negation in English. In The Structure of Language, Jerry A. Fodor & Jerrold Katz (eds.), 246–323. Englewood Cliffs, NJ: Prentice Hall.
Krifka, Manfred 1995 The Semantics and Pragmatics of Polarity Items. Linguistic Analysis 25: 209–257.
Kürschner, Wilfried 1983 Studien zur Negation im Deutschen. Tübingen: Narr.
Ladusaw, William 1980 Polarity Sensitivity as Inherent Scope Relations. New York: Garland.
Linebarger, Marcia 1987 Negative Polarity and Grammatical Representation. Linguistics and Philosophy 10: 325–387.
Manning, Christopher D. & Hinrich Schütze 1999 Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Moore, David S. & George P. McCabe 2006 Introduction to the Practice of Statistics. 5th Edition. New York: Freeman.
Rayson, Paul & Roger Garside 2000 Comparing Corpora Using Frequency Profiling. In Proceedings of the Workshop on Comparing Corpora, ACL, 1–8 October 2000, Hong Kong: 1–6.
de Swart, Henriëtte 1998 Licensing of Negative Polarity Items under Inverse Scope. Lingua 105: 175–200.
Vallduví, Enric 1994 Polarity Items, N-words, and Minimizers in Catalan and Spanish. Probus 6: 263–294.
van der Wouden, Ton 1992 Beperkingen op het optreden van lexicale elementen. De Nieuwe Taalgids 85 (6): 513–538.
van der Wouden, Ton 1997 Negative Contexts. Collocation, Polarity and Multiple Negation. London: Routledge.
Zwarts, Frans 1997 Three Types of Polarity. In Plurality and Quantification, Fritz Hamm & Erhard W. Hinrichs (eds.), 177–237. Dordrecht: Kluwer.
Geographic distributions of linguistic variation reflect dynamics of differentiation John Nerbonne and Wilbert Heeringa
1. Introduction
The oldest branch of dialectology is the study of what is today often referred to as “dialect geography”, i.e. the study of the geographical distribution of language varieties, as opposed to the study of many other relations between language varieties and external conditioning factors, such as social class, age, and sex. While it is clear that geography has a massive influence on the distribution of language varieties, and that closer varieties are normally more linguistically alike than more distant ones, still there have been surprisingly few attempts to examine these relationships with an eye toward more general formulations. Trudgill (1974) is an honorable exception to this last generalization. Trudgill proceeds from the very plausible assumptions that closer dialects must influence each other most strongly, and that intensity of social contact is likely to determine the degree of influence. He shows how to tie these ideas together in a model which hypothesizes a gravity-like attraction between dialect varieties, where population is the analog to physical mass, and geographic distance plays its customary role. He adduces evidence in support of this view, relying on selected dialect features. Although we wish to contribute to the understanding of the general principles underlying the geographic distribution of linguistic variation, we structure our paper as a test of the very specific gravity hypothesis advanced by Trudgill, according it the attention we feel it deserves as an early attempt at a general formulation of the principles of how geography influences variation. Dialectometry provides the more general tools with which such relationships may be studied (Goebl 1982, 1984; Nerbonne & Kretzschmar 2003), and the present paper is an attempt to apply dialectometry to evaluate Trudgill’s ideas more systematically.
In fact, it has been common since the earliest work in dialectometry to examine the dependence of dialect distance on geography (Séguy 1971; Heeringa & Nerbonne 2002; Gooskens 2004). There has been no systematic examination of Trudgill’s gravity hypothesis from a dialectometric perspective, however. In the current paper we expose Trudgill’s fundamental ideas to dialectometric examination. The following section presents Trudgill’s ideas, their motivation and an overview of previous work. Section 3 describes the experiment, including the data sources, and Section 4 presents the results, which certainly do not provide confirmation for the importance of the role of gravity, or centripetal forces due to social interaction. The final section discusses these results, and suggests an interpretation which does not dismiss gravity, which, after all, is theoretically well-founded, but which emphasizes that centrifugal forces, especially dialect differentiation, are more important.
1.1. Evidence for diffusion
The primary goal of this paper is to contribute to the understanding of the geographic distribution of linguistic variation, and to argue that these distributions reflect the dynamics of linguistic diffusion. This argumentation effectively uses (aggregate) synchronic distributions as evidence of diachronic patterns of diffusion. We use aggregate distributions to compensate for the noisiness of individual distributions, and we shall suggest that earlier work on the diffusion of linguistic change has been blinded by too narrow a focus on individual changes. This argumentation likewise summarizes our intended contribution to the reflection on evidence in linguistic theorizing. On the one hand we bring synchronic evidence to bear on diachronic issues, much in the same way that studies of “apparent time” in sociolinguistics do (Chambers 1995). While the latter sorts of studies aim to assay the passage of time by examining successive generations of speakers, we interpret the degree of geographic diffusion of changes as evidence of the temporal course of changes. We are likewise inspired by the demonstrations of astronomers that planetary systems are formed from material escaping from rapidly rotating stars rather than from the capture of large objects which happened to pass within the gravitational fields of stars. Crucial evidence for the former position is the fact that planetary systems function almost perfectly in a plane, a fact which would require independent explanation on the latter, “chance capture” view. Our second intended contribution to the reflection on evidence is simpler. We need to observe a great deal of material if we are to study the principles
underlying the geographic distribution of variation. If we restrict our attention to only a few examples, then we may be unfortunate enough to focus on atypical material, and we may be misled into proposing alternative mechanisms when in fact geography is massively influential in channeling variation.
2. Background In this section we present, in turn, Trudgill’s “gravity” theory of dialect dynamics, which might be seen as a reaction to the “wave” models of linguistic diffusion (Schmidt 1872), our own ideas on measuring the pronunciation distance between dialects, and the basic idea of testing the one via the other.
2.1. Trudgill’s Gravity Model
Schmidt (1872) introduced the idea that a given linguistic change might spread in waves from a center of innovation, an idea that is at the base of many models of the diffusion of linguistic change (Wolfram & Schilling-Estes 2003: 721). Peter Trudgill introduced an important refinement, suggesting the application of a GRAVITY MODEL, which had been used earlier in social geography, to questions of linguistic borrowing (Trudgill 1974). In Trudgill’s view linguistic innovations spread as if the driving forces were proximity and population size. In a typical case, an innovation spreads from a large population center directly to another intermediately sized one, often bypassing smaller, geographically intermediate sites. It then in turn spreads from the slightly smaller sites to yet smaller ones, and so on. It is as if each population center had its sphere of influence and that behavior within it is best studied with respect to the locally influential center. The gravity model thus postulates that linguistic innovations do not simply radiate from a center, as they might in a pure version of the wave theory, but rather that they affect larger centers first, and from there spread to smaller ones, and so on. For this reason it is also referred to as a CASCADE model (Labov 2001: 285): linguistic innovations proceed as water falling from larger pools to smaller ones. In particular, it should be possible for changes to “hurdle” immediate neighbors, instead of working only very locally. The connection to physical gravity is suggested in Figure 1. In understanding the movement of heavenly bodies, it is best to concentrate on the
nearest very massive body. Thus, even though the moon is affected by the sun’s mass, its path of revolution is determined almost entirely by the much closer Earth. The physical theory of gravity accounts for this by postulating a force due to gravity which is inversely proportional to the square of the distance between bodies. In this way very distant bodies are predicted to have much less influence than nearby ones.
Figure 1. According to the “gravity” model of linguistic dynamics, large population centers exert a force on smaller ones in proportion to the product of their populations, just as the presence of large heavenly bodies exerts a force on smaller ones in proportion to the product of their masses. Because distance likewise plays a role which diminishes quadratically, the most important influences are local ones. Thus, just as the Earth largely determines the behavior of the Moon, so will a local population center dominate within its own vicinity.
Social science uses of “gravity models” emphasize the importance of social contact and its role in suggesting and promoting the adoption of social and cultural innovations. Some of the phenomena studied by social geographers propagate spatially in a way that reflects their dependence on social contact. People generally adopt new styles of dress, new styles of housing and simple
new technologies only after seeing others use them. Social contact is not merely a necessary condition for the spread of linguistic variants; contact frequency also determines the chance of adoption, and ultimately, spread. We should expect many, and perhaps very nearly all dialect variants similarly to require experience before they could be adopted, and it is also likely that the frequency of experience plays an important role.
Motivation It is more than plausible that interaction with novel varieties disrupts customary speech habits and promotes the spread of novel linguistic variants. One indication of this plausibility is the readiness with which one interlocutor will adapt his speech to another’s. We see adjustments within the time span of individual conversations, and there is evidence that lasting effects also obtain. We review these issues presently. It has been noted in various subfields of linguistics that conversation partners regularly adjust their speech habits to “accommodate” each other’s use of language. Lewis (1979) introduced a principle of accommodation in discourse analysis to account for the willingness of interlocutors to interpret each other charitably, even in the face of apparent infelicities. Language acquisition experts have noted that adults spontaneously simplify their speech in conversation with young children (Clark & Clark 1977), a phenomenon they refer to as “motherese” or “caretaker speech”. Closer to home, students of dialect contact regularly note that speakers in multi-dialectal exchanges may temporarily adopt (some of) their conversation partner’s dialect features (Giles 1994). Of course, it is one thing to demonstrate a temporary accommodation to a conversation partner, and quite another to show that there are permanent effects of such accommodations either at the level of the idiolect (the speech habits of an individual) or at the level of the dialect (the speech habits common to a social or geographic group). Trudgill (1986: ch.1) presents an overview of what is known on the first topic, along with his own studies of Englishmen in the U.S., and it is quite clear that individuals do adjust their speech habits when they live in another dialect area for a considerable length of time. Trudgill (1986: ch. 
2–4) is then an extensive essay which establishes quite convincingly that dialects do borrow from one another following patterns which suggest a dependence on social contact, which in turn makes accommodation as a mechanism quite plausible.
2.2. Formulation
If social contact promotes the transfer of features, then we should be able to quantify its overall effect on entire settlements (villages or towns). The overall effect on the varieties associated with settlements should depend on the number of individual contacts, which in turn depends on how far apart the settlements are. Distance impedes the chance of contact, so that the further apart the settlements, the less chance of contact. Trudgill takes this idea a step further and suggests that the contact should decline, not as a linear function of distance, but rather quadratically. This seems reasonable if we consider that the area within a given distance of a settlement also increases as a quadratic function of the distance. If we imagine dialect speakers traveling randomly from a given place of residence, then the chance of traveling to a given point should also fall quadratically with the distance from the place of residence. The size of the settlements clearly promotes the chance of contact, however. In fact, for two settlements of size P1 and P2, the number of chances at social contact will rise with the product P1 · P2, the number of pairs of people where the first person is from the first town, and the second from the second. Let us note that there is room for the incorporation of further factors here, including perhaps whether a town lay on a frequent trade or pilgrimage route, or whether it was a market center or seat of (local) government. Trudgill (1983: 75) pulls these two factors together in a formula suggesting a linguistic counterpart to the law of gravitation:
\[ I_{ij} = s \cdot \frac{P_i P_j}{(d_{ij})^2} \]
where $I_{ij}$ represents the mutual influence of centers i and j, $P_i$ is the population of center i, etc., and $d_{ij}$ is the distance between i and j. s is a constant needed to allow for simple transformations, but it may be viewed as “variable expressing linguistic similarity”.¹ We note that Trudgill’s discussion makes it clear that he would allow that s differ, depending on the similarity of the varieties he was measuring. We shall not exploit this feature of his ideas (which would indeed resist incorporation into the experiment below), but we shall take care to limit our study to fairly similar varieties. We return to this issue in Section 5 below. The formula thus encapsulates a view of how population size and geographical distance may influence dialect differences. As our discussion has tried to show, the view accords with the notions of accommodation discussed in Section 2.1. If the result of the formula is large, it means that center
i has a high level of interaction with center j, meaning that we expect their dialects to influence one another a great deal. It will be convenient to refer below to the two consequences of the gravity theory which we have emphasized thus far, viz., that interaction should correlate positively with the product of the settlements’ populations, and negatively with squared distance:
\[ I_{ij} = s \cdot \frac{P_i P_j}{(d_{ij})^2}, \qquad I_{ij} \propto P_i P_j, \qquad I_{ij} \propto 1/d_{ij}^2, \qquad I_{ij} \propto -d_{ij}^2 \tag{1} \]
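The behavior of the gravity formula can be made concrete with a short sketch. The settlements, population figures and distances below are invented for illustration, and s is simply set to 1; nothing here is taken from the data used in the experiment.

```python
def influence(pop_i, pop_j, distance, s=1.0):
    """Gravity-style mutual influence: I_ij = s * P_i * P_j / d_ij**2."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return s * pop_i * pop_j / distance ** 2


# Hypothetical settlements: a large center A, an intermediate town B
# 40 km away, and a small village C only 10 km away.
pop_a, pop_b, pop_c = 100_000, 20_000, 500

i_ab = influence(pop_a, pop_b, 40)  # large center and distant town
i_ac = influence(pop_a, pop_c, 10)  # large center and nearby village
```

In this toy configuration the distant town interacts more strongly with the center than the nearby village does, which is the kind of situation in which the model allows an innovation to “hurdle” a small intermediate site. Doubling a distance reduces the predicted influence to a quarter.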
2.3. Work to-date Trudgill (1974: 225ff.) examines different pronunciations of the phoneme /æ/ in southern Norway, showing that pronunciations in sites closest to Larvik, a local population center, also most closely resembled it. He chose this pronunciation because it was changing at the time the data was collected. In this way he obtained a view of a change in progress, which, indeed accorded with the predictions of the gravity model. Callary (1975) noted a strong correlation between the height of /æ/ in Illinois speakers with the size of the city or town those speakers came from. The more urban the speaker’s background, the higher the vowel pronunciation. He noted that this is an exception to the predictions of the wave theory and specifically suggested Trudgill’s gravity model as a potential explanation (p. 168). Trudgill (1986) establishes intimate borrowing in a number of ways, including especially an extensive survey of the relevant literature and also several quantitative studies of individual dialect features (e.g. pp. 42, 64, 111), but there seems to have been no attempt to generalize over a number of features to examine whether geographically proximate varieties in general become more similar over time. This work was not specifically presented as an investigation of the gravity model, but it reaffirms the plausibility of the underlying assumption that social contact is an important factor leading to the acceptance of change. Bailey et al. (1993) and Wikle & Bailey (1997) investigate several ongoing changes in Oklahoman varieties, concluding that while several, indeed most, follow the direction of spread from larger to smaller settlements, important exceptions actually reversed the trend. They show that inchoative
John Nerbonne and Wilbert Heeringa
fixin’ to has spread from rural to urban areas, demonstrating that this direction is also possible. They attribute the reversal to the prestige ascribed to the use of this form. Boberg (2000) examines the degree to which the gravity model can account for diffusion across the U.S.-Canada border and concludes that it has relatively little predictive power. In particular, he shows that Windsor, Ontario, which is immediately adjacent to the U.S. border, and to the large population center of Detroit, is no more “American” in its pronunciation than Toronto. He suggests that the border might need to be included in the spatial model, but does not attempt to present a more refined model, and agrees with Bailey et al. (1993) that subjective elements of prestige require attention. Horvath & Horvath (2001) examine /l/ vocalization in Australian and New Zealand English, which they demonstrate to be a change in progress by showing that it is universally more frequent in younger speakers as compared to older ones. They conclude, however, that “a gravity model […] does not account for the diffusion of /l/ vocalization.” They suggest that this reflects an oversimplification of the model, which attributes diffusion to spatial effects without allowing that specific places may differ in their spatial properties. Wolfram & Schilling-Estes (2003: 732) report on a resisted change, an island off the coast of the American South which is not acquiescing in the widespread Southern U.S. change of /aɪ/ to /a/, which they attribute to the islanders’ valuing it “as a marker of in-group identity”. In summary, research has not overwhelmingly vindicated Trudgill’s postulation of a gravity-like effect in linguistic diffusion. There have been voices of affirmation, but even these have noted several counterexamples. 
Recent studies have almost all concluded that the influence of geography is exaggerated in the gravity hypothesis, and that other factors have to be examined. We are very critical of these recent studies, and in particular of their conclusion that the influence of geography has been exaggerated. We show below that this influence is indeed massive. Our quantitative analysis aims to contribute to this discussion in two ways. First, Trudgill’s and others’ studies might rely on fortuitously chosen features which corroborate or contradict the lasting influence of accommodation, but which might be atypical. Since the prima facie case for contact-induced dialect change is strong, however, further investigation into its generality is warranted. By examining the influence of geography quantitatively, the present study attempts to aggregate a larger number of linguistic
Geographic distributions of linguistic variation
variables, and thus to examine Trudgill’s ideas from a more general perspective. The present study thus seeks to investigate the influence of geography for a large range of linguistic features. Second, given quantitative tools, we believe we will be in a position to quantify the strength of geography’s influence. This option is not available to those working on isolated linguistic features.
2.4. Dialect distances

In our own work we have developed measures of the aggregate linguistic distance between varieties. We describe the method in this section. There are several ways in which phoneticians have tried to measure the distance between two basic sounds, most of which are based on the description of sounds via a small (≤ 25) number of features (see Heeringa 2004 for details). There is also a standard technique for the computational comparison of sequences, viz., Levenshtein distance, also known as (string) edit distance, and we combine these techniques.
2.4.1. Segment distances

The phonetic segment distance measure we use in this paper is based on the comparison of spectrograms of sound segments. A spectrogram is a mapping from time and frequency to intensity and captures most of the information available to the human ear. We are attracted to using spectrograms as a basis for segment distance in order to avoid the problem of determining the appropriate relative contribution of the different phonetic features.2 The spectrograms we used were made on the basis of recordings of the sounds of the International Phonetic Alphabet as pronounced by John Wells and Jill House on the cassette The Sounds of the International Phonetic Alphabet from 1995.3 The different sounds were isolated from the recordings and monotonized at the mean pitch of each of the two speakers with the program PRAAT.4 Next, we deployed PRAAT to obtain a spectrogram for each sound using the so-called Barkfilter, which is a perceptually oriented model. On the basis of the Barkfilter representation, segment distances were calculated as curve distances between the two spectrographic mappings. The precise way in which this was done is described extensively in Heeringa (2004: 79–119) and briefly in Gooskens & Heeringa (2004).
Operation             Result      Cost
                      æәftәnʉn
delete ә              æftәnʉn     d(ә, [ ]) = 0.99
insert r              æftәrnʉn    d([ ], r) = 0.95
replace [ʉ] with u    æftәrnun    d([ʉ], [u]) = 0.76
Total                             2.70
Figure 2. Levenshtein distance between two sequences is the least costly sum of costs needed to transform one string into another. The transformations shown here are associated with costs derived from spectrograms, i.e. the distance between the three-dimensional curves representing individual phonetic sounds. The pronunciations are from the Linguistic Atlas of the Middle and South Atlantic States (Kretzschmar et al. 1994).
Because small differences in pronunciation may contribute inordinately to the perception of phonetic distance, we emphasize small differences by applying a logarithmic transformation to the curve distance obtained in the way described above. To avoid taking a logarithm of zero, we calculate a slightly modified quantity:

$$\frac{\ln(\text{distance} + 1)}{\ln(\text{maximum distance} + 1)}$$

We turn now to the Levenshtein distance, which may be regarded as a means of lifting the segment distances obtained thus far to the level of sequence distances. The basic idea behind Levenshtein distance is to imagine that one is rewriting one string into another. The rewriting is effected by basic operations, each of which is associated with a cost, as illustrated in Fig. 2. The operations used were (i) the deletion of a single sound, (ii) the insertion of a single sound, and (iii) the substitution of one sound for another. We have experimented with other operations, but we have made no use of them for this work. The operation costs used in the procedures were those derived from the distance between spectrograms in a reference database as explained above. They consist of the measure of the distance between the sounds (in the case of substitution), and the measure of the distance between a given sound and silence (in the case of insertions and deletions).

We insist on proper alignments in the application of edit distance by requiring in general that only vowels may match with vowels, only consonants with consonants, but allowing the exceptions that [j] and [w] may align with vowels, [i] and [u] with consonants, and central vowels (here effectively only the schwa) with sonorants. Thus the [i], [u], [j] and [w] align with anything, but otherwise vowel/consonant status is respected so that unlikely matches (e.g. a [p] with an [a]) are prevented.

Comparing pronunciations in this way, the distance between longer pronunciations will generally be greater than the distance between shorter pronunciations. The longer the pronunciation, the greater the chance for differences with respect to the corresponding pronunciation in another variety. Because we would prefer not to exaggerate the effects of sounds in longer words, we normalize the raw distances obtained by dividing the raw distance by the length of the longest alignment which gives the minimum cost. The longest alignment has the greatest number of matches. We illustrate this with an example:

æ    ә       f    t    ә    Ø       n    ʉ       n
æ    Ø       f    t    ә    r       n    u       n
     0.99                   0.95         0.76
The total cost of 2.7 (= 0.99 + 0.95 + 0.76) is now divided by the length of 9. One important advantage of this procedure is that word distances are now expressed as percentages of a potential maximum. In the case above we obtain a word distance of 0.3 or 30%. Our varietal comparisons are made on the basis of 125 words, yielding 125 word distances per pair of varieties. We take the distance between the varieties to be the mean distance in our 125-element sample. Since word distances are expressed as percentages, mean varietal distances are also percentages. All the distances between the 52 Low Saxon varieties are then arranged in a 52 × 52 matrix. If we apply a Levenshtein procedure to about 100 words from several hundred field work sites, the result may be shown to verify the idea of dialect areas as used in traditional dialectology (Nerbonne et al. 1999). These may be reconstructed via clustering techniques, but also via the statistically more stable multi-dimensional scaling. Levenshtein distance has been shown to be consistent and valid with respect to the judgments of lay dialect speakers (Gooskens & Heeringa 2004; Heeringa 2004). Kessler (1995) first applied Levenshtein distance to phonetic transcriptions to measure the linguistic distances between (Irish) varieties. Nerbonne et al. (1996) and Heeringa (2004: 213–278) have applied the techniques to Dutch (see also references there), Bolognesi & Heeringa (2002) to Sardinian,
Heeringa & Gooskens (2003) to Norwegian, and Nerbonne & Siedle (2005) to German. Heeringa (2004: 121–135) is the most complete description, and we used exactly the scheme presented there to obtain the measurements in this paper. Although the Levenshtein technique was developed to measure the distance between sequences of phonetic segments, it measures all differences which are reflected in the phonetic transcriptions of dialect atlases, which typically consist of realizations in context, and which therefore include lexical, phonetic and morphological differences as well. In this paper we shall use the Levenshtein distances to test the idea of linguistic gravity, to which we turn in the next section.
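The weighted edit distance with length normalization described in this section can be sketched as follows. This is a minimal illustration, not the scheme of Heeringa (2004): the function name, the ASCII stand-ins for the phonetic symbols, and the default cost of 1.0 for sounds missing from the toy tables are assumptions, and the vowel/consonant alignment constraint is left out. Only the three costs shown in Figure 2 come from the text.

```python
from typing import Dict, List, Tuple

DEFAULT_COST = 1.0  # assumed cost for any sound or pair missing from the toy tables


def word_distance(src: List[str], tgt: List[str],
                  indel: Dict[str, float],
                  sub: Dict[Tuple[str, str], float]) -> Tuple[float, int, float]:
    """Return (raw cost, alignment length, cost divided by alignment length)."""

    def indel_cost(seg: str) -> float:
        return indel.get(seg, DEFAULT_COST)

    def sub_cost(a: str, b: str) -> float:
        if a == b:
            return 0.0
        return sub.get((a, b), sub.get((b, a), DEFAULT_COST))

    def rank(cell: Tuple[float, int]) -> Tuple[float, int]:
        # Lower cost first; among equal costs, prefer the longer alignment.
        return (round(cell[0], 9), -cell[1])

    n, m = len(src), len(tgt)
    table = [[(0.0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, length = table[i - 1][0]
        table[i][0] = (cost + indel_cost(src[i - 1]), length + 1)
    for j in range(1, m + 1):
        cost, length = table[0][j - 1]
        table[0][j] = (cost + indel_cost(tgt[j - 1]), length + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            up, left, diag = table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            candidates = [
                (up[0] + indel_cost(src[i - 1]), up[1] + 1),      # deletion
                (left[0] + indel_cost(tgt[j - 1]), left[1] + 1),  # insertion
                (diag[0] + sub_cost(src[i - 1], tgt[j - 1]), diag[1] + 1),  # match or substitution
            ]
            table[i][j] = min(candidates, key=rank)
    cost, length = table[n][m]
    return cost, length, (cost / length if length else 0.0)


# The two pronunciations of Figure 2, with ASCII stand-ins for the phonetic symbols.
source = ["ae", "@", "f", "t", "@", "n", "u-bar", "n"]
target = ["ae", "f", "t", "@", "r", "n", "u", "n"]
indel_costs = {"@": 0.99, "r": 0.95}   # distances to silence from Figure 2
sub_costs = {("u-bar", "u"): 0.76}     # substitution cost from Figure 2
cost, length, normalized = word_distance(source, target, indel_costs, sub_costs)
```

Under these assumptions the sketch reproduces the worked example: a raw cost of about 2.70 over an alignment of length 9, i.e. a word distance of about 0.3.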
2.5. Dialect distances and gravity

The fundamental idea behind the current experiment is to test the gravity theory, which is a claim about the dynamics of dialect change, using synchronic dialect distances. Since this is methodologically innovative, let us dwell on it briefly. Examining a range of dialect sites from a fairly stable region, we reason that, if they are subject to the forces of linguistic gravity, then the patterns we find in the synchronic data should reflect the accumulated effects of linguistic gravity. Synchronic differences should reflect historical dynamics. In particular the varieties closest to one another and those involving larger populations should be linguistically most similar as well. In this way we propose to test the gravity idea, examining synchronic data (linguistic distance) only. We further take care to make explicit here the assumption that the adoption of features from one dialect into another should make them more similar (than they originally were). We note one advantage which immediately accrues to this sort of probe: it does not require that we isolate ongoing changes and try to wring from them a direction. This was required in earlier examinations of the gravity hypothesis (Section 2.3). This only makes sense if the linguistic data we are examining betrays the effect of incomplete diffusions, changes which, for whatever reason, have not (yet) propagated throughout the area we are examining or which were partially overturned by later ones. Any completely successful change will not introduce a linguistic difference which our measurements can be sensitive to. Since we are employing an aggregate dialectometric technique, we shall be in a position to evaluate the overall tendencies shown in diffusion. We shall not restrict our attention to a small number of linguistic features, and
are therefore in a position to resolve the difference of opinion with respect to the gravity model noted in Section 2.3. Our measurements will note indicative and counter-indicative phenomena alike, and also quantify which are dominant. If we are to use synchronic linguistic distances to test claims about the diachronic development of dialects, then it is sensible to use data on the independent variables geography and population from a substantially earlier time, assuming that relative population size has been fairly stable. We imagine dialects undergoing small changes over a long period of time and continuously changing, and we want to allow enough time to lapse to give the processes of social contact a chance to accumulate effects. Finally, we settled on gathering data on population and distance (see below) from the time before the introduction of the railroads, more exactly in 1815, well before the times of modern mobility, and roughly 100–150 years before the linguistic data was collected. We assume that the dialects we examine continued to influence one another from then on, and so it is preferable to examine settlements that have been fairly stable in relative size and accessibility. We note that dialect surveys (including the one we used, introduced below) prefer older, non-mobile respondents, which means that the linguistic time lag is undoubtedly shorter than the 100–150 years between the time for which population sizes are available and the time of publication of the dialect atlas, perhaps by as much as 75 years. Although we believe that it would be legitimate to apply this sort of analysis in a fairly stable dialectal situation even with no time lag, we wished to err on the side of caution and sought data that would certainly reflect the accumulation of changes over decades. Perhaps it is not superfluous to add that we concede that it is difficult to determine a most appropriate time lag in a non-arbitrary way. 
Since we shall only observe the effects at a single time, we likewise assume that the situation long ago does not confound the effects we seek. This might conceivably have been the case if we had chosen a sample in which similarity unfortunately did not (originally) correlate with the chance of social contact, e.g., the Dutch of the polders reclaimed from the sea and uninhabited until the 1950s, or the language of areas with large percentages of migrant labor such as the older peat bogs in the north of the Netherlands. We noted in Section 1 above that interaction was predicted to correlate positively with the product of settlement size and inversely with the square of distance.

$$I_{ij} \propto P_i P_j \qquad I_{ij} \propto -d_{ij}^2$$
If it is correct, as we have just argued, that interaction should result in increased similarity, then similarity should correlate in the same way with population sizes and (inverse) distance. We shall finally be measuring linguistic distance, however, so that we shall test the following two hypotheses:

$$LD_{ij} \propto d_{ij}^2 \qquad LD_{ij} \propto -P_i P_j \qquad (2)$$
A second assumption is likewise crucial. We shall essentially test the predictions in Section 2 by examining the correlations between linguistic distance as measured by Levenshtein distance on the one hand and geographic distance and population size on the other. We have no way of controlling for other effects in the data which are also plausible, e.g. the influence of foreign languages, the social homogeneity of the situations, or the function of dialect differentiation as a mark of social differentiation. All of this is effectively “noise” in the current scheme. Finally, let us note that while Trudgill distinguishes the attractive force of a larger settlement on a smaller one from that of a smaller settlement on a larger one (effectively using an analog of the asymmetric acceleration due to gravity), noting that one expects the smaller settlement to accommodate more to the larger one than vice versa,5 we are restricted to observing only the long-term results of the attraction so that we do not distinguish the two cases. Viewed from another perspective, we are using a true distance measure, which is therefore symmetric. We cannot distinguish the effects of i on j from the inverse effects of j on i using this measure. We might be able to get some leverage on the asymmetric effects if we had data from different time points. In the present study we are attempting to evaluate an historical hypothesis on the basis of the accumulated effects it predicts. We turn now to the details of the experiment.

3. Experiment

In this section we review our selection of data and the conduct of the experiment.

3.1. Linguistic data

We used the dialect data from the Reeks Nederlandse Dialectatlassen (RND), compiled by E. Blancquaert and W. Pée in the period 1925–1982. In
these records we find the pronunciation transcripts of local speakers of each dialect in nearly 2,000 locations, from which we can then choose a suitable sample (see below). On the basis of this data, linguistic distances between settlements were calculated using Levenshtein distance (see above). 125 words formed the basis of the calculations (Heeringa 2004: App.B).

Table 1. Words Aggregate Pronunciation Difference is Based on

No.  Dutch        English      RND
1    mijn         my           2
2    vriend       friend       2
3    werk         work         4
4    op           on           5
5    schip        ship         5
6    kregen       got          5
7    brood        bread        5
8    vinger       finger       6
9    vier         four         10
10   bier         beer         10
11   twee         two          11
12   drie         three        12
13   hij          he           13
14   knuppel      cudgel       13
15   ik           I            14
16   knie         knee         14
17   gezien       seen         14
18   kerel        fellow       21
19   stenen       stones       25
20   breder       broader      25
21   duivel       devil        28
22   gebleven     stayed       28
23   meester      master       29
24   zee          sea          29
25   graag        gladly       31
26   steel        handle       33
27   bezem        broom        33
28   geroepen     called       35
29   peer         pear         36
30   rijp         ripe         36
31   geld         money        38
32   ver          far          39
33   brengen      bring        39
34   zwemmen      swim         42
35   bed          bed          45
36   springen     spring       47
37   vader        father       53
38   zes          six          53
39   jaar         year         53
40   school       school       53
41   laten        let          53
42   gaan         go           53
43   potten       jars         56
44   zijn         are          56
45   veel         much         56
46   maart        March        58
47   nog          yet          58
48   koud         cold         58
49   kaars        candle       59
50   geeft        gives        59
51   licht        light        59
52   paard        horse        60
53   tegen        against      63
54   kaas         cheese       66
55   dag          day          68
56   avond        evening      68
57   barst        crack        70
58   brief        letter       71
59   hart         heart        72
60   spannen      put          74
61   nieuwe       new          74
62   kar          cart         74
63   zoon         son          76
64   koning       king         76
65   ook          also         76
66   geweest      been         76
67   lange        long         78
68   woord        word         79
69   kindje       baby         80
70   was          was          80
71   dochtertje   daughter     82
72   bos          forest       82
73   ladder       ladder       83
74   mond         mouth        86
75   droog        dry          86
76   dorst        thirst       86
77   weg          way          87
78   krom         curved       87
79   liedje       ditty        90
80   goed         good         92
81   kelder       cellar       95
82   voor         for          95
83   moest        must         96
84   drinken      drink        96
85   broer        brother      98
86   moe          tired        98
87   dun          thin         100
88   zuur         sour         100
89   put          well         101
90   uur          hour         101
91   vuur         fire         104
92   duwen        push         105
93   hebben       have         106
94   stuk         piece        106
95   brug         bridge       106
96   veulen       foal         107
97   komen        come         107
98   deur         door         109
99   gras         grass        111
100  bakken       bake         113
101  je           you          116
102  eieren       eggs         116
103  krijgen      get          116
104  waren        were         119
105  vijf         five         119
106  hooi         hay          122
107  is           is           122
108  groen        green        122
109  boompje      little tree  124
110  wijn         wine         125
111  huis         house        126
112  melk         milk         127
113  spuit        spouts       127
114  koe          cow          127
115  koster       sexton       128
116  buigen       bend         129
117  blauw        blue         131
118  geslagen     struck       131
119  saus         sauce        132
120  flauw        flat         132
121  sneeuw       snow         133
122  doen         do           136
123  dopen        baptize      137
124  dorsen       thresh       138
125  binden       bind         139
For each pair of settlements in the sample (see below), we obtain a measure of the pronunciation distance between the settlements. It is best to imagine the results as a distance chart of the sort created and distributed by automobile clubs. But the linguistic distance chart is a table in which the cells
are not travel distances or travel times, as in the auto club charts, but rather linguistic distances. Every cell in the table represents the linguistic distance between the two settlements. Naturally, the diagonal contains only zeroes (the linguistic distance from a settlement to itself), and the table halves above and below the diagonal are symmetric, just as all distances are. We shall then try to predict this distance using geographical distance on the one hand and the inverse of the populations’ product on the other.
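The construction of such a distance chart from per-word distances can be sketched as follows. The function name, the three site labels, and the per-word values are hypothetical; the study itself averages 125 word distances for each pair of its 52 settlements.

```python
from itertools import combinations
from statistics import mean


def aggregate_matrix(word_dists, sites):
    """Build a symmetric table whose cells hold the mean word distance per pair of sites."""
    matrix = {a: {b: 0.0 for b in sites} for a in sites}  # the diagonal stays at zero
    for a, b in combinations(sites, 2):
        d = mean(word_dists[(a, b)])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical sites with two word distances per pair, for illustration only.
sites = ["A", "B", "C"]
word_dists = {
    ("A", "B"): [0.30, 0.10],
    ("A", "C"): [0.50, 0.70],
    ("B", "C"): [0.20, 0.40],
}
chart = aggregate_matrix(word_dists, sites)
```

Only the half-matrix above (or below) the diagonal carries information, which is why the later analysis works with the distinct pairs of settlements.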
3.2. Choice of settlements

The particular area chosen for measurements may be crucial. Naturally we wish to use data that has been collected and recorded consistently and accurately. Further, since we can only test Trudgill’s formulation of the gravity idea using settlements for which it is plausible to assume that they do not differ (at least not systematically) with respect to Trudgill’s linguistic similarity constant s, we should not choose an area straddling a major dialect boundary.
The Groningen-Hengelo sample

We then sought a suitable set of locations for the study, in particular an area with a few larger settlements and a larger number of smaller settlements varying in population size, which, moreover, does not encompass known dialect islands or significant dialect boundaries. The 52 locations we chose for the model presented in this paper are roughly between Groningen and Hengelo, with no dialect islands and few settlements with large populations. The towns and villages used in our study can be found in Figure 3, and their relative population sizes have been quite stable (see below). The selection of a given settlement was determined by its simultaneous presence in Heeringa’s digitized RND data and in the historical atlas (see below). We are aware that Dutch “city dialects” do not now enjoy great prestige, but this was not true during most of the time at which the diffusions studied here were underway, so this more recent negative prestige should not affect the chance of seeing the larger Dutch towns influence smaller ones. This might have been important in view of the remarks in Section 2.3 above, postulating that prestige plays a confounding role in earlier studies. In any case it would be difficult to control for this factor in a study such as the present one, probably meaning that one would need to seek another dialect area.
Figure 3. The 52 settlements in our study all lie within the Lower Saxon dialect area of the northern Netherlands.
We also note that the sample used here would be less than optimal if it were necessary to mirror the varieties now spoken in the larger towns. This has to do with the field workers’ preference for older, less mobile speakers, who are not representative of the current range of speakers. For example, the variety of Groningen city dialect recorded in the dialect atlas is not widely used in the city now. If we were trying to observe the effects of gravity today, we should prefer a more representative selection of today’s speakers so that the population which is having the attractive effect were better represented. But let us recall that the entire motivation in choosing older, less mobile speakers is the wish to understand what the language was like at the time of the speaker’s childhood. If the RND compilers and fieldworkers chose their respondents well, then they are giving us a picture of the language approximately 50 years before their interviews (on average). So we do not believe our use of atlas material to be a liability in this respect. An attractive aspect of this choice of settlements is that the area involves no substantial barriers such as mountain ranges, national boundaries, or large bodies of water. Since we are ultimately interested in distance as a predictor of the chance for social contact, it is clear that we wish to use a sample in which distance is likely to reflect the (inverse) chance of social contact.
3.3. Population size data

We decided to use population sizes from the time before great mobility (see above), reasoning that these sizes would reflect the interaction of that older period, which should translate into adopted, measurable changes in the period from which our linguistic data is taken. The populations of the different settlements in our model were taken from the Geschiedkundige atlas van Nederland; Het koninkrijk der Nederlanden 1815–1931 (Ramaer 1931) and date from around 1815. Groningen had the largest number of inhabitants, with a population of 27,824. The other populations range from 553 to 6,962 with a regular distribution, as shown in Figure 4. Ramaer (1931) also provides populations for 1930, and we calculated that the populations for 1930 correlated highly with those from 1815 (r = 0.86) so that it is safe to say that we are dealing with a set of settlements which is quite stable in relative population size. For each pair of towns in the selection, the product of the population of those two towns was calculated, resulting in a symmetric table, like the table of pronunciations (see above).
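The table of population products can be sketched in a few lines. The three population figures are those quoted above for 1815, but the labels "smallest_site" and "second_largest_site" are placeholders, since the text does not name the settlements at the ends of the range; the function name is likewise an assumption.

```python
from itertools import combinations


def population_products(populations):
    """Symmetric table of pairwise population products (no entries on the diagonal)."""
    names = list(populations)
    table = {a: {} for a in names}
    for a, b in combinations(names, 2):
        table[a][b] = table[b][a] = populations[a] * populations[b]
    return table


# Figures quoted in the text; the two non-Groningen labels are placeholders.
populations_1815 = {"Groningen": 27824, "smallest_site": 553, "second_largest_site": 6962}
products = population_products(populations_1815)
```

Because Groningen is so much larger than the other settlements, every pair that includes it has a far larger product than any other pair in the sample.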
Figure 4. The populations of the 52 settlements in 1815.
3.4. Geographical distances

We measured the distance between settlements using the Euclidean approximation based on the longitude-latitude coordinates, i.e., the root of the sum of the squared differences in longitude and latitude. This distorts the true
distances a bit since the settlements are distributed over the surface of a sphere (the Earth), not a plane, but the discrepancies are minor because the area is small. Van Gemert (2002) reports on an experiment using travel time instead of “as-the-crow-flies” distances, but travel times turned out to correlate so nearly perfectly in the Netherlands (r = 0.92) that we ignore this refinement here. We note for the future that we expect detailed geographic models to be more important as we analyze mountainous areas, where the barriers to social contact are substantial (Gooskens 2004), and perhaps also when we attempt to incorporate the role of waterways more effectively.
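The Euclidean approximation amounts to the following computation. This is a sketch only: the function name is an assumption, and the coordinates are approximate values given for illustration rather than the ones used in the study.

```python
import math


def approx_distance(lon_lat_a, lon_lat_b):
    """Root of the summed squared differences in longitude and latitude (in degrees)."""
    (lon_a, lat_a), (lon_b, lat_b) = lon_lat_a, lon_lat_b
    return math.hypot(lon_a - lon_b, lat_a - lat_b)


# Approximate (longitude, latitude) pairs, for illustration only.
groningen = (6.57, 53.22)
hengelo = (6.79, 52.27)
d = approx_distance(groningen, hengelo)  # distance in degrees, not kilometers
```

The result is expressed in degrees, so it is useful for comparing pairs within a small area but should not be read as a travel distance.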
3.5. Analysis

Our data preparation yields three different half-matrices, one showing the linguistic distance between each pair of settlements, one showing their populations’ product, and a third containing their geographic distance, measured as described above. In examining correlations involving aggregate linguistic distances we might be seen as guilty of the “ecological fallacy” (Freedman et al. 1998: 148–150), i.e. of overstating correlations by examining aggregate values. We would counter this putative objection by referring to the need to characterize entire linguistic varieties with respect to some aggregate of their properties. The ecological fallacy arises, e.g. when one studies the relation between income and education not on the basis of individual incomes and educational levels, but rather on the basis of average values over several groups. We maintain that it is simply necessary to examine aggregate properties if we are to characterize entire varieties as opposed to single linguistic variables (such as the pronunciation of final /t/). Linguistic varieties have a status unlike “average individuals” which justifies this step. We therefore submitted the data described above to multiple regression analysis, exploring various approximation techniques (stepwise and simultaneous) without noticing effects in the results, to which we turn presently. We note further that, since we are dealing with distances of various sorts, the assumption of independence of observations is violated, meaning that the statistical significance of the correlation coefficients may not simply be read from a standard table or an SPSS output. Mantel (1967) suggested a permutation technique for evaluating significance in such cases. Since we are dealing with large numbers (of distances, normally $\binom{52}{2}$ = 1,326), statistical significance is generally not an issue, but we report only correlation coefficients which are significant according to the Mantel test.
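A permutation test in the spirit of Mantel (1967) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually run for the study: the function names, the number of permutations, the fixed random seed, and the six-site toy matrices are all invented for the example.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """Correlate two distance matrices; assess significance by relabeling the sites of b."""
    n = len(a)
    pairs = list(combinations(range(n), 2))
    a_vals = [a[i][j] for i, j in pairs]
    observed = pearson(a_vals, [b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of b together
        permuted = [b[order[i]][order[j]] for i, j in pairs]
        if pearson(a_vals, permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


n_pairs = math.comb(52, 2)  # number of distinct settlement pairs among 52 sites

# Toy data: six sites on a line; linguistic distance is a scaled copy of geography.
positions = [0, 1, 3, 6, 10, 15]
geo = [[abs(p - q) for q in positions] for p in positions]
ling = [[0.02 * abs(p - q) for q in positions] for p in positions]
r_observed, p_value = mantel(geo, ling)
```

Permuting whole sites, rather than individual cells, keeps the dependence among the distances of a single site intact, which is the reason a standard significance table does not apply.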
4. Results

In this section we examine whether the predictions of the gravity model are fulfilled.
4.1. Geographic effects

It is axiomatic in dialectology that language variety is structured geographically (Nerbonne & Kleiweg 2006), so it comes as no surprise that geography is an excellent predictor of aggregate dialect distance. Our initial, linear regression model accounts for 59% of the variance in aggregate linguistic distance in our sample (r = 0.768). The correlation is positive, just as the gravity hypothesis predicts, and reflects what is perhaps the fundamental postulate of dialect geography, that more proximate varieties are also more linguistically similar. We present this simplest linear model before turning to the predictions of the gravity model proper to emphasize a polemical point: geography influences linguistic variation massively, accounting for nearly 60% of the variance in the data! This means that the conclusions of earlier research, discussed above, that geography was not as influential as the gravity model would suggest, were misguided. We return to the question of how this could occur below.
4.2. Population

When we turn to the effect of population size, we examine the addition of the independent variable for the population product to the purely geographic model. The addition of this variable improves the purely geographic model, but only negligibly (Table 2), allowing a mere 0.4% more variance to be explained with respect to the exclusively geographic model, an improvement that was not even statistically significant. More importantly, and surprisingly, the fundamental relation is not inverse, as the gravity model predicts, but rather direct. That is, the larger the population product, the greater the linguistic distance – exactly the opposite of what the gravity model predicts.
Table 2. Adding population effects to obtain a gravity model adds no explanatory power to the model of the Lower Saxon data. The contribution of the population product independently is moreover positive, contradicting the predictions of the gravity model!

Model             r       r²      ∆r²
Geography (d²)    0.715   0.511   0.511
“Gravity”         0.715   0.511   0.004
This is not what we expected from the model. The gravity model predicts that population size should add explanatory value and moreover predicts the opposite direction of influence. Other things being equal, larger settlements tend to be less similar to one another than smaller settlements.
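The comparison behind Table 2, i.e. how much explained variance a second predictor adds to a first, can be sketched with the standard closed form for the squared multiple correlation with two predictors. The function names and the toy data are assumptions for illustration; with real distance data the significance of the gain would still have to be assessed by a permutation test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_gain(y, x1, x2):
    """Return (R^2 with x1 only, R^2 with x1 and x2, gain from adding x2)."""
    r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    if abs(r_12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    r2_first = r_y1 ** 2
    r2_both = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return r2_first, r2_both, r2_both - r2_first


# Toy data: x1 stands in for squared geographic distance, x2 for the population product.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x1 by construction
y = [2 * a + 3 * b for a, b in zip(x1, x2)]
r2_geo, r2_gravity, delta = r_squared_gain(y, x1, x2)
```

In this toy case the second predictor adds a great deal, in contrast to the negligible gain reported for the population product in Table 2.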
4.3. The need for dialectometry

To underscore the need for an aggregate view, let us note that there are counter-indicators in the data as well. There are individual features which show the negative correlation with the population product predicted by the gravity hypothesis. The pronunciation distances of the word knuppel ‘club’ correlate inversely and significantly (r = −0.14) with the population product. Figure 5 shows a map of the distribution of this word. The inverse correlation arises because the large settlements tended to have similar pronunciations. But if our analysis were to focus on this one item, we would misread the global trend.
4.4. Quadratic?

Of course the gravity model not only predicts a positive correlation between linguistic distance and geographic distance, a prediction which is nearly synonymous with the entire enterprise of dialect geography, but it more exactly predicts that linguistic distance should be a quadratic function of geographic distance. Figure 6 provides a scatterplot of the data, together with the optimally fitting quadratic regression line. As the reader may verify, the cloud of data in the scatterplot does not suggest a quadratic relation. This visual suggestion is also borne out by the attempt to model the linguistic distances not as a quadratic function of geographic distance, but rather as a linear or even logarithmic function of geography. These not only result
Geographic distributions of linguistic variation
289
Figure 5. The pronunciation of knuppel ‘club’ shows the distribution with respect to population size predicted by the gravity hypothesis (an inverse correlation of pronunciation distance and population product, r = −0.14), but it is entirely atypical. In the absence of a dialectometric methodology, we might be tempted to choose such atypical material.
in better apparent fits of the regression curves, but also in statistically significant rises in the correlation coefficient from r = 0.715 (for the quadratic curve) to r = 0.768 for the linear fit, and r = 0.751 for the logarithmic curve (the latter two do not differ significantly). Sublinear curves typically fit this data well: Séguy (1971) presented his dialect distances as function of the square root of geographic distance, and Heeringa & Nerbonne (2002) as a function of its logarithm. In this case the linear model is slightly better than the logarithmic model, but there is no significant difference between the two. Since linguistic distance tends to rise to a ceiling when large enough areas are examined, the logarithmic model functions in general better.
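The comparison of curve shapes described here amounts to correlating linguistic distance with three transformations of geographic distance. The sketch below illustrates the idea on synthetic data generated from a logarithmic relation; the data, the names and the noise level are assumptions, and the significance testing of differences between correlations reported above is not reproduced.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Synthetic stand-in data (NOT the Lower Saxon measurements): the toy
# linguistic distances grow with the logarithm of geographic distance.
rng = np.random.default_rng(1)
geo_km = rng.uniform(5.0, 150.0, 1000)
ling = 0.2 * np.log(geo_km) + rng.normal(0.0, 0.1, 1000)

# Correlate linguistic distance with each candidate function of geography.
fits = {
    "quadratic": pearson_r(geo_km ** 2, ling),
    "linear": pearson_r(geo_km, ling),
    "logarithmic": pearson_r(np.log(geo_km), ling),
}

for name, r in fits.items():
    print(f"{name:12s} r = {r:.3f}")
```

On data of this sublinear kind the quadratic transformation correlates least well, which is the pattern the text reports for the Lower Saxon distances.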
5. Discussion
This paper has suggested a novel way of examining linguistic diffusion, viz., through synchronic measurements of linguistic similarity. We reason that the forces facilitating and impeding linguistic innovations should leave a residue of linguistic differences behind. The distribution of these differences betrays the dynamics which created them in a novel way, allowing us to examine the effects of diffusion without needing to probe ongoing changes.

The “gravity” model is not perfect in explaining the differences among dialects spoken in a certain area. There is indeed a positive correlation between dialect distance and geographic distance, but the curve does not have the predicted quadratic shape. Even more surprisingly, there is a slight positive correlation between dialect distance and combined population size (see Table 2). Together, these results suggest that the dominant effect in dialect geography is not one of attraction, but rather differentiation. The closer dialects are to one another, and the more people that are involved, the more strongly they generate and retain differentiating elements.
Figure 6. The linguistic distances of Lower Saxon data presented as a function of their geographic distance together with the optimal quadratic regression line. In fact there is little hint of a quadratic form in the scatter cloud.
There are several qualifying remarks that need to be added to this conclusion. First, it would be hasty to conclude that there are no attractive, i.e. gravity-like, forces at work in dialect dynamics; we conclude only that they are not the strongest. The theoretical arguments establishing the plausibility of a gravity-like force derive from the need to accommodate to one’s interlocutor, and this need is profoundly present in all human communication. But perhaps its effects are not lasting, and in any case they are not the strongest effects in the data we examined.
Second, we noted in Section 2.2 above that we would ignore Trudgill’s “similarity” factor. It should have become clear that our experimental design, in which we crucially measured linguistic similarity as a putative result of gravity, could not also include similarity as an independent factor, at least not without complicating the analysis a great deal. It would be wrong to suspect that including similarity in the way Trudgill suggests (Trudgill, 1974: 234) could alter the direction of the conclusion, however. In his model, similarity is postulated to promote diffusion. But since similarity correlates positively with geographical proximity, its postulated effect should only strengthen the geographic one, which we have shown to be much weaker than postulated.

Third, and more generally, Wolfram & Schilling-Estes (2003: 726) criticize the gravity model for abstracting away from too many influences which have been demonstrated to influence linguistic diffusion and retention. Wolfram & Schilling-Estes devote a good deal of discussion to the role which social networks may play, discussing in particular the work of Lesley Milroy and James Milroy (Milroy & Milroy 1985). We, too, have abstracted away from many of the forces well known in variationist linguistics, such as the effect of dialect prestige, social class, sex, and age on language varieties, relying on the one hand on the compilers of the RND to have controlled for those effectively, and focusing on a higher level of aggregation on the other. This seems reasonable, given the difficulty of obtaining data of this sort, but it is also worth recalling it explicitly.

For the reader unfamiliar with regression analyses, we note that the strength we have shown geography to have as an explanatory variable makes it nearly inconceivable that social variables of the sort Wolfram & Schilling-Estes suggest could ever be stronger.
We find the gravity model convincing in its intuitive justification, but unconvincing in the concrete test we put it to in the research reported in this paper. We do not attempt to examine alternative models in this paper, but our basic methodology clearly supports extensive experimentation in mathematical modeling.
5.1. Future directions
The paper points to the need for several follow-up studies and suggests some others. Naturally it would be interesting to examine data from other dialect and language areas, and also from other linguistic levels (lexical, morphological and syntactic data, for example). We have examined the variation curves in several language areas, however, and the sublinear shape of the linguistic distance curve is standardly found. Séguy (1971) and Cavalli-Sforza & Wang (1986) likewise demonstrate sublinear distributions, the former for a mixture of linguistic variables, and the latter for lexical distance.

The most interesting question is why the distribution is sublinear. Are there hypotheses about social dynamics which would predict the form of this curve? The shape of the curve is not unlike the curves drawn by population geneticists, who plot genetic distance as a function of geographic distance and who speak of “isolation by distance” (Jobling et al. 2004: 142–144).

A complete theory would try to isolate the effect of a single settlement on another, but it is not clear how one might go about this. Anytime we examine the effect of one settlement on another, we inevitably detect many effects whose causes remain obscure, but which undoubtedly involve the many other settlements. Perhaps a finer theoretical analysis can make sense of the “many body” version of linguistic attraction and differentiation, but we do not attack that problem here.

Another aspect worth more explicit attention in a more sophisticated model is the effect of measuring along the many dimensions of linguistic variation simultaneously. It is striking that Trudgill established the plausibility of a gravity model on the basis of individual features, while we have cast doubt upon it as a predictor of aggregate distance. A more sophisticated model should show how aggregate effects relate to individual features. After all, as we noted above in this section, our negative conclusion about the predictive value of gravity models in aggregate dialect distance is compatible with the existence of gravity-like forces, but we have shown that such forces are not dominant. Feynman et al. (1963) show how the sum of displacements in a multidimensional space tends to a logarithm of the number of displacements in their famous analysis of “random walks”.
Linguistic “space” clearly has very many dimensions – could this suggestive parallel be developed into an analysis?6

We would be interested in attempting to model the effects of trade together with geography. One expects trade to have increased the chance for social contact, and trade depended largely on waterways. Incorporating the effect of trade would mean exploring the relative importance of routes over land versus over water. This would be a promising topic for future work.

Finally, and especially given all of the attention which has been paid to social factors in language change (Labov 2001), it would be most attractive to analyze data which has been collected to systematically catalogue variation over a range of extralinguistic variables, including at least geography, class and sex. This would allow a more direct comparison between the roles of geography and other social factors.
5.2. Evidence for diffusion
Discussions of linguistic gravity have focused on identifying elements which are missing from the original model (Boberg 2000; Horvath & Horvath 2001). One suspects that Trudgill might accept that social and political ties might likewise play a role in the diffusion of linguistic innovation without rejecting his basic model which emphasizes the forces of accommodation and conformity. The study here, on the other hand, urges that we interpret the role of geographic proximity and increased social contact not as forces promoting linguistic similarity but rather as forces promoting linguistic differentiation. We speculate that the most profound dynamic in linguistic variation is our differentiation of ourselves from our neighbors. This radically different view was enabled first because we took the step of examining a large body of linguistic material, rather than a small set of variables, each examined individually, and second because we argued that any given synchronic snapshot of linguistic variation should bear the marks of long-standing dynamics of diffusion.
Acknowledgements
This research was made possible by a grant from the Dutch Organization for Scientific Research to the project “Determinants of Dialect Variation” (project 360-70-120, P.I. J. Nerbonne). We are indebted to Peter Kleiweg for programming support, Frits Steenhuisen and Elwin Koster for help with GIS and maps, and Pim Kooi for advice on obtaining historical demographic data. Renée van Bezooijen, Mark Gawron, Charlotte Gooskens, Bill Kretzschmar, Hermann Niebaum, Marco Spruit and Peter Trudgill commented generously on presentations and on an earlier draft.
Notes
1. It is indeed a perfect analog of the formula specifying the force due to gravity: $F_{ij} = G \cdot \frac{m_i m_j}{r^2}$, in which the masses ($m$) of the objects play a promoting role, the distance between them ($r$) a suppressing one, mediated by a gravitational constant, $G$.
2. We have also experimented with several phonological feature systems as bases for segment comparison (Heeringa et al. 2002; Heeringa 2004). These perform at levels similar to the acoustically based segment distances described in the text.
3. See http://www.phon.ucl.ac.uk/home/wells/cassette.htm.
4. The program PRAAT is a free public-domain program developed by Paul Boersma and David Weenink at the Institute of Phonetic Sciences of the University of Amsterdam and available at http://www.fon.hum.uva.nl/praat.
5. The degree to which $i$ accommodates to $j$ is proportional to $P_i / (P_i + P_j)$.
6. Mark Gawron suggested this intriguing line of thought.
References
Bailey, Guy, Tom Wikle, Jan Tillery & Lori Sand. 1993. Some patterns of linguistic diffusion. Language Variation and Change 3 (3): 241–264.
Boberg, Charles. 2000. Geolinguistic diffusion and the U.S.-Canada border. Language Variation and Change 12 (1): 1–24.
Bolognesi, Roberto & Wilbert Heeringa. 2002. De invloed van dominante talen op het lexicon en de fonologie van Sardische dialecten. Gramma/TTT: Tijdschrift voor Taalwetenschap 9 (1): 45–84.
Callary, Robert E. 1975. Phonological change and the development of an urban dialect in Illinois. Language in Society 4: 155–169.
Cavalli-Sforza, Luigi L. & William S.-Y. Wang. 1986. Spatial distance and lexical replacement. Language 62: 38–55.
Chambers, J. K. 1995. Sociolinguistic Theory: Language Variation and its Social Significance. Oxford: Blackwell.
Clark, Herbert & Eve Clark. 1977. Psychology and Language: An Introduction to Psycholinguistics. New York: Harcourt Brace Jovanovich.
Feynman, Richard P., Robert B. Leighton & Matthew L. Sands. 1963. The Feynman Lectures on Physics. Reading, MA: Addison-Wesley.
Freedman, David, Robert Pisani & Roger Purves. 1998. Statistics. 3rd Edition. New York: Norton.
Giles, Howard. 1994. Accommodation in communication. In Encyclopedia of Language and Linguistics, Vol. I, R. E. Asher (ed.), 12–15. Oxford: Pergamon.
Goebl, Hans. 1982. Dialektometrie: Prinzipien und Methoden des Einsatzes der Numerischen Taxonomie im Bereich der Dialektgeographie. Wien: Österreichische Akademie der Wissenschaften.
Goebl, Hans. 1984. Dialektometrische Studien: Anhand italoromanischer, rätoromanischer und galloromanischer Sprachmaterialien aus AIS und ALF. Vol. 3. Tübingen: Niemeyer.
Gooskens, Charlotte. 2004. Norwegian dialect distances geographically explained. In Language Variation in Europe. Papers from the Second International Conference on Language Variation in Europe ICLAVE 2, June 12–14, 2003, B.-L. Gunnarson, L. Bergström, G. Eklund, S. Fridella, L. H. Hansen, A. Karstadt, B. Nordberg, E. Sundgren & M. Thelander (eds.), 195–206. Uppsala, Sweden: Uppsala University.
Gooskens, Charlotte & Wilbert Heeringa. 2004. Perceptual evaluation of Levenshtein dialect distance measurements using Norwegian dialect data. Language Variation and Change 16 (3): 189–207.
Heeringa, Wilbert. 2004. Measuring Dialect Pronunciation Differences using Levenshtein Distance. Ph.D. thesis, Rijksuniversiteit Groningen.
Heeringa, Wilbert & Charlotte Gooskens. 2003. Norwegian dialect examined perceptually and acoustically. Computers and the Humanities 37 (3): 293–315.
Heeringa, Wilbert & John Nerbonne. 2002. Dialect areas and dialect continua. Language Variation and Change 13 (3): 375–400.
Heeringa, Wilbert, John Nerbonne & Peter Kleiweg. 2002. Validating dialect comparison methods. In Proceedings of the 24th Annual Meeting of the Gesellschaft für Klassifikation, W. Gaul & G. Ritter (eds.), 445–452. Heidelberg: Springer.
Horvath, Barbara M. & Ronald J. Horvath. 2001. A multilocality study of a sound change in progress: The case of /l/ vocalization in New Zealand and Australian English. Language Variation and Change 13 (1): 37–57.
Jobling, Marc A., Matthew E. Hurles & Chris Tyler-Smith. 2004. Human Evolutionary Genetics: Origins, Peoples and Diseases. New York: Garland.
Kessler, Brett. 1995. Computational dialectology in Irish Gaelic. In Proceedings of the European Association for Computational Linguistics, Dublin: 60–67.
Kretzschmar, William A. (ed.). 1994. Handbook of the Linguistic Atlas of the Middle and South Atlantic States. Chicago: University of Chicago Press.
Labov, William. 2001. Principles of Linguistic Change. Vol. 2: Social Factors. Malden, MA: Blackwell.
Lewis, David K. 1979. Scorekeeping in a language game. Journal of Philosophical Logic 8 (3): 339–359.
Mantel, Nathan. 1967. The detection of disease clustering and a generalized regression approach. Cancer Research 27: 209–220.
Milroy, James & Lesley Milroy. 1985. Linguistic change, social network and speaker innovation. Journal of Linguistics 21: 339–84.
Nerbonne, John, Wilbert Heeringa & Peter Kleiweg. 1999. Edit distance and dialect proximity. In Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed., D. Sankoff & J. Kruskal (eds.), v–xv. Stanford, CA: CSLI.
Nerbonne, John, Wilbert Heeringa, Erik van den Hout, Peter van der Kooi, Simone Otten & Willem van de Vis. 1996. Phonetic distance between Dutch dialects. In CLIN VI: Proceedings from the 6th CLIN Meeting, G. Durieux, W. Daelemans & S. Gillis (eds.), 185–202. Antwerpen: Center for Dutch Language and Speech, University of Antwerpen (UIA). Also avail. as http://www.let.rug.nl/~nerbonne/papers/clin96.pdf.
Nerbonne, John & Peter Kleiweg. 2006. Toward a dialectological yardstick. Quantitative Linguistics 13. Accepted.
Nerbonne, John & William A. Kretzschmar. 2003. Introducing computational methods in dialectometry. Computers and the Humanities 37 (3): 245–255. Special Iss. on Computational Methods in Dialectometry, John Nerbonne & William Kretzschmar, Jr. (eds.).
Nerbonne, John & Christine Siedle. 2005. Dialektklassifikation auf der Grundlage aggregierter Ausspracheunterschiede. Zeitschrift für Dialektologie und Linguistik 72 (2): 129–147.
Ramaer, Johan Christoffel. 1931. Geschiedkundige Atlas van Nederland; Het Koninkrijk der Nederlanden 1815–1931. The Hague: Nijhoff.
Schmidt, Johannes. 1872. Die Verwandtschaftsverhältnisse der indogermanischen Sprachen. Weimar: Böhlau.
Séguy, Jean. 1971. La relation entre la distance spatiale et la distance lexicale. Revue de Linguistique Romane 35: 335–357.
Trudgill, Peter. 1974. Linguistic change and diffusion: Description and explanation in sociolinguistic dialect geography. Language in Society 2: 215–246.
Trudgill, Peter. 1983. On Dialect. Social and Geographical Perspectives. Oxford: Blackwell.
Trudgill, Peter. 1986. Dialects in Contact. Oxford: Blackwell.
van Gemert, Ilse. 2002. Het geografisch verklaren van dialectafstanden met een geografisch informatiesysteem (GIS). Master’s thesis, Rijksuniversiteit Groningen. Avail. www.let.rug.nl/alfa/scripties.html.
Wikle, Thomas & Guy Bailey. 1997. The spatial diffusion of linguistic features in Oklahoma. Proceedings of the Oklahoma Academy of Science 77: 1–15. Avail. at digital.library.okstate.edu/OAS/.
Wolfram, Walt & Natalie Schilling-Estes. 2003. Dialectology and linguistic diffusion. In The Handbook of Historical Linguistics, B. D. Joseph & R. D. Janda (eds.), 713–735. Malden, MA: Blackwell.
Focus and verb order in Early New High German: Historical and contemporary evidence
Christopher D. Sapp
1. Introduction*
A well-known feature of Modern Standard German is the position of the verbs in subordinate clauses. Unlike in main clauses, where the finite verb occupies the second position of the clause, all verbs are clustered at the end of subordinate clauses with complementizers, and the order within the verbal complex is fixed. With two verbs, the only possible order is 2-1, i.e. the non-finite V followed by the finite V (1). With three verbs, the order is construction specific, with many constructions requiring the 3-2-1 order (2), some requiring the 1-3-2 order (3), and one syntagm, the future auxiliary werden plus a modal verb and infinitive, allowing both orders (4).

(1) a. … dass Klaus heute das Buch lesen will.
       that K. today the book read₂ wants₁
       ‘… that Klaus wants to read the book today.’
    b. … dass Klaus gestern das Buch gelesen hat.
       that K. yesterday the book read-PPP₂ has₁
       ‘… that Klaus read the book yesterday.’

(2) … weil es gekauft werden muss.
    because it bought₃ AUX₂ must₁
    ‘… because it must be bought.’

(3) … weil er es hat kaufen müssen.
    because he it has₁ buy₃ must₂
    ‘… because he had to buy it.’

(4) a. … weil er es kaufen können wird.
       because he it buy₃ can₂ will₁
       ‘… because he will be able to buy it.’
    b. … weil er es wird₁ kaufen₃ können₂.
However, earlier stages of German, as well as many contemporary dialects, show considerable variation in word order within the verbal complex. This paper deals with such variation in Early New High German (ENHG), arguing that the choice of orders within the verbal complex is sensitive to focus. Of course, determining information structure in an extinct language is problematic, since many cases of focus will be overlooked without intonational clues. Therefore, this paper uses different kinds of evidence to investigate the extent of the effect of focus on ENHG word order. The next section presents data from a corpus study of ENHG, presenting direct and indirect evidence for focus effects. Section 3 provides supporting evidence from interviews with speakers of dialects that show variation similar to that in ENHG. Section 4 presents the results of a Magnitude Estimation study on the effect of information structure on the werden-modal-infinitive syntagm in Modern Standard German. The conclusion will be that the limited focus effects in contemporary varieties of German represent a residue of the more robust ENHG system.
2. Evidence from Early New High German
2.1. ENHG word order
ENHG, the stage of German from 1350 to 1650, is characterized by a great deal of phonological, morphological, and syntactic variation. This is also true for subordinate-clause word order, where the position of the verbs is quite free compared to Modern Standard German. In addition to the “modern”, 2-1 order, the 1-2 order is possible (5). Clusters of three verbs show even more variation: in addition to the 3-2-1 and 1-3-2 orders possible in Modern Standard German, ENHG allows 1-2-3 and 3-1-2 (6). Finally, the verbs are not necessarily clause-final: some constituent may appear extraposed to the right of the verbs, independently of the order within the verbal complex (7).1

(5) a. das er in kainer sund verczweiffeln sol
       that he in no sin despair₂ shall₁
       ‘that he should not despair in any sin’ (PM 161)
    b. das der mensch alle sein lebttag nicht anders scholt thun
       that the person all his life-days nothing else should₁ do₂
       ‘that man should do nothing else all the days of his life’ (PM 206)

(6) a. das so darvorgesetzt ist in fragweis verstanden werden soll
       that REL before.set is in question understood₃ AUX₂ should₁
       ‘that what is set before should be understood as a question’ (Eu. 14)
    b. als er des tages scholt begraben werden
       as he the day should₁ buried₃ become₂
       ‘when he should be buried on that day’ (PM 212)
    c. so er dan den menschen nicht hat mugen vberwinden
       when he then the person not has₁ can₂ overcome₃
       ‘when he has not been able to overcome the person’ (PM 158)
    d. dy er … getan solt haben
       REL he done₃ should₁ have₂
       ‘that he should have done …’ (PM 159)

(7) a. Wye man fragen sol dy krancken
       how one ask₂ shall₁ the sick
       ‘how one should ask the sick’ (PM 166)
    b. daz ich damit sol pussen mein sund
       that I therewith shall₁ atone₂ my sin
       ‘that I should atone for my sin with that’ (PM 163)
Several factors that favor the 1-2 order in ENHG have been identified in previous scholarship (e.g. Ebert 1981), including syntagm type, the presence of a stressed separable prefix, the phonological weight of the word preceding the verbs, and sociolinguistic factors. However, this paper concentrates on the effect of object focus on verb order, a new finding based on the following corpus study.

2.2. The corpus and methods
The ENHG study was conducted on a corpus of thirty texts from the Bonner Frühneuhochdeutsch-Korpus, representing ten dialects from the period 1350–1600. From this corpus, I created a database of 2,921 unambiguously subordinate clauses, each of which contains a finite verb and at least one dependent, non-finite verb. There are 2,752 tokens with clusters of two verbs and 169 with clusters of three or more verbs. As the Bonner Frühneuhochdeutsch-Korpus is not tagged for word order, all tokens in the database were selected and tagged by hand.

The database was analyzed using the statistics package GoldVarb 2001 (Robinson et al. 2001). This program was developed for sociolinguistic studies and allows the researcher to determine the extent of the effect of several independent variables (linguistic and sociolinguistic factors) on a dependent variable (in this study, verb order). The key statistical output of GoldVarb 2001 that will be utilized in this study is the factor weight, which indicates the strength of the effect of a given factor on the dependent variable. The factor weight is expressed as a probability between 0 and 1; the further it is from 0.5, which indicates no effect, the greater that factor’s effect on the dependent variable.

In addition to several factors that are not discussed in this paper (see Sapp 2006), clauses in the database were tagged for the following factor groups: focus, scrambling, and extraposition. Under focus, clauses were tagged “new” if an argument was new within the section of the text or new within one or two pages if the text was not divided into sections. Clauses were tagged “contrastive” if an argument could be interpreted as contrastive. If there were no new or contrastive arguments, the clause was tagged as “old”.2 In the factor group scrambling, clauses were marked “unscrambled” if an object appeared to the right of a negator or adverbial (i.e. immediately left of the verbs) and “scrambled” if an object was separated from the verbal complex by some constituent. Since pronouns obligatorily scramble, clauses with only a scrambled pronoun were not tagged as scrambled. Finally, under “extraposition”, clauses were tagged as to whether they contained an extraposed argument, adjunct, or no extraposed constituent.
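To make the notion of a factor weight more concrete, the following sketch shows how weights of this kind are conventionally combined in variable-rule analysis: an overall input probability and each factor weight are converted to log-odds, summed, and converted back to a probability. This is an illustration of the general logistic model, not a reproduction of GoldVarb's estimation procedure; the input probability and the weights below are hypothetical values chosen for the example.

```python
import math


def logit(p):
    """Convert a probability (strictly between 0 and 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_rate(input_probability, factor_weights):
    """Combine an input probability with factor weights on the log-odds scale."""
    log_odds = logit(input_probability) + sum(logit(w) for w in factor_weights)
    return inverse_logit(log_odds)


# Hypothetical values: an overall rate of 0.75 for the variant being modelled,
# and three illustrative factor weights.
baseline = 0.75
neutral = predicted_rate(baseline, [0.5])       # weight 0.5 leaves the rate unchanged
disfavoring = predicted_rate(baseline, [0.3])   # weight below 0.5 lowers the rate
favoring = predicted_rate(baseline, [0.7])      # weight above 0.5 raises the rate

print(f"neutral:     {neutral:.3f}")
print(f"disfavoring: {disfavoring:.3f}")
print(f"favoring:    {favoring:.3f}")
```

The sketch shows why 0.5 marks "no effect": its log-odds are zero, so it adds nothing to the sum, while weights further from 0.5 shift the predicted rate more strongly.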
2.3. The effect of focus
First, let us examine the direct evidence for the effect of object focus on verb order in clusters of two verbs. In the corpus, the 2-1 order occurs 75% of the time, and the 1-2 order 24% of the time. Therefore, if the rate of the 1-2 order is significantly higher than 24% in a given context, that context favors 1-2. As shown in Table 1, under contrastive focus, the rate of the 1-2 order is higher than 50%, well above the overall rate of 1-2. Thus contrastive focus has a strong favoring effect on the 1-2 order. With an object that represents new information, the 1-2 order occurs 31% of the time, which is also above the expected rate of 24%. The effect of new information focus is not as strong as that of contrastive focus, but is probably significant, as the factor weight is < 0.4. Finally, with neither contrastive nor new information focus on an argument, the 1-2 order is somewhat disfavored. The effect of this factor group on verb order is statistically significant (p < 0.001).3
Table 1. The effect of object focus on the 1-2 order

Focus              2-1            1-2           Factor weight
Contrastive        15 (46%)       17 (53%)      0.263
New information    816 (68%)      369 (31%)     0.397
Old information    1237 (81%)     288 (18%)     0.586
Total4             2068 (75%)     674 (24%)
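The rates in Table 1 can be recomputed directly from the token counts, and a simple chi-square test of independence gives a rough check on whether verb order varies with focus category. The sketch below is only such a check, written in plain Python; it is not the GoldVarb analysis reported in the chapter, and the variable names are chosen for illustration.

```python
# Token counts from Table 1: (2-1 tokens, 1-2 tokens) per focus category.
counts = {
    "contrastive": (15, 17),
    "new information": (816, 369),
    "old information": (1237, 288),
}

total_21 = sum(n21 for n21, _ in counts.values())
total_12 = sum(n12 for _, n12 in counts.values())
grand_total = total_21 + total_12
overall_rate_12 = total_12 / grand_total

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for label, (n21, n12) in counts.items():
    row_total = n21 + n12
    print(f"{label:16s} 1-2 rate = {n12 / row_total:.1%}")
    for observed, column_total in ((n21, total_21), (n12, total_12)):
        expected = row_total * column_total / grand_total
        chi_square += (observed - expected) ** 2 / expected

degrees_of_freedom = (len(counts) - 1) * (2 - 1)
CRITICAL_0_001_DF2 = 13.816  # chi-square critical value for df = 2, alpha = 0.001

print(f"overall 1-2 rate = {overall_rate_12:.1%}")
print(f"chi-square = {chi_square:.2f} (df = {degrees_of_freedom})")
print("exceeds critical value:", chi_square > CRITICAL_0_001_DF2)
```

A statistic above the critical value indicates that the three focus categories do not share a single underlying rate of the 1-2 order, which is in line with the significance level reported for this factor group.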
Thus there is direct evidence for the favoring effect of object focus on the 1-2 word order. However, I may have been influenced by the word order to incorrectly interpret some constituents as focused. Moreover, I have likely overlooked some instances of focus, lacking the intonation cues that accompany focus in a spoken language. Therefore, let us examine two additional factors as indirect evidence for the effect of focus: scrambling and extraposition.

There is a well-known correlation between focus and scrambling in German. An object is typically located adjacent to the non-finite verb when focused (in situ in the OV approach to German) but “scrambles” to the left when not focused (see e.g. Haider & Rosengren 2003). Thus if a focused object favors the 1-2 order in ENHG, we should expect a higher rate of 1-2 when the clause contains a non-scrambled object. As Table 2 shows, that is precisely what we find: with a non-scrambled object, the rate of 1-2 is 38%, well above the expected 24% and probably significantly so (factor weight < 0.4). On the other hand, in clauses with a scrambled argument, the rate of 1-2 is close to expected. This factor group has a statistically significant effect on verb order (p < 0.015).

Table 2. The effect of scrambling on the 1-2 order

Position of object    2-1            1-2           Factor weight
Not scrambled         45 (61%)       28 (38%)      0.371
Scrambled             115 (71%)      46 (28%)      0.568
Ambiguous             1908 (76%)     600 (23%)     0.499
Total                 2068 (75%)     674 (24%)
The second piece of indirect evidence comes from extraposition. Bies (1996) demonstrates that extraposed arguments tend to be either heavy or focused. Therefore, a higher frequency of extraposition with the 1-2 order would support the finding that focus favors the 1-2 order. As shown in Table 3, argument extraposition correlates with verb order: if an NP or argument PP is extraposed, the 1-2 order occurs significantly more often than expected, at 36%. Note that extraposed adjuncts do not favor the 1-2 order, which can be taken as evidence that the effect of argument extraposition on verb order is due to focus, rather than extraposition in general. The effect of this factor group on word order is statistically significant (p < 0.001).

Table 3. The effect of extraposition on the 1-2 order

Extraposed constituent    2-1            1-2           Factor weight
Extraposed argument       163 (63%)      93 (36%)      0.359
Extraposed adjunct PP     153 (72%)      58 (27%)      0.525
Nothing extraposed        1751 (77%)     523 (22%)     0.502
Total                     2067 (75%)     674 (24%)
Finally, let us examine the effect of focus on the order within clusters of three verbs. Just as the lack of focus on an object favors the 2-1 order, it also favors the 3-2-1 order at 24%, which is higher than the expected rate of 3-2-1 (17%). New information and contrastive focus, on the other hand, slightly (but not significantly) favor each of the three other orders. This effect becomes stronger when the three orders are taken together: with object focus, the 1-3-2, 1-2-3, and 3-1-2 orders occur 87% of the time, which is probably significantly higher than the expected 83% (factor weight < 0.4). This analysis is statistically significant (p < 0.041). Because of the small number of tokens, it was not possible to support these findings with evidence from scrambling and extraposition.

Table 4. The effect of object focus on three-verb clusters

Focus                      3-2-1        1-3-2 / 1-2-3 / 3-1-2    Factor weight
New info. / contrastive    11 (12%)     80 (87%)                 0.387
Old information            19 (24%)     59 (76%)                 0.631
Total                      30 (17%)     139 (83%)
2.4. Discussion
In this section, we have seen that new information and contrastive focus have a favoring effect in ENHG on the 1-2 order as well as on several orders in three-verb clusters. The direct evidence for this effect has been supported by indirect evidence based on two focus-related phenomena: scrambling and extraposition. However, one important test is not available in a historical language: testing whether a given sentence is possible under more than one focus context. Therefore, in the next two sections, we will see whether variable word order in contemporary varieties of German is sensitive to object focus, as appears to be the case in ENHG.
3. Supporting evidence: Swabian and Austrian German
3.1. Design
Since Modern Standard German allows only the 2-1 order, we must examine dialectal evidence in order to determine the effect of focus on two-verb clusters in contemporary German. This section reports on interviews conducted with speakers of two dialects of German: Swabian and Austrian German.5 These dialects allow both the 2-1 and 1-2 orders, although most speakers seem to prefer the 2-1 order, perhaps under the influence of the standard language.

The same method was used for gathering judgments in both dialects. For each dialect, I asked the first subject to tell me the dialect equivalent of some Standard German words. These dialect words were entered into a computer, which was used to generate sentences in that dialect. From then on, the first subject saw only these dialect sentences on the computer screen. Subsequent subjects were shown a printout of the first subject’s sentences and were allowed to write down phonological adjustments if necessary. This process was intended to minimize the effect of the standard language, by limiting the subjects’ exposure to the Standard German equivalents of the sentences.

Each subject was asked to perform two tasks. First, he or she was shown a list of sentences and asked to judge their grammaticality, giving each sentence a score from 1 to 5. These sentences were the dialect translations of the Standard German sentences in (8), with each sentence appearing in fourteen different word orders.

(8) a. Ich glaube, dass Klaus heute das Buch lesen will.
       I think that Klaus today the book read-INF₂ wants₁
       ‘I think that Klaus wants to read the book today.’
    b. Ich glaube, dass Klaus gestern das Buch gelesen hat.
       I think that Klaus yesterday the book read-PPP₂ has₁
       ‘I think that Klaus read the book yesterday.’

The word-order variations involved the two relative verb orders (2-1 and 1-2) with different placements of the adverb and object (scrambling, extraposition, and within the verb cluster). The six orders relevant to this paper will be illustrated in the next section. (For a complete discussion, see Sapp 2006).

The second task involved judging dialect versions of the same basic sentences in (8), with the only variation being verb order (2-1 versus 1-2). However, in this task each pair of sentences was given a context question to elicit focus on the object, verb, VP, or the entire clause. Subjects were asked to judge the naturalness of the sentences as a response to the question and rate the sentences on the same 5-point scale. These sentences will also be illustrated in the next section.

3.2. Swabian
For Swabian, a dialect of southwestern Germany, I interviewed two speakers, one male and one female. Both were students at the University of Tübingen and were approximately twenty-five years old. The first speaker was from a large suburb of Stuttgart, and the second was from a village just outside of Tübingen. The dialects are similar enough that the second interviewee did not need to make any phonological adjustments to the sentences generated by the first subject.

3.2.1. Task one

The following are the Swabian sentences in the present perfect that were judged in the first task, along with their judgments.6 Sentences (9a) and (9b) represent the 2-1 order with unscrambled and scrambled objects, respectively. Sentences (9c) and (9d) are in the 1-2 order (unscrambled and scrambled). Finally, (9e) and (9f) involve the 2-1 and 1-2 orders with an extraposed object.

(9) a. I glaub, dass Glaus geschdern des Buach glese had.
       I think that Klaus yesterday the book read-PPP2 has1
       ‘I think that Klaus read the book yesterday.’
    b. I glaub, dass Glaus des Buach geschdern glese had.
    c. ?I glaub, dass Glaus geschdern des Buach had glese.
    d. ??I glaub, dass Glaus des Buach geschdern had glese.
    e. ?I glaub, dass Glaus geschdern glese had des Buach.
    f. ?*I glaub, dass Glaus geschdern had glese des Buach.
The subjects judged these same orders in the modal-infinitive syntagm:

(10) a. I glaub, dass Glaus heud des Buach lese mecht.
        I think that Klaus today the book read-INF2 wants1
        ‘I think that Klaus wants to read the book today.’
     b. I glaub, dass Glaus des Buach heud lese mecht.
     c. ??I glaub, dass Glaus heud des Buach mecht lese.
     d. ?*I glaub, dass Glaus des Buach heud mecht lese.
     e. ??I glaub, dass Glaus heud lese mecht des Buach.
     f. ??I glaub, dass Glaus heud mecht lese des Buach.

Several observations can be made from these judgments. First of all, they confirm Steil’s (1989) claims that the 2-1 order is preferred over 1-2 in Swabian, and that 1-2 is more grammatical with the perfect (9c) than with modals (10c). Secondly, as in ENHG, there is a correlation between scrambling and the 1-2 order. Whereas there is no difference in grammaticality between the variants of the 2-1 clauses with and without scrambling, both subjects rated the 1-2 order with modals higher when the object was not scrambled (10c) than when it was scrambled (10d). Finally, the clauses with extraposition showed a great deal of variation between subjects, so it is unclear how the data should be interpreted. However, to the extent that the average of the two subjects’ scores is meaningful, it appears that extraposition does not favor the 1-2 order, but actually seems to make the clause worse: (9f) is worse than (9c–d).
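The judgment marks in (9) and (10) follow the convention given in note 6: the subjects' scores on the 1 to 5 scale are averaged, rounded, and mapped to a mark. The following Python sketch illustrates that mapping. The function name and the sample scores are hypothetical, and reading "rounding up" as rounding halves upward is an assumption; the text does not spell out the rounding rule.

```python
import math

# Mark for each whole-number score, as listed in note 6.
MARKS = {5: "", 4: "?", 3: "??", 2: "?*", 1: "*"}


def judgment_mark(scores):
    """Average 1-5 scores and map the rounded mean to a judgment mark.

    Assumption: halves round upward (e.g. a mean of 3.5 becomes 4).
    """
    if not scores or any(s < 1 or s > 5 for s in scores):
        raise ValueError("scores must be non-empty and lie between 1 and 5")
    mean = sum(scores) / len(scores)
    rounded = math.floor(mean + 0.5)
    return MARKS[rounded]


# Hypothetical scores from two subjects, not data from the study.
print(repr(judgment_mark([5, 5])))
print(repr(judgment_mark([4, 3])))
```

With only two subjects, a single point of disagreement can move a sentence from one mark to the next, which is one reason the averaged marks should be read with caution.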
3.2.2. Task two

In the second task, subjects judged the 2-1 and 1-2 orders in different focus conditions, which were elicited using context questions. The focus conditions tested were object focus (11), focus on the main verb (12), focus on the VP, i.e. the object and verb (13), and focus on the entire subordinate clause (14). Note that what is being tested here is focus in the syntactic sense, rather than phonological stress, as all of the sentences except (12) have the sentential accent on the direct object. All of these sentences have both a present perfect and a modal-infinitive variant; they are illustrated here using the present perfect sentences.7

(11) Was had Glaus geschdern glese?
     what has Klaus yesterday read
     ‘What did Klaus read yesterday?’
     a. I glaub, dass Glaus [F des BUACH] glese had.
        I believe that Klaus the book read has
        ‘I think that Klaus read the book.’
     b. ?I glaub, dass Glaus [F des BUACH] had glese.

(12) Was had Glaus geschdern mit dem Buach gmacht?
     what has Klaus yesterday with the book done
     ‘What did Klaus do with the book yesterday?’
     a. I glaub, dass Glaus des Buach [F GLESE] had.
     b. ??I glaub, dass Glaus des Buach had [F GLESE].

(13) Was had Glaus geschdern gmacht?
     what has Klaus yesterday done
     ‘What did Klaus do yesterday?’
     a. I glaub, dass Glaus [F des BUACH glese had].
     b. ??I glaub, dass Glaus [F des BUACH had glese].

(14) Was isch geschdern bassierd?
     what is yesterday happened
     ‘What happened yesterday?’
     a. I glaub, dass [F Glaus des BUACH glese had].
     b. ??I glaub, dass [F Glaus des BUACH had glese].

In this task there was considerable inter-speaker variation, and within-speaker variation between the two syntagms. The only consistent result was that for both speakers, the 1-2 variants were clearly less acceptable than the 2-1 variants in all contexts and with both syntagms. One of the subjects, however, did find a difference between the grammaticality of the 1-2 order when the object was focused (11b), which was judged with 4 out of 5 points, versus the 1-2 order in all of the other conditions, which all received a 3. I followed up on this by asking if (11b) really was better than the others, and the subject confirmed the original judgment.
3.3. Austrian German
This study was conducted with five speakers from different regions of Austria, two male and three female, and all students at the University of Vienna in their mid-twenties. The same method was used as in the Swabian study: the first interviewee, from Lower Austria, translated the lexical items from Standard German into dialect and then was asked to judge dialect sentences generated from those lexical items.8 The remaining subjects, from small towns in Styria, Tyrol, and Vorarlberg, were shown the sentences in the Lower Austrian translations and were permitted to make phonological adjustments in addition to judging the word order. Despite the strong phonological differences between the dialects, the judgments in these tasks were largely similar.

3.3.1. Task one

The judgments for the sentences in task one are presented below. Unlike the Swabian study, the Austrian judgments showed very little difference by syntagm, so I have averaged the scores for the perfect and modal-infinitive syntagms. The sentences are illustrated using the perfect syntagm and are given in the Lower Austrian form:

(15) a. I glaub, dass da Klaus gestan des Buach glesn hot.
        I think that the Klaus yesterday the book read2 has1
        ‘I think that Klaus read the book yesterday.’
     b. I glaub, dass da Klaus des Buach gestan glesn hot.
     c. ??I glaub, dass da Klaus gestan des Buach hot glesn.
     d. ?*I glaub, dass da Klaus des Buach gestan hot glesn.
     e. ?*I glaub, dass da Klaus gestan glesn hot des Buach.
     f. *I glaub, dass da Klaus gestan hot glesn des Buach.

There are a number of interesting results from this study. First of all, as in Swabian, the standard-like 2-1 order (15a–b) is clearly more grammatical than the 1-2 (15c–d), contrary to previous work on Austrian word order (Patocka 1997). Secondly, as in the Swabian study, although the 1-2 order is not fully grammatical, it is better when the object is unscrambled (15c) than when it is scrambled (15d). For some speakers, this distinction was as strong as 5/5 (fully grammatical) with an unscrambled object to 1/5 (completely ungrammatical) with scrambling. Finally, unlike ENHG but as in Swabian, extraposition does not make the 1-2 order more acceptable, but rather makes the sentence even more ungrammatical (15f).
3.3.2. Task two

As in the Swabian study, the Austrian informants were given a second task, in which they judged the 2-1 and 1-2 verb orders under different focus conditions. In this task, none of the four subjects showed any difference in acceptance of the 1-2 order based on focus (elicited in this manner). Nevertheless, one interesting point did arise from this task. After completing the survey, the speaker from Styria mentioned that the 1-2 order sounds better when the object is stressed:

(15) c.  ??I glaub, dass da Klaus gestan des Buach hot glesn.
     c.' ?I glaub, dass da Klaus gestan des BUACH hot glesn.
The fact that the subject mentioned this but did not show any difference in task two between the different focus conditions could mean one of two things. One possibility is that new information focus alone is not enough to make the 1-2 order more acceptable, but that contrastive focus is necessary. The second possibility is that the background question did not clearly elicit the intended focus interpretation.
3.4. Discussion
There are two results of these studies that indicate that object focus continues to have an effect on word order. First of all, although for most subjects there was no clear pattern in the second task, one Swabian clearly indicated that the 1-2 order is better under object focus than other focus types, and one Austrian accepted 1-2 with a strongly stressed object. Secondly, results from the first task in both dialects show that the 1-2 order is more acceptable when the object is not scrambled. Since an object that fails to scramble is usually focused, this observation confirms the effect of object focus on the 1-2 order.

However, the 1-2 order is clearly marginal in these dialects. It appears that this word order is so marked that it is no longer an acceptable way to indicate object focus. Thus the correlation between object focus and the 1-2 order in these dialects today is merely a remnant of the situation in ENHG, when this effect was much more robust.

Note that these results can only be considered preliminary, since the studies did not follow the usual practices to ensure reliable results in experimental linguistics, such as filler sentences, lexical variation, randomized presentation, and tests for statistical significance. On the other hand, it is not clear how feasible it would be to use a controlled experiment to investigate dialect phenomena. One would need to find large numbers of speakers of the same (micro-)dialect and create sentences in that variety, and it is not clear how subjects would react to seeing their spoken dialect in printed form. Therefore, the next section reports on an experiment testing the effect of focus on verb order in Standard German.
4. Supporting evidence: Modern Standard German
Recall from section 1 above that there is only one construction in Modern Standard German that allows any variation within the verbal complex. When the future auxiliary werden governs two infinitives, both the 3-2-1 and 1-3-2 orders are possible, and the 3-1-2 order is possible in some varieties of the standard, especially in Austria and southern Germany:

(16) a. weil er es kaufen können wird
        because he it buy3 can2 will1
        ‘because he will be able to buy it’
     b. weil er es wird1 kaufen3 können2
     c. %weil er es kaufen3 wird1 können2

Schmid & Vogel (2004) find that the word orders in this construction are influenced by stress. This section presents a Magnitude Estimation study that tests whether focus, rather than stress, is the crucial factor for determining word order.
4.1. Design

In the Magnitude Estimation method of eliciting grammaticality judgments (Bard et al. 1996), subjects are presented with a number of sentences and asked to rate them relative to a reference sentence on a scale of the subject’s own choosing. This method has a number of benefits. First of all, many sentences are tested, helping to abstract away from the possible effects of individual lexical items. Secondly, the study involves multiple subjects, abstracting away from the possibly idiosyncratic judgments of individuals. Thirdly, rather than eliciting absolute grammaticality judgments, the judgments are relative and often fine-grained.

In order to elicit different focus interpretations, a correction format was used. In the instructions, subjects were asked to imagine that they are speaking with a friend who always misunderstands everything, so that they have to constantly repeat themselves. Subjects were instructed to judge only the answer. Five focus conditions were tested: subject focus (17a), object focus (17b), VP focus (17c), focus on the main verb (17d), and focus on the modal (17e).9

(17) a. Was? Maria wird einen Roman schreiben müssen?
        what M. will a novel write must
        ‘What? Maria will have to write a novel?’
        Nein! Ich habe gesagt, dass [Foc Klaus] einen Roman schreiben müssen wird.
        no I have said that K. a novel write must will
        ‘No! I said that Klaus will have to write a novel.’
     b. Was? Klaus wird eine Geschichte schreiben müssen?
        what K. will a story write must
        ‘What? Klaus will have to write a story?’
        … dass Klaus [Foc einen Roman] schreiben müssen wird.
     c. Was? Klaus wird eine Geschichte lesen müssen?
        what K. will a story read must
        ‘What? Klaus will have to read a story?’
        … dass Klaus [Foc einen Roman schreiben] müssen wird.
     d. Was? Klaus wird einen Roman lesen müssen?
        what K. will a novel read must
        ‘What? Klaus will have to read a novel?’
        … dass Klaus einen Roman [Foc schreiben] müssen wird.
     e. Was? Klaus wird einen Roman schreiben können?
        what K. will a novel write can
        ‘What? Klaus will be able to write a novel?’
        … dass Klaus einen Roman schreiben [Foc müssen] wird.
Each focus condition was tested twice for the three word orders that were expected to be fully to partially grammatical (3-2-1, 1-3-2, and 3-1-2) and just once for a word order expected to be ungrammatical (1-2-3). The result was thirty-five experimental sentences. Additionally, there were five filler sentences ranging from grammatical to ungrammatical. The reference sentence also used the correction format:

(18) Was? Richard tanzt gern Tango?
     what R. dances gladly tango
     ‘What? Richard likes to dance the tango?’
     Nein! Ich habe gesagt, dass Edith gern Walzer tanzt.
     no I have said that E. gladly waltz dances
     ‘No! I said that Edith likes to dance the waltz.’

Using the correction format for the reference and filler sentences was intended to result in better comparison with the experimental sentences. Note, however, that neither the fillers nor the reference sentence contained verb clusters and that the focus condition in the reference sentence (multiple focus on the subject and object) was not tested in the experiment. Thus despite the similar format, these sentences should not have any effect on the judgments of the word orders under consideration.

The experiment was conducted on paper. The first two pages consisted of the instructions and a practice activity, and the experiment itself occupied the remaining two pages. There were twenty different sets of lexical items, such that no subject saw any set of lexical items more than twice.10 Each subject saw a different version of the experiment, with the twenty combinations of focus and word order represented by different sets of lexical items in each version.

There were a total of twenty participants in the experiment, seventeen women and three men, with a mean age of 23.6. All were native speakers of Austrian German, and thirteen were from Vienna and its suburbs. Thirteen of the surveys were administered in an introductory course on German grammar at the University of Vienna. The seven remaining surveys were completed by acquaintances of the experimenter. All of the subjects were university-educated and thus proficient in Standard German, but none of the subjects had any coursework in syntax.
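The counts in this design can be checked by enumerating the cells. The sketch below is only an illustration of the arithmetic described above (five focus conditions, three word orders tested twice and one tested once); the variable and function names are hypothetical and are not taken from the original study materials.

```python
from itertools import product

FOCUS = ["subject", "object", "VP", "main verb", "modal"]

# How often each word order was tested within one focus condition.
REPETITIONS = {"3-2-1": 2, "1-3-2": 2, "3-1-2": 2, "1-2-3": 1}


def experimental_cells():
    """Return one (focus, order, repetition) triple per experimental sentence."""
    return [
        (focus, order, rep)
        for focus, order in product(FOCUS, REPETITIONS)
        for rep in range(1, REPETITIONS[order] + 1)
    ]


cells = experimental_cells()
combinations = {(focus, order) for focus, order, _ in cells}

print(len(cells))         # number of experimental sentences
print(len(combinations))  # number of focus x word-order combinations
```

The enumeration yields thirty-five experimental sentences and twenty combinations of focus and word order, matching the figures given in the text.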
4.2. Results

Figure 1 illustrates the results for this experiment.11 The word orders that should be grammatical (3-2-1 and 1-3-2) were judged to be grammatical: they score in about the same range as the grammatical fillers. The more marginal word order 3-1-2 scores below the two grammatical orders. As expected, the ungrammatical 1-2-3 order scored far below these orders, in the same range as the completely ungrammatical fillers (not shown in Figure 1).
Figure 1. Acceptance of word order by focus context
Recall from section 2 that in ENHG, focus on the object has a favoring effect on the 1-2 order and probably on the 1-3-2 order as well. In this experiment, with focus on the VP, the 3-2-1 order is considerably better than the 1-3-2 order; however, under object focus, 3-2-1 and 1-3-2 are scored about equal. The improved acceptability of the 1-3-2 order appears to confirm the favoring effect of object focus on the 1-3-2 order in Modern Standard German as well as in ENHG.
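Because each subject in a Magnitude Estimation study chooses his or her own scale, raw ratings have to be normalized before they can be compared across subjects; note 11 reports that normalized z-scores were used. The sketch below shows one common way to compute per-subject z-scores. It is not the SPSS procedure used for Figure 1: the function name is hypothetical, the ratings are invented for illustration, and whether the original analysis transformed the ratings before standardizing is not stated in the text.

```python
from statistics import mean, stdev


def z_normalize(ratings):
    """Convert one subject's raw ratings to z-scores on that subject's own scale."""
    m = mean(ratings)
    sd = stdev(ratings)  # sample standard deviation; needs at least two ratings
    if sd == 0:
        raise ValueError("ratings have no variance")
    return [(r - m) / sd for r in ratings]


# Hypothetical ratings from two subjects who chose very different scales.
subject_a = [10, 20, 30]
subject_b = [100, 200, 300]

print(z_normalize(subject_a))
print(z_normalize(subject_b))
```

After normalization the two hypothetical subjects receive identical scores, since their ratings differ only in scale and not in relative ordering or spacing.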
4.3. Discussion

Schmid & Vogel (2004) find that stress has an effect on the order of the verbs in this construction. However, object stress is compatible with a number of focus interpretations, including object focus and VP focus. Thus if stress alone were the most important factor in determining word order in this construction, we would expect to find that object focus and VP focus show similar word-order preferences. However, Figure 1 shows that object focus and VP focus show differing preferences, especially with respect to the 1-3-2 order. Therefore, this experiment shows that the effect of focus on word order within the verb cluster is independent of the effect of stress.

To sum up this section, in the Modern Standard German werden-modal-infinitive construction, focus has an effect on word order within the verb cluster. Generally speaking, this supports my findings from ENHG. Unfortunately, however, the Modern German data cannot be directly compared to ENHG, as there is only one instance of this construction in my ENHG corpus.
5. Conclusions
This paper has demonstrated that focus on an object has a favoring effect on the 1-2 order and similar orders in Early New High German. Based on a corpus of ENHG subordinate clauses, direct evidence for this effect was found, as well as indirect evidence from the effects of scrambling and extraposition on verb order. This was supported by evidence from contemporary varieties of German. Some contemporary Swabian and Austrian speakers more readily accept the 1-2 order when preceded by a focused, stressed, or non-scrambled object. A Magnitude Estimation study of Modern Standard German found that focus has some effect on word order in the werden-modal-infinitive construction. As a final note, this paper has shown the usefulness for historical linguistics of comparing different types of evidence. This is especially important when attempting to establish the effect of subtle phenomena such as focus.
Acknowledgements

Parts of this research were generously funded by Deutscher Akademischer Austauschdienst and the Fulbright Program. I would like to thank Marga Reis and the SFB 441 for their welcome during my year in Tübingen, Sam Featherston for help with Magnitude Estimation, and Richard Schrodt and Wolfgang Dressler for putting me in contact with students in Vienna. Finally, thanks to the audiences of the workshop on verb clusters at the University of Tübingen and the 2006 conference on Linguistic Evidence for their helpful comments.
Notes

1. In addition, some constituent may come between the verbs in the 1-2 and similar orders (e.g. 1-X-2). See Sapp (2006) for discussion.
2. Although focus on a verb is of course possible, there were not enough clear cases of this to test for statistical significance. However, focus on a verb very likely has an effect on verb order in ENHG, as in contemporary German (Schmid & Vogel 2004).
3. It is well-known that information structure is related to definiteness: indefinite NPs tend to be new to the discourse. However, neither the definite vs. indefinite distinction (p > 0.602) nor the non-pronominal NP vs. pronoun distinction (p > 0.101) was statistically significant (see Sapp 2006 for details).
4. GoldVarb 2001 does not round up percentages, so they often add up to 99% rather than 100%. I report the statistics here exactly as outputted by GoldVarb, except that I have changed p = 0.000 to p < 0.001.
5. I gathered the Swabian data while on a DAAD grant at the University of Tübingen and the Austrian data on a Fulbright grant at the University of Vienna. This research was conducted with the approval of the Indiana University Bloomington Human Subjects Committee (#03-8702).
6. The sentences seen by the subjects followed Standard German punctuation and capitalization rules. The superscripts were determined by averaging the subjects’ numerical judgments and rounding up to the nearest whole number (5 indicated with no mark, 4 with ?, 3 with ??, 2 with ?*, and 1 with *).
7. Capital letters mark the accented word (focus exponent), and brackets indicate the focus projection. The sentences seen by the subjects used conventional German punctuation and capitalization. The judgments here are for the present perfect sentences only, with the modal + infinitive sentences showing slightly different results. The judgments were calculated as in task one.
8. This speaker rejected all clauses with the 1-2 order; therefore, this speaker’s judgments are not included in the discussion below.
9. The focus conditions are illustrated using only one word order and one set of lexical items. In the experiment, the sentences appeared in standard orthography and punctuation.
10. Subjects were ten male and ten female first names, 4–8 characters long and with a mean frequency of 2.9 million hits on www.google.de (German-language sites only). The objects were 4–11 characters long with a mean frequency of 468.6 in the CELEX corpus (Baayen et al. 1995), and the verbs were 5–10 characters with a mean frequency of 1081.88 in CELEX.
11. 95% confidence interval of the normalized z-scores of the grammaticality judgments calculated with the program SPSS. Outliers were removed using the box-plot test.
References

Baayen, R. Harald, Richard Piepenbrock & Leon Gulikers. 1995. The CELEX Lexical Database (Release 2). Philadelphia: Linguistic Data Consortium, University of Pennsylvania.
Bard, Ellen Gurman, Dan Robertson & Antonella Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72: 32–68.
Bies, Ann. 1996. Syntax and Discourse Factors in Early New High German: Evidence for Verb-final Word Order. M.A. Thesis, U. Penn., Philadelphia.
Das Bonner Frühneuhochdeutsch-Korpus. Institut für Kommunikationsforschung und Phonetik, Universität Bonn. http://www.ikp.uni-bonn.de/dt/forsch/fnhd/
Ebert, Robert Peter. 1981. Social and stylistic variation in the order of auxiliary and non-finite verb in dependent clauses in Early New High German. Beiträge zur Geschichte der deutschen Sprache und Literatur 103: 204–237.
Eu = Der Eunuchus des Terenz. Uebersetzt von Hans Neidhart 1486. 1915. Ed. by Hermann Fischer. Tübingen: Litterarischer Verein.
Haider, Hubert & Inger Rosengren. 2003. Scrambling: Nontriggered Chain Formation in OV Languages. Journal of Germanic Linguistics 15: 203–267.
Patocka, Franz. 1997. Satzgliedstellung in den bairischen Dialekten Österreichs. Frankfurt am Main: Lang.
PM = Eine neue Quelle für die Kenntnis des mystischen Lebens im Kloster Pillenreuth. (Untersuchungen und Texte). 1960. Ed. Elvira Langen. Ph.D. dissertation, University of Heidelberg.
Robinson, John, Helen Lawrence & Sali Tagliamonte. 2001. GoldVarb 2001: A multivariate analysis application for windows. www.york.ac.uk/depts/lang/webstuff/goldvarb
Sapp, Christopher D. 2006. Verb Order in Subordinate Clauses from Early New High German to Modern German. Ph.D. dissertation, Indiana University.
Schmid, Tanja & Ralf Vogel. 2004. Dialectal Variation in German 3-Verb Clusters: A Surface-Oriented Optimality Theoretic Account. Journal of Comparative Germanic Linguistics 7: 235–274.
Steil, Claudia. 1989. Untersuchungen zum Verbalkomplex im Schwäbischen. M.A. thesis, University of Tübingen.
Contrastive topics in pairing answers: A cross-linguistic production study Stavros Skopeteas and Caroline Féry
1. A typological questionnaire on information structure

In this paper we present the results of a cross-linguistic production study bearing on the answers to single and double wh-questions, called ‘single answers’ and ‘pairing answers’ in the following. This study is part of a large experimental set-up designed to create a cross-linguistic spoken data archive for the study of the realization of information structure. For this purpose, a Questionnaire on Information Structure (QUIS, Skopeteas et al. 2006) has been developed, which contains the following modules for the creation of a data set in an object language:

(a) The first module of QUIS is a classic grammatical questionnaire in the sense of Comrie and Smith (1977). It contains questions about the typological properties of the object language at all layers of grammar, with special emphasis on those properties that are relevant for the encoding of information structure (prosody, morphology and syntax).

(b) The main part of QUIS aims at the elicitation of spontaneous spoken data in a near-naturalistic discourse context. This part of QUIS comprises 29 tasks which elicit sentences with different types of information structure. The individual tasks address a large number of information structural properties: different kinds of focus-background structure, discourse status of the referents (given, accessible, new), thetic vs. categorical sentences, contrast, selection, correction and several notions of topic (implicational topic, bridging topic, frame setting, etc.). In order to allow for cross-linguistic comparison, the elicitation tasks of QUIS rely on non-verbal stimuli, mainly pictures and videos, but also games and guided dialogues, which establish identical discourse situations in all object languages.

(c) The third module is a set of 380 sentences to be translated and recorded while read aloud by a native speaker. These sentences are set in contexts and in this way induce a variety of information structural properties. This last module leans on a rich tradition of questionnaires illustrating target structures and contexts (see for instance the questionnaire on tense/aspect categories in Dahl 2000).

The cross-linguistic experiments in QUIS are designed to reveal several insights on the universals and typology of languages with respect to the encoding of information structure. In this paper, we present the results from one elicitation task, devoted to single answers and pairing answers. Section 2 outlines the research question concerning pairing answers and their properties in comparison to single answers. Section 3 introduces the related elicitation task in QUIS and explains the experimental procedure. Section 4 sums up the results with main emphasis on word order and sentential prosody, and section 5 discusses the results and draws some typological generalizations. Section 6 concludes the paper.

2. Pairing answers and single answers

‘Pairing answers’ are sentences that express the pairing between members of two given sets, i.e. they match members of one set to members of another set (see Dayal 2003). Assuming, for instance, a set of persons {John, Mary} and a set of fruits {apple, banana}, a pairing sentence matches members of these two sets, {(John, apple), (Mary, banana)}, as illustrated in (1a). In contrast to pairing sentences, ‘single sentences’ express the relation between two entities, as illustrated in (1b).

(1) a. John is eating the apple and Mary is eating the banana.
    b. John is eating the apple.
The crucial semantic difference between (1a) and (1b) is that the referents involved in the first example are members of a matching function that links each member of the first conjunct with the corresponding member in the second conjunct. The information structure of the examples in (1) depends on the context in which they occur, but following the above assumption, it is expected that they are inherently different across contexts: referents belonging to a pairing sentence bear an additional feature which captures the fact that they are members of a set.

An object question licenses the information structure illustrated in (2a–b). A single answer to a single object question has a discourse-linked subject constituent bearing a topic feature ‘Top’ (aboutness topic) and an object constituent that provides the information required by the question and bears the focus feature ‘Foc’. The pairing answer to a multiple object question in (2b) bears the same features plus a feature ‘C’ that indicates that the referents are members of a set of alternatives and are contrastive topics (see Féry & Samek-Lodovici 2006).

(2) a. A: What is John eating?
       B: [John]Top is eating [the apple]Foc.
    b. A: What is John eating and what is Mary eating?
       B: [John]TopC is eating [the apple]FocC and [Mary]TopC is eating [the banana]FocC.
In the same vein, subject questions induce answers with the subject in focus in both cases. In the pairing answers, they bear a feature of contrast resulting from their property of being members of sets of alternatives. The object constituents express the information the sentence is about.

(3) a. A: Who is eating the apple?
       B: [John]Foc is eating [the apple]Top.
    b. A: Who is eating the apple and who is eating the banana?
       B: [John]FocC is eating [the apple]TopC and [Mary]FocC is eating [the banana]TopC.
Pairing answers may also function as answers to multiple constituent questions. Following Roberts (1996) and Büring (2003), a multiple constituent question may be derived by a set of subquestions that take either the subject or the object as the sorting key. This predicts that both structures, illustrated as B and B' in (4), are congruent answers to the multiple constituent question.1

(4) A:  Who is eating what?
        (subquestions: What is John eating?, What is Mary eating?)
    B:  [John]TopC is eating [the apple]FocC and [Mary]TopC is eating [the banana]FocC.
        (subquestions: Who is eating the apple?, Who is eating the banana?)
    B': [John]FocC is eating [the apple]TopC and [Mary]FocC is eating [the banana]TopC.
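The derivation of subquestions from a sorting key can be made concrete with a small sketch. The code below is an illustration only, using the sets from example (1); the function names and the bracket-plus-label notation for features are conveniences chosen here, not part of the authors' formalism.

```python
# Pairing between the two sets from example (1): persons -> fruits.
PAIRING = {"John": "apple", "Mary": "banana"}


def subquestions(pairing, sort_key):
    """Derive the subquestions of 'Who is eating what?' for one sorting key."""
    if sort_key == "subject":
        return [f"What is {person} eating?" for person in pairing]
    if sort_key == "object":
        return [f"Who is eating the {fruit}?" for fruit in pairing.values()]
    raise ValueError("sort_key must be 'subject' or 'object'")


def pairing_answer(pairing, sort_key):
    """Build the pairing answer; the sorting key is the contrastive topic."""
    if sort_key == "subject":
        subj_feature, obj_feature = "TopC", "FocC"
    elif sort_key == "object":
        subj_feature, obj_feature = "FocC", "TopC"
    else:
        raise ValueError("sort_key must be 'subject' or 'object'")
    return " and ".join(
        f"[{person}]{subj_feature} is eating [the {fruit}]{obj_feature}"
        for person, fruit in pairing.items()
    )


for key in ("subject", "object"):
    print(subquestions(PAIRING, key))
    print(pairing_answer(PAIRING, key))
```

Sorting by subject yields the feature assignment of answer B, and sorting by object yields that of answer B'; the word string is the same in both cases, and only the information structure differs.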
Even if both are congruent with respect to question A, these answers should not be seen as equally likely to occur. It has been argued that this type of question requires that one of the arguments is contextually given or at least subject to an ‘aboutness’ condition (see Boškovic 2002; Kuno 1982; Comorovski 1996; Krifka 2002; among others). Though the accounts crucially differ with respect to the conditions that determine which argument is discourse linked (either the subject as typical carrier of given information, or the first constituent in the question as a result of its hierarchical position in the sentence structure), it is expected that the strategy implied by the answer B will be preferred over that of B'. On the basis of these reflections, two hypotheses can be formulated, which are tested below.

Hypothesis I: Effects of topicalization are more likely to occur in the pairing answer than in the single answer.

Hypothesis II: Multiple constituent questions induce answers in the form ‘subject topic – object focus’.
3. Elicitation task on pairing answers
The aim of the elicitation task, to which we now turn, is to establish a near-naturalistic discourse setting for the production of the answer types just described. The data give us (a) insights about individual languages’ reflexes for the encoding of the information structural properties in the different contexts, and (b) empirical support for theoretical assumptions.
3.1. Materials and Procedure
In the elicitation task eight question types (see section 3.2) are implemented by eight items corresponding to different situations presented in pictures, resulting in a total of 64 experimental elements. The experimental elements are distributed in 8 sessions following a factorial design, so that each session contains 8 elements from different conditions and different items. 16 native speakers of each language participated in the production study. In sum, the results are based on 16 sentences per condition. Each session contains a large number of different production tasks, such as an experiment on spatial relations, narrations about short films, etc. These tasks are pseudorandomly distributed within the sessions.
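The distribution of the 64 experimental elements over 8 sessions can be illustrated with a Latin square. The following is a minimal sketch under stated assumptions, not the procedure actually used: the rotation rule, the function name `latin_square_sessions`, and the assumption that each of the 16 speakers completes exactly one session are introduced here for illustration only.

```python
# Hypothetical sketch of a Latin-square distribution; the chapter does not
# spell out the actual assignment of elements to sessions.
N = 8  # 8 question types (conditions), 8 items (events), 8 sessions


def latin_square_sessions(n=N):
    """Return n sessions; session s pairs item i with condition (i + s) % n."""
    return [[(item, (item + s) % n) for item in range(n)] for s in range(n)]


sessions = latin_square_sessions()

# Every session contains each item exactly once and each condition exactly once.
for session in sessions:
    assert sorted(item for item, _ in session) == list(range(N))
    assert sorted(cond for _, cond in session) == list(range(N))

# Across sessions, all 64 item-condition pairs (experimental elements) occur once.
all_elements = {pair for session in sessions for pair in session}
assert len(all_elements) == N * N

# Assumption: each of the 16 speakers completes one session, so every session
# is run by 2 speakers and each condition is produced 16 times in total.
speakers = 16
speakers_per_session = speakers // N
sentences_per_condition = speakers_per_session * N
print(len(all_elements), sentences_per_condition)
```

Under these assumptions the sketch reproduces the figures given in the text: 64 experimental elements and 16 sentences per condition.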
Each item corresponds to a different event: “drinking”, “eating”, “holding”, “throwing”, “carrying”, “pushing”, “looking” and “hitting”. Two stimuli have been prepared for each event, one with parallel events and one with single events, as illustrated in Fig. 1.
Figure 1. Left picture: parallel events stimulus; Right picture: single event stimulus
The material was presented in a PowerPoint presentation, to be run in a self-paced manner. The questions had been pre-recorded by a native speaker of each language. The answers of the subjects were recorded on a DAT recorder (SONY 100), and analyzed with PRAAT (Boersma and Weenink 2006).
3.2. Conditions
Pairing and single sentences were elicited in four conditions:
(5) All-new (wide focus)
    a. single answer: What’s happening? (single event stimulus)
    b. pairing answer: What’s happening? (parallel events stimulus)
(6) Subject question (focus on the subject)
    a. single answer: Who is eating the apple? (parallel events stimulus)
    b. pairing answer: Who is eating the apple and who is eating the banana? (parallel events stimulus)
(7) Object question (focus on the object)
    a. single answer: What is the woman eating? (parallel events stimulus)
    b. pairing answer: What is the woman eating and what is the man eating? (parallel events stimulus)
(8) Multiple constituent question (double foci)
    a. single answer: Who is eating what? (single event stimulus)
    b. pairing answer: Who is eating what? (parallel events stimulus)
The stimuli with parallel events were used for the elicitation of the pairing answers as well as for the elicitation of single answers with single subject and object questions. The question evoking all-new answers and the multiple constituent question for the elicitation of single descriptions were presented with the single event stimulus.2
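The pairing of conditions and stimuli in (5)-(8) can be written down as a small lookup table. This is only an illustrative sketch: the key labels (`"all-new"`, `"subject"`, `"object"`, `"multiple"`) and the name `CONDITIONS` are hypothetical shorthand introduced here, while the stimulus assignments are transcribed from the examples.

```python
# (question type, answer type) -> stimulus, transcribed from examples (5)-(8).
# The string labels are hypothetical shorthand, not terms from the questionnaire.
CONDITIONS = {
    ("all-new", "single"): "single event",
    ("all-new", "pairing"): "parallel events",
    ("subject", "single"): "parallel events",
    ("subject", "pairing"): "parallel events",
    ("object", "single"): "parallel events",
    ("object", "pairing"): "parallel events",
    ("multiple", "single"): "single event",
    ("multiple", "pairing"): "parallel events",
}

# Conditions presented with the single event stimulus.
single_event = sorted(k for k, v in CONDITIONS.items() if v == "single event")
print(single_event)
```

Only the all-new question and the multiple constituent question in their single-answer versions use the single event stimulus, which matches the description in the preceding paragraph.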
3.3. Object Languages
The elicitation task on pairing answers was carried out with four languages: English, Georgian, German, and Greek.3 These languages differ significantly with respect to their word order properties, especially the possibility to move the arguments from their canonical position to encode pragmatic functions. English is a rigid SVO language, though certain information structural manipulations can induce deviations from the canonical constituent order, notably preposing of one argument or inversion of the two arguments (Ward & Birner 2004). In Georgian, the orders SOV and SVO are the preferred orders in corpora, but it is a matter of dispute whether the verb final order is the canonical one (Aronson 1982: 47) or if both orders are canonical (Harris 1981; Hewitt 1995). Furthermore, in Georgian the order of S and O may be reversed under the influence of the information status of the referents. German declarative sentences exhibit SVO order (with SOV as the basic order; see Thiersch 1978) and arguments may be reordered to encode pragmatic functions. The Greek preferred order is SVO and focusing and topicalization are expressed through movement to the left periphery as well as clitic doubling.
All four languages are intonation languages. The prosodic realization of the sentence is sensitive to information structure: pitch accents and boundary tones are determined by the pragmatic function of the constituent, such as topic or focus.
4. Results
4.1. Word order
In all four languages, the subject precedes the object in the canonical word order, as is particularly clear in the ‘all-new’ answers; see the single event examples from Georgian (9) and German (10) and the parallel events from English (11) and Greek (12). The order illustrated in (9)-(12) is the only attested order in all-new answers in English, German, and Greek. In Georgian, it is the dominant order with one exception (see Table 1).
(9)  A: ‘What’s happening?’ (stimulus: single event)   GEO4
     B: k’ac-i    tʃ’am-s            banan-s
        man-NOM   (OBJ.3)eat-SBJ.3   banana-DAT
        ‘A/the man eats a/the banana.’
(10) A: ‘What’s happening?’ (stimulus: single event)   GER
     B: Ein Mädchen wirft einen Ball.
        ‘A girl is throwing a ball.’
(11) A: ‘What’s happening?’ (stimulus: parallel events)
     B: A man is eating a banana and a woman is eating an apple.
(12) A: ‘What’s happening?’ (stimulus: parallel events)   GRK
     B: i    jinéka  trói  éna  mílo   ci   o    ádras  trói  mja  banána.
        the  woman   eats  an   apple  and  the  man    eats  a    banana
        ‘The woman is eating an apple and the man is eating a banana.’
The same order was produced in the object questions, both in the single and in the pairing answers. All sentences exhibit subject first orders as illustrated in (13) for German.
(13) A: ‘What is the man hitting and what is the woman hitting?’ (stimulus: parallel events)   GER
     B: Der Mann schlägt einen Ball, das Mädchen einen Karton.
        ‘The man is hitting a ball, the girl a box.’
A subject question induces answers in which the subject is the focused information and the object is the background. The Greek data contains two examples of object fronting, and for Georgian more than half of the answers exhibited object fronting (14).
(14) A: ‘Who is throwing the ball?’ (stimulus: single event)   GEO
     B: burt-s     isvri-s              gogo
        ball-DAT   (OBJ.3)throw-SBJ.3   girl(NOM)
        ‘A/the girl throws a/the ball.’
In agreement with Hypothesis I, multiple subject questions induce answers in OS order in Georgian (15), Greek (16), and German. In Greek, a familiar object is left-dislocated and triggers clitic doubling.
(15) A: ‘Who is throwing the ball and who is throwing the ring?’ (stimulus: parallel events)   GEO
     B: burt-s      isvri-s              gogona
        ball-DAT    (OBJ.3)throw-SBJ.3   girl(NOM)
        betʃ’ed-s   isvri-s              bitʃ’una
        ring-DAT    (OBJ.3)throw-SBJ.3   boy(NOM)
        ‘A/the girl throws a/the ball, a/the boy throws a/the ring.’
(16) A: ‘Who is pushing the car and who is pushing the table?’ (stimulus: parallel events)   GRK
     B: to   aftokínito  to  spróxni  i    jinéka  ke   to   trapézi  o    ádras
        the  car         it  pushes   the  woman   and  the  table    the  man
        ‘The woman is pushing the car and the man the table.’
Finally, multiple constituent questions induced answers in the SO order, as predicted by Hypothesis II.
(17) A: ‘Who is throwing what?’ (stimulus: parallel events)   GEO
     B: gogona       isvri-s              burt-s
        woman(NOM)   (OBJ.3)throw-SBJ.3   ball-DAT
        bitʃ’una     isvri-s              rgol-s
        boy(NOM)     (OBJ.3)throw-SBJ.3   circle-DAT
        ‘A/the woman throws a/the ball and a/the boy throws a/the circle.’
A single example in OS order as an answer to a multiple constituent question occurs in Georgian, in accordance with the word order flexibility of this language, which is apparent even in all-new contexts. Table 1 summarizes the results on word order in non-elliptical answers. Because elliptical answers are excluded and single questions often trigger argument ellipsis, fewer sentences with argument focus are counted in single answers than in pairing answers.

Table 1. Number of SO/OS clauses

                         English    German     Greek      Georgian
                 focus   SO   OS    SO   OS    SO   OS    SO   OS
single answer    all     15   –     14   –     16   –     15   1
                 O       10   –     10   –      3   –     14   –
                 S       13   –     12   –     14   2      7   7
                 SO      14   –     14   –     16   –     12   –
pairing answer   all     11   –     15   –     16   –     14   1
                 O       15   –     15   –     15   1     14   –
                 S       15   –     14   2     11   5      6   10
                 SO      15   –     16   –     16   –     13   1
The few sentences in Table 1 do not allow for statistical inferences, but they reveal some typological differences that are in line with the grammatical profile of the object languages (see section 3.3). The data shows that the subject question induces different word orders to a greater extent than the other questions. It has a stronger impact in Georgian and Greek, a weaker impact in German – more so in pairing answers than in single answers – and no impact in English (see discussion in section 5).
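The contrast between single and pairing answers to subject questions can be read off Table 1 as a share of OS clauses. The sketch below is illustrative only: the counts are copied from the rows for focus S in Table 1, while the dictionary layout and the helper `os_share` are hypothetical names introduced here. In line with the caveat above, the shares are purely descriptive and not a basis for statistical inference.

```python
# (SO, OS) counts for subject questions (focus = S), copied from Table 1.
# A dash in the table is entered as 0.
SUBJECT_FOCUS = {
    "English":  {"single": (13, 0), "pairing": (15, 0)},
    "German":   {"single": (12, 0), "pairing": (14, 2)},
    "Greek":    {"single": (14, 2), "pairing": (11, 5)},
    "Georgian": {"single": (7, 7),  "pairing": (6, 10)},
}


def os_share(so, os_):
    """Proportion of OS clauses among the non-elliptical SO and OS clauses."""
    total = so + os_
    return os_ / total if total else 0.0


for language, by_answer in SUBJECT_FOCUS.items():
    single = os_share(*by_answer["single"])
    pairing = os_share(*by_answer["pairing"])
    print(f"{language}: single {single:.3f}, pairing {pairing:.3f}")
```

On these counts the OS share in pairing answers is at least as high as in single answers in every language, and it is zero throughout for English.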
4.2. Prosodic properties
4.2.1. Prosodic structures
The four languages investigated use intonation to signal information structure. English, German and Greek have been described extensively in the literature (see, among others, Pierrehumbert 1980, Ladd 1996, Gussenhoven 2004 for English; Uhmann 1991, Grabe 1998 for German; and Arvaniti & Baltazani 2005 for Greek), but, up to now, there is no study dealing with
the intonation of Georgian (see Skopeteas, Féry & Asatiani 2007 for a first attempt). In all four languages, a focused constituent is realized prosodically with a pitch accent, and a constituent in the background may be deaccented, depending on its position relative to the last focused constituent. Prenuclearly, these languages realize accents on given constituents. This is illustrated with an object question in Greek (Fig. 2). A rising contour occurs in the prenuclear accented word.5 The focused constituent (é)na trapézi is marked with a falling accent (noted H*L). The high part of the falling accent is downstepped relative to the high part of the first accent.
[Pitch track. Vertical axis: 120–250; horizontal axis: Time (s), 1.2–2.8. Tonal labels in order: L*H, HP, L*H, H*L, LP, LI. Words in order: o ádras, spróxni, na trapézi.]
Figure 2. Greek sentence [I [P o ádras]Top [P spróxni na trapéziFoc]] ‘The man is pushing a table’ as an answer to the question ‘What is the man pushing?’
[Pitch track. Vertical axis: 120–250; horizontal axis: Time (s), 0.16–1.9. Tonal labels in order: L*H, HP, L*H, H*L, LP, LI. Words in order: éna korítsi, kuvalái, mNa karékla.]
Figure 3. Greek sentence [I [P éna korítsi]Top [P kuvalái mNa karéklaFoc]] ‘A girl is carrying a chair’ as an answer to the question ‘Who is carrying what?’
It is conspicuous that there is no noticeable difference between a sentence with a unique focus on the object, as in Fig. 2, and a sentence realized as an answer to a multiple constituent question (single event), as in Fig. 3. In both cases, a rising tone is located on the subject and a falling tone on the object. Downstep is also visible in Fig. 3. In the postnuclear position, a given constituent is deaccented, as shown in Fig. 4, again for Greek. In this sentence, the subject bears the falling accent H*L, as it is the last accent of the sentence. The object is clearly deaccented. English and German show the same intonational behavior. Georgian also deaccents its postnuclear constituents, though less regularly than the other three languages.
[Pitch track. Vertical axis: 120–250; horizontal axis: Time (s), 0.6–2.1. Tonal labels in order: H*L, LP, LP, LI. Words in order: i jinéka, xtipái, to kutí.]
Figure 4. Greek sentence [I [P i jinékaFoc] [P xtipái to kutíTop]] ‘The woman is hitting the box’ as an answer to the question ‘Who is hitting the box?’
Answers to multiple questions are illustrated in Figures 5 and 6. It is tempting to analyze these sentences as consisting of two intonation phrases (IP) together forming a complex intonation phrase (see Féry and Truckenbrodt 2005 for IP recursivity in German). Both subjects have a rising contour and the object NP in the second conjunct has a falling accent H*L. But in the first conjunct, the accent signaling focus is overridden by the rising boundary tone. The main characteristic of these sentences in German and in English is the typical downstep pattern, signaling cohesion between the two IPs. The subjects are downstepped relative to each other, as are the objects. It must be noted that the rising tone has been analyzed as a topic accent in German, Greek and English, but that, in our sentences, due to the preference for having a rising accent at the beginning of the phrase and a falling one at the end of the sentence, the focus accent is sometimes realized as a
rising tone, and the accent on a given constituent may have a falling tone. The typical pattern found for the pairing answers is illustrated for German and English with multiple question responses. However, due to the contrastive reading that these sentences have, the same contour was also realized when only the subject or only the object was asked for (see section 4.2.2).
[Pitch track. Vertical axis: 120–400; horizontal axis: Time (s), 2.9–5.6. Tonal labels in order: L*H, H*L, LP, HI, L*H, H*L, LI. Words in order: der Mann, trägt einen Tisch, und die Frau, trägt einen Stuhl.]
Figure 5. German sentence [I [P Der Mann]TopC [P trägt einen TischFocC]] [I [P und die Frau]TopC [P trägt einen StuhlFocC]] ‘the man is carrying a table and the woman is carrying a chair’ as an answer to the question ‘who is carrying what?’
[Pitch track. Vertical axis: 50–200; horizontal axis: Time (s), 0.59–3.46. Tonal labels in order: L*H, H*L, LP, HI, L*H, H*L, LI. Words in order: the lady, is eating an apple, and the man, is eating a banana.]
Figure 6. English sentence [I [P The lady ]TopC [P is eating an apple FocC]] [I [P and the man ]TopC [P is eating a banana FocC]] as an answer to the question ‘Who is eating what?’
Because of the neutralization of tonal patterns due to the extensive use of hat patterns in different contexts, the differences between the prosodic results need to be quantified. We turn to this task in the next section.
4.2.2. Cross-linguistic tendencies
The question arises as to the quantity of pitch accents and deaccenting realized in the different conditions. In this section, normalized tonal contours6 for the SVO sentences in the four languages are compared. The quantified data allow the visualization of differences that cannot be perceived by a comparison involving single pitch tracks.
First, the results for single answers are displayed in Fig. 7 for subject focus (S), object focus (O) and double question focus (SO). The discussion concentrates on pitch accents and deaccenting. The contours obtained in the double foci conditions are interpreted as the baseline. They show in all languages more than one accent: in English, German and Greek only the subjects and the objects are accented, while in Georgian the verb is accented as well. In Greek and German, the verb and the object are lower when the subject is a narrow focus than in the other conditions. In English, only the verb is lower, and the deaccenting of the object is signaled by the absence of further tonal movement. In Georgian, the subject is higher than in the other conditions, but object and verb are more or less identical in all three conditions. In German and in Greek, the contour starts higher when the subject is narrowly focused, and in Greek the fall on the subject is aligned earlier. In Georgian, the narrowly focused subject has a rising contour and it is this rise which aligns differently: it is much later than in the other contours. As for narrow focus on the object, in Greek, German and Georgian, the object is higher than in the other conditions, but this is not true for English. In English, accenting of the object is signaled by a further downstepping accent. Greek is the only language in which a narrow focus on the object is significantly higher than in the double focus condition. Furthermore, in this language, an additional accent appears on the verb.
[Four panels: Single answers in Greek, English, German, and Georgian. Vertical axis: z-scores (−2 to 2); horizontal axis: subject, verb, object; series: S, O, SO.]
Figure 7. Time and F0 normalized contours of the SVO single answers
Fig. 8 shows normalized contours for the first conjunct of the pairing answers. Here, the question type does not have a large impact on the contour. All three question types induce answers with a similar contour, in conformity with the double contrast induced by a double conjunct: a rise followed by a fall on the subject, a smooth fall on the verb (except in Greek where the verb is accented as well), and the final rise. The most conspicuous property of these pitch tracks is the steep rise induced by the boundary at the end of the contour, in Greek and German more so than in English and in Georgian, because in the latter two languages a large proportion of the sentences were accompanied by a falling contour. Only Greek shows a difference between narrow focus on the subject and the other contours. And only German shows a difference with narrow focus on the object. In this case, the object has a lower contour. In sum, single answers reveal differences in prosodic strategies in all four languages, but in the pairing answers, these differences are neutralized. It can be deduced that the need to realize the double contrast (on the subject and on the object) overrides the difference in the focus-background structure. An accent is always realized on both subjects and both objects, leading to a homogeneous accent pattern in all cases.
[Four panels: Pairing answers in Greek, English, German, and Georgian. Vertical axis: z-scores (−2 to 2); horizontal axis: subject, verb, object; series: S, O, SO.]
Figure 8. Time and F0 normalized contours of the SVO pairing answers
5. Discussion
In section 2, we presented the two central hypotheses of this elicitation task, which are repeated here.
Hypothesis I: Effects of topicalization are more likely to occur in the pairing answer than in the single answer.
Hypothesis II: Multiple constituent questions induce answers in the form ‘subject topic – object focus’.
We found evidence for Hypothesis I in the results from both word order and prosody. With respect to word order, we distinguish between three language types. In English no deviation from canonical order could be observed. In Georgian, the given-before-new order is the preferred option, and this triggers object fronting in both experimental conditions in which the object is part of the background. Finally, in Greek and German the same discourse conditions have different effects on word order. In German, familiarity of the object does not induce object fronting, but contrastive topicalization does. In Greek, we observe the same pattern with a slight quantitative difference. Familiar object constituents are rarely fronted, but when the familiar object is also contrasted in a pairing answer, then object fronting occurs much more frequently. As far as prosody is concerned, the pairing answers were nearly always produced with a prosodic pattern in which
both subject and object are accented. This prosodic pattern is congruent with a pattern in which the subject is topicalized. The second hypothesis relates to the different types of questions, and to the types of answers they elicit. The answers to multiple constituent questions are expected to induce a topic – focus pattern more often than the other questions. The results from word order have shown that these questions do not induce object fronting. This evidence is in line with the view that the sorting key in multiple constituent questions in which the subject pronoun precedes the object pronoun is the subject. However, this is not a necessary implication of our results, since the SO order is – at the same time – the canonical order of all examined languages. The results from the prosody of SVO sentences reveal that in the single answers, the focus structure is directly implemented in the pitch excursions. Object questions trigger answers with a falling accent on the sentence final (object) constituent, and subject questions trigger answers with a falling accent on the sentence initial (subject) constituent. The prosody of double foci is similar to that of narrow focus on the object, with the exception that the excursion on the object is not as high as in the narrow focus condition. Turning to the prosody of answers to multiple object and subject questions, the focus is overridden by a prosodic structure imposed by the contrastive context and according to which the sentence initial constituent is accentually marked as a topic. Multiple constituent questions induce subject topic – object focus answers, which confirms Hypothesis II. The predicted relationship between prosody and syntax is borne out. Syntax follows prosody in the sense that if a certain kind of prosody is strongly preferred (topic – focus accent pattern in a large number of the conditions), the syntax seeks to adapt. 
Since the prosodic marking of the initial constituent as a focus is clearly dispreferred in pairing answers, object fronting – in languages that allow for it – is the only remaining possibility to topicalize objects in this structure.
6. Conclusion
This paper has presented results from an elicitation task in the framework of a questionnaire on information structure (QUIS). The task consisted in answering questions with the help of single answers or pairing answers in a well-balanced design. Four languages have been used for this paper: English, Georgian, German and Greek. It has been shown with word order and prosody that this specific task is appropriate to identify specific differences
between the strategies used in the implementation of focus structures. Word order is subject to more variation in Georgian and Greek than in German and English. The comparison between strategies involving prosody shows that single answers are more prone to display differences than pairing answers. The reason for this discrepancy is that pairing answers also involve a contrast, which overrides other information structural properties that these sentences may have. The investigation started in this paper needs to be extended in three ways. First, non-intonation languages, like tone languages and pitch accent languages, use different prosodic strategies, based on phrasing rather than on pitch excursion contrasts. This needs further material and thorough investigation. Second, the unique task reported on here can only reveal a tiny proportion of the interesting strategies used by languages in the reflexes of information structure. A large number of such tests are needed to provide us with a complete overview and to allow the formulation of sound typological generalizations. Third, variation in the production of spoken material also needs careful investigation. Research in these directions has already been conducted, and we will provide results from all three lines of investigation in the near future.
Acknowledgements
This paper is part of project D2 “Typology of information structure” of the SFB 632 on information structure. We would like to thank Stefanie Dipper, Gisbert Fanselow, Michael Goetze, and Ruben Stoel for helpful discussions, as well as Anja Arnhold, Katharina Moczko, Andreas Pankau and Fabian Schubö for technical assistance.
Notes
1. It is assumed here that a constituent cannot be both a topic and a focus at the same time, and that the constituent chosen as the sorting key is necessarily a topic (see Büring 2003). We are aware that this assumption is speculative.
2. The use of a multiple constituent question is marked in several languages, among them English and German, if it refers to a single pair. Nevertheless, this question has been included in our elicitation task in order to maintain the symmetry of the experimental design. All informants answered this question felicitously.
3. The English data were collected and transcribed by Elisabeth Medvedovsky, the Georgian data by Shorena Bartaia, the German data by Andreas Pankau and Katharina Moczko, and the Greek data by Thanasis Georgakopoulos, Yanis Kostopoulos, and George Markopoulos.
4. Abbreviations for language names: GEO = Georgian, GRK = Greek, GER = German; abbreviations for glosses: 3 = 3rd person, DAT = dative, NOM = nominative, OBJ = object, PRV = preverb, SBJ = subject.
5. We do not consider prosodic phrasing in this paper, due to lack of space, and follow suggestions by Revithiadou & Spyropoulos (2006). According to them, the NP-subject o ádras ‘the man’ forms a p-phrase and is delimited on the right by a phrase tone HP.
6. The figures in this section present the means of normalized contours of the SO sentences in Table 1. The time axis contains the measurements of F0 means of five equal intervals for each constituent. The pitch measurements have been normalized by transformation into z-scores (the difference between the pitch measurement and the speaker’s means has been divided by the standard deviation of the same speaker).
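The normalization described in note 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for the study: the function names `interval_means` and `z_normalize` are hypothetical, the F0 values are invented for the example, and the use of the population standard deviation (`pstdev`) is an assumption, since the note does not say which estimator was applied.

```python
from statistics import mean, pstdev


def interval_means(f0, n_intervals=5):
    """Time normalization: mean F0 in n equal intervals of one constituent."""
    if len(f0) < n_intervals:
        raise ValueError("need at least one F0 sample per interval")
    # Integer boundaries that split the samples into n (near-)equal chunks.
    bounds = [k * len(f0) // n_intervals for k in range(n_intervals + 1)]
    return [mean(f0[a:b]) for a, b in zip(bounds, bounds[1:])]


def z_normalize(values, speaker_values):
    """F0 normalization: (value - speaker mean) / speaker standard deviation."""
    mu = mean(speaker_values)
    sd = pstdev(speaker_values)  # assumption: population SD of the speaker
    return [(v - mu) / sd for v in values]


# Invented F0 samples for one constituent of one speaker (illustration only;
# in practice the speaker statistics would cover all of that speaker's data).
toy_f0 = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
points = interval_means(toy_f0)
contour = z_normalize(points, toy_f0)
print(points)
print([round(z, 2) for z in contour])
```

Each constituent thus contributes five points to the time axis, and the z-transformation makes contours from speakers with different pitch ranges comparable before they are averaged.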
References
Aronson, Howard I.
  1982  Georgian: A Reading Grammar. Chicago: Slavica.
Arvaniti, Amalia & Mary Baltazani
  2005  Intonational analysis and prosodic annotation of Greek spoken corpora. In Prosodic Typology: The Phonology of Intonation and Phrasing, Sun-Ah Jun (ed.), 84–117. Oxford: Oxford University Press.
Boersma, Paul & David Weenink
  2006  Praat: Doing phonetics by computer (Version 4.4.20) [Computer program]. Retrieved May 3, 2006, from http://www.praat.org/.
Bošković, Željko
  2002  On multiple wh-fronting. Linguistic Inquiry 33: 351–383.
Büring, Daniel
  2003  On D-trees, beans, and B-accents. Linguistics and Philosophy 26: 511–545.
Comrie, Bernard & Norval Smith
  1977  Lingua descriptive studies: Questionnaire. Lingua 42: 1–72.
Comorovski, Ileana
  1996  Interrogative Phrases and the Syntax-Semantics Interface. Dordrecht: Kluwer.
Dahl, Östen (ed.)
  2000  Tense and Aspect in the Languages of Europe. Berlin/New York: Mouton de Gruyter.
Dayal, Veneeta
  2003  Multiple wh questions. In The Blackwell Companion to Syntax, Vol. 3, M. Everaert & H. van Riemsdijk (eds.), 275–326. Oxford: Blackwell.
Féry, Caroline & Vieri Samek-Lodovici
  2006  Focus projection and prosodic prominence in nested foci. Language 82 (1): 131–150.
Féry, Caroline & Hubert Truckenbrodt
  2005  Sisterhood and Tonal Scaling. Studia Linguistica. Special Issue “Boundaries in Intonational Phonology”. 59 (2/3): 223–243.
Grabe, Esther
  1998  Comparative intonational phonology: English and German. Ph.D. diss., Universiteit Nijmegen.
Gussenhoven, Carlos
  2004  The Phonology of Tone and Intonation. Cambridge: Cambridge University Press.
Kuno, Susumu
  1982  The focus of the question and the focus of the answer. Papers from the Parasession on Nondeclaratives, 18th Regional Meeting of the Chicago Linguistic Society, 134–157.
Harris, Alice C.
  1981  Georgian Syntax: A Study in Relational Grammar. Cambridge: Cambridge University Press.
Hewitt, George
  1995  Georgian: A Structural Reference Grammar. Amsterdam/Philadelphia: Benjamins.
Krifka, Manfred
  2002  For a structured meaning account of questions and answers. In Audiatur Vox Sapientiae: A Festschrift for Arnim von Stechow (Studia grammatica 52), Caroline Féry & Wolfgang Sternefeld (eds.), 287–319. Berlin: Akademie Verlag.
Ladd, D. Robert
  1996  Intonational Phonology. Cambridge: Cambridge University Press.
Pierrehumbert, Janet B.
  1980  The phonology and phonetics of English intonation. Ph.D. diss., MIT.
Revithiadou, Anthi & Vassilios Spyropoulos
  2006  A multiple spell-out account of Greek topics. Handout, University of Potsdam.
Roberts, Craige
  1996  Information structure in discourse: Towards an integrated formal theory of pragmatics. In OSU Working Papers in Linguistics 49: Papers in Semantics, J.H. Yoon & Andreas Kathol (eds.), 91–136.
Skopeteas, Stavros, Ines Fiedler, Sam Hellmuth, Anne Schwarz, Ruben Stoel, Gisbert Fanselow, Caroline Féry & Manfred Krifka
  2006  Questionnaire on Information Structure (ISIS Vol. 4). Potsdam: Universitätsverlag Potsdam.
Skopeteas, Stavros, Caroline Féry & Rusudan Asatiani
  2007  Word order and intonation in Georgian. (To appear in Lingua.)
Uhmann, Susanne
  1991  On the tonal disambiguation of focus structures. Journal of Semantics 8: 220–239.
Thiersch, Craig
  1978  Topics in German syntax. Ph.D. diss., MIT, Cambridge, MA.
Ward, Gregory & Betty Birner
  2004  Information structure and non-canonical syntax. In The Handbook of Pragmatics, Laurence R. Horn & Gregory Ward (eds.), 153–174. Malden, MA: Blackwell.
Coordinate structures: On the relationship between parsing preferences and corpus frequencies
Ilona Steiner
1. Introduction
The question whether the mechanisms of sentence comprehension and sentence production are distinct or not has been a central issue in psycholinguistics for a long time. One way to address this question is to investigate the relationship between parsing preferences (comprehension data) and corpus frequencies (production data). According to the tuning hypothesis (Cuetos et al. 1996; Mitchell et al. 1995), initial parsing preferences in syntactically ambiguous sentences are determined by people’s exposure to similar structures in the past, with the result that people prefer the most frequently occurring resolution of an ambiguity. On this proposal, parsing preferences and corpus frequencies should be correlated, i.e., the construction preferred during comprehension should occur more frequently in corpora than the dispreferred construction. The status of this hypothesis is an object of current research. A finding that is problematic for the tuning hypothesis concerns the relative clause attachment ambiguity of the form “NP1 Prep NP2 Relative Clause” in Dutch. A corresponding English example is illustrated in (1). (1)
Someone shot [NP1 the servant] of [NP2 the actress][RC who was on the balcony].
In this syntactic ambiguity the relative clause (RC) can either be attached to NP1 the servant (high attachment) or to NP2 the actress (low attachment). It has been shown in reading-time studies that Dutch-speaking readers have a preference to attach the RC high (Brysbaert & Mitchell 1996; Mitchell et al. 2000). In contrast to the comprehension data, Mitchell & Brysbaert (1998) found that in corpora low-attaching RCs were more frequent than high-attaching RCs. Mitchell & Brysbaert took this as evidence against the tuning hypothesis.
Desmet et al. (2002) however showed that the experimental stimuli used in the reading experiments were not representative of the sentences in the corpus. The experimental items contained mainly noun phrases that refer to humans, whereas most noun phrases in the corpus sentences referred to nonhuman entities. A reanalysis of the Mitchell & Brysbaert (1998) corpus revealed that the attachment preferences depend on the nature of NP1. For sentences with a human NP1, high-attaching RCs were more frequent than low-attaching RCs, which correlates with the experimental findings. For sentences with a non-human NP1, low attachment was more frequent. This shows that when the corpus data are controlled carefully to match the experimental sentences (here: if an NP refers to a human entity) the discrepancy between sentence comprehension and sentence production disappears. Another finding that was taken as evidence against the tuning hypothesis comes from a series of studies by Gibson & Schütze (1999) and Gibson et al. (1996b). They investigated disambiguation preferences in English noun phrase conjunction of the form “NP1 Prep NP2 Prep NP3 and NP4” as illustrated in (2). NP4 can be attached to three possible attachment sites. (2)
The salesman ignored [NP1 a customer] with [NP2 a child] with [NP3 a dirty face] and…
a. [NP4 a wet diaper] (low attachment)
b. [NP4 one with a wet diaper] (middle attachment)
c. [NP4 one with a baby with a wet diaper] (high attachment)
Both an off-line study rating the comprehensibility of the different attachments (Gibson et al. 1996b) and a reading-time study (Gibson & Schütze 1999) indicated that high attachments are easier to understand than middle attachments (high < middle). Detailed corpus analyses, however, revealed that middle attachments are significantly more frequent than high attachments (middle < high) (Gibson et al. 1996b). Since the authors did not find a correlation between disambiguation preferences and corpus frequencies, they conclude that “…the sentence comprehension mechanism is not using corpus frequencies in arriving at its preference in this ambiguity and hence the decision principles of sentence comprehension and sentence production must be partially distinct”. Desmet & Gibson (2003) however were able to show that, here again, the sentences used in the studies by Gibson and colleagues were not representative of the sentences in the corpus counts. The experimental stimuli contained the pronoun ‘one’ as head of the conjoined noun phrase (NP4), whereas the conjoined NP in the corpus sentences did not contain a pronoun
in most of the cases. Both in a corpus study and in a self-paced reading experiment, Desmet & Gibson (2003) demonstrated that the presence or absence of the pronoun ‘one’ in a noun phrase influences the preference to conjoin this noun phrase with the first (high attachment) or second (middle attachment) of three possible NP attachment sites. The presence of the pronoun ‘one’ induced a high attachment preference, the absence of the pronoun resulted in a tendency to prefer middle attachment. This shows that the high attachment preference in sentence comprehension found in Gibson & Schütze (1999) was completely due to the use of the pronoun ‘one’ in their items. Desmet & Gibson (2003) argue that these results cast serious doubt on Gibson & Schütze’s hypothesis that different processes guide preferences in production and comprehension for this kind of ambiguity. The above discussion shows that the main question of how these two types of linguistic evidence are related to each other is still open. For example, it is unknown to what extent corpus frequencies can be used to draw conclusions on the mechanisms of sentence comprehension. In order to provide further evidence, we investigate the relationship between parsing preferences and corpus frequencies with respect to coordinate structures in English. We focus on two processing effects that have been found in reading-time studies: 1) the parallel-structure effect and 2) the disambiguation preference in noun phrase vs. sentence coordination. These effects have been reported in the literature and are described in more detail in Section 2 and 3. We compare these preferences, which are measured in faster reading times, to the corresponding corpus data in the English Verbmobil treebank (TÜBA-E, Hinrichs et al. 2000) in order to see whether they are also present in corpora. 
The TÜBA-E treebank consists of spoken dialogs in the domain of business appointments and has been annotated manually at the levels of morphosyntax (parts-of-speech categories), syntactic phrase structure and function-argument structure. We decided to use a corpus of spontaneous speech, i.e., unedited naturally occurring dialogues, so that the corpus data reflect solely mechanisms of sentence production and not also intervening factors that are due to editing processes.1
2. The parallel-structure effect
The first processing effect we address in this paper is the parallel-structure effect, reported by Frazier et al. (1984, 2000). Using reading times, the authors observed that in a coordinate structure the second conjunct is read faster when it is structurally similar to the first one.
(3) a. Terry wrote [NP a long novel] and [NP a short poem].
b. Terry wrote [NP a novel] and [NP a short poem].
The noun phrase [a short poem] is processed faster in (3a) than in (3b), since the two conjoined NPs in (3a) are structurally identical, which does not hold for the sentence in (3b). Obviously, the human parser prefers processing conjuncts that are structurally similar. We wanted to find out whether this preference for structural similarity in coordination is also present in corpora. Therefore we analysed a fraction of the TÜBA-E corpus, i.e., 2906 sentences (CD13). In order to match the experimental stimuli, our analysis is limited to coordinate structures with two conjuncts that are connected with the conjunction and. Furthermore, we only considered coordination within complete sentences, which is important, because we are dealing with spoken language that contains lots of fragmentary utterances. This procedure results in 242 occurrences of coordinate structures (coordination dataset). For each coordinate structure we extracted:
– the syntactic category of the mother and the daughter nodes,
– the grammatical function of the mother,
– the length of conjuncts (number of words),
– the degree of similarity in the conjuncts.
The syntactic annotation in the treebank was used as basis for the extraction process. Table 1 shows a list of the syntactic categories in the TÜBA-E treebank that are involved in the coordinate structures together with a corpus example and the number of instances with this category as mother node. In the following we focus on the degree of similarity in the conjuncts, since this is the relevant type of information in order to examine the preference for structural similarity in coordination.
2.1. Corpus analysis I: Structural similarity of the conjuncts
We manually inspected each occurrence in the coordination dataset to determine the degree of similarity in the conjuncts (0–100%): 100% meaning both conjuncts having exactly the same structure including the parts-of-speech categories, 0% meaning that not a single syntactic node is shared between the conjuncts. The sentence in (4a) is an example from our dataset with 36% degree of similarity, whereas the conjuncts in (4b) have identical structures (100% similarity). The corresponding entries of the treebank are shown in Figures 1 and 2.
Table 1. Categories in the TÜBA-E treebank that are involved in coordination
Category  Description           Frequency  Corpus example
S         sentence              77         [I am free that afternoon] and [the seventeenth I am completely free]
VP        verbal phrase         19         if I can [run home] and [get a shower]
NP        noun phrase           113        I have [a early morning meeting] and [a lunch meeting]
PP        prepositional phrase  7          [on the twenty sixth] and [on the twenty seventh] I am busy all day
AP        adjective phrase      1          but early on next week is actually [pretty free] and [late]
ADVP      adverbial phrase      1          [where] and [when] would like to do it
CNUM      cardinal number       1          it sounds like between [twelve] and [one] o’clock would be the best time
ONUM      ordinal number        23         I am going out of town on the [sixth] and [seventh]
(4) a. [NP Monday] and [NP Tuesday the twenty seventh] are bad for me
b. [NP twenty second] and [NP twenty fourth] are pretty bad
As can be seen in Figure 1, the two conjuncts [NP Monday] and [NP Tuesday the twenty seventh] differ strongly in their structure. We determine the degree of similarity by counting the syntactic nodes of the two conjuncts (including the terminal nodes with the parts-of-speech categories); then we calculate, in a top-down procedure, how many of the nodes are identical in both conjuncts. In Figure 1 the two conjuncts comprise eleven syntactic nodes, four of which are identical (two nodes in each conjunct). These are the two mother nodes of the conjuncts labelled with syntactic category NP and the two terminal nodes (corresponding to the words Monday and Tuesday) labelled with the PoS-tag NP. Four reoccurring nodes out of eleven gives us the 36% degree of similarity. The two conjuncts in Figure 2, in contrast, do have exactly the same structure, i.e., 100% degree of similarity (eight reoccurring nodes out of eight).
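The node-counting procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the manual inspection: the tuple encoding of trees, the flat NP structures for the phrases in (3), the tags DT, JJ and NN, and the left-to-right pairing of daughter nodes are all hypothetical choices made for this sketch.

```python
# A tree is encoded as (label, [children]); this encoding is hypothetical.

def size(tree):
    """Number of syntactic nodes in a tree, terminal PoS nodes included."""
    label, children = tree
    return 1 + sum(size(child) for child in children)

def shared_nodes(a, b):
    """Top-down count of nodes that recur in both trees.

    A matching pair contributes two nodes (one per conjunct). Daughters are
    paired left to right, which is an assumption of this sketch.
    """
    (label_a, children_a), (label_b, children_b) = a, b
    if label_a != label_b:
        return 0
    return 2 + sum(shared_nodes(x, y) for x, y in zip(children_a, children_b))

def degree_of_similarity(a, b):
    """Reoccurring nodes as a percentage of all nodes in both conjuncts."""
    return 100.0 * shared_nodes(a, b) / (size(a) + size(b))

# Hypothetical flat structures for the noun phrases in (3).
a_long_novel = ("NP", [("DT", []), ("JJ", []), ("NN", [])])
a_short_poem = ("NP", [("DT", []), ("JJ", []), ("NN", [])])
a_novel = ("NP", [("DT", []), ("NN", [])])

print(round(degree_of_similarity(a_long_novel, a_short_poem)))  # identical structures
print(round(degree_of_similarity(a_novel, a_short_poem)))       # partly shared structure

# The arithmetic reported for Figures 1 and 2:
print(round(100 * 4 / 11))  # four reoccurring nodes out of eleven
print(round(100 * 8 / 8))   # eight reoccurring nodes out of eight
```

With these toy structures the pair in (3a) comes out fully parallel, whereas the pair in (3b) shares only the mother node and the determiner.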
Figure 1. Treebank entry with 36% degree of similarity in the conjuncts (see (4a))
Figure 2. Treebank entry with 100% degree of similarity in the conjuncts (see (4b))
The analysis of the coordination dataset provides a distribution of similarity from 0% to 100%, which is illustrated in Figure 3. The first bar in Figure 3 represents the number of occurrences with a degree of similarity between 0% and 10% (exclusive). The second bar represents the occurrences with a degree of similarity between 10% and 20% (exclusive), etc. The second-to-last bar concerns the degree of similarity between 90% and 100% (exclusive). The last bar represents only the occurrences with exactly 100% similarity.
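The binning behind Figure 3 can be made explicit with a short sketch. The function names are hypothetical; the only thing taken from the text is the bin layout: ten half-open bins of width 10% plus a separate bin for exactly 100%.

```python
def similarity_bin(percent):
    """Index of the bar for a degree of similarity in percent.

    Bars 0-9 cover [0, 10), [10, 20), ..., [90, 100); bar 10 holds only
    the occurrences with exactly 100% similarity.
    """
    if not 0 <= percent <= 100:
        raise ValueError("degree of similarity must lie between 0 and 100")
    return 10 if percent == 100 else int(percent // 10)

def similarity_histogram(degrees):
    """Number of occurrences per bar for a list of similarity degrees."""
    counts = [0] * 11
    for degree in degrees:
        counts[similarity_bin(degree)] += 1
    return counts

# The two examples in (4): 4/11 (about 36%) and 8/8 (100%).
print(similarity_histogram([100 * 4 / 11, 100.0]))
```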
Figure 3. Distribution of similarity from 0% to 100% in coordination dataset
As can be seen in the diagram, similarity is distributed almost evenly over the different degrees, except for the 100% bar. This bar differs strikingly from the other degrees and accounts for 32.6% of all occurrences in the coordination dataset. That is, almost a third of all occurrences of coordinations contained conjuncts that have an identical structure, as in (4b). In order to interpret these figures, we have to ask how often structural similarity occurs randomly. Therefore, we constructed a random dataset. For each first conjunct of our coordination dataset we extracted a random second “conjunct”, which is a phrase randomly chosen from the corpus, independent of coordination. This random phrase matches the original second conjunct in syntactic category, grammatical function, and length. (5c) is an example of a randomly extracted phrase for the coordinate structure in (4b), repeated here as (5a). The pairs of original first conjunct and random second conjunct (5b)+(5c) constitute the random dataset.
(5) a. [NP twenty second] and [NP twenty fourth] are pretty bad
b. → Original first conjunct: [NP twenty second]
c. → Randomly chosen phrase: [NP the first]
The random phrase [NP the first] in (5c) matches the original second conjunct [NP twenty fourth] in syntactic category (NP), grammatical function (subject) and length (2 words). We now have to determine how many occurrences in the random dataset are structurally identical. The analysis of the random dataset reveals that 10.7% of the occurrences (pairs of original first conjunct and random second conjunct) showed exactly the same structure (i.e., 100% similarity). In the coordination dataset 32.6% of the occurrences were structurally identical. The difference of the coordination data and the random dataset with respect to structural similarity (32.6% vs. 10.7%) is significant (χ2(1) = 60.1; p < 0.01). Structural similarity within coordination is thus significantly more frequent in our corpus than structural similarity of two phrases independent of coordination. These corpus findings match the preference for structural similarity in coordination during parsing. We are thus able to establish a correlation between this preference and corpus frequencies as predicted by the tuning hypothesis.
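The construction of the random dataset can be sketched as follows. This is a minimal illustration, not the extraction script used for the study: the `Phrase` record, the string encoding of structures, and the toy phrases are hypothetical, and the sketch does not address details the text leaves open (for instance whether the original second conjunct itself may be drawn).

```python
import random
from collections import defaultdict
from typing import NamedTuple

class Phrase(NamedTuple):
    category: str   # syntactic category, e.g. "NP"
    function: str   # grammatical function, e.g. "subject"
    words: tuple    # the words of the phrase
    structure: str  # hypothetical string encoding of the phrase structure

def build_random_dataset(coordinations, corpus_phrases, rng):
    """Pair each original first conjunct with a random corpus phrase that
    matches the original second conjunct in category, function and length."""
    index = defaultdict(list)
    for phrase in corpus_phrases:
        index[(phrase.category, phrase.function, len(phrase.words))].append(phrase)
    random_pairs = []
    for first, second in coordinations:
        key = (second.category, second.function, len(second.words))
        candidates = index.get(key, [])
        if candidates:  # pairs without any matching phrase are skipped here
            random_pairs.append((first, rng.choice(candidates)))
    return random_pairs

def percent_parallel(pairs):
    """Percentage of pairs whose two members have an identical structure."""
    return 100.0 * sum(a.structure == b.structure for a, b in pairs) / len(pairs)

# Toy data modelled on (5); the structure strings are invented.
first = Phrase("NP", "subject", ("twenty", "second"), "NP(ONUM(CD OD))")
second = Phrase("NP", "subject", ("twenty", "fourth"), "NP(ONUM(CD OD))")
corpus = [
    Phrase("NP", "subject", ("the", "first"), "NP(DT OD)"),
    Phrase("NP", "object", ("a", "meeting"), "NP(DT NN)"),
    Phrase("NP", "subject", ("Monday",), "NP(NP)"),
]
random_pairs = build_random_dataset([(first, second)], corpus, random.Random(0))
print(percent_parallel([(first, second)]), percent_parallel(random_pairs))
```

In the toy corpus only [NP the first] matches [NP twenty fourth] on all three criteria, so it is the phrase drawn, as in (5c).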
2.2. Distribution of parallel occurrences
In order to ensure that the observed effect in the corpus is a general effect and not an artefact that can be attributed to a particular type of coordinate structure, we inspected the distribution of parallel occurrences (i.e., with 100% structural similarity) in the coordination dataset and in the random dataset with respect to syntactic category and length. We first focus on the distribution across syntactic categories.
Table 2. Distribution of parallel occurrences across syntactic categories
Syntactic       Coordinate  Parallel instances    Mean length
category        structures  Coordination  Random  (first conjunct)
ADVP            1           100%          100%    1.0
CNUM            1           100%          100%    1.0
ONUM            23          78.3%         78.3%   1.5
NP              113         50.0%         5.3%    2.4
AP              1           0%            0%      2.0
PP              7           28.6%         0%      4.4
S               77          1.3%          0%      8.1
VP              19          0%            0%      3.5
All categories  242         32.6%         10.7%   4.3
2.2.1. Distribution across syntactic categories
Table 2 shows for each category the number of coordinate structures, the number of parallel instances in the coordination and in the random dataset (in percentage), and the mean length of this category in the position of the first conjunct (number of words). The number of coordinate structures for each category (second column) and the mean length (last column) are, of course, identical for the coordination dataset and for the random dataset, because the random conjuncts are controlled for syntactic category and length. Looking at Table 2, it seems that almost all of the syntactic categories also exhibit parallel occurrences (except for categories AP and VP). A closer look, however, reveals that only the categories NP and PP show an obvious difference to the number of parallel instances in the random dataset (NP: 50.0% vs. 5.3%; PP: 28.6% vs. 0%). Some of the categories (ADVP, CNUM and ONUM) show a large proportion of parallel instances in the coordination dataset as well as in the random data. The categories AP, VP and S, on the other hand, do not occur with parallel structures (except for one parallel occurrence of S coordination), neither in the coordination dataset nor in the random data. In the following we discuss this distribution.
Categories ADVP, CNUM, ONUM: These categories show a large proportion of parallel instances in the coordination dataset as well as in the random data (ADVP: 100% vs. 100%; CNUM: 100% vs. 100%; ONUM: 78.3% vs. 78.3%). The reason for this distribution is the limited variability in the form of these categories. Category ONUM, for example, occurs in our corpus only in two structural versions: a) a one-word ordinal number as, e.g., sixth, or b) a two-word ordinal number as, e.g., twenty sixth.2 For each length there exists exactly one structure in the corpus. Since the random conjuncts match the original second conjuncts in length, we get for each parallel occurrence in the coordination dataset also a parallel occurrence in the random dataset. The same explanation holds for category CNUM, too. Here again, the structure of this category is completely determined by the length. The situation with category ADVP is slightly different: the structure of an ADVP is not determined by the length, but the only occurrence of ADVP coordination in the corpus has a grammatical function which completely determines the structure of this phrase. Since the random conjuncts match the original second conjuncts also in grammatical function, we get for the parallel occurrence in the coordination dataset also a parallel occurrence in the random dataset. Because of their limited structural variability, these categories (ONUM, CNUM, ADVP) are only of limited use for investigating the parallel-structure effect.
Categories NP, PP: Concerning structural variation, the categories NP and PP seem to be at a medium level. Both categories show an obvious difference between the number of parallel instances in the coordination dataset and in the random data. This difference shows that the parallel-structure effect is not due to the combinatorial properties of a particular category. It seems to be a general effect of coordination.
Categories AP, VP, S: These categories do not show parallel occurrences, neither in the coordination dataset nor in the random data (except for one parallel occurrence of S coordination). We cannot say much about the AP category, because AP coordination occurs only once in the corpus. But there are two possible explanations for the missing parallel occurrences in the VP and S category. The first reason may be the high structural variability of these categories. We would probably need more data to get some (or some more) parallel occurrences for VP and S coordination. Alternatively, it could also be that the length of the conjuncts influences the parallel-structure effect, since the mean length of category VP (3.5 words) and S (8.1 words) is larger than, for example, the mean length of category NP (2.4 words).
Overall, we can say that the parallel-structure effect in the corpus is a general effect of coordination and not an artefact that can be attributed to a particular syntactic category. Some amount of structural variation of the categories, however, is a prerequisite to get observable results. In the next section we examine whether length is a relevant factor for the parallel-structure effect.
2.2.2. Distribution across different lengths
There are two reasons to inspect the distribution of the parallel occurrences by length. First, we wanted to see whether the parallel-structure effect in the corpus is specific to particular lengths of the conjuncts as a result of their combinatorial properties. We argued that the structural variation of a category plays a role for the parallel-structure effect. The variation of a category, however, is related to its length, i.e., the longer a phrase, the more structural variation is generally possible. It may thus be that the parallel-structure effect exists only for particular lengths of conjuncts. It is possible, for example, that we do not get the effect for short conjuncts, because they are too limited in their structural variation (as we showed for the categories ONUM, CNUM and ADVP). Second, we are interested to see if there is a general length effect in the data, i.e. a cognitive effect that is due to limitations of working memory. We then would expect that the parallel-structure effect disappears with longer conjuncts, because the whole structure of the first conjunct has to be held in working memory while processing the second conjunct. This becomes more and more problematic with the increasing length of the first conjunct. Table 3 shows the distribution of parallel occurrences with respect to the length of the first conjuncts. For each length (number of words) we give the number of coordinate structures and the number of parallel instances (in percentage) in the coordination dataset and in the random data.
Table 3. Distribution of parallel occurrences w.r.t. length
Length of       Coordinate  Parallel instances
first conjunct  structures  Coordination  Random
1               72          66.7%         26.4%
2               43          44.2%         16.3%
3               20          30.0%         0%
4               10          40.0%         0%
5               19          5.3%          0%
6               11          0%            0%
7               13          0%            0%
8               17          0%            0%
9               16          6.3%          0%
10              9           0%            0%
11              2           0%            0%
12              3           0%            0%
13              4           0%            0%
14              1           0%            0%
15              1           0%            0%
16              1           0%            0%
All lengths     242         32.6%         10.7%
As illustrated in Table 3, parallel instances occur in the coordination dataset for length 1 up to length 5. Conjuncts with more than five words are no longer parallel (except for one occurrence of S coordination with conjuncts of length 9). Parallel instances in the random dataset occur only with
length 1 and 2. There is an obvious difference between the number of parallel instances in the coordination data and in the random data for length 1 to 4. The parallel-structure effect thus also exists for different lengths (1–4). We only report the statistical results for length 1 and 2, since there is no parallel occurrence in the random data for length 3 and 4, i.e., a precondition for the χ2 test is violated. The difference between the number of parallel instances in the coordination data and in the random data is significant for length 1 (χ2(1) = 12.9; p < 0.01) and length 2 (χ2(1) = 10.6; p < 0.01). This shows that the large number of parallel instances with length 1, for example, is not due to the pure combinatorial properties of conjuncts with length 1. The same holds for length 2 to 4. Finally, we wanted to find out about the general length effect, which would cause problems with longer conjuncts. Therefore, we calculated a normalised parallelity score for each length, i.e., the difference between the number of parallel occurrences in the coordination data and the expected random distribution (number of parallel occurrences in the random data) for that length. The normalised parallelity score (in percentage) is plotted for each length (1–9) in Figure 4.
Figure 4. Normalised parallelity score for length 1–9. (Normalised parallelity: difference between number of parallel occurrences in coordination data and in random data; length: number of words of first conjunct.)
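As a worked illustration of the normalised parallelity score, the sketch below subtracts the random percentage from the coordination percentage for lengths 1 to 9, using the values of Table 3. Reading the score as a difference in percentage points is an assumption of this sketch; the variable names are hypothetical.

```python
# (coordination %, random %) of parallel instances per length, from Table 3.
TABLE_3 = {
    1: (66.7, 26.4),
    2: (44.2, 16.3),
    3: (30.0, 0.0),
    4: (40.0, 0.0),
    5: (5.3, 0.0),
    6: (0.0, 0.0),
    7: (0.0, 0.0),
    8: (0.0, 0.0),
    9: (6.3, 0.0),
}

def normalised_parallelity(coordination_pct, random_pct):
    """Parallel instances in coordination minus the random baseline,
    in percentage points (assumed reading of the score)."""
    return round(coordination_pct - random_pct, 1)

scores = {length: normalised_parallelity(*pcts) for length, pcts in TABLE_3.items()}
for length, score in scores.items():
    print(length, score)
```

Under this reading the scores for lengths 1 to 4 lie between roughly 28 and 40 percentage points, and the scores for lengths 5 to 9 do not exceed 6.3.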
Figure 4 shows the existence of a fair amount of parallelity for length 1 to 4. From length 5 on parallelity drops drastically. Statistically, the parallelity scores of length 1 to 4 do not differ significantly from each other (χ2(3) = 2.2; p = 0.5), i.e., there is no length effect for length 1 to 4. The difference between length 1–4 and length 5–9 does reach significance (χ2(1) = 29.0; p < 0.01), though, which means that there is a length effect for short (length 1–4) vs. long conjuncts (length 5–9). Long conjuncts are rarely structurally parallel. This is what we expected based on the properties of working memory. And this might also be an explanation for the missing parallel occurrences in sentence coordination, since the conjuncts of this category are usually long (mean length: 8.1 words).3 We assume that the missing parallelity in the VP category results from a combination of high structural variability and length.
In this section we have shown that the preference for structural similarity during parsing is also present in corpus data. We found an overall effect for the coordination data vs. the random data. Thus, a correlation between this preference and corpus frequencies can be established as predicted by the tuning hypothesis. We have shown that the parallel-structure effect in the corpus is a general effect of coordination and not an artefact that can be attributed to a particular type of coordinate structure. For example, we could report the parallel-structure effect for different lengths (length 1–4), which shows that the effect is not simply due to the combinatorial properties of short conjuncts. And the effect is not specific to a particular syntactic category either. We did, however, find a general length effect for short (length 1–4) vs. long conjuncts (length 5–9). In the present corpus, structural similarity occurs almost exclusively with short conjuncts up to length 4, which can very likely be attributed to properties of working memory.
3. Disambiguation preference: NP vs. S coordination
We now come to the second processing effect addressed in this paper. Frazier (1979) compared reading times of sentences containing two conjoined noun phrases, as in (6a), with sentences that contain the coordination of two sentences (6b). When encountering the conjunctions in (6a,b) the parser is faced with a local ambiguity that cannot be resolved prior to the last word.
(6) a. Peter kissed [NP Mary] and [NP her sister] too.
b. [S Peter kissed Mary] and [S her sister laughed].
The results show a garden-path effect, i.e. significantly longer reading times, at the last word laughed in (6b) compared to the last word too in (6a). These results indicate that the parser prefers to interpret the noun phrase her sister as part of a conjoined noun phrase and not as the beginning of a new sentence (see Frazier (1987) and Hoeks et al. (2002) for similar results in Dutch).
3.1. Corpus analysis II: Disambiguation preference
The question now is whether the preference for NP coordination (compared to sentence coordination) is also present in the corpus data. Therefore we analysed a larger portion of the TÜBA-E treebank, i.e., CD6 (2554 sentences) and CD13 (2906 sentences). We extracted all occurrences that have the form “…NPSubj …Verb…NPObj and…” (basic construction) and that allow semantically the continuation with a noun phrase (NP coordination) or a sentence (S coordination). Example (7a) shows a relevant construction from our dataset that continues with a noun phrase, whereas the one in (7b) continues with a sentence. (7c), for example, is excluded from our dataset, since the continuation with a sentence would not be semantically possible here.
(7) a. and I will bring [NP the doughnuts] and [NP coffee] I guess
b. and [S Friday I have a nine to ten meeting] and [S I also have a meeting in the early afternoon]
c. I hate to mix [NP business] and [NP weekends]
Our analysis showed that in 69.2% of the relevant constructions, the sentence continued with a noun phrase (NP coordination, n = 27), only 30.8% continued with a sentence (S coordination, n = 12). Provided that chance is 50%, this difference is statistically significant (t(38) = 2.57; p < 0.05). Thus the disambiguation preference for NP coordination (as opposed to S coordination) during parsing is also present in the corpus. As predicted by the tuning hypothesis, we found a correlation between this preference and corpus frequencies.
3.2. Potentially influencing factors
It is known from the literature that there are additional factors that may influence attachment preferences. As described in the introductory section, two factors were crucial for the correlation studies by Desmet et al. (2002) and Desmet & Gibson (2003), and hence have to be controlled for: (a) the nature of the noun phrases (human, non-human) that function as possible attachment sites, and (b) the use of pronouns. In order to ensure that the observed disambiguation preference is not an artefact due to one of these two factors, we looked for these in our corpus data.
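Before turning to these factors, the significance test reported in Section 3.1 can be retraced with a short sketch. It assumes that each of the 39 constructions was coded as 1 (NP coordination) or 0 (S coordination) and that a one-sample t-test against a chance level of 0.5 was applied; the text does not spell out this coding, so it is an assumption of the sketch.

```python
import math

def one_sample_t(values, mu0):
    """t statistic and degrees of freedom of a one-sample t-test against mu0."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance
    standard_error = math.sqrt(variance / n)
    return (mean - mu0) / standard_error, n - 1

# 27 NP coordinations coded as 1, 12 S coordinations coded as 0 (assumed coding).
outcomes = [1] * 27 + [0] * 12
t_value, df = one_sample_t(outcomes, 0.5)
print(f"t({df}) = {t_value:.2f}")
```

Under this coding the sketch yields the same degrees of freedom and t value as reported above, which suggests, without proving it, that the reported test was set up in this way.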
3.2.1. Nature of the noun phrases
In Dutch the preference to attach the relative clause in (1), repeated here as (8), depends on the nature of NP1 (Desmet et al. 2002). If NP1 refers to a human entity, attachment to NP1 is preferred (high attachment). If NP1 refers to a non-human entity, attachment to NP2 is preferred (low attachment).
(8) Someone shot [NP1 the servant] of [NP2 the actress][RC who was on the balcony].
The coordinate structures we are investigating here are presumably not completely comparable to the relative clause attachment ambiguities in Dutch, but we wanted to ensure that the nature of the NPSubj and NPObj in the basic construction, i.e. if they refer to human or non-human entities, does not influence the preference to coordinate the NPObj (NP coordination) or the whole sentence (S coordination). We therefore list the number of NP and S coordinations as a function of the nature of NPSubj and NPObj in the basic construction (see Table 4).4
Table 4. Distribution w.r.t. the nature of NPs (h: human, nh: non-human)
Sentence type    No.  NP coordination  S coordination
(NPSubj–NPObj)
h-nh             24   66.7%            33.3%
nh-nh            12   83.3%            16.7%
h-h              3    33.3%            66.7%
Most occurrences contain a human NPSubj and a non-human NPObj in the basic construction. We did not find occurrences with a non-human NPSubj and a human NPObj and only 3 instances with a human NPSubj and a human NPObj, which is not sufficient to draw any conclusions. The other two sentence types (h-nh, nh-nh), however, show a clear picture. There is a preference for NP coordination in both sentence types. We therefore conclude that, at least for these two sentence types, the observed disambiguation preference is not an artefact due to the nature of the noun phrases. Unfortunately, we are not able to make a statement about the other two sentence types (h-h, nh-h).
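The percentages in Table 4 can be converted back into absolute counts as a consistency check against the totals given in Section 3.1 (27 NP coordinations, 12 S coordinations). The counts below are back-calculated by rounding and are not reported in this form in the text; the names are hypothetical.

```python
# Sentence type -> (number of occurrences, % NP coordination, % S coordination),
# as given in Table 4.
TABLE_4 = {
    "h-nh": (24, 66.7, 33.3),
    "nh-nh": (12, 83.3, 16.7),
    "h-h": (3, 33.3, 66.7),
}

def recover_counts(n, pct_np, pct_s):
    """Back-calculate absolute counts from rounded percentages."""
    return round(n * pct_np / 100), round(n * pct_s / 100)

counts = {stype: recover_counts(*row) for stype, row in TABLE_4.items()}
total_np = sum(np_count for np_count, _ in counts.values())
total_s = sum(s_count for _, s_count in counts.values())
print(counts)
print(total_np, total_s)
```

The recovered counts add up to the 27 NP coordinations and 12 S coordinations reported for the whole dataset.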
3.2.2. Anaphoric relations
Another factor that influences parsing preferences is the use of pronouns. It has been shown that when there are several possible antecedents for a pronoun, the first-mentioned antecedent is usually the most accessible (Gernsbacher 1989). Desmet & Gibson (2003) showed that the high attachment preference in (2) (“NP1 Prep NP2 Prep NP3 and NP4”) is due to the use of the pronoun ‘one’ as head of the conjoined noun phrase (NP4). The coordinate structures we are investigating are also not completely comparable to the sentences in (2), since we are not dealing with different types of NP coordination, but with NP vs. S coordination. Nevertheless, we wanted to ensure that the disambiguation preference in the corpus is not due to anaphoric relations between the first and the second conjunct. An analysis of the data reveals that there are two occurrences of NP coordination with a pronoun in the second conjunct that refers to the first conjunct. The data also exhibit two occurrences of S coordination with a pronoun (‘that’) in subject position of the second sentence that refers to the first conjunct (first sentence). Overall, we can say that anaphoric relations between the first and second conjunct are very rare in our data and do not change the results. We therefore conclude that the preference for NP coordination we found in the corpus is not due to anaphoric relations between the first and second conjunct.
In this section we have shown that the preference for NP coordination (compared to sentence coordination) during parsing is also present in the corpus data. As predicted by the tuning hypothesis, a correlation between this preference and corpus frequencies can be established. These results are not an artefact due to the nature of the noun phrases (human, non-human), at least in the sentence types occurring in the corpora, nor an artefact due to anaphoric relations between the first and the second conjunct.
4. Methodological considerations
We have presented the statistical results of two corpus studies. Nevertheless, we have to keep in mind that these results are completely based on the syntactic annotations in the TÜBA-E treebank. The treebank is annotated manually by linguists and not automatically by computational parsers, which has the advantage that the error rate is relatively low. On the other hand, the annotation reflects the annotators’ preference in debatable cases and, of course, the design principles of the annotation scheme. One example of this is the design principle LMP (Longest Match Principle) in the annotation scheme of the TÜBA-E treebank. This principle demands that as many constituents as possible be included in a syntactic structure, as long as the whole structure remains syntactically, as well as semantically, well-formed (Kordoni 2000). If a treebank entry contains, for example, several sentences, the LMP leads to a preference to conjoin as many sentences as possible. We often found treebank entries with several conjoined sentences that we would have annotated as separate sentences. Without the LMP, the treebank would comprise fewer sentence coordinations. Very likely the preference for NP coordination (compared to S coordination) in this case would be even stronger.
5. Conclusion
In this paper we investigated the relationship between two types of linguistic evidence, namely parsing preferences and corpus frequencies. The study was based on two processing effects, i.e., the parallel-structure effect and the disambiguation preference NP vs. S coordination. We have shown that for both processing effects a correlation between parsing preferences and corpus frequencies in spoken dialogs can be established. There are two general possibilities to explain these correlations. First, as stated in the tuning hypothesis, the human parser might be sensitive to the statistical patterns occurring in natural language, as, for example, the relative frequencies in corpus data. Here, the statistics in language production is seen as the cause for the development of parsing preferences. There is however another possibility to account for the correlations, namely a common source for language production and comprehension. There are several structure-based accounts that are able to explain the parsing preferences discussed above. The parallel-structure effect, for example, can be explained by a recycling mechanism, which exploits the structural redundancy in the conjuncts (Steiner 2005). In this account the human parser, when processing the second conjunct, reuses the structure that was already built up for the first conjunct. The preference for noun phrase conjunction as opposed to sentence coordination can either be derived from the Minimal Attachment Principle (Frazier 1979; Frazier & Clifton 1996), which favours the structure that requires fewer syntactic nodes, or it might also be due to a general preference for recency (see, e.g., Gibson et al. 1996a). In these structure-based accounts the preferred construction is usually more economical and therefore easier and faster to process. If the same mechanisms are also active during language production, the constructions that are easier to understand would also be easier to produce and are presumably produced more often. A common mechanism for language production and comprehension would lead to faster reading times during parsing and to higher frequencies during production.
With the present study we are not able to differentiate between the two possibilities, but we could show that production and comprehension preferences are closer to each other than expected. And if it can be shown that the correlation between these two types of linguistic evidence holds in general, an interesting consequence would arise, namely that corpus data can be used to evaluate, at least qualitatively, models of sentence processing.
Notes

1. The importance of using unedited corpus data for studying language production is emphasised in Gibson et al. (1996b).
2. Ordinal numbers such as twenty sixth are annotated as two words in the treebank. This of course implies that a coordinate structure such as [the fifth] and [the twenty first] is not completely parallel in our analysis. Treating two-word ordinal numbers as one word would clearly increase the number of parallel instances in the coordination dataset.
3. Frazier et al. (1984) report the parallel-structure effect also for the coordination of sentences. In these sentences, however, parallelism refers to a higher phrasal level. Frazier and colleagues found differences in reading times, for example, between active and passive sentence constructions.
   a. [S The prince kissed the princess] and [S the king hugged the queen].
   b. [S The princess was kissed by the prince] and [S the king hugged the queen].
   The second conjunct [S the king hugged the queen] is processed faster in example a. than in b., since the two conjoined sentences in a. are structurally identical (active, active), which does not hold for the sentence in b. (passive, active).
   It has not been shown for sentence coordination whether the internal structure of the local phrases affects the parallel-structure effect. It is possible that local phrase structure is accessible only for a short period of time, whereas higher phrasal nodes remain active beyond that period. As a consequence, parallel internal phrase structure would affect the parallelism effect only if the conjuncts are short, e.g., in noun phrase conjunction as in (3). Parallelism on a higher phrasal level is relevant if the conjuncts are long, e.g., in sentence coordination (see above).
4. Note that the NPs in Frazier’s experimental stimuli, e.g. in (6), were not controlled for animacy. The materials contained noun phrases that refer to human as well as non-human entities.
References

Brysbaert, Marc & Don C. Mitchell 1996 Modifier attachment in sentence processing: Evidence from Dutch. The Quarterly Journal of Experimental Psychology 49A: 664–695.
Cuetos, Fernando, Don C. Mitchell & Martin M. B. Corley 1996 Parsing in different languages. In Language Processing in Spanish, M. Carreiras, J. E. Garcia-Albea & N. Sebastian-Galles (eds.), 145–187. Hillsdale, NJ: Erlbaum.
Desmet, Timothy, Marc Brysbaert & Constantijn De Baecke 2002 The correspondence between sentence production and corpus frequencies in modifier attachment. The Quarterly Journal of Experimental Psychology 55A (3): 879–896.
Desmet, Timothy & Edward Gibson 2003 Disambiguation preferences and corpus frequencies in noun phrase conjunction. Journal of Memory and Language 49: 353–374.
Frazier, Lyn 1979 On Comprehending Sentences: Syntactic Parsing Strategies. Ph.D. thesis, University of Connecticut, Storrs.
    1987 Syntactic processing: Evidence from Dutch. Natural Language and Linguistic Theory 5: 519–559.
Frazier, Lyn & Charles Clifton 1996 Construal. Cambridge, MA: MIT Press.
Frazier, Lyn, Alan Munn & Charles Clifton 2000 Processing coordinate structures. Journal of Psycholinguistic Research 29 (4): 343–370.
Frazier, Lyn, Lori Taft, Tom Roeper, Charles Clifton & Kate Ehrlich 1984 Parallel structure: A source of facilitation in sentence comprehension. Memory and Cognition 12 (5): 421–430.
Gernsbacher, Morton Ann 1989 Mechanisms that improve referential access. Cognition 32: 99–156.
Gibson, Edward, Neal Pearlmutter, Enriqueta Canseco-Gonzalez & Gregory Hickok 1996a Recency preference in the human sentence processing mechanism. Cognition 59: 23–59.
Gibson, Edward & Carson T. Schütze 1999 Disambiguation preferences in noun phrase conjunction do not mirror corpus frequency. Journal of Memory and Language 40: 263–279.
Gibson, Edward, Carson T. Schütze & Ariel Salomon 1996b The relationship between the frequency and processing complexity of linguistic structure. Journal of Psycholinguistic Research 25 (1): 59–92.
Hinrichs, Erhard, Julia Bartels, Yasuhiro Kawata, Valia Kordoni & Heike Telljohann 2000 The Tübingen treebanks for spoken German, English, and Japanese. In Verbmobil: Foundations of Speech-to-Speech Translation, Artificial Intelligence, 552–576. Berlin/Heidelberg/New York: Springer.
Hoeks, John C. J., Wietske Vonk & Herbert Schriefers 2002 Processing coordinated structures in context: The effect of topic-structure on ambiguity resolution. Journal of Memory and Language 46: 99–119.
Kordoni, Valia 2000 Stylebook for the English treebank in VERBMOBIL. Technical Report 241, Verbmobil.
Mitchell, Don C. & Marc Brysbaert 1998 Challenges to recent theories of crosslinguistic variation in parsing: Evidence from Dutch. In Syntax and semantics: A crosslinguistic perspective, D. Hillert (ed.), 313–335. San Diego, CA: Academic Press.
Mitchell, Don C., Marc Brysbaert, Stefan Grondelaers & Piet Swanepoel 2000 Modifier attachment in Dutch: Testing aspects of Construal theory. In Reading as a perceptual process, A. Kennedy, R. Radach, D. Heller & J. Pynte (eds.), 493–516. Oxford: Elsevier.
Mitchell, Don C., Fernando Cuetos, Martin M. B. Corley & Marc Brysbaert 1995 Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research 24: 469–488.
Steiner, Ilona 2005 On the syntax of DP-coordination: Combining evidence from reading-time studies and agrammatic comprehension. In Linguistic Evidence, S. Kepser & M. Reis (eds.), 507–527. Berlin/New York: Mouton de Gruyter.
Adverbs and sentence topics in processing English

Britta Stolterfoht, Lyn Frazier and Charles Clifton, Jr.
1. Introduction

Many languages permit considerable flexibility of word order. The importance of information-structure for the placement of arguments in these languages has been widely discussed. When an argument appears in a noncanonical position, typically there are information-structure constraints on its discourse status. Often the change of word order affects the focus-background articulation of a sentence (e.g. Höhle 1982; Abraham 1992; Steube 2000 for German). In German, for example, a phrase may be scrambled to a position earlier than its canonical position, but typically the scrambled phrase must be already given in discourse and the clause will receive narrow focus (the focus will not include the scrambled constituent).

Experimental evidence for this assumption comes from studies showing that processing locally ambiguous sentences with non-canonical word order (scrambling) requires not only a syntactic reanalysis to be performed, but also a focus structure reanalysis (Bader & Meng 1999; Stolterfoht & Bader 2004). Further evidence for the important role of information-structural constraints is found in processing object topicalization in Finnish. The canonical order in Finnish is SVO. When listeners encounter an OV sentence-beginning, they immediately predict that the (post-verbal) subject will refer to some discourse-new entity (Kaiser & Trueswell 2004).

There is a strong relation between information-structure and word order in flexible word order languages. But what about the information-structuring in a language like English that does not have scrambling or highly flexible ordering of arguments? In the present paper we will focus on whether information-structure constraints found in scrambling languages may also apply in languages like English that do not have scrambling. We will focus on the information-structure constraints conveyed by adverb placement. Identifying the structural position of arguments is often difficult.
However, when the sentence contains an adverb, the adverb may in effect identify a structural position allowing the position of the argument to be determined. The position of adverbs relative to other elements is one traditional
diagnostic in the analysis of phrase structure. Adverbs were often assumed to mark phrasal boundaries, and used as landmarks to demonstrate the movement of other elements across them (e.g. Emonds 1976; Platzack 1983). It has long been noted that the IP-internal syntax in a scrambling language like German is sensitive to information-structural considerations dependent on the adverb position.

(1) a. …, weil sie immer Briefe aus Europa beantwortet hat.
       … since she always letters from Europe answered has
       ‘… since she is always engaged in answering letters from Europe.’
    b. …, weil sie Briefe aus Europa immer beantwortet hat.
       … since she letters from Europe always answered has
       ‘… since she never leaves a letter from Europe unanswered.’
Diesing (1992) and Kratzer (1995) have developed one strong hypothesis, known as the Mapping Hypothesis, concerning the relation between the interpretation of a phrase and its syntactic position. A bare plural DP like Briefe aus Europa (‘letters from Europe’) is ambiguous between a generic and an existential reading. According to the Mapping Hypothesis, there is a boundary marked by the adverb immer (‘always’) below which an ambiguous bare plural receives a weak or existential interpretation, as in (1a), and above which it gets a strong or generic reading, as in (1b). Diesing identified this boundary with the VP boundary and stated that material from VP is mapped into the nuclear scope and material from IP is mapped into a restrictive clause (see Diesing 1992: 10). However, it appears that the Mapping Hypothesis is inadequate to explain all the facts for German. As Meinunger (1995) pointed out, the definite DP der Hund (‘the dog’) in the examples in (2) has a strong interpretation whether it sits above or below the VP boundary marked by the adverbial auf einmal (‘all of a sudden’) (capitals indicate sentence accents, ‘the dog’ is given information, see pp. 89).

(2) a. Aber als er wieder rauskam war auf einmal der HUND verschwunden.
       But when he again out.came was at once the dog disappeared.
    b. Aber als er wieder rauskam war der Hund auf einmal verSCHWUNden.
       But when he again out.came was the dog at once disappeared.
       ‘But when he came back out, all of a sudden the dog had disappeared.’

But there is still an interpretational difference: ‘the dog’ is the sentence topic in (2b), but not in (2a). Meinunger accounts for this type of data by assuming that DPs to the left of boundary-marking adverbials occupy the specifiers of Agreement phrases (AgrPs), and by linking Agr to topicality with the feature [+topic] (see also Svenonius 2002). It has been pointed out that not every type of adverbial marks the VP boundary. Frey (2000) identifies the topic position in the German middlefield above a specific class of adverbials, namely sentential adverbials. According to Frey, sentential adverbials are evaluatives like erstaunlicherweise (‘amazingly’), evidentials like offensichtlich (‘obviously’) and epistemics like wahrscheinlich (‘probably’). Sentential adverbials are characterized as the boundary between given and new information (Haftka 1995, 2003) and as having their base position above all other arguments and adverbial classes (Frey & Pittner 1998). The examples in (3) provide evidence for a specific topic position preceding the sentential adverbial in German. Frey’s topic concept is an aboutness topic (in contrast to a familiarity topic) which can be described as an expression about whose referents the sentence predicates or makes a judgment (Reinhart 1981, 1995).

(3) Ich erzähl dir mal was von Otto.
    ‘I will tell you something about Otto.’
    a. Nächstes Jahr wird Otto wahrscheinlich seine Kollegin heiraten.
       Next year will Otto probably his colleague marry.
    b. # Nächstes Jahr wird wahrscheinlich Otto seine Kollegin heiraten.
       Next year will probably Otto his colleague marry.
       ‘Next year, Otto will probably marry his colleague.’
The subject Otto has scrambled to a topic position in front of the sentential adverbial wahrscheinlich (‘probably’) in (3a), which is acceptable. In (3b), the subject remains in its canonical position, which is not acceptable in a context like that in (3), where Otto is a clear topic.
In addition to this example, Frey (2000) provides us with an impressive battery of tests to demonstrate the link between the position above sentential adverbials and topicality. One of these tests uses non-referential expressions. As demonstrated in example (4), a non-referential expression cannot fill the topic position preceding the sentential adverbial.

(4) * Während des Vortrags hat keiner anscheinend geschlafen.
      During the talk has nobody apparently slept
According to Frey, the reason for this is that aboutness topics must have identifiable discourse referents, which serve as addresses for the information about these referents. Non-referential expressions do not provide these addresses. Thus, they cannot fill the position above the sentential adverbial. A syntactic approach for capturing these observations for German and similar phenomena in other Germanic languages has been proposed by Bobaljik & Jonas (1996). The authors argue that languages like German and Icelandic have two subject positions within the IP, one in the specifier of the Agreement Phrase (SpecAgrSP) which is linked to topicality, and one in the Specifier of the Tense Phrase (SpecTP). Sentential adverbials are attached to TP and separate these two positions. In contrast, languages like English and Danish do not contain a position for topics, but have only one subject position (SpecAgrSP) which is not sensitive to, or at least not dictated by, the information-structural status of the subject (see also Svenonius 2002).

To conclude, languages with flexible word order like German and Finnish are sensitive to information-structural constraints with regard to the order of arguments. In addition, adverb position may help to identify structural positions which in turn indirectly convey information-structure constraints. In contrast, the grammar of languages like English seems not to be sensitive to these information-structural constraints. The question to be addressed here is whether English really is not sensitive to information-structure constraints conveyed by adverb placement. We report an experiment we conducted to investigate the interaction of adverb placement and the discourse status of the subject argument.

2. The experiment

The experiment addresses the question of whether adverb placement in a language like English with a relatively fixed word order influences assumptions about topichood in a manner similar to that proposed for scrambling languages.
To answer this question, we used the test proposed by Frey (2000) and looked at sentences with non-referential DPs above and below a sentential adverbial in comparison to sentences with referential DPs.

(5) a. The envoy said that presumably no king defeated the knights.
    b. The envoy said that no king presumably defeated the knights.
    c. The envoy said that presumably the king defeated the knights.
    d. The envoy said that the king presumably defeated the knights.
In particular, the question to be addressed here is whether placing a subject above a sentential adverbial leads readers to treat the subject as a topic. If so, it would be very odd to have a negative phrase like no king, which cannot serve as a topic, appear above a sentential adverbial, given the standard assumption that negatives are non-referential and therefore do not make good topics (Reinhart 1981, 1995; Erteschik-Shir 1997; Frey 2000). In a self-paced reading study, participants read sentences like (5). We manipulated the referential status of the subject (subject type: referential vs. non-referential) and the relative order of subject and adverb (adverb position: early vs. late). The following hypotheses can be formulated:

Hypothesis 1: If English does not have a specific position for topics above the sentential adverbial as assumed by Bobaljik & Jonas (1996) and Svenonius (2002), no difference with regard to the referential status of the subject preceding the adverbial should be found for the processing of English (no interaction of order and subject type; no reading time difference between (5b) and (5d)).

Hypothesis 2: If adverb placement in English is comparable to adverb placement in German with respect to conveying information-structure constraints, then even in an English sentence, a non-referential subject preceding a sentential adverbial should be highly marked (interaction of order and subject type; longer reading times for (5b) in comparison to all other conditions).

Participants were asked to read sentences in a self-paced manner followed by a task which required the choice of the correct paraphrase of the sentence.
2.1. Method
2.1.1. Participants

Fifty-two undergraduate students of the University of Massachusetts participated for course credit. All were native speakers of American English.
2.1.2. Materials

The materials manipulated the type of subject (non-referential vs. referential) and the position of the adverb (early vs. late). Both factors were manipulated within items. 24 sentence quadruples were constructed (see examples in (5)). The sentences are provided in the Appendix. 12 different sentence adverbs were used, consisting of four evaluatives (surprisingly, amazingly, unfortunately, fortunately), four evidentials (evidently, obviously, apparently, supposedly) and four epistemics (presumably, possibly, probably, certainly). Each adverb appeared twice. Repetition of other lexical items was avoided as much as possible. Four presentation lists were constructed by randomly combining the 24 experimental sentences with 88 filler sentences, counterbalanced across the four groups of sentences. Each participant saw only one version of each item. The two paraphrases were reformulations of the sentences with negative and referential phrases.

(6) Paraphrases
    a. non-referential: The envoy assumed that the knights won.
    b. referential: The envoy assumed that the knights lost.
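The counterbalancing of the 24 items across four presentation lists can be illustrated with a short sketch. It assumes a Latin-square rotation of conditions over items, which is a common way to ensure that each participant sees only one version of each item; the chapter does not state the exact assignment scheme, and the function name `build_lists`, the trial tuples and the seed are hypothetical.

```python
import random

# Condition labels as used in Table 1 of the chapter.
CONDITIONS = ["non-ref.-early", "non-ref.-late", "ref.-early", "ref.-late"]


def build_lists(n_items=24, n_fillers=88, seed=0):
    """Build one presentation list per rotation (hypothetical helper)."""
    rng = random.Random(seed)
    n_lists = len(CONDITIONS)
    lists = []
    for list_index in range(n_lists):
        trials = []
        for item in range(n_items):
            # Assumed Latin-square rotation: each item appears in a
            # different condition on each list, once per list.
            condition = CONDITIONS[(item + list_index) % n_lists]
            trials.append(("experimental", item, condition))
        # The same fillers are added to every list.
        trials.extend(("filler", filler, None) for filler in range(n_fillers))
        rng.shuffle(trials)  # random interleaving of items and fillers
        lists.append(trials)
    return lists
```

Under this assumption every list contains 112 trials, with six experimental items per condition, and each item occurs in all four conditions across the four lists.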
2.1.3. Procedure

The experiment was run on a PC using E-Prime software (Psychology Software Tools, Inc.). The sentences were presented in two regions in a self-paced mode with a moving window technique. Participants pressed the space bar of the keyboard to begin the trial, at which time a row of dashes appeared on the screen. A dash represented each character of the sentence. Then, participants pressed the space bar to present each region of the sentence (see illustration in (7)).
(7) --- ----- ---- ---- -- ---- ---------- -------- --- -------.
    The envoy said that -- ---- ---------- -------- --- -------.
    --- ----- ---- ---- no king presumably defeated the knights.
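The sequence of displays in (7) can be reproduced with a minimal sketch of the moving-window masking. It is not the E-Prime script used in the experiment; the function name `moving_window` is hypothetical, and the choice to leave spaces and periods unmasked is an assumption read off the illustration.

```python
def moving_window(regions):
    """Return the successive displays for a list of region strings.

    The first display masks every region; each later display reveals
    exactly one region and masks all others (hypothetical helper).
    """
    def dashes(region):
        # Assumption from illustration (7): spaces and periods stay visible.
        return "".join(ch if ch in " ." else "-" for ch in region)

    displays = [" ".join(dashes(region) for region in regions)]
    for shown in range(len(regions)):
        parts = [region if index == shown else dashes(region)
                 for index, region in enumerate(regions)]
        displays.append(" ".join(parts))
    return displays


for line in moving_window(["The envoy said that",
                           "no king presumably defeated the knights."]):
    print(line)
```

Each press of the space bar would correspond to advancing one step through the returned list.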
After a further press of the space bar, the two paraphrases of the sentence appeared on the screen, preceded by a question mark to signal the new task. Participants chose one of the paraphrases by pressing one of two keys. They were told to read through the sentences at a natural pace and to read closely enough to choose the paraphrase.

2.1.4. Data analysis

We analyzed participants’ reading times for the two regions, the accuracy in choosing the paraphrase and the response times. Reading times and response times that were more than 2.5 SD away from the mean were excluded from the analysis. This led to less than 3 % loss of data. The data of three participants were excluded from the analysis because of more than 25 % loss of data.

2.2. Results

The results are presented in Table 1.

Table 1. Mean reading times (Region 1 and Region 2) in ms, percentage of correct paraphrases (% correct) and response times (Response) in ms by condition for Experiment 2

Condition        Region 1   Region 2   % correct   Response
non-ref.-early   1164       2277       91          4338
non-ref.-late    1168       2555       94          4391
ref.-early       1184       2208       89          4260
ref.-late        1110       2178       93          4240
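The exclusion criterion described under Data analysis (values more than 2.5 SD away from the mean) can be sketched as follows. The chapter does not say over which set of observations the mean and SD were computed (for example per participant or per condition), nor whether the sample or population SD was used; the sketch assumes the sample SD over whatever values are passed in, and the function name `trim_outliers` is hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(values, n_sd=2.5):
    """Keep values within n_sd sample standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Invented reading times in ms, for illustration only.
reading_times = [1000, 1100, 900, 1050, 950, 1000, 1100, 900, 1000, 9000]
kept = trim_outliers(reading_times)
```

With very small samples a single extreme value inflates the SD considerably, so the set of observations over which the criterion is applied matters for how many data points are lost.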
The reading times of Region 1, which was identical for all four conditions, exhibited no significant differences (all p > .10). The reading times for Region 2 revealed a main effect of subject type (F1 (1,48) = 8.42, p < .01; F2 (1,23) = 7.57, p < .01). Participants needed more time to read the sentences containing a non-referential subject. Additionally, there were two marginally significant effects: the main effect of adverb position, marginally significant in the subject analysis and fully significant in the item analysis (F1 (1,48) = 2.94, p = .09; F2 (1,23) = 4.13, p < .05), and the interaction of the two factors, significant in the subject analysis and marginally significant in the item analysis (F1 (1,48) = 4.48, p < .05; F2 (1,23) = 3.26, p = .08). The conventional 2x2 analysis of variance provides some evidence that participants needed more time to read the sentences with a non-referential subject and a late adverb than each of the other three types of sentences. Since this pattern of results was predicted, we performed more focused tests, comparing the non-referential subject/late adverb condition against each of the remaining three conditions. Each contrast was fully significant (see Table 2). However, none of the remaining three conditions differed significantly from any other (F < 1.0).

Table 2. ANOVA RT Region 2 – planned comparisons

Comparison of non-ref.-late with   F1 (1,48)   p1     F2 (1,23)   p2
non-ref.-early                     5.56        .02    5.40        .03
ref.-early                         7.17        .01    10.24       .004
ref.-late                          11.56       .001   8.92        .007
For the choice of the correct paraphrases, only a marginally significant main effect for adverb position was found (F1 (1,48) = 3.33, p = .07; F2 (1,23) = 3.59, p = .07). For unclear reasons, participants gave slightly more correct answers for sentences with a late adverb (93.5 %) than for sentences with an early adverb (90 %). The analysis of the question-answering times revealed no significant effects (all F < 1.0).
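The accuracy percentages quoted above and the direction of the Region 2 interaction can be re-derived from the cell means in Table 1. The sketch below only does arithmetic on the published means; it assumes that averaging two cell means approximates the reported marginal means (which holds if cells are balanced), and it is no substitute for the ANOVA on the raw data. The names `TABLE_1` and `marginal_mean` are illustrative.

```python
# Cell means from Table 1:
# (Region 1 RT, Region 2 RT, % correct, response time)
TABLE_1 = {
    ("non-ref.", "early"): (1164, 2277, 91, 4338),
    ("non-ref.", "late"): (1168, 2555, 94, 4391),
    ("ref.", "early"): (1184, 2208, 89, 4260),
    ("ref.", "late"): (1110, 2178, 93, 4240),
}


def marginal_mean(column, factor_index, level):
    """Average one column over the cells sharing a factor level."""
    cells = [row[column] for key, row in TABLE_1.items()
             if key[factor_index] == level]
    return sum(cells) / len(cells)


# Accuracy by adverb position (factor index 1, column 2).
accuracy_late = marginal_mean(2, 1, "late")
accuracy_early = marginal_mean(2, 1, "early")

# Region 2 cost of a late adverb for each subject type, and their difference.
late_cost_nonref = TABLE_1[("non-ref.", "late")][1] - TABLE_1[("non-ref.", "early")][1]
late_cost_ref = TABLE_1[("ref.", "late")][1] - TABLE_1[("ref.", "early")][1]
interaction_contrast = late_cost_nonref - late_cost_ref
```

On these means a late adverb adds 278 ms for non-referential subjects but makes sentences with referential subjects 30 ms faster, which is the pattern that the planned comparisons in Table 2 test.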
3. Discussion

The results of the self-paced reading study revealed significantly longer reading times for sentences with a non-referential subject preceding a sentential adverbial in comparison to all other conditions. These results can be interpreted as evidence for Hypothesis 2, which assumes a topic position for English comparable to that found in scrambling languages like German. Hypothesis 2 predicted that a subject placed above a sentential adverbial is treated as a topic and thus a non-referential subject preceding the adverb
will be highly marked. This result is particularly interesting because German and English are so different with respect to the relevant structural properties: German permits various types of fronting operations (fronting to SpecCP, scrambling to various positions within the middlefield) to reflect information-structure whereas English permits very little movement of this type. The results of our study can be interpreted as evidence against Hypothesis 1 which assumes that English has only one landing position for subjects that is not sensitive to the information-structural status of DPs.¹ Our data suggest that English patterns with other Germanic languages with regard to information-structural constraints for the position of the subject.

One might worry that the long reading times for sentences with a negative subject preceding the adverb were due to the possibility of a scope ambiguity in these sentences, but not in the sentences with a referential subject. However, at least according to our intuitions, there is no scope ambiguity with the adverbs tested in the actual materials (unfortunately, evidently, apparently, surprisingly, etc.). Thus, we think this possibility is remote, given that at best one would be dealing with a potential ambiguity. Further, in the processing literature on scope, one does not find longer processing times due to actual scope ambiguity. For example, Anderson (2004, Experiment 6) tested scope ambiguous sentences like A climbing expert scaled every cliff. In a self-paced reading study, the ambiguous sentences were presented in either a context biased to surface scope or a context biased to inverse scope. Unambiguous control sentences were also tested (The climbing expert scaled every cliff for surface scope; A different climber scaled every cliff for inverse scope). As in all her other studies, the surface scope sentences were read faster than the inverse scope sentences but there was no effect of ambiguity.
4. Conclusions

Our study examined the behavior of subjects in English when a sentential adverbial followed the subject in an embedded clause. The results of the experiment showed that a non-referential subject gave rise to a penalty (longer reading times) when it preceded the adverb but a referential subject did not. This was expected if a subject preceding the sentential adverbial is treated as a topic, since non-referential subjects do not make good topics. Based on evidence from intuitions, a similar effect seems to occur in German. But in German it is not so surprising that adverb placement would influence the information-structure of a sentence. Given the movement possibilities afforded by scrambling, often the only way to be sure where in the syntactic structure a subject sits is by looking at its position relative to an adverb or some other constituent. Further, the fact that a subject or other argument may appear in various syntactic positions allows the positions to be exploited for marking information-structure. But what the present results suggest is that adverbs may play a similar role in English. The results encourage the view that in a non-scrambling language too adverb placement can constrain and signal information-structure.
Acknowledgements

We would like to thank William Evans for his help with collecting the data. Furthermore, we would like to thank Thomas Ernst for helpful suggestions. This work was supported by a fellowship within the Postdoc-Programme of the German Academic Exchange Service (DAAD) awarded to Britta Stolterfoht and by National Institutes of Health Grant HD-18708 to the University of Massachusetts.
Note

1. Alternatively, a single syntactic subject position may be ‘valued’ by information-structure constraints in a context-dependent fashion, depending on its position relative to a sentential adverb. Regardless of whether one adopts a more complicated syntax with straightforward mapping to information-structure, or a simpler syntax and a more complicated statement of the information-structure constraints, it is clear that adverb position and the information-structure status of the subject interact.
Appendix

Materials (one version of all experimental items and paraphrases)

The envoy said that presumably no king defeated the knights.
    The envoy assumed that the knights lost/won.
The electrician reported that presumably no appliance caused the blackout.
    The electrician supposed that there was a/no defective appliance.
The officer noticed that surprisingly no suspect knew the victim.
    The victim was known/not known by a suspect.
The exterminator saw that surprisingly no mouse ate the cheese.
    The exterminator saw that the cheese was gone/still there.
The president declared that evidently no minister lied to the subordinates.
    The president claimed that the subordinates were deceived/not deceived.
The doctor concluded that evidently no patient survived the disease.
    The doctor concluded that the disease was nonlethal/lethal.
The police assumed that possibly no owner torched the warehouse.
    The police assumed that the owner was/the owners were not involved in arson.
The magazine speculated that possibly no actress visited the hospital.
    The magazine speculated that the hospital was/was not visited by an actress.
The reporter said that unfortunately no quarterback attended the party.
    The reporter said that the party was/was not attended by a quarterback.
The mother said that unfortunately no nurse called the doctor.
    The mother said that a/no doctor was called.
The organizers announced that probably no band will play at the festival.
    It is likely that the/no band will appear at the festival.
The forecast claimed that probably no storm will reach Amherst.
    The forecast assumes that Amherst will/won’t get nasty weather.
The journalist emphasized that obviously no soldier killed the demonstrators.
    The demonstrators were/were not killed by the military.
The investigator heard that obviously no clerk broke the safe.
    The investigator heard that the safe was/was not broken by the clerk.
The teacher said that certainly no pupil smoked a cigarette.
    The teacher said that the pupil smoked/did not smoke.
The father stated that certainly no son washed the car.
    The father stated that his son has been busy/his sons have not been busy.
The lawyer stated that apparently no priest embezzled the money.
    The lawyer stated that the money was/was not embezzled by a priest.
The director heard that apparently no audience loved his film.
    The film was a/no success.
The judge stated that supposedly no secretary stole the data.
    The data were/were not stolen by a secretary.
The artist recognized that supposedly no gallery owner bought the picture.
    The artist recognized that the picture was/was not sold.
The driver said that fortunately no child missed the bus.
    Some child missed/Everybody caught the bus.
The mayor said that fortunately no people obeyed the request.
    The request was/was not complied with.
The professor noticed that amazingly no student passed the exam.
    The professor noticed that somebody/nobody passed the exam.
The activist noticed that amazingly no whale survived the spill.
    The spill left a/no whale alive.
References

Abraham, Werner 1992 Wortstellung und das Mittelfeld im Deutschen. In Erklärende Syntax des Deutschen, W. Abraham (ed.), 27–52. Tübingen: Narr.
Anderson, Catherine 2004 The structure and real-time comprehension of quantifier scope ambiguity. Unpublished Ph.D. dissertation, Northwestern University.
Bader, Markus & Michael Meng 1999 Subject-object ambiguities in German embedded clauses: An across-the-board comparison. Journal of Psycholinguistic Research 28: 121–144.
Bobaljik, Jonathan David & Dianne Jonas 1996 Subject positions and the role of TP. Linguistic Inquiry 27: 195–236.
Diesing, Molly 1992 Indefinites. Cambridge, MA: MIT Press.
Emonds, Joseph 1976 A transformational approach to English syntax: root, structure-preserving, and local transformations. New York: Academic Press.
Erteschik-Shir, Nomi 1997 The Dynamics of Focus Structure. Cambridge: Cambridge University Press.
Frey, Werner 2000 Über die syntaktische Position der Satztopiks im Deutschen. ZAS Papers in Linguistics 20: 137–172.
Frey, Werner & Karin Pittner 1998 Zur Positionierung der Adverbiale im deutschen Mittelfeld. Linguistische Berichte 176: 489–534.
Haftka, Brigitta 1995 Syntactic positions for topic and contrastive focus in the German middlefield. In Proceedings of the Göttingen Focus Workshop [Arbeitsberichte des Sonderforschungsbereichs 340 Nr. 69], I. Kohlhof (ed.), 1–24. University of Tübingen.
    2003 „Möglicherweise tatsächlich nicht immer“ – Beobachtungen zur Adverbialreihenfolge an der Spitze des Rhemas. Folia Linguistica 37: 103–128.
Höhle, Tilman 1982 Explikationen für „normale Betonung“ und „normale Wortstellung“. In Satzglieder im Deutschen, W. Abraham (ed.), 75–153. Tübingen: Narr.
Kaiser, Elsi & John C. Trueswell 2004 The role of discourse context in the processing of a flexible word order language. Cognition 94: 113–147.
Kratzer, Angelika 1995 Stage-level and individual-level predicates. In The Generic Book, G. Carlson & F. Pelletier (eds.), 125–175. Chicago: University of Chicago Press.
Meinunger, André 1995 Discourse dependent DP (De-)Placement. Ph.D. dissertation, University of Potsdam.
Platzack, Christer 1983 Germanic word order and the COMP/INFL parameter. Working Papers in Scandinavian Syntax 2.
Reinhart, Tanya 1981 Pragmatics and linguistics: An analysis of sentence topics. Philosophica 27: 53–94.
    1995 Interface strategies. OTS Working Papers. Utrecht University.
Steube, Anita 2000 Ein kognitionswissenschaftlich basiertes Modell für Informationsstrukturierung (in Anwendung auf das Deutsche). In Von der Philologie zur Grammatik, J. Bayer & C. Römer (eds.), 213–238. Tübingen: Niemeyer.
Stolterfoht, Britta & Markus Bader 2004 Focus structure and the processing of word order variations in German. In Information structure: theoretical and empirical aspects, A. Steube (ed.). Berlin/New York: Mouton de Gruyter.
374 Britta Stolterfoht, Lyn Frazier and Charles Clifton, Jr. Svenonius, Peter 2002 Subject positions and the placement of adverbials. In Subjects, expletitives, and the EPP, P. Svenonius (ed.), 201–242. Oxford: Oxford University Press.
List of contributors
Sam Featherston, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Wolfgang Sternefeld, Department of Linguistics, University of Tübingen, Wilhelmstraße 19-23, 72074 Tübingen, Germany, [email protected]
Doug Arnold, Department of Language and Linguistics, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom, [email protected]
Inbal Arnon, Department of Linguistics, Stanford University, Margaret Jacks Hall (Building 460), Stanford, CA 94305, USA, [email protected]
Katrin Axel, Department of German Linguistics, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany, [email protected]
Peter Bosch, Institute of Cognitive Science, University of Osnabrück, Albrechtstraße 28, 49069 Osnabrück, [email protected]
Oliver Bott, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Joan Bresnan, Department of Linguistics, Stanford University, Margaret Jacks Hall (Building 460), Stanford, CA 94305, USA, [email protected]
Harald Clahsen, Department of Language and Linguistics, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom, [email protected]
Charles Clifton, Jr., Department of Psychology, Tobin Hall, University of Massachusetts, Amherst, Amherst, MA 01003, USA, [email protected]
Elena Dieser, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Caroline Féry, Linguistics Department, University of Potsdam, Karl-Liebknecht-Str. 24–25, 14476 Golm, Germany, [email protected]
Lyn Frazier, Department of Linguistics, 226 South College, University of Massachusetts, Amherst, Amherst, MA 01003, USA, [email protected]
Wilbert Heeringa, Meertens Institute, Postbus 94264, Joan Muyskenweg 25, 1090 GG Amsterdam, The Netherlands, [email protected]
Thomas Hoffmann, Department of English and American Studies, University of Regensburg, 93040 Regensburg, Germany, [email protected]
Philip Hofmeister, Department of Linguistics, Stanford University, Margaret Jacks Hall (Building 460), Stanford, CA 94305, USA, [email protected]
T. Florian Jaeger, Brain and Cognitive Sciences, University of Rochester, Meliora Hall, Box 270268, Rochester, NY 14627-0268, USA, [email protected]
Anke Karabanov, Department of Women and Child Health, Karolinska Institutet, Stockholm Brain Institute, Solna, 171 76 Stockholm, Sweden, [email protected]
Tanja Kiziak, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Peter König, Institute of Cognitive Science, University of Osnabrück, Albrechtstraße 28, 49069 Osnabrück, Germany, [email protected]
Denisa Lenertová, Institute of Linguistics, University of Leipzig, Beethovenstraße 15, 04107 Leipzig, Germany, [email protected]
Timm Lichte, Emmy Noether Project, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
John Nerbonne, Center for Language and Cognition, University of Groningen, P.O. Box 716, 9700 AS Groningen, The Netherlands, [email protected]
Janina Radó, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Louisa Sadler, Department of Language and Linguistics, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, United Kingdom, [email protected]
Ivan A. Sag, Department of Linguistics, Stanford University, Margaret Jacks Hall (Building 460), Stanford, CA 94305, USA, [email protected]
Christopher D. Sapp, Department of Modern Languages, C-115 Bondurant, University of Mississippi, PO Box 1848, University, MS 38677-1848, USA, [email protected]
Stavros Skopeteas, Linguistics Department, University of Potsdam, Karl-Liebknecht-Str. 24–25, 14476 Golm, Germany, [email protected]
Neal Snider, Department of Linguistics, Stanford University, Margaret Jacks Hall (Building 460), Stanford, CA 94305, USA, [email protected]
Jan-Philipp Soehn, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Ilona Steiner, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Britta Stolterfoht, SFB 441 Linguistic Data Structures, University of Tübingen, Nauklerstraße 35, 72074 Tübingen, Germany, [email protected]
Stefan Sudhoff, Institute of Linguistics, University of Leipzig, Beethovenstraße 15, 04107 Leipzig, Germany, [email protected]
Aline Villavicencio, Institute of Informatics, Federal University of Rio Grande do Sul, Av. Bento Gonçalves, 9500, CEP 91501-970, Caixa Postal: 15064, Porto Alegre, Brazil, [email protected]
Index
acceptability, 17, 25–26, 31, 36, 47, 169, 171–173, 185–187, 192–198, 200–203, 314
accessibility, 1, 4–5, 75–76, 100, 185–189, 191–192, 195–200, 202, 279, 319, 356, 359
activation, level of, 187–191, 195, 198, 202–203
adverb placement, 38, 45, 49, 235–236, 239, 302, 306, 345, 361–366, 368–370
agreement, 7, 9–15, 17–27, 62, 111, 115, 169, 199, 223, 260, 326, 363–364
    closest conjunct ~, 10–27
    gender ~, 9, 13–15, 19–22, 224
    number ~, 11, 16–18
    prenominal vs. postnominal ~, 11–12, 15–17, 19, 21–22, 25–27
    resolution, 10–12, 16–21, 25
anaphora, 117, 119, 121, 189–191, 202–203, 207–208, 212, 222–224, 232, 356
antecedent, 117–118, 121, 177, 180, 187–190, 208–210, 212, 215, 218–220, 222–224, 249, 356
auch, 4, 141, 143, 148–149, 151, 227–235, 238–241, 243–244, 246, 251
bilingualism, 7, 80, 133–137, 144–145, 147–157
    ~ and code-switching, 134, 137, 147
    one system theory, 135–136, 148
binding, 116–122, 224
Binding Theory, 117–118, 122, 224
boundary tone, 325
bridge predicate, 31, 35
c-command, 73, 212, 222–223, 227, 251
CELEX, 81, 104, 317
competence, 202
complexity, 11, 70, 163–164, 174, 176–178, 185, 187–189, 191–192, 198, 262
context, 5, 55, 57–62, 65–68, 73, 75, 78–79, 84–85, 87–92, 102, 114, 136, 143, 153–155, 163, 189–191, 201, 203, 207–208, 210, 227, 229, 233–235, 239, 244–245, 255–257, 260, 263, 278, 302, 305–307, 314, 319–321, 335, 363, 369
coordination, 9–14, 16–26, 243, 341, 343–359
    NP vs. S, 349–351, 353–357
corpus
    historical ~, 31, 40, 46, 301
    ~ of spoken language, 76–77, 102, 104, 115, 319, 343
corpus model, 77–78, 80, 83–84
corpus study, 1, 4–5, 9, 13, 24–25, 34, 46, 53, 63–65, 75, 78, 81, 83, 86, 89, 91, 104, 161–163, 165, 173, 178–180, 185, 227, 229, 238, 243–246, 250, 253–259, 261–263, 300–301, 324, 341–344, 348, 353–358
correspondence hypothesis, 107–108
Cosmas Corpus, 64
dative alternation, 75–78, 83–90
deaccenting, 328–329, 331
dialectology, 267–277, 287
dialectometry, 267–268, 278, 288–289
dialects, 267–269, 271–275, 277–280, 283–284, 287–293, 305–306, 309, 311
    Austrian German, 305, 309, 313
    Low Saxon, 277
    Swabian, 5, 305–307, 309–310, 315–316
discourse, 115, 319, 322, 328–330, 361–364
discourse linking, 186–190, 193, 196–201, 320, 322
discourse referent, 190, 193, 202, 207, 364
Down’s Syndrome, 20, 118–123, 253
downstep, 328–329, 331
downward entailing, 11, 48, 134, 249, 252, 254, 264, 317
Dutch, 108, 227, 250–251, 253, 277, 279, 281, 283, 293, 341, 354–355
Early New High German, 5, 299–301, 303–305, 307, 310–311, 314–316
elicitation task, 102–103, 319–320, 322, 324, 334–336
English, 4–6, 75–76, 84, 100–102, 105–106, 108, 161–162, 169, 178, 185–186, 191, 193, 197, 201, 227, 235, 249, 250, 274, 281, 324–325, 327, 329–331, 334–337, 341–343, 361, 364–366, 368–370
event-related brain potentials, 110–114, 116
experiments, 2, 5–6, 32–36, 38, 47, 56, 61, 71, 73, 75–76, 78–84, 86–87, 90–91, 99, 107–108, 110–116, 119–121, 162, 169–171, 173, 189, 192–195–196, 198–199, 203, 209–210, 213, 221, 223, 234, 238, 240–241, 243–246, 250, 268, 272, 278, 280, 286, 311, 313–315, 317, 320, 322, 343, 364, 366–367, 369
extraction/parenthesis debate, 6, 29–36, 38–49
extraposition, 49, 255, 300, 302–307, 310, 315
eye tracking, 2, 4, 207–209, 212–213, 221, 224
filler-gap dependency, 177–178, 185–187, 191–193, 195–197, 199–201
focus, 4–6, 10–11, 33, 134, 147, 185, 187, 190, 207, 209, 213, 218–219, 222–223, 227–229, 233, 243, 246, 252, 268–269, 288, 293, 299–308, 310–317, 321–329, 331–336, 343–344, 348, 361, 368
    association with ~, 4–5, 26, 31–32, 37, 54–57, 66, 71, 73, 80–99, 102, 105, 108, 141, 143, 148–149, 151, 163, 170–172, 227–246, 251, 274, 294, 320, 348–350, 362, 366
    contrastive ~, 302, 304, 310
    ~ particles, 4, 141, 143, 148–149, 151, 227–235, 238–241, 243–244, 246, 251
    ~-background structure, 319, 334, 361
    multiple ~, 313, 324, 331, 334–335
    narrow ~, 331, 334–335, 361
    new information, 302, 310
Georgian, 6, 324–329, 331, 334–337
German, 4–6, 16, 29–31, 40–42, 45–46, 49, 54, 57, 64, 67, 99, 101–102, 104–110, 112–113, 115–116, 124, 133–134, 137–147, 149–152, 154–157, 187, 201, 210, 221–222, 227–228, 232, 246, 250–251, 254, 259, 263, 278, 299–300, 303, 305, 309, 311, 313–316, 324–327, 329–331, 334–337, 361–365, 368–370
grammaticality, 75, 84–85, 87–91, 110–112, 166, 168, 169, 171–173, 178, 185, 187, 191, 201, 249, 305, 307–314, 317
grammaticality judgments, 75, 84, 91, 311–312, 317
gravity hypothesis, 6, 267–270, 273–274, 278–280, 283–284, 287–294
Greek, 6, 324–329, 331, 334–337
historical data, 5–6, 29, 31, 40, 43, 45–46, 166, 268, 278–280, 283, 293, 299, 305, 315
ICE-GB, 162–164, 169, 171, 173–174, 178
information structure, 4, 6, 232, 243–245, 300, 316, 319–320, 322, 324–325, 327, 335–336, 361–362, 364–365, 369–370
intervention, 26, 65, 176, 186–188, 190, 193–196, 199–200, 251–252, 343
intonation, 71, 228, 236, 240, 243, 300, 303, 325, 327–329
intonational phrase (IP), 109, 329, 362, 364
introspection, 161, 171
intuitions, 7, 53, 56, 75–76, 79, 84, 91, 369
judgments, 2, 5–6, 11, 13, 22, 25, 27, 29, 31–35, 40, 46, 53–55, 57, 59, 62–63, 66, 68, 71, 75, 85, 89, 91, 119, 121, 169–171, 173, 191–193, 198, 200–201, 203, 238–239, 241, 244, 277, 305–309, 312–313, 316, 363
language acquisition, 97–98, 100–101, 103–104, 107, 116, 123, 134–137, 150, 154, 156, 271
language disorders, 7, 97–98, 113–114, 116–119, 121–123
language mixing, 133–137, 148–150, 152–153, 155–156
language production, 76, 134, 139–140, 147, 152–153, 156, 341–343, 357, 358
language separation, 7, 133–134, 136–137, 143, 150, 153–155, 157
Levenshtein distance, 275–278, 280–281
linguistic distance, 267, 275, 277–283, 286–290, 292
linguistic variation, 5, 137, 267–269, 271–275, 277–281, 283–284, 287–293, 300–301, 305–306, 309–311
    ~ and geographic distance, 267, 269, 272, 274–275, 279–280, 283, 285–293
    ~ and population size, 11, 117–118, 267, 269–270, 272–274, 279–280, 283–285, 287–290, 292
    ~ and social contact, 267, 270–273, 279, 284, 286, 292–293
locality, 5, 22, 24, 117–118, 185–187, 191, 193, 200–202
magnitude estimation, 5, 32, 53, 66, 162, 169, 171, 179, 192, 300, 311, 315–316
mapping hypothesis, 362
middlefield, 41, 45, 47–48, 227, 229, 234–241, 363, 369
mixed effect regression models, 78, 81–89, 173
natural usage data, 91
negative polarity item, 4, 249–264
    classification, 250, 253, 260–264
    licensing, 249, 251–255, 259, 262–265
    retrieval, 187, 253
Old High German, 6, 31, 40–49
parallel-structure effect, 343–344, 347–353, 357–359
parsing preferences, 4, 341, 343, 356–357
passive, 116–123, 140, 148, 162, 358
pattern matching technique, 6, 32
phonetic parameters, 228–229, 235, 238–239, 242, 244–245
phonetic segment distance, 275–276, 278, 294
Portuguese, 7, 9–13, 15–19, 22–24, 26
prediction, 115, 177, 193, 195–197, 288
prefield, 45, 227, 229, 232–243, 246
preposition placement, 5, 161–180
probability, role of, 5, 75–85, 91, 106, 173, 209, 212–224, 302
production experiment, 234–235, 240, 242, 244–246, 319, 322
pronoun, 4–5, 36, 44, 47–48, 76, 82, 85–90, 117–118, 120–122, 165–166, 187–190, 207–210, 212–225, 230, 302, 316, 335, 342–343, 355–356
    ~ resolution, 208–209
prosody, 4, 6, 31, 45, 89, 124, 177, 227–229, 233–235, 238, 240–241, 243–245, 319–320, 325, 327–328, 331, 334–337
psycholinguistics, 1, 3–6, 33–36, 38, 47, 55–56, 59, 61, 63, 66, 68, 71–72, 75–76, 78–84, 86–87, 90–91, 97–99, 103, 105, 108, 110–116, 119–120, 123, 162, 166, 168–171, 173, 174, 176, 178, 192–193, 195–199, 208–210, 212–213, 221, 223, 227, 234–236, 238, 240–241, 243–246, 250, 268, 272, 278, 280, 286, 291, 311, 313–315, 317, 319–320, 322, 334, 336, 341–344, 359, 361, 364, 366–367, 369, 371
quantifier, 7, 251
    ~ distributivity, 63–68, 71
    ~ scope, 7, 53–60, 62–71, 73, 190, 228, 240, 249, 251–252, 254–255, 362, 369
questionnaire study, 3, 6, 53, 57, 67, 72, 78–82, 87–89, 186, 192, 201, 208, 279, 313, 319–320, 335
referential expression, 207–210, 212–224
reflexive, 117–118, 120–123
relative clause, 43, 123, 161–163, 165–180, 186, 191, 254, 341
relative clause attachment, 341–343, 354–356
Russian, 133–134, 138–152, 154–157
scrambling, 302–307, 309–310, 315, 361–364, 368–370
self-paced reading, 4–5, 110, 112, 116, 186–187, 189, 192, 198–203, 208, 323, 341–343, 353–354, 358, 365–369
semantic classes, 90, 92
sentence comprehension, 4, 97, 107–112, 115–116, 201, 203, 208, 221, 341–343, 358
Spanish, 11, 23, 105
speech perception, 228, 234, 238, 240, 243, 245
spontaneous speech production, 75, 102, 104, 115, 245, 319, 343
stress, 124, 227–236, 238–244, 246, 301, 308, 310, 315–316, 325, 328–331, 334–336, 362
superiority, 185–187, 190, 195–202
syntactic preferences, 91
topic, 188, 228, 232, 234, 243, 271, 292, 319–320, 322, 324–325, 329, 334–336, 361, 363–365, 368–369
    contrastive ~, 228, 243, 246, 319, 321, 334
    Contrastive ~ Hypothesis, 228, 243, 246
translation equivalents, 7, 135, 139–143, 146–148, 154–156
TÜBA-E, 343–354, 357
tuning hypothesis, 341–342, 348, 353–354, 356–357
TÜPP-D/Z, 254
verb inflection, 100–101, 103–107, 124
verb placement, finite, 6, 29–31, 35, 42–44, 48–49, 108–116, 227, 229
verbal complex, 45, 71, 299–311, 315–316
Visual World Paradigm, 207, 209, 213
Wh-Processing Hypothesis, 185–186, 191–192, 198, 201
wh-questions
    multiple wh-dependencies, 185–189, 191, 193, 198, 200–202
    single vs. double ~, 319–322, 324–327, 329–336
    subject vs. object ~, 193–195, 320–321, 323–328, 335
Williams Syndrome, 118–121
word order, 4, 6, 42, 49, 58, 99, 108–112, 114–115, 169, 185, 245, 300–301, 303–306, 309–311, 313–315, 317, 320, 324–327, 331–336, 361, 364