Rethinking Idiomaticity
Corpus and Discourse
Series editors: Wolfgang Teubert, University of Birmingham, and Michaela Mahlberg, University of Liverpool.
Editorial Board: František Čermák (Prague), Susan Conrad (Portland), Geoffrey Leech (Lancaster), Elena Tognini-Bonelli (Siena and TWC), Ruth Wodak (Lancaster) and Feng Zhiwei (Beijing).
Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community. Consisting of both spoken and written language, discourse always has historical, social, functional and regional dimensions. Discourse can be monolingual or multilingual, interconnected by translations. Discourse is where language and social studies meet. The Corpus and Discourse series consists of two strands. The first, Research in Corpus and Discourse, features innovative contributions to various aspects of corpus linguistics and a wide range of applications, from language technology via the teaching of a second language to a history of mentalities. The second strand, Studies in Corpus and Discourse, comprises key texts bridging the gap between social studies and linguistics. Although equally academically rigorous, this strand will be aimed at a wider audience of academics and postgraduate students working in both disciplines.
Research in Corpus and Discourse
Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Edited by Geoff Barnbrook, Pernilla Danielsson and Michaela Mahlberg
Corpus Linguistics and World Englishes: An Analysis of Xhosa English. Vivian de Klerk
Evaluation in Media Discourse: Analysis of a Newspaper Corpus. Monika Bednarek
Idioms and Collocations: Corpus-based Linguistic and Lexicographic Studies. Edited by Christiane Fellbaum
Working with Spanish Corpora. Edited by Giovanni Parodi
Conversation in Context: A Corpus-based Analysis. Christoph Rühlemann
Studies in Corpus and Discourse
English Collocation Studies: The OSTI Report. John Sinclair, Susan Jones and Robert Daley. Edited by Ramesh Krishnamurthy. With an introduction by Wolfgang Teubert
Corpus Semantics: An Introduction. Anna Čermáková and Wolfgang Teubert
Historical Corpus Stylistics: Media, Technology and Change. Patrick Studer
Text, Discourse, and Corpora: Theory and Analysis. Michael Hoey, Michaela Mahlberg, Michael Stubbs and Wolfgang Teubert. With an introduction by John Sinclair
Rethinking Idiomaticity A Usage-based Approach
STEFANIE WULFF
London • New York
Continuum The Tower Building 11 York Road London SE1 7NX
80 Maiden Lane, Suite 704 New York NY 10038
© Stefanie Wulff 2008
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers. Stefanie Wulff has asserted her right under the Copyright, Designs and Patents Act, 1988, to be identified as Author of this work. British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. ISBN: 978-1-8470-6420-2 (Hardback) Library of Congress Cataloging-in-Publication Data The Publisher has applied for CIP data Typeset by Aptara Printed in the United Kingdom by Biddles, Norfolk
Contents
Acknowledgements
Introduction
1 Theoretical issues
Introduction
1.1 Previous approaches to idioms and idiomaticity
1.1.1 Discourse analysis and the relevance of phraseological units
1.1.2 The collocation-idiom continuum in phraseology
1.1.3 Idiomaticity as a multifactorial concept: Psycholinguistic approaches
1.2 A constructionist approach to idioms and idiomaticity
1.2.1 Theoretical underpinnings
1.2.2 The role of non-compositionality and frequency
1.2.3 A usage-based model
1.2.4 Placing idioms in the constructicon
Conclusion
2 Methodological issues
Introduction
2.1 (Quantitative) corpus linguistics
2.2 Data
2.2.1 Preliminary considerations
2.2.2 The data sample
2.2.3 Experimental data
Conclusion
3 Compositionality
Introduction
3.1 Previous approaches
3.1.1 Non-compositionality approaches
3.1.2 Compositionality approaches
3.1.3 Idiomatic pattern typologies
3.1.4 Corpus-linguistic/computational approaches
3.2 Towards a new approach to compositionality
3.2.1 A pre-test data sample of VPCs
3.2.2 Finding an adequate association measure
3.2.3 (Dichotomous) p-values do not reflect (interval-scaled) association strength
3.2.4 Verb or particle, is that the question?
3.2.5 Interim summary
3.3 A new approach to compositionality
3.3.1 Application
3.3.2 Results
Conclusion
4 Flexibility measures
Introduction
4.1 Previous approaches
4.1.1 Theoretical approaches
4.1.2 Psycholinguistic approaches
4.1.3 Corpus-linguistic approaches
4.1.4 Kinds of flexibility
4.2 A new approach to flexibility
4.2.1 The baseline
4.2.2 Measure I: An extension of Barkema (1994a)
4.2.3 Measure II: Entropy
4.2.4 Measure II, version B: Directional entropy
4.3 Tree-syntactic flexibility (SF)
4.3.1 Application
4.3.2 Results
4.4 Lexico-syntactic flexibility (LF)
4.4.1 Application
4.4.2 General lexico-syntactic flexibility
4.4.3 Flexibility of the NP-slot
4.4.4 Flexibility of the verb-slot
4.5 Morphological flexibility (MF)
4.5.1 Application
4.5.2 Flexibility of the NP-slot
4.5.3 Flexibility of the verb-slot
4.6 Evaluation
Conclusion
5 The idiomatic variation continuum
Introduction
5.1 Method: Principal Component Analysis
5.2 How idiomatic variation parameters cluster
Conclusion
6 The idiomaticity continuum
Introduction
6.1 Method: Multiple regression analysis
6.2 Assessing the relevance of idiomatic variation parameters for idiomaticity judgements
Conclusion
7 Towards a new model of idiomaticity
Appendices
A. Example of a questionnaire for the elicitation of perceived idiomaticity data
B. Results for compositionality
C. Barkema-based flexibility values for tree-syntactic, lexico-syntactic and morphological flexibility
D. Entropy-based flexibility values for tree-syntactic, lexico-syntactic and morphological flexibility
E. Barkema-based flexibility values for the parameter levels of tree-syntactic, lexico-syntactic and morphological flexibility
F. An overview of the formal flexibility parameters and their parameter levels with corresponding examples
G. Scatterplots displaying the correlation between the two flexibility measures (Barkema-based = ‘B’, entropy = ‘H’)
References
Author Index
Subject Index
Acknowledgements
This book grew out of my Ph.D. thesis, which was generously funded by the University of Bremen, Germany. I consider myself very fortunate to have had many people around me who supported me throughout the process of writing this book. I am deeply indebted to my supervisor, Anatol Stefanowitsch. His confidence in my abilities kept me going, and he gave the project decisive impulses at the right time, which helped me finish. Especially in the planning stages of the project, I benefited from discussions with my colleagues Arne Zeschel and Kerstin Fischer. Holger Diessel and Stefan Müller also offered constructive feedback, which I greatly appreciated. A special thanks goes to Ewa Dąbrowska and Philip Shaw at the University of Sheffield for distributing questionnaires in their classes. For most of 2006, the Linguistics Department at the University of California at Santa Barbara provided me with a friendly and stimulating atmosphere to wrap things up. In spring 2007, the English Language Institute at the University of Michigan became my new academic home. Many people there, among them Nick Ellis, Gregory Garretson, John Swales and Ute Römer, gave insightful comments on the final draft of the book manuscript. My gratitude also goes to my family. The loving and unconditional support from my mother is simply beyond words. Marianne is the best vice-mum and friend I could ever hope to have. And my brother Christian was always available for uplifting conversations about the ‘really important things in life’. A special thanks goes to Stefan Th. Gries for his advice on the statistics, and for affording meticulous feedback in every stage of the project. It is wonderful to have such a distinguished colleague and best friend in one person. Finally, I am forever grateful for the music of people like Norah Jones, Ray Lamontagne and my very favourite U2. It makes manual data coding a much more bearable task. I dedicate this book to my father, who loved language as much as I do.
Introduction
When presented with phrases such as see a point, take the plunge or write a letter, native speakers of English can readily identify differences between them with respect to their idiomaticity: take the plunge is undoubtedly more idiomatic than write a letter. What exactly is it about take the plunge, as opposed to write a letter, that guides a speaker’s decision that the former is more idiomatic than the latter? That is the question addressed in this book. The study presented here is by no means the first to address this issue; rather, it builds upon previous research in various linguistic disciplines, the most relevant findings of which I lay out in more detail in Chapter 1. In early studies, idiomaticity was equated with non-compositionality, which means that the meanings of the parts of a phrase do not add up to the meaning of that phrase. More recent research has shown that it is not possible to draw a sharp dividing line between idioms and non-idioms on the basis of this criterion. Instead, non-compositionality appears to be a matter of degree. For instance, native speakers were shown to be able to distinguish at least three different degrees of compositionality, with V NP-constructions like see a point occupying the middle ground between take the plunge, which is judged highly non-compositional, and write a letter, which is judged highly compositional. Psycholinguistic studies, lexicographic research and, most importantly, cognitively oriented linguistics furthermore suggest that idiomaticity cannot be reduced to (degrees of) non-compositionality. Speakers also rely on other characteristics of a phrase to assess its overall idiomaticity, such as its formal fixedness or degree of conventionalization. For one, the perceived idiomaticity of a construction tends to correlate highly with the acceptability of modified variants of that construction. Consider the examples in (1) and (2).
1. (a) I see your point.
   (b) I see your important point.
2. (a) He took the plunge.
   (b) He took the plunge slowly.
While native speakers tend to agree that (1b) is an acceptable variant of (1a), there is less agreement with respect to (2a) and (2b). Their idiomaticity judgements for phrases like these correspond to these differences in acceptability, such that the more acceptable they consider a modification, the less idiomatic they consider the phrase to be. Speakers’ idiomaticity judgements were also shown to correlate with the phrases’ syntactic flexibility, that is, the extent to which a construction licenses syntactic variations such as passivization. For example, pop the question (‘to propose marriage’) is generally considered less idiomatic than kick the bucket (‘die’) – at the same time, the former can occur in the passive variant, but the latter cannot (without losing its idiomatic interpretation), as illustrated in (3) and (4).
3. (a) He popped the question.
   (b) The question was popped by him.
4. (a) He kicked the bucket.
   (b) *The bucket was kicked by him.
This interplay between formal and semantic characteristics of idiomatic constructions raises the question of whether, and to what extent, each of them is actually part of a speaker’s concept of idiomaticity. The theoretical framework of the present study is Construction Grammar (cf. Section 1.2 for a brief introduction). While the notion of construction embraces morphemic, lexical and multi-lexemic entities in Construction Grammar, the present study focuses on phrasal constructions. More specifically, given their high prominence in idiomaticity-related research, V NP-constructions were selected as the primary body of data, for example, phrases like make a point, take the piss or leave a mark. With regard to complex constructions, a constructionist perspective suggests that the single necessary condition a complex expression has to meet in order to qualify as a construction is that it be a conventionalized multi-lexemic expression – once a complex expression is assigned construction status, it will automatically be positioned somewhere on an idiomaticity continuum. By conventionalization, I mean that the specific combination of words constituting the phrase, regardless of the specific variant forms the individual component words take on, has to be sufficiently frequent. (Problematic aspects of the frequency criterion with regard to corpus-based analyses will be discussed in Section 2.2.1.) In accordance with the observation that idiomaticity is a scalar and complex phenomenon, the term (core) idiom is used only for those constructions that cluster around the high end of an idiomaticity continuum. Idiomaticity, in turn, is not a category name for a construction that stands in opposition to other constructional categories, but a characteristic that is inherent in any construction to some extent.
Any construction can be positioned on this idiomaticity continuum, since it covers both highly idiomatic constructions (that is, those constructions most commonly referred to as core idioms) and literal, freely recombinable constructions. The difference between constructions qualifying as core idioms and other more or less idiomatic constructions is a matter of degree.
The present study presents corpus-linguistic definitions of all factors that were proposed to contribute to idiomaticity. On the basis of these definitions, it develops a model of this complex meta-concept that is the first to be based exclusively on corpus data, rather than being merely applied to corpus data. In other words, the independent variables, such as compositionality or formal flexibility, that (potentially) determine the dependent variable, idiomaticity, are not simply measured in the corpus on the basis of some corpus-external definition – rather, their definitions are determined in a bottom-up fashion, such that they are based on the distributional characteristics observable in the corpus data as far as possible. In view of the multitude of psychological and psycholinguistic studies that have successfully elicited idiomaticity rankings for a variety of idiomatic variation parameters, it may appear that corpus data are not really needed to provide insights into the distributional characteristics of idiomatic constructions. So what can be gained by a corpus-linguistic approach to idiomaticity of this kind? Actually, the conception of idiomaticity assumed here entails that idiomaticity is not really ‘in the corpus’. The nature of idiomaticity to be outlined in more detail in the following sections also rules out that it is observable in the same way as other multifactorially determined linguistic phenomena are. There is nothing in the corpus that uniquely represents idiomaticity that one could detect and count in the same way one can count, say, the number of occurrences of pied-piped versus canonical verb particle clauses when investigating particle placement. Idiomaticity in its entirety is a purely psychological construct, which is only real in the head of a speaker. So technically speaking, the dependent variable of the present study, idiomaticity, is not manifest in the corpus data themselves.
Given the high success rate of psycholinguistic approaches to different independent variables establishing idiomaticity, and the fact that the dependent variable itself is a psychological construct anyway, the sceptic may feel inclined to ask: Why turn to corpus data at all? In a nutshell, the answer to this question is: because from a usage-based perspective as adopted here, that is exactly what speakers also do. As outlined in Section 1.2.3, a usage-based model of language hypothesizes that, in principle, all linguistic behaviour is determined by the speaker’s linguistic environment, that is, his or her linguistic input and output. Large-scale, balanced corpora like the British National Corpus (BNC) can be regarded as the best representation of a speaker’s linguistic environment that is available today. With respect to idiomaticity judgements, a usage-based approach hypothesizes that speakers monitor the distributional characteristics of the phrase in question at different levels, such as morphology, syntax and semantics, and, by weighting the distributional information obtained in a particular way, arrive at a judgement of overall idiomaticity. The weighting itself, the idiomaticity formula, so to speak, is not directly retrievable from the corpus – yet the distributional properties of the phrase that enter into this formula are provided by the corpus.
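The idea of an ‘idiomaticity formula’ can be illustrated with a minimal sketch. It assumes, purely for illustration, that the weighting is a linear combination of parameter values. The parameter names, values and weights below are hypothetical and are not taken from the study, which determines the relevance of each parameter empirically (Chapter 6 uses multiple regression analysis).

```python
from typing import Dict


def idiomaticity_score(parameters: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Combine corpus-derived parameter values into a single score.

    Assumption: a simple weighted sum. The actual weighting is unknown
    and would have to be estimated from judgement data.
    """
    missing = set(weights) - set(parameters)
    if missing:
        raise KeyError(f"no value for parameter(s): {sorted(missing)}")
    return sum(weights[name] * parameters[name] for name in weights)


# Hypothetical values on a 0-1 scale (higher = more compositional/flexible).
take_the_plunge = {"compositionality": 0.2, "syntactic_flexibility": 0.3}
write_a_letter = {"compositionality": 0.9, "syntactic_flexibility": 0.8}

# Hypothetical weights: negative, because higher compositionality and
# flexibility are expected to lower perceived idiomaticity.
weights = {"compositionality": -0.6, "syntactic_flexibility": -0.4}

scores = {
    "take the plunge": idiomaticity_score(take_the_plunge, weights),
    "write a letter": idiomaticity_score(write_a_letter, weights),
}

# Rank the phrases from most to least idiomatic under these assumptions.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Under these made-up values, take the plunge is ranked above write a letter, in line with the intuition described at the start of this Introduction. The corpus supplies the parameter values; the weights are what the judgement data are needed for.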
Corpus linguistics in general, and quantitative corpus-linguistic studies in particular (cf. Section 2.1), are attracting growing attention for their ability to model multifactorial processes through a combination of large-scale data samples and analytical statistics specifically geared towards multifactorial phenomena (Gries 2003, Diessel and Tomasello 2005, Wulff 2003). To provide an example, Wulff (2003) presents an approach to prenominal adjective order in English that is corpus-based and multifactorial in nature. As with idiomaticity, several syntactic, semantic and pragmatic factors were proposed in previous studies to influence the preferred ordering of adjectives, such as the perceived nouniness of the two adjectives in question, their semantic closeness to the head noun and their overall frequency. For a total of eight different factors, Wulff develops corpus-linguistic definitions, which are fed into a Linear Discriminant Analysis (LDA) that determines how large the impact of each variable is on the ordering of two adjectives when all variables are taken into consideration simultaneously. The LDA serves as a model of speakers’ unconscious weighting of the different variables when deciding on the preferred order of two particular adjectives, yielding a prediction accuracy of 73.5 per cent. A corpus-linguistic approach is taken to be the most adequate and elegant way of defining the independent variables that potentially contribute to idiomaticity because it emulates what speakers actually do to the maximum extent possible. In judgements of overall perceived idiomaticity, it appears that those different variables are assessed subconsciously, so it is difficult to conceive of a valid way to elicit the weightings of the different parameters experimentally.
This is not to downplay, however, the major challenge that resides in developing a corpus-linguistic definition of a factor that accords with what is already known about this factor from, say, other psycholinguistic studies. One factor that is notoriously difficult to grasp, for instance, is compositionality. Neither do I want to claim that the corpus-linguistic definitions I present here could not be improved – as a matter of fact, it lies at the heart of the present study to point out how different methodological decisions at the various stages of developing a corpus-linguistic definition influence the results, and sometimes fundamentally so. More often than not, more than one alternative definition and/or measure is presented. When reference is made to the different factors that were proposed to contribute to idiomaticity, such as compositionality or syntactic flexibility, the generic term I will use for them is ‘variation parameter’ (see Note 1). Neither any individual idiomatic variation parameter, nor a specific combination, nor the total sum of them, is referred to as idiomaticity. Instead, the term idiomaticity is reserved to refer to the psychological construct of a quality that speakers create on the basis of the different idiomatic variation parameters (and just how they do this is one of the central questions approached in this study). Consequently, we also have to draw a distinction between two different continua: the idiomaticity continuum is what speakers construct on the basis of the information about constructions that they consider salient to help them assign the construction a specific position on the continuum; the idiomatic variation continuum is defined by the range of values that constructions actually take on with respect to their semantic and syntactic behaviour. From the cognitive-linguistic/constructionist perspective adopted here, it is reasonable to assume that speakers will not rely on all distributional characteristics a particular construction exhibits on all its dimensions to the same extent when judging the idiomaticity of a particular construction. It is possible that some parameters are more important than others, and/or that particular values on particular parameters are differently important. Also, different combinations of values of particular parameters may result in different idiomaticity judgements. To conclude, constructions are assumed to take on a potentially much larger number of values with respect to the different idiomatic variation parameters than actually enter into the assessment of idiomaticity as the overall quality of the construction. The idiomatic variation continuum represents what is available in the linguistic environment of a speaker. The idiomaticity continuum represents what speakers do with this input, highlighting what they pick out as salient characteristics of an idiomatic construction. With regard to the idiomatic variation parameters that are considered in the present study, the largest common denominator was drawn from the review of literature. From a constructionist perspective, any construction is a form-meaning pair, and so it can be expected that the overall idiomaticity of a construction manifests itself both at the semantic and at the formal level. The primary semantic feature that has been argued to contribute to idiomaticity is compositionality.
In this context, it is important to emphasize once again that although non-compositional expressions may be prototypical instances of constructions, such that non-compositional constructions are more readily identified as constructions, even perfectly compositional expressions qualify as constructions – they only differ from non-compositional items on this parameter. That is, non-compositionality constitutes a sufficient criterion for construction status, yet it is not a necessary condition like the frequency condition. With respect to their formal behaviour, phrasal expressions are flexible (or frozen) to different extents. What exactly formal flexibility entails with regard to V NP-constructions is discussed in more detail in Chapter 4. In line with the data-driven approach adopted here, all forms of formal variation that could be observed in the corpus data were taken into account. This bottom-up procedure yielded a categorization into aspects of syntactic flexibility (as exemplified in (3) and (4)), lexico-syntactic flexibility (lexical insertions as exemplified in (1) and (2)), and morphological flexibility, that is, phrases’ variability with respect to the morphological variations they exhibit (in terms of tense, aspect, negation, etc.). The fundamental distinction made here between idiomatic variation and idiomaticity turns the question of what reflects idiomaticity into the question of which of the idiomatic variation parameters proposed in the literature actually play a role in speakers’ conceptualization of idiomaticity, and how important these are relative to each other. To address this question, the corpus-linguistically defined idiomatic variation parameters are complemented with idiomaticity judgements obtained from native speakers of English (cf. Section 2.2.3).
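One way to turn the notion of flexibility into a number is to look at how evenly a phrase’s corpus occurrences are spread over its possible variant forms. The sketch below uses Shannon entropy, scaled to the range 0 to 1, for this purpose. Chapter 4 lists an entropy-based measure, but its exact definition is not given in this part of the book, so the function, its name and the toy tense labels are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Iterable


def normalized_entropy(variants: Iterable[str], n_levels: int) -> float:
    """Shannon entropy of the observed variant distribution, scaled to 0-1.

    variants: one label per corpus occurrence (hypothetical coding).
    n_levels: number of variant forms that are possible in principle.
    """
    counts = Counter(variants)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")
    if n_levels < 2 or len(counts) > n_levels:
        raise ValueError("n_levels must be >= 2 and cover all observed variants")
    # H = sum over variants of p * log2(1 / p)
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    # Divide by the maximum possible entropy, log2(n_levels).
    return entropy / math.log2(n_levels)


# Toy data: a phrase attested in one tense only versus one that is
# spread evenly over four tense/aspect forms.
frozen = ["past"] * 10
flexible = ["past", "present", "future", "perfect"] * 5

print(normalized_entropy(frozen, n_levels=4))
print(normalized_entropy(flexible, n_levels=4))
```

A phrase that always occurs in the same form receives the minimum value of 0, and a phrase whose occurrences are spread evenly over all possible forms receives the maximum value of 1. Scaling by a fixed number of possible forms, rather than by the number of forms actually observed, keeps values comparable across phrases.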
In bringing together the corpus-linguistic definitions of the idiomatic variation parameters that have been argued to contribute to overall idiomaticity and the experimentally elicited idiomaticity rankings, the study takes a major step towards an empirical answer to the following questions:
• What does a comprehensive, usage-based model of the idiomatic variation continuum look like? That is, what picture of the distributional characteristics of idiomatic constructions emerges once we take more than one idiomatic variation parameter into account? Where does the bottom-up definition of the different idiomatic variation parameters produce results that are compatible with theoretical claims and previous psycholinguistic findings, where does it depart? To what extent does any given idiomatic construction deviate from a general trend observable for V NP-constructions? How do the different variation parameters relate to each other?
• What reflects idiomaticity from a usage-based perspective? That is, which of the idiomatic variation parameters can be shown to contribute to speakers’ perceived idiomaticity of a phrase once we take all idiomatic variation parameters into account? How important are the different idiomatic variation parameters relative to each other? Which ones are most decisive for the overall idiomaticity judgement?
A final remark concerning falsifiability is in order. It has to be emphasized that the distinction between idiomatic variation in performance data and speakers’ corresponding intuitions on idiomaticity entails that the idiomaticity judgements are not the gold standard against which the quality of the corpus-linguistic definitions is determined. If it can be shown that the corpus-linguistic definitions of the different idiomatic variation parameters actually account for speakers’ idiomaticity judgements to a considerable extent, it is plausible to assume a causal relationship between the two, and it is an indication that the corpus-linguistic definitions of the idiomatic variation parameters are accurate.
However, if a particular idiomatic variation parameter does not correlate substantially with the idiomaticity judgements, this does not necessarily say anything about the accuracy of the corpus-linguistic definition of this parameter per se: the definition may well be perfectly accurate, but the parameter may simply not figure in speakers’ intuitions. This does not mean, however, that the corpus-linguistic definitions of the idiomatic variation parameters are not falsifiable at all. They are falsifiable in two ways: first, if it turns out that all idiomatic variation parameters in toto do not relate to the idiomaticity judgements in general, the validity of the corpus-linguistic definitions presented here has to be called into question. Second, we can assess the quality of the corpus-linguistic definitions by checking to what extent the idiomaticity model that emerges from them is compatible with established facts about idiomaticity. For instance, variation parameters like compositionality or syntactic flexibility were confirmed to be strongly associated with idiomaticity in various studies, so if the results of the present study do not suggest any substantial impact of these parameters on the idiomaticity judgements, this must be taken as a strong indication that the corpus-linguistic operationalizations of these parameters are flawed. This book is structured as follows: Chapter 1 starts out with a brief introduction to the theoretical framework underlying the present approach, Construction Grammar. In Chapter 2, it is argued that quantitative corpus linguistics is particularly useful for the investigation of idioms from a constructionist perspective. This chapter also describes the extraction of the corpus data from the British National Corpus and how the corresponding idiomaticity judgements were elicited. Chapter 3 is devoted to a corpus-linguistic definition of compositionality, which is tested first on the basis of verb particle constructions and then applied to the V NP-data. In Chapter 4, different measures of formal flexibility are presented, covering aspects of tree-syntactic, lexico-syntactic and morphological flexibility. Chapter 5 addresses the question of how the various variation parameters jointly characterize a continuum of idiomatic variation in the performance data. This idiomatic variation continuum is matched against the judgement data in Chapter 6 to identify those parameters that actually figure in speakers’ concept of idiomaticity. Chapter 7 concludes with a summary of the major findings of the study and discusses how they may be implemented into a schematic representation of the mental lexicon. Every chapter begins with a short introduction and is rounded off with a concluding section.
Note
1. Sometimes, the term ‘flexibility parameter’ is used instead – this term is narrower because it does not refer to semantic variation (i.e. different degrees of compositionality). Note how ‘flexibility’ evokes a gradable concept, that is, it implies the existence of some standard of comparison, which ‘variation’ does not; accordingly, ‘flexibility’ is mostly used in contexts in which a particular value is compared against a baseline value.
Chapter 1
Theoretical issues
Introduction
In early generative approaches, idiomaticity was equated with non-compositionality, and it was regarded as a binary concept, dividing language into idioms and non-idioms. Idioms were assigned only a marginal status in language. However, Section 1.1 summarizes findings from studies in discourse analysis, phraseology, and psycholinguistics that have shown that idiomaticity is better conceived of as a phenomenon that is
• multifactorial in nature, that is, comprising more than compositionality only, but very likely also factors such as different kinds of flexibility;
• scalar in nature, such that different constructions in a language can be differently idiomatic, creating a continuum ranging from clearly non-idiomatic patterns to core idioms;
• a feature that, given its omnipresence, cannot be regarded as marginal, but deserves a central position in any grammatical theory.
Section 1.2 briefly introduces the framework underlying the present study, Construction Grammar, and illustrates how this framework can straightforwardly and elegantly integrate established findings from the above-mentioned studies.
1.1 Previous approaches to idioms and idiomaticity
The term idiom has basically two meanings; one meaning refers to ‘the ability to speak a fluent and appropriate version of a language’ (Grant and Bauer 2004:39), which is also referred to as ‘native-like selection’ (Pawley and Syder 1983:191). With respect to the second meaning, which is the one of interest here, a widely quoted definition can be found in the Oxford English Dictionary:
A form of expression, grammatical construction, phrase etc., peculiar to a language; a peculiarity of phraseology approved by the usage of a language, and often having a significance other than its grammatical or logical one. (OED 1989 s.v. idiom)
This definition obviously does not provide a precise and watertight account of idiomaticity, since the concept is only vaguely paraphrased. As this section will show, this vagueness is probably due to the fact that the picture of idiomaticity that emerges from linguistic research is far too complex and unsettled to find its way into an unambiguous and crisp dictionary entry. The early definition of idioms as units which display phrase-like behaviour in some respects but word-like behaviour in others, paired with the predominance of generative grammar throughout most of the twentieth century, relegated them to the margins of linguistics (Sonomura 1996:28). More recently, however, the generative-transformational paradigm with its sharp distinction between syntax and the lexicon, its primary emphasis on syntax and relative neglect of semantics for an adequate description of language, and its claim that the core grammar of the human language faculty is actually innate, has triggered a variety of critical responses from the fields of linguistics, psycholinguistics and psychology. Given the diversity of perspectives and the fact that their motivations only partially overlapped, it is difficult to provide a brief summary of the major consequences that these studies have had for our understanding of idioms and idiomaticity – as a matter of fact, as perspectives on idioms and idiomaticity have diversified, so have definitions of these terms. However, one can reasonably group most studies into major streams of research that have contributed to our changed understanding of idioms and idiomaticity. They are grounded in the perception that semantic, pragmatic, and functional aspects of language are hugely underrated in generative grammar, and that these aspects actually deserve a central role in linguistic description and theory.
The present section will briefly introduce recent developments of idiomaticity-related research in discourse analysis, phraseology, and most importantly, cognitive linguistics, which have ascribed idioms and idiomaticity a much more central role. A comprehensive presentation of all these different approaches is not relevant here; moreover, there are many excellent overviews of the chronological development of idiomaticity and phraseology research, so there is no need to recap those here either. Instead, this section briefly fleshes out the major changes in the conceptualization of idiomaticity in order to establish a basis for the model developed here. Therefore, the discussion is deliberately selective and oversimplifying (for a comprehensive review of the history of studies on idiomaticity, cf., e.g. Sonomura (1996: chapter 3) and Moon (1998)). Wray (2002: part I) is a recent overview of definitions and models on formulaic phrases in general (cf. also Cowie and Howarth (1996) for a select bibliography on phraseology). The three most important changes in the view of idioms and idiomaticity concern the following three issues:
• How much of language can be referred to as idiomatic? That is, can only phrases be idiomatic, or are lexical items idiomatic, too? Are only idioms idiomatic, or is idiomaticity a property that transcends the boundaries of core idioms and actually characterizes most, if not all of language, to some extent?
• If we allow for different kinds of lexical and phrasal items to exhibit certain degrees of idiomaticity, the ultimate question is what kind of theoretical model can handle this continuum, and how core idioms relate to other kinds of idiomatic constructions within that model.
• What reflects idiomaticity in the first place? Is it founded only on non-compositionality, or do we have to take other variation parameters like lexico-grammatical fixedness, transformational deficiency, etc. into account?
1.1.1 Discourse analysis and the relevance of phraseological units Discourse analysis, the analysis of language beyond the syntactic level, experienced a strong revival in the 1970s and 1980s. Theories focused on the functional, cultural, and interactional properties of language and the direct consequences they have for the shape that language takes, e.g. Conversation Analysis (Sacks et al. 1974), Interactional Sociolinguistics (Schiffrin 1994) and Variation Analysis (Labov 1972). The major accomplishment of these approaches with regard to idiomaticity is that they all emphasize that phraseological units are not a marginal phenomenon in language. In contrast, they show that phraseological units, which may be idiomatic to different extents, are highly prominent and therefore indispensable units of a language. These multi-word expressions are labelled formulas (Pawley and Syder 1983), prefabricated language (Hakuta 1986), conversational routines (Hymes 1962, Coulmas 1981), scripts (Ellis 1984) or non-propositional speech (Van Lancker 1975). Studies range from very specific and detailed analyses of idiomatic constructions (e.g. Drew and Holt (1995) on the function of idiomatic phrases as topic termination or transition markers) to large-scale attempts to devise a function-based taxonomy of formulaic sequences. Nattinger and DeCarrico (1992), for example, present a threefold functional taxonomy that categorizes different formulaic sequences as ‘social interactions’, ‘necessary topics’ and ‘discourse devices’, while Aijmer (1996) presents a functional taxonomy specifically for conversational routines. While many formulaic sequences can be ascribed a discourse-functional or social role, it has also been pointed out that by no means all formulaic sequences can be described functionally. Cowie (1988) distinguishes ‘formulae’ (which have discourse-functional or interactional roles) from what he refers to as ‘composites’, which ‘function as constituents in sentences [. . .] 
and contribute to their referential, or propositional meaning’ (Cowie 1988:134). Moon’s (1992) category of ‘informational fixed expressions’ serves the same purpose. None of these approaches is specifically geared towards developing a model of grammar. Most of them are descriptive rather than explanatory in nature, a limitation that has drawn criticism, as in Wray’s (2002:53) critical stance towards Nattinger and DeCarrico’s (1992) functional taxonomy. However, they substantially contribute to grammatical theorizing by pointing to the relevance of phraseological units in actual communication. Any grammatical theory that strives to
be compatible with actual observational data has to be able to integrate and account for these items. Ultimately, the growing recognition of the relevance of phraseological units has also stimulated discussions about the possibility of accounting for them in more recent generative approaches, for example Culicover (1999) on phraseologisms such as had better or not-topics, or Jackendoff (1997) on the ‘time’ away construction as exemplified in We’re twisting the night away. This increasing attention to idioms may have far-reaching consequences; as Gries (2008) comments, the acknowledgment of phraseologisms as theoretically relevant units begins ‘to undermine the modular organization of the linguistic system’ and raises awareness of ‘subtle interdependencies on different levels of linguistic analysis’, a fact that has long been recognized in cognitive-linguistic approaches to grammar (cf. Section 1.2).
1.1.2 The collocation-idiom continuum in phraseology The second large group of studies that contributed to our understanding of idioms and idiomaticity comes from the field of phraseology itself. In early generative approaches, idioms were mostly treated as exceptions which had to be stored in a separate part of the lexicon, a ‘phrase-idiom-part’, which was different from the ‘lexical part’ (Katz and Postal 1963). Phraseological models, in contrast, do not assign idioms a special status outside of any established category, but instead regard idioms as a subcategory of multi-word units: ‘All idioms are formulas but not all formulas are idioms (in the strict sense of a construction with an unpredictable meaning or irregular form); most are not idioms’ (Pawley 1985:89). Moreover, the boundaries between idioms, collocations, and other multi-word units are fuzzy; they are seen as overlapping to some extent on a continuum of fixed expressions (Cowie and Mackin 1975, Cowie et al. 1983, Alexander 1984, Carter 1987, Nattinger and DeCarrico 1992). This has several important consequences for a definition of the term idiomaticity. First, since idiomaticity is a term that tries to capture the idiosyncrasies of all multi-word units on such a continuum, the term no longer covers only aspects of non-compositionality, but also embraces formal fixedness. This is reflected in Fernando and Flavell’s (1981:17) criteria for idiomaticity:
• the meaning of the idiom is not the result of the compositional function of its constituents;
• an idiom is a unit that either has a homonymous literal counterpart or at least individual constituents that are literal, though the expression as a whole is not interpreted literally;
• idioms are transformationally deficient in one way or another;
• idioms constitute set expressions in a given language;
• idioms are institutionalized.
Secondly, idiomaticity is no longer a property of core idioms alone: both non-compositionality and formal fixedness can be present to different degrees in a
given multi-word expression. Ultimately, this leads to a view of idioms as a subset of collocations, with non-compositionality being a secondary idiomatic variation parameter. The overarching parameter that all phrases on the idiomaticity continuum share is some restriction in terms of their degree of variability. As Fernando (1996) puts it: Idioms and idiomaticity, while closely related, are not identical [. . .] In sum, while habitual co-occurrence produces idiomatic expressions, both canonical and non-canonical, only those expressions which become conventionally fixed in a specific order and lexical form, or have only a restricted set of variants, acquire the status of idioms. In other words, idioms (in the sense of more or less idiomatic expressions) are defined as ‘conventionalized multi-word expressions often, but not always, non-literal’ (Fernando 1996:38); as Wray (2002:34) puts it: ‘An alternative use of the transparency and regularity gauge might be in subcategorizing types of formulaic sequence. In other words, the feature ±idiom could be a defining variable in a typology of formulaic sequences along a continuum from fully bound to fully free’. Consider Fernando’s (1996:32) continuum of multi-word units as shown in Table 1.1. In Table 1.1, idioms and collocations are still conceived of as being two different lexical types, but they are related in that they only differ in terms of their degree of variability. Compositionality does not sufficiently discriminate between the two: both literal and non-literal expressions occur both in the idiom and the habitual collocations column; conversely, only variable items occur in the habitual collocations column. Another continuum model proposed by Howarth (1998:28) is very similar to the one proposed by Fernando (1996) in that it also calls upon semantic and syntactic information: he distinguishes between free combinations, restricted collocations, figurative idioms and pure idioms. 
While the non-idiomatic end of the continuum is mainly characterized by formal fixedness, the idiomatic end of the scale is determined by non-compositionality of meaning. In sum, the major contribution of phraseology is to build a bridge between the recognition of the importance of formulaic language in approaches to discourse on the one hand and grammatical theories on the other by addressing the question how idiomatic phrases relate to other kinds of multi-word units in a phraseological model.
Table 1.1 Fernando’s (1996:32) classification scheme for multi-word expressions.
Idioms
I    Pure idioms
     Invariant, non-literal: devil-may-care, backlash, red herring
     Restricted variance, non-literal: take/have forty winks, get/have cold feet
II   Semi-literal idioms
     Invariant: drop names, catch fire, kith and kin
     Restricted variance: good morning/day, blue story/joke/gag/comedian
III  Literal idioms
     Invariant: in sum, in the meantime, arm in arm
     Restricted variance: opt in favour of/for, for example/instance
IV   Literal idioms
     Restricted variance, optional elements: abstain (from), worse (even), worse (still)
Habitual collocations
I    Restricted variance, semi-literal: catch the post/mail, thin/flimsy excuse
II   Restricted variance, literal: addled brains/eggs, for certain/sure
III  Unrestricted variance, semi-literal: catch a bus/plane/ferry etc., run a business/company etc.
IV   Unrestricted variance, literal: beautiful/lovely/sweet etc. woman, glowing rosy etc. cheeks
V    Restricted variance, literal, optional elements: shrug (one’s shoulders), nod (one’s head)

1.1.3 Idiomaticity as a multifactorial concept: psycholinguistic approaches
As already suggested, early generative approaches equated idiomaticity exclusively with non-compositionality: ‘The essential feature of an idiom is that its full meaning, and more generally the meaning of any sentence containing an idiomatic stretch, is not a compositional function of the meanings of the idiom’s elementary grammatical parts’ (Katz and Postal 1963:275, also Weinreich 1969:26). However, even early generative studies point towards the difficulty in making a categorical distinction between what is idiomatic and what is not (cf. Fraser (1966:59, n. 3) on the difficulty of obtaining unanimous judgements, and Cowie and Mackin (1993:ix) and Gibbs (1994: Ch. 5–6) for discussion). A growing number of psycholinguistic studies are devoted to the question: How do we comprehend what idioms mean? And consequently: How are idioms acquired? Native speaker intuitions obviously accord strongly with those phraseological models that postulate that not all idioms are totally non-compositional, but that many of them are partially motivated and analysable, and that they differ in the extent to which they are analysable (e.g. Cacciari and Glucksberg 1991, Gibbs 1992, 1993, Gibbs and Nayak 1989); these studies will be discussed in somewhat more detail in Section 3.1.2 in the context of defining compositionality corpus-linguistically.
Beyond strengthening the hypothesis that analysability is a scalar phenomenon, several studies have drawn a connection between the different degrees of analysability and speakers’ intuitions about other properties of an idiom. For instance, Gibbs and Nayak (1989) demonstrate that native speakers judge idioms ranking higher in compositionality as more syntactically flexible than non-compositional idioms. In a similar fashion, Gibbs et al. (1989) argue that degrees of semantic analysability influence speakers’ intuitions about lexical flexibility, such that, e.g. button your lips can be altered into fasten your lips without the idiomatic meaning being lost, because button your lips is a relatively compositional phrase. Punt the bucket instead of kick the bucket, in contrast, no longer means ‘to die’, which is explained by the fact that kick the bucket cannot be semantically decomposed. McGlone et al. (1994) find a correlation between analysability and semantic productivity, that is, the possibility of creating new but related idiom meanings by substituting component parts. For instance, speakers readily interpret shatter the ice as a variant of break the ice with the slightly different meaning ‘to break down an uncomfortable and stiff social situation flamboyantly in one fell swoop!’. McGlone et al. (1994) find that comprehension of such variants is supported by speakers’ familiarity with the original idiom. Several studies have also provided evidence that the perceived analysability of idiomatic phrases does not reside in linguistic competence, but is actually conceptually grounded (which in turn strengthens the case for cognitive-linguistic models). Nayak and Gibbs (1990), Gibbs and Nayak (1991) and Gibbs (1992) present a series of experiments that uniformly show how ‘people’s knowledge of the metaphorical links between different source and target domains provides the basis for the appropriate use and interpretation of idioms’ (Gibbs 1995:107).
In sum, research from different disciplines such as discoursal, phraseological and psycholinguistic research has suggested that idiomaticity is best conceived of as a scalar and complex concept, and that any multi-word expression can be placed on a collocation-idiom continuum according to its idiomaticity. The following section provides a brief introduction to the basics of Goldbergian construction grammar with a specific focus on how this framework manages to integrate, and is even partly founded on, this conception of idiomaticity.
1.2 A constructionist approach to idioms and idiomaticity
1.2.1 Theoretical underpinnings
Construction Grammar belongs to the multitude of post-generative approaches that carry the label ‘cognitive-linguistic’. As Evans et al. (2007) point out, Cognitive Linguistics is best described as a ‘movement’ or an ‘enterprise’, precisely because it does not constitute a single closely-articulated theory. Instead, it is an approach that has adopted a common set of core commitments and guiding principles, which have led to a diverse range of complementary, overlapping (and sometimes competing) theories.
All these theories share two general commitments:
• the Generalization Commitment, i.e. a commitment to seek generalizations that hold at all levels of linguistic analysis;
• the Cognitive Commitment, i.e. a commitment to make the general principles of language compatible with what is known about the mind and brain from other disciplines.
As opposed to formal linguistic approaches, which assume a modular division of language into phonology, syntax and semantics, as well as the autonomy of language from the rest of cognition, the combination of these two commitments makes it possible to consider that different levels of linguistic analysis are indeed organized in a similar fashion. The working assumption is that it should be possible to find generalizations that hold across these subbranches and reflect general cognitive principles. This ultimately raises the question to what extent language should be conceived of as a modular system, and so renders cognitive linguistics a truly interdisciplinary approach. With respect to cognitive-linguistic theories of grammar, one can broadly distinguish between two groups: approaches such as Langacker’s Cognitive Grammar or Talmy’s Lexical Sub-systems approach (Langacker 1987, 1991, Talmy 2000) aim at fleshing out the cognitive mechanisms and principles that may serve to explain why a grammar looks the way it does. The other large group of approaches to grammar is primarily concerned with an adequate and comprehensive characterization of all linguistic constructions that constitute a language (Fillmore et al. 1988, Kay and Fillmore 1999, Lakoff 1987, Goldberg 1995, 2006, Croft 2001, or Bergen and Chang 2005, to name but a few).
1.2.2 The role of non-compositionality and frequency The basic units of organization in Construction Grammar are constructions. Constructions are defined as symbolic units in Langacker’s terminology, i.e. they are pairings of form and meaning and/or function. These form-meaning pairs are often idiosyncratic in the sense that their semantics cannot be derived from either the component parts of the construction or other established facts about the language (Goldberg 1995:68). This definition ascribes idioms a central role in Construction Grammar. As Fillmore (2006) puts it: [F]rom our point of view, idioms are the irreducible units of description for the way a language works, and these are precisely the individual morphemes of the language (together with their combinatorics), and the constructions which license combinations of linguistic units into larger units, including, of course, those which require reference to one or more lexical items. In that sense, of course, almost every grammatical model is a theory of idioms, since the job is to isolate those principles which are in themselves irreducible, that is, which are not explained by other principles in the same language.
Table 1.2 Examples of constructions (adapted from Goldberg 2006:5).
Morpheme: pre-, -ing
Word: avocado, anaconda, and
Complex word: Dare-devil, shoo-in
Complex word (partially filled): [N-s] (for regular plurals)
Idiom (filled): going great guns, give the Devil his due
Idiom (partially filled): jog <someone’s> memory, send <someone> to the cleaners
Covariational Conditional: The Xer the Yer (e.g. the more you think about it, the less you understand)
Ditransitive (double object): Subj Obj1 Obj2 (e.g. he gave her a fish taco; he baked her a muffin)
Passive: Subj aux VPpp (PPby) (e.g. the armadillo was hit by a car)
As I already mentioned in Section 1, non-compositionality is a sufficient, but not a necessary condition for construction status: in accordance with findings from non-linguistic categorization, psycholinguistics, language acquisition, and phraseological models like Fernando’s (1996) presented in Table 1.1, fully regular and predictable items will also be stored as long as they are sufficiently frequent (e.g. Bybee and Hopper 2001, Tomasello 2003, Diessel 2004, Pinker and Jackendoff 2005). Storage as a function of frequency is a major building principle of many cognitive approaches to grammar and is often referred to as entrenchment (also Bybee 1985:117): Every use of a structure has a positive impact on its degree of entrenchment, whereas extended periods of disuse have a negative impact. With repeated use, a novel structure becomes progressively entrenched, to the point of becoming a unit. (Langacker 1987:59) This definition is not restricted to lexical items, but refers to structures of all sizes, that is, includes the complete list of constructions in Table 1.2. Moreover, it emphasizes the fact that complex constructions do not have a particular size per se; rather, the exact length of a unit is a matter of conditional frequencies of co-occurrence and therefore performance-dependent and subject to change. This variable conception of what is stored as a unit in the mental lexicon also finds its counterpart in connectionist models of language activation processing; as Harris (1998 :65) points out: A lexicon of variable-sized units fits easily into a connectionist framework. A key idea of connectionism is that the entities available to conscious reflection, and which appear to have causal force in human cognition, are emergent from an underlying microstructure (Smolensky 1988). According to this view, the units of lexical representation are not part of a fixed architecture, but emerge through extracting co-occurrence regularities. 
One implication of this idea is that unit status, and the size of units, may be a matter of degree.
1.2.3 A usage-based model
The coexistence of simplex and complex phrases, as well as of lexically specified and totally unspecified phrases, all of which are continuously changing on the basis of language input and output, ultimately supersedes the distinction between syntax and lexis. Instead, the totality of these constructions is often assumed to be stored in the so-called constructicon, an expanded lexicon (Jurafsky 1996). The constructicon can basically be described as a network of constructions which is organized like other conceptual categories, i.e. some of its major organizational principles are inheritance, prototypicality and extensions. Usage-based models particularly emphasize that categorization is based on inductive learning processes, that is, ‘the mental grammar of the speaker [. . .] is formed by the abstraction of symbolic units from situated instances of language use’ (Evans et al. to appear:36; Bybee and Hopper (2001:19) argue that it is plausible to assume that intuitions and language use are linked in the same way). This is reflected in Goldberg’s (2006:64) definition of usage-based grammars: Grammars are usage-based if they record facts about the actual use of linguistic expressions such as frequencies and individual patterns that are fully compositional alongside more traditional linguistic generalizations. Among the variants of construction grammar, usage-based models have gained most support recently. They have been successfully applied in diverse areas of linguistic research such as typology (e.g. Croft 2001), morphology (e.g. Bybee 1985), phonology (e.g. Bybee 2001) and also syntactic analyses (e.g. Croft 2000, 2001). Moreover, they accord with findings from language processing (Jurafsky 1996), acquisition (Tomasello 2003) and learning (e.g. Gries and Wulff 2005), and therefore appear to be psychologically plausible models of language and grammar.
1.2.4 Placing idioms in the constructicon
Across all linguistic levels, constructions can be classified on the basis of the extent to which they are schematized, i.e. the degree to which they are lexically specified, and how much variation they allow for in those slots that are not lexically specified. Constructions occupy the whole region from fully lexically specified, simplex constructions such as individual morphemes and words, fully lexically specified multi-word units such as collocations and core idioms, partially specified complex constructions like the ‘What’s X doing Y’-construction, to totally unspecified complex constructions like argument structure constructions (e.g. the Ditransitive construction). Table 1.2 provides a schematic overview. An important consequence of the permeability of constructional categories is that any actual utterance is a simultaneous manifestation of several constructions at different levels of schematization. For instance, (5) instantiates a bundle of different constructions given in (6) (example taken from Goldberg 2006:10).
(5) What did Liza buy Zach?
(6) (a) Liza, buy, Zach, what, do constructions
    (b) Ditransitive construction
    (c) Question construction
    (d) Subject-Auxiliary inversion construction
    (e) VP construction
    (f) NP construction
This ‘vertical’ approach to language (Evans et al. to appear:5) has crucial consequences for our understanding of complex phrases: . . . rather than seeing a composite structure as an edifice constructed out of smaller components, we can treat it as a coherent structure in its own right: component structures are not the building blocks out of which it is assembled, but function instead to motivate various aspects of it. (Langacker 1987:453) As Table 1.2 shows, idioms occupy the centre on this continuum of differently complex and abstract constructions. They form a fuzzy category themselves: some idiomatic phrases are fully specified and fixed (e.g. by and large) while others license some variation. Thinking about constructions as differently schematized templates supersedes a categorical distinction between idioms and other constructions, since both (lexico)-syntactic variability and semantic compositionality are a matter of degree throughout the whole continuum. That is, regular syntactic expressions and idiomatically combining expressions are not ‘compositional’ and ‘noncompositional’, respectively. Instead, the rules underlying the semantic composition of the former are more general than those of the latter. The notion of construction embraces the whole range of such expressions (Croft and Cruse 2004). Any construction can be conceived of as idiomatic, the only difference between core idioms and other constructions being that the former are fully lexically specified, whereas the latter are less fixed. Accordingly, Croft and Cruse (2004) refer to them as ‘schematic idioms’. Given this definition, construction grammar is indeed all about idioms – not in the sense that its scope is restricted to the analysis of phrases like kick the bucket or red herring, but in the sense that construction grammar defines idiomaticity as a property that is inherent in all linguistic items regardless of their size and degree of schematization.
From this standpoint, referring only to some constructions as idioms, like Goldberg (2006) does in her list of examples of constructions, does not imply that only these phrases are idiomatic. However, it can be argued that because of their relatively low degree of schematization, they constitute a group of items in which idiomaticity effects can most easily be identified and measured, not only in terms of non-compositional semantics, but also in terms of syntactic, lexicogrammatical, and morphological restrictions. For this reason, the present study also focuses on fully lexically specified V NP phrases like make a point or take the plunge.
Conclusion
Usage-based construction grammar accounts for established findings about the nature of idiomaticity from discoursal, phraseological and psycholinguistic research. Idiomatic constructions are assumed to be stored alongside words in the mental lexicon, which is primarily characterized as a continuum of differently schematized templates. Idiomaticity effects should be most easily observable in nearly or fully lexically specified constructions like the V NP-constructions investigated here.
Chapter 2
Methodological issues
Introduction
Section 2.1 starts out by arguing that quantitative corpus linguistics constitutes an excellent method to meet the theoretical requirements made by a constructionist approach to idiomaticity. Not only do the major commitments of quantitative corpus linguistics straightforwardly reflect the most fundamental assumptions about the linguistic system as postulated in Construction Grammar; a combination of corpus linguistic methodology with multifactorial statistical procedures also makes it possible to develop a data-based model of idiomaticity that includes more than one factor, thereby doing justice to the well-established fact that cognitive and psychological processes are multivariate and complex. In Section 2.2, the generation of the data sample underlying the present study is described. After a short discussion of preliminary considerations that have to be taken into account with regard to the compilation of a corpus-based data sample of idiomatic constructions, the extraction procedure is explained. Finally, this section also describes the creation of a questionnaire experiment that served to elicit native speaker judgements for the data that comprise the corpus-based data sample. The results of this experiment are briefly summarized and discussed, particularly with regard to their validity and reliability.
2.1 (Quantitative) corpus linguistics
The methodological basis of the present approach is primarily quantitative corpus-linguistic (Stefanowitsch and Gries 2003, 2005, Gries and Stefanowitsch 2004). Quantitative corpus linguistics shares with traditional corpus linguistics the ‘linguistically-informed corpus-based interest in the whole range [of] phenomena’ (Stefanowitsch and Gries 2005:4), that is, it intends to be applicable to phonology, morphology, lexis, grammar or variation, to name but a few areas of interest. Quantitative corpus linguistics departs, however, from the majority of traditional approaches by making a strong quantitative commitment. This renders quantitative corpus linguistics very similar to Leech’s (1992:107–111) concept
of what he refers to as computer corpus linguistics. The latter differs from corpus linguistics as a mere methodological tool in several regards, licensing a view of computer corpus linguistics (and, accordingly, also quantitative corpus linguistics) as a real paradigm. Computer corpus linguistics distinguishes itself in (i) concentrating on linguistic performance, (ii) aligning with the description of individual languages, (iii) exploiting qualitative and quantitative methodology in order to carve out a model of language that is essentially quantitatively founded and (iv) more generally, adopting an empiricist view of scientific analysis. While quantitative corpus linguistics shares all these criteria, it does not lay claim to being a philosophical approach. Rather, quantitative corpus linguistics is a methodology that allows for the investigation of theoretically informed hypotheses at a level of sophistication that is not possible by employing traditional corpus linguistic methodology. Furthermore, just like traditional corpus linguistics, quantitative corpus linguistics is particularly compatible with performance-based approaches to language (cf. Section 1.2 for a brief overview). Its major advantage over traditional approaches is that the methodological consequences that result from such a theoretical perspective are fleshed out in considerably more detail. First, analyses are based on naturally occurring language data that are ideally extracted from maximally balanced and representative corpora. Secondly, data extraction is maximally exhaustive with respect to the phenomenon in question, that is, all attestations of the phenomenon to be investigated in a given corpus are extracted, even if this entails manual extraction and/or post-editing of (sometimes thousands of) hits (cf. Section 2.2 on actual implications of this for the extraction of the data sample of the present study).
In this respect, quantitative corpus linguistics markedly differs from many computational approaches which extract data automatically and employ them without further manual editing. While automatic data processing allows for the investigation of much larger data samples (such as, say, internet data), precision is often rather low – Section 3.2.1 gives an example of how the reliance on automatic processing may provide inaccurate results. Thirdly, the extracted data are interpreted and evaluated employing rigorous quantification and statistical procedures. Given the inductive relation that is assumed to hold between frequency of exposure and grammatical representation in the constructionist approach adopted here, this step is by no means a trivial one.

In the introductory chapter, I have already pointed to the potential that resides in quantitative corpus linguistics specifically with respect to the modelling of multifactorial phenomena, which many linguistic processes are assumed to be. However, quantitative corpus linguistics departs from most traditional corpus linguistic analyses already at a more fundamental level of methodological complexity. Most traditional analyses rely on raw frequency statistics (an exception here is Biber (1993)). In fact, raw frequencies alone often do not capture a given phenomenon sufficiently, or worse, may even license inadequate conclusions. Gries et al. (2005), for instance, obtain rankings of the verbs preferably used in the so-called as-predicative construction via a sentence
completion experiment. In a second step, they identify those verbs in the British National Corpus (BNC) which are most strongly statistically associated with the as-predicative. In other words, they produce a ranking that is based on conditional frequencies (that is, the frequency of occurrence of a verb X is determined under different conditions: how often do verb X and the other verbs occur in the as-predicative, and how often do verb X and the other verbs occur elsewhere?) as opposed to absolute or raw frequencies (which only reflect the verb’s overall corpus frequency). When these rankings are correlated with the ranking obtained from the elicited data, the association-based rankings significantly outperform the rankings obtained from raw frequency counts.

Quantitative corpus linguistics raises awareness of the fact that methodological choices always impact theoretical conclusions: the mode of operation of the method chosen for the operationalization of a parameter should be maximally compatible with what is already known about the phenomenon in question as well as with more general theoretical premises. With regard to the latter, (quantitative) corpus linguistics is particularly suited for investigating language from a cognitive-linguistic and/or constructionist perspective because there is a considerable match between cognitive-linguistic theoretical concepts and corpus linguistic methodology. Let me briefly point out some of the most striking similarities. As was already mentioned several times, the cognitive-linguistic concept of entrenchment is based on frequency of exposure. Principally, corpus data provide nothing but frequency data. Schmid (2000:39) accordingly refers to this match as the ‘Corpus-to-Cognition Principle’. Gries (2008:12) points out that the correspondences between cognitive-linguistic theory and corpus linguistic methodology are even more systematic.
For instance, he likens Langacker’s rule-list fallacy to Sinclair’s Idiom and Open Choice Principle: according to the rule-list fallacy, the mental lexicon does not only consist of words, but also of a variety of prefabricated and sometimes also productive units which have become entrenched to the point of having acquired unit status (which, in turn, bear close resemblance to constructions). Sinclair’s Idiom Principle correspondingly states that ‘a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments’ (Sinclair 1991:110), which stands in opposition to the Open Choice Principle, which states that ‘[a]t each point where a unit is completed (a word or a phrase or a clause), a large number of choices opens up and the only restraint is grammaticalness’ (Sinclair 1991:109). Gries concludes that ‘the [corpus linguistic] analysis of phraseologisms does not only reveal patterns, and maybe peculiarities, of usage – it can ultimately also lead to more refined statements about matters of mental representation within the linguistic system’ (Gries 2008:12). Schönefeld (1999:149–153), laying out the parallels between the Idiom Principle and Langacker’s Cognitive Grammar, furthermore points out that these two conceptions are also compatible regarding the question of what qualifies as a prefabricated unit: such units can take the form of any specified symbolic unit ranging from morphemes to words and complex lexical expressions.
Last but not least, some of the most central notions in corpus linguistics, such as collocations, n-grams, clusters or colligations, can be regarded as special instances of phraseologisms (Gries 2008). Hunston and Francis’s (2000:37) definition of a pattern, for instance, goes beyond mere co-occurrence:

    The patterns of a word can be defined as all the words and structures which are regularly associated with the word and contribute to its meaning. A pattern can be identified if a combination of words occurs relatively frequently, if it is dependent on a particular word choice, and if there is a clear meaning associated with it.

A closer look reveals that this definition of a pattern seems to provide a technical operationalization of the definition of partially and completely filled idioms in the Goldbergian construction grammar sense (cf. Section 2.2):
• The basic precondition for a unit to qualify as a construction is that it is to some extent conventionalized – this corresponds to patterns being regularly associated combinations of words, or words and structures.
• Constructions range from totally abstract to fully lexically specified, yet those constructions that Goldberg refers to as partially or fully specified idioms require the presence of at least one lexical item – likewise, a pattern can be a combination of words, or words and structures.
• The fact that non-compositionality is a feature of many idioms in the construction grammar sense is compatible with the condition of a pattern having a clear meaning associated with it.

In sum, the depth of these correlations between corpus linguistic methodology and cognitive-linguistic/constructionist theory promotes a corpus linguistic study of idiomaticity as pursued in the present study.
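The contrast drawn in this section between raw frequencies and conditional (association-based) frequencies can be made concrete with a small sketch. It cross-tabulates how often a verb occurs inside and outside a construction and ranks verbs by a one-tailed Fisher exact test. The verb names and all counts are invented for illustration, the helper names are hypothetical, and the Fisher exact test is used here only as one possible association measure; the sketch does not claim to reproduce the computations of Gries et al. (2005).

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a count of at least `a` in the
    top-left cell if the row and column totals are held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return favourable / comb(n, row1)


# Hypothetical corpus: 10,000 verb tokens, 100 of them in the construction.
CORPUS_VERB_TOKENS = 10_000
CONSTRUCTION_TOKENS = 100

# Hypothetical verbs: (tokens inside the construction, overall corpus frequency)
verbs = {"verb_A": (15, 50), "verb_B": (25, 2000)}


def association_p(in_construction, overall):
    """Build the 2x2 table for one verb and return its Fisher p-value."""
    a = in_construction                        # verb X in the construction
    b = overall - in_construction              # verb X elsewhere
    c = CONSTRUCTION_TOKENS - in_construction  # other verbs in the construction
    d = CORPUS_VERB_TOKENS - overall - c       # other verbs elsewhere
    return fisher_exact_greater(a, b, c, d)


# Raw frequency ranks the frequent verb first ...
raw_ranking = sorted(verbs, key=lambda v: verbs[v][1], reverse=True)
# ... whereas association strength (smaller p = stronger attraction) ranks
# the rarer verb first, because it occurs in the construction far more
# often than its overall frequency would lead one to expect.
association_ranking = sorted(verbs, key=lambda v: association_p(*verbs[v]))

print(raw_ranking)
print(association_ranking)
```

In this toy data set the two rankings are reversed: the frequent verb occurs in the construction only slightly more often than expected by chance, while the rare verb is strongly attracted to it.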
2.2 Data

2.2.1 Preliminary considerations

The data sample of the present study focuses on what I will subsequently refer to as V NP-constructions, that is, constructions consisting of a mono-transitive verb and its direct object noun phrase. A determiner may intervene optionally; some constructions, however, rarely take a determiner, if at all (consider the examples in (7)–(9)).¹

(7) Happily such pessimism was dispelled a decade later, when the study of bacterial chemistry began to bear fruit.
(8) Achieving a standstill is vital for Heron if it is to make headway in rescheduling its debt.
(9) If a trader can mock me, in a week every bastard in London will follow suit.

Moreover, the choice of determiner is relatively fixed for some constructions in the sample (consider the examples in (10)–(13)), but not for all of
them (consider the examples in (14)–(16)). Therefore, when referring to a particular construction in general, the determiner slot is marked with an X as a placeholder, as in, for instance, fit X bill, or make X point. The placeholder should be interpreted as a wildcard for any kind of determiner (cf. Section 4.5.2), including a zero determiner.

(10) Many took the plunge, and the great rail revival had started.
(11) There may be well more to it than meets the eye.
(12) And changing the washers usually does the trick.
(13) If a trader can mock me, in a week every bastard in London will follow suit.
(14) Discouragement may be given in small doses, but the cumulative effect, like dripping water, leaves its mark.
(15) The impact of such major reversal has left an indelible mark on Haslam’s memory.
(16) You notice that the years are leaving their mark despite daily work-outs.

Creating an adequate data sample is a challenging task in corpus linguistic analyses. The finite size of corpora has to be taken into account, and to complicate matters even more, idiomatic constructions exhibit their very own characteristics which make them notoriously difficult to handle in a corpus linguistic approach. A compromise has to be worked out between several competing considerations.

To begin with, an ideal data sample should capture a range of constructions that spread across a wide part, if not the whole, of the spectrum of the idiomatic variation parameters, such that we can have a look at constructions that are formally fixed and semantically transparent to varying degrees. Technically speaking, an ideal data sample should have a high type frequency. Only a sufficient number of types with different values of variation parameters allows us to see to what extent the (sometimes fine-grained and subtle) differences are actually reproduced by our operationalizations. High type frequency is a desideratum not only for the analysis of idiomatic constructions, but for any linguistic phenomenon, since the ultimate goal is to describe and account for all types of a language at any level of abstraction. The more types can be accounted for, the higher the representativity or generalization potential of the analysis. So with respect to type frequency, the higher, the better.

In principle, the argument in favour of high type frequency can also be extended to token frequency: if an analysis can account for a type that has many instantiations, it accounts for a large part of the language. However, to consider only types with high token frequencies would not do justice to the by now well-established fact that linguistic constructions are distributed in a fashion that has been described in Zipf’s Law: the frequency of any word is roughly inversely proportional to its rank in the frequency table.
So with respect to token frequency, the ideal data sample would comprise both high and low frequency items to adequately reflect the overall distribution of language. As a matter of fact, idiomatic constructions illustrate very nicely that high frequency need not necessarily correlate positively with salience: many typical idiomatic
constructions are rather infrequent. For instance, as Moon (1998) points out, one construction that is quoted throughout the literature on idiomaticity as an excellent example of a core idiom is kick the bucket. Despite the omnipresence of this idiom in the pertinent literature, it occurs only once in the British National Corpus.

While a data sample that reflects the frequency distributions of idiomatic constructions most adequately would have to include items like kick X bucket, in quantitative corpus studies, a third factor comes into play. Most quantitatively oriented approaches to linguistic phenomena require a sufficiently high number of tokens per type to ensure that the statistical procedures applied actually produce valid results. It would not make much sense, for instance, to derive any conclusions about the idiomaticity of kick X bucket from its single attestation in the BNC. Note, however, that this complication does not invalidate a quantitative approach to linguistic phenomena per se, but is only a downside of the limited size of the corpora which are publicly available today.

A possible way out of this dilemma might be to merge corpora and thereby gather additional attestations for infrequent constructions from other corpora or even the internet, but this also brings about a number of consequences that are difficult to handle. For example, it becomes increasingly difficult to estimate construction frequencies or to adequately compute significant collocates if corpora are not annotated for part-of-speech or syntactic information. Moreover, in order to comply with the requirement of exhaustive data retrieval, extracting items of low-frequency constructions from, say, the internet would require a corresponding search for all other constructions, too, and for many constructions like make X point, the number of hits would surely run into hundreds of thousands.
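The tension described in this section, between a Zipfian frequency distribution and the need for a minimum number of tokens per type, can be illustrated with a minimal sketch. It assumes an idealized Zipf distribution with exponent 1 (frequency = top frequency / rank); the function names and all numbers (500 types, a top frequency of 2,000, a minimum of 90 tokens) are hypothetical and chosen only for illustration.

```python
def zipf_frequency(rank, top_frequency):
    """Expected frequency at a given rank under an idealized Zipf distribution."""
    return top_frequency / rank


def types_meeting_threshold(n_types, top_frequency, min_tokens):
    """Count the types whose expected frequency reaches the token minimum."""
    return sum(
        1
        for rank in range(1, n_types + 1)
        if zipf_frequency(rank, top_frequency) >= min_tokens
    )


# Hypothetical inventory: 500 types, the most frequent one with 2,000 tokens.
n_usable = types_meeting_threshold(n_types=500, top_frequency=2000, min_tokens=90)
print(n_usable)  # 22
```

Under these assumptions only 22 of the 500 types are frequent enough for statistical treatment, which shows why a token threshold necessarily cuts off the long tail of rare items such as kick X bucket.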
2.2.2 The data sample

In line with the above, I decided to restrict my analysis to data from the British National Corpus. While this means that my sample does not contain examples of comparatively infrequent idiomatic constructions like kick X bucket or break X ice, the data retrieval is exhaustive for the types selected, and it is maximally accurate. The present study does not aim at a comprehensive analysis of English V NP-constructions. Rather, it aims to work out corpus linguistic definitions of idiomatic variation parameters which are, in principle, applicable to all constructions. Developing these definitions requires a data sample that is representative in the sense of being maximally free of noise of any kind, such as misclassifications or false hits. Only a maximally accurate data sample ensures that we can attribute the results obtained to the corpus linguistic definition rather than to some effects produced by the inferior quality of the data.

The data sample comprises a total of 40 constructions, including a random sample of the abstract V NP-construction which served as a baseline value for comparisons (cf. Section 4.2.1). All in all, 13,706 items were manually inspected to arrive at a sample that finally comprised 13,141 tokens. At first sight, this might
seem an unimpressive number, particularly in view of previous computational studies in which several hundred types of this construction were taken into consideration (cf. Section 3.1.4/4.1.3). However, as I will argue below, the data sample as it stands surpasses the majority of data samples in previous studies in several respects. These clearly outweigh its seemingly small size, particularly with respect to the purposes of the present study. Consider Table 2.1 for an overview (without the abstract V NP-construction).

All entries in the Collins Cobuild Dictionary of Idioms were checked for constructions of the kind V NP. The dictionary offers the advantage of labelling the most frequent constructions as they are attested in the Bank of English, which consists of more than 450 million words.³ Since the BNC is less than a quarter of the size of the Bank of English, constructions not marked as high frequency items can be expected to occur only rarely in the BNC, if at all. Therefore, to comply with the statistical requirements outlined above, only those V NP-constructions that were marked in the dictionary as high frequency items were taken into consideration. For every construction in the list of 262 candidate constructions retrieved from the idiom dictionary, its frequency of occurrence was checked in the BNC. After checking the 262 concordances manually for false hits, 33 constructions turned out to occur at least 90 times in the BNC, so these remained in the sample.

The definition of idiom that the Collins Cobuild team adheres to covers a huge spectrum on the idiomaticity scale, including, among others, ‘traditional idioms’, ‘semi-idioms’, ‘multi-word metaphors’ and ‘metaphorical proverbs’ (Collins Cobuild Dictionary of Idioms:v). It has to be borne in mind, however, that such labels already fall back on theoretical or folk model assumptions concerning what qualifies as idiomatic and what does not.
Moreover, the labels themselves divide the idiomaticity continuum into discrete categories of different idioms. While this is a very reasonable approach for an idiom dictionary that is specifically geared to the needs of language users and learners (rather than linguists), restricting the data sample to these items might introduce a considerable bias towards the idiomatic end of the idiomaticity continuum. Therefore, six additional constructions were included that do not occur in the dictionary. A concordance of all verbs followed by a noun phrase in the BNC was considered for selecting these six constructions. Out of 477,749 V NP-sequences, 193 occur more than 100 times, and from this list, six constructions were selected randomly. Note that these 193 V NP-sequences are not types in the above sense because the concordance is non-lemmatized and includes only V NP-sequences in canonical order. This means that some of the types occurred several times in the list with different inflections. However, the list served well for obtaining an impressionistic overview of the V NP-constructions that are sufficiently frequent. As can be seen in the rightmost column of Table 2.1, the construction types were obtained by considering two sources: the abbreviation ‘CC’ informs the reader that the construction was obtained from the Collins Cobuild Dictionary of Idioms, whereas no entry means that it was obtained from the BNC-based V NP-construction list.
Table 2.1 The V NP-constructions in the data sample and their corpus frequencies.

Construction             Frequency in BNC
bear X fruit             90
beg X question           163
break X ground           133
break X heart            185
call X police            325
carry X weight           157
catch X eye              491
change X hand            212
close X door             827
cross X finger           150
cross X mind             140
deliver X good           145
do X trick               155
draw X line              310
fight X battle           192
fill/fit X bill          116
follow X suit            135
foot X bill              109
get X act together²      142
grit X tooth             164
have X clue              232
have X laugh             98
hold X breath            292
leave X mark             145
make X headway           136
make X mark              213
make X point             1,005
make/pull X face         371
meet X eye               365
pave X way               269
play X game              290
scratch X head           100
see X point              278
take X course            294
take X piss              121
take X plunge            115
take X root              113
tell X story             1,942
write X letter           1,370

Source column: 33 of the 39 constructions carry the entry ‘CC’.
2.2.3 Experimental data

The corpus data sample was complemented with idiomaticity judgements for the same 39 V NP-constructions. In what follows, I will briefly outline the methodological design of the questionnaire that served to obtain the judgements before presenting the results. We will then return to the judgement data in Chapter 6.

Recent research suggests that, just like in other research disciplines, the results of linguistic experiments crucially hinge on the nature of the scale that subjects are given to make their judgements. More specifically, categorical or ordinal scales that are commonly used for rating tasks may only insufficiently or even falsely reflect intuitions because they do not allow the subject to perform relative rankings, that is, assign values that also provide information about the relative distance between adjacent points on the scale. Alternatively, Bard et al. (1996:40–41) suggest employing a technique that is referred to as magnitude estimation, by means of which subjects create their own scale at ratio level:

    [M]agnitude estimation in its simplest version requires the subject to associate a numerical judgment with a physical stimulus [. . .] Once the initial stimulus, or modulus, is presented and a number associated with it by experimenter or subject, the subject assigns to each successive stimulus a number reflecting the relationship between that stimulus and the modulus. Subjects are explicitly instructed to reflect perceived ratios in their judgments: a stimulus that appears to be 10 times as bright as the first is to be given a number 10 times the original number; one that seems one-third as bright is given a number one third the size.

Bard et al. present a series of empirical studies that show how magnitude estimation outperforms other methods in linguistic experiments.
They attribute this to the numerous advantages that the technique has to offer: both the range of responses and the distribution of individual responses within that range are informative since there are no restrictions on the number of values used to measure the property of interest, and subjects are given the freedom to assign identical or different values to two different stimuli as they please. Moreover, in ratio-scaled judgements, differences in values directly reflect the subjects’ perceived differences, and if multiple judgements are obtained for a single stimulus, it is possible to calculate mean and variance. Also, a very pragmatic advantage of magnitude estimation as opposed to other elicitation techniques is that judgements can be obtained for a multitude of different stimuli without the subject having to do hundreds of direct, pairwise comparisons of stimuli.

For the purpose of eliciting idiomaticity judgements, magnitude estimation scales were integrated into the questionnaire design as follows. Each subject was assigned one of the 39 V NP-constructions as a reference construction, so each V NP-construction served as a reference construction once, and only once.
Subjects were asked to assign any positive number they felt appropriate to the reference construction and then judge every following construction in relation to this reference construction.

The syntactic structures in which the V NP-constructions were presented were selected from the corpus data such that the structures are typical of the V NP-construction in question.⁴ One might object that maximal control over context effects is only achieved if the contexts are identical for all V NP-constructions, such that, say, the subject slot is always lexicalized as a personal pronoun, and the verb is always in the simple past tense. However, such a procedure does not do justice to the contextual preferences (or even restrictions in some cases) that the different V NP-constructions exhibit. To give a few examples, write X letter nearly always occurs in the past tense, have X clue mostly occurs in its negated form (191/236 items), and it is generally ideas, political tendencies and other abstract concepts that occupy the subject slot of take X root, not people. For the most part, these observations are clearly semantically motivated. Neglecting them could have the undesirable effect of producing awkward contexts which could distract subjects’ attention. They might end up focusing more on the adequacy of the combination of the V NP-construction and the lexical realizations in the subject, verb, and NP slot than on the idiomaticity of the V NP-construction. In order to avoid that, the contexts were selected such that they reflect the (lexico-)syntactic and morphological preferences of the construction in question as well as possible. At the same time, the context should be minimally complex to ascertain that it is the idiomaticity of the V NP-construction rather than the complexity of the sentence that is judged.
Adjectives, attributive nouns, adverbials, prepositional phrases or modifying clauses were added only where necessary: since pave X way nearly exclusively occurs with an adverbial (267/269 items), and ground in break X ground is nearly always modified by new (126/134 items), their contexts were adapted accordingly.

With respect to the instructions, subjects were informed that the questionnaire aims at idiomaticity judgements. ‘Idiomatic sentences’ were only very roughly defined as ‘the kind of sentences you typically find in dictionaries or phrase books’, and examples were given. Since the goal of the questionnaire was to find out if and to what extent subjects fall back on any of the different parameters like compositionality or syntactic flexibility, none of these parameters was introduced to explain idiomaticity in further detail. As a little help, subjects were asked to judge these sentences relative to ‘normal’ sentences, and to consider how reasonable they thought it was to include them in a dictionary or phrase book.

The method of magnitude estimation was explained as simply as possible. Subjects were asked to judge the first sentence they were presented with on the questionnaire and regard this sentence as their reference sentence. They were informed that they could freely choose any positive number (if necessary, also fractions or decimals) that seemed appropriate to them, and then assign all the other sentences a value in relation to the reference sentence to indicate how much more or less idiomatic they considered these sentences. An example was
provided, and it was emphasized that there are no right or wrong answers, and that subjects should provide their first, spontaneous answers (an example of a questionnaire is given in Appendix A).

Thirty-nine students from the University of Sheffield participated in the experiment. The students were first-year undergraduate students in English linguistics. The concept of idiomaticity had not previously been discussed in the course. The values that subjects assign may depend on the frequency with which subjects encounter the stimulus in the experiment, and the order in which the stimuli are presented (e.g. Greenbaum 1976, Nagata 1989). In order to rule out such effects, each subject received an individual questionnaire with a different V NP-construction as reference construction. The order of the remaining 38 sentences was randomized individually for each questionnaire.

Figure 2.1 summarizes the results. For each V NP-construction, the mean idiomaticity value is given. To allow for the calculation of mean values and a unified display, all values were normed according to a scale from zero to one (for every subject, the highest value assigned was set to one, and all other values were converted in relation to this value; if, for example, the highest value assigned was eight, and another value assigned was four, the latter was converted into 0.5). The ranking as displayed in Figure 2.1 principally ties in very nicely with established idiom pattern typologies (cf. also Section 3.1.3), which is a good indication that subjects actually judged idiomaticity rather than anything else. As can be expected from any set of empirical data, there are, however, no clear-cut boundaries between different classes of idioms. Rather, particular kinds of idioms seem to cluster together on particular parts of a continuum scale.
For instance, the constructions ranking lowest in idiomaticity are write X letter, tell X story, call X police and close X door – which are exactly the four V NP-constructions that were not obtained from the Collins Cobuild idiom dictionary, but added separately, and which constitute highly compositional constructions. Also in the lower half are constructions like scratch X head, hold X breath, grit X tooth, cross X finger or pull X face, all of which share the characteristic that the literal referent of the idiom is itself an instance of the idiomatic meaning, which is why they are also referred to as quasi-metaphorical idioms (cf. Section 3.1.3). The middle range is predominantly occupied by metaphorical constructions like see X point, draw X line, make X mark, leave X mark or fight X battle, all of which have in common that the relationship between the literal and the idiomatic meaning is transparent, so they could be referred to as abnormally decomposable in Gibbs’s terms, or as analysable-transparent idioms in Cacciari and Glucksberg’s terms. Many of the constructions that are even further up in the ranking share the property that the relationship between one part of them and the overall idiomatic meaning may not be as obvious at first sight as for the metaphorical constructions, which is why Cacciari and Glucksberg refer to them as metaphorical-opaque; examples are follow X suit, make X headway, take X plunge or foot X bill. Lastly, the data sample does not contain any fully non-decomposable constructions like chew the fat (because these are too rare in a 100-million-word corpus like the BNC; cf. Section 2.2.1), and Figure 2.1 may be interpreted as reflecting this fact, too: while the magnitude estimation method gives subjects a scale that has no upper limit, the highest idiomaticity value assigned averages around 0.6, which indicates that subjects rarely assigned values that are far higher than the other values on their scale. This could simply indicate that subjects are only insufficiently able to distinguish fine-grained levels of idiomaticity; an alternative explanation could be that since the concept of idiomaticity that subjects accessed embraces totally non-decomposable and fixed constructions like chew the fat (‘to chat informally’), they were, on average, reluctant to assign high values to the constructions presented in order to be able to assign these values to such a construction, should it occur among the set. We can give the latter interpretation the benefit of the doubt since Gibbs and his colleagues have shown in a series of experiments that once subjects are presented with stimuli that cover the whole idiomaticity spectrum, they are well able to distinguish them, at least on a three-fold taxonomy of decomposable, abnormally decomposable and non-decomposable constructions (cf. Section 3.1.2).

[Figure 2.1: bar chart. Vertical axis: mean normed idiomaticity value (0.00 to 1.00). Horizontal axis: V NP-construction, from left to right: write_letter, tell_story, call_police, close_door, scratch_head, play_game, have_clue, make_point, have_laugh, hold_breath, grit_tooth, see_point, cross_finger, pull_face, draw_line, make_mark, leave_mark, change_hand, catch_eye, fight_battle, take_course, make_headway, do_trick, cross_mind, get_act, follow_suit, beg_question, carry_weight, break_ground, break_heart, take_plunge, pave_way, take_root, fit_bill, take_piss, deliver_good, bear_fruit, foot_bill, meet_eye.]
Figure 2.1 Native speakers’ idiomaticity judgements for the V NP-constructions.

A comparison of the results with idiomaticity scales derived from other experiments shows that the ranking obtained from the questionnaires makes sense. But how reliable are the results? It might be argued that eliciting idiomaticity data from non-experts without providing a detailed definition of this concept is a risky undertaking. In the absence of a more informative explanation of idiomaticity, naïve speakers may resort to an understanding of idiomaticity as their familiarity with the phrase in question. Such a strategy may be further motivated by the instructions to judge the constructions depending on how relevant they are for inclusion in a dictionary or phrase book.
However, I decided neither to provide a more specific definition of idiomaticity to the participants nor to take trained linguists as subjects instead, because either option would most likely have distorted the results even more. Expert opinions on what is grammatically acceptable, typical of a language, or stylistically appropriate in certain text types and registers often deviate starkly from how the general public puts language to use. Similarly, it is plausible to assume that language experts will have had considerable exposure to theoretical approaches to idiomaticity, so their judgements will hardly be unfiltered (the widely established equation of idiomaticity with non-compositionality is very likely to be particularly problematic here). The aim of the present study, in contrast, is to carve out the understanding of idiomaticity in naïve speakers’ heads. The line of reasoning here is that idiomaticity is a daughter process of language change, which, from the usage-based perspective adopted here, is not primarily driven by conscious initiatives on the part of the expert community. Rather, it is unconsciously perpetuated by the speaker community at large. Therefore, to ask naïve native speakers for their judgements was a conscious decision.

One way to ascertain that the results are not a product of chance but actually reflect a systematic pattern is to check to what extent subjects’ responses correlate with each other. For instance, was change X hand actually assigned a
medium range value by the majority of subjects, or is the medium position on the ranking a result of some subjects assigning extremely low, and other subjects assigning extremely high idiomaticity values? This question can be addressed by computing a so-called reliability analysis, which provides a reliability coefficient, Cronbach’s alpha. This coefficient measures the extent to which a set of test items can be treated as measuring a single latent variable (Cronbach 1951). For the present data, Cronbach’s alpha amounts to 0.923,⁵ which means that subjects were indeed highly consistent in their judgements. This strongly indicates that the parallels between idiom typologies and the present ranking are indeed meaningful. While it may still be the case that the data reflect some concept other than idiomaticity, only in a very consistent fashion, it can be ruled out with considerable certainty that the judgements merely reflect subjects’ familiarity with the V NP-constructions: the correlation between the mean normed values and the corpus frequency of the V NP-constructions is only moderately high (Pearson’s r = −0.635). That is, the correlation between judged idiomaticity and frequency may point towards some influence of frequency on idiomaticity judgements (such that the higher the frequency, the lower the idiomaticity), but not an exclusive one.
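The two statistics used in this validation step can be illustrated with a minimal sketch. It assumes that the judgements are arranged with one row per construction and one column per subject, so that the subjects play the role of the test items; this layout, the function names and the toy figures are assumptions made for illustration only and are not taken from the study, which used SPSS.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table with one row per rated construction
    and one column per subject (subjects are treated as the 'items')."""
    k = len(ratings[0])                       # number of subjects
    columns = list(zip(*ratings))             # one tuple of ratings per subject
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: three constructions judged by three subjects.
toy_ratings = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(cronbach_alpha(toy_ratings))

# Hypothetical mean judgements and corpus frequencies for the same items.
print(pearson_r([0.2, 0.4, 0.6], [900, 500, 100]))
```

In the toy table every subject ranks the constructions identically, so alpha reaches its maximum; disagreement among subjects lowers the value.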
Conclusion
The preceding section outlined the methodological issues involved in the present study. As has become obvious, an empirical approach to idiomaticity poses specific challenges both for corpus linguistic data retrieval and experimental data elicitation. As to the former, a maximally accurate and exhaustive data sample was given priority over a larger sample size. As to the latter, the use of a magnitude estimation scale was opted for to elicit valid judgements; furthermore, the results produced were validated with the necessary caution, comparing them with established findings on idiom taxonomies and ruling out mere frequency effects.
Notes
1. Unless indicated otherwise, all examples discussed in the present study are taken from the British National Corpus.
2. Get X act together was one of the patterns extracted for pre-tests, so it remained in the final sample even though it is not a V NP-construction.
3. High frequency here means that the idiom occurs at least once in 2 million words in a total of 210 million words of the corpus.
4. The most typical contexts were identified as follows. For every V NP-pattern, an Excel sheet was created in which each attestation is represented in one row, and all variation parameters included in the present study (tree-syntactic
flexibility and the different kinds of lexico-syntactic flexibility and morphological flexibility) are represented in one column each. The different parameter levels of each variation parameter are coded with numbers. For instance, the morphological flexibility factor Tense exhibits four parameter levels: ‘0’ for past tense, ‘1’ for present tense, ‘2’ for future tense, and ‘3’ for cases with nonfinite verb forms. Every attestation of a V NP-pattern was assigned one number for each variation parameter. If, for example, a V NP-construction preferably occurs in the present tense, this is reflected by the predominance of the number ‘1’ in the corresponding column. So with respect to Tense, the most typical context for that V NP-pattern is one in present tense. After having identified the most frequent parameter levels for all variation parameters (that is, in all columns), that attestation of a V NP-pattern which unites variable-specific preferences to a maximum extent was taken to represent the V NP-pattern in its most typical context.
5. Cronbach’s alpha was computed using SPSS 12.0.
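The selection procedure described in note 4 can be sketched as follows. The function name, the tie-breaking behaviour (the first attestation with the highest score wins) and the toy coding are assumptions made for illustration; the note itself does not say how ties were resolved.

```python
from collections import Counter


def most_typical_attestation(rows):
    """Return (index, modes): the index of the attestation that matches the
    most frequent level in as many columns as possible, plus those levels.

    rows: one tuple of numeric codes per attestation, one position per
    variation parameter (e.g. Tense coded 0-3).
    """
    n_columns = len(rows[0])
    # Most frequent parameter level in each column.
    modes = [
        Counter(row[c] for row in rows).most_common(1)[0][0]
        for c in range(n_columns)
    ]

    def matches(row):
        # Number of columns in which this attestation shows the modal level.
        return sum(value == mode for value, mode in zip(row, modes))

    # Assumption: ties go to the earliest attestation in the sheet.
    best_index = max(range(len(rows)), key=lambda i: matches(rows[i]))
    return best_index, modes


# Hypothetical coding of four attestations on two parameters.
sheet = [(1, 0), (1, 2), (0, 2), (1, 2)]
print(most_typical_attestation(sheet))
```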
Chapter 3
Compositionality
Introduction
Sweet defined the non-compositionality of an idiom as follows: ‘the meaning of each idiom is an isolated fact which cannot be inferred from the meanings of the words of which the idiom is made up’ (Sweet 1889:139). Ever since, there has been general agreement that among the range of variation parameters that can be claimed to contribute to an expression’s idiomaticity, non-compositionality (sometimes also referred to as decomposability, analysability or transparency) is the one idiomatic variation parameter that lies at the heart of idiomaticity (Sonomura 1996:28). This chapter is structured as follows. Section 3.1 provides a summary of previous approaches to compositionality, moving from early psycholinguistic studies which conceived of idioms as exclusively perfectly non-compositional to psycholinguistic studies which suggest that compositionality is better conceived of as a scalar phenomenon which manifests itself in constructions to different degrees. Section 3.1.4 turns to more recent studies that have gone beyond native speakers’ intuitions about compositionality by reproducing this idiomatic variation parameter corpus-linguistically and computationally. Section 3.2 is devoted to addressing the various methodological issues that arise from the discussion of the previous approaches. Working out solutions to these issues on the basis of a small pre-test data sample of verb particle constructions, Section 3.3 presents a new approach to compositionality. This definition is applied to the V NP-constructions forming the data sample of the present study, and the results obtained are correlated with the idiomaticity judgements.
3.1 Previous approaches
3.1.1 Non-compositionality approaches
Non-compositional approaches are mostly inspired by the theoretical work of generative linguists like, among others, Fraser (1970), Chomsky (1980) or van der Linden (1992). All these approaches have in common that they essentially
reduce idiomaticity to non-compositionality, and that they regard idioms as long words that behave just like any other lexical entry. A widely quoted definition of compositionality in that spirit is that of Katz and Postal (1963:275): ‘The essential feature of an idiom is that its full meaning . . . is not a compositional function of the meanings of the idiom’s elementary parts’. One of the central claims that stem from this conception of idioms is that they are taken as positive evidence for the existence of transformations that mediate between deep and surface structure representations. Models of idiom processing that adopt such a non-compositionality approach all share the view that idioms should be regarded as long words that syntactically and semantically behave just like lexical entries do. These approaches differ, however, with regard to the question how exactly idioms are accessed, and at which point of time in the comprehension or production process. In Bobrow and Bell’s (1973) ‘literal processing model’, idioms are retrieved from a mental list via a special idiomatic processing mode, which is triggered (only) after literal interpretation fails. The literal processing model is argued to be supported by an experiment that showed how exposure to literal phrases decreases the likelihood that following phrases are interpreted in the idiomatic processing mode. Swinney and Cutler (1979) take issue with the validity of the self-reported measure of comprehension in Bobrow and Bell’s experiment. Alternatively, they propose a ‘lexical representation model’ which assumes that both literal and figurative meanings are activated right from the start. The empirical evidence in favour of this hypothesis comes from an experiment in which response latencies were faster for idioms than literal phrases. With his ‘direct access model’, Gibbs (1980, 1986) puts forward the even stronger hypothesis that idioms have computational priority over literal phrases. 
This hypothesis is compatible with the findings that idioms are read faster if presented in idiomatically biased contexts as opposed to literally biased contexts (Ortony et al. 1978, Schweigert and Moates 1988), and that recall is better for idioms in literal than in idiomatic contexts, which is attributed to the double processing of the idiom in both the idiomatic and the literal mode (Gibbs et al. 1989). The non-compositional approaches are also supported by studies on word familiarity effects, where familiarity is defined as input frequency. Reading time and idiom familiarity are found to be positively correlated, be the idioms presented in context (Schweigert 1986, Schweigert and Moates 1988) or in isolation (Cronk and Schweigert 1992).
3.1.2 Compositionality approaches
Nunberg et al. (1994:498) define compositionality in a different way as ‘the degree to which the phrasal meaning, once known, can be analyzed in terms of the contributions of the idiom parts’, and they draw a sharp distinction between
an idiom’s compositionality and its (relative) conventionality, which is defined as follows.
[Conventionality] is determined by the discrepancy between the idiomatic phrasal meaning and the meaning we would predict for the collocation if we were to consult only the rules that determine the meanings of the constituents in isolation, and the relevant operations of semantic composition.
Nunberg et al. argue that the above-mentioned definitions of idioms as being non-compositional phrases all suffer from the common confusion of the two concepts compositionality and conventionality, or more precisely, from the misconception that conventionality entails non-compositionality. They argue instead that
. . . while phrasal idioms involve special conventions, these do not entail the noncompositionality of such expressions; the conventions can be attached to the use of the idiom constituents, rather than to the collocation as a whole. (Nunberg et al. 1994:499)
According to Nunberg et al., a consequence of this definition is that most phrasal idioms can actually be regarded as (relatively) compositional. Using the example of spill the beans, they argue that this is a compositional phrase in the sense that after speakers have been able to retrieve its meaning (from contextual clues), they will be able to recognize its compositionality. According to the established but misguided definition, to argue that spill the beans is compositional would be tantamount to saying that a speaker who knows what the constituent words spill, the and beans mean can use the phrase to mean ‘divulge information’ without having encountered the idiom before – which is an implausible assumption (Nunberg et al. 1994:498–499). They also point towards other studies that argued that many idioms are indeed compositional, among them studies as early as Weinreich (1969), Mitchell (1971), Makkai (1972), Bolinger (1977) and others (Nunberg et al. 1994:499).
This theoretical perspective is also nourished by a number of empirical facts that are difficult if not impossible to account for from a non-compositional perspective. For one, (parts of) many idiomatic constructions can be modified by adjectives or relative clauses, and/or they can be quantified or topicalized. This is incompatible with the non-compositional approach because a necessary condition for these operations to apply is that the idiomatic construction has some identifiable internal syntactic and semantic structure. Moreover, given that speakers do not receive any explicit input on the syntactic flexibility of idiomatic constructions, it is striking that speakers considerably agree on the (set of) syntactic operations that a given construction licenses or not (Wasow et al. 1983). A number of psycholinguistic studies have accumulated evidence in favour of the compositional view. In Peterson and Burgess’s (1993) experiment, subjects heard sentence fragments (e.g. He kicked the . . .) and were presented
syntactically appropriate completions (in this example: nouns) or inappropriate completions (verbs). Naming latencies were faster in the appropriate completion condition, which suggests that since syntactic processing is not terminated after encountering an idiomatic construction, it is processed literally to some extent. According to Cacciari and Tabossi’s (1988) ‘configuration hypothesis’, activation of the idiomatic meaning takes place only after a sufficient portion of the string is encountered, that is, after an ‘idiomatic key’ has been processed (which varies across idiomatic constructions). The configuration hypothesis finds support by empirical evidence that for idiomatic constructions with a high cloze probability (i.e. if the final word is highly predictable), priming is observed for the idiomatic interpretation at the offset of the final word, but not for the literal meaning of the idiom-final word; for idiomatic constructions with low cloze probability, literal word meaning activation is obtained at the offset, while idiomatic meaning activation is obtained only 300 ms after the offset of the final word. In a similar vein, Titone and Connine (1994a) provide empirical evidence that literal activation varies as a function of the predictability and literal plausibility of the phrase: in low predictable idiomatic constructions, the word meanings are activated regardless of literal plausibility, while in high predictable idiomatic constructions, the word meanings are only activated for plausible literal interpretations. Gibbs and colleagues (Gibbs and Nayak 1989, Gibbs et al. 1989) demonstrate in a number of different studies that subjects can distinguish between at least three classes of idiomatic constructions in terms of their decomposability (normally decomposable, abnormally decomposable and non-decomposable idioms), and that sentences containing decomposable constructions are read faster than those containing non-decomposable constructions (cf. 
also Cronk and Schweigert 1992 for similar results). This suggests that the literal meanings that are activated during processing facilitate idiomatic construction comprehension to the extent that they overlap with the idiomatic meaning. Moreover, several studies testify to a positive correlation between decomposability and syntactic (Gibbs and O’Brien 1990) as well as lexical (Gibbs et al. 1989) flexibility, which suggests that compositionality is not the only determinant of idiomaticity. As a matter of fact, this can be expected if the word semantics indeed drive the idiomatic meaning: while in decomposable constructions, the meaning of the parts is more important than that of the combined phrase, the word semantics are only of minor importance in non-decomposable constructions, which are processed more as one single unit. McGlone et al. (1994) show that non-decomposable constructions are semantically productive to some extent, which suggests that even in these phrases, the semantics of the component words are activated at least to some degree. Similarly, literal word meanings constrain contextual plausibility: while e.g. kick, the and bucket do not overlap semantically with die, the sentence John kicked the bucket in the car accident is more plausible than John lay kicking the bucket due to his chronic
illness, because kick denotes a sudden action and accordingly, it will preferably occur in sentences which license such a sudden event-reading. In a series of experiments, Sprenger et al. (2006) show that on the one hand, (phonological as well as semantic) priming is higher for idioms than literal phrases, which supports the idea that idioms are stored as unitary entities in the lexicon; on the other hand, an experiment in which naming the last word of an idiom primed phonologically or semantically related words strongly suggests that literal word meanings are active during idiom production. Instead of postulating the existence of distinct processing mechanisms for idiomatic and literal language, their ‘superlemma model’ introduces a superlemma to represent the syntactic properties of an idiomatic phrase. This superlemma is connected to the representation of the phrase’s component words. This way, Sprenger et al. integrate the finding that ‘idioms are both unitary and compositional’ at the same time (2006:174).
3.1.3 Idiomatic pattern typologies
A variety of semantic taxonomies have been proposed to categorize idiomatic constructions exhibiting different degrees of compositionality. Some of the most established and widely quoted will be presented in the following (but see also, among others, Alexander 1978, Yorio 1989, Fernando and Flavell 1981, Cowie et al. 1983, Nattinger and DeCarrico 1992, Fernando 1996, Howarth 1998 and Moon 1998). As mentioned above, Gibbs and colleagues propose a threefold taxonomy which is also empirically supported in their experiments. They distinguish normally decomposable idioms, which are expressions in which part of the idiom is used literally (like the question in pop the question); abnormally decomposable idioms, in which parts are linked to their referents via metaphor (e.g. buck in pass the buck); and non-decomposable idioms, the meaning of which cannot be derived compositionally from the component words at all (as with chew the fat). Cacciari and Glucksberg (1991) propose a functionally oriented four-way taxonomy of idiom compositionality. Their classification differentiates between analysable-transparent, analysable-opaque, quasi-metaphorical and non-analysable idioms. In analysable-transparent idioms, there is a clear semantic relation between the idiom parts and the overall idiomatic meaning; this relation is mostly metaphorical in nature (as in break the ice or spill the beans). In analysable-opaque idioms such as kick the bucket, the relation between an idiom’s elements and the overall idiomatic meaning may be opaque, but the elements may still constrain the appropriate use of the idiom as well as its semantic and discourse productivity (as e.g. kick in kick the bucket). The class of analysable-opaque idioms embraces both normally and abnormally decomposable idioms in Gibbs’s taxonomy – as Cacciari and Glucksberg argue, there is no difference in the functions of literal and metaphorical usages from a discourse processing perspective.
In quasi-metaphorical idioms, the literal referent of the idiom
is itself an instance of the idiomatic meaning. For example, ‘surrender’ may be expressed idiomatically with the expression to give up the ship; at the same time, the literal meaning of this phrase is a prototypical example of the act of surrendering. It is only with non-analysable idioms that semantic and syntactic analysis of the idiom into its constituent parts does not reveal anything about the meaning of the composed phrase (Cacciari and Glucksberg provide the example by and large). Nunberg et al. (1994) propose a distinction between what they refer to as idiomatically combining expressions on the one hand and idiomatic phrases on the other. Idiomatically combining expressions are expressions like take advantage or pull strings, the meanings of which are distributed among their parts, while in idiomatic phrases such as kick the bucket or saw logs, the meaning is not distributed over the component words. Taxonomies like the above are indispensable when it comes to developing ideas about and discussing fundamental qualitative distinctions between different kinds of compositional phrases as they appear to be theoretically possible, desirable, and even empirically instantiated by individual examples. However, they present simplifying generalizations and need to be interpreted as such. The actual match between any given data set and idiom taxonomies can be expected to be imperfect, which does not necessarily disqualify either the data or the taxonomy in question. Empirical language data mostly defy a classification in terms of distinct classes – as Bolinger (1977:158) put it: Generalizations would indeed be lost if the words in most idioms could not be related to words elsewhere . . . The distinction [between an idiom and a collocation] was as clear as I could define it – allowing for the fact that in reality there is no clear borderline between the two. Accordingly, several scholars have argued in favour of a scalar representation of compositionality, e.g. 
Cowie (1988) or Wood (1986), who even states that this continuum is best defined as ‘shading by gradual degrees from total noncompositionality to fully regular combinations’ (Wood 1986:v). That pre-defined taxonomies may be problematic is also demonstrated empirically by Titone and Connine (1994b), who found that subjects have little difficulty with a binary distinction of idiomatic phrases into compositional and non-compositional phrases – so since idiomatic expressions are not all treated alike, we may conclude that this difference is a valid one in some way. At the same time, however, subjects had severe difficulties classifying idiomatic phrases within the range of highly compositional to metaphorical into further discrete classes. This may well be interpreted as indicating that a classification into, say, three discrete idiomatic classes only insufficiently reflects the scalar nature of the phenomenon. To conclude, the results to be presented in Section 3.3.2 are not interpreted in terms of any set of given compositionality classes – rather, compositionality is conceived of as a continuum, and accordingly, the technical operationalization
of this idiomatic variation parameter (cf. Section 3.3.1) does justice to this assumption.
3.1.4 Corpus-linguistic/computational approaches
The main interest of computational approaches to compositionality stems from the fact that non-compositionality is one of the essential characteristics of many multi-word expressions (e.g. verb-particle constructions, henceforth VPCs), so finding an adequate definition of compositionality enables computational linguists to automatically extract these multi-word units from large corpora. Berry-Rogghe (1974) presents an approach to the identification of VPCs which basically rests on the same logic as many studies to follow hers. Starting out from the assumption that one of the primary characteristics of VPCs is their non-compositionality, this concept is defined as follows: ‘The combinations of verb + particle are to be considered to constitute a single lexical item when they contract different collocational relations from those of the particle as a separate entity’ (Berry-Rogghe 1974:21–22, her emphasis). In other words, the more the particle contributes to the semantics of the VPC, the more of its collocates will be among the set of the VPC’s collocates. That is, the higher the ratio of the number of collocates shared between VPC and particle to the overall number of VPC collocates, the more compositional the phrase. Berry-Rogghe refers to this ratio as the R-value of the VPC, as shown in the formula in (17) below.
(17) \[ R = \frac{\text{no. of collocates of VPC shared with P}}{\text{no. of collocates of VPC}} = \frac{a}{b} \]
Only significantly associated collocates as determined by a z-score enter into the computation. With respect to the particles, Berry-Rogghe notes that it is difficult to extract ‘true’ collocates, so in order to rule out collocates that do not reflect the distributional characteristics of the particle, but rather, say, the preceding verb governing the phrase, she determines the particles’ collocates by considering only the words to the right of particles that directly followed some punctuation mark. To illustrate the method, consider the VPC live in. In a corpus comprising 202,000 words from texts by D. Lessing, D.H. Lawrence and H. Fielding, Berry-Rogghe identified 11 significant collocates of live in: hut, house, town, country, London, room, world, place, family, happiness and ignorance. Out of these 11 collocates, 6 are also among the set of collocates that are significantly associated with in alone. Accordingly, the R-value of live in is 6/11 ≈ 0.55. Given that the R-value can theoretically range between 0 (when there is no overlap) and 1 (when there is a complete overlap), this value indicates that the phrasal verb live in is fairly compositional. Berry-Rogghe contrasts this with the example of the phrasal verb believe in, which has five significant collocates (witchcraft, God, Jesus, devil and paradise), none of which it shares with in alone, so the corresponding R-value is 0/5 = 0. Accordingly, believe in is an example of a highly non-compositional VPC.
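The computation of the R-value can be sketched in a few lines. The function name is hypothetical, and because the passage does not say which six collocates of live in are shared with in, the choice of six in the example below is an arbitrary assumption made only to reproduce the ratio 6/11; the collocate sets are assumed to have been filtered for significance beforehand.

```python
def r_value(vpc_collocates, particle_collocates):
    """Share of the VPC's significant collocates that are also significant
    collocates of the particle on its own (a / b in formula (17))."""
    vpc = set(vpc_collocates)
    if not vpc:
        raise ValueError("the VPC needs at least one significant collocate")
    shared = vpc & set(particle_collocates)
    return len(shared) / len(vpc)


live_in = ["hut", "house", "town", "country", "London", "room",
           "world", "place", "family", "happiness", "ignorance"]
believe_in = ["witchcraft", "God", "Jesus", "devil", "paradise"]

# Assumption for illustration only: which six collocates 'in' shares
# with 'live in' is not stated in the text.
in_alone = ["hut", "house", "town", "country", "London", "room"]

print(r_value(live_in, in_alone))     # 6 of 11 collocates shared
print(r_value(believe_in, in_alone))  # no collocates shared
```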
While the generalizability of Berry-Rogghe’s results can be called into question because of the small database, the methodology she develops anticipates the basic idea that the semantic contribution of a component word to a phrasal expression can be quantified corpus-linguistically in terms of the distributional similarity of the component word and the phrase. Moreover, she was ahead of many later approaches in pointing out that this similarity measure has to be based only on significantly associated collocates. From the cognitive-functional perspective as adopted here, this procedure can be regarded as an operationalization of entrenchment via association strength, that is, some collocates will be more entrenched than others, and the potential impact of a particular collocate on the overall compositionality value is a function of its entrenchment. Collocates that are most entrenched will be activated most quickly and strongly and will consequently have a larger impact on the compositionality value than less entrenched collocates. There are, however, also a number of aspects in her approach that seem debatable both on methodological and theoretical grounds, such as employing the z-score to determine collocational association strength (cf. Section 3.2.2 for further discussion), or considering only the particle’s contribution to the phrase and not also the verb’s (we return to these issues in Section 3.2.4). Lin (1999) measures compositionality in a different fashion, adopting a so-called substitution-based approach. He starts out from the assumption that non-compositional phrases differ in their distribution from phrases that are formed by replacing one of the component words of the non-compositional phrase with a semantically similar word.
On the basis of a data sample of VPCs taken from a collocation base constructed by himself (Lin 1998b), each component word of the VPC is substituted with its ten closest semantic neighbours; these are obtained from a self-created thesaurus (Lin 1998a). For each of these phrases, defined as a combination of some dependence relationship (A), the lexical head (B) and modifier (C), the Mutual Information (I) score is computed as follows ((18) provides the corresponding formula):
The mutual information score of a collocation is the logarithm of the ratio between the probability of the collocation and the probability of the events A, B, C [to] co-occur if we assume that B and C are conditionally independent given A. (Lin 1999:318)
(18) \[ I(A, B, C) = \log \frac{p(A, B, C)}{p(B \mid A)\, p(C \mid A)\, p(A)} \]
Non-compositionality is defined as follows: A collocation α is non-compositional if there does not exist another collocation β such that (a) β is obtained by substituting the head or the modifier in α with a similar word and (b) there is an overlap between the 95% confidence interval of the mutual information values of α and β. (Lin 1999:319)
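A minimal sketch of the score in (18) and of the decision rule just quoted may help to make the logic concrete. It rests on several assumptions that are not part of the passage: the probabilities are estimated as relative frequencies from counts, the count names are hypothetical, and the confidence intervals are simply passed in as (lower, upper) pairs, since the passage does not say how Lin derives them.

```python
import math


def mutual_information(n_abc, n_a, n_ab, n_ac):
    """Score (18) with probabilities estimated from counts (an assumption).

    n_abc: count of relation A with head B and modifier C
    n_a:   count of relation A overall
    n_ab:  count of relation A with head B (any modifier)
    n_ac:  count of relation A with modifier C (any head)
    The corpus size cancels out of the ratio, so it is not needed.
    """
    return math.log((n_abc * n_a) / (n_ab * n_ac))


def intervals_overlap(x, y):
    """True if two closed intervals (lower, upper) share at least one point."""
    return x[0] <= y[1] and y[0] <= x[1]


def is_non_compositional(interval_alpha, substitute_intervals):
    """Decision rule from the quoted definition: a collocation counts as
    non-compositional if no substituted collocation has a mutual information
    confidence interval that overlaps with its own."""
    return not any(
        intervals_overlap(interval_alpha, other)
        for other in substitute_intervals
    )


# Hypothetical counts and intervals, for illustration only.
print(mutual_information(n_abc=10, n_a=1000, n_ab=50, n_ac=40))
print(is_non_compositional((5.0, 6.0), [(1.0, 2.0), (2.5, 4.9)]))
```

When head and modifier co-occur exactly as often as conditional independence predicts, the ratio inside the logarithm is 1 and the score is 0.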
To evaluate the results obtained, Lin checks whether the phrases identified as non-compositional have an entry in the NTC’s English Idioms Dictionary (NTC-EID) and the Longman Dictionary of English Idioms (LDOEI), respectively. Non-compositional phrases should occur in the dictionary, compositional ones not. Overall, the measure achieves precision and recall rates of 15.7 per cent/13.7 per cent (NTC-EID) and 39.4 per cent/20.9 per cent (LDOEI). Lin (1999:320) attributes these rather poor results to the fact that ‘lexicographers differ in their opinions on what qualifies as a non-compositional phrase’ and that ‘even the most comprehensive dictionaries may have gaps in their coverage’. One problematic aspect of Lin’s approach is that the reliability of the results is evaluated against the presence of the extracted items in idiom dictionaries. As Bannard (2005:471) points out, such an evaluation procedure is questionable because ‘the absence of an item from a list of idioms is not necessarily evidence for it not being an idiom’ and likewise, ‘it is not the case that all items listed as idioms are non-compositional’. Moreover, the association measure used, the MI score, may cause problems particularly with respect to rare collocations (cf. Section 4.2.2).¹ Schone and Jurafsky (2001) extracted MWEs from corpora using Latent Semantic Analysis² (cf. also Baldwin et al. 2003). Compositionality is measured via the cosine between the vector representation (containing collocation frequencies) of the MWE and a weighted vector sum of its component words, the assumption being that small cosines indicate non-compositionality. They evaluate their results by checking if the items extracted in this way have an entry in existing dictionaries, and conclude that their approach does not outperform other, already established MWE extraction techniques. However, their approach is supported by the fact that Baldwin et al.
(2003) also report a correlation between their LSA-based contextual similarity scores and compositionality. Moreover, evaluating their results against dictionaries probably suffers from the same misconception as outlined by Bannard with respect to Lin’s (1999) study. Bannard et al. (2003:71) conducted a study that is similar in spirit in assuming that ‘identifying the degree of semantic similarity between a VPC and its component verb and/or particle will indicate whether that component part contributes independent semantics’. A verb is said to contribute to the meaning of the VPC if it is one of the 20 items most similar to the VPC; particles are considered as contributing if they occur among the ten nearest neighbours. The results thereby obtained marginally exceed the baseline value of human annotator agreement scores. Bannard (2005) provides further evidence in favour of such an approach by repeating the analysis on the basis of a much larger corpus and experimental data samples. The correlation between the contextual similarity of the VPC to its verb and annotators’ judgements as to whether the verb is contributing its meaning is highly significant; for the particles, it is not. Bannard attributes this to the fact that ‘the meaning of the particle is not so clearly reflected in its lexical contexts’ (Bannard 2005:476). McCarthy et al. (2003) also employ a substitution-based approach to measure compositionality. On the basis of VPCs that are automatically extracted from
the BNC, they use an automatically acquired thesaurus to classify their relative compositionality, which is defined as the overlap of the semantic neighbours of the VPC and its component words. More specifically, McCarthy et al. test various possibilities of overlap sizes (the top 30, 50 or 500 semantic neighbours) and compare the different results depending on whether the VPC’s neighbours are compared with those of the simplex verb or the particle. Having correlated these measures with human compositionality ratings on an 11-point scale, they conclude that the highest correlation is achieved with those variants of the measure that take the semantics of the particle (rather than the verb) into account. This result seems to back up Berry-Rogghe’s rather intuitive conception that the particle is the more decisive component word in VPCs. Moreover, comparing Bannard’s results with those of McCarthy et al. indicates that the overlap size chosen is a crucial determinant of an adequate operationalization: both take a substitution-based approach, but unlike Bannard, McCarthy et al. do not consider all semantic neighbours, but only a certain number of the top neighbours, and they do not report any problems with the particles.
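Two of the similarity notions surveyed in this section, the cosine between a phrase vector and a weighted sum of its component word vectors, and the overlap of the top semantic neighbours, can be sketched as follows. The function names, the equal weights and the toy vectors are assumptions made for illustration; the sketch does not reproduce the exact measures, weighting schemes or thesauri used in the studies cited.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equally long frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_sum(vectors, weights):
    """Component-wise weighted sum of the component words' vectors."""
    return [
        sum(w * vec[i] for vec, w in zip(vectors, weights))
        for i in range(len(vectors[0]))
    ]


def neighbour_overlap(phrase_neighbours, word_neighbours, k):
    """Number of items shared by the top-k neighbours of the phrase and the
    top-k neighbours of one component word (lists ranked by similarity)."""
    return len(set(phrase_neighbours[:k]) & set(word_neighbours[:k]))


# Hypothetical collocation-frequency vectors over the same three contexts.
phrase_vector = [4.0, 0.0, 1.0]
verb_vector = [3.0, 1.0, 0.0]
particle_vector = [5.0, 0.0, 2.0]
combined = weighted_sum([verb_vector, particle_vector], [0.5, 0.5])
print(cosine(phrase_vector, combined))

# Hypothetical ranked neighbour lists.
print(neighbour_overlap(["a", "b", "c"], ["b", "a", "d"], k=2))
```

A cosine close to 1 means that the phrase occurs in much the same contexts as the combination of its component words; the choice of k in the overlap measure corresponds to the overlap sizes discussed above.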
3.2 Towards a new approach to compositionality

All the above-mentioned approaches share the idea that compositionality is a function of the semantic similarity of the constituent words and the phrasal expression. In other words, they are all theoretically inspired by a compositional view of language (cf. Section 3.1.2). They differ, however, in their definitions of semantic similarity: the substitution-based approaches ask ‘How well can the phrase’s meaning be retained if we replace one component word with another one?’, while the contextual similarity-based approaches ask ‘How similar are the contexts of the phrase and those of the constituent words?’ More specifically, Berry-Rogghe’s R is the only approach that measures contextual similarity directly via the collocational overlap of VPC and component words – all the other contextual similarity measures compare the distributional characteristics of the VPC’s semantic neighbours and the component words. In the absence of a theoretical argument in favour of building in such a mediating layer, and given that the more recent approaches did not produce overwhelming results, the measure to be presented in the following is based on the R-value. However, several modifications of this R-value are inspired by the above-mentioned recent studies. These modifications concern the following:
• Berry-Rogghe’s data sample is simply too small to produce generalizable results, so it has to be tested to what extent the method can be extended to larger data samples. The data samples in later studies are automatically extracted and consequently very large (even larger than my own), but most likely at the expense of accuracy (cf. Section 3.2.1).
• The choice of an adequate association measure to determine the significantly associated collocates of the component words and the phrase respectively. The majority of prior studies either do not justify their choice of association measure at all, thereby ignoring potential methodological implications, or they simply opt for the association measure that produces the ‘best’ results (cf. Section 3.2.2).
• The problem that dichotomous p-values as provided by such association strength measures do not adequately reproduce interval-scaled association strength, so the question arises which collocates should be included and which not (cf. Section 3.2.3).
• The question which component parts of an idiomatic construction are most relevant (in this case, verb or particle) (cf. Section 3.2.4).

These modifications are outlined and illustrated on the basis of a pre-test data sample of VPCs (described in more detail below). VPCs were selected for the pilot studies so that the results could be compared with those of previous studies more straightforwardly. The reader who is less interested in the considerations that went into the development of the measure but more interested in the application of the final measure to V NP-constructions may skip Section 3.2 and turn directly to Section 3.3, which outlines the development of an elaborated version of the R-value. This extended R goes beyond previous approaches in being the first one that measures the contribution of all component words to the phrasal expression instead of being (more or less arbitrarily) restricted to one component word’s contribution alone. With respect to the evaluation of the pre-test results, some remarks are in order. As has become obvious in the discussion of previous studies on compositionality, this is a concept that, unlike the flexibility parameters to be discussed below, is not directly retrievable from corpus data.
That is, if we want to determine how flexible a V NP-construction is with regard to, say, adverbial modification, a performance-based approach motivates a straightforward operationalization of this parameter that basically falls back on counting the number of times the V NP-pattern occurs with an adverb. While one can plausibly argue that an adequate operationalization of adverbial flexibility is a little more complex such that it also has to take into account the overall frequency of that V NP-pattern as well as the average frequency with which this kind of pattern in general takes adverbials, all these aspects of the operationalization basically rest on simple frequency counts. With respect to compositionality, on the other hand, matters are more difficult because there are more questions implied: which words contribute, how much does each of them contribute, and how much does each word account for with regard to the phrase? And how is this contribution to be measured in the first place? None of these aspects can directly be measured in terms of frequencies of occurrence, and accordingly, the different approaches presented above testify to the complexity of the issue. Moreover, in the absence of any other stable correlate of compositionality, most of them rely on compositionality judgements to evaluate the quality of their operationalizations. Likewise, the results of the pre-tests in the present study will also be compared with informally
assessed intuitive rankings and established idiom pattern typologies (which are ultimately also intuition-based), at least with regard to general tendencies. The sceptical reader might object that this runs counter to the performance-based approach and the scope and objectives of the present study as delineated in Section 1, where I argued at length that the judgement data are not to be regarded as the gold standard against which the quality of the definitions of the idiomatic variation parameters is assessed. Accordingly, let me emphasize here that the goodness of fit of the corpus-based compositionality values and intuitive rankings plays only a secondary role in the development of the compositionality measure. That is, all methodological decisions are based on theoretical premises and/or general methodological criteria to the maximum extent possible. For example, both component words enter the overall compositionality value because a constructionist perspective predicts that all component parts do contribute to the semantics of a higher order phrase (cf. Sections 1.2.5 and 3.3). Similarly, the decision which association measure to use is not based on the match of the results obtained with intuitive judgements, but is stipulated by differences between association measures regarding the distributional requirements they make (cf. Section 3.2.2). Also, these kinds of considerations are always given priority over the perceived quality of the results in terms of their match with intuition. However, particularly the relationship between statistical association strength and cognitive entrenchment is an under-researched topic to date. That is, there is no theoretical or general methodological argument that would motivate where to draw the line between those collocates of a component word that are sufficiently strongly associated with that component word to actually play a role and those which are not.
Similarly, the results obtained for VPCs and V NP-patterns suggest that this threshold value depends on the kind of phrasal construction investigated (cf. Section 3.3.1). In the absence of any other criteria, intuitive judgements are the only way to explore these aspects of the operationalization.
3.2.1 A pre-test data sample of VPCs

The first conclusion to be drawn from the previous corpus-based approaches to compositionality concerns the extraction of the data sample. With the exception of Berry-Rogghe (1974), the data are extracted automatically from corpora. Thus, the data will by necessity be noisy to some extent. For instance, I manually extracted the 280 VPC types that McCarthy et al. (2003) identified as the VPCs that occur most often in the BNC. I compared the estimated frequency of occurrence with my manually extracted token frequencies and found that the average absolute deviation amounts to 279 items given an average frequency of 543 tokens. That is, the absolute average deviation amounts to 53 per cent. Consequently, the derivation of semantic neighbours or collocation frequencies obtained on the basis of such a data sample will be grossly inaccurate. Therefore, until automatic extraction methods have reached an even
Table 3.1 Pre-test sample of VPCs and their token frequencies obtained from the BNC.

| VPC | Frequency in BNC |
|---|---|
| act up | 15 |
| fill up | 718 |
| give back | 309 |
| give up | 2,195 |
| hold up | 1,403 |
| knock down | 572 |
| knock up | 55 |
| live down | 41 |
| show off | 617 |
| switch off | 1,159 |
| take off | 1,153 |

higher level of accuracy, corpus linguists should take the trouble of manual data extraction. Accordingly, like all corpus data underlying this study, the tokens of the 11 VPCs are manually extracted from the BNC. The VPCs were selected with respect to their informally assessed idiomaticity (which was guided mostly by discussions of the VPCs’ idiomaticity in the pertinent literature). Moreover, the VPCs are selected such that they differ (i) with respect to the particle or the verb to license a systematic comparison, and (ii) with respect to their assessed idiomaticity. Exhaustive retrieval of all tokens was ensured by simply searching for any joint occurrence of the verb and particle in question within one clause, no matter the ordering, and then manually extracting the true hits. The window for the collocates to enter the computation was fixed to ten words to either side of the search word. Also in accordance with Berry-Rogghe’s original procedure, the collocates of the particle were determined by considering only the first ten words following any occurrence of the particle in question which is itself directly followed by a punctuation mark that signals the end of some preceding clause (that is, full stops, commas, [semi-]colons, question or exclamation marks). Consider Table 3.1, which shows the VPCs and their frequencies of occurrence after manual cleaning of the concordances. In sum, this produced 8,237 manually extracted tokens.
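The collocate window just described can be pictured with a short Python sketch. It is a simplification under stated assumptions: the text is taken to be a plain list of tokens, the positions of the search word are assumed to be known already, and the function name `window_collocates` is introduced here purely for illustration. The manual cleaning of the concordances and the special treatment of the particle are not modelled.

```python
from collections import Counter


def window_collocates(tokens, node_positions, span=10):
    """Count all words within `span` tokens to either side of each node.

    `node_positions` holds the indices of the search word in `tokens`.
    """
    counts = Counter()
    for position in node_positions:
        left = tokens[max(0, position - span):position]
        right = tokens[position + 1:position + 1 + span]
        counts.update(left + right)
    return counts


# Invented example sentence; the search word "give" is at index 3.
tokens = "she decided to give the book back to him".split()
collocates = window_collocates(tokens, [3])
```

The resulting frequency counts are the raw material to which an association measure is then applied.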
3.2.2 Finding an adequate association measure

Another important aspect is the choice of an adequate association measure. Once we accept an operationalization of entrenchment via collocational association strength and take this line of reasoning a little bit further, the picture is much more complicated than it might appear at first sight, because studies on the exact relation between association strength as
determined corpus-linguistically and effects of entrenchment (that can also be observed in experimental studies) are rather scarce. Since different association measures (such as MI, the z-score, or the Fisher Yates exact test) make different presuppositions as far as the distribution of the data is concerned, the question arises which association measure reproduces theoretical linguistic concepts like entrenchment best, or in other words, which association measure’s mode of working is most compatible with the mode of working of entrenchment. In most previous studies, the question of selecting an adequate association measure is either not attributed any relevance at all, or it is handled rather casually. As mentioned above, Lin (1999) relies on the mutual information score without questioning its adequacy in any way; Bannard (2005:475) tests the mutual information score as well as the t-test on his data and concludes that since no ‘consistent improvement in performance’ is obtained, he relies on simple frequency counts instead. While this is indeed an issue that needs to be addressed in a corpus-linguistic study, it is unfortunate to discard (or adopt) a particular operationalization simply because given some (rather arbitrarily chosen) parameter settings, it produces the ‘best’ results, and then to try and relate the results obtained post hoc to some theoretical background. In its most extreme form, such a procedure runs the risk of disconnecting linguistic theory from empirical modelling such that empirical methodology becomes an end in itself. This is particularly true if there are theoretical assumptions that speak in favour (or against) a particular method, such as the distributional requirements of the different association measures (Evert 2005). 
Accordingly, Stefanowitsch and Gries (2003:217–218) address these differences, and they suggest employing the Fisher Yates exact test (henceforth FYE) because it does not presuppose any distributional preconditions of the data sample such as normal distribution or homogeneity of variances. Moreover, with respect to rare collocations, the FYE does not overestimate association strength, and it does not underestimate the probability of error, like, for instance, the mutual information score does (Pedersen 1986). The above-mentioned aspects are of immediate relevance for the present study. First, being concerned with idiomatic phrases, it has to be able to cope with extremely rare collocation frequencies at times. Secondly, Gries et al.’s (2005) study lends further support to the view that entrenchment can be measured corpus-linguistically via collocation strength, and also suggests that the FYE clearly outperforms raw frequency statistics. For these reasons, it seems reasonable to discard the z-score in favour of the FYE for all computations of collocation strength underlying the present study. In order to have a standard of comparison, let me first outline the basic mode of operation of the R-value on the basis of results as obtained when no modifications of Berry-Rogghe’s original approach have been undertaken, that is, also based on z-scores, not FYE values. Consider Table 3.2 for the corresponding results. For each of the VPCs, Table 3.2 provides the number of collocates that are shared between particle and VPC (i.e. a), as well as the overall number
Table 3.2 Berry-Rogghe’s R (based on z-scores/collocates_P in collocates_VPC).

| VPC | a = n collocates_P in collocates_VPC | b = n collocates_VPC | R = a/b |
|---|---|---|---|
| give back | 1,126 | 1,899 | 0.593 |
| fill up | 655 | 1,444 | 0.454 |
| knock up | 64 | 151 | 0.424 |
| knock down | 630 | 1,635 | 0.385 |
| hold up | 522 | 1,450 | 0.36 |
| act up | 5 | 18 | 0.278 |
| give up | 2,189 | 8,531 | 0.257 |
| take off | 564 | 2,554 | 0.221 |
| switch off | 489 | 2,297 | 0.213 |
| show off | 327 | 1,612 | 0.203 |
| live down | 17 | 91 | 0.187 |

of collocates of the VPC (i.e. b). In the rightmost column, the corresponding R-value resulting from dividing a by b is provided. As can be seen in Table 3.2, Berry-Rogghe’s R can distinguish between intuitively compositional and non-compositional VPCs only to a limited extent. With the exception of give back, which takes the first place in the ranking as expected intuitively, the ranking of the remaining VPCs is debatable at best. Clearly counter-intuitive examples are act up and give up, which rank higher in compositionality than switch off and show off, which rank among the lowest. Another problematic issue is that while the R-value can theoretically range between 0 and 1, the actual spread of the values is much smaller – while this could be ignored when only the resulting ranking is considered, this makes the R-value difficult to interpret when considering individual values in isolation (for give back, for instance, we would want to have an R-value that is considerably higher than 0.593 in absolute terms) or when the values of VPCs are directly compared to each other. Let us now turn to Table 3.3, which compares the R-values and corresponding rankings of the original z-based values of Table 3.2 with those that are obtained if the FYE is used instead of the z-score; all collocates significant at the 5 per cent level entered the computation. Two things have to be concluded from Table 3.3. First, it can be noted positively that the FYE-based values spread considerably more than the z-based ones (the range between the highest and the lowest R-value amounts to 0.638 for the FYE-based values and 0.406 for the z-based values). So the FYE-based value of 0.864 for give back, for instance, nicely reflects the intuitive conception of this expression as a highly compositional one (and not merely as a middle range candidate as suggested by the z-based value of 0.593). The resulting ranking, however, does not constitute a substantial improvement: metaphorical
Table 3.3 Comparison of Berry-Rogghe’s R based on FYE-scores and z-scores.

| VPC | R_FYE-based | Rank | VPC | R_z-based | Rank |
|---|---|---|---|---|---|
| give back | 0.864 | 1 | give back | 0.593 | 1 |
| hold up | 0.626 | 2 | fill up | 0.454 | 2 |
| knock up | 0.477 | 3 | knock up | 0.424 | 3 |
| take off | 0.432 | 4 | knock down | 0.385 | 4 |
| fill up | 0.378 | 5 | hold up | 0.36 | 5 |
| knock down | 0.357 | 6 | act up | 0.278 | 6 |
| act up | 0.357 | 7 | give up | 0.257 | 7 |
| give up | 0.328 | 8 | take off | 0.221 | 8 |
| live down | 0.302 | 9 | switch off | 0.213 | 9 |
| show off | 0.236 | 10 | show off | 0.203 | 10 |
| switch off | 0.226 | 11 | live down | 0.187 | 11 |

VPCs like switch off and show off still rank lower than clearly non-compositional phrases such as give up or act up. However, as I will argue in the next section, this problem should not be ascribed to the FYE itself; rather, it resides in the question how many of the particle’s and the VPC’s collocates enter into the computation.
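For readers who wish to see how an exact test of this kind assigns a p-value to a single collocate, the following Python sketch computes a one-sided Fisher-Yates exact test from a 2×2 co-occurrence table. It is a textbook formulation based on the hypergeometric distribution, not the software used for the present study; the layout of the four cells and the function name `fye_p_value` are assumptions made here.

```python
from math import comb


def fye_p_value(o11, o12, o21, o22):
    """One-sided Fisher-Yates exact p-value for a 2x2 table.

    Assumed cell layout:
      o11: node word with collocate     o12: node word without collocate
      o21: collocate without node word  o22: neither of the two
    Returns the probability of o11 or more co-occurrences under
    independence, with all marginal totals held fixed.
    """
    row1 = o11 + o12
    col1 = o11 + o21
    total = o11 + o12 + o21 + o22
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(o11, upper + 1)
    )
    return favourable / comb(total, col1)


# A small invented table: 3 co-occurrences where 2 would be expected.
p = fye_p_value(3, 1, 1, 3)
```

Because the probabilities are summed exactly rather than approximated, no normality assumption is involved, which is the property appealed to above. A collocate would then count as significant at the 5 per cent level if its p-value falls below 0.05.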
3.2.3 (Dichotomous) p-values do not reflect (interval-scaled) association strength

Another problematic aspect that has to be addressed is that dichotomous significance decisions based on p-values do not reflect interval-scaled association strength (or entrenchment, for that matter). So is it really plausible to assume that all statistically significant collocates have their share in the determination of compositionality, like Berry-Rogghe did? To take the example of VPCs, the number of significant collocates runs into hundreds of thousands for the particles (because the number of collocates that reach the significance level is of course also correlated with the general frequency of the search word – an aspect which could not possibly have been foreseen by Berry-Rogghe because the corpus on which her analysis was based was much smaller). But it is neither plausible to assume that all of these collocates play a role after all, nor that all of them have an equal impact on the compositionality value. Consequently, one has to decide on some threshold value beyond statistical significance levels. This raises the question where to draw the line. One option would be to take an arbitrary percentage of the most significant collocates; alternatively, one could include all collocates up to a particular association strength. In both cases, there are no psycholinguistic or other studies that motivate the selection of a particular threshold value; it rather appears that this is one of the most urgent issues that still await future investigation to
Table 3.4 Berry-Rogghe’s R (based on FYE-scores/collocates_P in collocates_VPC with different overlap samples and corresponding rankings).

| VPC | R10% | R15% | Rank | R20% | Rank | Rall | Rank |
|---|---|---|---|---|---|---|---|
| give back | 0.966 | 0.886 | 1 | 0.898 | 1 | 0.864 | 1 |
| fill up | 0.549 | 0.524 | 3 | 0.457 | 2 | 0.378 | 5 |
| knock down | 0.512 | 0.446 | 4 | 0.427 | 4 | 0.357 | 6 |
| hold up | 0.487 | 0.525 | 2 | 0.449 | 3 | 0.626 | 2 |
| take off | 0.434 | 0.421 | 5 | 0.388 | 5 | 0.432 | 4 |
| switch off | 0.414 | 0.338 | 7 | 0.298 | 7 | 0.226 | 11 |
| show off | 0.361 | 0.325 | 8 | 0.271 | 8 | 0.236 | 10 |
| knock up | 0.308 | 0.368 | 6 | 0.32 | 6 | 0.477 | 3 |
| give up | 0.257 | 0.203 | 9 | 0.217 | 9 | 0.328 | 8 |
| live down | 0.125 | 0.125 | 10 | 0.094 | 10 | 0.302 | 9 |
| act up | 0 | 0 | 11 | 0 | 11 | 0.357 | 7 |

ultimately unite theoretical ideas and concepts with results obtained from empirical (corpus-linguistic) analyses. For practical purposes, the issue can only be approached exploratively (as done in a similar fashion by McCarthy et al. (2003) reported in Section 3.1.4). In accordance with these considerations, the settings for the second run of the pre-test are modified such that (i) the FYE test substitutes the z-score and (ii) the overlap of the particle collocates and the VPC collocates is based on different sample sizes: how do the results change if only the top 10, 15 or 20 per cent, instead of all significant collocates are taken into account? Table 3.4 provides an overview of the results. The VPCs are sorted according to the ranking that results from an overlap sample that only comprises the top 10 per cent of the significant collocates of particle and VPC; the rankings that result from taking 15 per cent, 20 per cent, or all significant collocates into account are shown in the columns adjacent to the R-values for each VPC. To give an example, give back always yields the highest R-value (i.e. the most compositional one), so it is always on top of the ranking. Fill up, on the other hand, ranks second according to the 10 per cent sample, but is assigned third rank according to the 15 per cent sample, second rank according to the 20 per cent sample, and only fifth rank according to the sample which includes all significant collocates. The most important conclusion to be drawn from Table 3.4 is that the results indeed crucially depend on the methodological parameter settings. With the exception of give back, the rankings are totally unstable across the different sample sizes. Moreover, it appears that taking only the top 10 per cent of all significant collocates into account yields the most promising results: VPCs like give back, fill up, or knock down are intuitively highly compositional, and this fact is accurately reproduced by the operationalization.
Likewise, give up, live down, and act up are often used as examples of fairly non-compositional VPCs, and the 10 per cent operationalization is also capable of reproducing this other extreme pretty well, since these VPCs rank lowest. Finally, metaphorical VPCs like take off, show off, and switch off occupy rankings in the middle, which is in line with previous accounts. However, VPCs like hold up or take off have multiple senses, and the judgement will vary depending on the sense that is taken to be instantiated. This issue is addressed in somewhat more detail in the next section.
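The effect of restricting the overlap sample to the strongest collocates can be made concrete with a small Python sketch. It rests on several assumptions that are not part of the study itself: collocates are represented as dictionaries mapping each word to its p-value, ‘top n per cent’ is read as the n per cent of significant collocates with the smallest p-values (rounded up), and the names `top_significant` and `r_value` as well as the toy data are invented for illustration.

```python
def top_significant(p_values, percent, alpha=0.05):
    """Return the strongest `percent` per cent of the significant collocates.

    `p_values` maps each collocate to its p-value (smaller = stronger).
    """
    significant = sorted(
        (p, word) for word, p in p_values.items() if p < alpha
    )
    # Integer ceiling division: at least one collocate is kept
    # whenever any collocate is significant.
    keep = -(-len(significant) * percent // 100)
    return {word for _, word in significant[:keep]}


def r_value(vpc_p_values, component_p_values, percent=100):
    """R = shared collocates / collocates of the VPC, on trimmed samples."""
    vpc_top = top_significant(vpc_p_values, percent)
    component_top = top_significant(component_p_values, percent)
    if not vpc_top:
        return 0.0
    return len(vpc_top & component_top) / len(vpc_top)


# Invented p-values: ten VPC collocates, five particle collocates.
vpc = {f"w{i}": (i + 1) / 1000 for i in range(10)}
particle = {"w0": 0.0001, "x": 0.0002, "w1": 0.03, "y": 0.04, "z": 0.2}

r_top20 = r_value(vpc, particle, percent=20)
r_all = r_value(vpc, particle, percent=100)
```

Even in this toy setting the two values differ, which illustrates why the choice of threshold cannot be treated as a mere technicality.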
3.2.4 Verb or particle, is that the question?

Berry-Rogghe’s approach differs from most other approaches in that she ascribes the particle the most decisive semantic contribution to the phrase’s overall compositionality, not the verb. In most other approaches, the particle does not enter the computation because it is regarded as too polysemous for its collocates to truly reveal something about the particle’s semantic contribution to the phrase³ or because ‘the meaning of a preposition is not so clearly reflected in its lexical contexts’ (Bannard 2005:476). As already mentioned above, Berry-Rogghe is aware of the polysemy problem and adapts her operationalization in order to minimize this potential effect, so in her work, methodology is driven by the theoretical assumption that the particle in its function as the head of the phrase must be the more relevant determinant in the compositionality of the VPC. The results of the (already slightly modified) particle-based approach are indeed promising, which demonstrates that one should not discard the potential influence of one constituent simply because of methodological problems that a particular operationalization brings about. Indeed, it lies at the heart of the present study to point out that the results achieved are to a large extent determined by the operationalization that is chosen. It remains debatable also from a theoretical perspective which of the two constituents should be ascribed the more relevant role. In Section 3.3, I argue that from the theoretical perspective adopted here, it is even questionable whether the operationalization should take only one of the two constituents into account, which is why I opt for an operationalization in which both constituents enter into the resulting value. For the purpose of the pre-test, however, it is necessary to shed some light on the question how (much) considering the verbal rather than the particle’s collocates changes the picture.
Moreover, the question must be addressed as to what extent the results change if the VPC’s collocates are specified for their senses. Accordingly, the third run of the pre-test focuses on hold up and take off. For both VPCs, approximately 100 attestations of the most prominent senses are selected from the overall sample. According to WordNet 1.7.1, these are ‘physically uphold’ and ‘delay’ for hold up, and ‘remove’, ‘move (figuratively)’ and ‘leave’ for take off. Table 3.5 provides the results for these sense-specific samples (together with the results obtained for the non-sense-specific samples) based on the inclusion of the verbal collocates in the VPC’s collocates. For comparison, Table 3.6 provides the results for sense-specific and general samples based on the default, particle-based operationalization.
Table 3.5 Berry-Rogghe’s R (based on FYE-scores/collocates_V in collocates_VPC with different overlap samples and corresponding rankings) for sense-specific VPCs.

| VPC | R10% | R15% | Rank | R20% | Rank | Rall | Rank |
|---|---|---|---|---|---|---|---|
| take off ‘leave’ | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| take off ‘remove’ | 0.9 | 0.875 | 2 | 0.909 | 2 | 0.929 | 2 |
| hold up | 0.872 | 0.814 | 4 | 0.769 | 4 | 0.846 | 5 |
| take off ‘move (fig.)’ | 0.737 | 0.852 | 3 | 0.816 | 3 | 0.883 | 4 |
| hold up ‘delay’ | 0.722 | 0.741 | 5 | 0.667 | 7 | 0.758 | 6 |
| take off | 0.711 | 0.711 | 6 | 0.678 | 6 | 0.886 | 3 |
| hold up ‘uphold’ | 0.647 | 0.64 | 7 | 0.706 | 5 | 0.727 | 7 |

As can be seen in Tables 3.5 and 3.6, hold up ranks higher than take off in both the verb-based and the particle-based conditions – with respect to the absolute R-values, however, there are drastic differences: the verb-based values are much higher than the particle-based ones, in both cases yielding fairly high absolute R-values. With respect to the question whether sense specification has a clarifying effect, the results in Table 3.5 can only be described as counter-intuitive and chaotic. While the idiomatic ‘leave’-sense of take off yields the maximal compositionality value, the literal ‘uphold’-sense of hold up occupies the lowest position in the ranking. These results tie in with those of Bannard et al. (2003), who elicit judgements on the extent to which different VPCs semantically entail their component words; comparing the average inter-annotator agreements for those VPCs which have only one sense and those which are polysemous (according to WordNet), they find no significant difference between the two groups. Accordingly, they conclude that ‘polysemy was not a significant confounding factor’ (Bannard et al. 2003:68). The particle-based results in Table 3.6 are much more promising.

Table 3.6 Berry-Rogghe’s R (based on FYE-scores/collocates_P in collocates_VPC with different overlap samples and corresponding rankings) for sense-specific VPCs.

| VPC | R10% | R15% | Rank | R20% | Rank | Rall | Rank |
|---|---|---|---|---|---|---|---|
| take off ‘remove’ | 0.7 | 0.625 | 3 | 0.591 | 3 | 0.643 | 4 |
| take off ‘move (fig.)’ | 0.684 | 0.704 | 2 | 0.763 | 1 | 0.569 | 6 |
| hold up ‘delay’ | 0.611 | 0.778 | 1 | 0.75 | 2 | 0.736 | 2 |
| hold up ‘uphold’ | 0.529 | 0.56 | 4 | 0.559 | 4 | 0.721 | 3 |
| take off ‘leave’ | 0.5 | 0.333 | 7 | 0.5 | 5 | 0.833 | 1 |
| hold up | 0.487 | 0.525 | 5 | 0.449 | 6 | 0.626 | 5 |
| take off | 0.434 | 0.421 | 6 | 0.388 | 7 | 0.432 | 7 |

With the exception of the ‘uphold’-sense of hold up, the ranking of the sense-specific
VPCs is, by and large, plausible. This rules out the potential argument that the sense-specific samples were simply too small to yield interpretable results – if this were the case, the particle-based results should not be better than the verb-based ones. As in the second run of the pre-test, the 10 per cent samples yielded the most comprehensible results. To conclude, this result seems to support Berry-Rogghe’s assumption that it is the particle that is most decisive in determining the compositionality of the VPC – it does not rule out, however, that the verb also contributes to the overall compositionality.
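How the rank columns relate to the R-values, and how strongly the rankings shift between sample sizes, can be checked with a few lines of Python. The R-values below are those reported for the particle-based, sense-specific samples in Table 3.6; the helper name `ranks_descending` and the handling of ties (the item listed first receives the better rank) are assumptions made for this sketch.

```python
def ranks_descending(values):
    """Rank values from highest (rank 1) to lowest; ties keep list order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Rows in the order of Table 3.6 (sorted by the 10 per cent sample).
r_15 = [0.625, 0.704, 0.778, 0.56, 0.333, 0.525, 0.421]
r_all = [0.643, 0.569, 0.736, 0.721, 0.833, 0.626, 0.432]

ranks_15 = ranks_descending(r_15)
ranks_all = ranks_descending(r_all)

# Number of rows whose rank changes between the two samples.
shifted = sum(a != b for a, b in zip(ranks_15, ranks_all))
```

Comparing `ranks_15` with `ranks_all` shows that almost every row changes its position, which is the instability across sample sizes noted above.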
3.2.5 Interim summary

We may derive the following conclusions from the pre-test study as far as necessary modifications of the original approach by Berry-Rogghe are concerned. First, the FYE provides results that fit assumptions concerning the quantitative distribution of the data best, and accordingly, provides much more promising results than the z-score used by Berry-Rogghe. Secondly, considering not all collocates but rather only a small fraction of the top significant ones yields results that are much closer to established compositionality classes – moreover, this procedure is in line with prior studies supporting the idea that association strength (and, consequently, its effects) is a gradual phenomenon. Thirdly, the particle-based approach provides promising results, which shows, first of all, that it is reasonable to define compositionality corpus-linguistically by comparing the distributional characteristics of the component words and the phrasal idiom via association measures. This also lends further credibility to the assumption that the meaning of at least one of the component words is activated together with the meaning of the phrasal idiom, thereby supporting a compositional approach to language. Moreover, a comparison of the results obtained via a particle-based operationalization to those with a verb-based operationalization points towards a stronger, though not necessarily exclusive, impact of the particle on the overall semantics in the case of VPCs. Taken together, this is compatible with the hypothesis that different component words influence the phrase’s compositionality to different extents. Last but not least, it may be taken as an indication that compositionality cannot be measured by only considering the contribution of one component word to the phrasal semantics.
Indeed, there is no theoretical argument in favour of such an approach, but empirical evidence in favour of the contrary assumption that all constituent words will play a role: Stroop (1935) was the first to demonstrate that people cannot inhibit understanding the meaning of words to which they attend; Cacciari and Tabossi (1988) show that this even holds for highly conventionalized phrases like spill the beans (McGlone et al. 1994). Even if one wanted to argue that it is the head of a phrase that contributes the lion’s
share of the phrasal semantics (Hamblin and Gibbs (1999) present evidence in favour of a decisive impact of the main verb on the phrasal meaning), this does not rule out that considering also the other constituent words’ contribution could possibly render the results more accurate. So with the results of the pre-tests in mind, let us now address the following question: how can we quantify and relatively weigh the relevance of the contributions made by all component words? For instance, how can we determine (or at least estimate with sufficient precision) the quantitative contribution made by, say, live in live down, or that of the noun line in the V NP-construction draw the line?
3.3 A new approach to compositionality

Unfortunately, there is no theoretical argument that would help us to specify the hypothesis that the different component words will contribute to different extents any further: neither for VPCs nor for V NP-constructions is it possible to derive a general rule of thumb that says something like ‘60% are contributed by one constituent word, 40% by the other’. It is much more plausible to assume that the weighting of the contributions made by the component words will be item-specific and only roughly correlate with the construction in question, or the word class of the constituent words. Accordingly, the operationalization has to be maximally adaptive in this regard. In what follows, two different compositionality measures, both of which rely on the R-value, are presented. The general line of reasoning and the mode of operation is illustrated on the basis of the V NP-constructions underlying the present study. The two measures are also applied to the VPC data since the results should give an additional clue as to what extent the formulae are generalizable to different kinds of constructions (in addition to different instantiations of the same syntactic construction).
3.3.1 Application

Both VPCs and V NPs comprise two component words (leaving aside the (optional) determiner in V NP-constructions), and for each component word, an R-value can be computed. If those two R-values are to be united into a single R-value (henceforth referred to as the compositionality or compositionality value of the construction), one way to quantify the relative weight of each of these two R-values is to ask how many of the collocates that the component words offer for overlap with the construction are contributed by each component word. The logic here is that the more collocates a component word has to offer for collocational overlap with a higher order phrase, the more likely is its potential contribution to any phrase it is part of. For words
which have only few significant collocates, on the other hand, it is very unlikely that these overlap with the significant collocates of a higher order phrase. For example, imagine that in a V NP-construction, the verb has 25 significant collocates and the noun 75, that is, 100 collocates altogether. Given this difference, chances are much higher (more precisely, 3:1) that any collocates of the construction are among the set of the noun’s significant collocates rather than the verb’s. Accordingly, the first extended compositionality measure comprises four parts (two for each component word), each of which will be explained in the following. In a first step, for both constituent parts, here verb and noun phrase, the R-values are determined (R V and R NP ). Recall that the R-value is the number of construction collocates included in the set of verb or noun collocates, respectively, divided by the number of all significant construction collocates; consider the formulae in (19).
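\[
R_{V} = \frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{\mathrm{pattern}}} \tag{19a}
\]
\[
R_{NP} = \frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{\mathrm{pattern}}} \tag{19b}
\]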
In a second step, the weight of R V and R NP is determined by checking how big the share of the verb’s and noun phrase’s collocates is in the sum of their collocates. This is done by simply dividing the number of collocates of one constituent word by the sum of both constituent words’ collocates; consider (20).
\[
\mathrm{weight}\,R_{V} = \frac{n\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{V} + n\ \mathrm{colls}_{NP}} \tag{20a}
\]
\[
\mathrm{weight}\,R_{NP} = \frac{n\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{V} + n\ \mathrm{colls}_{NP}} \tag{20b}
\]
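For each component word, we can now quantify its contribution by relativizing its R-value against its weight. That is, we multiply the R-value with the weight value; consider (21).
\[
\mathrm{contribution}_{V} = \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{\mathrm{pattern}}}}_{R_{V}} \times \underbrace{\frac{n\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{V} + n\ \mathrm{colls}_{NP}}}_{\text{weight of } R_{V}} \tag{21a}
\]
\[
\mathrm{contribution}_{NP} = \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{\mathrm{pattern}}}}_{R_{NP}} \times \underbrace{\frac{n\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{V} + n\ \mathrm{colls}_{NP}}}_{\text{weight of } R_{NP}} \tag{21b}
\]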
The overall compositionality value of the construction is the sum of the contribution values of the component words; consider (22).
\[
\mathrm{compositionality}_{V\,NP} = \mathrm{contribution}_{V} + \mathrm{contribution}_{NP} \tag{22}
\]
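To illustrate the procedure, consider the example of draw X line. First, the R-values for draw and line respectively are determined as follows.
\[
R_{\textit{draw}} = \frac{51\ \mathrm{colls}_{\textit{draw X line}}\ \text{in}\ \mathrm{colls}_{\textit{draw}}}{66\ \mathrm{colls}_{\textit{draw X line}}} = .773 \tag{23a}
\]
\[
R_{\textit{line}} = \frac{62\ \mathrm{colls}_{\textit{draw X line}}\ \text{in}\ \mathrm{colls}_{\textit{line}}}{66\ \mathrm{colls}_{\textit{draw X line}}} = .939 \tag{23b}
\]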
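Second, the weights of these R-values are calculated.
\[
\mathrm{weight}\,R_{\textit{draw}} = \frac{425\ \mathrm{colls}_{\textit{draw}}}{425\ \mathrm{colls}_{\textit{draw}} + 721\ \mathrm{colls}_{\textit{line}}} = .371 \tag{24a}
\]
\[
\mathrm{weight}\,R_{\textit{line}} = \frac{721\ \mathrm{colls}_{\textit{line}}}{425\ \mathrm{colls}_{\textit{draw}} + 721\ \mathrm{colls}_{\textit{line}}} = .629 \tag{24b}
\]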
Thirdly, by multiplying the R-values and their weights, we get the contributions made by draw and line to draw X line.
\[
\mathrm{contribution}_{\textit{draw}} = 0.773 \times 0.371 = 0.287 \tag{24a}
\]
\[
\mathrm{contribution}_{\textit{line}} = 0.939 \times 0.629 = 0.591 \tag{24b}
\]
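The three steps just illustrated can be retraced in a few lines of code. The following is only a minimal sketch, not the procedure used in the study: it assumes that the collocate counts have already been obtained from the corpus, and the names Component, r_value, contributions_ext1 and compositionality_ext1 are introduced here purely for illustration. Because the weights are computed over however many components are passed in, the sketch is not tied to two-part constructions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    """Collocate counts for one component word (illustrative container)."""
    word: str
    n_colls: int    # number of significant collocates of the component word
    n_overlap: int  # construction collocates that are also collocates of the word


def r_value(comp: Component, n_pattern: int) -> float:
    """R-value as in (19): overlap divided by all construction collocates."""
    return comp.n_overlap / n_pattern


def contributions_ext1(components: List[Component], n_pattern: int) -> Dict[str, float]:
    """Contribution per word as in (21): R-value times weight, weight as in (20)."""
    total_colls = sum(c.n_colls for c in components)
    return {
        c.word: r_value(c, n_pattern) * (c.n_colls / total_colls)
        for c in components
    }


def compositionality_ext1(components: List[Component], n_pattern: int) -> float:
    """Compositionality as in (22): sum of all contributions."""
    return sum(contributions_ext1(components, n_pattern).values())


# Counts for draw X line as reported in (23) and (24).
draw_x_line = [Component("draw", 425, 51), Component("line", 721, 62)]
parts = contributions_ext1(draw_x_line, n_pattern=66)
print({word: round(value, 3) for word, value in parts.items()})
print(round(compositionality_ext1(draw_x_line, n_pattern=66), 3))
```

With the counts for draw X line, the rounded contributions come out as 0.287 and 0.591, and their sum as 0.878.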
Adding the two contributions of draw and line yields an overall compositionality value of 0.878 for draw X line. The compositionality value can theoretically range from 0 (very low in compositionality) to 1 (very high in compositionality), so according to this operationalization of compositionality, draw X line is a highly compositional construction. This extended compositionality measure unites a number of positive aspects. Both the verb and the noun are taken into account simultaneously, which reflects the assumption that every constituent makes a contribution. The formula is not qualitatively restricted to V NP-constructions, but can be applied to any kind of phrase (because the significant collocates can in principle be computed for any component word); nor is it quantitatively restricted to two-part structures, but can in principle be applied to n-grams, in which case the overall compositionality value naturally increases with the number of parts that contribute (more or less) to the overall phrase. With respect to the (number of) collocates entering the computation, it has to be noted that taking the top 10 per cent as in the pre-tests is unfeasible, because this would result in inflated numbers of collocates on the part of the constituent words (think of the light verbs have and do as in have X laugh or do X trick, respectively), while the overall number of significant collocates of the idiomatic constructions is mostly so small that taking only a 10 per cent sample
of this renders the question of inclusion senseless: if the 10 per cent sample of the idiomatic construction’s collocates comprises only, say, 2 collocates, while the 10 per cent sample of the constituent verb comprises several thousand collocates, the fact that the two idiom collocates are among those several thousands is not really telling us anything about the compositionality of the idiomatic phrase. In other words, the measure is not conservative enough with respect to the constituent words’ collocates, while it is too conservative with respect to the idiomatic construction’s collocates. Therefore, the constituent word’s collocate sample size is determined via a threshold value (while taking all significant collocates of the idiomatic construction into consideration). In the absence of any other findings motivating a particular threshold value, the threshold was determined exploratively. A comparison of the R-values based on all significant collocates with FYE-values of 50 or higher, 75 or higher, and 100 or higher shows that the 100+-based results are most suitable for two reasons.4 First, they are conservative with respect to the number of collocates entering into the computation. Secondly, the resulting R-values spread most widely, which is desirable if we want to see the assumed gradualness of compositionality to be reflected in the R-values as much as possible. Two caveats remain with respect to the generalization potential of this threshold value. First, it is beyond the scope of the present study to explore the plausible assumption that the 100+ threshold value need not prove to be the optimal value for all kinds of constructions. Secondly, since the FYE-value naturally increases with increasing corpus size, the adequate threshold values will differ considerably depending on the corpus that is used. An alternative extension of the R-value adopts a different approach to weighting the R-values. 
Its focus is on how much of itself the component words actually contribute: it weights the inclusion of the construction collocates in the component words’ collocate sets by asking for how much of the overall component words’ semantics (i.e. their significant collocates) this inclusion actually accounts for. If a word contributes many or even most of its significant collocates, it will be attributed a heavier weight as opposed to a word which shares only a marginal number of its collocates with the higher order phrase. In doing so, a component word is not attributed a heavy weight simply because it is highly frequent and consequently also has a high number of significant collocates associated with it, which automatically increases the likelihood that there will be overlap with the collocates of the higher order phrase. So while the computation of the R-value remains the same as in the first extension, instead of a weight value, a share value is determined for every component word; consider (25).
\[
\mathrm{share}_{V} = \frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{V}} \tag{25a}
\]
\[
\mathrm{share}_{NP} = \frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{NP}} \tag{25b}
\]
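The contribution of a verb and noun phrase, respectively, is determined by relativizing the R-value against this share value; consider (26).
\[
\mathrm{contribution}_{V} = \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{\mathrm{pattern}}}}_{R_{V}} \times \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{V}}{n\ \mathrm{colls}_{V}}}_{\text{share of } R_{V}} \tag{26a}
\]
\[
\mathrm{contribution}_{NP} = \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{\mathrm{pattern}}}}_{R_{NP}} \times \underbrace{\frac{n\ \mathrm{colls}_{\mathrm{pattern}}\ \text{in}\ \mathrm{colls}_{NP}}{n\ \mathrm{colls}_{NP}}}_{\text{share of } R_{NP}} \tag{26b}
\]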
As in the first extension, the overall compositionality value is simply the sum of the contributions made by all component words (here: verb and noun phrase); consider (27) and note 5.
\[
\mathrm{compositionality}_{V\,NP} = \mathrm{contribution}_{V} + \mathrm{contribution}_{NP} \tag{27}
\]
Again, take the example of draw X line. Since the formulae differ only with respect to the weighting of the R-values, we only have to determine the share values for draw and line, respectively.
\[
\mathrm{share}_{\textit{draw}} = \frac{51\ \mathrm{colls}_{\textit{draw X line}}\ \text{in}\ \mathrm{colls}_{\textit{draw}}}{425\ \mathrm{colls}_{\textit{draw}}} = .12 \tag{28a}
\]
\[
\mathrm{share}_{\textit{line}} = \frac{62\ \mathrm{colls}_{\textit{draw X line}}\ \text{in}\ \mathrm{colls}_{\textit{line}}}{721\ \mathrm{colls}_{\textit{line}}} = .086 \tag{28b}
\]
Multiplying the R-values with these share values, we get the contribution made by draw and line to draw X line.
\[
\mathrm{contribution}_{\textit{draw}} = .773 \times .12 = .093 \tag{29a}
\]
\[
\mathrm{contribution}_{\textit{line}} = .939 \times .086 = .081 \tag{29b}
\]
The sum of these contributions amounts to 0.174. That is, according to this compositionality measure, draw X line ranks very low in compositionality.
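The share-based variant can be sketched in the same way. Again, this is an illustrative sketch under the assumption that the collocate counts are already available; the function names contribution_ext2 and compositionality_ext2 and the normalize parameter are hypothetical. The optional division by the number of constituents implements the rescaling mentioned in note 5, which is not applied to the values reported in this chapter.

```python
from typing import List, Tuple


def contribution_ext2(n_overlap: int, n_colls: int, n_pattern: int) -> float:
    """Contribution as in (26): R-value times share value.

    n_overlap: construction collocates also found among the word's collocates
    n_colls:   all significant collocates of the component word
    n_pattern: all significant collocates of the construction
    """
    r = n_overlap / n_pattern   # R-value, as in (19)
    share = n_overlap / n_colls  # share value, as in (25)
    return r * share


def compositionality_ext2(
    components: List[Tuple[int, int]], n_pattern: int, normalize: bool = False
) -> float:
    """Sum of contributions as in (27); components are (n_overlap, n_colls) pairs.

    With normalize=True the sum is divided by the number of components,
    which keeps the result between 0 and 1 (cf. note 5).
    """
    total = sum(contribution_ext2(o, n, n_pattern) for o, n in components)
    return total / len(components) if normalize else total


# Counts for draw X line: (overlap, collocates) for draw and for line.
draw_x_line = [(51, 425), (62, 721)]
print(round(compositionality_ext2(draw_x_line, n_pattern=66), 3))
print(round(compositionality_ext2(draw_x_line, n_pattern=66, normalize=True), 3))
```

For draw X line the unnormalized sum rounds to 0.174, matching the value given above; the normalized variant is simply half of that.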
3.3.2 Results
So which approach fares better? Consider Figure 3.1 for an overview of the general results of the first extended version for the V NP-constructions; Figure 3.2 provides the results for the VPCs (for the exact numbers, cf. the tables in Appendix B). As can be seen in Figure 3.1, this extension of the R-value is not yet optimal: with the contributions of verb and noun phrase weighted as they are here, high-frequency verbs or noun phrases push the R-values sharply up or down. For instance, take X plunge ranks so high because the share of take in the sum of take’s and plunge’s collocates amounts to 100 per cent. Consequently, the fact that all the construction’s significant collocates are also among take’s list is weighted with 100 per cent, while the fact that plunge does not share all its
Figure 3.1 Compositionality values for V NP-constructions (R-value extension I). [Bar chart: R-value (0.00 to 1.00) by V NP-construction.]
collocates with take X plunge enters with 0 per cent into the final R-value. A similar picture can be observed for the VPC data in Figure 3.2: high-frequency verbs such as give and take rank high, while verbs comparatively low in frequency such as switch and knock rank lowest. The measure as it stands is sub-optimal from a cognitive-linguistic perspective. It considers the overlap between component words and the phrase only in one direction, asking how much of the idiom is accounted for in terms of the component words while ignoring the question of how much of themselves the verb and the noun phrase bring into the semantics of the idiom. If one simply conceives of the higher order constituent as a combination of two lower-order constituents, this procedure makes perfect sense, but this presupposes a very structuralist and stable conception of language. From the cognitive-functional perspective adopted in the present study, however, the operationalization should also leave room for a potential backward influence of the phrasal verb’s semantics on the (weightings in the often polysemous network of) the verb’s semantics. This backward influence may, in turn, change the compositionality value of the phrasal verb: the higher the phrasal verb’s share is of the verb’s semantics, the more compositional the phrasal verb will be. The second extension captures exactly this. Figure 3.3 provides an overview of the results for the V NP data and Figure 3.4 for the VPC data. As can be seen in Figure 3.4, the second approach outperforms the first one, particularly so with respect to the V NP data. Here, the superiority of the second approach can already be seen when looking only at the extremes: write X letter, tell X story, etc. rank highest, while the formerly problematic cases like take X plunge are assigned low-ranking positions. 
Note also that the majority of items are assigned a fairly non-compositional value on the scale from 0 to 1, which ties in nicely with the fact that most of these were actually obtained from an idiom dictionary. Items such as write X letter and tell X story, on the other hand, were selected in order to test whether items that are intuitively assessed as (nearly perfectly) compositional are actually treated accordingly by the measure. The second approach proves very accurate here, since these items not only rank highest; their compositionality values are also very high in absolute terms (0.73 for tell X story and 0.84 for write X letter). Admittedly, individual rankings run counter to an intuitive assessment, such as call X police obtaining a middle rank only. In the VPC data, the overall ranking generally appears much more plausible than the one obtained from the first approach, but that give up obtains the highest ranking of all does not make any sense whatsoever. However, while this certainly points towards potential for improvement of the present approach, the exact source of error need not necessarily reside in the corpus-linguistic operationalization itself, but may also be a side effect of a data sample that is not yet large enough for these kinds of expressions. In general, however, the corpus-linguistic approach is supported by the overall results.
Figure 3.2 Compositionality values for VPCs (R-value extension I). [Bar chart: R-value (0.00 to 1.00) by VPC; x-axis labels in the order extracted: switch off, knock up, knock down, fill up, act up, hold up, show off, live down, take off, give up, give back.]
Figure 3.3 Compositionality values for V NP-constructions (R-value extension II).
Figure 3.4 Compositionality values for VPCs (R-value extension II). [Bar chart: R-value (0.00 to 1.00) by VPC; x-axis labels in the order extracted: act up, knock up, live down, hold up, switch off, show off, knock down, take off, fill up, give back, give up.]
Conclusion
The second extension of Berry-Rogghe’s R-value presented in the previous section provides very satisfactory results. While there is no gold standard available in the present study against which the quality of the final corpus-linguistic definition can be assessed, there are several strong indications that it is indeed valid. For one, the correlation between the values produced by the second extension of the R-value and the corpus frequency of the V NP-constructions is very high (Pearson’s r = 0.802), which is in accord with Barkema’s (1994b:26) results, where corpus frequencies were correlated with intuitively assessed compositionality values. What is more, the corpus-linguistic definition of compositionality presented here derives a lot of plausibility from its compatibility with theoretical premises. It is the first measure in which the contributions of not only one, but all component words to the overall phrase are taken into account. Rather than considering, say, only the verb make in make a point, the measure also quantifies the contribution made by point. Therefore, the measure is in accord with the constructionist view that a complex phrase is a manifestation of several smaller constructions, and that every one of them contributes to the meaning of the complex phrase (Goldberg 2006:10; Section 1.2.5). Moreover, rather than assuming that each word contributes equally to the meaning of the phrase, this compositionality measure weights the contributions made by the component words relative to each other. That is, the measure licenses the possibility that the contribution made by point in make a point can be smaller or bigger than in see a point. Thereby, the measure implements a central assumption of many cognitive approaches to grammar that constructions are entrenched in the mental lexicon to different extents, depending on their frequency of use (Langacker 1987:59). 
Accordingly, differences in entrenchment between make, see, and point can be expected to influence the relative weight of their contributions. Finally, the contribution made by each component word is not only weighted in terms of how much of the construction’s meaning is accounted for by taking the component word into account, but also how much of itself each component word brings in. For instance, in take the plunge, plunge brings in nearly all of its semantics (in terms of the significant collocates it shares with the construction), whereas take only contributes a fraction of its meaning potential (take has many other significant collocates that are not associated with the construction take the plunge in particular). Accordingly, the measure leaves room for a potential backward influence of the phrase’s semantics on the (weightings in the often polysemous network of) constituent word’s semantics (Langacker 1987).
Notes
1. I disagree with Bannard, however, in his point of critique that Lin’s substitution-based operationalization models productivity rather than compositionality, or at least we have different definitions of productivity. I understand productivity to be a process operative at different levels of linguistic analysis by which parts of linguistic entities are combined creatively to obtain new forms carrying new meanings. One can say that compositionality is a precondition for what has often been labelled semantic productivity, i.e. the possibility of deriving new idiomatic meanings by re-using parts of the idiom while substituting others: in order for an idiom to be semantically productive, the idiom has to be compositionally analysable (Plag 2005:121). Consider to burn one’s bridges ahead of oneself derived from to burn one’s bridges behind oneself, where the preposition behind could only be replaced with ahead because of the analysability of the phrase. Lin’s approach to compositionality, however, does not check whether new meanings can be derived if parts of idioms are taken and recombined with other parts. Rather, he tests how much of their ‘default’ semantics the constituent parts bring into the idiomatic phrase by comparing the contribution made to the idiom with that of items which should behave distributionally highly similar provided that the part is actually contributing its ‘default’ semantics.
2. Latent Semantic Analysis (LSA) is ‘a high-dimensional linear associative model that embodies no knowledge beyond its general learning mechanism, to analyze a large corpus of natural text and generate a representation that captures the similarity of words and text passages’ (Landauer and Dumais 1997:211).
3. However, this argument is contradicted by Bannard’s (2005:476) experimental data, ‘because if this were the case, then we would expect humans to find it more difficult to make semantic judgments about the particles, and we know that inter-annotator agreement was higher when assessing the semantic contribution of the particles than it was for the verbs’.
4. The regular output of a FYE test is a p-value. Since small p-values can be difficult to interpret and cumbersome to report, Gries et al. (2005:648) suggest reporting the negative logarithm to the base of ten of the p-value instead. A converted value of ≥1.3 corresponds to a probability of error of 5 per cent or less. Accordingly, FYE values of 50, 75 or 100 indicate extremely high association strength.
5. In order for the second extended R-value to range between 0 and 1, it would in principle be necessary to divide the final value by the number of constituents that entered the computation (in this case, two); however, for the V NP-patterns and the VPCs discussed here, the values were extremely small already, so the values reported here are not divided. This is not a problem as long as only the results of one analysis are compared (since the ranking of the constructions remains the same), but once results from several analyses are compared, it may be reasonable to do the division in order to stay within the index from 0 to 1.
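The conversion described in note 4 and the threshold used in Section 3.3.1 can be illustrated with a short sketch. It assumes that the p-values of the Fisher-Yates exact test have already been computed by some other tool; the function names and the collocates with their p-values below are invented for illustration and are not taken from the corpus data.

```python
import math
from typing import Dict, Set


def fye_value(p_value: float) -> float:
    """Negative base-10 logarithm of a Fisher-Yates exact p-value (p must be > 0)."""
    return -math.log10(p_value)


def significant_collocates(p_values: Dict[str, float], threshold: float = 100.0) -> Set[str]:
    """Keep only collocates whose FYE value reaches the threshold."""
    return {word for word, p in p_values.items() if fye_value(p) >= threshold}


# A p-value of 0.05 corresponds to the FYE value of about 1.3 mentioned in note 4.
print(round(fye_value(0.05), 1))

# Hypothetical p-values, made up for this sketch only.
p_values = {"boundary": 1e-120, "somewhere": 1e-80, "the": 0.04}
print(significant_collocates(p_values))                # 100+ threshold
print(significant_collocates(p_values, threshold=50))  # 50+ threshold
```

Lowering the threshold from 100 to 50 admits more collocates into the component word’s collocate set, which is the trade-off weighed in the exploratory comparison of the 50+, 75+ and 100+ thresholds.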
Chapter 4
Flexibility measures
Introduction While the centrality of compositionality is unanimously agreed upon, there is considerable disagreement concerning the role that flexibility plays in the definition of idiomaticity: does it constitute an intrinsic property, or is it a mere side effect of different degrees of compositionality? Furthermore, is it possible to quantify the importance of flexibility alongside compositionality for the overall idiomaticity of a phrase? And what is a proper definition of flexibility in the first place? In order to approach all these issues, this chapter is structured as follows. Section 4.1 is devoted to an overview of the most relevant literature on the flexibility of idiomatic expressions. For reasons of structural lucidity, this brief review is structured just like the one on compositionality, that is, it starts out with theoretical approaches and proceeds to psycholinguistic and corpus-linguistic approaches. As these sections will show, the term flexibility can actually be used to denote a variety of qualitatively different phenomena. Section 4.1.4 summarizes these differences in a more systematic fashion and explains what kinds of flexibility are included in the present analysis. Section 4.2 sets out to describe the corpus-linguistic measurement of (the different kinds of) flexibility by, first of all, explaining how a baseline value was established with which the flexibility values of the V NP-constructions can be compared. Sections 4.2.2 and 4.2.3 are devoted to the introduction of the mode of operation of the two methods selected to quantify flexibility: the first is an elaborated version of the measure proposed in Barkema (1994a), the second is entropy. The two measures highlight different aspects of flexibility and in combination, they provide a much more revealing picture of a phrase’s flexibility behaviour than either one could accomplish alone. 
Sections 4.3 to 4.5, each of which is devoted to one particular kind of flexibility, are all structured in a parallel fashion: before the results obtained for the two different measures are presented, the operationalization of the kind of flexibility in question is briefly explained, and examples from the corpus are given.
Section 4.6 rounds off the chapter by comparing the results obtained from the two measures, showing that while they emphasize different aspects of flexibility, they generally produce highly compatible results.
4.1 Previous approaches 4.1.1 Theoretical approaches In earlier theoretical approaches, several proposals were made for formal devices to account for the idiosyncratic syntactic behaviour of idiomatic expressions. All these approaches have in common that they focus exclusively on flexibility at the syntactic level (more often than not limiting the discussion to classic syntactic transformations such as the passive transformation). Likewise, they consider only core idioms, which stems from the conception of idiomaticity as a binary phenomenon; and being mostly inspired by the generative framework, flexibility judgements are based on the researcher’s intuition rather than naturally occurring language data. Early generative approaches are presented with two problematic issues with respect to the syntactic flexibility of idioms. First, it is recognized that there is no stable correlation between non-compositionality and syntactic flexibility: It would be extremely gratifying if transformational defectiveness were a reliable syntactic correlate of phraseological units, semantically defined. Indeed, among the hundreds of idioms and other phraseological units I have looked at in four or five languages, I have not found a single one that did not have some transformational defect. But it also turns out that transformational defectiveness is not restricted to phraseological units. (Weinreich 1969:47) Moreover, it is difficult for generative models to handle the fact that if an idiom undergoes a certain transformation, one cannot automatically conclude that it licenses a fixed set of other transformations as well. A contrary claim is made by Fraser (1970), who argues that idioms are organized in a frozenness hierarchy, which ranges from expressions which undergo nearly all traditional transformations without the idiomatic meaning being lost, such as pass the buck, to idioms which are obviously completely syntactically frozen, e.g. blow off steam. 
Idioms are categorized into six classes on the hierarchy that specify which syntactic operations they undergo and which not; consider Table 4.1. To give some examples, Fraser analyses pass the buck (‘to refuse to take responsibility’) as belonging to level 5, which means that any reconstitution transformation as well as all transformations of lower levels apply to this idiom. Blow off steam (‘get rid of strong feelings’), according to Fraser, is categorized as level 1, which means that only the adjunction transformation applies to this idiom. Fraser’s frozenness hierarchy has been criticized by various linguists. McCawley (Dong 1971), for instance, provides a variety of (vulgar) counter-examples for which the hierarchy does not hold. Makkai (1972) provides even more
Table 4.1 Fraser’s (1970) frozenness hierarchy (adapted from Fraser 1970:39).
Level   Transformation      Example
6       Unrestricted
5       Reconstitution      Nominalization
4       Extraction          Particle movement, passive, preposing of PPs
3       Permutation         Indirect object movement, particle movement
2       Insertion           Insertion of non-idiomatic constituent(s)
1       Adjunction          Gerundive nominalization transformation
0       Completely frozen
counter-examples (which were not affectively loaded in any direction). Machonis’s (1985) analysis of more than 4,000 idioms shows that they do not align with the predictions made by the hierarchy. Nevertheless, Fraser makes the valid observation that frozenness is a scalar phenomenon, ranging from completely free to completely frozen expressions. More often than not, more recent results seem to be compatible with a modified version of Fraser’s original proposal, which leaves open the possibility that while the hierarchy may not be universally valid, different dialect-specific or otherwise specified versions of the hierarchy co-exist. Also, several studies report findings that apparently support Fraser’s version of the hierarchy. Cutler (1982) uses Fraser’s idiom categorization to compare the average age of the idioms of the six classes and finds that the earlier the first citation in the Oxford English Dictionary is dated, the more resistant the idioms are on average to variation; another example is Reagan (1987), discussed in Section 4.1.2. Another proposal to account for the syntactic behaviour of idioms is made by Newmeyer (1974). He suggests that the syntactic behaviour of an idiom can be predicted on the basis of the meanings of the idiom and their literal paraphrases (Chafe 1968). That is, phrases like kick the bucket, shoot the bull and make the scene do not passivize, because their literal paraphrases, die, talk and arrive, do not either. However, this approach is also only feasible to a limited extent and crucially hinges upon the literal paraphrase one selects. Michiels (1977) illustrates the problem using the example drop a brick: if one paraphrases the meaning as ‘commit social blunder’, it can be passivized – but if one alternatively paraphrases it as ‘goof’, it cannot. 
Similarly, Nunberg (1978) points out that give up the ghost, throw in the towel and pop the question can be transformed into the passive despite the fact that their literal paraphrases, die, resign and propose, are intransitive.
4.1.2 Psycholinguistic approaches
Psycholinguists are primarily interested in the relationship between non-compositionality and frozenness as a potential factor in idiom production and comprehension (cf. also Section 3.1.2). One of the first studies investigating
syntactic frozenness is presented by Swinney and Cutler (1979). They classify idioms according to Fraser’s (1970) frozenness hierarchy and test if the degree of frozenness has any effect on processing speed – their results are not significant. However, Gibbs and Gonzales (1985) point out that the poor results may be due to the fact that the hierarchy is flawed since it does not adequately represent most English speakers’ mental lexicon. Accordingly, Gibbs and Gonzales (1985) elaborate on Swinney and Cutler’s idea. On the basis of an empirically established frozenness continuum of idioms, they find that subjects process syntactically frozen idioms faster than flexible ones on this hierarchy. This lends credence to the assumption that frozen phrases become routinized over time and are consequently accessible as holistic units. Moreover, the study demonstrates that people are sensitive to the syntactic properties of idioms. With respect to the question to what extent this sensitivity to syntactic flexibility depends on the semantic composition of the phrase, Gibbs and Nayak (1989) conduct a series of experiments which support the hypothesis that semantic analysability is indeed a good predictor of syntactic flexibility. Gibbs et al. (1989) expand the hypothesis that semantic composition determines syntactic flexibility to what they refer to as lexical flexibility, i.e. the freedom with which constituent words of an idiom phrase can be substituted with synonymous words. Their line of reasoning is the following (cf. 
also the discussion of Lin’s (1999) corpus-linguistic compositionality measure in Section 3.1.4, which is based on exactly the same idea): If the individual parts of semantically decomposable idioms contribute separately to these phrases’ overall figurative meanings, then changing any of these parts by inserting a synonym in its place should not be as disruptive to their figurative interpretations as would substituting synonyms for parts of nondecomposable idioms. (Gibbs et al. 1989:59) Their results testify to a strong correlation between compositionality and substitutability when comparing normally decomposable idioms with nondecomposable idioms, while abnormally decomposable idioms (the middle range class of Gibbs’ compositionality taxonomy, cf. Section 3.1.3) are nearly as flexible as normally decomposable idioms. Reagan (1987) compares the flexibility of idioms with three other different variables: the meaning closeness of the literal and the idiomatic meaning, explicability (which is defined as speakers’ ability to explain the etymology of the idiom), and familiarity (which is defined by the number of subjects in the experiment who can provide a definition of the idiom). The best predictor variable turns out to be speakers’ familiarity, followed by meaning closeness (which, given Reagan’s definition, can be regarded as a reproduction of noncompositionality). Explicability is not found to correlate significantly with flexibility. Moreover, the judged disruptiveness of different transformations on the idiomatic meaning correlates highly with Fraser’s (1970) frozenness hierarchy (combined r = 0.897, p < 0.01). Last but not least, 86 per cent of the judgements
also support Fraser’s claim that the different levels are implicationally organized such that if a particular transformation is felt to be unacceptable, all transformations further down in the hierarchy will also be considered unacceptable, the only exception being nominalizations. In sum, psycholinguistic studies show that different kinds of flexibility tend to correlate with compositionality and familiarity. Furthermore, they point towards the fact that the association between the semantic and the syntactic behaviour of idiomatic phrases only holds partially, and that it is particularly strong for core idioms. In order to get a more precise picture, it appears that more data need to be investigated, and such data are provided by corpus-linguistic approaches.
4.1.3 Corpus-linguistic approaches
Perhaps the most comprehensive corpus-based study of idiomaticity, in terms of the range of idioms and other fixed expressions taken into account, is presented in Moon (1998). Her data sample comprises 6,776 fixed expressions and idioms extracted from the Oxford Hector Pilot Corpus (OHPC), which range from collocations and metaphors to formulae (cf. Moon (1998:19) for her typology of fixed expressions and idioms). The widest possible range of lexical and grammatical forms is included: predicates, nominal groups or modifiers, and also more complex phrases like subordinate clauses, exclamations and others. Also, fully lexically specified expressions are considered alongside what she refers to as idiom schemas (as exemplified by the cluster of items shake in one’s shoes/quake in one’s shoes/quake in one’s boots/quiver in one’s boots/quake in one’s Doc Marten’s etc.; Moon (1998:161)). The study provides a variety of frequency statistics for (different classifications of) these items in terms of transformational, lexical and discoursal variation parameters. These cannot be summarized in full here, so the review is restricted to predicate fixed expressions and idioms, which include V NP-constructions as investigated by the present study. Predicate expressions constitute the largest share of her data sample (40 per cent), and they most commonly take the form of ‘subject + predicator + object’ (29 per cent) or ‘subject + predicator + object + adjunct’ (27 per cent). With respect to what Moon refers to as ‘transformations’, she notes that 15 per cent of the predicate expressions passivize, and that occasional or even rare attestations of embeddings, nominalizations or transformations to adjectives can be observed (Moon 1998: chapter 5).
Moon’s definition of the term variation covers a wide array of phenomena such as substitution of a part of the expression (which she refers to as lexical variation), interruption and insertion (e.g. We are a little late in getting our Christmas act together ), truncation (e.g. make hay/make hay while the sun shines), addition (of, say, prepositional phrases: go to hell (in a handbasket)), reversal (such as day and night/night and day), dialectal variation (e.g. touch wood in British English is mostly knock (on) wood in American English), genre variation (in your face/in yer face), and even erroneous variation (e.g. wrack and ruin instead of rack and ruin), calques (i.e. co-existing English expressions and Latin or French translation
equivalents, e.g. au contraire/on the contrary) and false variations (i.e. variations which are no variations: get one’s hands dirty means ‘involved’, while have dirty hands means ‘be guilty’). In general, between 31 per cent of fixed expressions and idioms (those that occur 1–4 times in the OHPC) and 50 per cent (those that occur more than 100 times per million words in the OHPC) exhibit variations (Moon 1998:121), with more than 70 per cent of her data occurring less than one time per million words (Moon 1998:60). This testifies to the fact that idioms and fixed expressions are much more variable than was formerly assumed, and to a correlation between the corpus frequency of an item and its variability. In a series of papers, Barkema also investigates several aspects relating to idiomaticity from a corpus-linguistic perspective. Throughout, Barkema adheres to van der Linden’s (1992:8) definition that equates idiomaticity with non-compositionality as a scalar concept. Accordingly, Barkema distinguishes between fully compositional, pseudo-compositional, partly compositional and idiomatic expressions. Moreover, collocability and flexibility are regarded as correlates of idiomaticity, and in combination, the three variation parameters distinguish so-called ‘received expressions’ from ‘free expressions’ (cf. Barkema (1996a) for a detailed account of his descriptive model). Barkema (1994a) presents a corpus-linguistic measure for the syntactic flexibility of idioms. Since his approach forms the basis for the flexibility measures employed in the present study, it is described in all detail in Section 4.2.2. Barkema (1994b) presents the results of a corpus-based analysis of noun phrases which he classified as received expressions in the above sense. With respect to flexibility, four kinds of processes are distinguished: additions, insertions, permutations and term selections. 
Additions and insertions are described as ‘additional functions’, but only insertions ‘interrupt[.] the syntactic structure of the base form’. An example of an addition is constant in a constant bone of contention, an example of an insertion is finally in the straw that – finally – breaks the camel’s back (Barkema 1994b:33). Permutations are defined as word order variations such as heir presumptive versus presumptive heir . Term selections are described as variations of alternatives within some closed class paradigm, e.g. tense or number variation. On the basis of 50,760 tokens obtained from the 20-million words Birmingham Collection of English Texts, Barkema reports a variety of frequency statistics. A basic observation is that 62.3 per cent of all tokens are base forms, and the remaining percentage contains one or more variant forms. Additions are the most frequently occurring flexibility type (25.86 per cent of all variant tokens), followed by term selections (14.77 per cent), permutations (3.1 per cent) and insertions (0.27 per cent). The priority of additions and term selections compared with other flexibility processes is also evidenced in Table 4.2, which shows how many different variations are attested for each flexibility type: only for additions and term selections are more than two variations attested. Moreover, Barkema (1996b) interprets the data as suggesting an implicational hierarchy of variation types that is reminiscent of Fraser’s (1970) frozenness hierarchy
Table 4.2 Number of different variations attested for each flexibility type (adapted from Barkema 1994b:35).

No. of variations   Addition   Term selection   Permutation   Insertion
1                     11,369            7,292           719          62
2                      1,635              192             5           2
3                        115              112             –           –
>3                         9                1             –           –
because ‘most types which had tokens with term selections also had ones with additions, while most types that had tokens with permutations also had ones with term selections and additions’ (Barkema 1996b:74–75). In Barkema (1996b), the same data are subjected to an analysis of variance (ANOVA) in order to test to what extent flexibility is affected by variables such as compositionality, syntactic function, genre or register. With respect to compositionality, collocability, formulaicity (Barkema classified items as either being formulaic or not), genre, and frequency, the ANOVA produces non-significant results. Only two factors significantly explain differences in flexibility. First, the syntactic function the item takes on matters, such that e.g. NPs in subject position tend to be much more flexible than in indirect object position (Barkema 1996b:79). Secondly, flexibility is generally significantly higher in written texts compared to spoken data; however, this significance of register does not hold for those types that occur in both registers. A variety of other studies have also investigated the relationship between compositionality and different flexibility parameters of these phrases, and mostly come to the conclusion that while semantic and syntactic parameters tend to correlate with respect to particular kinds of idiomatic constructions, this correlation is not sufficiently systematic to license the assumption of a causal relationship between the two. Abeillé (1995) investigates the relationship between the semantic compositionality and syntactic flexibility of 2,000 French V NP idioms. Analysing transformations such as raising, passives, cleft structures and adnominal modifications, her major conclusion is that ‘the syntactic description of an idiom cannot be predicted entirely from its semantic representation, in the sense that lots of noncompositional French idioms are syntactically flexible’ (Abeillé 1995:38).
Another example is Nicolas (1995), who also investigates V NP-idioms. Taking up claims that the possibility of internal modification of such structures like adjectives modifying the noun of the phrase requires that they be at least partially compositional (Gazdar et al. 1985, Wasow et al. 1983), Nicolas sets out to show that ‘such modification is systematically interpretable as modification of the whole idiom’ (Nicolas 1995:233). For instance, 64 out of 75 of the idioms he analysed can be regarded as instances of what he calls ‘viewpoint modification’: in a sentence like A hearing was convened . . . whether the . . . Act . . . passes constitutional muster, constitutional modifies the whole phrase, which
Nicolas argues to be obvious from the fact that one can paraphrase it as to pass muster from a constitutional viewpoint. He empirically backs up his proposal of a grammatically based typology of V NP-idioms and a semantically based typology of adjectival and adverbial modifiers on the basis of corpus data from a 50-million-word newspaper corpus, as well as acceptability judgements. His results are also largely compatible with a slightly different typology presented in Abeillé and Schabes (1996). As a side remark, Nicolas points out that at least 90 per cent of the V NP-idioms he considers, including cases which were hitherto regarded as completely frozen, actually allow some form of internal modification. Moreover, there seems to be a correlation between grammatical features such as ‘NP definiteness and the availability of different semantic types of internal modification’ (Nicolas 1995:249). To sum up, previous corpus-based studies have investigated different flexibility parameters. There is general consensus that (i) most items are flexible at least to some extent, (ii) flexibility tends to correlate with token frequency, and (iii) compositionality does not correlate with any kind of flexibility to an extent that licenses the assumption of a causal relationship between the two. With respect to the comparability of the results obtained by the above-mentioned studies and the present one, one caveat is in order. They all define their variables a priori, that is, not on the basis of the distribution of the corpus data, but on some corpus-external, mostly theoretical basis. This, however, may influence the results in at least two undesirable ways. First, any rigid class assignment will inevitably coerce items into classes which need not represent their semantic or syntactic behaviour in an optimal way. Consider, for instance, the operationalization of compositionality as a four-level variable.
In view of the corpus-linguistic measure developed in Section 3.3, which shows that V NP-constructions actually form a very fine-grained continuum, forcing every item into one of four classes would necessarily result in classes with sub-optimal internal coherence. Secondly, reducing, say, morphological flexibility to tense variation and negation will provide adequate results with respect to the variable levels included, yet the overall picture for the flexibility type will be incomplete. Why not also consider variation in terms of mood, voice or kind of determiner? For these reasons, the present study presents flexibility measures which can handle any forms and variations a V NP-construction occurs in, that is, the classification schemes are derived from the data rather than the other way around to the maximum extent possible. For instance, with respect to syntactic flexibility, no particular syntactic variant like, say, the passive variant, is singled out in advance and tested for its occurrence. While syntactic configurations like the passive may be of particular theoretical relevance in transformational approaches to grammar, the theoretical approach adopted here does not pre-emphasize any specific syntactic configuration as being more relevant or revealing than another. Instead, for every V NP-construction, it is checked in how many different syntactic configurations the construction occurs in, and how often it occurs in every syntactic configuration (cf. Section 4.3 for a more detailed description).
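The data-driven tabulation described above (how many different syntactic configurations a construction occurs in, and how often in each) can be illustrated with a minimal sketch. The token list and configuration labels below are hypothetical placeholders, not data or categories from the study.

```python
from collections import Counter

# Hypothetical coded tokens: (construction, syntactic configuration) pairs.
# The labels are illustrative only and are not taken from the study's data.
coded_tokens = [
    ("foot the bill", "declarative, active"),
    ("foot the bill", "declarative, active"),
    ("foot the bill", "to-infinitive clause"),
    ("foot the bill", "relative clause"),
    ("do the trick", "declarative, active"),
    ("do the trick", "declarative, active"),
    ("do the trick", "interrogative"),
]


def configuration_profiles(tokens):
    """Return, per construction, a Counter of the configurations it occurs in."""
    profiles = {}
    for construction, configuration in tokens:
        profiles.setdefault(construction, Counter())[configuration] += 1
    return profiles


profiles = configuration_profiles(coded_tokens)
for construction, counts in profiles.items():
    # Number of distinct configurations, then the frequency of each one.
    print(construction, len(counts), dict(counts))
```

No configuration is singled out in advance here: the inventory of categories is simply whatever the coded tokens contain.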
4.1.4 Kinds of flexibility
As the brief overview of previous studies has shown, various concepts of flexibility have been proposed. The differences concern both the linguistic level at which flexibility is measured, and the kind of technical process that is involved. That is, the flexibility of a linguistic construction has been investigated at different levels: the syntactic, lexico-syntactic, or the morphological level. Moreover, and partly depending on the level at which the flexibility of a phrase is investigated, flexibility can technically be seen as a process of permutation, addition/omission or even substitution. To complicate matters even more, some approaches, such as Fraser’s (1970) frozenness hierarchy, cut across these dimensions. The most prominent kind of flexibility in the context of idiomaticity is what is referred to as syntactic flexibility, frozenness, deficiency, transformability or syntactic productivity. Syntactic flexibility is mostly defined as permutability, that is, it is tested whether (or to what extent) specific syntactic transformations are possible for a particular construction. If a transformation is possible, this is taken as evidence that the phrase is flexible. In early approaches to idioms, non-passivizability is often considered a key characteristic of core idioms; later studies also include other transformations. Consider the well-known example He kicked the bucket: since the passive version of this idiomatic construction, The bucket was kicked by him, can only be understood in its literal interpretation, this is taken as evidence that the construction is syntactically inflexible, supporting the view that it is a core idiom. Alternatively, a lexico-syntactic perspective shifts the focus to the question of whether material of any kind can be added (or, alternatively, left out) without affecting the idiomatic interpretation of a phrase.
For instance, the fact that *He kicked the bucket slowly is unacceptable may be taken as evidence that the phrase is also lexico-syntactically inflexible. He kicked the bucket quickly, however, is fine, which shows that even this phrase, often quoted as an example of a core idiom, is flexible under certain conditions. Thirdly, one can conceive of flexibility as substitutability, which is sometimes also referred to as collocability: can parts of the phrase be replaced by others without losing the idiomatic interpretation? Consider (30a) and (30b): make and pull are obviously substitutable, because (30a) and (30b) are essentially identical in meaning, so this can be regarded as a relatively flexible construction. In contrast, when a piece of my mind is replaced by a piece of my conceptual system, the idiomatic meaning of (31a) is lost in (31b). Accordingly, this construction may be regarded as less flexible.
(30) (a) He made a face.
     (b) He pulled a face.
(31) (a) I gave him a piece of my mind.
     (b) *I gave him a piece of my conceptual system.
With respect to substitutability, it also has to be noted that some use this term in relation to synonymous alternative realizations of a construction that have
become conventionalized. Other studies, in contrast, focus less on the variation that can actually be observed in authentic language data, and more on the ad-hoc substitutability potential of a particular construction. Flexibility here is regarded as a meta-concept that unites at least three specific sub-parameters, which are subsequently referred to as tree-syntactic, lexico-syntactic and morphological flexibility (each of which is defined in more detail in the following sections). Tree-syntactic flexibility refers to the flexibility with which a phrase occurs in different syntactic patterns, for example in its passive version, as a relative clause or as an interrogative (a detailed overview of all the factors comprising tree-syntactic flexibility and examples are presented in Section 4.3). That is, tree-syntactic flexibility covers what other studies mostly refer to as syntactic flexibility, frozenness, or the like. Lexico-syntactic flexibility refers to any kind of variation of the form of the V NP-construction which involves the addition of some kind of material, be it internal or external modification. That is, lexico-syntactic flexibility captures to what extent a V NP-construction occurs with attributive adjectives modifying the noun phrase of that construction, how freely it takes prepositional phrases, or how often it is modified by an adverbial phrase of some sort. For an overview of the different lexico-syntactic flexibility factors and examples, cf. Section 4.4. The third flexibility parameter considered in the present study is morphological flexibility. Morphological flexibility refers to any kind of variation at the morphemic level, that is, it measures, among other things, how flexible a V NP-construction is with respect to tense, voice or mood, and how variably the determiner slot is filled. A complete list of the morphological flexibility factors for which all tokens of the V NP-constructions are coded is provided in Section 4.5.
Phonetic flexibility, while most likely another specific instantiation of flexibility, can be neglected here because the vast majority of data are written language (cf. Section 2.2). Finally, substitutability is not included in the present investigation for theoretical reasons. From a constructionist point of view, even though constructions like make a face and pull a face mean roughly the same and are surely related, they instantiate two separate constructions, since any change in the surface realization of a construction is expected to be accompanied by a change in meaning and/or function (Goldberg 1995:67 on the Principle of No Synonymy). From a constructionist point of view, the transition from meaning-preserving changes such as that from make a face to pull a face to the productive creation of clearly different meanings such as that from to burn one’s bridges behind oneself to to burn one’s bridges ahead of oneself is small and gradual. However, no matter how minimal the changes involved, the focus of the analysis shifts from idiomaticity (that is, institutionalization and lexicalization) to delexicalization, which are in principle two diametrically opposed processes. Likewise, the focus of analysis shifts from fully lexically specified constructions to partially specified constructions. So while it is reasonable to assume that low idiomaticity is a precondition for delexicalization processes to set in, they are two distinct processes that unfold in different directions, thereby underscoring the distinction made between
fully specified and partially specified idiomatic constructions in construction grammar (cf. Section 1.2.5). Taken together, 13,141 tokens (that is, all occurrences of the 39 V NP-constructions and the 1,151 tokens of the baseline sample) were coded for 18 different variation parameters: tree-syntactic flexibility, ten different sub-parameters of morphological flexibility (such as voice, number, mood, etc.) and six different sub-parameters encoding lexico-syntactic flexibility. Table F.1 in Appendix F provides an overview of the formal flexibility parameters, their parameter levels and examples from the data sample. A total of 236,538 classifications (13,141 tokens × 18 parameters) were made manually or semi-manually. A data sample that comprises 39 V NP-constructions may not appear very impressive at first sight. However, the fact that nearly a quarter of a million classifications were made gives an impression of the depth of the coding and the richness of the data set.
4.2 A new approach to flexibility
4.2.1 The baseline
In the context of a corpus-linguistic definition of flexibility, the question arises how to put the flexibility values for the individual constructions into a similar interpretative context. Without some standard of comparison, the values and resulting rankings are difficult to interpret: how flexible are the corpus data in general? Which constructions in the sample deviate from this baseline? For which flexibility parameters or maybe even specific parameter levels can we observe such deviations from the baseline and how strong are they? In order to answer these questions, a corpus baseline value has to be established for every flexibility parameter. With respect to tree-syntactic flexibility, for instance, we need a value that tells us how often, on average, V NP-constructions occur in the active or passive voice, or how often they take the form of declarative, interrogative or imperative sentences. Likewise, a morphological flexibility value of, say, 0.6 for the idiomatic variation parameter Tense on a scale from 0 (for minimally flexible constructions) to 1 (for maximally flexible constructions) is easier to interpret if we know that the average morphological flexibility with respect to Tense in V NP-constructions in general is 0.76. As Barkema (1996b:45) points out, the ideal baseline data sample should be extracted from the same corpus as the other data. However, he also has to cope with the problem that syntactically and/or morphologically annotated corpora are, to date, too small to retrieve a sufficient number of hits for a particular lexically specified construction.
That is, while it is possible to derive a sample of, say, 1,000 hits of the abstract V NP-construction from the British component of the International Corpus of English (ICE-GB), this 1-million word corpus is much too small to retrieve a sufficient number of hits for lexically specified V NP-constructions such as make X headway or do X trick. Therefore, Barkema proposes to use a large corpus to extract the lexically specified constructions
(which he refers to as forms of specific received expressions) and to obtain the baseline data for the abstract syntactic construction (which he refers to as patterns of free expressions) from a small, fully annotated corpus. In accordance with this proposal, the baseline data for the present study were taken from the ICE-GB. A potential problem arising from such a procedure is that the BNC and the ICE-GB differ substantially with respect to their register distribution, which may have a huge impact on the (frequencies of the) variations found. In the ICE-GB, 60 per cent are spoken and 40 per cent are written data. The BNC, in contrast, comprises only 10 per cent spoken language. To maximally enhance the similarity of the two corpora, it is therefore reasonable to compile a (sufficiently large) sample from the smaller corpus (here: the ICE-GB) that reflects the register distribution of the larger corpus (here: the BNC) as much as possible. To that end, the ICE-GB was searched for V NP sequences (with optional adjectival or adverbial modifiers intervening; allowing the presence of any kind of determiner/no determiner) in all their syntactic forms, i.e. in declarative main clauses, (zero) relative clauses, particle clauses with to, interrogatives and imperatives (all of them both in active and passive voice). Taken together, 2,295 sentences were retrieved, 1,046 written and 1,249 spoken ones. In order to obtain a maximally large data sample, and in compliance with the 90 per cent written/10 per cent spoken language distribution of the BNC, all 1,046 written sentences were taken over into the sample. To these 1,046 written sentences, another 105 randomly selected spoken sentences were added, yielding a baseline data sample of 1,151 sentences. This baseline sample was then coded for all parameters in the same fashion as the lexically specified constructions, and the results were reported alongside.
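The composition of the baseline sample can be retraced with a minimal sketch. It assumes that the 105 spoken sentences correspond to roughly one tenth of the written count (0.10 × 1,046 ≈ 105), which is one reading of the figures above rather than a documented step; the sentence identifiers and the random seed are hypothetical.

```python
import random

# Counts reported for the ICE-GB retrieval.
N_WRITTEN = 1046
N_SPOKEN = 1249

# Hypothetical sentence identifiers standing in for the retrieved sentences.
written = [f"W{i}" for i in range(N_WRITTEN)]
spoken = [f"S{i}" for i in range(N_SPOKEN)]

# Assumption: the spoken share is set to about 10% of the written count.
n_spoken_sample = round(0.10 * len(written))

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
baseline = written + rng.sample(spoken, n_spoken_sample)

print(n_spoken_sample, len(baseline))  # 105 1151
```

Keeping all written sentences and sampling only from the spoken ones maximizes sample size while approximating the register distribution of the BNC.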
4.2.2 Measure I: An extension of Barkema (1994a)
Barkema (1994a) presents a formula for determining the syntactic and morphological flexibility of English noun phrases such as cold war. In a first step, the so-called received form profile of the construction has to be determined, that is, an inventory of all its variant forms in the corpus is created (Barkema 1994a:43). The noun phrase cold war, for example, exhibits the following received form profile in the 20-million-word Birmingham Corpus (Barkema 1994a:43):
(32) renewed Cold War; the melting Cold War; the world Cold War; continuing, ever-present Cold War; the Cold War won by Europeans who ‘destalinized’ Eastern Europe; the cold war which threatened to divide the world into two ideological armed camps; a not-so-cold war against Kaddafi; the awkward cold war thought up by the American paranoids, who should be back in the law offices of middlewestern towns; a period of hot and cold civil war which ended with Hitler’s invasion of Austria; a kind of cold civil war; the cold war that existed between the two giants, the United States and [. . .]; the Cold War in Washington; the cold war between the Nature Conservancy Council and the farmers.
Table 4.3 Flexibility profile for cold war (adapted from Barkema 1994a:50).

Form                                           n exp. (%)       n obs. (%)     Diff.%
Base form                                      49.15 (39.64)    111 (89.52)    +49.88
Premod. adj.                                    4.18 (3.37)       3 (3.2)       −0.17
Postmod. PP                                     7.59 (6.12)       2 (1.6)       −4.52
Premod. adj./postmod. PP                       19.24 (15.52)      2 (1.6)      −13.92
Premod. adj./postmod. clause                    1.17 (0.94)       1 (0.008)     −0.93
Postmod. past part. clause                      0.98 (0.79)       1 (0.008)     −0.78
Premod. adv./postmod. PP                        1.96 (1.58)       1 (0.008)     −1.57
Premod. adj. (in expression)                    0.08 (0.0006)     1 (0.008)     +0.0074
Premod. noun                                    0.04 (0.0003)     1 (0.008)     +0.0077
Coord. conj./premod. adj./postmod. clause       0.04 (0.0003)     1 (0.008)     +0.0077
Noun is in plural                              24.64 (19.87)      0 (–)        −19.87
Noun is in plural/postmod. PP                   2.85 (2.3)        0 (–)         −2.3
Premod. intensifying adv.                       2.35 (1.9)        0 (–)         −1.9
Postmod. past part. clause                      0.98 (0.79)       0 (–)         −0.79
Noun is in plural/premod. intensifying adv.     0.86 (0.69)       0 (–)         −0.69
Noun is in plural/postmod. finite clause        0.66 (0.53)       0 (–)         −0.53
Superlative premod. adj./postmod. PP            0.55 (0.44)       0 (–)         −0.44
Superlative premodifying adjective              0.5 (0.4)         0 (–)         −0.4
Noun is in plural/postmod. past part. clause    0.46 (0.37)       0 (–)         −0.37
Total                                          124 (100)        124 (100)       –
These variations are grouped according to morpho-syntactic criteria, such that, e.g. all instances of cold war that are preceded by an adjective form one group, all occurrences of cold war followed by a prepositional phrase form another, and all occurrences of cold war preceded by an adjective and followed by a prepositional phrase form a third group. For the received form profile of cold war, ten such groups can be established. The overall picture emerging from (32) is that cold war occurs 124 times in the corpus, 111 times in its base form, the remaining 13 times in a variant form (cf. Table 4.3 for an overview). The standard of comparison for the flexibility of cold war is a corresponding inventory of all variant forms of the syntactic construction underlying the lexically specified expression cold war, which is an adjective premodifying a singular common head noun (subsequently referred to as adjective–noun construction). This construction occurs 3,171 times altogether in the Birmingham Corpus, and 1,257 times in its base form. The logic behind comparing the flexibility of cold war with that of the adjective–noun construction is that if cold war behaves like a free expression, then for any particular form A, such as the variant form in which cold war is followed by a prepositional phrase, the ratio of the frequency of occurrence of this form to the total number of base and variant forms of cold war should be about the same as the ratio of the frequency with which the adjective–noun construction takes prepositional phrases to the number of all base and variant forms of the adjective–noun construction. This is represented in (33).
\[
\frac{n_{\text{form A, cold war}}}{n_{\text{base + variant forms, cold war}}}
= \frac{n_{\text{form A, adj-noun pattern}}}{n_{\text{base + variant forms, adj-noun pattern}}}
\tag{33}
\]

Accordingly, one can compute the expected frequency of any particular form A of the expression cold war on the basis of formula (34).

\[
n^{\text{exp}}_{\text{form A, cold war}}
= n_{\text{base + variant forms, cold war}} \times
\frac{n_{\text{form A, adj-noun pattern}}}{n_{\text{base + variant forms, adj-noun pattern}}}
\tag{34}
\]

With respect to cold war, one can, for instance, determine the expected frequency of the base form by inserting the respective frequencies into the formula as in (35).

\[
n^{\text{exp}}_{\text{base form, cold war}}
= 124 \times \frac{1{,}257}{3{,}171} = 49.15 \;(\approx 49)
\tag{35}
\]
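The calculation in (35), together with the first row of Table 4.3, can be retraced with a short sketch. The function and variable names are illustrative; the four input figures are the ones reported for cold war and the adjective–noun construction.

```python
def expected_frequency(n_item_total, n_form_in_pattern, n_pattern_total):
    """Expected frequency of a form, following formula (34)."""
    return n_item_total * n_form_in_pattern / n_pattern_total


n_cold_war = 124          # all base and variant forms of "cold war"
n_base_in_pattern = 1257  # base forms of the adjective-noun construction
n_pattern = 3171          # all forms of the adjective-noun construction
observed_base = 111       # observed base forms of "cold war"

expected_base = expected_frequency(n_cold_war, n_base_in_pattern, n_pattern)
expected_pct = 100 * expected_base / n_cold_war
observed_pct = 100 * observed_base / n_cold_war
diff_pct = observed_pct - expected_pct  # positive: more frequent than expected

print(round(expected_base, 2), round(expected_pct, 2),
      round(observed_pct, 2), round(diff_pct, 2))
# 49.15 39.64 89.52 49.88
```

The printed values match the base-form row of Table 4.3: the difference column is the observed percentage minus the expected percentage.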
While the expected frequency of the base form is 49/124, the observed frequency is 111/124. From this we can already conclude that cold war is obviously much less flexible than expected, because it mostly occurs only in its base form. A more complete picture is accomplished by considering the expected and observed frequencies for any form in which cold war can potentially occur; consider Table 4.3 for a complete overview of cold war ’s flexibility profile. The leftmost column specifies the form, followed by its expected frequency and its observed frequency (all accompanied by their values in per cent of all 124 occurrences of cold war ). The rightmost column provides the difference between observed and expected frequency; positive numbers mean that cold war occurs more often in that particular form than the adjective–noun construction in general, negative numbers mean that cold war occurs less frequently in that form compared to the construction. For the purpose of the present study, Barkema’s method was slightly modified in several respects. First, it has to be noted that while Barkema collapses morphological, lexico-syntactic and tree-syntactic aspects of flexibility, the following sections discuss the results obtained for each of these flexibility types individually. Secondly, as already outlined in Section 4.2.1, the flexibility profile of the V NP-construction, henceforth referred to as the baseline, was established slightly differently from Barkema’s. A third modification is motivated by the desire to have (i) an overall flexibility value for each variation parameter, and (ii) an index value ranging between 0 and 1 so that the results can be compared more easily. But how to weight the influence of every parameter level (i.e. ‘form’ in Barkema’s terminology) to the overall flexibility value? 
The basic idea of the extension of Barkema’s measure employed here is that small deviations from the baseline will have only a little effect on the overall flexibility value, while large deviations will have a considerable impact. To provide an example, consider the V NP-construction foot the bill and its morphological flexibility in terms of the idiomatic variation parameter Tense.
This parameter comprises four levels: ‘past’, ‘present’, ‘future’ and ‘nonfinite’. Foot the bill occurs 109 times overall in the BNC, and these 109 occurrences are distributed across the parameter levels of Tense as follows: 10 times as ‘past’, 45 times as ‘present’, 9 times as ‘future’ and 45 times as ‘nonfinite’. If the parameter levels of Tense were actually distributed across these 109 items like they are distributed in the baseline, we would expect 28 occurrences in the past tense, 68 occurrences in the present tense, 2 occurrences in the future tense, and another 11 occurrences of the verb as a nonfinite form. Accordingly, the differences between observed and expected frequencies in per cent amount to −16.80 per cent for the parameter level ‘past’, −20.66 for the parameter level ‘present’, 6.69 for ‘future’ and 30.77 for the parameter level ‘nonfinite’. Since the variation parameter Tense forms a closed paradigm, the sum of all deviation percentages will always be 0, so these numbers have to be combined in a different manner to obtain an overall Tense flexibility value. To this end, the values are squared and then added: small deviations will contribute only little to this overall deviation value, while big deviations will contribute much more. The overall sum of squared deviations for Tense for foot the bill is 1700.952. The largest deviation, namely that of the parameter level ‘nonfinite’, contributes 946.904 to this overall value, while ‘future’, deviating only 6.69 per cent from the baseline for this V NP-construction, contributes only 44.8 to the overall value. Note that the weighting of the parameter levels for each parameter is always determined item-specifically: for do the trick, the parameter level ‘nonfinite’ is apparently much less important, because the deviation only amounts to −1.48, which is why it makes only a small contribution to do the trick’s overall Tense flexibility value. 
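The summing of squared deviations for foot the bill can be sketched as follows, using the four deviation percentages as printed above. Because these are rounded to two decimals, the result (about 1700.62) falls slightly short of the reported 1700.952, which presumably rests on unrounded percentages; the function name is illustrative.

```python
# Deviations (observed % minus expected %) for Tense, as reported for
# "foot the bill"; the values are rounded to two decimals in the text.
deviations = {
    "past": -16.80,
    "present": -20.66,
    "future": 6.69,
    "nonfinite": 30.77,
}


def sum_of_squared_deviations(devs):
    """Square each deviation and add the squares up (SSD)."""
    return sum(d ** 2 for d in devs.values())


# Tense is a closed paradigm, so the raw deviations cancel out.
print(round(sum(deviations.values()), 2))

contributions = {level: d ** 2 for level, d in deviations.items()}
ssd = sum_of_squared_deviations(deviations)

print({level: round(c, 2) for level, c in contributions.items()})
print(round(ssd, 2))
```

Squaring gives large deviations a disproportionately large weight: ‘nonfinite’ alone accounts for more than half of the total, while ‘future’ contributes less than 3 per cent.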
Since the resulting overall values can take on any value higher than 0, they may be difficult to interpret when one wants to compare different values within one variation parameter, or just to get an impression of how the V NP-constructions are dispersed with respect to the variation parameter in question. For that reason, not only the sums of the squared deviations (SSD) are reported in the results sections, but also a normalized version of this value (NSSD). That is, for every variation parameter, an index is created by distributing the values on a scale between 0 and 1 (which is done by taking the smallest summed squared deviation to be 0 and the largest to be 1). The summed squared deviation value is also reported since it might be interesting to see how high the absolute deviation actually is for a particular V NP-construction. Moreover, there is one piece of information that is neither accessible on the basis of Barkema’s original method nor the extension presented here: the directionality of the deviation. Is, say, foot the bill generally more or less flexible with respect to tense variation compared to the baseline? So far, we can only consider the directionality of the deviation for the different variation parameter levels (cf. Appendix E for the results for all parameter levels of the different flexibility types), but at the level of the more general variation parameters, this information is lost due to the squaring process. For this reason, another flexibility
measure is taken into account to complement the Barkema-extension which provides exactly this missing piece of information: relative entropy.
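The normalization of the SSD values into NSSD values described above amounts to a min-max scaling within one variation parameter. The sketch below assumes exactly that reading; the function name normalize_ssd is illustrative, and only the value for foot the bill is taken from the text, while the other two entries are invented placeholders.

```python
def normalize_ssd(ssd_by_item):
    """Min-max scale SSD values to 0..1 (smallest SSD -> 0, largest -> 1)."""
    lo = min(ssd_by_item.values())
    hi = max(ssd_by_item.values())
    if hi == lo:
        # All items deviate equally, so there is no spread to scale.
        return {item: 0.0 for item in ssd_by_item}
    return {item: (v - lo) / (hi - lo) for item, v in ssd_by_item.items()}


# Only 'foot the bill' (Tense) is from the text; A and B are hypothetical.
ssd_tense = {
    "foot the bill": 1700.952,
    "hypothetical item A": 200.0,
    "hypothetical item B": 3200.0,
}
nssd_tense = normalize_ssd(ssd_tense)
print(nssd_tense)
```

Since the scaling depends on the smallest and largest SSD in the sample, NSSD values are comparable only within one variation parameter and one set of constructions.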
4.2.3 Measure II: Entropy
Entropy is a term that is used in disciplines and fields of study as diverse as physics, information theory, economics, evolution and statistical mechanics. What exactly the term refers to crucially depends upon the context in which it is employed, although many of these meanings share certain properties. The basic idea behind most of the meanings related to entropy, including the statistical meaning employed here, is best described with reference to its most prominent use in the context of thermodynamics. One can think of a glass with iced water standing in a warm room as a small ‘universe’, that is, a thermodynamic system that consists of a surrounding (in this example the warm room) and a system (the glass, ice and water). In any thermodynamic system, differences in pressure, density, and temperature of the quantities of matter involved tend to equalize over time. That is, the longer the iced water stands in the warm room, the more heat energy from the room will spread out to the glass and cause a melting of the ice, until ultimately, the temperature in the glass and the room are the same. While the room temperature decreases only minimally, the temperature increase in the glass is enormous. In other words, while the energy dispersion in the room decreases a little, that of the glass increases a lot. Entropy here refers to the degree of energy dispersion in the glass and the room at some given point of time in the unfolding of this process. In statistical terms, this degree of energy dispersion can be paraphrased as the amount of randomness, uncertainty or mixedupness (as described by the theoretical physicist J.W. Gibbs) in a given system. The more different states are simultaneously present in a system with a higher probability, the more mixed up or disordered is the system.
For example, imagine you have two (fictitious) systems, A and B, with 10 possible states, and that you observed those two systems for a while, noting down every time what state that system was in. Let us assume that 100 observations provided you with the frequency data in Table 4.4.

Table 4.4 Frequency distribution of different states for systems A and B (fictitious example).
State   n observations system A   n observations system B
1       10                        91
2       11                        1
3       8                         1
4       14                        1
5       11                        1
6       10                        1
7       6                         1
8       14                        1
9       6                         1
10      10                        1

As can be seen in Table 4.4, system A takes on all the ten different states more or less equally often, while system B has a strong tendency to occur in state 1. In other words, in system A, all states have a relatively high probability (given that there are ten different possible states), while in system B, state 1 has a much higher probability than all the other states. The exact probabilities for each state can be determined by dividing the number of observations by the total number of observations made, which is 100 in our fictitious example; consider Table 4.5.

Table 4.5 Probabilities of occurrence of states in systems A and B (fictitious example).
State   p in system A   p in system B
1       0.1             0.91
2       0.11            0.01
3       0.08            0.01
4       0.14            0.01
5       0.11            0.01
6       0.1             0.01
7       0.06            0.01
8       0.14            0.01
9       0.06            0.01
10      0.1             0.01

As Table 4.5 shows, system A can be described as random or disordered because it is difficult to predict which state it will take on: the differences in probability are too small to derive any predictions as to which state the system will take on at any future observation. For system B, on the other hand, it is safe to predict that it will take on state 1, because this state has a probability of 91 per cent. Note that neither of the example distributions given here is the most extreme one possible: the most random, disordered system would be one in which each of the ten states is equally likely (that is, has a probability of 0.1); the least random, ordered system would be one in which only one of the ten states occurs 100 per cent of the time. We can calculate the relative entropy for the two systems as follows. The basic logic behind most statistical entropy measures was developed by Boltzmann. He defined entropy as proportional to the logarithm of the number of states that a given system can occupy. Since we have ten different states in our example, the highest theoretically possible entropy value (Hmax) is the logarithm (to the base of 2; see note 2) of 10 = 3.322. In order to calculate the actually observed entropy (H) for the two systems, we have to take the negative of the sum of all probabilities multiplied by their logarithms, cf. formula (36) (Manning and Schütze 2001:61).

(36) $H(p) = H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$

For system A, we get an entropy value of 3.270. For system B, the entropy value is 0.722. If we divide the observed entropy H by the maximum entropy Hmax as shown in (37), we get the relative entropy (Hrel), which can range between 0 and 1.

(37) $H_{rel} = \frac{H}{H_{max}}$

System A has a relative entropy of 0.984, so this system is very close to being perfectly random. System B, on the other hand, has a relative entropy value of 0.217, which is rather low.
From here, it is just a small conceptual leap to using relative entropy as a flexibility measure. The different flexibility types are the systems we want to investigate; the different parameter levels each flexibility type can take on, such as ‘simple’, ‘progressive’, and ‘perfective’ for the morphological flexibility parameter Aspect, find their conceptual counterpart in the different states of a system. In analogy to the above example, flexibility is highest when all possible parameter levels of a given variation parameter tend to occur equally often, that is, statistically speaking, have the same probability of occurrence. That is, if we consider morphological flexibility in terms of Aspect and find that a V NP-construction occurs about equally often in the simple, progressive and perfective aspect, this renders the construction more flexible than one which occurs exclusively in the simple aspect. For instance, for the V NP-constructions draw X line and fit X bill, we find the distribution of their tokens according to the three parameter levels of Aspect as shown in Table 4.6 (in analogy to Table 4.4). As can be seen in Table 4.6, draw X line has a clear preference to occur with simple aspect, but it also takes the perfective aspect and sometimes even the progressive aspect. Fit X bill, on the other hand, nearly exclusively occurs in the simple aspect. In analogy to Table 4.5, we can now determine the exact probabilities for each parameter level by dividing the number of observations for each parameter level by the overall number of occurrences (310 for draw X line and 116 for fit X bill) as shown in Table 4.7. Since there are three possible parameter levels, Hmax = log2(3) = 1.585.
Table 4.6 Frequency distribution of draw X line/fit X bill for the parameter levels of Aspect.
Parameter level   n tokens of draw X line   n tokens of fit X bill
Simple            194                       111
Progressive       30                        1
Perfective        86                        4

Table 4.7 Probabilities of occurrence for parameter levels of Aspect for draw X line/fit X bill.
Parameter level   p for draw X line   p for fit X bill
Simple            0.626               0.957
Progressive       0.097               0.009
Perfective        0.277               0.034

Inserting the numbers from Table 4.7 into the formula in (36), we get the observed entropy values for draw X line and fit X bill: HdrawXline = 1.262, HfitXbill = 0.287. Dividing the observed entropy by the maximum entropy, the resulting relative entropy values amount to 0.797 for draw X line and 0.181 for fit X bill, respectively. According to these relative entropy values, the aspectual flexibility of draw X line is considerably higher than that of fit X bill. Just as the relative entropy values for all V NP-constructions can be compared with each other, this value also enables us to compare the flexibility values for the fully lexicalized constructions with that of the baseline sample. With respect to the example of aspectual flexibility, the entropy value for the baseline amounts to 0.767. Accordingly, draw X line is even slightly more flexible than the baseline, while fit X bill is much less flexible than the baseline.
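The Aspect example can be recomputed directly from the token counts in Table 4.6. The following sketch is one possible way of doing so; the function name relative_entropy is illustrative, and levels with zero tokens are assumed to contribute nothing to H, which is the usual convention. Note that ‘relative entropy’ is used here in the sense of (37), that is, H divided by Hmax, not in the sense of a divergence between two distributions.

```python
import math


def relative_entropy(counts):
    """Return (H, H_rel) for a list of token counts, one per parameter level."""
    if len(counts) < 2:
        raise ValueError("need at least two parameter levels")
    total = sum(counts)
    # Levels with zero tokens are skipped: 0 * log(0) is treated as 0.
    probs = [c / total for c in counts if c > 0]
    h = -sum(p * math.log2(p) for p in probs)
    h_max = math.log2(len(counts))  # all levels equally likely
    return h, h / h_max


# Token counts for simple, progressive, perfective (Table 4.6).
h_draw, hrel_draw = relative_entropy([194, 30, 86])
h_fit, hrel_fit = relative_entropy([111, 1, 4])

print(round(h_draw, 3), round(hrel_draw, 3))
print(round(h_fit, 3), round(hrel_fit, 3))
```

Working from the raw counts rather than from the rounded probabilities in Table 4.7 avoids carrying rounding differences into the third decimal place.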
4.2.4 Measure II, version B: Directional entropy
For tree-syntactic as well as morphological flexibility, it is reasonable to define the concept of flexibility as one of versatility or variability like the entropy-measure entails: the more different parameter levels of a particular variation parameter are present, and the more evenly they are distributed across the overall number of tokens, the more flexible the construction in question. That is, talking about, say, the morphological flexibility variation parameter Mood, a construction X which takes the three parameter levels of that variation parameter, namely ‘indicative’, ‘subjunctive’ and ‘nonfinite’, as often as expected by chance is considered more flexible than another construction Y which tends to occur exclusively in the indicative mood. With respect to lexico-syntax, flexibility is a qualitatively different concept altogether: the more often a construction is attested being modified by, say, adjectives, the higher is its flexibility. A construction that is always accompanied by an adjective should rank highest with respect to that variation parameter; a construction that never takes any adjectives should rank lowest. In a more technical parlance, for lexico-syntactic flexibility, flexibility is highest when only the parameter level encoding the presence of some added lexical material is present, and it is lowest when only the parameter level encoding the absence of some added lexical material is assigned to all tokens of a construction. The entropy-measure, however, is not sensitive towards this qualitative difference between parameter levels. Entropy values would be high for constructions which occur about half of the time with an adjective and the other half of the time without one. This does not adequately reproduce the conception of
this variation parameter: a construction with such a distribution should obtain a middle rank on the flexibility index. Similarly, entropy will assign low values both to constructions which occur exclusively without adjectives, and constructions which are always accompanied by one. The resulting ranking is confusing because these two different pieces of information are not distinguishable. For lexico-syntactic flexibility, constructions with many positive instantiations of the variation parameter in question should rank high, constructions with many negative instantiations of the parameter rank low, and constructions with a mixed distribution obtain the middle ranks. To this end, the original entropy values were transformed into what I, being unaware of the existence of a similar adaptation of the entropy measure, coined directional entropy. The information that is added to the original entropy value is the directionality of the flexibility, such that items which predominantly occur without the lexical material in question are assigned negative values, whereas those that predominantly occur with lexical material are assigned positive values. Constructions for which the distribution of tokens with and without additions is about even obtain a value around zero. This conversion involves two simple steps. For each lexico-syntactic variation parameter (with the exception of KindAdv, which calls for the original entropy value rather than the directional version), the entropy value is subtracted from 1. Accordingly, high entropy values result in small numbers and low entropy values in high numbers. A construction with an even distribution of parameter levels now has a value around zero, while constructions with strongly biased distributions of parameter levels have a value close to 1. 
In a second step, these values are assigned directionality such that the raw data are considered to check which of the parameter levels is instantiated more often: either the one encoding the absence of the added lexical material in question, or the sum of instantiations with parameter levels denoting the presence of some added material. If the former is the case, the subtracted entropy value is assigned a negative algebraic sign; if the latter is the case, it gets a positive algebraic sign. Accordingly, we obtain a directional entropy flexibility index ranging from −1 to +1: constructions with negative values are the least flexible, constructions with a value around zero are moderately flexible, and constructions with positive values are considerably flexible. The information provided by the original entropy value, namely the magnitude of the randomness or chaos in the data, is preserved, and supplemented by information about the directional tendency of this variability. To provide a more concrete example, consider the lexico-syntactic variation parameter Addition. This variation parameter comprises two parameter levels, namely ‘absent’ and ‘present’, encoding the absence or presence of any kind of lexical material added to the V NP-construction in question. That is, if many instantiations of Addition are coded as ‘present’, it means that the construction often occurs with adjectives, prepositional phrases, adverbials and the like. The more tokens are coded with ‘absent’, the less often the construction occurs with any kind of additional material. Let us take a look at the V NP-construction make
X headway. The original relative entropy value for this construction amounts to 0.719, that is, there is substantial variability in the data such that the two parameter levels ‘present’ and ‘absent’ are both instantiated; subtracting this value from 1, we get 0.222.³ The raw data of make X headway reveal that it occurs more often with the parameter level ‘present’, i.e. takes additional lexical material. Accordingly, the final directional entropy value is +0.222. For the V NP-construction scratch X head, we get a highly similar relative entropy value of 0.76, but in this case, the raw data show that scratch X head occurs less often with additional material than without. This is reflected in the final directional entropy value of −0.24, which clearly sets scratch X head apart from make X headway.
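The two conversion steps can be sketched for a two-level parameter such as Addition. This is a simplified illustration under stated assumptions, not the original implementation: the function name directional_entropy is hypothetical, only the binary case is covered (for parameters with several ‘present’ levels the text sums those levels before comparing), and the token counts are invented so that the relative entropy comes out near the 0.76 reported for scratch X head.

```python
import math


def directional_entropy(n_absent, n_present):
    """Signed index in [-1, +1]: (1 - H_rel), signed by the majority level."""
    total = n_absent + n_present
    if total == 0:
        raise ValueError("no tokens to evaluate")
    probs = [n / total for n in (n_absent, n_present) if n > 0]
    # With two levels, H_max = log2(2) = 1, so H equals H_rel.
    h_rel = -sum(p * math.log2(p) for p in probs)
    magnitude = 1.0 - h_rel  # step 1: subtract the entropy value from 1
    # Step 2: assign the sign according to the more frequent level.
    if n_present > n_absent:
        return magnitude
    if n_present < n_absent:
        return -magnitude
    return 0.0  # even split: entropy is maximal, index is zero


# Hypothetical counts (not from the corpus data): 78 tokens without and
# 22 tokens with added material give H_rel of about 0.76.
value = directional_entropy(n_absent=78, n_present=22)
print(round(value, 2))
```

Swapping the two counts yields the same magnitude with a positive sign, which is exactly the distinction that the plain entropy value cannot express.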
4.3 Tree-syntactic flexibility (SF)
4.3.1 Application
The ICE-GB was used as a starting point to establish a list of the syntactic variations that V NP-constructions tend to occur in. Despite the fact that the ICE-GB and the BNC exhibit considerable differences in terms of genre and register distribution, it is safe to assume that the ICE-GB provides a representative picture of the potential syntactic variations that V NP-constructions occur in, and it offers the enormous advantage of syntactic annotation, so syntactic constructions and their frequencies can be determined quickly. Consider Table 4.8 for an overview of the syntactic variations found for V NP-constructions.

Table 4.8 Frequencies of syntactic variations of V NP-constructions in the ICE-GB.⁴
Syntactic variation                    Frequency in ICE-GB (% of total)
declarative active                     11,502 (68.14)
declarative passive                    534 (3.16)
(zero) relative clause active          3,948 (23.39)
particle clause with to active         156 (0.92)
(zero) relative clause passive         4 (0.02)
particle clause with to passive        11 (0.07)
interrogative active                   393 (2.33)
imperative active                      333 (1.97)
interrogative passive⁵                 0 (–)
imperative passive                     0 (–)
Total                                  16,881 (100)

Nearly all items of the V NP-constructions in the BNC-based data sample could be classified as belonging to one of the above classes; only 98 of the total of 13,141 items had to be classified as ‘other’; an example is given in (38). Table 4.9 provides an overview of the parameter levels of syntactic flexibility that are distinguished, and the abbreviated code names for each parameter level.

Table 4.9 Parameter levels for tree-syntactic flexibility (SF).
Code         Parameter level
DeclAct      declarative active
DeclPass     declarative passive
RelClAct     (zero) relative clause active
PartClAct    particle clause with to active
RelClPass    (zero) relative clause passive
PartClPass   particle clause with to passive
InterrAct    interrogative active
ImpAct       imperative active
InterrPass   interrogative passive
ImpPass      imperative passive

(38) (a) It broke her heart. (declarative, active voice)
(b) The police were then called and McDonald was arrested. (declarative, passive voice)
(c) Corbett sat and studied the letter he had written. ([zero] relative clause, active voice)
(d) There are two more points to make. (particle clause with to, active voice)
(e) This pliable conception structures the course of action taken. ([zero] relative clause, passive voice)
(f) Inevitably there is a little story to be told about this. (particle clause with to, passive voice)
(g) I mean, didn’t you have a clue? (interrogative, active voice)
(h) Don’t take the piss! (imperative, active voice)
(i) Can a satisfactory line be drawn here? (interrogative, passive voice)
(k) Ruth heard Adam teeth grit at the memory. (other)
4.3.2 Results
In the following, all results are presented as graphs because they are more accessible, particularly with regard to the overall ranking of the V NP-constructions and the range that the flexibility values take on. The values on which the graphs are based are given in appendices B (for the Barkema-based flexibility measure) and C (for the entropy-based flexibility measure), respectively. For the Barkema-based tree-syntactic flexibility measure, the reader is referred to Appendix E, which provides the results for the individual parameter levels (that is, declarative active, declarative passive, etc.), presented as deviations from the baseline in per cent in accordance with Barkema’s original proposal.

Figure 4.1 Mean (normalized) sums of squared deviations from baseline (NSSD) for tree-syntactic flexibility (SF). [Bar chart; x-axis: V NP-construction; y-axis: mean NSSD, 0.00 to 1.00.]

Figure 4.1 displays the overall results for tree-syntactic flexibility. Recall that the higher the deviation from the baseline, the less flexible the V NP-construction. There are several noteworthy facts to be taken from Figure 4.1. First of all, the results confirm that there is no one-to-one correspondence between syntactic flexibility and compositionality. Consider the V NP-constructions with the lowest sums of squared deviations, i.e. those that exhibit the highest syntactic flexibility: highly compositional constructions (such as call X police at position 1, write X letter at 3, tell X story at 5 or close X door at 10) alternate with constructions taking on middle ranges in the compositionality ranking (e.g. draw X line at 2, make X point at 4, play X game at 6 or take X course at 7). However, the fact that all of the nearly perfectly compositional items such as write X letter, call X police, tell X story and close X door are among the top range of the flexibility ranking points towards a partial correlation between syntactic flexibility and compositionality. Highly compositional items tend to be highly flexible, too. With respect to those constructions which are not (nearly) perfectly compositional, in contrast, we find that they may exhibit any kind of syntactic behaviour ranging from very flexible (above) to fairly flexible, so the correlation between compositionality and syntactic flexibility is not a systematic one (statistically speaking, the correlation between the Barkema-based measure for tree-syntactic flexibility and compositionality amounts to −0.3, which is rather weak). To provide an example, constructions like see X point, break X heart or scratch X head obtain flexibility ranks within the same range as constructions like foot X bill or take X piss, which rank much lower in compositionality. The V NP-constructions with the lowest syntactic flexibility are change X hand, follow X suit, (get X act together) and meet X eye; all of them exclusively occur in the declarative active variant form. That is, contrary to recent claims that most if not all constructions are flexible at least to a limited extent, the performance data reveal that some are indeed totally frozen.
Moreover, the overall picture that emerges from Table C.1 (Appendix C) is that most constructions in the present data sample differ only slightly in their flexibility: the majority takes on index values between 0.8 and 1. In other words, substantial differences in syntactic behaviour are observable only in those V NP-constructions that are generally highly flexible, while most other V NP-constructions are much more difficult to discern on the basis of their syntactic profiles (at least if one only considers the overall quantitative picture). If we take Table E.1 (Appendix E) into consideration, we can furthermore check if there are parameter levels of syntactic flexibility that are particularly responsible for deviations from the baseline. It is obvious that the deviations are highest for the parameter level ‘declarative passive’ such that the V NP-constructions tend to passivize considerably less often than the baseline does. So it comes as no surprise that passivizability is often used as a key criterion to check for idiom status: once we take this parameter level into consideration, we can distinguish more and less idiomatic constructions much better than on the basis of any other parameter level (which of course only makes sense for V NP idioms; for other idioms this parameter may be completely irrelevant). However, we can also see in Table E.1 that syntactic flexibility cannot be reduced to passivizability: constructions also clearly deviate from the baseline with respect to their readiness to occur in the imperative active, interrogative active or relative clause active variant forms, none of which has played a
prominent role in previous studies. A more adequate concept of tree-syntactic flexibility also highlights differences in syntactic behaviour in terms of sentence types. Figure 4.2 complements the results obtained from the Barkema-based measure by showing that all V NP-constructions are indeed considerably less flexible than the baseline (recall that for this measure as opposed to the Barkema-based measure, high values mean high flexibility). Moreover, the entropy measure reproduces results that are highly similar to those of the Barkema-based measures (the correlation between the two measures is actually nearly perfect for this variation parameter: r Pearson = 0.955; cf. Section 4.6.1 for an overview). Most of the V NP-constructions are flexible only to a limited extent, and V NP-constructions high and low in compositionality are more or less evenly distributed across the range of flexibility values. This is also reflected in the correlation between the entropy-based measure for this variation parameter and compositionality, which amounts to 0.267. As to Reagan’s (1987) claim that the best predictor of flexibility is familiarity, corresponding correlations between the tree-syntactic flexibility values and corpus frequency do not suggest a particularly strong relationship between the two variation parameters (Barkema-based measure: r Pearson = −0.416; entropy-based measure: r Pearson = 0.334). However, the results can only roughly be compared with each other because of the different underlying measurements of flexibility and familiarity.
To sum up, the corpus-linguistic results for tree-syntactic flexibility suggest that (i) there is no stable correlation between this flexibility type and compositionality; (ii) syntactic flexibility, just like compositionality, is a gradable phenomenon, ranging from completely frozen to (nearly) perfectly flexible constructions; and (iii) tree-syntactic flexibility should not be reduced to passivizability, since V NP-constructions also exhibit differences on other tree-syntactic variation parameter levels, most notably with respect to their occurrence in different sentence types.
Figure 4.2 Mean relative entropy values (Hrel) for tree-syntactic flexibility (SF).

4.4 Lexico-syntactic flexibility (LF)
4.4.1 Application
With respect to syntactic flexibility, this variation parameter is obligatorily instantiated by any token in the present data sample. Moreover, it is possible to classify any instantiation as one, and only one, combination of sentence type, voice, and main or subordinate clause. In other words, it forms a closed paradigm. Lexico-syntactic flexibility, on the other hand, is a qualitatively different concept because we can understand it in two different ways. First, at a more general level, lexico-syntactic flexibility is something that is either present or absent. Secondly, lexico-syntactic flexibility comes in various forms, such as attributive adjectives, nouns, prepositional phrases, relative clauses or adverbials. However, as opposed to syntactic flexibility, these different realizations of lexico-syntactic flexibility do not form a closed paradigm; they are in principle independent of one another such that the presence of one does not condition the presence of another form, and several realizations may be present simultaneously (as in He broke new ground yesterday, which comprises both an attributive adjective and a time adverbial). In accordance with these differences, lexico-syntactic flexibility is not operationalized as a single unitary variation parameter with different parameter levels, but as a meta-concept that comprises several variation parameters. How many and which parameters to include is again determined exploratively, that is, rather than subjecting the data to a set of given parameters, a bottom-up approach is adopted. The result is shown in Table 4.10.

Table 4.10 Variation parameters comprising lexico-syntactic flexibility (LF).
Code       Variation parameter               Parameter levels
Addition   added material                    0 (absent)/1 (present)
AttrAdj    attributive adjective for NP      interval
AttrNP     attributive noun for NP           interval
PP         post-modifying PP for NP          interval
RelCl      post-modifying relative clause    interval
NoAdv      number of adverb(ial)s            interval

For each variation parameter, it is coded how often it is present in one instantiation. If, say, no attributive adjective is present, the variation parameter LF AttrAdj is coded with 0. If one adjective is present, it is coded as 1, if there are two, with 2, and so on. In other words, the variation parameters are coded on an interval scale. Another variation parameter simply provides the information whether any kind of material was added to the construction or not (referred to as LF Addition); it is coded as a nominal variation parameter comprising two parameter levels. The following are examples from the data sample.

(39) (a) He would tell the rudest stories out loud when he knew that Joan Sims was within earshot. (adjective modifying the noun phrase)
(b) I will tell you a horror story. (noun modifying the noun phrase)
(c) The runner has gone into it and told the story of the battle. (prepositional phrases modifying the noun phrase)
(d) A story was told whose smear value demands immediate publication. (relative clauses modifying the noun phrase)
(e) Now it is important for me to tell the story correctly. (adverbial)
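The coding scheme of Table 4.10 can be pictured as one record per corpus token. The sketch below is an illustration only: the class name LFCoding and its field names are hypothetical, and deriving Addition automatically from the five count variables is an assumption made here for convenience, since the text only states that Addition records whether any material was added.

```python
from dataclasses import dataclass


@dataclass
class LFCoding:
    """Lexico-syntactic coding of one token (codes as in Table 4.10)."""
    attr_adj: int = 0  # AttrAdj: attributive adjectives modifying the NP
    attr_np: int = 0   # AttrNP: attributive nouns modifying the NP
    pp: int = 0        # PP: post-modifying prepositional phrases
    rel_cl: int = 0    # RelCl: post-modifying relative clauses
    no_adv: int = 0    # NoAdv: number of adverb(ial)s

    @property
    def addition(self) -> int:
        """Addition: 1 if any material is present, else 0 (assumed derivation)."""
        counts = (self.attr_adj, self.attr_np, self.pp, self.rel_cl, self.no_adv)
        return int(any(c > 0 for c in counts))


# 'He broke new ground yesterday': one attributive adjective, one adverbial.
token = LFCoding(attr_adj=1, no_adv=1)
bare = LFCoding()  # a token without any added material

print(token.addition, bare.addition)
```

Keeping the count variables separate reflects the point made above that the realizations are independent of one another and may co-occur in a single token.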
In order to license a further distinction of the different kinds of adverbials that are involved, the variation parameter LF NoAdv was additionally coded as LF KindAdv, a nominal variation parameter with eight different parameter levels (adapted from Crystal 1996:170f.); consider the examples in (40) and Table 4.11 for an overview.
Table 4.11 Parameter levels of LF KindAdv.
Code   Type                     Information denoted
0      no adverb(ial)           –
1      adverb modifying N
2      space adverbial          position, direction, distance
3      time adverbial           position, duration, frequency, relationship
4      process adverbial        manner, means, instrument, agent
5      respect adverbial        ‘being concerned with’
6      contingency adverbial    cause, reason, purpose, result, condition, concession
7      modality adverbial       emphasis, approximation, restriction
8      degree adverbial         amplification, diminution, measure

(40) (a) I will tell you my story. (no adverb/adverbial present)
(b) I could tell quite a story. (adverb modifying the noun phrase)
(c) People tell me stories on the doorsteps. (space adverbial denoting position)
(d) The story has often been told. (time adverbial denoting frequency)
(e) The story is told in seven episodes each covering a day. (process adverbial denoting manner)
(f) He told them a story about his own children when they were very young. (respect adverbial)
(g) I tell this story for two reasons. (contingency adverbial denoting purpose)
(h) She really is telling this story. (modality adverbial denoting emphasis)
(i) . . . and then have the Oracle tell the story of the Poison Feast from the Drachenfels novel in full . . . (degree adverbial denoting measure)
4.4.2 General lexico-syntactic flexibility
Figure 4.3 provides the results for the lexico-syntactic variation parameter Addition, which gives a first and very general impression of lexico-syntactic flexibility in that it simply measures whether any material (be it one or more attributive adjectives, relative clauses, adverbials, etc.) modifies the V NP-construction in question. As Figure 4.3 shows, even this very general perspective indicates that V NP-constructions differ substantially in the extent to which they allow for modification. Constructions like carry X weight, make X headway, write X letter, etc. do not differ at all from the flexibility value determined for the baseline with respect to this variation parameter. Indeed, the majority of V NP-constructions in the present data sample do not deviate strongly from the baseline, since only six of them yield NSSD values higher than 0.5. However, some constructions deviate strongly from the baseline, and a look at the algebraic signs of the parameter level-specific results provided in Table E.2 (Appendix E) confirms that these deviations stand for limited lexico-syntactic flexibility, licensing few (grit X tooth, scratch X head, hold X breath) or no additional modifiers (follow X suit).
Figure 4.3 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter Addition. (x-axis: V NP-construction; y-axis: NSSD.)
Metaphorical V NP-constructions involving human body parts, of which there are several in the present data sample, seem to cluster at the lower end of the ranking, that is, they are generally less flexible than the other constructions: cross X finger (0.168), cross X mind (0.185), make X face (0.22), catch X eye (0.336), meet X eye (0.356), break X heart (0.391), grit X tooth (0.501) and scratch X head (0.611) all obtain rankings of position 23/39 or lower. A possible explanation is that the original, literal meaning that motivates the metaphorical mapping remains strongly present and thereby restricts the number of possible modifications concerning the idiomatic meaning. In Cacciari and Glucksberg’s (1991) terminology, all these constructions are quasi-metaphorical idioms, that is, the literal referent is itself an instantiation of the idiomatic meaning. According to this hypothesis, quasi-metaphorical idioms could actually turn out to be less lexico-syntactically flexible than analysable-transparent/analysable-opaque constructions on the one hand, as well as non-analysable idioms on the other.
There is also a more general fact to be derived from Figure 4.3: just as with tree-syntactic flexibility, the overall correlation between the readiness of a construction to accept additional material and its compositionality is relatively weak (r Pearson = −0.3). For instance, constructions like take X piss, foot X bill, or take X root are all far from being perfectly compositional, yet this does not appear to restrain their overall lexico-syntactic flexibility. A comparison of the construction ranking as well as the NSSD values obtained for this lexico-syntactic flexibility parameter and those for tree-syntactic flexibility in Figure 4.1 reveals that these two parameters do not overlap at all.
To give some examples, carry X weight, which is the most flexible construction in terms of lexico-syntax, ranks only at position 14 in the tree-syntactic ranking, and it also deviates considerably from the baseline, yielding an NSSD value of 0.847. The same holds for beg X question: while at rank 7 in the lexico-syntactic ranking with a negligible deviation from the baseline of 0.004, this construction obtains position 23 in the tree-syntactic ranking, and it nearly hits the ceiling in terms of its deviation from the baseline with a value of 0.928. Also, while the NSSD values for call X police of 0.284 for lexico-syntactic flexibility and 0.216 for tree-syntactic flexibility do not appear to make much of a difference, the respective ranking positions show that they actually do: for lexico-syntactic flexibility, it ranks at position 29 – for tree-syntactic flexibility, it ranks at position 1, that is, it is the most versatile construction in terms of its tree-syntactic variation potential. This comparison not only emphasizes the non-compatibility of the two parameters, but also shows that the most adequate picture of a construction’s overall flexibility is only achieved by considering both its flexibility index values and the rank it obtains relative to other constructions. Let us now turn to the entropy-based results for Addition (recall that for all lexico-syntactic flexibility parameters except for KindAdv, the directional entropy-value is used instead of the regular relative entropy value) as shown in Figure 4.4.
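The comparison of ranks and NSSD values above can be laid out side by side. The following minimal sketch only restates the figures quoted in this passage; the class `Profile` and its methods are illustrative names, not part of the study. The passage quotes no lexico-syntactic NSSD value for carry X weight, so that field is left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """Rank and NSSD of one construction on two flexibility dimensions."""
    construction: str
    lex_rank: int               # rank for lexico-syntactic Addition (1 = most flexible)
    lex_nssd: Optional[float]   # None where the passage quotes no value
    tree_rank: int              # rank for tree-syntactic flexibility
    tree_nssd: float

    def rank_shift(self) -> int:
        """Positive if the construction ranks lower tree-syntactically."""
        return self.tree_rank - self.lex_rank

    def nssd_shift(self) -> Optional[float]:
        """Difference in NSSD (tree-syntactic minus lexico-syntactic)."""
        if self.lex_nssd is None:
            return None
        return round(self.tree_nssd - self.lex_nssd, 3)


# Values as quoted in the text above.
PROFILES = [
    Profile("carry X weight", 1, None, 14, 0.847),
    Profile("beg X question", 7, 0.004, 23, 0.928),
    Profile("call X police", 29, 0.284, 1, 0.216),
]

for p in PROFILES:
    print(f"{p.construction:<15} rank shift {p.rank_shift():+3d}   "
          f"NSSD shift {p.nssd_shift()}")
```

For call X police the two NSSD values differ by less than 0.07, while the rank changes by 28 positions, which is why the text recommends reading index values and ranks together.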
Figure 4.4 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter Addition. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
The results produced by the entropy-based measure are highly compatible with what can be derived from the Barkema-based measure (r Pearson = 0.955; cf. Section 4.6.1): the majority of constructions do not deviate significantly from the baseline, yet a few are considerably less flexible (follow X suit, hold X breath, do X trick and scratch X head are the least flexible in the entropy-based ranking as well), and some are (substantially) more flexible (such as break X ground and pave X way, the latter having obtained a less extreme ranking according to the Barkema-based measure). Likewise, the entropy-based measure accords with the Barkema-based measure with respect to the weak relation between lexico-syntactic flexibility and compositionality (r Pearson = −0.016). In sum, the results of the two flexibility measures may be taken to suggest that if we consider only the presence or absence and quantity of modifying material of any kind, V NP-constructions tend to behave very similarly overall. However, as we will see below when turning towards more specific lexico-syntactic variation parameters, this impression is deceptive.
4.4.3 Flexibility of the NP-slot
Let us first turn to the variation parameter AttrAdj, which measures each construction’s flexibility in terms of how often it takes attributive adjectives. The results for this variation parameter are displayed in Figure 4.5. As Figure 4.5 shows, the results tie in well with the general results obtained for lexico-syntactic flexibility: most constructions do not significantly deviate from the baseline at all. Overall, the NSSD values suggest that most constructions behave very similarly to the baseline, since the large majority of constructions yield values around or below 0.1. In other words, this variation parameter does not have a high discriminatory potential for ranking the different constructions according to their lexico-syntactic flexibility. Particularly eye-catching in Figure 4.5 are the values for fight X battle and break X ground, which exhibit the highest deviations from the baseline, with break X ground even yielding an NSSD value of 1. While these values are correct in the sense that they can be interpreted as deviations from the baseline, they also provide prime examples of the limitations of the information we can derive from the NSSD values alone. A closer look at the parameter level-specific data shows that the two constructions deviate strongly from the baseline not in the sense that they do not allow for attributive adjectives; on the contrary, they license attributive adjectives much more often than the baseline (or any of the other V NP-constructions, for that matter). That is, for the baseline sample, we find that the large majority of the tokens (namely 865 out of 1,151) do not take any attributive adjective; 265 tokens take one adjective, 20 take two, and there is one example of a V NP-construction with three prenominal adjectives.
For fight X battle, in contrast, we find that it occurs with an adjective in 120 out of its 192 occurrences, and for break X ground, the numbers are even more drastic (126/133). We may conclude from this that these two constructions are more versatile than the baseline. However, a closer look at the data reveals that this versatility is, in turn, restricted to particular adjectives: fight X battle is mostly realized as fight a losing battle as in (41a), and sometimes with other (nearly always negative) adjectives; examples are given in (41b)/(41c). Break X ground is nearly exclusively instantiated as break new ground as in (42a), so an example like (42b) is a rare exception.
(41) (a) The windscreen wipers sounded asthmatic, fighting a losing battle against the insistent rain.
(b) But a press backlash did get into gear, leaving a handful of diehard music writers to fight a bitter battle.
(c) It is to emphasize that Papert is fighting major battles over the nature of the relationship between computers and education.
(42) (a) Her work is difficult and always tries to break new ground
(b) Abbey National continues to break ground fast in the service and products offered to some ten million investors and two million borrowers . . .
Figure 4.5 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter AttrAdj. (x-axis: V NP-construction; y-axis: NSSD.)
In other words, the high deviations that we obtain with our flexibility measure actually point to considerable frozenness, not at the level of V NP, but at a more specific level of schematization. Note that this does not invalidate the flexibility measure per se (it does point out, however, that the results it produces need to be interpreted with caution). As a matter of fact, the measure may be used to identify the most adequate level of schematic representation of constructions: if a construction deviates this strongly from the baseline for any given parameter, this most likely indicates that the level of V NP is simply too coarse-grained to represent the schematization of the construction in question. For fight X battle, a more adequate representation is fight DETindef ADJ battle; for break X ground, the results suggest that it can be represented as a fixed phrase, break new ground. All in all, these results demonstrate that idiomatic constructions can hardly be binned in neat categories: in terms of schematization, fight DETindef ADJ battle is located somewhere between V NP constructions and V ADJ NP constructions; break new ground more unambiguously belongs to the level of V ADJ NP, yet occasional attestations without an adjective testify to the permeability of the different layers of schematization. In Chapter 7, I outline an elaborated version of the constructicon that enables us to represent these different shades of schematization.
According to Figure 4.6, many constructions are indeed less flexible than the baseline, but the spread of the values is notably higher than is suggested by the NSSD values of the Barkema-based measure, occupying the full range from −1 (take X root, do X trick, etc.) to 0 (take X course: −0.096) to clearly positive values (break X ground: −0.703, call X police: −0.691).
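To make the scale of the directional entropy values more tangible, the following minimal sketch computes such a value from the baseline counts reported for AttrAdj (865, 265, 20 and 1 tokens with zero to three adjectives). It rests on assumptions that are not spelled out in this section: relative entropy is taken to be Shannon entropy divided by its maximum for the given number of levels, and the value is signed negative when the unmodified level is the most frequent one and positive otherwise. The function names are hypothetical. Under these assumptions the baseline comes out at about −0.546, which agrees with the baseline value reported for AttrAdj in this chapter.

```python
import math
from typing import Sequence


def relative_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of the counts divided by the maximum entropy log2(n)."""
    total = sum(counts)
    n_levels = len(counts)
    if total == 0 or n_levels < 2:
        raise ValueError("need at least two levels and one observation")
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return entropy / math.log2(n_levels)


def directional_entropy(counts: Sequence[int]) -> float:
    """Signed entropy index; counts[0] must be the unmodified level.

    Assumed sign convention (not taken from the source):
    negative when the unmodified level is the most frequent one,
    positive when a modified level dominates.
    """
    h_rel = relative_entropy(counts)
    if counts[0] == max(counts):
        return h_rel - 1.0
    return 1.0 - h_rel


# Baseline tokens with 0, 1, 2 and 3 attributive adjectives, as reported above.
baseline_attr_adj = [865, 265, 20, 1]
print(round(directional_entropy(baseline_attr_adj), 3))

# A construction that never takes an adjective lands at the lower bound.
print(directional_entropy([50, 0, 0, 0]))
```

The second call uses an invented token count of 50 purely to show the boundary case: a distribution concentrated entirely on the unmodified level yields −1, the value the text describes as complete frozenness.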
In sum, one can say that the lexico-syntactic flexibility variation parameter AttrAdj does not have particularly high general discriminatory potential, and that V NP-constructions are generally reluctant to take attributive adjectives. Some constructions, however, decidedly depart from this general trend by either predominantly occurring with attributive adjectives (break X ground, fight X battle) or not licensing any at all (take X piss, have X clue, grit X tooth, etc.).
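The token counts reported for AttrAdj can be turned into simple shares of adjective-bearing tokens. The sketch below only restates that arithmetic; the dictionary layout and the helper name `share_with_adjective` are illustrative. The baseline figure of 286 is the sum of the 265, 20 and 1 baseline tokens reported with one, two and three adjectives.

```python
from typing import Dict, Tuple

# (tokens with at least one attributive adjective, all tokens), from the text.
ATTR_ADJ_COUNTS: Dict[str, Tuple[int, int]] = {
    "baseline": (265 + 20 + 1, 1151),
    "fight X battle": (120, 192),
    "break X ground": (126, 133),
}


def share_with_adjective(with_adj: int, total: int) -> float:
    """Proportion of tokens that carry at least one attributive adjective."""
    if total <= 0 or not 0 <= with_adj <= total:
        raise ValueError("counts must satisfy 0 <= with_adj <= total, total > 0")
    return with_adj / total


for name, (with_adj, total) in ATTR_ADJ_COUNTS.items():
    share = share_with_adjective(with_adj, total)
    print(f"{name:<15} {with_adj:>4}/{total:<5} {share:.1%}")
```

A high share says nothing about how many different adjectives occur. For break X ground the tokens are almost all break new ground, so the large deviation from the baseline reflects a fixed adjective rather than free modification.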
Figure 4.6 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter AttrAdj. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
Turning towards the next lexico-syntactic flexibility parameter AttrNP, consider the results in Figure 4.7. Figure 4.7 is slightly different from the one for the variation parameter AttrAdj: the majority of constructions yield the highest possible NSSD value. In combination with the results obtained from the entropy-based flexibility measure for this variation parameter (Figure 4.8), we find that most V NP-constructions, as opposed to the baseline, do not take any attributive noun phrases whatsoever, and that all V NP-constructions in the present data sample take fewer attributive noun phrases than the baseline sample does. However, the directional entropy value of the baseline is also lower for AttrNP (−0.706) than for AttrAdj (−0.546); considering the baseline’s overall Addition flexibility value of 0.222, we can conclude that the baseline is relatively frozen with respect to taking attributive noun phrases. The constructions that actually take attributive noun phrases are mostly highly compositional (such as close X door, tell X story, call X police, write X letter). Given that the large majority of the constructions achieve directional entropy values of −1, the overall correlation between compositionality and AttrNP is again rather weak for both measures (Barkema-based measure: r Pearson = −0.393; entropy-based measure: r Pearson = 0.384). Summing up, the results unanimously testify to a general tendency against taking attributive noun phrases. As a matter of fact, the tendency against two-partite nominal compounds in English has been noted by several authors (Berg and Helmer 2006), so this result ties in with this general tendency.6
So far, we have seen that the large majority of V NP-constructions behave rather conservatively with respect to prenominal modification. In order to see to what extent this also holds for postnominal modification, consider, first, the results for the variation parameter PP displayed in Figure 4.9 and Figure 4.10. This variation parameter measures whether and how often the V NP-constructions occur with a prepositional phrase (the two flexibility measures again produce highly compatible results: r Pearson = 0.728). While the Barkema-based measure suggests that V NP-constructions spread a little more than with respect to the prenominal flexibility parameters, the picture that emerges from the entropy-based measure is again one of relatively high frozenness. This holds not only for the majority of V NP-constructions, but also for the baseline (−0.563). The only constructions deviating at least somewhat from this general tendency are foot X bill, beg X question, see X point, write X letter, take X course, catch X eye and make X point, which makes sense from a semantic point of view since most of them normally imply a beneficiary or recipient, even if it is (obviously) not always overtly expressed. As with fight X battle and break X ground, it appears that these constructions have strong ties to the level of V NP PPbeneficiary/recipient, which should be represented accordingly in the constructicon. The results furthermore illustrate that what are labelled here as V NP, V Adj NP, or V NP PP constructions are actually overlapping layers of representation in the constructicon. The flexibility measure(s) presented here express this variation quantitatively. The correlation between PP and compositionality is again weak (for the Barkema-based measure: r Pearson = −0.118; for the entropy-based measure: r Pearson = 0.201).
Figure 4.7 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter AttrNP. (x-axis: V NP-construction; y-axis: NSSD.)
Figure 4.8 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter AttrNP. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
Figure 4.9 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter PP. (x-axis: V NP-construction; y-axis: NSSD.)
Figure 4.10 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter PP. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
Considering the second postnominal modification parameter included in the present study, RelCl, the two measures suggest even more unanimously a relatively high general frozenness of V NP-constructions on this parameter: in terms of the Barkema-based measure, hardly any V NP-construction deviates from the baseline by more than 0.2. Looking at the entropy-based results, we find, first, that this deviation is negative, such that constructions, if they deviate at all, tend to be less flexible than the baseline. Moreover, more than half of the V NP-constructions yield the lowest possible index value of −1 on the entropy-based scale, that is, they are completely frozen with respect to this parameter; and even the baseline obtains an index value that indicates clearly limited flexibility (−0.701). Despite this relative uniformity, however, it is worth pointing out that the correlation between RelCl and compositionality is higher than for the other variation parameters considered so far (Barkema-based measure: r Pearson = 0.518; entropy-based measure: r Pearson = 0.342). According to the Barkema-based value, write X letter appears to be an exception; looking at the magnitude of the variation as given by the entropy-based measure, however, one can see that write X letter, while indeed being the most flexible of all constructions, yields a rather low directional entropy index value of −0.55. With respect to the ranking of the individual constructions, the two measures do not appear to coincide as much as for the other variation parameters: the overall correlation between the two measures only amounts to 0.53. However, leaving out write X letter, the correlation is fairly high (r Pearson = 0.733). In summary, the results obtained so far indicate that V NP-constructions are generally less flexible than the baseline with regard to both prenominal and postnominal modification. Moreover, the baseline itself obtains rather low flexibility values.
In other words, regarding their NP-slot, V NP-constructions of any kind, be they conventionalized or not, appear much more often in their basic, non-modified versions than in a variant form which includes an adjective, noun phrase, prepositional phrase or relative clause. This tendency is even more extreme for conventionalized V NP-constructions like the ones in the present data sample. Some of the examples discussed in more detail above run counter to this general trend; these constructions occupy the space between different levels of schematization in the constructicon. The flexibility measures weigh the degree to which a particular construction deviates from the baseline and other constructions in the data sample, so they essentially provide a quantitative assessment of the cognitive-functional notions of fuzzy and overlapping category boundaries. Let us now turn towards the results for the flexibility of the verbal slot in order to see to what extent the results overlap.
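The observation that the correlation between the two measures for RelCl rises once write X letter is left out reflects a general property of Pearson's r: a single item on which two measures disagree can pull the coefficient down markedly. The sketch below illustrates this with invented toy scores. None of the numbers are taken from the study, and the function name `pearson_r` is simply a plain implementation of the standard product-moment formula.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Invented toy scores for six hypothetical constructions on two measures.
# The last item is the one on which the two measures disagree.
measure_a = [0.10, 0.20, 0.30, 0.40, 0.50, 0.90]
measure_b = [0.15, 0.18, 0.33, 0.38, 0.52, 0.10]

r_all = pearson_r(measure_a, measure_b)
r_trimmed = pearson_r(measure_a[:-1], measure_b[:-1])

print(f"all items:        r = {r_all:.3f}")
print(f"outlier left out: r = {r_trimmed:.3f}")
```

In the toy data the drop is far more drastic than the one reported in the text; the point is only the direction of the effect.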
4.4.4 Flexibility of the verb-slot
Figures 4.13 and 4.14 display the results for the lexico-syntactic parameter NoAdv, which measures how often a V NP-construction occurs with an adverb.
Figure 4.11 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter RelCl. (x-axis: V NP-construction; y-axis: NSSD.)
Figure 4.12 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter RelCl. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
Figure 4.13 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter NoAdv. (x-axis: V NP-construction; y-axis: NSSD.)
Figure 4.14 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter NoAdv. (x-axis: V NP-construction, including the baseline; y-axis: Hdir.)
According to the Barkema-based results, the V NP-constructions spread considerably. A look at the entropy-based results confirms this: the constructions occupy a wide range of values from pave X way (0.675) to make X mark (0.327), see X point (−0.242), have X clue (−0.428), make X face (−0.533) and follow X suit (−0.738); accordingly, the correlation between the two measures is fairly high (r Pearson = 0.612). The vast majority of constructions, however, yield values below the baseline, and the biggest share of these, in turn, is even considerably less flexible than the baseline. So with regard to the lexico-syntactic flexibility of the verbal slot, we find a much stronger discrepancy between baseline and V NP-constructions. The positive index value of 0.336 for the baseline furthermore shows that the baseline is not only considerably more flexible than the majority of the V NP-constructions, but also flexible in absolute terms. This finding clearly underscores the importance of considering external alongside internal modification. The difference between the baseline and the V NP-constructions in the sample suggests that differences in idiomaticity have a considerable effect on the absence or presence of adverbials, which are often regarded as so non-obligatory and remote from the pattern they are attached to that they form the context of the idiom rather than part of it. The evidence presented here suggests the contrary. Despite this difference in behaviour as compared to the flexibility of the NP-slot, the correlation of NoAdv with compositionality is similarly low (Barkema-based measure: r Pearson = −0.165; entropy-based measure: r Pearson = −0.029). This is to be expected since the scope of most adverbials covers the whole construction rather than part of it.
As to the question how exactly the V NP-constructions deviate from the baseline, we can again turn to the parameter level-specific results provided by the Barkema-based method; cf. Table E.7 in Appendix E. The strongest deviations are found when comparing the number of V NP-constructions accompanied by one or two adverbials with the number of corresponding occurrences of the baseline. There are considerably more instantiations of these in the baseline data. Looking at the frequencies of adverbial clusters comprising more than two adverbials, we find that the V NP-constructions do not differ strongly from the baseline because these items are less frequent in general. A similar picture can be gathered from Figures 4.15 and 4.16, which display adverbial flexibility in terms of how many different kinds of adverbials the different constructions occur with. As Figure 4.15 indicates, once the aspect of adverbial flexibility is narrowed down to the question how many different kinds of adverbials one can observe with any construction, the V NP-constructions in the present data sample turn out slightly less flexible: the majority of NSSD values is smaller than 0.2 (compare with Figure 4.14). We can specify this result by looking again at the parameter level-specific results provided in Table E.8 (Appendix E). Comparing the different columns (which represent adverbial classes, cf. Table 4.11 for an overview),
Figure 4.15 Normalized sums of squared deviations from baseline (NSSD) for the lexico-syntactic flexibility parameter KindAdv. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.16 Relative entropy values (H) for the lexico-syntactic flexibility parameter KindAdv. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
it is obvious that the V NP-constructions in the present data sample are most conservative about taking space, process and contingency adverbials, while they are moderately flexible with regard to time, modality, degree and respect adverbials. This general tendency can be made sense of straightforwardly: concepts like time, modality, degree and respect are semantically much wider in scope than space, process or contingency, so they can be expected to be applicable to a wider array of (V NP-)constructions. Compositionality, on the other hand, does not appear to play a very prominent role in this regard, either (correlation with Barkema-based measure: r Pearson = −0.282; with entropy-based measure: r Pearson = −0.063). Looking at the entropy-based results (which correlate only moderately with the Barkema-based results; r Pearson = 0.587), we can see that a number of constructions actually yield fairly high entropy values (make X headway: 0.75, break X ground: 0.727, change X hand: 0.676, etc.). Note how the semantics of these constructions call for an adverbial: they all express a major change in the development of some process or transaction which needs to be made explicit; as noted above, this again provides a hint that V NP is an underspecified level of representation for these constructions; their core representation is much rather one which includes a default slot for this process/transaction. The majority of constructions are less flexible than the baseline and, overall, moderately flexible. As with all the other lexico-syntactic variation parameters, the correlation coefficients just cited show no systematic relation between the number of different adverbials a construction takes and its compositionality. Summing up the results obtained for lexico-syntactic flexibility, two findings are of particular relevance.
First, it has become obvious that V NP-constructions differ more conspicuously on the parameters of verbal flexibility than on that measuring the flexibility of the noun slot. More precisely, with respect to flexibility of the verbal slot, the range of values that constructions take on an index from least to fully flexible is higher, and more constructions yield positive values. With respect to prenominal and postnominal modification, in contrast, V NP-constructions are generally rather conservative, since on most parameters, their index values are below 0 (on the entropy-based index scale). However, it has to be noted that the baseline is also considerably higher with respect to verbal flexibility as compared to nominal flexibility, so despite the fact that the V NP-constructions are more flexible on the verbal parameters as opposed to the nominal ones, they are less flexible on the verbal parameters when being compared to the baseline. Taken together, these observations suggest that there is more variation in the verb slot compared to the noun slot of V NP-constructions. Another aspect worth pointing out is the mostly non-existent or only very weak correlation between the different lexico-syntactic flexibility parameters and compositionality. In this regard, the results back up those of the previous studies mentioned above which mostly could not find a systematic correlation between flexibility and compositionality (cf. Section 4.1.1).
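The coefficients reported throughout this chapter are Pearson product-moment correlations between two vectors of per-construction scores. As a minimal sketch of that computation, the following Python fragment may be helpful; the function name and the five score pairs are invented for illustration and are not taken from the data sample of this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-variation of the two score vectors around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flexibility and compositionality scores (illustration only).
flexibility = [0.10, 0.35, 0.50, 0.75, 0.90]
compositionality = [0.40, 0.20, 0.60, 0.30, 0.50]
r = pearson_r(flexibility, compositionality)
```

Values of r close to 0, such as most of those reported for flexibility and compositionality, indicate that the two score vectors show no linear relationship.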
4.5 Morphological flexibility (MF)
4.5.1 Application
Morphological flexibility is similar to lexico-syntactic flexibility in that it may be conceived of as a meta-concept that comprises different variation parameters, which in turn comprise different parameter levels. A crucial difference between lexico-syntactic flexibility and morphological flexibility, however, is that the variation parameters that constitute morphological flexibility, such as Person, Number, Tense or Aspect, are obligatory, and they form closed paradigms. That is, one parameter level of each of the parameters that constitute morphological flexibility must be instantiated in each example of a V NP-construction (even the absence of a negation may be regarded as the presence of a zero morpheme). As for all other flexibility types, morphological flexibility was not restricted a priori to particular aspects of morphological variation; instead, all potential variation parameters of the verb, the determiner, and the noun phrase were taken into account. Consider Table 4.12 for an overview of the variation parameters comprising morphological flexibility, the parameter levels of each variation parameter and the abbreviated names. The parameter level MF Det was coded in accordance with a classification described in Crystal (1996:129); five different kinds of determiner were distinguished. Table 4.13 provides an overview.
4.5.2 Flexibility of the NP-slot
Let us first turn to the morphological flexibility parameter Det, which was coded as shown in Table 4.13. Figure 4.17 provides the results for the Barkema-based measure, Figure 4.18 those for the entropy-based measure. Figure 4.17 indicates that V NP-constructions show clear differences with respect to the different kinds of determiners they take. Constructions like tell X story, draw X line or make X point deviate only slightly from the baseline, so we can conclude that they license various kinds of determiners. Take X root, change X hand or do X trick, in contrast, yield (near-)maximum NSSD values, which points to the fact that they tend to occur with one specific determiner almost all the time. Taking the parameter level-specific results into consideration (Table E.17 in Appendix E), we see that most constructions deviate most strongly with respect to the determiner classes 0 (no determiner), 1 (the, my, no, what) and 5 (a(n), each, every, (n)either). For the remaining determiner classes, there is generally a tendency towards limited flexibility (since constructions mostly obtain negative flexibility values), yet the magnitude of the deviation is only very small. A look at Figure 4.18 confirms that the V NP-constructions cover a wide range of flexibility values, and that the constructions spread almost evenly across the whole range of values (accordingly, there is a solid correlation between the Barkema-measure and the entropy-measure: r Pearson = 0.545). However, the large majority of constructions are less flexible than the baseline, which is fairly high for this parameter (0.64).
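To make the logic behind the NSSD values more tangible, the following Python sketch compares a construction's distribution over parameter levels with that of the baseline. It rests on two assumptions that are not confirmed by this section: that deviations are computed on relative frequencies, and that the sums of squared deviations are normalized by the largest sum in the sample. The function names, the placeholder construction labels and all counts are invented for illustration.

```python
def proportions(counts):
    """Turn raw counts per parameter level into relative frequencies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    return [c / total for c in counts]

def ssd(counts, baseline_counts):
    """Sum of squared deviations between construction and baseline proportions."""
    p = proportions(counts)
    q = proportions(baseline_counts)
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def nssd(table, baseline_counts):
    """ASSUMPTION: normalize by the largest SSD in the sample, so values lie in [0, 1]."""
    raw = {name: ssd(counts, baseline_counts) for name, counts in table.items()}
    largest = max(raw.values())
    if largest == 0:
        return {name: 0.0 for name in raw}
    return {name: value / largest for name, value in raw.items()}

# Invented counts over three hypothetical determiner classes (not the study's data).
baseline = [400, 350, 250]
sample = {
    "construction_a": [41, 34, 25],  # distribution close to the baseline
    "construction_b": [98, 1, 1],    # almost always the same determiner class
}
scores = nssd(sample, baseline)
```

Under these assumptions, a construction that is tied to a single determiner class receives a value at or near 1, whereas a construction whose determiners are distributed like those of the baseline stays close to 0.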
Table 4.12 Variation parameters and parameter levels constituting morphological flexibility (MF).
Code       Variation parameter        Parameter levels
Person     person verb                infinitive, first, second, third, vocative, other
NumV       number verb                singular, plural, nonfinite
Tense      tense verb                 past, present, future, nonfinite
Aspect     aspect                     simple, progressive, perfective
Mood       mood verb                  indicative, subjunctive, nonfinite
Voice      voice verb                 active, passive
Neg        negation verb              absent, present
Det        number/definiteness Det    cf. Table 4.13
NumNP      number NP                  singular, plural
Gerund⁷    Gerund                     absent, present
With respect to Nicolas’s (1995) above-mentioned observation that the definiteness of the noun phrase seems to determine the readiness with which V NP-constructions license internal modification, the data in the present study do not support this claim. The two variation parameters which clearly measure internal modification are AttrAdj and AttrNP. Comparing the ratios of the number of modified attestations with definite and indefinite noun phrases, respectively, the mean ratio of adjectival modification for definite NPs is 0.163
Table 4.13 Types of central determiner (adapted from Crystal 1996:129).
Determiner type                          Occurs with singular count noun   Occurs with plural count noun   Occurs with non-count noun
Type 1 (the, my, no, what)               √                                 √                               √
Type 2 (some/any, enough)                –                                 √                               √
Type 3 (this, that)                      √                                 –                               √
Type 4 (these, those)                    –                                 √                               –
Type 5 (a[n], each, every, [n]either)    √                                 –                               –
and that for adjectival modifications in indefinite NPs is even higher (0.222). Leaving aside that these numbers point in exactly the opposite direction to that proposed by Nicolas (1995), the difference between the two is far from being significant (t Welch = −0.957, p = 0.342). The same holds for nominal modification. The mean ratio for attributive NPs in definite head NPs is 0.012, that for attributive NPs in indefinite head NPs amounts to 0.023, so again, the tendency goes against the hypothesized directionality, and the difference is not significant anyway (t Welch = −0.823, p = 0.415). To give two prominent examples from the data sample, consider fight X battle and make X face. As to the former, fight a losing battle alone accounts for 35 out of 94 occurrences of this construction with an adjectival modifier in the NP-slot. Similarly, make X face preferably occurs with adjectival modifiers such as ugly, wry, silly, etc., accounting for 85 out of 313 instances of this construction with an indefinite determiner. Likewise, there is no correlation between Det and compositionality (Barkema-based measure: r Pearson = 0.011; entropy-based measure: r Pearson = 0.21). With respect to the other noun-related aspect of morphological flexibility, NumNP, consider Figures 4.19 and 4.20, respectively. As is obvious from Figure 4.19, the large majority of V NP-constructions are very similar to the baseline: the NSSD values for most constructions are not higher than 0.2. A look into the parameter level-specific results in Table E.16 (Appendix E) shows that for the six constructions which deviate strongly from the baseline (call X police, change X hand, cross X finger, deliver X good, grit X tooth and meet X eye), the deviation resides in the noun phrases occurring almost exclusively in the plural rather than the singular form.
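The Welch tests reported above for definite versus indefinite noun phrases compare two group means without assuming equal variances. The following Python sketch shows how such a test statistic and its degrees of freedom can be obtained; the function name and the two lists of ratios are invented for illustration and do not reproduce the data of this study. The p-value is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("test is undefined when both samples are constant")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Invented per-construction modification ratios (illustration only).
definite = [0.10, 0.25, 0.05, 0.20, 0.15]
indefinite = [0.30, 0.12, 0.28, 0.18, 0.22, 0.26]
t, df = welch_t(definite, indefinite)
```

With the groups entered in this order, a negative t means that the mean ratio for definite noun phrases is lower than that for indefinite ones, which is the direction of the values reported above.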
Note how many of these constructions are body part-metaphors that involve body parts of which humans naturally have more than one (hands, fingers, teeth, eyes), of which there are quite a few in the overall small (in terms of types) data sample. Figure 4.20 specifies that picture by showing that the baseline itself is moderately flexible with respect to the distribution of singular and plural noun phrases
Figure 4.17 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Det. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.18 Relative entropy values (H) for the morphological flexibility parameter Det. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
Figure 4.19 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter NumNP. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.20 Relative entropy values (H) for the morphological flexibility parameter NumNP. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
(0.437), so a third of the constructions are more flexible than the baseline. At the same time, however, the entropy-based measure also identifies a number of constructions as completely frozen – in addition to those constructions which occur exclusively in their plural forms, the entropy-measure also considers those which occur exclusively in their singular forms as inflexible (break X ground, carry X weight, do X trick, fit X bill, make X headway, pave X way, take X piss, take X plunge and take X root). The tokens of constructions such as play X game, write X letter and fight X battle, on the other hand, are distributed fairly evenly across the singular and plural category, which renders them maximally flexible in terms of their relative entropy. Overall, the two measures correlate fairly highly (0.472). Interestingly, while the Barkema-based measure does not correlate with compositionality to an extent worth noting (r Pearson = −0.226), the correlation between the entropy-based measure and compositionality is fairly high (r Pearson = 0.466).
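The contrast between frozen and maximally flexible constructions can be illustrated with a short Python sketch. It assumes that the relative entropy H is the Shannon entropy of a construction's distribution over the parameter levels divided by the maximum entropy possible for that number of levels; this definition, the function name and the token counts are assumptions made for illustration only.

```python
import math

def relative_entropy(counts):
    """Shannon entropy of a count distribution divided by its maximum, log(k)."""
    k = len(counts)
    total = sum(counts)
    if k < 2 or total == 0:
        raise ValueError("need at least two levels and one observation")
    h = 0.0
    for c in counts:
        if c > 0:  # empty levels contribute nothing to the entropy
            p = c / total
            h -= p * math.log(p)
    return h / math.log(k)

# Invented [singular, plural] token counts (illustration only).
frozen = relative_entropy([120, 0])  # attested in the singular only
even = relative_entropy([60, 60])    # evenly split between both levels
skewed = relative_entropy([90, 10])  # strong preference for the singular
```

Under this assumption, a construction attested in only one number category obtains a value of 0 regardless of whether that category is the singular or the plural, while an even split yields the maximum value of 1.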
4.5.3 Flexibility of the verb-slot
The morphological flexibility of the verb-slot of V NP-constructions can be assessed from various perspectives, since there is potential variation in terms of the morphological realization of person, number, tense, aspect, mood, voice and also negation. To begin with, consider the results for the variation parameter Person provided in Figures 4.21 and 4.22. According to the Barkema-based results, the V NP-constructions deviate modestly from the baseline. A few constructions deviate more strongly: for instance, see X point (0.697) and have X clue (0.678) have a much stronger association with ‘first person’ than other constructions, as can be derived from the parameter level-specific results (Table E.9 in Appendix E). The entropy-based results in Figure 4.22 suggest even stronger diversification of constructions on this parameter: index values range between 0.107 (cross X mind) and 0.919 (have X laugh), and the values for all other constructions are distributed evenly over this range. Furthermore, Figure 4.22 shows that the baseline is highly flexible with regard to Person: the index value of 0.764 indicates that the tokens comprising the baseline are nearly evenly distributed across singular and plural form. In relation to the high baseline value, the majority of constructions turn out less flexible, the only exceptions being have X laugh (0.919), play X game (0.877), take X piss (0.832), draw X line (0.807), get X act together (0.802) and take X plunge (0.786). Let us now turn to the variation parameter NumV. Consider Figures 4.23 and 4.24. According to the Barkema-measure, the distribution of V NP-constructions is very similar to the one of the Person parameter: most constructions deviate moderately from the baseline, cross X finger being the sole exception. According to Table E.10 (in Appendix E), this deviation stems from the fact that the verb in cross X finger frequently occurs in a nonfinite form.
Figure 4.21 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Person. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.22 Relative entropy values (H) for the morphological flexibility parameter Person. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
Figure 4.23 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter NumV. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.24 Relative entropy values (H) for the morphological flexibility parameter NumV. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
The entropy-based results complement the Barkema-based results in showing, first, that the flexibility of the baseline is high for this variation parameter, too (0.71). Unlike for the Person parameter, however, the majority of constructions turn out even more flexible than the baseline, that is, there are nearly equal proportions of singular and plural forms among the tokens of most constructions. The tokens of cross X mind (0.301), make X face (0.419), and close X door (0.559) are comparatively biased (the former two towards the singular form, close X door towards nonfinite forms), yet the absolute index values indicate that they are far from being completely inflexible. A very similar picture emerges for the variation parameter Tense. Consider Figures 4.25 and 4.26. The Barkema-based results show that on average, the NSSD values are higher for Tense than for NumV or Person. Also, the degree of diversification is higher on this parameter than on the previous two. Again, the parameter level-specific results given in Table E.12 (in Appendix E) identify the source of these deviations more precisely: V NP-constructions evidently repel present tense. The strong association between idiomatic constructions and past tense points to the frequent use of these constructions in narrative contexts. With the single exception of beg X question, none of the constructions in the data sample yield a positive flexibility value. Instead, many constructions are strongly associated with past tense (for instance, cross X mind, hold X breath or make X face), or they prefer to occur in nonfinite forms (e.g. bear X fruit, draw X line or foot X bill). The entropy-based results (which modestly correlate with the Barkema-based ones, r Pearson = 0.521) bear an even more striking resemblance to the ones obtained for NumV (compare Figures 4.24 and 4.26).
The baseline value is relatively high (0.684), yet there are numerous constructions which are even more flexible (foremost, take X plunge (0.97), draw X line (0.959) and follow X suit (0.915)). The correlation between Tense and compositionality is again negligible (Barkema-based measure: r Pearson = 0.167; entropy-based measure: r Pearson = −0.074). With respect to Aspect, the Barkema-based method suggests that V NP-constructions have very differing flexibility values: the NSSD values range between 0.002 (call X police) and 1 (take X piss). What is more, the values spread across this range such that the resulting line is approaching one that would be produced by a linear function, so all constructions differ from each other to some extent. Taking the parameter level-specific results into account, we can narrow down the source of this variation to simple and progressive aspects: with the exception of cross X finger, the V NP-constructions are united in their repulsion of perfective aspect (consider the column for perfective in Table E.11, Appendix E: the vast majority of constructions yield negative flexibility values of double-digit magnitude); again, this may be interpreted as a reflection of the primarily narrative use of these constructions. With respect to the parameter levels ‘simple’ and ‘progressive’, however, no systematic patterns can be identified. There is no general tendency for V NP-constructions preferring to occur in the simple aspect to reject progressive aspect equally strongly, or
Figure 4.25 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Tense. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.26 Relative entropy values (H) for the morphological flexibility parameter Tense. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
anything of that kind. Rather, they can be more flexible with regard to both parameter levels compared to the baseline (bear X fruit, have X laugh or meet X eye), less flexible on both parameter levels (e.g. call X police, leave X mark) or a mixture of the two (e.g. cross X finger or see X point). Also, these tendencies towards comparatively increased or limited flexibility have different magnitudes. This explains why the constructions are strung like pearls on a chain in Figure 4.27. The entropy-based results displayed in Figure 4.28 indicate that compared to the previously presented morphological flexibility parameters Person and NumV, most constructions behave more conservatively on the Aspect parameter. The majority of constructions are less flexible than the baseline, which is even higher than for the other two variation parameters (0.767). In general terms, the entropy-based measure suggests a considerable degree of diversification of the constructions on this parameter, with a range of values between 0.876 (make X headway) and 0.043 (see X point). There are, however, some differences as far as the ranking positions of the individual constructions are concerned (correspondingly, the correlation between the two measures is only moderately high: r Pearson = 0.465). As is the case with most other morphological flexibility parameters, the correlation between Aspect and compositionality is very low (Barkema-based measure: r Pearson = −0.248; entropy-based measure: r Pearson = 0.137). As to the flexibility parameter Mood, the Barkema-based method describes the average flexibility of the V NP-constructions as considerably lower and less diverse; consider Figure 4.29. As is obvious from Figure 4.29, most V NP-constructions deviate only a little from the baseline, the single exception (again) being cross X finger.
The constructions occupying the right-hand third of the graph, such as deliver X good (0.191), foot X bill (0.261) or bear X fruit (0.324), all have a tendency against the indicative mood, preferring the nonfinite form instead. With respect to the third possible parameter level, subjunctive mood, constructions diverge both positively and negatively from the baseline, yet rarely to a great extent. Do X trick and, although less so, fit X bill stick out in their fairly strong preference for the subjunctive (consider Table E.13 in Appendix E, which provides the parameter level-specific results for Mood). Once again, the semantic motivation for this preference is more than obvious: as the examples in (43) for fit X bill illustrate, these constructions are strongly associated with contexts of speculation or uncertainty, which stipulates a use in subjunctive mood; (44) provides some of the few examples of do X trick in indicative mood.
(43) (a) Got any glad rags that would fit the bill?
     (b) As I notice that no reserve has been included in the lot, it should fit the bill precisely.
(44) (a) Mr Kilroy-Silk, a high flier on the egg circuit, tells me he can do the trick with three eggs simultaneously.
     (b) Keep hair smelling clean and fresh – subtly perfumed shampoos do the trick beautifully!
Figure 4.27 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Aspect. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Figure 4.28 Relative entropy values (H) for the morphological flexibility parameter Aspect. (Axes: H, 0.00 to 1.00, by V NP-construction, including the baseline.)
Figure 4.29 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Mood. (Axes: NSSD, 0.00 to 1.00, by V NP-construction.)
Turning towards the entropy-based results for Mood in Figure 4.30, we see that the index values spread considerably also for this variation parameter, ranging between 0.971 (take X plunge) and 0.267 (cross X mind). As opposed to the other variation parameters presented so far, however, the baseline yields only a moderate index value (0.457); accordingly, the majority of V NP-constructions are more flexible than the baseline. In line with most variation parameters presented so far, the correlation between Mood and compositionality is virtually non-existent (Barkema-based measure: r Pearson = −0.058; entropy-based measure: r Pearson = −0.053). Yet another morphological flexibility parameter to be taken into account is Voice; consider Figures 4.31 and 4.32. As is immediately obvious from the Barkema-based results, V NP-constructions behave very differently with respect to this variation parameter, taking on values between 0 (call X police) and 1 (e.g. take X root). Contrary to other morphological flexibility parameters on which values spread to a considerable extent, we see that Voice has less discriminatory potential: 14 out of 39 V NP-constructions obtain NSSD values of 1, and 27 out of 39 constructions have an NSSD value higher than 0.8. Once we take the parameter level-specific results into account (cf. Table E.14 in Appendix E), we see that this uniformity stems from a common rejection of passive voice (with the exception of call X police, which even has a slight preference towards passive voice). This reluctance to passivize is also reflected in the entropy-based results given in Figure 4.32 (with respect to voice, the correlation between the two flexibility measures is nearly perfect: r Pearson = 1): while active and passive voice are nearly perfectly evenly distributed when looking at the baseline (0.942), all constructions but call X police are less flexible than the baseline, and 14 are even totally frozen such that they never occur in the passive voice.
This result is not surprising in view of the corresponding results for tree-syntactic flexibility. Correlating the two flexibility measures with compositionality produces only weak correlation scores, and there is no evidence in favour of any systematic relationship between the two variation parameters (Barkema-based measure: r Pearson = −0.333; entropy-based measure: r Pearson = 0.315).

How do the V NP-constructions behave with respect to negation? Consider Figures 4.33 and 4.34 for an overview of the results. According to the Barkema-based measure, hardly any of the V NP-constructions contrast notably with the baseline: nearly all of them yield NSSD values below 0.1. The two exceptions are see X point and have X clue, which are often used in contexts of discussion or other exchanges of arguments and opinions to signal disagreement and lack of understanding. As the parameter level-specific results in Table E.15 (Appendix E) indicate, the two constructions deviate from the baseline in occurring in their negated version more often: see X point occurs 41.80 per cent more often in a negative form than the baseline, have X clue even 70.31 per cent more often. Typical examples are given in (45) and (46), which illustrate that see X point is highly associated with argumentative contexts to express disagreement, and have X clue nearly always denotes a mixture of helplessness and ignorance.
Figure 4.30 Relative entropy values (H) for the morphological flexibility parameter Mood. [Plot not reproduced; x-axis: V NP-construction, y-axis: H.]
Figure 4.31 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Voice. [Plot not reproduced; x-axis: V NP-construction, y-axis: NSSD.]
Figure 4.32 Relative entropy values (H) for the morphological flexibility parameter Voice. [Plot not reproduced; x-axis: V NP-construction, y-axis: H.]
Figure 4.33 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Neg. [Plot not reproduced; x-axis: V NP-construction, y-axis: NSSD.]
Figure 4.34 Relative entropy values (H) for the morphological flexibility parameter Neg. [Plot not reproduced; x-axis: V NP-construction, y-axis: H.]
(45) (a) But I don’t see the point.
     (b) As someone who hates being pestered by incoming calls, I no longer see the point of a cellular phone.
     (c) You see the point I’m making?
(46) (a) I was world number one and I didn’t have a clue about life.
     (b) I need to do a CV for tomorrow and basically I haven’t a clue about structure or content.
     (c) She hadn’t a clue as to what was going on.
The entropy-based measure (correlating highly with the Barkema-based measure: r Pearson = 0.715) shows that most constructions are less flexible than the baseline, which yields an entropy value of 0.611. The constructions that are clearly more flexible are see X point (0.986) and cross X mind (0.94), which occur about equally often in their affirmative and negative forms. Make X headway (0.748), deliver X good (0.647) and play X game (0.614) also obtain entropy values higher than the baseline, so their distribution of negated and non-negated instantiations is likewise less biased towards either side than that of the baseline. The least flexible construction is have X clue (0), which occurs exclusively in its negated form. Note again how the entropy-based measure yields a more informative picture than the Barkema-based measure: the NSSD values given in Figure 4.33 show only that have X clue and see X point both deviate from the baseline, whereas the entropy-based ranking also shows that have X clue deviates by being more restrictive and see X point by being more flexible.

The last morphological flexibility parameter to be considered is Gerund: how often is the verb-slot in the V NP-constructions a gerund (or present participle)? Figures 4.35 and 4.36 provide the results. The Barkema-based flexibility values suggest that most V NP-constructions behave very similarly with respect to Gerund, and that their degree of flexibility is also very close to that of the baseline. A number of V NP-constructions, however, stick out: pave X way (1), close X door (0.855) and play X game (0.704) are the top three of six V NP-constructions with NSSD values that are clearly higher than those of most other constructions.
Taking the parameter level-specific results in Table E.18 (Appendix E) into account, we see that these constructions deviate from the baseline in that they prefer to occur in gerundial or participial forms: pave X way occurs in these forms 17.75 per cent more often than the baseline, close X door 16.41 per cent more often and play X game 14.89 per cent more often. According to the entropy-based measure, the baseline value is comparatively low (0.174), and most V NP-constructions yield higher entropy values. The spread of entropy values is quite wide, ranging from 0.731 (pave X way) to 0 (cross X mind, do X trick). Finally, it is worth pointing out that Gerund does not substantially correlate with compositionality either (Barkema-based measure: r Pearson = −0.073; entropy-based measure: r Pearson = 0.031).
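The way entropy values of 0 and values close to 1 arise for a two-level parameter such as Neg or Gerund can be illustrated with a minimal Python sketch. It assumes that relative entropy is Shannon entropy (base 2, cf. note 2) divided by the maximum entropy possible for the number of parameter levels; the exact formula used in this study is defined in an earlier section and may differ in detail. The function name and all counts are hypothetical and serve illustration only.

```python
import math


def relative_entropy(counts):
    """Shannon entropy (base 2) of a frequency distribution, divided by the
    maximum entropy for that number of levels (assumed normalization)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    h = sum(-p * math.log2(p) for p in probs)
    h_max = math.log2(len(counts))
    return h / h_max if h_max > 0 else 0.0


# Hypothetical (affirmative, negated) counts, not taken from the corpus data.
frozen = relative_entropy([0, 50])     # only one level attested: no flexibility
balanced = relative_entropy([50, 50])  # both levels equally frequent
skewed = relative_entropy([85, 15])    # one level clearly preferred
```

Under these assumptions, a construction attested in only one form receives the value 0, an evenly split construction receives 1, and an 85:15 split yields a value of roughly 0.61.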
Figure 4.35 Normalized sums of squared deviations from baseline (NSSD) for the morphological flexibility parameter Gerund. [Plot not reproduced; x-axis: V NP-construction, y-axis: NSSD.]
Figure 4.36 Relative entropy values (H) for the morphological flexibility parameter Gerund. [Plot not reproduced; x-axis: V NP-construction, y-axis: H.]
4.6 Evaluation

Several methodological and content-related conclusions can be drawn from the above presentation of the results for the different flexibility parameters. To begin with the methodological issues, the two (or three, if relative and directional entropy are regarded as different measures) flexibility measures presented here highlight different aspects of the flexibility of V NP-constructions. The primary advantage of the Barkema-based method is that, in addition to an overall flexibility value for each flexibility parameter, one can consult the individual values for each parameter level given as percentages (negative for less flexible than expected, positive for more flexible than expected), which makes it possible to get more detailed information about the exact source of the deviation. With respect to the morphological flexibility parameter Tense, for instance, one might be interested to know not only that a particular construction does not occur with all of the parameter levels of this variation parameter (‘past’, ‘present’, ‘future’ or ‘nonfinite’), but also which parameter level(s) the construction prefers or refuses to occur in. In many cases, we find evidence in favour of lexically highly specified fixed expressions, such as break new ground, have no clue or should do the trick. The relative entropy measure applied to tree-syntactic and morphological flexibility, on the other hand, is a more generic measure in the sense that it does not provide any information about the behaviour of a construction at the individual parameter levels. The directional entropy values, however, at least provide the essential information of whether the aspect measured is present or absent.
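The difference between level-specific deviations and a single summed index can be made concrete with a small Python sketch. It is not the Barkema-based procedure itself: the function names, the counts and the decision to square and sum percentage-point deviations are assumptions made purely for illustration.

```python
def level_deviations(construction, baseline):
    """Percentage-point deviation from the baseline for each parameter level
    (positive: level used more often than in the baseline)."""
    c_total = sum(construction.values())
    b_total = sum(baseline.values())
    return {
        level: 100 * (construction[level] / c_total - baseline[level] / b_total)
        for level in baseline
    }


def sum_squared_deviations(deviations):
    """Collapse the level-specific deviations into one unsigned index."""
    return sum(d ** 2 for d in deviations.values())


# Hypothetical counts for the two levels of a parameter such as Neg.
baseline = {"affirmative": 85, "negated": 15}
cx_a = {"affirmative": 70, "negated": 30}   # negated more often than baseline
cx_b = {"affirmative": 100, "negated": 0}   # never negated

dev_a = level_deviations(cx_a, baseline)
dev_b = level_deviations(cx_b, baseline)
ssd_a = sum_squared_deviations(dev_a)
ssd_b = sum_squared_deviations(dev_b)
```

In this toy example the two constructions deviate in opposite directions (`dev_a["negated"]` is positive, `dev_b["negated"]` is negative), yet their summed squared deviations are the same. The direction is recoverable only from the level-specific values, which is the reason for consulting them alongside the overall index.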
Both entropy-based measures are useful complements to the Barkema-based measure in that they provide flexibility indices not only for all V NP-constructions, but also for the baseline sample itself, which makes it possible to (i) compare baseline flexibility values across variation parameters, (ii) assess the flexibility of a particular V NP-construction in relation to the baseline value, and (iii) determine the directionality of its deviation from the baseline at the level of variation parameters (as opposed to parameter levels in the Barkema-based approach). To sum up, the two measures complement each other and together provide a comprehensive picture of the flexibility of a construction.

Given their different modes of operation, the sceptical reader may ask to what extent the results produced by the two measures are compatible after all. In order to get an impression of the overall differences and similarities, the ranks produced for each variation parameter by the two measures were correlated with each other; consider Table 4.14. The correlation values indicate that, generally speaking, the results are very similar. For the majority of variation parameters, the correlation ranges between moderate and high. The overall match between the two measures becomes even more transparent if one considers a scatterplot with a locally weighted regression (cf. note 8) as provided in Appendix F. For most of the variation parameters included, the data points line up on the regression line like a string of pearls. In other words, the regression function captures most of the data nicely.
Table 4.14 Correlations between the Barkema-based and the entropy-based flexibility measure by flexibility parameter.

Flexibility parameter    r Pearson (flexibility measures)
MF Voice                  1.000
SF                        0.955
LF Addition               0.950
MF NumNP                  0.749
LF PP                     0.728
MF Neg                    0.715
LF AttrNP                 0.679
MF Person                 0.641
LF NoAdv                  0.612
LF KindAdv                0.587
MF Det                    0.545
LF RelCl                  0.530
MF Tense                  0.521
MF Aspect                 0.465
LF AttrAdj                0.451
MF NumV                  −0.677
MF Mood                  −0.831
MF Gerund                −0.856
There are, however, a few exceptions to this. For the morphological flexibility parameters Mood, NumV and Gerund, the values produced by the two measures correlate highly, but negatively: the higher the ranking of a V NP-construction according to one flexibility measure, the lower its ranking according to the other. Having ruled out an error in the data coding, I have no explanation for these results at this point.
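How a rank-based comparison of two measures produces correlation values near +1 or −1 can be sketched as follows. The sketch computes Pearson's r over ranks, as described above; the helper names and all input values are hypothetical, and ties are not handled.

```python
import math


def ranks(values):
    """Rank 1 goes to the highest value; ties are not handled (illustration)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical index values for four constructions under two measures.
measure_a = [0.9, 0.6, 0.3, 0.1]
agreeing = [0.8, 0.7, 0.2, 0.05]   # same ordering as measure_a
reversed_ = [0.1, 0.2, 0.6, 0.9]   # opposite ordering

r_agree = pearson_r(ranks(measure_a), ranks(agreeing))
r_reverse = pearson_r(ranks(measure_a), ranks(reversed_))
```

Identical orderings give r = 1 and fully reversed orderings give r = −1, so strongly negative values such as those for Mood, NumV and Gerund indicate that the two measures order the constructions in nearly opposite ways.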
Conclusion

Table 4.15 provides (from left to right) the average entropy values for all flexibility parameters, sorted by increasing entropy; the standard deviation of the average entropy values; the corresponding flexibility values of the baseline; and the difference between the two (positive numbers indicate that the constructions are on average more flexible than the baseline, negative values that they are less flexible). For ease of interpretation, the mean entropy values and their corresponding baseline values for each flexibility parameter are also provided in Figure 4.37. Note that in order to allow for a direct comparison of all values, the directional entropy values, which normally range from −1 to +1, were adapted to a scale with a value range between 0 and 1. Since the entropy-based method is the one which also provides the flexibility values of the baseline, and given that both flexibility measures tend to correlate highly anyway, the exposition is restricted to the entropy-based values.
Table 4.15 Average entropy values, baseline values, and differences between the two for all flexibility parameters.

Parameter      Mean H Parameter    SD       H Baseline    Diff. H Parameter / H Baseline
LF AttrNP      0.032               0.061    0.147         −0.115
MF RelCl       0.033               0.057    0.149         −0.117
LF PP          0.111               0.130    0.219         −0.108
MF Voice       0.178               0.202    0.942         −0.764
LF AttrAdj     0.208               0.193    0.227         −0.019
SF             0.211               0.145    0.822         −0.611
MF NumNP       0.274               0.235    0.437         −0.163
MF Neg         0.286               0.170    0.611         −0.325
MF Gerund      0.334               0.325    0.174          0.160
MF Det         0.344               0.243    0.640         −0.296
LF NoAdv       0.395               0.204    0.668         −0.546
LF KindAdv     0.477               0.252    0.551         −0.074
LF Addition    0.522               0.138    0.634         −0.225
MF Person      0.614               0.170    0.764         −0.150
MF Aspect      0.635               0.201    0.767         −0.132
MF Mood        0.641               0.184    0.457          0.184
MF Tense       0.758               0.139    0.684          0.074
MF NumV        0.799               0.165    0.710          0.089
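The difference column of Table 4.15 can be read as the mean construction entropy minus the baseline entropy. The Python sketch below reproduces this for five rows copied from the table. The rescaling function is an assumption: the text states only that directional entropy values were mapped from the range −1 to +1 onto 0 to 1, and a linear mapping is the simplest way to do so.

```python
# (mean H of the constructions, H of the baseline), copied from Table 4.15.
rows = {
    "MF Voice": (0.178, 0.942),
    "SF": (0.211, 0.822),
    "MF Neg": (0.286, 0.611),
    "MF Gerund": (0.334, 0.174),
    "MF Mood": (0.641, 0.457),
}


def difference(mean_h, baseline_h):
    """Positive: constructions are on average more flexible than the baseline."""
    return round(mean_h - baseline_h, 3)


def rescale_directional(h):
    """Map a value from [-1, +1] onto [0, 1] (assumed linear rescaling)."""
    return (h + 1) / 2


diffs = {name: difference(mean_h, base_h) for name, (mean_h, base_h) in rows.items()}
more_flexible = [name for name, d in diffs.items() if d > 0]
```

For these five rows the computed differences match the printed column, and only Gerund and Mood come out as more flexible than the baseline.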
Figure 4.37 Average entropy values for V NP-constructions and corresponding baseline. [Plot not reproduced; x-axis: formal flexibility factors, y-axis: (mean) directional/relative H.]

Several interesting general tendencies can be derived from Figure 4.37. First of all, the alignment of the average flexibility values as increasing from left to right clearly shows that the different flexibility parameters tend to cluster according to the part of the V NP-construction of which they assess the flexibility. Flexibility parameters assessing the flexibility of the noun occupy the left-hand end of the scale, that is, they rank low in flexibility, whereas those parameters measuring different aspects of flexibility of the verb-slot are all grouped in the right-hand part of the scale. Thinking of Gerund as a verb-related flexibility parameter, it appears to be the single exception to this general ordering, since it ranks slightly lower than MF Det. Alternatively, thinking of gerundial forms as mostly pointing to nominalizations, the placement of Gerund in the group of noun-related flexibility aspects makes perfect sense. Furthermore, Figure 4.37 reveals that within the groups of nominal as well as verbal flexibility parameters, V NP-constructions are on average less flexible lexico-syntactically than morphologically. With regard to the flexibility of the noun-slot, LF AttrNP, LF RelCl, LF PP and LF AttrAdj have lower average entropy values than MF NumNP, MF Neg, MF Gerund and MF Det. Likewise, all lexico-syntactic flexibility parameters measuring adverbial modification obtain lower flexibility values than the morphological flexibility parameters Person, Aspect, Mood, Tense and NumV. From this, one can conclude that the centrality that is often attributed to the verb in phrasal entities does not result in a particularly conservative behaviour in terms of restricted flexibility in conventionalized expressions such as the V NP-constructions of the present
data sample. On the contrary, the verb-slot is considerably more versatile than the noun-slot. As to the fact that Figure 4.37 indicates higher morphological than lexico-syntactic flexibility of V NP-constructions, one might object that this can be expected since morphological specification, unlike lexico-syntactic specification, is obligatory. While this naturally increases the entropy values to some extent, the result still stands in contrast to previous corpus-based studies, which found variations in term selections less frequently than additions. Barkema (1994b), for instance, reports that in nominal compounds, term selections account for only 14.7 per cent of the variations observed, whereas additions account for 25.86 per cent (cf. Section 4.1.3). While nominal compounds cannot be expected to behave very similarly to the verb-slot of V NP-constructions, the fact that even for the noun-slot, morphological flexibility parameters have higher entropy values than the lexico-syntactic flexibility parameters is at odds with Barkema’s observations. A possible reason for this difference may be the presence of a verb in V NP-constructions. This hypothesis awaits the investigation of other kinds of constructions with and without verbal components.

Another interesting fact concerns the average entropy values of tree-syntactic flexibility and the morphological flexibility parameter Voice. Compared to the other flexibility parameters, both of them yield low average entropy values, that is, V NP-constructions are generally syntactically flexible only to a limited extent. This result accords with previous studies which have claimed that tree-syntactic flexibility is limited in idiomatic expressions. That Voice yields an average entropy value within the same range as tree-syntactic flexibility could be expected since we have seen above that the most decisive parameter level in syntactic flexibility is the passive declarative variant form.
Moreover, Figure 4.37 shows that it is these two parameters which depart most strongly from the baseline; as a matter of fact, the baseline values for tree-syntactic flexibility and Voice are the two highest baseline values overall. While it is more difficult to discern a more general relation between average construction entropy values and the corresponding baseline values, it is worth pointing out that although the baseline is generally higher than the average construction entropy value, the three flexibility parameters yielding the highest entropy values, as well as Gerund, yield higher average entropy values than their corresponding baseline values (in the case of Gerund, the value is even nearly twice as high). For other variation parameters like Neg, in contrast, we find that the baseline can also be considerably higher than the average construction entropy. What is particularly striking in this regard is that those parameters which yield flexibility values higher than the baseline have rarely, if ever, played a prominent role in idiomaticity research.

As regards tree-syntactic flexibility and Voice, the corpus-linguistic results can be interpreted as pointing to speakers’ sensitivity to these variation parameters in particular. The huge discrepancy between construction and baseline values is a highly salient distributional fact about V NP-constructions that could not possibly go unnoticed by any speaker. Since idiomatic expressions are always considerably less flexible on these parameters than the whole inventory of constructions of this kind in general, these variation parameters have a high cue validity for idiom status: if a V NP-construction is substantially limited tree-syntactically, chances are high that it is a conventionalized, idiomatic string. However, as the comparison of the corpus-linguistic and the experimental results will show, the importance of the individual flexibility parameters (and of compositionality and corpus frequency) for judgements of overall idiomaticity is not (exclusively) a function of cue validity as defined by a substantial negative deviation of construction flexibility from baseline flexibility.

Before approaching the question of which variation parameters figure in idiomaticity judgements, it is useful to characterize the idiomatic variation continuum more systematically. To this end, the next chapter is devoted to the following questions: how do the different variation parameters relate to each other? Is it possible to identify groupings of parameters, and if so, what expectations can be derived from these groupings as regards their impact on the idiomaticity judgements?
Notes

1. For a constructionist, quantitative corpus-linguistic approach to incipient delexicalization in English constructions, cf. Zeschel (2007).
2. As Manning and Schütze (2001:61) point out, ‘[entropy] is normally measured in bits (hence the log to the base 2), but using any other base yields only a linear scaling of the results’.
3. Small deviations are due to rounding.
4. The terminology used for the different syntactic variations is that also used in the ICE-GB.
5. Neither interrogatives nor imperatives are attested in the passive voice for V NP-constructions in the ICE-GB, but there are examples of interrogative passive variants in the BNC-based data sample (e.g. would the story have been told . . . ? is an example of an interrogative passive variant of tell X story), so this syntactic variation had to be included, too. Even though no attestation was found for imperative passives in the BNC-based data sample either, this variation remained in the set of variations because it completes the paradigm.
6. Interestingly, the conservative behaviour of English apparently only holds for two-partite nominal compounds. Berg and Helmer (2006) observe that with respect to the frequency of tri-partite and quadruplet nominal compounds, English is actually increasingly flexible as opposed to German, in which two-partite nominal compounds play a prominent role in word-formation processes. In the present data sample, no instantiation of either a tri-partite or quadruplet compound is attested; however, this does not invalidate Berg and Helmer’s claim since the data sample is restricted to the NP-slot of conventionalized V NP-structures.
7. No distinction was made between gerunds and present participles; for a detailed account of why this distinction is obsolete, cf. Huddleston and Pullum (2002:1220–22).
8. As opposed to a linear regression line, a locally weighted regression line is ‘smoothed’ to the actual distribution of the data such that the regression is not necessarily a straight line, but can be wavy.
Chapter 5
The idiomatic variation continuum
Introduction

In Chapter 1, a fundamental distinction was made between idiomatic variation and idiomaticity. The former refers to differences in the distribution of constructions according to the semantic and formal variation parameters discussed in Chapters 3 and 4; the latter reflects the psychological concept that is assumed to be guided by the variation observable in performance data. The corpus results presented in the previous chapters revealed a complex distribution of the data such that constructions vary quite considerably with regard to the values they take on for different variation parameters. Likewise, they were shown to differ in how strongly they tend to deviate from baseline values. The present chapter explores the question of to what extent it is possible to make out patterns that structure this complex distribution. That is, while Chapter 6 turns to the question of what speakers actually do with the input available to them in the performance data, the major question addressed here is whether there are groups of parameters that are so highly associated with each other that they form clusters of information. From a usage-based perspective, these clusters would constitute a natural choice for speakers to focus on when judging a construction: the more distributional information clusters around certain aspects of the construction, the more relevant it should be for a qualitative assessment. In order to find out how much internal structure can be discerned in the performance data, the data were subjected to a Principal Components Analysis. Section 5.1 explains this method; Section 5.2 summarizes the results.
5.1 Method: Principal Component Analysis

All 20 idiomatic variation parameters were subjected to a Principal Components Analysis (PCA). A PCA is a statistical procedure that is geared towards detecting structure within a set of variables and grouping them into over-arching factors (or principal components). That is, it serves the purpose of reducing the complexity of the data involved by identifying commonalities between the different variables and classifying them into larger groups on the basis of these commonalities (Bortz 2005:511ff.).
Statistically speaking, how much two variables have in common is defined as the strength of the correlation between them. The correlation between two variables can be conceived of as depicted by a scatterplot to which a regression line has been added that summarizes the relationship between the variables (an example is given in Appendix F, which provides the scatterplots and regression lines for the correlations of the results produced by the two flexibility measures). The higher the correlation, the better the regression line captures the distribution. If it is possible to define a variable that approximates this regression line, that is, a variable that captures the essence of the two items, this variable can be used instead of the two original ones in future studies. When it is possible to identify such a principal component, the number of variables has been reduced with only minimal or (ideally) no loss of information, and the two original variables are classified at a higher-order level. A PCA is furthermore able to compare the correlations of more than two variables at once, that is, it identifies regression lines in a multi-dimensional variable space. The primary strategy to detect these regressions is referred to as variance maximizing. The principal component can be thought of as the original X axis that has been rotated so as to be maximally close to the regression line. It is exactly this kind of rotation that is involved in variance maximizing: the best regression line is one which captures most variance of the data it represents and, at the same time, leaves as little variance uncaptured as possible. After the first regression line has been identified, the PCA starts anew: a second regression line is created that captures a maximum of the remaining variance of the data. The components extracted in this way are also referred to as consecutive factors.
They are independent of each other because each consecutive component only considers the variance in the data that remains unaccounted for by previously extracted components. Where to draw the line between principal components that are still relevant, in the sense that they capture a sufficient amount of variance, and those which are not is basically an arbitrary decision. A common strategy is to take the so-called eigenvalues of each component into account and only consider factors with an eigenvalue of 1 or higher (this strategy is also referred to as the Kaiser criterion). The eigenvalue of a component indicates how many of the original variables are represented by that component, so a component with an eigenvalue of 1 captures as much variance as one original variable. In addition to the eigenvalues, the PCA also provides the percentage of the variance that is captured by each component. These numbers may serve as an additional clue to the number of relevant components. As to which variables are grouped into which higher-order principal components, each variable is assigned a so-called component loading for each component, which represents the correlation coefficient between the variable and the component. As with Pearson’s r correlation coefficient, the rough guideline for their interpretation is that component loadings of 0.7 or higher can be considered as pointing towards substantial relatedness.
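The relation between eigenvalues and variance percentages, and the Kaiser criterion, can be sketched in a few lines of Python. The sketch assumes that the PCA was run on standardized variables, so that the total variance equals the number of variables (20); this is consistent with the statement that an eigenvalue of 1 corresponds to one original variable, but it is an assumption nonetheless. The eigenvalues are those reported in Table 5.1; the function names are hypothetical.

```python
N_VARIABLES = 20  # idiomatic variation parameters entered into the PCA

# Eigenvalues of the eight retained components as reported in Table 5.1.
eigenvalues = [4.224, 2.882, 2.12, 1.901, 1.357, 1.241, 1.168, 1.009]


def variance_share(eigenvalue, n_variables):
    """Percentage of total variance captured by one component, assuming
    standardized variables (total variance = number of variables)."""
    return 100 * eigenvalue / n_variables


def kaiser_retained(values, threshold=1.0):
    """Keep only components whose eigenvalue reaches the threshold."""
    return [value for value in values if value >= threshold]


shares = [variance_share(value, N_VARIABLES) for value in eigenvalues]
cumulative_share = sum(shares)
retained = kaiser_retained(eigenvalues)
```

Under this assumption, the first component's share comes out at about 21.1 per cent and the eight components together at about 79.5 per cent, which agrees with the percentages in Table 5.1 up to rounding.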
Table 5.1 Principal components identified by the PCA.

Component    Eigenvalue    Cum. eigenvalue    % variance    Cum. %
1            4.224          4.224             21.121        21.121
2            2.882          7.106             14.41         35.53
3            2.12           9.226             10.599        46.129
4            1.901         11.127              9.505        55.635
5            1.357         12.484              6.786        62.421
6            1.241         13.726              6.207        68.628
7            1.168         14.894              5.842        74.47
8            1.009         15.903              5.047        79.517
5.2 How idiomatic variation parameters cluster

As is obvious from Table 5.1, the PCA compresses the 20 original variation parameters into a total of eight components (cf. note 1). The second column from the left provides the eigenvalues of each of these components; the second column from the right shows how much of the total variance of the data (in per cent) these components account for. The respective columns to the right of these provide cumulative numbers. The first principal component, for instance, has an eigenvalue of 4.224, that is, it summarizes a little more than four of the original 20 idiomatic variation parameters. In doing so, it accounts for a little more than 21 per cent of the total variance in the data. The second component accounts for a little less than three of the original variables that are left unaccounted for by component 1; in isolation, it captures about 14 per cent of the total variance in the data. Taken together, components 1 and 2 account for more than 35 per cent of the total variance in the data, and they represent about seven of the original variables. Summing up, the first four components are by far the most important ones: they have an information value that is equivalent to that of more than half of the original variables (their cumulative eigenvalue amounts to 11.127). Correspondingly, they account for more than half of the total variance in the data (55.635 per cent). Components 5, 6, 7 and 8 also yield eigenvalues higher than 1, but the explanatory power they add is much lower than for the first four components: each additional component added after component 4 increases the cumulative eigenvalue only by about the size of one original variable. Accordingly, the PCA shows that the idiomatic variation parameters can be reduced to eight principal components, that is, about 40 per cent of the number of original variables, and still account for 79.517 per cent of the total variance.
This reduction is a solid result that testifies to a considerable amount of inner structure within the idiomatic variation parameters. The component loadings of each individual idiomatic variation parameter on the principal components show which of the idiomatic variation parameters are grouped together; consider Table 5.2. Component loadings higher than 0.7 (absolute values) are highlighted in bold print since these are the variables which actually constitute the component. For those parameters which do not yield values higher than 0.7, their highest component loadings are italicized to indicate on which component they load highest, even if their contribution is not significant.

Table 5.2 Component loadings for the idiomatic variation parameters according to the PCA.
Variable            1       2       3       4       5       6       7       8
SF              0.901  −0.087   0.069     0.2   0.098  −0.009  −0.106   0.002
MF Person       0.434   0.326  −0.394    0.15  −0.022   0.328   0.277   0.241
MF NumV        −0.082   0.943   0.077   0.081   0.006   0.107   0.019  −0.069
MF Tense        0.173   0.388  −0.063  −0.136   0.007    0.47    0.46  −0.268
MF Aspect        0.31   0.158   0.259   0.128   0.142   −0.29  −0.543   0.391
MF Mood          0.04    0.95   0.098   0.008    0.08   0.032   0.029  −0.026
MF Voice        0.894    0.03   0.074   0.218   0.121  −0.092  −0.064   0.079
MF Neg          0.064  −0.104  −0.032   0.016  −0.055  −0.035  −0.001   0.939
MF Det          0.625   0.046   0.157  −0.187  −0.196   0.304  −0.219  −0.055
MF NumNP       −0.129   0.431  −0.224   0.398  −0.103   0.486   0.073   0.003
MF Gerund      −0.012   0.057  −0.012  −0.008   0.907   0.084   0.005  −0.071
LF Addition       0.2   0.009   0.739    0.09  −0.367   0.175   −0.04   −0.01
LF AttrAdj      0.137  −0.078  −0.176   0.118  −0.083   0.201  −0.795   −0.09
LF AttrNP       0.611  −0.071   0.146   0.377  −0.274    0.08   0.116    0.11
LF PP           0.152   0.147   0.172   0.049   0.143   0.836  −0.152  −0.042
LF RelCl       −0.145  −0.152   0.064  −0.583   −0.01   0.525   0.026   0.009
LF NoAdv        0.103   0.151   0.926   0.059    0.12   −0.01   0.062  −0.043
LF KindAdv      0.067  −0.005   0.648   0.195   0.548   0.036   0.136    0.14
Comp           −0.128  −0.026  −0.082  −0.925  −0.121   0.013   0.081  −0.054
CorpFreq        −0.29  −0.026  −0.182  −0.801   0.074  −0.119   0.136   0.003

According to Table 5.2, the most important principal component 1 (important here in the sense that it accounts for more variance than any other) comprises the idiomatic variation parameters tree-syntactic flexibility and the morphological flexibility parameter Voice. In other words, if one wants to describe the overall distribution of the V NP-constructions in the present data sample, these two parameters are most informative. Also, the morphological flexibility parameters Det and Person, as well as the lexico-syntactic flexibility parameter AttrNP, have their highest loadings on this principal component. As already suggested by a comparison of the average values for the flexibility parameters, tree-syntactic flexibility and Voice correlate so highly as to form a common principal component. This could be expected since the parameter level-specific results for tree-syntactic flexibility already indicated that it is the parameter level ‘declarative passive’ which contributes most to the overall relevance of this parameter. This also means that we have to interpret this result with caution since one could argue that the PCA only grouped these two parameters together because, in large part, they are a form of double coding (in the sense that both code the same information). Accordingly, the fact that they correlate so highly as to form a principal component does not mean that different variables are compressed into one component.
The second most important component according to the PCA comprises the morphological flexibility parameters NumV and Mood; the third component comprises two lexico-syntactic flexibility parameters, namely Addition and NoAdv. Note that the component loading for LF KindAdv, 0.648, only marginally misses the threshold value of 0.7, so one could consider it part of this principal component as well. The fourth component associates compositionality and corpus frequency. The correlation between these two parameters is actually higher than any correlation of either parameter with any other. Components 5, 6, 7 and 8 each comprise only a single variation parameter, in order of decreasing importance: MF Gerund, LF PP (MF Tense and MF NumNP load highest on this component, too), LF AttrAdj (MF Aspect has its highest loading on this component) and MF Neg.
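The steps described in this section (extracting components, retaining those with eigenvalues above 1, and reading off which variables load above 0.7) can be illustrated with a small numerical sketch. It rests on several assumptions that are not stated in the text: the PCA is computed on the correlation matrix, no rotation is applied, and the data are synthetic placeholders rather than the values of the present study. The function name `pca_loadings` is hypothetical.

```python
import numpy as np

def pca_loadings(data):
    """PCA on the correlation matrix (assumption: unrotated solution).

    data: array of shape (n_items, n_variables).
    Returns all eigenvalues (descending) and the loadings of the
    components whose eigenvalue exceeds 1.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                      # assumed retention criterion
    # A loading is the eigenvector entry scaled by sqrt(eigenvalue), i.e.
    # the correlation between a variable and a component.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings

# Synthetic placeholder data: three variables driven by one latent
# factor and two variables driven by another, plus noise.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 300))
noise = rng.normal(scale=0.3, size=(300, 5))
data = np.column_stack([f1, f1, f1, f2, f2]) + noise

eigvals, loadings = pca_loadings(data)
explained_pct = 100 * eigvals / eigvals.sum()  # per cent of total variance
constitutive = np.abs(loadings) > 0.7          # variables forming a component
print(np.round(explained_pct, 1))
print(constitutive)
```

With real data, the Boolean matrix `constitutive` corresponds to the loadings that would be set in bold in a table like Table 5.2; whether the loadings reported there stem from a rotated solution cannot be determined from the text.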
Conclusion
With regard to the idiomatic variation parameters that serve best to describe the overall distributional behaviour of the V NP-constructions in the present data sample, the results of the PCA suggest that tree-syntactic flexibility (which manifests itself in a corresponding morphological marking at the level of Voice) is the most important parameter, followed by aspects of morphological flexibility that concern the number and mood of the verb involved in the V NP-construction, in turn followed by lexico-syntactic aspects of adverbial modification. Compositionality and corpus frequency are also of considerable relevance, yet compared to the other parameters, they capture less of the overall variance observable in the data. Taken together, the multifactorial corpus results by and large identify exactly those parameters as key characteristics of V NP-constructions that were already argued to have a considerable impact in monofactorial studies (both corpus-linguistic and psycholinguistic ones). For one, the idea that tree-syntactic flexibility or, to be more precise, passivizability (as the closer analysis of the parameter level-specific results suggested) is a key characteristic of V NP-constructions finds support in the PCA. The same holds for the acknowledged connection between the possibility of adverbial modification and idiomatic variation. Although compositionality and corpus frequency turn out not to be the most decisive variation parameters from a purely corpus-linguistic point of view, the centrality assigned to both parameters throughout the literature is also captured by the multifactorial analysis. Rather unheard of, in contrast, is the prominence of the morphological flexibility parameters NumV and Mood.
Besides distinguishing relevant from irrelevant idiomatic variation parameters, the PCA goes beyond the results of previous studies by providing a quantitative assessment of the relative importance of each variation parameter. This quantitative and comprehensive perspective on the principal components reveals that the relevance of tree-syntactic flexibility can hardly be overrated. Moreover, the results are at odds with the widely held claim that compositionality is the most important variation parameter. It has been shown that speakers are sensitive to differences in compositionality, but the explanatory power of this parameter with respect to the idiomatic variation continuum is limited.
However, this result has to be interpreted with caution since it cannot be ruled out that the rather weak connection between compositionality and the overall distribution of the data stems from a suboptimal corpus-linguistic definition of compositionality. Interestingly enough, the corpus frequency of the V NP-constructions does not figure among the top three principal components, either. That is, the information retrievable from the distribution of V NP-constructions in performance data can hardly be reduced to absolute token frequencies. Once more fine-grained classification schemes such as those provided here are taken into account, a much more complex picture emerges.
Note
1. Actually, three PCAs were computed: one in which the Barkema-based flexibility values were entered, which is the one reported here; one in which the entropy-based flexibility values were entered instead; and a third in which all flexibility factors were entered with both the Barkema-based and the entropy-based flexibility values. The choice of which PCA to consider was based on the number of components extracted and the amount of variance that could be explained by these components. The first alternative, with the Barkema-based results figuring in the flexibility factors, proved to be by far the best with regard to both criteria, so this is the one reported here. Correspondingly, two multiple regressions of the idiomatic variation parameters and the idiomaticity judgements were computed. Again, it was the version with the Barkema-based values which fared much better from a statistical perspective. This also entails that the results of the PCA reported here and those of the multiple regression analysis can be related to each other without reservations.
Chapter 6
The idiomaticity continuum
Introduction
The final section is devoted to the following questions: what reflects idiomaticity, that is, which of the idiomatic variation parameters do speakers rely on? And to what extent do they rely on these parameters when they judge the overall idiomaticity of a V NP-construction? As outlined in the previous section, the picture emerging from a multifactorial analysis of the corpus-linguistically defined idiomatic variation parameters is that both the constructions’ average behaviour on these parameters and the spread of the values of the individual V NP-constructions on each of these idiomatic variation parameters can reasonably be argued to reflect the probabilistic influence of some parameters. In order to approach these questions, the results of a multiple regression analysis are presented, which compares the values produced by the individual idiomatic variation parameters with the average idiomaticity values from speakers’ overall idiomaticity judgements. The section concludes with a summary of the results thereby obtained, and discusses the conclusions that have to be drawn from the present study as to the nature of idiomatic variation in performance data, the nature of idiomaticity as a psychological construct and the relation between the two.
6.1 Method: Multiple regression analysis
In order to establish whether, and which, idiomatic variation parameters figure in speakers’ concept of idiomaticity, a so-called Multiple Regression Analysis (henceforth MRA) was computed. The defining characteristic of an MRA compared to a simple bivariate correlation is that two or more independent variables enter into the computation. In the context of the present study, the independent variables are the values of the idiomatic variation parameters for each V NP-construction; the dependent variable is the corresponding average normed idiomaticity value of the V NP-construction as it can be derived from speakers’ idiomaticity judgements. The MRA produces exactly the pieces of information that we are interested in at this point. First, it provides a correlation value between the independent variables in toto and the dependent variable. This overall correlation value shows if and to what extent the idiomaticity judgements are actually correlated with the usage-based variation parameters. Secondly, the MRA provides so-called beta weights for each independent variable. These beta weights quantify the contribution of each individual independent variable to the overall correlation between the usage-based variation parameters and the idiomaticity judgements. In other words, provided that a general connection between the independent variables and the dependent variable is confirmed by the overall correlation value, the beta weights allow us to answer the more specific question of how important the different idiomatic variation parameters are for the overall idiomaticity judgement in relation to each other, and we can precisely quantify their importance.
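The two pieces of information an MRA provides, the overall correlation value and the beta weights, can be sketched as follows. This is a minimal illustration under stated assumptions: beta weights are taken to be the coefficients of an ordinary least-squares regression on standardized variables, the function name `standardized_regression` is hypothetical, and the data are synthetic placeholders, not the values of the present study.

```python
import numpy as np

def standardized_regression(X, y):
    """Return (beta weights, R^2) for a multiple regression of y on X.

    X: array (n_items, n_predictors), e.g. variation parameter values.
    y: array (n_items,), e.g. average idiomaticity judgements.
    """
    # z-standardize predictors and outcome so that the coefficients are
    # comparable across predictors (assumed meaning of 'beta weight').
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept is needed because all standardized variables have mean 0.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ beta
    r2 = 1.0 - (residuals @ residuals) / (yz @ yz)
    return beta, r2

# Synthetic placeholder data: 40 items, 3 predictors of unequal importance.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=40)

beta, r2 = standardized_regression(X, y)
ranking = np.argsort(-np.abs(beta))  # predictors ordered by absolute beta weight
print(np.round(beta, 3), round(r2, 3), ranking)
```

The ranking by absolute beta weight is what allows the predictors to be compared in terms of their relative importance.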
6.2 Assessing the relevance of idiomatic variation parameters for idiomaticity judgements
An MRA including all idiomatic variation parameters yields a highly significant correlation (R² = 0.794, p = 0.005**). Taking all the idiomatic variation parameters into account, nearly 80 per cent of the variance in the average idiomaticity judgements is accounted for. While the adjusted R² value, which is the more accurate one to report, amounts only to 0.565, it has to be borne in mind that this value is lowered by the overall number of variables entering into the computation: the more variables are required to account for the variance in the data, the lower the adjusted R² will be.1 In sum, the overall correlation testifies to a solid relationship between the idiomatic variation parameters and the idiomaticity judgements. So what are the variation parameters that contribute most to this overall correlation? Consider Table 6.1, which provides the (absolute) beta weights for all idiomatic variation parameters. The closer the beta weight of an idiomatic variation parameter is to 1, the more important the parameter is. Generally, beta weights of 0.22 or higher (in absolute terms) can be considered relevant because this value indicates that the parameter accounts for about 5 per cent of the variance. As Table 6.1 shows, the most important variation parameters are the morphological flexibility parameters NumV and Mood. They are followed in rank by two lexico-syntactic flexibility parameters, KindAdv and NoAdv. Next in line are compositionality and tree-syntactic flexibility. The morphological flexibility parameters Voice and Neg also yield sufficiently high beta weights to be considered relevant. The last variation parameter with a value higher than 0.22 is the lexico-syntactic flexibility parameter Addition (0.265). Corpus frequency (CorpFreq) yields a beta weight of only 0.209.
Given the difference between Addition and CorpFreq, it seems that drawing the line at 0.22 does not impose an artificial cut-off point, but coincides well with the actual results.

Table 6.1 Beta weights for variation parameters as determined by a multiple regression of corpus and judgement data.
Variation parameter   Beta weight
MF NumV               0.757
MF Mood               0.695
LF KindAdv            0.651
LF NoAdv              0.632
Comp                  0.578
SF                    0.573
MF Voice              0.351
MF Neg                0.275
LF Addition           0.265
CorpFreq              0.209
MF Person             0.197
MF Gerund             0.16
MF Tense              0.125
MF NumNP              0.109
LF AttrNP             0.083
MF Det                0.055
MF Aspect             0.046
LF PP                 0.043
LF RelCl              0.038
LF AttrAdj            0.032

If we relate the results of the PCA, i.e. our model of idiomatic variation clusters in performance data, to those of the MRA, we find that there is an extraordinary fit between the two. If we consider the top nine parameters speakers rely on according to the MRA and check where they occur in the PCA, it becomes obvious that all of these parameters are exactly those which form the most important principal components in the PCA. NumV and Mood form one principal component; the lexico-syntactic flexibility parameters form another; tree-syntactic flexibility and Voice form yet another component. According to the MRA, speakers furthermore rely on the morphological flexibility parameter Neg, which is one of the parameters which constitute a principal component of their own. And although corpus frequency, which forms one factor in the PCA model alongside compositionality, is not assigned a sufficiently high beta weight in the MRA to be considered statistically relevant, it nevertheless is the parameter closest to compositionality, and closest to the threshold level of 0.22. Not only do the parameters that correlate highest with the overall idiomaticity judgements coincide with the principal components; what is more, they single out those parameters which form the most important principal components (the only exception here is Neg, which, according to the PCA, is only the eighth most important component). NumV and Mood form the second component, the adverbial flexibility parameters the third, and tree-syntactic flexibility and Voice are the parameters comprising component 1. At the same time, none of the parameters which the PCA did not identify as belonging to any of these components is present among those parameters that the MRA singled out as being relevant for speakers’ idiomaticity judgements. Beyond that, the ordering of the idiomatic variation parameters in the MRA according to their beta weights even results in those parameters being closest which also form common principal components according to the PCA. It would also have been possible for the same parameters to be identified as important by the MRA, but with, say, NumV as the most important one in terms of its beta weight and Mood assigned a high beta weight, but not the second highest one. Even for those parameters which are not particularly important, we find that morphological flexibility parameters relating to the verb-slot on the whole rank higher than those flexibility parameters related to the noun phrase-slot of V NP-constructions. Paired with the absence of noun-related flexibility parameters among those parameters with a beta weight higher than 0.22, the results strongly point towards the verb-slot being the component part on which speakers found their judgement of the idiomaticity of the construction.
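The threshold reasoning applied to Table 6.1 can be retraced in a few lines. The sketch below uses the beta weights reported in the table; reading the squared threshold as the share of variance a parameter accounts for follows the text and is an approximation that holds strictly only for uncorrelated predictors. The variable names are illustrative.

```python
# Absolute beta weights as reported in Table 6.1.
beta_weights = {
    "MF NumV": 0.757, "MF Mood": 0.695, "LF KindAdv": 0.651,
    "LF NoAdv": 0.632, "Comp": 0.578, "SF": 0.573, "MF Voice": 0.351,
    "MF Neg": 0.275, "LF Addition": 0.265, "CorpFreq": 0.209,
    "MF Person": 0.197, "MF Gerund": 0.16, "MF Tense": 0.125,
    "MF NumNP": 0.109, "LF AttrNP": 0.083, "MF Det": 0.055,
    "MF Aspect": 0.046, "LF PP": 0.043, "LF RelCl": 0.038,
    "LF AttrAdj": 0.032,
}

THRESHOLD = 0.22  # cut-off used in the text

# Squared threshold: roughly the share of variance attributed to a
# parameter at the cut-off (about 5 per cent).
variance_share = THRESHOLD ** 2

# Parameters at or above the threshold, in table order.
relevant = [name for name, b in beta_weights.items() if abs(b) >= THRESHOLD]

# Distance between the last relevant parameter and the first one below.
gap = beta_weights["LF Addition"] - beta_weights["CorpFreq"]

print(round(variance_share, 3))
print(relevant)
print(round(gap, 3))
```

The list `relevant` contains the nine parameters discussed above, and `gap` is the difference between Addition and CorpFreq that motivates drawing the line at 0.22.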
Conclusion
So how do speakers process this complex picture? Which pieces of information are relevant to them when they decide on the overall idiomaticity of a V NP-construction? The corpus-linguistic results do not provide a definitive answer to this question. However, they suggest different possibilities, depending on how exactly speakers extract information from their linguistic environment. Speakers could base their idiomaticity judgement on the perceived peculiarity of the V NP-construction in the sense that they compare its distributional characteristics with those of V NP-constructions in general. If speakers adopted this strategy, it would be very likely that tree-syntactic flexibility plays a major role in their assessment, because this is the variation parameter with the strongest discrepancy between V NP-constructions and baseline. In fact, formal flexibility has often been defined as the discrepancy between potential and observed flexibility, which implies exactly this kind of strategy.
Predictions change drastically if there is a particular flexibility threshold value that has to be exceeded for an idiomatic variation parameter to be taken into account at all. It is plausible to assume that a variation parameter like, say, LF PP will play only a minor role in assessing idiomaticity because the information it provides is very limited, at least with respect to V NP-constructions. They simply rarely if ever take prepositional phrases. While this tendency is even stronger for the V NP-constructions of the data sample than in the baseline sample, which could at least reveal to a speaker that the V NP-construction in question is relatively idiomatic, it is conceivable that this information is not considered at all because exposure to constructions with prepositional phrases is too limited in general (compared with other idiomatic variation parameters and their parameter levels). Consequently, whether or not a construction is accompanied by a prepositional phrase may not be a salient criterion. Similarly, predictions on the potential impact of tree-syntactic flexibility would change drastically since the overall flexibility value is low. In contrast, parameters measuring aspects of adverbial lexico-syntactic flexibility and morphological flexibility of the verb should be the most important variation parameters since they yield the highest flexibility values. This strategy would also be compatible with results which suggest that the discrepancy between the V NP-constructions’ values and the corresponding baseline values is only of minor importance (if at all). Rather, it would be sufficient frequency of exposure that triggers awareness of a certain parameter.
A third possible strategy for speakers could be to base their judgements primarily on those idiomatic variation parameters on which the different constructions diversify most. This strategy could help to facilitate comparative idiomaticity judgements. If that were the case, those idiomatic variation parameters with the highest standard deviations from the average flexibility values should occur among the top parameters correlating with the judgements, such as MF Gerund, LF KindAdv or MF NumNP (cf. Table 4.15).
Yet another strategy that could dominate is one according to linguistic levels such that the resulting ranking of the idiomatic variation parameters groups those parameters belonging to the same linguistic level together (that is, syntactic, morphological and lexico-syntactic flexibility parameters, with compositionality and corpus frequency somewhere in between).
The extraordinary overlap between the principal components as identified by the PCA and the parameters singled out by the MRA strongly suggests that speakers’ primary strategy is to concentrate on those distributional parameters which have the highest information value in terms of how much variance of the overall distribution they potentially cover. This becomes most evident with regard to the morphological flexibility parameters NumV and Mood obtaining the top ranks, a result that ties in perfectly well with the results of the PCA. At the same time, it stands in opposition to the widely held assumption that compositionality is the most decisive variation parameter contributing to idiomaticity, which underscores the usefulness of the bottom-up approach adopted here. Also, it is the verb-slot rather than the noun phrase-slot of the V NP-construction that figures in the judgements. The extent to which V NP-constructions differ from their baseline, in contrast, does not appear to matter much. Apart from tree-syntactic flexibility and Voice, where this discrepancy is high, most parameters do not exhibit such a negative discrepancy. In fact, for the most important parameters, Mood and NumV, we saw that the average flexibility values even exceed those of the baseline. The results are compatible with the hypothesis that the frequency of exposure to all parameter levels involved plays a major role. Most of the parameters which neither the PCA nor the MRA consider particularly relevant have low average V NP-construction values and low baseline values, e.g. AttrNP (average V NP-construction value = 0.032; baseline value = 0.147), RelCl (0.033/0.149), PP (0.111/0.13) and AttrAdj (0.208/0.227). Tree-syntactic flexibility, Voice and Neg, on the other hand, also yield only low average V NP-construction values, but their baseline values are among the highest. This could explain how these parameters gain salience for speakers’ judgements (cf. Table 4.15).
Note
1. How sensitive the adjusted R² value is to the number of variables is easily demonstrated by including only those variables which the first MRA identified as important (according to their beta weights): considering only 9 instead of 20 idiomatic variation parameters, the adjusted R² amounts to 0.625, a much better result.
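The penalty that the number of variables imposes on the adjusted R² can be made explicit with the standard correction formula. The sketch below is an illustration under a loudly stated assumption: the number of V NP-constructions entering the regression is not given in this chapter, and the value of 39 used here is merely a sample size that happens to reproduce the reported adjusted R² of 0.565 from R² = 0.794 with 20 predictors. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n_items, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_items - 1) / (n_items - n_predictors - 1)

# ASSUMPTION: 39 items; not stated in this chapter, chosen because it
# reproduces the reported adjusted value.
N_ITEMS = 39

full_model = adjusted_r2(0.794, N_ITEMS, 20)
print(round(full_model, 3))

# Hypothetical comparison: the same unadjusted R^2 with only 9 predictors.
# The unadjusted R^2 of the reduced model is not reported, so this line
# only illustrates the size of the penalty, not the actual reduced model.
same_fit_fewer_predictors = adjusted_r2(0.794, N_ITEMS, 9)
print(round(same_fit_fewer_predictors, 3))
```

The comparison shows that, with the fit held constant, removing predictors raises the adjusted value considerably, which is the sensitivity the note refers to.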
Chapter 7
Towards a new model of idiomaticity
The major conclusion that can be drawn from the present study as far as methodology is concerned is that the quantitative corpus-linguistic approach adopted here makes it possible to model idiomaticity in a comprehensive and adequate manner. The methodologies employed live up to general theoretical assumptions about the cognitive processes involved. The multifactorial statistics, in particular, reveal systematic patterns that could not be uncovered otherwise. Therefore, the study goes beyond merely identifying potential variation parameters and providing evidence in favour of their likely association with idiomaticity. Instead, it presents a multifactorial approach and subjects all the variation parameters at once to the scrutiny of statistical tests. The relative influence of each variation parameter on the overall meta-concept idiomaticity is assessed quantitatively. In doing so, this study is one of the first to do justice to the assumption that cognitive processes in general, and consequently intuition-based phenomena like idiomaticity in particular, are complex in nature.
Moreover, the present study distinguishes itself from previous studies in which more than one idiomatic variation parameter was discussed in that all the operationalizations presented here are data-driven. An attempt is made to measure the different variation parameters without reference to any given classification scheme which is not itself empirically founded, such as different classes of compositionality. Instead of choosing operationalizations which influence or even determine the outcome of the hypothesis-testing process, the present operationalizations have been developed in a bottom-up fashion and, thus, allow for a maximally objective and comprehensive analysis.
Thirdly, the corpus-linguistic definitions of the variation parameters are in line with general theoretical assumptions of usage-based Construction Grammar. All operationalizations allow for the existence of scalar categories, differences in the cognitive entrenchment of variation parameters and their parameter levels, and an active interplay between lexical and constructional semantics, all of which determine the overall behaviour of idiomatic phrases. Given the match between usage-based approaches to language and quantitative corpus-linguistics, the measures developed and extended here are not restricted to the analysis of V NP-patterns or VPCs. They can be applied to any kind of complex construction.
Moreover, many of the methodologies presented here may prove useful for questions that only remotely relate to idiomaticity. The basic concept of entropy has been applied to various linguistic questions already;1 the expansion I introduced in Section 4.2.4, directional entropy, may be particularly useful when one wants to assess not only the degree of variation in some data sample with respect to different variable levels, but also a qualitative difference associated with the predominant presence of one of these variable levels as opposed to another. Similarly, the expansion of Barkema’s flexibility measure was found potentially useful for a quantitative assessment of the degree of schematization of particular constructional slots. With respect to the compositionality measure presented here, Doug Biber (p.c.) suggested that the part of the extended R-value which measures the semantic contributions made by individual words could alternatively be employed to assess the degree of semantic bleaching of verbs, which could be interesting with regard to the investigation of incipient grammaticalization processes, particularly with regard to the question to what extent bleaching is context-dependent.
With respect to the theoretical implications of the present study, Section 1 delineated the primary theoretical question: what reflects idiomaticity from a usage-based perspective? The present study is the first to present an approach that comprises both semantic and syntactic variation and assesses the relative importance of each variable. The results tie in very well with many widely established claims about idiomaticity, with tree-syntactic flexibility, particularly passivizability, as a key characteristic to describe the distribution of V NP-constructions. Likewise, the central role of compositionality is reproduced by the PCA.
However, the multifactorial perspective reveals that when being considered in toto, other variation parameters turn out to be even more important, namely aspects of morphological and lexico-syntactic flexibility. The analysis particularly emphasizes the importance of morphological flexibility in terms of the number and the mood of the verb, a fact that stands out because, to my knowledge, no previous study of idiomaticity claimed that these variables are of relevance. Indeed, morphological flexibility has not been discussed at all in the context of the idiomaticity of V NP-constructions. One could argue that this result is a statistical artefact such that morphological features will always account for a considerable amount of variance in the data. However, from that it does not necessarily follow that they are not of any psychological relevance with regard to perceived idiomaticity. Neither would this objection explain why it is the number and the mood of the verb that rank so high in the PCA, while other morphological factors like those measuring variation with respect to person or tense do not. Thirdly, according to Newman and Rice’s (2005) Inflectional Island Hypothesis, verbs can be strongly associated with particular inflected forms, just like lemmas can exhibit strong preferences for a restricted set of argument structures and lexical meanings. They also note that verbs with extremely biased inflectional profiles are particularly susceptible to grammaticalization – which may be an explanation for speakers’ (subconscious) sensitivity towards morphological parameters when judging the idiomaticity of V NP-constructions.
Consequently, this prominence of MF NumV and MF Mood in the PCA is a testing case for a performance-based approach. If speakers are really sensitive to their input and base their judgements on those variation parameters that cover most of the variance in the input, we would expect MF NumV and MF Mood to rank high in the multiple regression analysis, which they do. This result, combined with the overall match between the principal components identified by the PCA and the ranking and importance of these components in the multiple regression analysis as outlined in Chapter 6, provides strong support for a performance-based approach in which speakers’ primary strategy is to focus on those variation parameters which explain the distribution of the data best. The fact that speakers are obviously sensitive to correlational clusters of variation parameters as modelled by the PCA particularly supports a usage-based approach that assumes only little grammatical hardware. Speakers do not apply given grammatical concepts or categories to the data in order to make sense of them; rather, they build their categories in a bottom-up fashion, the primary task being to cover as much variance in the input as possible. The results also suggest that speakers are paying attention to the lexico-syntactic flexibility of V NP-constructions, particularly the flexibility of the verb slot. While previous studies have pointed towards the association between lexico-syntactic flexibility and idiomaticity, the high ranking of these parameters in the correlation of the corpus-based data with the idiomaticity judgements ascribes to them a much more prominent role.
As Elizabeth Traugott pointed out (p.c.), speakers’ focus on aspects concerning the adverbial modification potential of the V NP-constructions when judging their overall idiomaticity makes a lot of sense from the perspective of grammaticalization processes: Adverbial flexibility is one of the first properties phrasal constructions tend to lose during this process. In sum, speakers seem to rely on a variety of parameters when judging the overall idiomaticity of V NP-construction, with a particular focus on the morphological and lexico-syntactic variability of the verb, and tree-syntactic and semantic features of the phrase playing an important, yet secondary role. The solid correlations between the corpus-based definitions of the variation parameters and speakers’ idiomaticity judgements lend further credence to a performancebased approach to language. With respect to the theoretical implications of these findings, the major question is: what should a model of idiomaticity look like? From a constructionist perspective, this question can be reformulated as follows: How and where is the information about the idiomatic variation of a construction stored in the constructicon? As outlined in Section 1.2.5, descriptions of the constructicon mainly focus on the gradience regarding the lexical specification and structural complexity of constructions. This creates a continuum of constructions in which idiomatic expressions of the kind analysed here are located somewhere in the middle. In other words, the constructicon is mainly specified with regard to what I would
like to call the vertical axis of the constructicon. The primary process creating diversification along the vertical axis beyond the level of lexical constructions is delexicalization (or schematization). Complex constructions create a continuum across the vertical axis with regard to how open their slots are for lexical substitutions and how many of the slots of the phrasal construction are lexically specified. More or less idiomatic constructions, however, do not spread across this vertical axis. Idiomaticity is a property that mainly characterizes fully lexically specified or mostly lexically specified constructions, that is, constructions which are only minimally schematized. In many respects, idiomatization is diametrically opposed to delexicalization: the more idiomatic a construction is, that is, the more formally fixed and semantically non-compositional, the less likely it is that this construction delexicalizes. In other words, on a continuum of idiomatic phrases ranging from collocations to idioms, the more idiomatic the phrase, the less delexicalization potential it has. Goldberg’s revised definition of the notion of construction, in which conventionalization is the only necessary condition for a phrase to qualify as a construction, and in which non-compositionality is downgraded to a sufficient criterion for construction-hood, is compatible with the use of the term idiom to cover all phrases on a collocation-idiom continuum. However, this revision has not yet found its way into a schematic outline of the constructicon. The present results confirm the hypothesis that idiomaticity cannot be reduced to non-compositionality but that it is a complex meta-concept comprising semantic and formal information. Beyond that, an adequate model of idiomaticity must license differences in the weightings of the different parameters and/or parameter levels contributing to overall idiomaticity. 
Also, the model has to be able to accommodate the fact that there is no stable correlation between the different variation parameters. In other words, the model must be able to handle probabilistic information. It has to represent what kind of variation is possible, and how likely it is that this kind of variation is instantiated by any given attestation of a particular V NP-construction. With regard to the latter point, the first conclusion to be drawn is that idiomaticity has to be represented construction-specifically, such that, say, tree-syntactic flexibility is not represented at the level of the abstract V NP-construction, but individually for each lexically specified construction (i.e. make X point, take X piss, make X headway, etc.). Likewise, representing idiomaticity in the form of one overall index value is not feasible because this would conflict with the fact that speakers can alternatively judge constructions according to individual variation parameters, such as compositionality or lexico-syntactic flexibility. Accordingly, this information has to be retrievable from the particular construction’s representation rather than from the overall idiomaticity value. The latter could be created ad hoc on the basis of the information stored for the individual variation parameters. This hypothesis also accommodates the assumption that idiomaticity judgements are most likely context-dependent to some extent.
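The requirements just listed can be made concrete with a minimal sketch. It assumes that each lexically specified construction stores one value per variation parameter and that an overall idiomaticity value is only derived on demand. The class name ConstructionEntry, the weighted mean and the weights are illustrative assumptions and not part of the study; the parameter values are NSSD figures taken from Tables C.1, C.10 and C.13 in Appendix C.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConstructionEntry:
    """Hypothetical item-specific record: one stored value per variation parameter."""
    name: str
    parameters: Dict[str, float] = field(default_factory=dict)

    def overall_idiomaticity(self, weights: Dict[str, float]) -> float:
        """Derive an ad hoc overall value as a weighted mean of the stored parameters."""
        shared = [p for p in weights if p in self.parameters]
        total_weight = sum(weights[p] for p in shared)
        if total_weight == 0:
            raise ValueError("no weighted parameter is stored for this construction")
        return sum(weights[p] * self.parameters[p] for p in shared) / total_weight


# NSSD values copied from Tables C.1 (SF), C.10 (NumV) and C.13 (Mood).
take_plunge = ConstructionEntry("take X plunge", {"SF": 0.912, "NumV": 0.282, "Mood": 0.207})
write_letter = ConstructionEntry("write X letter", {"SF": 0.436, "NumV": 0.035, "Mood": 0.030})

# Assumed weights for illustration only; they are not the regression weights of Chapter 6.
weights = {"SF": 1.0, "NumV": 2.0, "Mood": 2.0}

for entry in (take_plunge, write_letter):
    # The individual parameters stay retrievable; the overall value is computed when needed.
    print(entry.name, entry.parameters, round(entry.overall_idiomaticity(weights), 3))
```

Because the weights are passed in at query time, the same stored entry can yield different overall values in different contexts, which is one way of picturing the assumed context-dependence of idiomaticity judgements.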
[Figure 7.1: schematic diagram. Vertical axis, from low SCHEMATIZATION to high SCHEMATIZATION: morphemes; words (write, take, letter, plunge, …); complex words; complex constructions; V letter, write NP; passive construction. Horizontal axis, branching off at the level of complex constructions, from low IDIOMATICITY to high IDIOMATICITY, with write a letter, take the plunge and burn with ambition as labelled examples.]

Figure 7.1 Extended representation of the constructicon.
In order to integrate all these different kinds of information about the idiomaticity of a construction into the constructicon, I propose to extend the constructicon by adding another dimension that I will refer to as the horizontal axis. This horizontal axis cuts across the small range of the vertical axis where fully lexically specified complex constructions are located. More precisely, one can think of the constructicon as bifurcating at the level of complex constructions. Beyond that level, constructions are either increasingly delexicalized, according to which they can be positioned on the vertical axis; or they are increasingly idiomatized, according to which they can be positioned on the horizontal axis. The closer a phrasal construction is located on the horizontal axis to the vertical axis, the more compositional and the less formally constrained it is (e.g. write X letter). The more formally frozen and semantically opaque a construction is (such as take X plunge), the further away from the vertical axis that construction is positioned on the horizontal axis. A schematic representation is given in Figure 7.1.

The horizontal axis itself has multiple layers, one representing each variation parameter. These layers form clusters that, on a level of coarse granularity, can be likened to the principal components identified by the PCA. On a finer level of granularity, they can be likened to variables. For instance, one layer represents information about aspects of morphological flexibility in terms of the number of the verb, another that of the mood of the verb. Since they correlate highly, they form a cluster. Each V NP-construction is represented once on each of these layers (or clusters of layers), that is, it is assigned a value on the morphological flexibility layers, the lexico-syntactic layers, the compositionality layer, etc.
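To illustrate how two layers might come to form a cluster, the following sketch computes a Pearson correlation between the NumV and Mood layers for a small, hand-picked subset of constructions, using NSSD values from Tables C.10 and C.13. It is not a replication of the PCA; the subset, the function name pearson and the use of 1 − r as a distance between layers are assumptions made for the sake of the example.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# (NumV, Mood) NSSD values copied from Tables C.10 and C.13; the subset is illustrative.
layers = {
    "cross finger": (1.000, 1.000),
    "follow suit": (0.403, 0.300),
    "make mark": (0.394, 0.369),
    "foot bill": (0.391, 0.261),
    "take plunge": (0.282, 0.207),
    "write letter": (0.035, 0.030),
    "see point": (0.005, 0.007),
    "hold breath": (0.004, 0.016),
}

num_v = [values[0] for values in layers.values()]
mood = [values[1] for values in layers.values()]

r = pearson(num_v, mood)
layer_distance = 1.0 - r  # assumed distance measure: highly correlated layers lie close together

print(f"r = {r:.3f}, distance between layers = {layer_distance:.3f}")
```

On this reading, layers whose distance falls below some threshold would be grouped into one cluster; where such a threshold lies is an open question that the sketch does not address.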
The higher the overall idiomaticity of a construction, the less its representation is connected to its constituting lexical constructions. For instance, take X plunge is both semantically highly non-compositional and formally restricted, so its overall idiomaticity is high. Accordingly, take X plunge is only weakly connected with the lexical representation of take and plunge. Write X letter, on the other hand, ranks relatively low on all idiomatic variation parameters, and so it is more strongly connected with the lexical representations of write and letter further up the vertical axis. Consequently, it is also more strongly connected with other lexical constructions that are associated with write and letter, such as type/compose or email/paper. This makes write X letter a likely candidate for substitutions and subsequent schematizations.

By adding the horizontal idiomaticity axis (or better, the idiomatic variation parameter axis) to the constructicon, the results of the present study can be represented, and the representation is also in line with established findings from previous studies. To begin with, the horizontal axis basically represents the idiom-collocation continuum, specifying the (slightly misleading) term idiom on the vertical axis. Secondly, by conceptualizing the horizontal axis as multilayered, the fact that idiomaticity is a multifactorial and scalar concept is represented. Thirdly, by assigning each construction a value on each of these layers, item-specific differences in the weightings of the different variation parameters are represented. Fourthly, the distributional similarity of the different variation parameters can be represented via the distance between the different layers. Fifthly, the construction-specific representation accommodates the fact that speakers are able to judge constructions according to their values on individual variation parameters.
Moreover, this item-specific, multilayered, and therefore storage-redundant representation avoids Langacker’s (1987) Rule-List Fallacy, and is therefore in accord with other contemporary models of linguistic representation. One example is Pierrehumbert’s (2003) exemplar theory, according to which phonetic knowledge can be regarded as the acquisition of probability distributions over a multidimensional phonetic space.

While this kind of representation allows us to integrate all these aspects suggested by the empirical results, it may also be understood to entail some conclusions for which the present study does not provide direct empirical support. First, while the results of the present study strongly reconfirm the idea of a performance-based concept of idiomaticity, it does not follow from the present study that idiomaticity is exclusively based upon performance data. Second, strictly speaking, the fact that the performance-based idiomatic variation parameters correlate highly with the idiomaticity judgements certainly points towards their relevance. However, with regard to the question of whether and to what extent these parameters are also grammatically represented, the evidence presented here is not conclusive: at least some of the parameters speakers rely on could be categories created ad hoc (Barsalou 1992). For example, is it necessary to assume that the morphological and lexico-syntactic variation parameters that figured so dominantly in speakers’ judgements are actually represented in grammar? In view of the fact that both morphological and lexico-syntactic flexibility information is hardly ever drawn upon in the (synchronic) description
of individual constructions, a plausible hypothesis could be that speakers create these categories online when asked to judge idiomaticity, and that their motivation to focus on these parameters is indeed performance-based because these parameters cover so much of the variance they are trying to systematize; yet from this it does not necessarily follow that these parameters have some grammatical status (although it is a plausible working assumption). In order to shed more light on the question of which of the idiomatic variation parameters are actually grammatically represented, a greater variety of constructions has to be considered. Moreover, further experimental studies could focus on individual variation parameters; for instance, in the questionnaire used in the present study, all V NP-constructions were presented in their most typical morphological forms, so it would be interesting to see if similarly high correlations with morphological flexibility can be observed if the morphological contexts (or those of any other variation parameter, for that matter) are controlled for and/or varied systematically.

These limitations, however, should not detract from the major findings of the present study: a seemingly complex and intuitive phenomenon like idiomaticity can be modelled on the basis of performance data, thereby providing further evidence for a mutual interplay between grammar and usage and for the relevance of studies of authentic language like the present one, as summarized by Bybee (2006: 730):

Usage feeds into the creation of grammar just as much as grammar determines the shape of usage. Actual language use cannot be omitted from the study of grammar, because it constitutes a large part of the explanation for why languages have grammar and what form that grammar takes.
More specifically, the results suggest that the constructicon must be enriched with a multi-dimensional, horizontal axis that serves to represent the information that speakers draw upon when judging the idiomaticity of a construction. In sum, the present study not only lends further credence to the basic working assumptions of performance-based theories of grammar; what is more, it joins the increasing number of recent studies (cf., among others, Arppe and Järvikivi (2007), Gries et al. (2005), Kepser and Reis (2005)) that demonstrate the vast potential that resides in combining quantitative corpus-linguistic and experimental methods for the study of language.
Note 1. For instance, Shannon (1950) is a classic study in the field of Natural Language Processing. He uses the measure to estimate the entropy of the English language, the ultimate aim being to find a key to predicting words in text without reference to human intuition, that is, by means of artificial intelligence. For a more recent application of entropy as a collocation measure, cf. Mason (2000).
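For readers unfamiliar with the measure mentioned in this note, the following minimal sketch computes the standard Shannon entropy, H = −Σ p · log2(p), of an observed distribution. The function name and the toy word list are illustrative assumptions; the sketch reproduces neither Shannon's experiments nor Mason's collocation measure.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Return the Shannon entropy (in bits) of the observed distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute the entropy of an empty sequence")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: nouns observed after the verb 'take' in an invented sample.
after_take = ["plunge", "plunge", "plunge", "course", "root", "plunge"]

# A slot filled by few, highly predictable items has low entropy;
# a slot with many equally likely fillers has high entropy.
print(round(shannon_entropy(after_take), 3))
```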
Appendices
A. Example of a questionnaire for the elicitation of perceived idiomaticity data
B. Results for compositionality
Table B.1 Compositionality values for V NP-constructions (R-value extension I).

V NP-construction   R
have X clue         0.233
have X laugh        0.305
do X trick          0.413
cross X finger      0.625
tell X story        0.649
grit X tooth        0.689
write X letter      0.720
make/pull X face    0.773
close X door        0.782
bear X fruit        0.791
foot X bill         0.820
fight X battle      0.864
fill/fit X bill     0.871
hold X breath       0.876
draw X line         0.878
carry X weight      0.889
meet X eye          0.904
break X heart       0.906
pave X way          0.911
play X game         0.914
get X act together  0.943
catch X eye         0.946
call X police       0.947
leave X mark        0.947
change X hand       0.956
take X piss         0.965
cross X mind        0.967
make X point        0.973
deliver X good      0.983
see X point         0.985
take X course       0.986
beg X question      0.989
follow X suit       0.991
make X mark         0.996
take X root         0.996
scratch X head      0.997
make X headway      1.000
break X ground      1.000
take X plunge       1.000
Table B.2 Compositionality values for VPCs (R-value extension I).

VPC         R
switch off  0.037
knock up    0.047
knock down  0.061
fill up     0.105
act up      0.154
hold up     0.183
show off    0.209
live down   0.242
take off    0.366
give up     0.760
give back   0.926
Table B.3 Compositionality values for V NP-constructions (R-value extension II).

Construction        R
make X headway      0.003
take X plunge       0.004
take X piss         0.008
make/pull X face    0.021
get X act together  0.026
pave X way          0.033
change X hand       0.051
take X course       0.058
foot X bill         0.058
see X point         0.062
leave X mark        0.074
grit X tooth        0.079
break X ground      0.079
meet X eye          0.101
make X mark         0.106
have X laugh        0.106
fill/fit X bill     0.108
have X clue         0.117
call X police       0.117
play X game         0.132
carry X weight      0.137
follow X suit       0.147
beg X question      0.150
bear X fruit        0.160
deliver X good      0.161
cross X finger      0.171
draw X line         0.174
take X root         0.185
cross X mind        0.225
hold X breath       0.232
break X heart       0.238
fight X battle      0.288
do X trick          0.340
make X point        0.359
scratch X head      0.368
close X door        0.421
catch X eye         0.432
tell X story        0.730
write X letter      0.844
Table B.4 Compositionality values for VPCs (R-value extension II).

VPC         R
act up      0.003
knock up    0.009
live down   0.013
hold up     0.033
switch off  0.038
show off    0.063
knock down  0.065
take off    0.081
fill up     0.096
give back   0.115
give up     0.416
C. Barkema-based flexibility values for tree-syntactic, lexico-syntactic, and morphological flexibility
Table C.1 Mean (normalized) sums of squared deviations from baseline ([N]SSD) for tree-syntactic flexibility (SF).

Construction             SSD   NSSD
call police        1,744.369  0.216
draw line          3,061.140  0.379
write letter       3,522.699  0.436
make point         3,740.335  0.463
tell story         4,207.359  0.520
play game          4,510.159  0.558
take course        4,910.355  0.607
cross finger       5,246.056  0.649
hold breath        5,621.978  0.695
close door         5,863.658  0.725
make face          6,678.110  0.826
catch eye          6,684.504  0.827
have laugh         6,749.436  0.835
carry weight       6,850.612  0.847
have clue          7,123.239  0.881
make headway       7,165.667  0.886
take piss          7,250.543  0.897
grit tooth         7,304.705  0.904
fight battle       7,323.799  0.906
break ground       7,344.649  0.909
take plunge        7,375.770  0.912
do trick           7,460.539  0.923
beg question       7,504.114  0.928
fit bill           7,553.738  0.934
see point          7,560.582  0.935
foot bill          7,598.400  0.940
pave way           7,602.557  0.941
leave mark         7,619.287  0.943
break heart        7,642.878  0.945
cross mind         7,740.437  0.958
deliver good       7,804.965  0.966
bear fruit         7,888.067  0.976
scratch head       7,903.738  0.978
take root          7,903.917  0.978
make mark          7,987.835  0.988
change hand        8,083.493  1.000
follow suit        8,083.493  1.000
get act together   8,083.493  1.000
meet eye           8,083.493  1.000
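The relation between the SSD and NSSD columns can be checked with a short sketch. The tabulated values in Table C.1 are consistent with dividing each SSD by the largest SSD in the table; this is an inference from the figures rather than a restatement of the procedure, and the function name normalize_ssd is an assumption. How the SSD values themselves are derived from the baseline is not shown here.

```python
def normalize_ssd(ssd_by_construction):
    """Scale SSD values to the range 0..1 by dividing by the largest SSD (assumed procedure)."""
    largest = max(ssd_by_construction.values())
    if largest <= 0:
        raise ValueError("the largest SSD must be positive")
    return {name: value / largest for name, value in ssd_by_construction.items()}


# SSD values copied from Table C.1 (tree-syntactic flexibility).
ssd = {
    "call police": 1744.369,
    "draw line": 3061.140,
    "write letter": 3522.699,
    "meet eye": 8083.493,
}

nssd = normalize_ssd(ssd)
for name, value in nssd.items():
    print(f"{name:<14}{value:.3f}")
```

Under this assumption the construction with the largest deviation from the baseline always receives the value 1.000, which matches the final rows of each table in this appendix.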
Table C.2 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter Addition.

Construction             SSD   NSSD
carry weight            .531   .000
make headway            .848   .000
write letter           4.742   .000
draw line             14.817   .001
leave mark            26.412   .002
tell story            32.941   .003
beg question          43.235   .004
fight battle          47.608   .004
make mark            148.060   .014
take piss            171.057   .016
see point            248.679   .023
close door           474.876   .044
change hand          500.345   .046
break ground         722.011   .067
pave way             780.948   .072
take course          881.386   .081
play game            930.103   .086
foot bill           1479.953   .137
take root           1488.801   .138
have laugh          1507.584   .139
deliver good        1542.563   .142
make point          1781.370   .165
cross finger        1819.584   .168
cross mind          2002.046   .185
make face           2378.101   .220
bear fruit          2946.813   .272
have clue           3038.701   .281
call police         3071.461   .284
catch eye           3635.068   .336
take plunge         3693.590   .341
fit bill            3747.909   .346
meet eye            3850.848   .356
break heart         4228.795   .391
grit tooth          5419.876   .501
get act together    5562.072   .514
scratch head        6611.601   .611
do trick            6775.823   .626
hold breath         8524.115   .787
follow suit        10825.138  1.000
Table C.3 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter AttrAdj.

Construction             SSD   NSSD
write letter          16.884   .002
leave mark            25.765   .003
beg question          31.653   .003
tell story            33.094   .003
make face             52.814   .005
play game             55.289   .006
make headway          81.853   .008
have laugh            93.844   .009
draw line            116.715   .012
make mark            131.837   .013
take course          133.881   .013
meet eye             171.814   .017
bear fruit           333.632   .033
foot bill            338.882   .034
make point           368.541   .037
close door           610.635   .061
have clue            645.599   .064
call police          681.609   .068
break heart          732.085   .073
take plunge          772.062   .077
catch eye            826.414   .082
deliver good         901.629   .090
cross mind           954.548   .095
get act together     957.181   .095
hold breath         1007.280   .100
scratch head        1056.785   .105
fit bill            1069.477   .107
follow suit         1080.704   .108
pave way            1115.212   .111
cross finger        1150.528   .115
do trick            1150.528   .115
grit tooth          1150.528   .115
see point           1150.528   .115
take piss           1150.528   .115
take root           1150.528   .115
change hand         1150.528   .115
carry weight        1265.603   .126
fight battle        2979.100   .297
break ground       10030.292  1.000
Table C.4 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter AttrNP. Construction
SSD
NSSD
Construction
SSD
NSSD
close door draw line play game fight battle tell story foot bill call police write letter deliver good leave mark catch eye hold breath fit bill get act together have clue bear fruit beg question break ground break heart change hand
0.000 1.463 4.699 14.846 45.192 53.252 55.106 68.223 71.938 113.564 135.277 151.720 152.878 158.892 158.892 158.892 158.892 158.892 158.892 158.892
0.009 0.030 0.093 0.284 0.335 0.347 0.429 0.453 0.715 0.851 0.955 0.962 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
cross finger do trick follow suit make face make headway make mark make point meet eye pave way scratch head see point take piss take root carry weight cross mind grit tooth have laugh take course take plunge
158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892 158.892
1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Table C.5 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter PP.

Construction             SSD   NSSD
take course            7.043  0.010
tell story            29.454  0.043
see point             39.800  0.059
catch eye             86.936  0.128
beg question         154.620  0.227
make point           160.139  0.235
carry weight         172.410  0.253
play game            271.449  0.399
bear fruit           319.854  0.470
break heart          355.746  0.523
have laugh           398.756  0.586
close door           439.029  0.645
fight battle         454.617  0.668
cross mind           470.914  0.692
draw line            488.038  0.718
meet eye             495.243  0.728
get act together     517.716  0.761
do trick             529.182  0.778
fit bill             543.022  0.798
have clue            543.022  0.798
leave mark           565.988  0.832
make face            586.433  0.862
take plunge          600.808  0.883
break ground         608.994  0.895
deliver good         613.345  0.902
foot bill            618.001  0.909
call police          618.559  0.909
hold breath          650.343  0.956
change hand          662.605  0.974
take piss            662.605  0.974
cross finger         662.605  0.974
follow suit          662.605  0.974
grit tooth           662.605  0.974
make headway         662.605  0.974
make mark            662.605  0.974
pave way             662.605  0.974
scratch head         662.605  0.974
take root            662.605  0.974
write letter         680.159  1.000
Table C.6 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter RelCl.

Construction             SSD   NSSD
see point              3.607  0.011
take course            4.856  0.015
tell story             5.070  0.015
meet eye              16.063  0.048
deliver good          20.876  0.063
make point            26.038  0.079
draw line             27.186  0.082
have clue             32.106  0.097
fight battle          36.262  0.109
play game             36.385  0.110
break ground          41.366  0.125
follow suit           41.569  0.125
make headway          41.668  0.126
cross mind            42.053  0.127
make mark             46.663  0.141
close door            51.165  0.154
catch eye             51.940  0.157
beg question          52.628  0.159
hold breath           52.779  0.159
call police           56.174  0.169
bear fruit            56.174  0.169
carry weight          56.174  0.169
change hand           56.174  0.169
cross finger          56.174  0.169
do trick              56.174  0.169
foot bill             56.174  0.169
grit tooth            56.174  0.169
leave mark            56.174  0.169
make face             56.174  0.169
scratch head          56.174  0.169
take root             56.174  0.169
break heart           56.174  0.169
fit bill              56.174  0.169
get act together      56.174  0.169
have laugh            56.174  0.169
pave way              56.174  0.169
take plunge           56.174  0.169
take piss             56.174  0.169
write letter         331.554  1.000
Table C.7 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter NoAdv.

Construction             SSD   NSSD
close door            35.923  0.008
make mark             57.814  0.013
change hand           65.092  0.015
take root             93.690  0.021
fight battle          99.387  0.022
deliver good         143.861  0.032
break ground         170.084  0.038
see point            282.845  0.063
carry weight         293.731  0.066
take piss            314.446  0.070
cross mind           408.064  0.091
write letter         472.315  0.106
leave mark           557.996  0.125
make headway         579.206  0.129
tell story           582.741  0.130
draw line            591.559  0.132
fit bill             705.631  0.158
call police          723.581  0.162
have clue            787.939  0.176
take plunge          817.825  0.183
play game           1015.438  0.227
beg question        1097.701  0.245
make point          1142.269  0.255
grit tooth          1209.311  0.270
have laugh          1229.508  0.275
cross finger        1251.182  0.280
break heart         1341.098  0.300
bear fruit          1412.847  0.316
take course         1520.925  0.340
catch eye           1664.205  0.372
get act together    1681.641  0.376
meet eye            1907.464  0.426
scratch head        2032.089  0.454
do trick            2149.175  0.480
make face           2377.760  0.531
hold breath         3114.765  0.696
pave way            3638.943  0.813
foot bill           4449.502  0.994
follow suit         4475.834  1.000
Table C.8 (Normalized) sums of squared deviations from baseline ([N]SSD) for the lexico-syntactic flexibility parameter KindAdv.

Construction             SSD   NSSD
close door           358.308  0.041
deliver good         449.781  0.052
make mark            454.073  0.053
fight battle         531.475  0.062
write letter         646.936  0.075
break ground         664.886  0.077
make headway         689.232  0.080
tell story           705.577  0.082
call police          760.699  0.088
take root            815.109  0.094
take plunge          866.487  0.100
carry weight         940.066  0.109
make point         1,105.661  0.128
fit bill           1,270.483  0.147
grit tooth         1,270.588  0.147
cross finger       1,299.045  0.150
play game          1,332.703  0.154
cross mind         1,351.668  0.156
break heart        1,411.301  0.163
have laugh         1,435.214  0.166
change hand        1,515.261  0.175
take course        1,544.950  0.179
catch eye          1,551.287  0.180
bear fruit         1,619.204  0.187
get act together   1,764.441  0.204
meet eye           1,803.595  0.209
take piss          1,840.509  0.213
beg question       1,928.797  0.223
have clue          1,990.883  0.230
scratch head       2,035.005  0.236
do trick           2,049.446  0.237
make face          2,161.021  0.250
leave mark         2,170.148  0.251
see point          2,454.714  0.284
draw line          2,544.533  0.295
hold breath        2,839.968  0.329
foot bill          3,784.084  0.438
follow suit        3,835.795  0.444
pave way           8,639.289  1.000
Table C.9 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Person.

Construction             SSD   NSSD
call police           441.61  0.056
hold breath           485.62  0.062
write letter          519.88  0.066
get act together      578.32  0.074
tell story            628.46  0.080
close door            662.89  0.085
play game             739.04  0.094
break heart           781.89  0.100
take piss             786.68  0.100
make point            860.13  0.110
draw line             890.18  0.114
grit tooth            954.98  0.122
take plunge           980.79  0.125
fight battle          983.97  0.126
make face           1,013.02  0.129
break ground        1,019.83  0.130
meet eye            1,030.91  0.132
make headway        1,044.24  0.133
pave way            1,098.29  0.140
deliver good        1,138.87  0.145
take course         1,247.83  0.159
catch eye           1,248.44  0.159
have laugh          1,416.61  0.181
carry weight        1,429.04  0.182
make mark           1,562.36  0.199
scratch head        1,568.17  0.200
fit bill            1,591.70  0.203
foot bill           1,617.46  0.206
bear fruit          1,662.13  0.212
beg question        1,673.15  0.213
follow suit         1,714.25  0.219
do trick            1,881.75  0.240
leave mark          1,936.35  0.247
cross mind          2,814.24  0.359
have clue           2,821.63  0.360
see point           3,297.39  0.421
cross finger        6,490.08  0.828
take root           7,560.11  0.964
change hand         7,838.78  1.000
Table C.10 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter NumV.

Construction             SSD   NSSD
hold breath           29.523  0.004
see point             34.145  0.005
do trick              40.920  0.006
fit bill              45.924  0.007
catch eye             91.714  0.014
have clue            139.992  0.021
beg question         173.668  0.026
break heart          205.056  0.031
break ground         225.645  0.034
write letter         233.491  0.035
make point           309.955  0.046
carry weight         410.694  0.061
take root            413.861  0.062
tell story           421.443  0.063
make face            457.400  0.068
close door           472.902  0.071
leave mark           546.554  0.082
take piss            552.528  0.082
change hand          552.540  0.082
cross mind           575.942  0.086
grit tooth           683.168  0.102
meet eye             811.126  0.121
get act together     932.295  0.139
scratch head       1,057.538  0.158
draw line          1,141.993  0.170
make headway       1,193.407  0.178
pave way           1,421.573  0.212
take course        1,458.337  0.218
fight battle       1,461.913  0.218
play game          1,521.169  0.227
deliver good       1,644.299  0.245
bear fruit         1,828.924  0.273
take plunge        1,889.550  0.282
call police        2,365.649  0.353
have laugh         2,459.388  0.367
foot bill          2,617.118  0.391
make mark          2,642.679  0.394
follow suit        2,700.547  0.403
cross finger       6,701.652  1.000
Table C.11 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Tense.

Construction             SSD   NSSD
see point             10.988  0.002
fit bill              41.136  0.006
get act together      63.207  0.009
carry weight         335.296  0.046
have clue            393.089  0.054
break ground         539.484  0.074
take piss            756.102  0.104
tell story           796.677  0.109
beg question         888.910  0.122
have laugh         1,082.559  0.149
do trick           1,156.300  0.159
make point         1,166.304  0.160
break heart        1,218.259  0.167
play game          1,352.478  0.186
foot bill          1,700.952  0.234
deliver good       1,791.678  0.246
draw line          1,830.487  0.252
bear fruit         2,337.669  0.321
fight battle       2,505.235  0.344
pave way           2,598.634  0.357
take plunge        2,639.439  0.363
meet eye           3,005.369  0.413
take course        3,023.787  0.415
follow suit        3,065.317  0.421
make headway       3,073.562  0.422
call police        3,085.513  0.424
take root          3,422.034  0.470
hold breath        3,463.598  0.476
write letter       3,519.450  0.484
leave mark         3,618.598  0.497
catch eye          3,998.629  0.549
close door         4,030.642  0.554
grit tooth         4,155.309  0.571
make mark          4,366.675  0.600
scratch head       4,396.281  0.604
change hand        4,608.301  0.633
make face          4,627.070  0.636
cross finger       6,651.612  0.914
cross mind         7,277.503  1.000
Table C.12 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Aspect.

Construction             SSD   NSSD
call police           14.088  0.003
leave mark            16.504  0.004
cross mind            35.842  0.009
write letter         193.562  0.047
draw line            195.546  0.047
take root            248.307  0.060
take course          421.055  0.101
make headway         593.085  0.143
change hand          734.701  0.177
make point           742.047  0.178
make mark            782.117  0.188
tell story           898.774  0.216
break heart        1,124.733  0.270
close door         1,188.135  0.286
take plunge        1,190.799  0.286
pave way           1,205.556  0.290
do trick           1,346.713  0.324
get act together   1,374.057  0.330
have laugh         1,382.558  0.332
deliver good       1,643.861  0.395
grit tooth         1,893.820  0.455
catch eye          2,041.435  0.491
hold breath        2,047.125  0.492
foot bill          2,121.425  0.510
carry weight       2,232.715  0.537
meet eye           2,308.832  0.555
fight battle       2,352.750  0.565
break ground       2,379.804  0.572
bear fruit         2,385.537  0.573
scratch head       2,460.856  0.591
cross finger       2,467.712  0.593
beg question       2,500.459  0.601
play game          2,547.156  0.612
make face          2,567.849  0.617
fit bill           2,874.285  0.691
have clue          3,072.819  0.739
follow suit        3,165.048  0.761
see point          3,407.389  0.819
take piss          4,160.889  1.000
Table C.13 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Mood.

Construction             SSD   NSSD
leave mark            13.654  0.002
beg question          14.016  0.002
break ground          46.681  0.006
see point             53.627  0.007
catch eye             65.356  0.009
hold breath          120.429  0.016
cross mind           140.729  0.019
have clue            143.824  0.019
get act together     151.565  0.020
break heart          181.985  0.025
make face            200.008  0.027
make point           202.526  0.027
take root            208.166  0.028
close door           218.037  0.029
call police          222.509  0.030
write letter         224.051  0.030
carry weight         312.886  0.042
change hand          358.680  0.048
fit bill             438.017  0.059
take piss            473.845  0.064
tell story           487.847  0.066
grit tooth           793.437  0.107
do trick             883.849  0.119
meet eye             912.915  0.123
have laugh         1,175.619  0.159
make headway       1,249.430  0.169
fight battle       1,310.967  0.177
scratch head       1,319.979  0.178
draw line          1,354.744  0.183
deliver good       1,412.381  0.191
play game          1,421.728  0.192
take plunge        1,531.544  0.207
take course        1,899.059  0.256
foot bill          1,933.141  0.261
pave way           2,194.588  0.296
follow suit        2,224.307  0.300
bear fruit         2,401.390  0.324
make mark          2,735.434  0.369
cross finger       7,413.193  1.000
Table C.14 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Voice.

Construction             SSD   NSSD
draw line            334.903  0.129
write letter         335.464  0.130
tell story           811.068  0.313
make point           824.368  0.319
take course        1,301.083  0.503
call police        1,531.915  0.592
play game          1,549.009  0.599
catch eye          1,551.944  0.600
close door         1,620.435  0.626
cross finger       1,689.263  0.653
break ground       1,846.839  0.714
make headway       1,861.653  0.719
get act together   1,889.587  0.730
hold breath        2,098.874  0.811
carry weight       2,193.374  0.848
pave way           2,203.634  0.852
beg question       2,207.234  0.853
fit bill           2,230.271  0.862
take piss          2,244.444  0.867
break heart        2,250.680  0.870
leave mark         2,299.338  0.889
do trick           2,317.355  0.896
take plunge        2,403.616  0.929
see point          2,510.515  0.970
make face          2,529.691  0.978
bear fruit         2,587.498  1.000
change hand        2,587.498  1.000
cross mind         2,587.498  1.000
deliver good       2,587.498  1.000
fight battle       2,587.498  1.000
follow suit        2,587.498  1.000
have clue          2,587.498  1.000
have laugh         2,587.498  1.000
make mark          2,587.498  1.000
meet eye           2,587.498  1.000
scratch head       2,587.498  1.000
foot bill          2,587.498  1.000
grit tooth         2,587.498  1.000
take root          2,587.498  1.000
Table C.15 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Neg.

Construction             SSD   NSSD
play game              0.040  0.000
deliver good           4.629  0.000
get act together       5.446  0.001
take root             38.913  0.004
make headway          79.207  0.008
hold breath           83.690  0.008
take piss             91.556  0.009
fit bill             105.758  0.011
meet eye             116.530  0.012
bear fruit           139.904  0.014
carry weight         150.025  0.015
do trick             170.163  0.017
call police          204.316  0.021
break heart          206.677  0.021
change hand          212.734  0.022
draw line            249.066  0.025
leave mark           268.292  0.027
make mark            275.844  0.028
tell story           287.631  0.029
make face            304.304  0.031
break ground         326.389  0.033
follow suit          328.099  0.033
close door           336.690  0.034
have laugh           337.459  0.034
take course          337.459  0.034
beg question         347.948  0.035
catch eye            348.343  0.035
take plunge          353.316  0.036
fight battle         362.769  0.037
write letter         364.368  0.037
make point           371.956  0.038
cross finger         375.220  0.038
foot bill            398.352  0.040
pave way             451.826  0.046
scratch head         451.826  0.046
grit tooth           451.826  0.046
cross mind           855.646  0.087
see point          3,495.170  0.353
have clue          9,888.235  1.000
Table C.16 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Det.

Construction      SSD        NSSD
tell story        109.713    .009
draw line         112.655    .009
make point        124.158    .010
take course       153.545    .012
beg question      215.228    .017
leave mark        386.016    .031
see point         806.156    .064
call police       836.233    .067
cross finger      874.252    .070
break heart       890.068    .071
meet eye          890.666    .071
play game         1040.243   .083
fit bill          1300.648   .104
cross mind        1496.134   .119
hold breath       1717.299   .137
close door        1748.618   .140
deliver good      1788.383   .143
grit tooth        1857.897   .148
pave way          1867.248   .149
take piss         1902.422   .152
get act together  1969.986   .157
write letter      3052.126   .243
fight battle      3246.679   .259
carry weight      4858.616   .388
have laugh        5180.607   .413
have clue         6575.907   .525
make face         7728.862   .617
make headway      8079.866   .645
bear fruit        9011.702   .719
make mark         9209.498   .735
foot bill         11372.922  .907
break ground      11532.341  .920
take plunge       11969.016  .955
scratch head      12351.830  .985
follow suit       12371.316  .987
catch eye         12410.545  .990
do trick          12413.576  .990
change hand       12534.713  1.000
take root         12534.713  1.000
Table C.17 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter NumNP.

Construction      SSD        NSSD
scratch head      164.642    .005
break heart       165.847    .005
draw line         186.646    .005
make point        193.878    .006
make face         236.556    .007
tell story        265.873    .008
beg question      406.055    .012
have clue         465.644    .013
close door        555.330    .016
cross mind        633.479    .018
catch eye         634.898    .018
leave mark        805.138    .023
fight battle      912.102    .026
have laugh        978.706    .028
foot bill         1157.804   .033
take course       1210.991   .035
bear fruit        1234.147   .036
follow suit       1462.416   .042
see point         1465.861   .042
get act together  1468.320   .042
hold breath       1471.444   .042
make mark         1506.589   .043
do trick          1584.781   .046
take root         1584.781   .046
break ground      1584.781   .046
carry weight      1584.781   .046
fit bill          1584.781   .046
make headway      1584.781   .046
pave way          1584.781   .046
take piss         1584.781   .046
take plunge       1584.781   .046
write letter      2182.075   .063
play game         2519.666   .073
meet eye          18381.533  .530
call police       34185.566  .985
change hand       34695.120  1.000
cross finger      34695.120  1.000
deliver good      34695.120  1.000
grit tooth        34695.120  1.000
Table C.18 (Normalized) sums of squared deviations from baseline ([N]SSD) for the morphological flexibility parameter Gerund.

Construction      SSD        NSSD
call police       0.012      0.000
carry weight      0.042      0.000
leave mark        0.780      0.001
bear fruit        0.819      0.001
take root         1.433      0.002
grit tooth        1.493      0.002
fight battle      1.815      0.003
follow suit       2.042      0.003
make point        2.819      0.004
get act together  4.694      0.007
make headway      5.906      0.009
fit bill          6.707      0.011
beg question      8.651      0.014
deliver good      9.110      0.014
break heart       9.431      0.015
have clue         10.236     0.016
see point         10.891     0.017
do trick          14.508     0.023
cross mind        14.508     0.023
catch eye         15.744     0.025
break ground      22.068     0.035
foot bill         27.807     0.044
take course       28.415     0.045
tell story        45.567     0.072
write letter      49.420     0.078
hold breath       77.148     0.122
draw line         80.365     0.127
make face         83.751     0.133
make mark         89.683     0.142
take plunge       94.446     0.150
cross finger      106.775    0.169
scratch head      173.229    0.275
have laugh        182.466    0.289
change hand       356.147    0.565
meet eye          393.066    0.624
take piss         429.952    0.682
play game         443.597    0.704
close door        538.700    0.855
pave way          630.323    1.000
D. Entropy-based flexibility values for tree-syntactic, lexico-syntactic, and morphological flexibility
Table D.1 Relative entropy values (H) for tree-syntactic flexibility.

Construction      H
BASELINE          0.822
call police       0.580
make point        0.566
cross finger      0.539
draw line         0.517
write letter      0.513
tell story        0.440
take course       0.424
hold breath       0.423
play game         0.414
have laugh        0.273
close door        0.266
carry weight      0.209
fight battle      0.202
make headway      0.183
foot bill         0.182
make face         0.179
grit tooth        0.157
have clue         0.157
take piss         0.156
take plunge       0.155
catch eye         0.152
do trick          0.149
cross mind        0.149
break ground      0.141
see point         0.129
fit bill          0.124
beg question      0.122
leave mark        0.112
pave way          0.106
break heart       0.106
deliver good      0.105
bear fruit        0.088
scratch head      0.081
take root         0.073
make mark         0.043
change hand       0.000
follow suit       0.000
get act together  0.000
meet eye          0.000
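As a reading aid for the H values in the Appendix D tables, the sketch below computes a relative entropy in the standard sense: Shannon entropy divided by its maximum for the given number of parameter levels. This appendix does not restate the formula, so the definition is an assumption, and the function name and the counts are purely illustrative.

```python
import math


def relative_entropy(counts):
    """Shannon entropy of a frequency distribution divided by its maximum.

    counts: observed frequencies, one per parameter level (assumed input).
    Returns a value between 0 (all tokens at one level) and 1 (tokens
    spread evenly over all levels).
    """
    total = sum(counts)
    levels = len(counts)
    if total == 0 or levels < 2:
        return 0.0
    entropy = sum((c / total) * math.log2(total / c) for c in counts if c > 0)
    return entropy / math.log2(levels)


even = relative_entropy([10, 10, 10, 10])   # evenly spread over four levels
fixed = relative_entropy([40, 0, 0, 0])     # everything at a single level
skewed = relative_entropy([3, 1])           # two levels, uneven
```

On this definition, a construction that only ever occurs at one parameter level scores 0, which matches the zero entries at the bottom of several of the tables.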
Table D.2 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter Addition.

Construction      Hdir
pave way          0.937
break ground      0.887
fight battle      0.375
make headway      0.281
BASELINE          0.268
carry weight      0.258
write letter      0.239
draw line         0.218
leave mark        0.203
tell story        0.196
beg question      0.186
make mark         0.130
take piss         0.122
see point         0.099
close door        0.058
change hand       0.055
take course       0.021
play game         0.018
foot bill         0.002
take root         0.001
have laugh        0.001
deliver good      0.001
make point        0.000
cross finger      0.000
cross mind        −0.001
make face         −0.007
bear fruit        −0.023
call police       −0.027
catch eye         −0.050
take plunge       −0.053
fit bill          −0.056
meet eye          −0.061
break heart       −0.080
grit tooth        −0.152
have clue         −0.152
get act together  −0.162
scratch head      −0.240
do trick          −0.253
hold breath       −0.610
follow suit       −0.676
Table D.3 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter AttrAdj.

Construction      Hdir
break ground      0.703
fight battle      0.046
carry weight      0.000
take course       −0.096
have laugh        −0.111
make headway      −0.117
play game         −0.132
make face         −0.133
beg question      −0.150
leave mark        −0.264
draw line         −0.355
make mark         −0.367
meet eye          −0.395
bear fruit        −0.497
foot bill         −0.500
make point        −0.517
BASELINE          −0.546
tell story        −0.547
write letter      −0.605
close door        −0.652
call police       −0.691
break heart       −0.719
take plunge       −0.742
catch eye         −0.773
deliver good      −0.818
cross mind        −0.851
get act together  −0.852
hold breath       −0.913
scratch head      −0.919
fit bill          −0.928
follow suit       −0.937
pave way          −0.965
change hand       −1.000
cross finger      −1.000
do trick          −1.000
grit tooth        −1.000
have clue         −1.000
see point         −1.000
take piss         −1.000
take root         −1.000
Table D.4 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter AttrNP.

Construction      Hdir
close door        −0.541
draw line         −0.619
play game         −0.664
BASELINE          −0.706
fight battle      −0.750
tell story        −0.769
foot bill         −0.773
call police       −0.802
write letter      −0.810
deliver good      −0.895
leave mark        −0.941
hold breath       −0.967
catch eye         −0.979
bear fruit        −1.000
beg question      −1.000
break ground      −1.000
break heart       −1.000
carry weight      −1.000
change hand       −1.000
cross finger      −1.000
cross mind        −1.000
do trick          −1.000
fit bill          −1.000
follow suit       −1.000
get act together  −1.000
grit tooth        −1.000
have clue         −1.000
have laugh        −1.000
make face         −1.000
make headway      −1.000
make mark         −1.000
make point        −1.000
meet eye          −1.000
pave way          −1.000
scratch head      −1.000
see point         −1.000
take course       −1.000
take piss         −1.000
take plunge       −1.000
take root         −1.000
Table D.5 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter PP.

Construction      Hdir
foot bill         −0.059
beg question      −0.159
see point         −0.228
write letter      −0.315
take course       −0.358
catch eye         −0.482
make point        −0.555
BASELINE          −0.563
carry weight      −0.566
tell story        −0.614
play game         −0.651
bear fruit        −0.690
break heart       −0.719
have laugh        −0.754
close door        −0.787
fight battle      −0.799
cross mind        −0.813
draw line         −0.827
meet eye          −0.833
get act together  −0.852
do trick          −0.862
fit bill          −0.874
leave mark        −0.895
make face         −0.914
take plunge       −0.928
break ground      −0.936
deliver good      −0.941
call police       −0.946
hold breath       −0.967
change hand       −1.000
cross finger      −1.000
follow suit       −1.000
grit tooth        −1.000
have clue         −1.000
make headway      −1.000
make mark         −1.000
pave way          −1.000
scratch head      −1.000
take piss         −1.000
take root         −1.000
Table D.6 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter RelCl.

Construction      Hdir
beg question      −0.518
write letter      −0.550
BASELINE          −0.701
see point         −0.760
take course       −0.770
meet eye          −0.833
deliver good      −0.855
make point        −0.876
draw line         −0.881
tell story        −0.882
fight battle      −0.916
play game         −0.917
follow suit       −0.937
make headway      −0.937
cross mind        −0.939
make mark         −0.957
hold breath       −0.967
close door        −0.975
catch eye         −0.979
bear fruit        −1.000
break ground      −1.000
break heart       −1.000
call police       −1.000
carry weight      −1.000
change hand       −1.000
cross finger      −1.000
do trick          −1.000
fit bill          −1.000
foot bill         −1.000
get act together  −1.000
grit tooth        −1.000
have clue         −1.000
have laugh        −1.000
leave mark        −1.000
make face         −1.000
pave way          −1.000
scratch head      −1.000
take piss         −1.000
take plunge       −1.000
take root         −1.000
Table D.7 Directional entropy values (Hdir) for the lexico-syntactic flexibility parameter NoAdv.

Construction      Hdir
pave way          0.675
do trick          0.598
draw line         0.488
take piss         0.336
BASELINE          0.336
make mark         0.327
close door        0.310
make headway      0.293
break ground      0.263
leave mark        0.257
take root         0.223
change hand       0.194
cross mind        −0.213
see point         −0.242
carry weight      −0.287
deliver good      −0.334
fit bill          −0.346
cross finger      −0.349
fight battle      −0.353
call police       −0.370
play game         −0.376
write letter      −0.379
beg question      −0.380
have laugh        −0.389
bear fruit        −0.403
take course       −0.426
grit tooth        −0.428
have clue         −0.428
break heart       −0.431
get act together  −0.450
tell story        −0.461
scratch head      −0.472
meet eye          −0.475
take plunge       −0.480
make point        −0.521
catch eye         −0.527
make face         −0.533
hold breath       −0.640
foot bill         −0.731
follow suit       −0.738
Table D.8 Relative entropy values (H) for the lexico-syntactic flexibility parameter KindAdv.

Construction      H
make headway      0.750
break ground      0.727
change hand       0.676
take root         0.655
fight battle      0.645
make mark         0.630
take piss         0.627
carry weight      0.610
deliver good      0.605
cross mind        0.593
take plunge       0.559
BASELINE          0.551
close door        0.528
leave mark        0.516
draw line         0.516
grit tooth        0.513
have clue         0.513
call police       0.503
fit bill          0.498
have laugh        0.494
cross finger      0.458
beg question      0.456
break heart       0.455
see point         0.454
bear fruit        0.436
write letter      0.436
get act together  0.430
take course       0.420
scratch head      0.406
play game         0.405
do trick          0.397
tell story        0.391
meet eye          0.372
catch eye         0.366
make point        0.365
make face         0.339
pave way          0.286
hold breath       0.268
follow suit       0.164
foot bill         0.145
Table D.9 Relative entropy values (H) for the morphological flexibility parameter Person.

Construction      H
have laugh        0.919
play game         0.877
take piss         0.832
draw line         0.807
get act together  0.802
take plunge       0.786
BASELINE          0.764
write letter      0.741
fight battle      0.738
make point        0.733
call police       0.707
see point         0.697
tell story        0.697
close door        0.692
grit tooth        0.678
have clue         0.678
make headway      0.675
break heart       0.666
hold breath       0.664
foot bill         0.663
take course       0.654
scratch head      0.651
deliver good      0.640
follow suit       0.638
make mark         0.634
bear fruit        0.618
meet eye          0.613
break ground      0.578
pave way          0.557
change hand       0.527
make face         0.490
cross finger      0.475
take root         0.464
catch eye         0.450
carry weight      0.408
fit bill          0.388
beg question      0.346
do trick          0.326
leave mark        0.312
cross mind        0.107
Table D.10 Relative entropy values (H) for the morphological flexibility parameter NumV.

Construction      H
have laugh        0.999
follow suit       0.961
take plunge       0.960
deliver good      0.959
call police       0.955
play game         0.949
get act together  0.947
make headway      0.941
foot bill         0.935
fight battle      0.933
draw line         0.932
change hand       0.923
pave way          0.920
make mark         0.919
bear fruit        0.916
carry weight      0.905
take piss         0.897
take root         0.894
break ground      0.863
tell story        0.853
take course       0.843
write letter      0.822
leave mark        0.816
scratch head      0.780
make point        0.778
grit tooth        0.729
have clue         0.729
catch eye         0.727
BASELINE          0.710
meet eye          0.684
hold breath       0.669
cross finger      0.657
do trick          0.650
fit bill          0.639
see point         0.633
break heart       0.599
beg question      0.579
close door        0.559
make face         0.419
cross mind        0.301
Table D.11 Relative entropy values (H) for the morphological flexibility parameter Tense.

Construction      H
take plunge       0.970
draw line         0.959
follow suit       0.915
break heart       0.905
do trick          0.879
have laugh        0.877
pave way          0.877
tell story        0.876
carry weight      0.876
deliver good      0.873
break ground      0.869
take piss         0.846
make point        0.835
foot bill         0.834
play game         0.826
bear fruit        0.819
fight battle      0.817
take course       0.816
make headway      0.786
fit bill          0.768
call police       0.767
get act together  0.766
meet eye          0.758
take root         0.724
write letter      0.721
make mark         0.714
grit tooth        0.695
have clue         0.695
scratch head      0.687
BASELINE          0.684
close door        0.680
see point         0.665
leave mark        0.661
change hand       0.652
catch eye         0.651
hold breath       0.647
cross finger      0.582
make face         0.572
beg question      0.451
cross mind        0.269
Table D.12 Relative entropy values (H) for the morphological flexibility parameter Aspect.

Construction      H
make headway      0.876
change hand       0.869
play game         0.838
write letter      0.815
make mark         0.812
break ground      0.808
pave way          0.801
fight battle      0.797
draw line         0.796
take piss         0.776
take root         0.773
leave mark        0.771
BASELINE          0.767
have laugh        0.761
take course       0.757
close door        0.753
cross finger      0.752
call police       0.745
scratch head      0.718
make point        0.692
tell story        0.686
break heart       0.676
bear fruit        0.675
cross mind        0.645
grit tooth        0.645
have clue         0.645
take plunge       0.644
hold breath       0.627
get act together  0.603
deliver good      0.595
foot bill         0.526
do trick          0.466
meet eye          0.453
carry weight      0.408
catch eye         0.401
beg question      0.362
make face         0.351
follow suit       0.229
fit bill          0.181
see point         0.043
Table D.13 Relative entropy values (H) for the morphological flexibility parameter Mood.

Construction      H
take plunge       0.971
scratch head      0.958
pave way          0.883
bear fruit        0.876
follow suit       0.855
cross finger      0.837
draw line         0.803
foot bill         0.799
do trick          0.770
have laugh        0.767
make headway      0.767
take root         0.762
take course       0.759
make mark         0.742
fit bill          0.724
carry weight      0.702
play game         0.677
get act together  0.656
tell story        0.656
fight battle      0.650
deliver good      0.642
change hand       0.639
call police       0.631
break heart       0.622
meet eye          0.609
make point        0.589
take piss         0.575
make face         0.570
write letter      0.559
grit tooth        0.516
have clue         0.516
close door        0.515
catch eye         0.467
BASELINE          0.457
beg question      0.408
leave mark        0.394
break ground      0.313
see point         0.290
hold breath       0.269
cross mind        0.267
Table D.14 Relative entropy values (H) for the morphological flexibility parameter Voice.

Construction      H
call police       0.955
BASELINE          0.942
draw line         0.819
write letter      0.673
tell story        0.502
make point        0.498
take course       0.371
play game         0.308
catch eye         0.307
close door        0.290
cross finger      0.272
break ground      0.231
make headway      0.227
get act together  0.220
hold breath       0.163
carry weight      0.136
pave way          0.133
beg question      0.132
fit bill          0.126
take piss         0.121
break heart       0.120
leave mark        0.105
do trick          0.099
take plunge       0.072
see point         0.034
make face         0.027
bear fruit        0
change hand       0
cross mind        0
deliver good      0
fight battle      0
follow suit       0
foot bill         0
grit tooth        0
have clue         0
have laugh        0
make mark         0
meet eye          0
scratch head      0
take root         0
Table D.15 Relative entropy values (H) for the morphological flexibility parameter Neg.

Construction      H
see point         0.986
cross mind        0.940
make headway      0.748
deliver good      0.647
play game         0.614
BASELINE          0.611
get act together  0.568
take root         0.488
hold breath       0.422
take piss         0.411
fit bill          0.394
meet eye          0.381
bear fruit        0.353
carry weight      0.342
do trick          0.320
call police       0.283
break heart       0.281
change hand       0.274
draw line         0.236
leave mark        0.216
make mark         0.209
tell story        0.196
make face         0.179
break ground      0.156
follow suit       0.154
close door        0.145
have laugh        0.144
take course       0.144
beg question      0.132
catch eye         0.132
take plunge       0.127
fight battle      0.116
write letter      0.114
make point        0.106
cross finger      0.102
foot bill         0.075
grit tooth        0
have clue         0
pave way          0
scratch head      0
Table D.16 Relative entropy values (H) for the morphological flexibility parameter Det.

Construction      H
cross finger      0.735
play game         0.730
write letter      0.716
make mark         0.677
carry weight      0.676
fight battle      0.661
BASELINE          0.640
tell story        0.625
draw line         0.622
call police       0.582
beg question      0.570
make point        0.565
bear fruit        0.527
leave mark        0.525
take course       0.507
make headway      0.501
meet eye          0.472
have laugh        0.463
see point         0.391
break heart       0.381
make face         0.344
cross mind        0.316
foot bill         0.307
fit bill          0.278
take plunge       0.174
break ground      0.168
hold breath       0.124
deliver good      0.104
grit tooth        0.095
have clue         0.095
pave way          0.088
close door        0.085
scratch head      0.081
take piss         0.069
follow suit       0.063
do trick          0.056
catch eye         0.038
change hand       0
get act together  0
take root         0
Table D.17 Relative entropy values (H) for the morphological flexibility parameter NumNP.

Construction      H
play game         0.962
write letter      0.951
fight battle      0.871
beg question      0.794
meet eye          0.773
tell story        0.751
scratch head      0.680
break heart       0.626
draw line         0.598
make point        0.590
make face         0.556
BASELINE          0.437
close door        0.401
cross mind        0.371
catch eye         0.371
leave mark        0.308
have laugh        0.246
foot bill         0.182
take course       0.162
bear fruit        0.154
follow suit       0.063
see point         0.062
get act together  0.060
hold breath       0.059
call police       0.054
make mark         0.043
break ground      0
carry weight      0
change hand       0
cross finger      0
deliver good      0
do trick          0
fit bill          0
grit tooth        0
have clue         0
make headway      0
pave way          0
take piss         0
take plunge       0
take root         0
Table D.18 Relative entropy values (H) for the morphological flexibility parameter Gerund.

Construction      H
pave way          0.731
close door        0.704
play game         0.671
take piss         0.666
meet eye          0.651
change hand       0.635
have laugh        0.536
scratch head      0.529
cross finger      0.469
take plunge       0.455
make mark         0.449
make face         0.442
draw line         0.438
hold breath       0.433
write letter      0.390
tell story        0.383
take course       0.346
foot bill         0.344
break ground      0.328
catch eye         0.307
break heart       0.281
deliver good      0.279
make headway      0.261
get act together  0.253
make point        0.237
follow suit       0.229
fight battle      0.226
take root         0.221
bear fruit        0.211
call police       0.183
BASELINE          0.174
carry weight      0.171
leave mark        0.145
grit tooth        0.132
have clue         0.132
fit bill          0.072
beg question      0.054
see point         0.034
cross mind        0
do trick          0
E. Barkema-based flexibility values for the parameter levels of tree-syntactic, lexico-syntactic, and morphological flexibility
Table E.1 Flexibility values (in per cent) for the parameter levels of tree-syntactic flexibility. DeclAct DeclPass
bear fruit beg question break ground break heart call police carry weight catch eye change hand close door cross finger cross mind deliver good do trick draw line fight battle fit bill follow suit foot bill get act together grit tooth
78.73 76.78 76.08 77.68 35.23 72.84 72.92 79.84 68.36 9.18 77.70 78.46 76.62 47.59 75.16 77.26 79.84 77.09 79.84 75.58
−31.36 −30.14 −29.11 −29.74 5.87 −30.09 −26.07 −31.36 −26.53 −27.36 −31.36 −31.36 −30.07 −12.98 −31.36 −29.64 −31.36 −31.36 −31.36 −31.36
RelClAct
PartClAct RelClPass
PartClPass
InterrAct
ImperAct InterrPass ImperPass
−8.10 −7.98 −8.46 −9.21 −9.21 −5.39 −9.01 −9.21 −8.85 −9.21 −9.21 −9.21 −9.21 −8.56 −7.13 −9.21 −9.21 −9.21 −9.21 −9.21
−1.13 −1.13 −1.13 −0.59 −1.13 −1.13 −1.13 −1.13 −1.13 −1.13 −1.13 −1.13 −1.13 0.48 1.47 −1.13 −1.13 −1.13 −1.13 −1.13
−2.35 −2.35 −1.59 −2.35 −2.35 −1.71 −2.14 −2.35 −2.35 −2.35 −0.20 −2.35 −2.35 −0.73 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35
−11.03 −11.03 −11.03 −11.03 −9.80 −9.76 −10.83 −11.03 −10.55 −10.37 −11.03 −11.03 −9.10 −6.52 −11.03 −10.17 −11.03 −8.28 −11.03 −10.42
−22.50 −22.50 −22.50 −22.50 −16.66 −22.50 −21.89 −22.50 −16.94 −19.17 −22.50 −21.12 −22.50 −22.50 −22.50 −22.50 −22.50 −22.50 −22.50 −20.06
−2.26 −1.65 −2.26 −2.26 −1.95 −2.26 −2.06 −2.26 −2.02 −1.59 −2.26 −2.26 −2.26 1.29 −2.26 −2.26 −2.26 −2.26 −2.26 −2.26
0 0 0 0 0 0 0 0 0 0 0 0 0 1.94 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Construction
Table E.1 (Cont.) DeclAct DeclPass
have clue have laugh hold breath leave mark make face make headway make mark make point meet eye pave way play game scratch head see point take course take piss take plunge take root tell story write letter
74.24 72.70 66.83 77.08 72.30 75.43 79.37 51.49 79.84 77.61 59.15 78.84 76.97 58.76 75.71 76.37 78.96 55.85 47.58
−31.36 −31.36 −27.60 −31.36 −31.36 −27.69 −31.36 −21.71 −31.36 −29.51 −25.85 −31.36 −31.00 −28.64 −29.71 −30.49 −31.36 −23.95 −26.47
RelClAct
PartClAct RelClPass
PartClPass
InterrAct
ImperAct InterrPass ImperPass
−6.19 −9.21 −9.21 −8.52 −8.67 −9.21 −9.21 −1.75 −9.21 −9.21 −4.04 −9.21 −9.21 −1.05 −9.21 −9.21 −9.21 −3.65 −0.23
−1.13 −1.13 −1.13 −1.13 −1.13 −1.13 −1.13 5.34 −1.13 −1.13 −0.44 −1.13 −1.13 3.97 −1.13 −1.13 −1.13 2.63 −0.11
−2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.35 −2.01 −2.35 −2.35 −2.35 −1.78 −1.83
−8.45 −8.99 −11.03 −10.34 −10.76 −10.30 −11.03 −8.65 −11.03 −11.03 −9.65 −10.03 −8.52 −10.35 −10.21 −11.03 −11.03 −10.00 −10.16
−22.50 −17.40 −13.26 −22.50 −16.30 −22.50 −22.03 −22.50 −22.50 −22.13 −14.92 −22.50 −22.50 −22.50 −20.85 −19.89 −21.62 −19.77 −18.78
−2.26 −2.26 −2.26 −0.88 −1.99 −2.26 −2.26 0.13 −2.26 −2.26 −2.26 −2.26 −2.26 1.82 −2.26 −2.26 −2.26 0.62 9.93
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.05 0.07
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Construction
Table E.2 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter Addition.

Construction      Absent  Present
bear fruit        38.38   −38.38
beg question      4.65    −4.65
break ground      −19.00  19.00
break heart       45.98   −45.98
call police       39.19   −39.19
carry weight      0.52    −0.52
catch eye         42.63   −42.63
change hand       15.82   −15.82
close door        15.41   −15.41
cross finger      30.16   −30.16
cross mind        31.64   −31.64
deliver good      27.77   −27.77
do trick          58.21   −58.21
draw line         2.72    −2.72
fight battle      −4.88   4.88
fit bill          43.29   −43.29
follow suit       73.57   −73.57
foot bill         27.20   −27.20
get act together  52.74   −52.74
grit tooth        52.06   −52.06
have clue         38.98   −38.98
have laugh        27.46   −27.46
hold breath       65.11   −65.45
leave mark        3.63    −3.63
make face         34.48   −34.48
make headway      −0.65   0.65
make mark         8.60    −8.60
make point        29.84   −29.84
meet eye          43.88   −43.88
pave way          −19.76  19.76
play game         21.57   −21.57
scratch head      57.50   −57.50
see point         11.15   −11.15
take course       20.99   −20.99
take piss         9.25    −9.25
take plunge       42.97   −42.97
take root         27.28   −27.28
tell story        4.06    −4.06
write letter      1.54    −1.54
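The per-level values in the Appendix E tables can be read as deviations from the baseline, which is why the entries for one construction sum to roughly zero across levels. The sketch below assumes each value is the construction's percentage at a level minus the baseline's percentage at that level; this appendix does not spell out the formula, and the function name and counts are hypothetical.

```python
def level_deviations(construction_counts, baseline_counts):
    """Percentage-point difference from the baseline for each parameter level.

    Assumption: both arguments map level names to raw token counts.
    Levels missing for the construction are treated as zero occurrences.
    """
    def percentages(counts):
        total = sum(counts.values())
        return {level: 100.0 * n / total for level, n in counts.items()}

    observed = percentages(construction_counts)
    baseline = percentages(baseline_counts)
    return {level: round(observed.get(level, 0.0) - baseline[level], 2)
            for level in baseline}


# Hypothetical counts for a two-level parameter such as Addition.
deviations = level_deviations(
    {"Absent": 90, "Present": 10},
    {"Absent": 60, "Present": 40},
)
```

With these made-up counts the construction lies 30 percentage points above the baseline for Absent and 30 below for Present, the same mirror-image pattern seen in the two-level tables.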
Table E.3 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter AttrNP.

Construction      0       1       2
bear fruit        9.12    −8.69   −0.43
beg question      9.12    −8.69   −0.43
break ground      9.12    −8.69   −0.43
break heart       9.12    −8.69   −0.43
call police       6.05    −5.61   −0.43
carry weight      9.12    −8.69   −0.43
catch eye         8.92    −8.48   −0.43
change hand       9.12    −8.69   −0.43
close door        −0.55   0.99    −0.43
cross finger      9.12    −8.69   −0.43
cross mind        9.12    −8.69   −0.43
deliver good      7.74    −7.31   −0.43
do trick          9.12    −8.69   −0.43
draw line         1.70    −1.27   −0.43
fight battle      4.96    −4.52   −0.43
fit bill          9.12    −8.69   −0.43
follow suit       9.12    −8.69   −0.43
foot bill         5.45    −5.02   −0.43
get act together  9.12    −8.69   −0.43
grit tooth        9.12    −8.69   −0.43
have clue         9.12    −8.69   −0.43
have laugh        9.12    −8.69   −0.43
hold breath       8.78    −8.69   −0.43
leave mark        8.43    −8.00   −0.43
make face         9.12    −8.69   −0.43
make headway      9.12    −8.69   −0.43
make mark         9.12    −8.69   −0.43
make point        9.12    −8.69   −0.43
meet eye          9.12    −8.69   −0.43
pave way          9.12    −8.69   −0.43
play game         2.92    −2.48   −0.43
scratch head      9.12    −8.69   −0.43
see point         9.12    −8.69   −0.43
take course       9.12    −8.69   −0.43
take piss         9.12    −8.69   −0.43
take plunge       9.12    −8.69   −0.43
take root         9.12    −8.69   −0.43
tell story        5.36    −4.93   −0.43
write letter      6.20    −5.77   −0.43
Table E.4 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter AttrAdj. Construction
0
1
2
3
bear fruit beg question break ground break heart call police carry weight catch eye change hand close door cross finger cross mind deliver good do trick draw line fight battle fit bill follow suit foot bill get act together grit tooth have clue have laugh hold breath leave mark make face make headway make mark make point meet eye pave way play game scratch head see point take course take piss take plunge take root tell story write letter
13.74 −2.76 −69.89 19.98 19.31 −24.20 21.18 24.85 18.32 24.85 22.71 22.09 24.85 8.40 −37.65 23.99 24.11 13.84 22.74 24.85 18.81 −5.76 23.14 4.16 −3.99 −5.30 8.89 14.40 10.05 24.48 −4.12 23.85 24.85 −7.12 24.85 20.50 24.85 −3.37 3.32
−11.91 4.58 71.71 −18.16 −17.48 26.02 −19.36 −23.02 −16.49 −23.02 −20.88 −20.26 −23.02 −6.57 39.48 −22.16 −22.28 −12.01 −20.91 −23.02 −16.99 7.59 −21.65 −2.33 5.82 7.12 −7.06 −12.58 −8.23 −22.65 5.94 −22.02 −23.02 8.95 −23.02 −18.68 −23.02 4.53 −2.00
−1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.74 −1.12 −1.37
−0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.09 −0.04 0.06
Table E.5 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter PP.

Construction      0       1       2
bear fruit        12.69   −12.60  −0.09
beg question      −8.75   8.84    −0.09
break ground      17.49   −17.41  −0.09
break heart       13.38   −13.29  −0.09
call police       17.63   −17.54  −0.09
carry weight      9.33    −9.24   −0.09
catch eye         6.64    −6.55   −0.09
change hand       18.25   −18.16  −0.09
close door        14.86   −14.77  −0.09
cross finger      18.25   −18.16  −0.09
cross mind        15.39   −15.30  −0.09
deliver good      17.56   −17.47  −0.09
do trick          16.31   −16.22  −0.09
draw line         15.66   −15.58  −0.09
fight battle      15.12   −15.03  −0.09
fit bill          16.52   −16.43  −0.09
follow suit       18.25   −18.16  −0.09
foot bill         −17.53  17.62   −0.09
get act together  16.13   −16.05  −0.09
grit tooth        18.25   −18.16  −0.09
have clue         16.52   −16.43  −0.09
have laugh        14.16   −14.08  −0.09
hold breath       17.90   −18.16  −0.09
leave mark        16.87   −16.78  −0.09
make face         17.17   −17.08  −0.09
make headway      18.25   −18.16  −0.09
make mark         18.25   −18.16  −0.09
make point        8.99    −8.90   −0.09
meet eye          15.78   −15.69  −0.09
pave way          18.25   −18.16  −0.09
play game         11.69   −11.61  −0.09
scratch head      18.25   −18.16  −0.09
see point         −4.42   4.50    −0.09
take course       1.92    −1.83   −0.09
take piss         18.25   −18.16  −0.09
take plunge       17.38   −17.29  −0.09
take root         18.25   −18.16  −0.09
tell story        −3.85   3.83    0.02
write letter      −19.49  17.17   2.32
Table E.6 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter RelCl.

Construction      0       1
bear fruit        5.30    −5.30
beg question      −5.13   5.13
break ground      4.55    −4.55
break heart       5.30    −5.30
call police       5.30    −5.30
carry weight      5.30    −5.30
catch eye         5.10    −5.10
change hand       5.30    −5.30
close door        5.06    −5.06
cross finger      5.30    −5.30
cross mind        4.59    −4.59
deliver good      3.23    −3.23
do trick          5.30    −5.30
draw line         3.69    −3.69
fight battle      4.26    −4.26
fit bill          5.30    −5.30
follow suit       4.56    −4.56
foot bill         5.30    −5.30
get act together  5.30    −5.30
grit tooth        5.30    −5.30
have clue         4.01    −4.01
have laugh        5.30    −5.30
hold breath       4.96    −5.30
leave mark        5.30    −5.30
make face         5.30    −5.30
make headway      4.56    −4.56
make mark         4.83    −4.83
make point        3.61    −3.61
meet eye          2.83    −2.83
pave way          5.30    −5.30
play game         4.27    −4.27
scratch head      5.30    −5.30
see point         1.34    −1.34
take course       1.56    −1.56
take piss         5.30    −5.30
take plunge       5.30    −5.30
take root         5.30    −5.30
tell story        1.59    −1.59
write letter      −12.88  12.88
Table E.7 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter NoAdv. Construction
0
1
2
3
4
bear fruit beg question break ground break heart call police carry weight catch eye change hand close door cross finger cross mind deliver good do trick draw line fight battle fit bill follow suit foot bill get act together grit tooth have clue have laugh hold breath leave mark make face make headway make mark make point meet eye pave way play game scratch head see point take course take piss take plunge take root tell story write letter
30.94 27.54 −9.32 28.95 21.60 13.65 33.16 −5.61 4.64 28.94 16.89 10.02 37.60 −10.46 −6.46 22.26 53.16 53.01 33.66 28.94 23.12 29.03 44.93 −10.67 39.54 −9.31 −5.31 27.45 35.68 −41.65 26.57 36.60 14.08 32.09 −12.65 23.69 5.39 19.81 18.11
−18.56 −14.79 8.87 −21.38 −15.01 −1.60 −21.42 −0.08 0.40 −18.34 −9.43 −3.01 −24.30 20.22 7.51 −9.39 −38.56 −38.42 −20.47 −14.96 −13.26 −16.48 −30.68 19.75 −26.02 20.23 5.35 −17.73 −22.46 43.61 −13.70 −24.01 −4.16 −19.20 12.37 −10.83 −7.61 −11.60 −9.87
−10.38 −10.76 1.69 −6.65 −5.52 −10.05 −10.15 5.80 −3.41 −8.60 −5.45 −5.70 −11.95 −8.40 −0.62 −10.87 −12.60 −12.60 −11.19 −11.99 −8.72 −10.56 −12.26 −7.08 −11.52 −8.92 −0.86 −8.52 −11.23 −0.70 −10.87 −10.60 −7.92 −10.90 −1.03 −11.73 2.45 −7.40 −6.69
−1.91 −1.91 −1.16 −0.83 −0.99 −1.91 −1.50 −0.02 −1.55 −1.91 −1.91 −1.22 −1.27 −1.59 −0.87 −1.91 −1.91 −1.91 −1.91 −1.91 −1.05 −1.91 −1.91 −1.91 −1.91 −1.91 0.44 −1.21 −1.91 −1.17 −1.91 −1.91 −1.91 −1.91 0.57 −1.04 −0.14 −0.78 −1.47
−0.09 −1.11 −0.61 −0.75 −0.54 −0.31 −0.64 −0.20 −0.47 −0.12 −0.67 −0.71 −0.69 −0.32 0.20 −0.52 −0.86 −0.74 −0.92 −0.70 −0.61 −0.43 −1.02 −0.34 −0.69 −0.27 −0.27 −0.37 −0.10 −0.27 −0.37 −0.34 −1.00 −0.36 0.49 −0.83 −0.87 −0.83 −0.05
Table E.8 Flexibility values (in per cent) for the parameter levels of the lexico-syntactic flexibility parameter KindAdv. 0
1
2
3
4
5
6
7
8
bear fruit beg question break ground break heart call police carry weight catch eye change hand close door cross finger cross mind deliver good do trick draw line fight battle fit bill follow suit foot bill get act together grit tooth
30.94 27.54 −9.32 28.95 21.60 13.65 33.16 −5.61 4.64 28.94 16.89 10.02 37.60 −10.46 −6.46 22.26 53.16 53.01 33.66 28.94
1.11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
−17.39 −17.89 −9.48 −17.42 −7.43 −10.23 −12.60 −11.43 11.48 −13.84 −18.51 −10.92 −17.86 −9.15 −13.82 −17.64 −17.02 −17.59 −18.51 −15.46
0.68 −14.15 0.56 −2.47 0.32 −14.08 −6.01 32.60 −6.80 −11.99 18.30 1.94 −10.82 −13.41 0.16 −14.26 −15.25 −15.07 −6.13 −8.67
−14.72 −15.10 15.39 −11.00 −14.79 −13.76 −8.59 −14.58 −2.55 −10.28 −12.66 −10.73 −12.43 −10.17 13.79 5.47 −16.94 −16.02 −9.20 −3.53
1.40 3.71 14.25 0.20 −3.04 10.97 −3.04 −2.57 2.76 −1.71 −3.04 −1.66 −3.04 44.38 −3.04 −0.45 −3.04 −3.04 −1.63 −3.04
−11.24 −12.85 −5.95 −11.30 0.07 −9.01 −12.86 4.93 −11.65 1.87 −13.47 −5.19 −5.08 −9.27 9.45 −13.47 −13.47 −12.55 −12.76 −9.20
−1.76 15.54 3.15 3.62 −1.02 0.95 −1.44 6.10 −1.90 −2.20 5.70 8.86 0.36 −2.22 −1.83 6.62 −0.64 −1.95 0.65 −1.65
−3.13 −1.29 −1.62 1.20 −3.13 7.70 −2.52 −3.13 −2.28 −3.13 −2.41 −1.75 −3.13 −2.81 −2.61 −2.27 −3.13 −3.13 −1.02 −3.13
Construction
Table E.8 (Cont.) 0
1
2
3
4
5
6
7
8
have clue have laugh hold breath leave mark make face make headway make mark make point meet eye pave way play game scratch head see point take course take piss take plunge take root tell story write letter
23.12 29.03 44.93 −10.67 39.54 −9.31 −5.31 27.45 35.68 −41.65 26.57 36.60 14.08 32.09 −12.65 23.69 5.39 19.86 18.11
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
−18.51 −16.46 −17.82 39.43 −6.38 −8.21 17.64 −12.44 −15.49 −17.39 −17.47 −18.51 −18.51 −15.44 36.87 −7.20 19.55 −11.86 −15.29
−9.95 −11.90 −14.27 −13.92 −12.48 −1.28 −2.37 −7.63 −5.30 −14.13 −11.85 −11.99 −14.19 −9.86 −6.07 −6.42 6.14 −6.05 2.77
−16.51 −7.76 −15.57 −12.11 −15.32 −6.65 −0.98 −2.02 −10.37 −13.60 5.47 −13.94 −15.50 −10.14 −11.16 −12.59 −13.40 −4.58 −4.61
24.11 5.12 −2.70 −2.35 −2.77 20.49 −2.57 −1.95 −3.04 −3.04 −2.70 −2.04 35.81 −3.04 −1.39 −3.04 −2.16 5.92 −0.63
−12.17 −9.38 −6.27 −12.09 −11.85 −8.32 −9.71 −11.08 −11.55 78.73 −11.74 −2.47 −13.47 −8.02 −10.16 −6.51 −13.47 −8.37 −6.75
2.31 −0.83 −2.18 1.96 −2.87 0.07 −0.05 0.02 −2.59 −0.64 −0.45 0.13 2.17 0.87 7.05 −0.26 0.67 −0.70 −0.53
−2.70 −2.11 −2.44 −2.44 −3.13 0.55 −2.66 −2.93 −2.31 −3.13 −2.44 −2.13 −2.05 −1.43 −2.30 −1.39 −0.47 −2.72 −3.13
Rethinking Idiomaticity
Construction
205
Table E.9 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Person.
Construction | Infinitive | 1st pers. | 2nd pers. | 3rd pers. | Vocative | Other
bear fruit | 32.18 | −6.69 | −7.66 | 4.04 | −22.50 | 0.64
beg question | 4.45 | −4.24 | −8.16 | 32.53 | −22.50 | −2.08
break ground | 5.71 | −0.67 | −6.52 | 20.66 | −22.50 | 3.32
break heart | 1.37 | −2.91 | 4.74 | 14.97 | −22.50 | 4.33
call police | 8.49 | 3.46 | −5.08 | 7.86 | −16.35 | 1.61
carry weight | 10.02 | −6.05 | −8.14 | 26.82 | −22.50 | −0.15
catch eye | 3.79 | −3.23 | −7.96 | 25.87 | −21.89 | 3.42
change hand | −0.27 | −6.22 | −8.77 | 23.95 | −22.50 | 13.82
close door | 0.52 | 0.32 | −7.57 | 7.00 | −16.82 | 16.53
cross finger | −7.15 | −2.02 | −8.11 | −32.85 | −19.84 | 69.97
cross mind | −6.39 | −5.98 | −8.77 | 46.34 | −22.50 | −2.69
deliver good | 25.28 | −1.17 | −6.71 | 1.58 | −21.12 | 2.13
do trick | 3.15 | −6.04 | −7.48 | 35.58 | −22.50 | −2.69
draw line | 17.99 | 3.31 | −3.29 | −1.52 | −22.50 | 6.02
fight battle | 18.74 | −0.44 | −6.69 | 2.13 | −22.50 | 8.77
fit bill | 1.66 | −3.24 | −6.19 | 32.10 | −22.50 | −1.83
follow suit | 33.66 | −1.50 | −8.03 | −2.63 | −22.50 | 1.01
foot bill | 32.55 | −3.02 | −5.11 | −6.57 | −21.58 | 3.73
get act together | 7.67 | 13.03 | −0.32 | −3.63 | −18.28 | 1.53
grit tooth | −0.50 | 1.85 | −7.56 | 4.58 | −20.06 | 21.70
have clue | −7.82 | 45.90 | −1.88 | −11.87 | −22.50 | −1.83
have laugh | 14.63 | 16.78 | −0.61 | −22.95 | −17.40 | 9.55
hold breath | −5.08 | 9.75 | −7.41 | 9.78 | −13.26 | 6.21
leave mark | −1.61 | −4.62 | −7.40 | 36.76 | −22.50 | −0.62
make face | −3.78 | −4.26 | −7.43 | 25.03 | −16.03 | 6.47
make headway | 21.59 | −0.81 | −5.83 | 5.83 | −22.50 | 1.72
make mark | 30.68 | −2.93 | −7.84 | −4.57 | −22.03 | 6.70
make point | 13.18 | 13.31 | −2.71 | −2.76 | −22.20 | 1.19
meet eye | 8.89 | −4.77 | −8.23 | 12.59 | −22.50 | 14.02
pave way | 10.02 | −6.32 | −8.40 | 9.07 | −22.13 | 17.75
play game | 12.18 | 2.28 | −2.91 | −11.52 | −14.92 | 14.89
scratch head | −4.82 | −0.69 | −6.77 | 3.48 | −22.50 | 31.31
see point | 0.09 | 47.63 | −0.14 | −22.74 | −22.50 | −2.33
take course | 25.17 | −4.31 | −7.07 | 3.58 | −22.50 | 5.13
take piss | 2.10 | 9.01 | 1.97 | −6.89 | −20.85 | 14.66
take plunge | 20.88 | 2.01 | −4.43 | −7.17 | −19.89 | 8.61
take root | 12.53 | −5.80 | −7.89 | 22.82 | −22.50 | 0.85
tell story | 10.82 | 3.20 | −6.15 | 6.82 | −19.77 | 5.08
write letter | 7.00 | 6.89 | −4.83 | 4.61 | −18.78 | 5.12

Table E.10 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter NumV (0 = ‘singular’, 1 = ‘plural’, 2 = ‘nonfinite’).
Construction | 0 | 1 | 2
bear fruit | −28.52 | −3.19 | 31.71
beg question | 7.90 | −10.27 | 2.37
break ground | −11.70 | 2.66 | 9.04
break heart | 5.99 | −11.69 | 5.70
call police | −38.31 | 28.21 | 10.10
carry weight | −16.44 | 6.57 | 9.87
catch eye | −0.97 | −6.23 | 7.21
change hand | −18.55 | 5.00 | 13.54
close door | −4.26 | −12.80 | 17.06
cross finger | −51.18 | −11.64 | 62.82
cross mind | 19.58 | −10.49 | −9.08
deliver good | −29.78 | 2.36 | 27.42
do trick | 4.28 | −4.73 | 0.46
draw line | −23.79 | −0.22 | 24.00
fight battle | −26.54 | −0.97 | 27.51
fit bill | 4.87 | −4.71 | −0.17
follow suit | −38.52 | 3.84 | 34.67
foot bill | −36.07 | −0.21 | 36.28
get act together | −24.67 | 15.46 | 9.21
grit tooth | −8.44 | −12.76 | 21.19
have clue | 4.44 | 5.21 | −9.65
have laugh | −40.22 | 16.04 | 24.18
hold breath | 3.15 | −4.28 | 1.13
leave mark | −15.30 | 17.54 | −2.24
make face | 13.59 | −16.29 | 2.69
make headway | −24.79 | 0.75 | 24.05
make mark | −35.23 | −2.14 | 37.37
make point | −6.68 | −7.69 | 14.36
meet eye | −8.01 | −14.90 | 22.91
pave way | −25.38 | −2.40 | 27.78
play game | −28.06 | 0.98 | 27.07
scratch head | −14.85 | −11.64 | 26.49
see point | 4.77 | −2.53 | −2.24
take course | −21.51 | −8.79 | 30.30
take piss | −16.48 | −0.28 | 16.76
take plunge | −31.85 | 2.36 | 29.49
take root | −15.21 | 1.83 | 13.38
tell story | −12.43 | −3.53 | 15.95
write letter | −8.64 | −3.48 | 12.12

Table E.11 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Aspect (0 = ‘simple’, 1 = ‘progressive’, 2 = ‘perfective’).
Construction | 0 | 1 | 2
bear fruit | 26.36 | 12.74 | −39.10
beg question | 33.09 | 4.16 | −37.26
break ground | −3.98 | 36.31 | −32.33
break heart | 18.73 | 7.93 | −26.66
call police | −1.71 | −1.35 | 3.06
carry weight | 32.03 | 2.60 | −34.64
catch eye | 32.32 | −0.76 | −31.56
change hand | 3.57 | 17.13 | −20.70
close door | 12.58 | 15.52 | −28.09
cross finger | −38.53 | 8.29 | 30.24
cross mind | 4.14 | −4.32 | 0.19
deliver good | 22.76 | 9.44 | −32.20
do trick | 27.36 | −3.10 | −24.26
draw line | 6.72 | 4.64 | −11.35
fight battle | −2.74 | 35.59 | −32.85
fit bill | 39.83 | −4.18 | −35.65
follow suit | 40.43 | −1.34 | −39.10
foot bill | 23.03 | 14.23 | −37.26
get act together | 23.01 | 5.52 | −28.53
grit tooth | 15.48 | 19.96 | −35.44
have clue | 41.12 | −4.18 | −36.94
have laugh | 10.46 | 19.45 | −29.91
hold breath | 14.68 | 22.02 | −36.70
leave mark | −2.76 | −0.21 | 2.97
make face | 32.28 | 6.28 | −38.56
make headway | 2.96 | 15.55 | −18.51
make mark | 8.92 | 13.74 | −22.66
make point | 16.87 | 4.12 | −20.99
meet eye | 26.60 | 11.67 | −38.27
pave way | 8.08 | 19.50 | −27.57
play game | −8.28 | 39.10 | −30.82
scratch head | 2.14 | 33.96 | −36.10
see point | 43.42 | −4.68 | −38.74
take course | 11.48 | 4.82 | −16.31
take piss | −18.67 | 51.99 | −33.31
take plunge | 20.66 | 6.27 | −26.92
take root | 8.74 | 3.81 | −12.55
tell story | 17.82 | 5.62 | −23.44
write letter | 5.60 | 5.76 | −11.36

Table E.12 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Tense.
Construction | Past | Present | Future | Nonfinite
bear fruit | −21.53 | −24.17 | 11.77 | 33.93
beg question | −22.30 | 19.65 | 0.28 | 2.37
break ground | 7.11 | −19.84 | 3.70 | 9.04
break heart | 15.64 | −29.51 | 8.71 | 5.16
call police | 32.18 | −44.10 | 1.82 | 10.10
carry weight | −6.23 | −11.63 | 7.99 | 9.87
catch eye | 41.23 | −47.49 | −0.34 | 6.60
change hand | 39.59 | −53.46 | 0.32 | 13.54
close door | 34.36 | −50.58 | −0.84 | 17.06
cross finger | −11.98 | −50.61 | −0.23 | 62.82
cross mind | 64.74 | −54.80 | −0.85 | −9.08
deliver good | −0.46 | −30.91 | 2.57 | 28.80
do trick | 16.60 | −27.11 | 11.98 | −1.48
draw line | −5.65 | −26.14 | −1.56 | 33.36
fight battle | 13.09 | −40.07 | 0 | 26.99
fit bill | 1.61 | −5.05 | 3.61 | −0.17
follow suit | −0.79 | −42.69 | 9.55 | 33.93
foot bill | −16.80 | −20.66 | 6.69 | 30.77
get act together | −4.85 | −2.79 | 2.66 | 4.98
grit tooth | 31.34 | −52.19 | −0.34 | 21.19
have clue | 16.26 | −5.91 | −0.70 | −9.65
have laugh | −12.71 | −18.07 | 7.62 | 23.16
hold breath | 41.49 | −41.74 | 0.15 | 0.10
leave mark | 42.30 | −42.64 | 2.57 | −2.24
make face | 47.07 | −49.01 | −1.02 | 2.96
make headway | 18.88 | −45.03 | −0.09 | 26.25
make mark | 14.87 | −52.09 | −0.62 | 37.84
make point | 13.23 | −28.02 | 0.43 | 14.36
meet eye | 22.52 | −44.41 | −1.02 | 22.91
pave way | 8.97 | −41.13 | 3.64 | 28.52
play game | −4.60 | −23.67 | 0.51 | 27.76
scratch head | 27.02 | −53.95 | −0.56 | 27.49
see point | 2.44 | −0.08 | −0.13 | −2.24
take course | 12.46 | −43.92 | 0.82 | 30.64
take piss | 3.77 | −21.45 | 0.92 | 16.76
take plunge | 12.28 | −40.21 | −1.56 | 29.49
take root | 34.20 | −46.02 | 0.21 | 11.61
tell story | 4.09 | −22.76 | 2.71 | 15.95
write letter | 34.46 | −46.76 | 0.26 | 12.04

Table E.13 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Mood (0 = ‘indicative’, 1 = ‘subjunctive’, 2 = ‘nonfinite’).
Construction | 0 | 1 | 2
bear fruit | −37.63 | 7.03 | 30.60
beg question | 0.49 | −2.86 | 2.37
break ground | 5.57 | −2.58 | −2.99
break heart | −8.11 | 10.51 | −2.40
call police | −10.94 | 0.84 | 10.10
carry weight | −14.07 | 4.20 | 9.87
catch eye | −3.53 | −3.07 | 6.60
change hand | −13.23 | −0.31 | 13.54
close door | −8.26 | −3.48 | 11.74
cross finger | −58.74 | −4.08 | 62.82
cross mind | 7.45 | 1.63 | −9.08
deliver good | −24.71 | −3.39 | 28.11
do trick | −20.24 | 21.72 | −1.48
draw line | −27.66 | 3.66 | 24.00
fight battle | −23.95 | −3.04 | 26.99
fit bill | −14.71 | 14.88 | −0.17
follow suit | −35.77 | 5.55 | 30.23
foot bill | −32.19 | 2.34 | 29.85
get act together | −10.05 | 5.07 | 4.98
grit tooth | −18.33 | −2.86 | 21.19
have clue | 3.39 | 6.26 | −9.65
have laugh | −25.20 | 2.04 | 23.16
hold breath | 7.75 | 0.03 | −7.77
leave mark | 2.87 | −0.64 | −2.24
make face | 1.12 | 9.39 | −10.51
make headway | −25.85 | 1.80 | 24.05
make mark | −36.58 | −0.80 | 37.37
make point | −9.58 | −0.90 | 10.48
meet eye | −19.38 | −3.54 | 22.91
pave way | −36.33 | 7.81 | 28.52
play game | −25.40 | −2.36 | 27.76
scratch head | −23.40 | −4.08 | 27.49
see point | 5.96 | −3.36 | −2.60
take course | −30.98 | 0.34 | 30.64
take piss | −13.50 | −3.26 | 16.76
take plunge | −25.40 | −4.08 | 29.49
take root | −7.53 | −4.08 | 11.61
tell story | −15.37 | −0.48 | 15.85
write letter | −9.27 | −2.26 | 11.53

Table E.14 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Voice.
Construction | Active | Passive
bear fruit | 35.97 | −35.97
beg question | 34.13 | −34.13
break ground | 32.21 | −32.21
break heart | 34.35 | −34.35
call police | −1.57 | 1.57
carry weight | 34.06 | −34.06
catch eye | 30.47 | −30.47
change hand | 35.97 | −35.97
close door | 30.89 | −30.89
cross finger | 31.30 | −31.30
cross mind | 35.97 | −35.97
deliver good | 35.97 | −35.97
do trick | 34.68 | −34.68
draw line | 10.48 | −10.48
fight battle | 35.97 | −35.97
fit bill | 34.24 | −34.24
follow suit | 35.97 | −35.97
foot bill | 35.97 | −35.97
get act together | 32.45 | −32.45
grit tooth | 35.97 | −35.97
have clue | 35.97 | −35.97
have laugh | 35.97 | −35.97
hold breath | 33.57 | −33.57
leave mark | 34.59 | −34.59
make face | 35.70 | −35.70
make headway | 32.29 | −32.29
make mark | 35.97 | −35.97
make point | 25.02 | −25.02
meet eye | 35.97 | −35.97
pave way | 34.11 | −34.11
play game | 30.45 | −30.45
scratch head | 35.97 | −35.97
see point | 35.61 | −35.61
take course | 28.83 | −28.83
take piss | 34.32 | −34.32
take plunge | 35.10 | −35.10
take root | 35.97 | −35.97
tell story | 24.90 | −24.90
write letter | 18.30 | −18.30

Table E.15 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Neg.
Construction | Absent | Present
bear fruit | 8.36 | −8.36
beg question | 13.19 | −13.19
break ground | 12.77 | −12.77
break heart | 10.17 | −10.17
call police | 10.11 | −10.11
carry weight | 8.66 | −8.66
catch eye | 13.20 | −13.20
change hand | 10.31 | −10.31
close door | 12.97 | −12.97
cross finger | 13.70 | −13.70
cross mind | −20.68 | 20.68
deliver good | −1.52 | 1.52
do trick | 9.22 | −9.22
draw line | 11.16 | −11.16
fight battle | 13.47 | −13.47
fit bill | 7.27 | −7.27
follow suit | 12.81 | −12.81
foot bill | 14.11 | −14.11
get act together | 1.65 | −1.65
grit tooth | 15.03 | −15.03
have clue | −70.31 | 70.31
have laugh | 12.99 | −12.99
hold breath | 6.47 | −6.47
leave mark | 11.58 | −11.58
make face | 12.33 | −12.33
make headway | −6.29 | 6.29
make mark | 11.74 | −11.74
make point | 13.64 | −13.64
meet eye | 7.63 | −7.63
pave way | 15.03 | −15.03
play game | −0.14 | 0.14
scratch head | 15.03 | −15.03
see point | −41.80 | 41.80
take course | 12.99 | −12.99
take piss | 6.77 | −6.77
take plunge | 13.29 | −13.29
take root | 4.41 | −4.41
tell story | 11.99 | −11.99
write letter | 13.50 | −13.50

Table E.16 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter NumNP.
Construction | Singular | Plural
bear fruit | 25.93 | −25.93
beg question | 4.22 | −4.22
break ground | 28.15 | −28.15
break heart | 12.47 | −12.47
call police | −71.24 | 71.24
carry weight | 28.15 | −28.15
catch eye | 21.02 | −21.02
change hand | −71.85 | 71.85
close door | 20.17 | −20.17
cross finger | −71.85 | 71.85
cross mind | 21.01 | −21.01
deliver good | −71.85 | 71.85
do trick | 28.15 | −28.15
draw line | 13.63 | −13.63
fight battle | −1.02 | 1.02
fit bill | 28.15 | −28.15
follow suit | 27.41 | −27.41
foot bill | 25.40 | −25.40
get act together | 27.45 | −27.45
grit tooth | −71.85 | 71.85
have clue | 19.10 | −19.10
have laugh | 24.07 | −24.07
hold breath | 27.46 | −27.46
leave mark | 22.63 | −22.63
make face | 15.21 | −15.21
make headway | 28.15 | −28.15
make mark | 27.68 | −27.68
make point | 13.92 | −13.92
meet eye | −49.11 | 49.11
pave way | 28.15 | −28.15
play game | −10.47 | 10.47
scratch head | 10.15 | −10.15
see point | 27.43 | −27.43
take course | 25.77 | −25.77
take piss | 28.15 | −28.15
take plunge | 28.15 | −28.15
take root | 28.15 | −28.15
tell story | 6.63 | −6.63
write letter | −8.86 | 8.86

Table E.17 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Det.
Construction | Zero | 1 | 2 | 3 | 4 | 5
bear fruit | 73.19 | −56.30 | 8.23 | −3.56 | −1.48 | −20.07
beg question | −1.67 | 11.15 | 2.14 | −3.56 | 0.36 | −8.41
break ground | 86.45 | −60.36 | −2.49 | −2.06 | −1.48 | −20.07
break heart | 5.56 | 21.38 | −2.37 | −3.56 | −1.48 | −19.53
call police | 18.66 | 9.53 | −4.00 | −3.56 | −1.48 | −19.15
carry weight | 54.02 | −42.11 | 3.01 | −1.01 | −1.48 | −12.43
catch eye | 90.56 | −61.45 | −4.00 | −3.56 | −1.48 | −20.07
change hand | 90.96 | −61.86 | −4.00 | −3.56 | −1.48 | −20.07
close door | −8.19 | 35.84 | −4.00 | −3.08 | −1.36 | −19.22
cross finger | 11.63 | 17.47 | −4.00 | −3.56 | −1.48 | −20.07
cross mind | −3.32 | 32.43 | −4.00 | −3.56 | −1.48 | −20.07
deliver good | −7.66 | 36.07 | −3.31 | −3.56 | −1.48 | −20.07
do trick | 90.32 | −61.86 | −4.00 | −2.92 | −1.48 | −20.07
draw line | −2.26 | −2.18 | −2.71 | −0.98 | −1.48 | 9.61
fight battle | 9.19 | −41.03 | −2.95 | −2.00 | −1.48 | 38.26
fit bill | −7.31 | 28.66 | −3.13 | 3.33 | −1.48 | −20.07
follow suit | 90.22 | −61.86 | −4.00 | −3.56 | −1.48 | −19.33
foot bill | 85.46 | −61.86 | −4.00 | −3.56 | −1.48 | −14.56
get act together | −9.04 | 38.14 | −4.00 | −3.56 | −1.48 | −20.07
grit tooth | −7.82 | 36.92 | −4.00 | −3.56 | −1.48 | −20.07
have clue | −5.59 | −51.51 | −0.12 | −3.56 | −1.48 | 62.26
have laugh | −6.99 | −42.47 | −2.98 | −3.56 | −1.48 | 57.48
hold breath | −6.64 | 35.40 | −4.00 | −3.56 | −1.48 | −19.73
leave mark | −6.97 | 17.45 | −4.00 | −3.56 | −1.48 | −1.45
make face | 2.82 | −59.70 | −4.00 | −2.21 | −1.21 | 64.30
make headway | 70.38 | −52.30 | 5.56 | −3.56 | −1.48 | −18.60
make mark | 73.12 | −61.86 | −4.00 | −3.56 | −1.48 | −2.23
make point | −6.15 | 6.00 | −2.90 | 5.69 | 0.41 | −3.05
meet eye | 8.22 | 19.78 | −4.00 | −3.56 | −0.38 | −20.07
pave way | −7.92 | 37.03 | −4.00 | −3.56 | −1.48 | −20.07
play game | 24.41 | −20.82 | −2.27 | 0.92 | −0.10 | −2.14
scratch head | 89.96 | −61.86 | −4.00 | −3.56 | −0.48 | −20.07
see point | 0.32 | 19.44 | 5.00 | −3.56 | −1.48 | −19.71
take course | −2.57 | 10.25 | −3.32 | 1.88 | −1.14 | −5.10
take piss | −9.04 | 37.31 | −4.00 | −2.74 | −1.48 | −20.07
take plunge | 88.36 | −61.86 | −4.00 | −3.56 | −1.48 | −17.46
take root | 90.96 | −61.86 | −4.00 | −3.56 | −1.48 | −20.07
tell story | 7.70 | −5.47 | −3.28 | −0.78 | −1.01 | 2.85
write letter | 19.72 | −44.49 | −2.83 | 2.06 | −0.38 | 25.92

Table E.18 Flexibility values (in per cent) for the parameter levels of the morphological flexibility parameter Gerund.
Construction | Absent | Present
bear fruit | −0.64 | 0.64
beg question | 2.08 | −2.08
break ground | −3.32 | 3.32
break heart | −2.17 | 2.17
call police | −0.076 | 0.076
carry weight | 0.15 | −0.15
catch eye | −2.81 | 2.81
change hand | −13.34 | 13.34
close door | −16.41 | 16.41
cross finger | −7.31 | 7.31
cross mind | 2.69 | −2.69
deliver good | −2.13 | 2.13
do trick | 2.69 | −2.69
draw line | −6.34 | 6.34
fight battle | −0.95 | 0.95
fit bill | 1.83 | −1.83
follow suit | −1.01 | 1.01
foot bill | −3.73 | 3.73
get act together | −1.53 | 1.53
grit tooth | 0.86 | −0.86
have clue | 2.26 | −2.26
have laugh | −9.55 | 9.55
hold breath | −6.21 | 6.21
leave mark | 0.62 | −0.62
make face | −6.47 | 6.47
make headway | −1.72 | 1.72
make mark | −6.70 | 6.70
make point | −1.19 | 1.19
meet eye | −14.02 | 14.02
pave way | −17.75 | 17.75
play game | −14.89 | 14.89
scratch head | −9.31 | 9.31
see point | 2.33 | −2.33
take course | −3.77 | 3.77
take piss | −14.66 | 14.66
take plunge | −6.87 | 6.87
take root | −0.85 | 0.85
tell story | −4.77 | 4.77
write letter | −4.97 | 4.97

F. An overview of the formal flexibility parameters and their parameter levels with corresponding examples
Table F.1 An overview of the formal flexibility parameters and their parameter levels with corresponding examples.
Parameter | Parameter levels | Example | Level
SF | declarative active | She told the story. | nominal
SF | declarative passive | The story has been told countless times. | nominal
SF | relative cl. active | Mary told the story she had written. | nominal
SF | particle cl. active | There are many more stories to tell. | nominal
SF | relative cl. passive | They agreed that this is a story that needs to be told. | nominal
SF | particle cl. passive | Inevitably there is a little story to be told about this. | nominal
SF | interrogative active | I mean, didn’t you have a good story to tell? | nominal
SF | imperative active | Don’t tell me that story! | nominal
SF | interrogative passive | Can a new story be told here? | nominal
SF | imperative passive | Ruth heard Adam teeth grit at the memory. | nominal
LF Add | any absent/present | It is a story never to be told in full. | nominal
LF AttrAdj | attr. adjective for NP | He would tell the rudest stories out loud ... | interval
LF AttrNP | attr. noun for NP | I will tell you a horror story. | interval
LF PP | post-mod PP for NP | The runner has gone into it and told the story of the battle. | interval
LF RelCl | post-mod. relative cl. | A story was told whose smear value demands immediate publication. | interval
LF NoAdv | adverb(ial)s | Now it is important for me to tell the story correctly. | interval
LF KindAdv | no adverb(ial) | I will tell you a story. | nominal
LF KindAdv | adverb modifying N | I could tell quite a story. | nominal
LF KindAdv | space adverbial | People tell me stories on the doorsteps. | nominal
LF KindAdv | time adverbial | The story has often been told. | nominal
LF KindAdv | process adverbial | The story is told in seven episodes each covering a day. | nominal
LF KindAdv | respect adverbial | He told them a story about his own children when they were very young. | nominal
LF KindAdv | contingency adverbial | I tell this story for two reasons. | nominal
LF KindAdv | modality adverbial | She really is telling this story. | nominal
LF KindAdv | degree adverbial | . . . tell the story of the Poison Feast from the Drachenfels novel in full . . . | nominal
MF Person | infinitive | In the city nobody was allowed to tell stories. | nominal
MF Person | first | I will tell you a funny story about that. | nominal
MF Person | second | Well, you sometimes tell stories for fun, don’t you? | nominal
MF Person | third | She may not be brilliant but she tells a straightforward story. | nominal
MF Person | vocative | Come on, Dandelion, tell us a story. | nominal
MF Person | other | The act of telling one’s life story is an encounter with reality. | nominal
MF NumV | singular | But even he may not tell the whole story. | nominal
MF NumV | plural | Cumulatively, these studies are telling a similar story. | nominal
MF NumV | nonfinite | No, I am not here to tell you a story. | nominal
MF Tense | past | She told us a terrible story. | nominal
MF Tense | present | Tell your story based on the poem. | nominal
MF Tense | future | I am going to tell you a story. | nominal
MF Tense | nonfinite | He loved telling the story. | nominal
MF Aspect | simple | Even the abrupt and anticlimatic conclusion tells a story. | nominal
MF Aspect | progressive | He gave me an odd look as if I was telling strange stories. | nominal
MF Aspect | perfective | I have also been told the same story time and time again. | nominal
MF Mood | indicative | I tell this story for two reasons. | nominal
MF Mood | subjunctive | We ought to tell the whole story to Zacco. | nominal
MF Mood | nonfinite | This is the most difficult part of my story for me to tell. | nominal
MF Voice | active | I told him your cover story and he swallowed it quite happily. | nominal
MF Voice | passive | Penny’s story was bravely told. | nominal
MF Neg | absent/present | But even he may not tell the whole story. | nominal
MF Det | the/my/no/what | She tells their story well enough. | nominal
MF Det | some/any/enough | He told some funny stories. | nominal
MF Det | this/that | I never told my wife this story. | nominal
MF Det | these/those | Barbara must not be permitted to tell these stories. | nominal
MF Det | a[n]/each/every/[n]either | Daddy is going to tell you a bedtime story. | nominal
MF NumNP | singular | Then you can tell this story until you die, brother. | nominal
MF NumNP | plural | Then he told them terrible stories of his wild and criminal life at sea. | nominal
MF Gerund | absent/present | I shall summarize . . . by telling a particular story about beavers. | nominal

G. Scatterplots displaying the correlation between the two flexibility measures (Barkema-based = ‘B’, entropy = ‘H’).
Figure 2.1
Figure 2.2
Figure 2.3
Figure 2.4
Figure 2.5
Figure 2.6
Figure 2.7
Figure 2.8
Figure 2.9
Figure 2.10
Figure 2.11
Figure 2.12
Figure 2.13
Figure 2.14
Figure 2.15
Figure 2.16
Figure 2.17
Figure 2.18

References
Abeillé, Anne. 1995. The flexibility of French idioms: a representation with lexicalised tree adjoining grammar, in Markus Everaert, Erik-Jan van der Linden, André Schenk and Rob Schreuder (eds), Idioms: Structural and Psychological Perspectives. Hillsdale: Erlbaum, 15–42.
Abeillé, Anne and Yves Schabes. 1996. Non-compositional discontinuous constituents in Tree Adjoining Grammar, in Harry Bunt and Arthur van Horck (eds), Discontinuous Constituency. Berlin/New York: Mouton de Gruyter, 113–40.
Aijmer, Karin. 1996. Conversational Routines in English: Convention and Creativity. London: Longman.
Alexander, Richard J. 1978. Fixed expressions in English: a linguistic, psycholinguistic, sociolinguistic and didactic study (Part 1). Anglistik und Englischunterricht 6: 171–88.
Alexander, Richard J. 1984. Fixed expressions in English: reference books and the teacher. English Language Teaching Journal 38(2): 127–32.
Arppe, Antti and Juhani Järvikivi. 2007. Every method counts – combining corpus-based and experimental evidence in the study of synonymy. Corpus Linguistics and Linguistic Theory 3(2): 131–59.
Baldwin, Timothy, Takaaki Tanaka and Dominic Widdows. 2003. An empirical model of multiword expression decomposability. Proceedings of the ACL-Workshop on Multiword Expressions: Analysis, Acquisition, and Treatment, 89–96.
Bannard, Colin. 2005. Learning about the meaning of verb particle constructions from corpora. Journal of Computer Speech and Language 19(4): 467–78.
Bannard, Colin, Timothy Baldwin and Alex Lascarides. 2003. A statistical approach to the semantics of verb-particles. Proceedings of the ACL-Workshop on Multiword Expressions: Analysis, Acquisition, and Treatment, 65–72.
Bard, Ellen G., Dan Robertson and Antonella Sorace. 1996. Magnitude estimation of linguistic acceptability. Language 72(1): 32–68.
Barkema, Henk. 1994a. Determining the syntactic flexibility of idioms, in Udo Fries, Gunnel Tottie and Peter Schneider (eds), Creating and Using English Language Corpora. Amsterdam: Rodopi, 39–52.
Barkema, Henk. 1994b. The idiomatic, syntactic and collocational characteristics of received NPs: some basic statistics. Hermes Journal of Linguistics 13: 19–40.
Barkema, Henk. 1996a. Idiomaticity and terminology: a multi-dimensional descriptive model. Studia Linguistica 50(2): 125–60.
Barkema, Henk. 1996b. The effect of inherent and contextual factors on the grammatical flexibility of idioms, in Ian Lancashire, Charles F. Meyer and Carol Percy (eds), Synchronic Corpus Linguistics. Papers from the 16th International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi, 69–83.
Barsalou, Lawrence W. 1992. Cognitive Psychology: An Overview for Cognitive Scientists. Hillsdale, NJ: Erlbaum.
Berg, Thomas and Sabine Helmer. 2006. Determinants of compound use: a cross-linguistic comparison. Unpublished manuscript.
Bergen, Benjamin K. and Nancy Chang. 2005. Embodied construction grammar in simulation-based language understanding, in Jan-Ola Östman and Mirjam Fried (eds), Construction Grammar: Cognitive Grounding and Theoretical Extensions. Amsterdam/Philadelphia: John Benjamins, 147–90.
Berry-Rogghe, Godelieve L. M. 1974. Automatic identification of phrasal verbs, in Mitchell (ed.), Computers in the Humanities. Edinburgh: Edinburgh University Press, 16–26.
Biber, Douglas. 1993. The multi-dimensional approach to linguistic analyses of genre variation: an overview of methodology and findings. Computers and the Humanities 26: 331–45.
Bobrow, Samuel A. and Susan M. Bell. 1973. On catching on to idiomatic expressions. Memory and Cognition 1: 343–46.
Bolinger, Dwight. 1977. Idioms have relations. Forum Linguisticum 2(2): 157–69.
Bortz, Jürgen. 2005. Statistik für Human- und Sozialwissenschaftler, 6th ed. Heidelberg: Springer.
Bybee, Joan. 1985. Morphology: A Study of the Relation between Meaning and Form. Amsterdam/Philadelphia: John Benjamins.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Bybee, Joan. 2006. From usage to grammar: the mind’s response to repetition. Language 82(4): 711–33.
Bybee, Joan and Paul Hopper (eds). 2001. Frequency and the Emergence of Linguistic Structure. Amsterdam/Philadelphia: John Benjamins.
Cacciari, Cristina and Sam Glucksberg. 1991. Understanding idiomatic expressions: the contribution of word meanings, in Greg B. Simpson (ed.), Understanding Word and Sentence. The Hague: North Holland, 217–40.
Cacciari, Cristina and Patrizia Tabossi. 1988. The comprehension of idioms. Journal of Memory and Language 27: 668–83.
Carter, Ronald. 1987. Vocabulary: Applied Linguistics Perspectives. London: Routledge.
Chafe, Wallace. 1968. Idiomaticity as an anomaly in the Chomskyan paradigm. Foundations of Language 4: 109–27.
Chomsky, Noam. 1980. Rules and Representations. New York: Columbia University Press.
Collins Cobuild Dictionary of Idioms. 2002. 2nd ed. London: Harper Collins.
Coulmas, Florian. 1981. Conversational Routine: Explorations in Standardized Communication Situations and Prepatterned Speech. The Hague: Mouton de Gruyter.
Cowie, Anthony P. 1988. Stable and creative aspects of vocabulary use, in Ronald Carter and Michael J. McCarthy (eds), Vocabulary and Language Teaching. London/New York: Longman, 126–39.
Cowie, Anthony P. and Peter Howarth. 1996. Phraseology – a select bibliography. International Journal of Lexicography 9(1): 38–51.
Cowie, Anthony P. and Ronald Mackin. 1975. Oxford Dictionary of Current Idiomatic English I (retitled Oxford Dictionary of Phrasal Verbs 1993). Oxford: Oxford University Press.
Cowie, Anthony P., Ronald Mackin and Isabel R. McCaig. 1983. Oxford Dictionary of Current Idiomatic English II (retitled Oxford Dictionary of English Idioms 1993). Oxford: Oxford University Press.
Croft, William. 2001. Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Croft, William and D. Alan Cruse. 2004. Cognitive Linguistics. Cambridge: Cambridge University Press.
Cronbach, Lee J. 1951. Coefficient alpha and the internal structure of tests. Psychometrika 16: 297–334.
Cronk, Brian C. and Wendy A. Schweigert. 1992. The comprehension of idioms: the effects of familiarity, literalness, and usage. Applied Psycholinguistics 13(2): 131–46.
Crystal, David. 1996. Rediscover Grammar. London: Longman.
Culicover, Peter W. 1999. Syntactic Nuts: Hard Cases, Syntactic Theory, and Language Acquisition. Oxford: Oxford University Press.
Cutler, Anne. 1982. Idioms: the older the colder. Linguistic Inquiry 13(2): 317–20.
Diessel, Holger. 2004. The Acquisition of Complex Sentences. Cambridge: Cambridge University Press.
Diessel, Holger and Michael Tomasello. 2005. Particle placement in early child language. Corpus Linguistics and Linguistic Theory 1: 89–111.
Dong, Quang Phuc. 1971. The applicability of transformation to idioms. Papers from the Seventh Regional Meeting of the Chicago Linguistic Society, 198–205.
Drew, Paul and Elizabeth Holt. 1995. Idiomatic expressions and their role in the organization of topic transition in conversation, in Markus Everaert, Erik-Jan van der Linden, André Schenk and Rob Schreuder (eds), Idioms: Structural and Psychological Perspectives. Hillsdale, NJ: Lawrence Erlbaum, 117–32.
Ellis, Rod. 1984. Formulaic speech in early classroom second language development, in Jean Handscombe, Richard A. Orem and Barry P. Taylor (eds), On TESOL ’83. Washington, DC: TESOL, 53–65.
Evans, Vyvyan, Benjamin K. Bergen and Jörg Zinken. 2007. The cognitive linguistics enterprise: an overview, in Vyvyan Evans, Benjamin K. Bergen and Jörg Zinken (eds), The Cognitive Linguistics Reader. London: Equinox.
Evert, Stefan. 2005. The statistics of word cooccurrences: word pairs and collocations. PhD dissertation, University of Stuttgart, 12 December 2006 http://www.elib.uni-stuttgart.de/opus/volltexte/2005/2371/.
Fernando, Chitra. 1996. Idioms and Idiomaticity. Oxford: Oxford University Press.
Fernando, Chitra and Roger Flavell. 1981. On Idiom: Critical Views and Perspectives (Exeter Linguistic Studies 5). Exeter: University of Exeter.
Fillmore, Charles. Idiomaticity. University of California at Berkeley, 12 August 2006 http://www.icsi.berkeley.edu/~kay/bcg/lec02.html.
Fillmore, Charles, Paul Kay and Mary C. O’Connor. 1988. Regularity and idiomaticity in grammatical constructions. Language 64(3): 501–38.
Fraser, Bruce. 1966. Some remarks on the verb-particle construction in English, in Francis P. Dinneen (ed.), Problems in Semantics, History of Linguistics, Linguistics and English. Washington, DC: Georgetown University Press.
Fraser, Bruce. 1970. Idioms within a transformational grammar. Foundations of Language 6(1): 22–42.
Gazdar, Gerald, Ewan Klein, Geoffrey K. Pullum and Ivan Sag. 1985. Generalized Phrase Structure Grammar. Oxford: Blackwell.
Gibbs, Raymond W. 1980. Spilling the beans on understanding and memory for idioms in conversation. Memory and Cognition 8: 449–56.
Gibbs, Raymond W. 1986. Skating on thin ice: literal meaning and understanding idioms in conversation. Discourse Processes 9(1): 17–30.
Gibbs, Raymond W. 1992. What do idioms really mean? Journal of Memory and Language 31(4): 485–506.
Gibbs, Raymond W. 1993. Why idioms are not dead metaphors, in Cristina Cacciari and Patrizia Tabossi (eds), Idioms. Processing, Structure, and Interpretation. Hillsdale, NJ: Lawrence Erlbaum, 57–77.
Gibbs, Raymond W. 1994. The Poetics of Mind. Cambridge: Cambridge University Press.
Gibbs, Raymond W. 1995. Idiomaticity and human cognition, in Markus Everaert, Erik-Jan van der Linden, André Schenk and Rob Schreuder (eds), Idioms: Structural and Psychological Perspectives. Hillsdale, NJ: Lawrence Erlbaum, 97–116.
Gibbs, Raymond W. and Gayle P. Gonzales. 1985. Syntactic frozenness in processing and remembering idioms. Cognition 20(3): 243–59.
Gibbs, Raymond W. and Nandini Nayak. 1989. Psycholinguistic studies on the syntactic behavior of idioms. Cognitive Psychology 21(1): 100–38.
Gibbs, Raymond W. and Nandini Nayak. 1991. Why idioms mean what they do. Journal of Experimental Psychology: General 120(1): 93–95.
Gibbs, Raymond W., Nandini Nayak, John Bolton and Melissa Keppel. 1989. Speakers’ assumptions about the lexical flexibility of idioms. Memory and Cognition 17(1): 58–68.
Gibbs, Raymond W. and Jennifer O’Brien. 1990. Idioms and mental imagery: the metaphorical motivation for idiomatic meaning. Cognition 36(1): 35–68.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar Approach to Argument Structure. Chicago: Chicago University Press.
Goldberg, Adele E. 2006. Constructions at Work: The Nature of Generalization in Language. Oxford: Oxford University Press.
Grant, Lynn and Laurie Bauer. 2004. Criteria for re-defining idioms: are we barking up the wrong tree? Applied Linguistics 25(1): 38–61.
Greenbaum, Sidney. 1976. Syntactic Frequency and Acceptability. Lingua 40(2/3): 99–113.
Gries, Stefan Th. 2003. Multifactorial Analysis in Corpus Linguistics: A Case Study of Particle Placement. London/New York: Continuum.
Gries, Stefan Th. 2008. Phraseology and linguistic theory: a brief survey, in Sylviane Granger and Fanny Meunier (eds), Phraseology: An Interdisciplinary Perspective. Amsterdam/Philadelphia: John Benjamins, 3–25.
Gries, Stefan Th., Beate Hampe and Doris Schönefeld. 2005. Converging evidence: bringing together experimental and corpus data on the association of verbs and constructions. Cognitive Linguistics 16(4): 635–76.
Gries, Stefan Th. and Anatol Stefanowitsch. 2004. Extending collostructional analysis: a corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics 9(1): 97–129.
Gries, Stefan Th. and Stefanie Wulff. 2005. Do foreign language learners also have constructions? Evidence from priming, sorting, and corpora. Annual Review of Cognitive Linguistics 3: 182–200.
Hakuta, Kenji. 1986. Mirror of Language: The Debate on Bilingualism. Cambridge: Cambridge University Press.
Hamblin, Jennifer L. and Raymond W. Gibbs. 1999. Why you can’t kick the bucket as you slowly die: verbs in idiom comprehension. Journal of Psycholinguistic Research 28(1): 25–39.
Harris, Catherine. 1998. Psycholinguistic studies of entrenchment, in Jean-Pierre Koenig (ed.), Conceptual Structures, Language, and Discourse. Berkeley, CA: CSLI.
Howarth, Peter. 1998. Phraseology and second language proficiency. Applied Linguistics 19(1): 24–44.
Huddleston, Rodney and Geoffrey K. Pullum. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Hunston, Susan and Gill Francis. 2000. Pattern Grammar. A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam/Philadelphia: John Benjamins.
Hymes, Dell. 1962. The ethnography of speaking, in Thomas Gladwin and William C. Sturtevant (eds), Anthropology and Human Behavior. Washington, DC: The Anthropological Society of Washington.
Jackendoff, Ray. 1997. Twistin’ the night away. Language 73(3): 543–59.
Jurafsky, Daniel. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science 20(2): 137–94.
Katz, Jerrold J. and Peter Postal. 1963. Semantic interpretation of idioms and sentences containing them. MIT Research Laboratory of Electronics, Quarterly Progress Report 70: 275–82. Kay, Paul and Charles Fillmore. 1999. Grammatical constructions and linguistic generalizations: the What’s X doing Y construction. Language 75(1): 1–34. Kepser, Stephan and Marga Reis (eds). 2005. Linguistic Evidence: Empirical, Theoretical and Computational Perspectives. Berlin/New York: Mouton de Gruyter. Labov, William. 1972. Language in the Inner City. Philadelphia: University of Pennsylvania Press. Lakoff, George. 1987. Women, Fire, and Dangerous Things. Chicago: University of Chicago Press. Landauer, Thomas K. and Susan T. Dumais. 1997. A solution to Plato’s problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104(2): 211–40. Langacker, Ronald W. 1987. Foundations of Cognitive Grammar, Vol. 1. Stanford, CA: Stanford University Press. Langacker, Ronald W. 1991. Foundations of Cognitive Grammar, Vol. 2. Stanford, CA: Stanford University Press. Leech, Geoffrey. 1992. Corpora and theories of linguistic performance, in Jan Svartvik (ed.), Directions in Corpus Linguistics. Berlin/New York: Mouton de Gruyter, 105–22. Lin, Dekang. 1998a. Automatic retrieval and clustering of similar words. Proceedings of the 36th Meeting of the Association for Computational Linguistics, 768–74. Lin, Dekang. 1998b. Extracting collocations from text corpora. Proceedings of the First Workshop on Computational Terminology, Montreal, August 1998. Lin, Dekang. 1999. Automatic identification of noncompositional phrases. Proceedings of the 37th Annual Meeting of the ACL, College Park, USA, 317–24. Machonis, Peter A. 1985. Transformations of verb phrase idioms: passivization, particle movement, dative shift. American Speech 60(4): 291–308. Makkai, Adam. 1972. Idiom Structure in English. The Hague: Mouton. Manning, Christopher D.
and Hinrich Schütze. 2001. Foundations of Natural Language Processing. Cambridge, MA: MIT Press. Mason, Oliver. 2000. Parameters of collocation: the word in the center of gravity, in John M. Kirk (ed.), Corpora Galore: Analyses and Techniques in Describing English. Amsterdam: Rodopi, 267–80. McCarthy, Diana, Bill Keller and John Carroll. 2003. Detecting a continuum of compositionality in phrasal verbs. Proceedings of the ACL-Workshop on Multiword Expressions: Analysis, Acquisition, and Treatment, 73–80. McGlone, Matthew S., Sam Glucksberg and Cristina Cacciari. 1994. Semantic productivity and idiom comprehension. Discourse Processes 17(2): 176–90. Michiels, Archibal. 1977. Idiomaticity in English. Revue des Langues Vivantes 43: 144–89. Mitchell, Terence F. 1971. Linguistic ‘goings-on’: collocations and other lexical matters arising on the syntagmatic record. Archivum Linguisticum 2: 35–69.
Moon, Rosamund A. 1992. Textual aspects of fixed expressions in learners’ dictionaries, in Pierre J.L. Arnaud and Henri Béjoint (eds), Vocabulary and Applied Linguistics. London: Macmillan, 13–27. Moon, Rosamund A. 1998. Fixed Expressions and Idioms in English. A Corpus-Based Approach. Oxford: Clarendon. Nagata, Hiroshi. 1989. Effect of repetition on grammaticality judgments under objective and subjective self-awareness conditions. Journal of Psycholinguistic Research 18: 255–69. Nattinger, James R. and Jeanette S. DeCarrico. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press. Nayak, Nandini and Raymond W. Gibbs. 1990. Conceptual knowledge and idiom interpretation. Journal of Experimental Psychology: General 119(3): 315–30. Newman, John and Sally Rice. 2005. Inflectional islands. Paper presented at the International Cognitive Linguistics Conference, Seoul, 9 July 2005. Newmeyer, Frederick J. 1974. The regularity of idiom behavior. Lingua 34: 327–42. Nicolas, Tim. 1995. Semantics of idiom modification, in Markus Everaert, Erik-Jan van der Linden, André Schenk and Rob Schreuder (eds), Idioms: Structural and Psychological Perspectives. Hillsdale, NJ: Lawrence Erlbaum, 233–52. Nunberg, Geoffrey. 1978. The Pragmatics of Reference. Bloomington, IN: Indiana University Linguistics Club. Nunberg, Geoffrey, Ivan A. Sag and Thomas Wasow. 1994. Idioms. Language 70(3): 491–538. Ortony, Andrew, Diane L. Schallert, Ralph E. Reynolds and S. J. Santos. 1978. Interpreting metaphors and idioms: some effects of context on comprehension. Journal of Verbal Learning and Verbal Behavior 17: 465–77. Oxford English Dictionary. 1989. Oxford: Oxford University Press. Pawley, Andrew. 1985. On speech formulae and linguistic competence. Lenguas Modernas 12: 80–104. Pawley, Andrew and Frances Syder. 1983. Two puzzles for linguistic theory: nativelike selection and nativelike fluency, in Jack C. Richards and Richard W. Schmidt (eds), Language and Communication.
London: Longman, 191–226. Pedersen, Viggo H. 1986. The translation of collocations and idioms, in Lars Wollin and Hans Lindquist (eds), Translation Studies in Scandinavia. Malmö: CWK Gleerup, 126–32. Peterson, Robert R. and Curt Burgess. 1993. Syntactic and semantic processing during idiom comprehension: neurolinguistic and psycholinguistic dissociations, in Cristina Cacciari and Patrizia Tabossi (eds), Idioms: Processing, Structure, and Interpretation. Hillsdale, NJ: Erlbaum, 201–25. Pierrehumbert, Janet B. 2003. Probabilistic phonology: discrimination and robustness, in Rens Bod, Jennifer Hay and Stefanie Jannedy (eds), Probabilistic Linguistics. Cambridge, MA: MIT Press, 177–228. Pinker, Steven and Ray Jackendoff. 2005. The faculty of language: what’s special about it? Cognition 95(2): 201–36.
Plag, Ingo. 2005. Productivity, in Keith Brown (ed.), Encyclopedia of Language and Linguistics. Oxford: Elsevier, 121–8. Reagan, Robert T. 1987. The syntax of English Idioms: can the dog be put on? Journal of Psycholinguistic Research 16(5): 417–41. Sacks, Harvey, Emanuel A. Schegloff and Gail Jefferson. 1974. A simplest systematics for the organization of turn-taking for conversation. Language 50(4): 696–735. Schiffrin, Deborah. 1994. Approaches to Discourse: Language as Social Interaction. Oxford: Blackwell. Schmid, Hans-Jörg. 2000. English Abstract Nouns as Conceptual Shells: From Corpus to Cognition. Berlin/New York: Mouton de Gruyter. Schone, Patrick and Dan Jurafsky. 2001. Is knowledge-free induction of multiword unit dictionary headwords a solved problem? Proceedings of the 6th Conference on Empirical Methods in Natural Language Processing, 100–8. Schönefeld, Doris. 1999. Corpus linguistics and cognitivism. International Journal of Corpus Linguistics 4(1): 137–71. Schweigert, Wendy A. 1986. The comprehension of familiar and less familiar idioms. Journal of Psycholinguistic Research 15(1): 33–45. Schweigert, Wendy A. and Danny R. Moates. 1988. Familiar idiom comprehension. Journal of Psycholinguistic Research 17(4): 281–96. Shannon, Claude E. 1950. Prediction and entropy of printed English. Bell System Technical Journal 3: 50–64. Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sonomura, Marion O. 1996. Idiomaticity in the Basic Writing of American English. Formulas and Idioms in the Writing of Multilingual and Creole-Speaking Community College Students in Hawaii. New York: Peter Lang. Sprenger, Simone A., Willem J.M. Levelt and Gerard Kempen. 2006. Lexical access during the production of idiomatic phrases. Journal of Memory and Language 54: 161–84. Stefanowitsch, Anatol and Stefan Th. Gries. 2003. Collostructions: investigating the interaction of words and constructions.
International Journal of Corpus Linguistics 8(2): 209–43. Stefanowitsch, Anatol and Stefan Th. Gries. 2005. Covarying collexemes. Corpus Linguistics and Linguistic Theory 1(1): 1–43. Stroop, J. Ridley. 1935. Studies of interference in serial verbal reactions. Journal of Experimental Psychology 18: 643–62. Sweet, Henry. 1889. The Practical Study of Language Teaching. London: Dent and Sons Ltd. Swinney, David and Anne Cutler. 1979. The access and processing of idiomatic expressions. Journal of Verbal Learning and Verbal Behavior 18: 523–34. Talmy, Leonard. 2000. Toward a Cognitive Semantics. Cambridge, MA: MIT Press. Titone, Debra A. and Cynthia M. Connine. 1994a. The comprehension of idiomatic expressions: effects of predictability and literality. Journal of Experimental Psychology: Learning, Memory and Cognition 20(5): 1126–38.
Titone, Debra A. and Cynthia M. Connine. 1994b. Descriptive norms for 171 idiomatic expressions: familiarity, compositionality, predictability, and literality. Metaphor and Symbolic Activity 9(4): 247–70. Tomasello, Michael. 2003. Constructing a Language. Cambridge and London: Harvard University Press. Van der Linden, Erik-Jan. 1992. Incremental processing and the hierarchical lexicon. Computational Linguistics 18: 219–38. Van Lancker, Diana. 1975. Heterogeneity in language and speech: neurolinguistic studies. Working Paper in Phonetics 29, University of California at Los Angeles. Wasow, Thomas, Ivan Sag and Geoffrey Nunberg. 1983. Idioms: an interim report, in Shirô Hattori and Kazuko Inoue (eds), Proceedings of the 13th International Congress of Linguistics. Tokyo: CIPL, 102–15. Weinreich, Uriel. 1969. Problems in the analysis of idioms, in Jaan Puhvel (ed.), Substance and Structure of Language. Berkeley, CA: University of California Press. Wood, Mary McGee. 1986. A Definition of Idiom. Bloomington, IN: Indiana University Linguistics Club. Wray, Alison. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press. Wulff, Stefanie. 2003. A multifactorial corpus analysis of adjective order in English. International Journal of Corpus Linguistics 8(2): 245–82. Yorio, Carlos. 1989. Idiomaticity as an indicator of second language proficiency, in Kenneth Hyltenstam and Loraine K. Obler (eds), Bilingualism across the Life-Span. Cambridge: Cambridge University Press, 55–72. Zeschel, Arne. 2007. Delexicalization patterns: a corpus-based approach to incipient productivity in fixed expressions. Doctoral dissertation, University of Bremen.
Author Index
Aijmer, Karin 10 Alexander, Richard 11, 39 Arppe, Antti 169 Baldwin, Timothy 43 Bannard, Colin 43–44, 48, 52–53, 65–66 Bard, Ellen G. 28 Barkema, Henk 65, 67, 72–73, 77–82, 88, 90–91, 98, 100, 102, 107, 112, 115–116, 118, 123, 128, 131, 135, 141, 144–145, 148, 155–156, 164 Barsalou, Lawrence 168 Berg, Thomas 102, 149 Bergen, Benjamin K. 15 Berry-Rogghe, Godelieve L. M. 31–32, 34, 36–44, 55 Biber, Douglas 21, 164 Bobrow, Samuel A. 36 Bolinger, Dwight 37, 40 Bortz, Jürgen 150 Bybee, Joan 16–17, 169 Cacciari, Cristina 13, 30, 38, 39, 40, 54, 96 Carter, Ronald 11 Chafe, Wallace 69 Chomsky, Noam 35 Coulmas, Florian 10 Cowie, Anthony P. 9–11, 13, 39–40 Croft, William 15, 17–18 Cronbach, Lee J. 33–34 Cronk, Brian C. 36, 38 Crystal, David 93, 116, 118
Culicover, Peter W. 11 Cutler, Anne 36, 69–70 Diessel, Holger 4, 16 Dong, Quang Phuc 58 Drew, Paul 10 Ellis, Rod 10 Evans, Vyvyan 14, 17–18 Evert, Stefan 48, 69, 159 Fernando, Chitra 11–13, 16, 39 Fillmore, Charles 15 Gazdar, Gerald 73 Gibbs, Raymond W. 13–14, 30, 32, 36, 38–39, 55, 70, 82 Goldberg, Adele E. 14–18, 23, 65, 76, 166 Grant, Lynn 8 Greenbaum, Sidney 30 Gries, Stefan Th. 4, 11, 17, 20–23, 48, 66, 169 Hakuta, Kenji 10 Hamblin, Jennifer L. 55 Harris, Catherine 16 Howarth, Peter 9, 12, 39 Huddleston, Rodney 149 Hunston, Susan 23 Hymes, Dell 10 Jackendoff, Ray 11, 16 Jurafsky, Daniel 17, 43
Katz, Jerrold J. 11–12, 36 Kay, Paul 15 Kepser, Stephan 169
Labov, William 10 Lakoff, George 15 Landauer, Thomas K. 66 Langacker, Ronald W. 15–16, 18, 22, 65, 168 Leech, Geoffrey 20 Lin, Dekang 17, 41–43, 48, 65–66, 70 Makkai, Adam 37, 68 Manning, Christopher D. 83, 149 Mason, Oliver 169 McCarthy, Diana 43–44, 46, 51 McGlone, Matthew S. 14, 38, 54 Michiels, Archibal 69 Mitchell, Terence F. 37 Moon, Rosamund A. 9–10, 25, 39, 71, 72 Nagata, Hiroshi 30 Nattinger, James R. 10–11, 39 Nayak, Nandini 13–14, 38, 70 Newman, John 164 Newmeyer, Frederick J. 69 Nicolas, Tim 73–74, 117–118 Nunberg, Geoffrey 36–37, 40, 69 Ortony, Andrew 36 Pawley, Andrew 8, 10–11 Pedersen, Viggo H. 48 Peterson, Robert R. 37 Pierrehumbert, Janet B. 168
Pinker, Steven 16 Plag, Ingo 66 Reagan, Robert T. 69–70, 91
Sacks, Harvey 10 Schiffrin, Deborah 10 Schmid, Hans-Jörg 22 Schone, Patrick 43 Schönefeld, Doris 12 Schweigert, Wendy A. 36, 38 Shannon, Claude E. 169 Sinclair, John 22 Sonomura, Marion O. 9, 35 Sprenger, Simone A. 39 Stefanowitsch, Anatol 20, 48 Stroop, J. Ridley 54 Sweet, Henry 13, 35 Swinney, David 36, 70 Talmy, Leonard 15 Titone, Debra A. 38, 40 Tomasello, Michael 4, 16–17 Van der Linden, Erik-Jan 72 Van Lancker, Diana 10 Wasow, Thomas 37, 73 Weinreich, Uriel 13, 37, 68 Wood, Mary McGee 40, 71 Wray, Alison 9–10, 12 Wulff, Stefanie 4, 17 Yorio, Carlos 39 Zeschel, Arne 149
Subject Index
abnormally decomposable idiom 30, 32, 38–39, 70 act up 47, 49, 50–51 addition 71–73, 75–76, 86–87, 93, 96, 102, 145–146, 148, 154, 158 analysability 14, 35, 66, 70 analysis of variance 73 association strength 42, 46–48, 50, 54, 66 bear X fruit 27, 128, 131 beg X question 27, 96, 102, 128 beta weight 158–160, 162 British National Corpus (BNC) 3, 7, 22, 25–27, 32–33, 44, 46–47, 78, 81, 87, 149 break X ground 27, 29, 98, 100, 102, 123 break X heart 27, 90, 96 call X police 27, 30, 61, 90, 96, 100, 102, 118, 131, 135 carry X weight 27, 94, 96 catch X eye 27, 96, 102 change X hand 27, 32, 90, 102, 115–116, 118 close X door 27, 30, 90, 102, 128, 141 cognitive linguistics 9, 14–15 cold war 78–80 collocability 72–73, 75 component loading 151–152, 154 compositionality 1, 3–8, 10–16, 23, 29, 32, 35–47, 49–74, 90–91, 96, 98,
102, 107, 112, 115, 118, 123, 128, 131, 135, 141, 149, 154–155, 158–159, 163–164, 166–167 configuration hypothesis 38 consecutive factors 151 constructicon 17, 100, 102, 107, 165–169 construction grammar 2, 8, 14–15, 17–20, 23 conventionality 37 corpus frequency 22, 33, 65, 72, 91, 149, 155, 158–159, 161 Cronbach’s alpha 33–34 cross X finger 27, 118, 131, 141 cross X mind 27, 96, 123, 128, 135, 141 decomposability 35, 38 delexicalization 76, 149, 166 deliver X good 27, 118, 131, 141 direct access model 36 directional entropy 85–87, 102, 107, 144–145, 164 do X trick 27, 77, 98, 100, 116, 123, 131, 141 draw X line 27, 30, 57, 59, 84–85, 90, 116, 123, 128 eigenvalue 151–152 entrenchment 16, 22, 42, 46–48, 50, 65, 163 entropy 67, 82–88, 91–92, 96–98, 101–102, 104, 106–107, 109, 111–112, 114–116, 118, 123, 128,
131, 135, 141, 144–146, 148–149, 155, 164, 169 exemplar theory 168 external modification 76 familiarity 14, 32–33, 36, 70–71, 91 fight X battle 27, 30, 98, 100, 102, 118, 123 fill up 47, 49–51 fill/fit X bill 27 Fisher Yates exact test 48 flexibility profile 79–80 follow X suit 27, 30, 90, 96, 98, 112 foot X bill 27, 30, 90, 96, 102, 128, 131 frozenness hierarchy 68–70, 72, 75 get X act together 27, 33, 90 give back 47, 49–51 give up 40, 47, 49–51, 61, 69 grammaticalization 164–165 grit X tooth 27, 30, 96, 100, 118
have X clue 27, 29, 100, 112, 123, 135, 141 have X laugh 27, 57, 123, 131 hold up 47, 49–53 hold X breath 27, 30, 128
Idiom Principle 22 idiom typologies 33 idiomatic variation continuum 4, 5, 7, 150–151, 153, 155 idiomaticity continuum 2, 4, 26, 157, 159, 161 idiomaticity judgement 2–3, 6–7, 28–29, 33, 35, 149, 157–158, 160, 165–166, 168 Inflectional Island Hypothesis 164 insertion 5, 69, 71–73 internal modification 73–74, 117 International Corpus of English 77
kick the bucket 2, 14, 18, 25, 39, 40, 69 knock down 47, 49–51 knock up 47, 49–51
Latent Semantic Analysis 66 leave X mark 27, 131 lexical representation model 36 literal processing model 36 live down 47, 49–51, 55 magnitude estimation 28–29, 32–33 make X headway 27, 30, 77, 87, 94, 115, 123, 131, 166 make X mark 27 make X point 24, 27, 102, 116 make/pull X face 27 maximum entropy 84–85 meet X eye 27, 90, 118, 131 modification 2, 44–45, 54, 73–74, 76, 80, 94, 96, 102, 107, 112, 115, 117, 146, 155, 165 morphological flexibility 5, 7, 34, 74, 76–77, 80, 84–85, 116–118, 123, 131, 135, 141, 144–146, 148, 154, 158–161, 164, 167, 169 multiple regression 157, 159, 165
normally decomposable idiom 30, 32, 38–39, 70
omission 75 Open Choice Principle 22
pave X way 27, 29, 98, 112, 123, 141 permutation 69, 72–73, 75 play X game 27, 90, 123, 141 principal component 150–152, 154–155, 159–161, 167 principal component analysis 150 productivity 14, 39, 65–66, 75
quantitative corpus linguistics 20–22 raw frequency 21, 22, 48 register 32, 73, 78, 87 relative entropy 82–85, 87, 96, 123, 144 schematization 17–18, 100, 107, 164, 166–168
scratch X head 27, 30, 87, 90, 96, 98 see X point 27, 30, 90, 102, 112, 123, 131, 135, 141 semantic contribution 42, 52, 164 show off 47, 49–52 substitution 42–44, 65, 71, 75, 166, 168 superlemma model 39 switch off 47, 49–52 take off 47, 49–53 take X course 27, 90, 100, 102 take X piss 27, 90, 96, 100, 123, 128, 166 take X plunge 27, 30, 59, 61, 123, 128, 135, 167–168
take X root 27, 29, 96, 100, 116, 123, 135 tell X story 27, 61, 90, 102 term selection 72–73, 148 token frequency 24, 74 type frequency 24 variance maximizing 151 verb particle construction (VPC) 7, 35, 41–55, 59, 61–62, 64, 66, 163
write X letter 27, 29–30, 61, 90, 94, 102, 107, 123, 149, 167–168 Zipf’s law 24