A Method for Linguistic Metaphor Identification: From MIP to MIPVU (Converging Evidence in Language and Communication Research)

A Method for Linguistic Metaphor Identification

Converging Evidence in Language and Communication Research (CELCR)

Over the past decades, linguists have taken a broader view of language and are borrowing methods and findings from other disciplines such as cognition and computer sciences, neurology, biology, sociology, psychology, and anthropology. This development has enriched our knowledge of language and communication, but at the same time it has made it difficult for researchers in a particular field of language studies to be aware of how their findings might relate to those in other (sub-)disciplines. CELCR seeks to address this problem by taking a cross-disciplinary approach to the study of language and communication. The books in the series focus on a specific linguistic topic and offer studies pertaining to this topic from different disciplinary angles, thus taking converging evidence in language and communication research as its basic methodology.

Editors
Marjolijn H. Verspoor, University of Groningen
Wilbert Spooren, Vrije Universiteit Amsterdam

Advisory Board
Walter Daelemans, University of Antwerp
Cliff Goddard, University of New England
Leo Noordman, Tilburg University
Martin Pütz, University of Koblenz-Landau
Roeland van Hout, Radboud University Nijmegen

Volume 14 A Method for Linguistic Metaphor Identification. From MIP to MIPVU by Gerard J. Steen, Aletta G. Dorst, J. Berenike Herrmann, Anna A. Kaal, Tina Krennmayr and Trijntje Pasma

A Method for Linguistic Metaphor Identification From MIP to MIPVU

Gerard J. Steen Aletta G. Dorst J. Berenike Herrmann Anna A. Kaal Tina Krennmayr Trijntje Pasma VU University Amsterdam

John Benjamins Publishing Company Amsterdam / Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Library of Congress Cataloging-in-Publication Data
A method for linguistic metaphor identification : from MIP to MIPVU / Gerard J. Steen ... [et al.].
p. cm. (Converging Evidence in Language and Communication Research, ISSN 1566-7774 ; v. 14)
Includes bibliographical references and index.
1. Metaphor. I. Steen, Gerard.
P301.5.M48M49 2010
415--dc22 2010011037
ISBN 978 90 272 3903 7 (Hb ; alk. paper)
ISBN 978 90 272 8815 8 (Eb)

© 2010 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Table of contents

Preface

Acknowledgements

Chapter 1. Linguistic metaphor identification in usage
1.1 Introduction
1.2 Introducing MIP
1.3 Aspects of interpretation
1.4 Aspects of conceptualization
1.5 Aspects of operationalization
1.6 Data collection
1.7 Data analysis
1.8 Plan of the book

Chapter 2. MIPVU: A manual for identifying metaphor-related words
2.1 The basic procedure
2.2 Deciding about words: Lexical units
2.2.1 General guideline
2.2.2 Exceptions
2.3 Indirect use potentially explained by cross-domain mapping
2.3.1 Identifying contextual meanings
2.3.2 Deciding about more basic meanings
2.3.3 Deciding about sufficient distinctness
2.3.4 Deciding about the role of similarity
2.4 Direct use potentially explained by cross-domain mapping
2.5 Implicit meaning potentially explained by cross-domain mapping
2.6 Signals of potential cross-domain mappings
2.7 New-formations and parts that may be potentially explained by cross-domain mapping

Chapter 3. Metaphor identification in news texts
3.1 Introduction
3.2 Establishing contextual meanings
3.2.1 Specialized terms
3.2.2 Novel compounds and novel metaphors
3.2.3 Contextual ambiguity
3.3 Establishing more basic meanings
3.4 Contrast and comparison
3.5 Direct metaphor
3.6 Conclusion

Chapter 4. Metaphor identification in conversation
4.1 The wild world of conversation
4.2 Illustrating MIPVU in conversation
4.3 Challenges to MIPVU
4.3.1 Problems with identifying the contextual sense
4.3.2 Problems with identifying the basic sense
4.3.3 Problems with comparing contextual and basic sense – metonymy
4.4 Conclusion

Chapter 5. Metaphor identification in fiction
5.1 Introduction
5.2 Straightforward application of MIPVU
5.3 Interesting issues
5.3.1 Directly expressed metaphors
5.3.2 Character descriptions
5.3.3 Personification
5.4 Conclusion

Chapter 6. Metaphor identification in academic discourse
6.1 Introduction
6.2 Unanimous agreement
6.3 Lack of agreement
6.3.1 Metaphor identification and specialist terms: Metaphorical to whom?
6.3.2 Metaphor-related words and scientific models
6.3.3 Metaphor-related words and text management
6.3.4 Metaphor-related words in extended contexts
6.4 Conclusion

Chapter 7. Metaphor identification in Dutch news and conversations
7.1 Introduction
7.2 Operational issues
7.2.1 The corpus: News and conversation
7.2.2 Van Dale dictionary and its implications
7.3 Linguistic issues: Complex words and fixed expressions
7.3.1 Separable Complex Verbs
7.3.2 Polywords
7.4 Dutch metaphor analysis: Agreement and disagreement
7.4.1 Dutch discourse and agreement
7.4.2 Dutch discourse and disagreement
7.5 Conclusion

Chapter 8. Reliability tests
8.1 Introduction
8.2 Method
8.3 Results and discussion: English-language research
8.3.1 Study 1
8.3.2 Study 2
8.3.3 Study 3
8.3.4 Study 4
8.3.5 Study 5
8.3.6 Study 6
8.3.7 General discussion of the English language tests
8.4 Results and discussion: Dutch-language research
8.5 Conclusion

Chapter 9. From method to research: Cleaning up our act
9.1 Lexical units
9.1.1 Phrasal verbs
9.1.2 Polywords
9.1.3 Compounds
9.1.4 Conclusion
9.2 Words classified as not analyzable or borderline
9.2.1 DFMAs
9.2.2 WIDLIIs
9.2.3 Conclusion
9.3 Classes of metaphor and metaphor signals
9.4 Individual metaphor-related words
9.4.1 Rationale
9.4.2 Method
9.4.3 Results and discussion
9.4.4 Post hoc corrections of individual lexical items
9.5 Conclusion

Chapter 10. Metaphor in English discourse: A corpus-linguistic approach
10.1 Introduction
10.2 Method
10.2.1 Materials
10.2.2 Tools
10.2.3 Technique
10.2.4 Preparation of final database
10.3 Results and discussion: Initial exploration
10.3.1 Main metaphor categories
10.3.2 Simple and complex lexical units, and borderline cases
10.4 Results and discussion: Main analysis
10.4.1 Metaphor across register and word class
10.4.2 Metaphor across word class in four distinct registers
10.5 General comparison and conclusion
10.5.1 General comparison
10.5.2 Conclusion

Chapter 11. The quality of evidence: From MIP to MIPVU

Appendix. Overview of annotated files from BNC-Baby

References

Index

Preface

The subtitle of this book was formulated in Cáceres, in May 2008. We were at the seventh international conference for Researching and Applying Metaphor (RaAM 7) and saw that a substantial number of papers referred to MIP, or had even adopted it. MIP is the Metaphor Identification Procedure developed by the Pragglejaz Group and it had been published only one year before the conference took place in the journal Metaphor & Symbol (2007, Vol. 22, Number 1, 1–39). At the time of the conference, our research group had already applied MIP to a substantial amount of data, and had come up with a refined and extended version, which we call MIPVU: VU stands for Vrije Universiteit, the university in Amsterdam that we work at. MIPVU is largely based on MIP, but goes a good deal further in making explicit and systematic what sorts of decisions have to be taken by analysts when they identify words as related to metaphor. For these reasons, MIPVU is also more reliable than MIP, as we shall report in this book. In Cáceres, therefore, it became clear to us that our planned book report on our research should be framed and sold as a story that goes from MIP to MIPVU, simply because metaphor researchers were just getting interested in MIP. However, due to the immensely helpful comments of two reviewers, we have organized the book as a report concentrating on MIPVU as such. After an introductory chapter that sketches the differences between MIP and MIPVU, we immediately present our own procedure in Chapter 2, and go on, in Chapters 3 through 7, to demonstrate its application to four different registers in English and two of the same registers in Dutch. The final chapters then present a full-blown account of how MIPVU leads to results that are methodologically reliable and empirically revealing. We hope that this series of chapters offers a hands-on perspective on linguistic metaphor identification that is useful to all advanced students of metaphor in language use.

Our research is a matter of team effort. Even though there is a functional difference in role between Gerard Steen as principal investigator and all of the other authors as Ph.D. students, each of us has worked on different parts of the project, and nobody has been in sole charge of any particular portion of the data, such as one register. The exception is Trijntje Pasma, who has done all of the empirical research on Dutch by herself. The methodological work, however, which is the main focus of this book, is a group product. That is why the book is published under our six names. Different authors have first responsibility for the various chapters, but the book as a whole is not an edited book by one main editor. Instead, it is a collective volume that was planned and written in close collaboration, on a joint research project that took us more than two years to complete. The special nature of that undertaking is reflected in the special mode of publication of the present volume.

Amsterdam, 5 January 2010

Gerard Steen
Lettie Dorst
Berenike Herrmann
Anna Kaal
Tina Krennmayr
Trijntje Pasma

Acknowledgements

Several colleagues have read portions of the first draft of the manuscript: we are grateful to Graham Low and Stefan Rook. The complete first draft and then final version were studied by two cohorts of MA students in the course ‘Metaphor in discourse’ at VU University, in the autumns of 2008 and 2009. Their responses have helped us to clarify hidden issues much better. Elmar Thalhammer and Susan Nacey also read the complete manuscript of the first draft and offered substantial feedback on content; Susan Nacey, in particular, went beyond the call of duty in giving us extensive and invaluable detailed advice on continuity and formulations. Two reviewers for Benjamins gave us invaluable feedback on the first draft, stimulating us to carry out a radical overhaul of the complete manuscript. Lachlan Mackenzie read and corrected the final manuscript, paying fine attention to linguistic and stylistic consistency.

The research reported in this book has been partly sponsored by the VU-Ster programme ‘Rhetorical devices in public discourse’, in particular, the Arts Faculty project ‘Conversationalization in public communication’. The bulk of the research is sponsored by the NWO-Vici programme ‘Metaphor in discourse: Linguistic forms, conceptual structures, cognitive representations’ (277-30-001). Part of the latter research was carried out by Ewa Biernacka and Irene López Rodríguez, between September 2005 and August 2006. We are grateful for all of this support.

chapter 1

Linguistic metaphor identification in usage

1.1 Introduction

Metaphor is booming business. Since the publication of Ortony’s (1979/1993) Metaphor and thought and Lakoff and Johnson’s (1980) Metaphors we live by, it has become a central object of study in psychology, philosophy, linguistics, poetics, history, anthropology, discourse studies, and other disciplines. Since 1986, there has been a scholarly journal called Metaphor and Symbol (originally entitled Metaphor and Symbolic Activity), and 2006 saw the foundation of a scholarly association for Researching and Applying Metaphor (RaAM). A recent overview of research has been published as the Cambridge Handbook of Metaphor and Thought (Gibbs 2008), covering a wide range of theoretical, empirical, and applied issues. Contrary to the traditional picture of metaphor as deviant, erratic, ornamental, and spurious, metaphor has been shown to be important to human experience in many different ways.

In this light it is odd that there has hardly been any sustained interest in the methodological aspects of metaphor identification. The most influential school of metaphor research since the 1980s, cognitive linguistics, has focused on allegedly clear cases that are supposed to reflect underlying patterns of metaphorical thought. For instance, when words like attack, defend, manoeuvre, and so on are used to talk about discussions, arguments, or debates, they are taken to exhibit conventionally metaphorical senses that are abstract, not concrete (Lakoff & Johnson 1980; cf. Kövecses 2002, for an introduction). These abstract metaphorical senses are motivated, it is claimed, by an underlying metaphor in thought, or ‘conceptual metaphor,’ which is labelled as argument is war. Hundreds of studies in linguistics have been based on this approach and have identified metaphor in various aspects of language. However, fundamental methodological criticism has been voiced about this approach (e.g.
Vervaeke & Kennedy 1996; Ritchie 2003, 2004; Haser 2005). It is argued that the delimitation of conceptual metaphors is not sufficiently constrained to allow for the precise identification of specific linguistic items as related to them. Thus, Ritchie has suggested that argument may be just as easily a matter of chess as of war, and that the criteria for deciding which is which are unclear. Other scholars
have questioned the need for postulating conceptual metaphors in the first place (Murphy 1996, 1997; Glucksberg 2001; Jackendoff 2002; McGlone 2007). They claim that most metaphorical expressions in language may have nothing to do with thought, but are a matter of lexical semantics which can be historically explained. And finally, within cognitive linguistics itself, views of what counts as an underlying conceptual metaphor have radically changed (Grady 1997, 1998, 2005; Lakoff & Johnson 1999), to the effect that the model for metaphor in thought has become rather complicated and tenuous, to say the least. In all, the identification of metaphor in language has become a matter of controversy (Steen 2007). Much of this criticism is motivated by methodological concerns in psycholinguistics and other cognitive and social sciences, where high standards of measurement are a conditio sine qua non. Since metaphor identification is a form of categorization of phenomena that are ‘out there’ in reality, it belongs to the realm of scientific measurement; methodologically it can be placed on a par with the measurement of IQ, stress, social and economic class, wealth, education, and so on. In broad areas of the cognitive and social sciences, adequate and accurate measurement of the phenomena under study is a logical prerequisite to all research. This means that metaphor identification has to meet certain generally accepted standards of methodological quality. As a gross generalization, this approach seems somewhat alien to many linguists and other researchers working in the humanities. They live in an academic world that often prefers to speak of ‘interpretation’ instead of measurement by some tool, as if it were a ruler or thermometer outside themselves. This is especially so because scholars in the humanities dealing with language usage like to tackle the full richness and complexity of a phenomenon. 
Because of this richness and complexity of the data, to many of these scholars, categorizing a particular linguistic expression as metaphorical is therefore a matter of interpretation, inevitably and naturally so. By tradition, the act of identifying many phenomena in the humanities does not raise the methodological issues of measurement that are fundamental in the cognitive and social sciences, such as validity and reliability. In fact, in many areas of the humanities, measurement is a suspicious if not dirty word. This is not to say that these scholars in the humanities do not care about the quality of their interpretations. On the contrary, part of the academic debate in much of the humanities is precisely concerned with the details and plausibility of interpretation. But these details are not systematically and explicitly related to an independent methodological framework, let alone a set of instruments, that imposes operational standards on the quality of measurement. The situation in the humanities seems largely a matter of attitude and culture, which is reflected in different vocabularies as well as a lack of independent methods
and techniques for doing research. The humanities typically do not offer courses in methods and techniques, as happens in the social and cognitive sciences. Instead, they typically offer courses on competing theoretical frameworks and approaches, which do not spend much time on the quality of measurement separated from conceptual issues themselves. Part of this attitude and culture can be understood by pointing to the traditional interest in the humanities in the unique and the individual, as opposed to the regular and the general: stereotypically, the humanities are idiographic, not nomothetic; they often favour a hermeneutic approach to the understanding of distinct people, events and phenomena. But it should be immediately acknowledged that the picture is more complex. On the one hand, there are many linguists—typically grammarians—who in fact are interested in general patterns, tendencies, and even laws (if there are any) in the structure and use of language. These linguists do talk of measurement and its quality. And on the other hand, there is an increasing number of linguists who interact with cognitive and social scientists, because they work in psycholinguistics, sociolinguistics, or applied linguistics. The need for massive annotation of language data in corpora, in particular, has also brought the problem of metaphor identification right into the centre of mainstream linguistic research. All of these are areas where being precise about measurement as measurement, not interpretation, is a natural concern. These are partly novel developments in the humanities that can also be observed in other disciplines. Traditional barriers are rapidly coming down, and interdisciplinary work between the humanities and the cognitive and social sciences is exponentially increasing. 
This is particularly so with reference to the rise of cognitive science and cultural studies, where attention to social-scientific methods and techniques of research trails in the wake of the adoption of social-scientific theoretical frameworks and models. Many doors have been opened for this type of interdisciplinary interaction and even collaboration over the past thirty years. Yet as long as these interdisciplinary tendencies have not been more broadly accepted as useful instead of counter-productive throughout the humanities, there remains a big gap between cognitive and social-scientific research on metaphor on the one hand, and other approaches to metaphor, on the other. The present book has been produced in the conviction that more work on these issues may help to close the gap between the two traditions. Many of the norms and values about measurement should be easily acceptable to any scholar in the humanities who aims at adequate and accurate observation, for the purpose of empirical description and explanation. They should even be fruitful to those who aim to observe in order to engage in more encompassing interpretative or critical activities. This book is a demonstration of how such measurement works in practice in as problematic and central a case as metaphor identification in language, and we hope that
it may thus contribute to diminishing some of the misunderstandings between the humanities and the cognitive and social sciences. Linguists who do not see themselves as typical humanities scholars in fact have a comparable problem. In spite of their more scientific interests, they are often less worried about the methods and techniques of metaphor identification than is in fact needed for high-quality empirical research in the cognitive and social sciences. To this group of linguists, the present book provides an example of how metaphor identification can be constrained and tested for its empirical quality in concrete and practical ways. If linguists aim to produce evidence about language and its use that is acceptable to cognitive and social scientists, they should at least be aware of the effort that is required to produce empirical evidence that has been tested against the same standards of quality. Only then can evidence from different disciplines become comparable enough for it to be taken equally seriously across the board, and function as converging evidence (Steen 2007). This volume is a casebook, in a slightly specific sense of the term. It presents an application and refinement of the first explicit and systematic procedure for linguistic metaphor identification in language usage that has been tested for its reliability, the Metaphor Identification Procedure, or MIP, developed by the Pragglejaz Group (2007). MIP is the result of six years of work by ten experienced metaphor researchers: Lynne Cameron, Alan Cienki, Peter Crisp, Alice Deignan, Ray Gibbs, Joe Grady, Zoltán Kövecses, Graham Low, Elena Semino, and Gerard Steen. They were convinced that it is both desirable and possible to bridge the gap between the humanities and the cognitive and social sciences for reasons like the ones sketched above. 
MIP is a tool for linguistic metaphor identification in natural discourse that can be employed by cognitive linguists, stylisticians, discourse analysts, applied linguists, psycholinguists, and sociolinguists. Since its publication it has been adopted in many metaphor studies that have been reported at various conferences, suggesting that it fulfils a practically felt need for methodological improvement. MIP therefore is a first step in the right direction. The empirical research that forms the basis of this volume constitutes a specific set of cases of linguistic metaphor identification in natural discourse by means of a more refined and somewhat extended version of MIP: we have identified linguistic metaphor in four registers of natural discourse in two languages. Refinement was necessary because MIP still requires taking many decisions (Pragglejaz Group 2007). The choices we have made will be explicated in the next chapter, in which we present our complete manual for metaphor identification, and then demonstrated in practice in five subsequent chapters. Moreover, since MIP has focused on one particular manifestation of metaphor in discourse, metaphorically used words, its coverage is not exhaustive. We have needed to make a number of additions during our own research, which have also fed into our application of MIP.
This has led to our own refined and extended version, which we call MIPVU; VU is the abbreviation of Vrije Universiteit, the Dutch name of the university at which our research was carried out. In the rest of this chapter, we will introduce MIP as published by the Pragglejaz Group (2007) and offer some general methodological comments on its nature and function. We will also comment on the implementations and additions in MIPVU, prefiguring the details of the manual in Chapter 2. And finally we will present a brief plan for the rest of the book. A review of the literature on metaphor identification will not be offered here. The present research has been carried out against the background of such a review presented in Steen (2007). In the present book we would therefore like to focus on presenting our own approach to developing a reliable method for linguistic metaphor identification in natural discourse.

1.2 Introducing MIP

The Pragglejaz Group’s procedure MIP focuses on the linguistic analysis of metaphorically used words, or lexical units, in discourse. The technical term ‘lexical units’ is preferred to the more common term ‘words’ in a way which will be explained below (see Section 1.5), but we will frequently use the terms interchangeably when not much depends on the differentiation. MIP aims to offer an instrument for capturing the bulk of the linguistic expressions of metaphor that have been discussed in the literature over the past thirty years. It looks like this (Pragglejaz Group 2007: 3):

1. Read the entire text/discourse to establish a general understanding of the meaning.
2. Determine the lexical units in the text/discourse.
3. a. For each lexical unit in the text, establish its meaning in context, i.e. how it applies to an entity, relation or attribute in the situation evoked by the text (contextual meaning). Take into account what comes before and after the lexical unit.
   b. For each lexical unit, determine if it has a more basic contemporary meaning in other contexts than the one in the given context. For our purposes, basic meanings tend to be:
      –– more concrete; what they evoke is easier to imagine, see, hear, feel, smell, and taste;
      –– related to bodily action;
      –– more precise (as opposed to vague);
      –– historically older.
      Basic meanings are not necessarily the most frequent meanings of the lexical unit.
   c. If the lexical unit has a more basic current/contemporary meaning in other contexts than the given context, decide whether the contextual meaning contrasts with the basic meaning but can be understood in comparison with it.
4. If yes, mark the lexical unit as metaphorical.

The rationale of the Pragglejaz Group’s procedure is as follows. Metaphorical meaning in usage is indirect meaning: it arises out of a contrast between the contextual meaning of a lexical unit and its more basic meaning, the latter being absent from the actual context but observable in others. For instance, when a lexical unit like attack or defend is used in a context of argumentation, its contextual meaning has to do with verbal exchange. However, this is an indirect meaning, in the sense of Lakoff (1986, 1993) and Gibbs (1994), because it can be contrasted with the more basic meaning of these words in other contexts, which involves physical engagement between people. Since the basic meaning can afford a mapping to the contextual meaning on the grounds of some form of nonliteral comparison, all uses of defend and attack in contexts of argumentation can be analyzed as metaphorical. This procedure therefore provides an operational way of finding all metaphor in actual usage.

Jonathan Charteris-Black (2004: 37) has independently used exactly the same rationale without formalizing or testing his approach. Alice Deignan (2005) has added that this type of method typically leads to the discovery of what she calls active and dead metaphor, with both basic and figurative senses of words like attack and defend being available in the language. Experience shows that novel metaphor is rarely found, but can be accommodated as follows.
When the linguistic form wipe out is used in the context of argumentation, as in Lakoff and Johnson’s example If you use that strategy, he’ll wipe you out, its contextual sense is clear. However, that contextual sense, having to do with argumentation, has not become very conventionalized. For instance, it has not ended up in the Macmillan English Dictionary for Advanced Learners (Rundell 2002). Yet MIP does not have a problem with this: the ad hoc or situation-specific contextual sense of argumentation that may be constructed for wipe out may simply be contrasted with and compared to the basic sense of wiping out, which has to do with cleaning. As a result, wipe out also gets identified as metaphorical language use (cf. Steen 2007). By contrast, historical metaphor is not identified as metaphorical by MIP. For instance, the words fervent and ardent used to have two senses, one for temperatures and one for emotions. This may, for instance, be seen from the edition of the
Concise Oxford Dictionary published in 1974 (McIntosh 1974). However, in contemporary British English both terms have lost their original temperature sense: in the Macmillan dictionary, for instance, they only have their present-day emotion senses. Hence expressions like ardent lover are not judged to be metaphorical when analyzed by MIP because there is no contrast between the contextually appropriate emotion sense and the historically older and more basic temperature sense: the latter is simply not available to the typical contemporary language user anymore, as is reflected by the descriptions of the words in the modern users’ dictionary (Deignan 2005). In making this statement, we are in fact suggesting that metaphor is always a relational term, and is short for ‘metaphorical to some language user’ (Steen 2007). In our own work, we have adopted the position that our language user is the idealized native speaker of English as represented in the description of English by the dictionary of a particular period. Such a dictionary contains a complete and culturally sanctioned representation of the knowledge about the English lexicon which, for instance, has to be acquired by foreign language learners. We will have more to say about this as we proceed. MIP looks like a straightforward and easy method for metaphor identification. The simplicity of the wording, however, is misleading and conceals a whole set of assumptions. The set of instructions was developed and tested over five years. It now produces fairly reliable results between individual analysts who display relatively high levels of agreement between their independent analyses of texts (Pragglejaz Group 2007). Some of its strengths and weaknesses have been discussed in Steen (2007), while a concrete application is reported by Steen, Biernacka, et al. (in press). 
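By way of illustration, the decision logic behind the three cases discussed above (attack, wipe out, ardent) can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions and is not part of MIP itself: the mini-dictionary, its glosses, the `concrete` flag used as a crude stand-in for 'more basic', and the names `Sense`, `basic_sense`, and `is_metaphorical` are all hypothetical. Establishing the contextual meaning (step 3a) and judging comparability (step 3c) remain analyst decisions and are simply passed in as arguments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sense:
    gloss: str
    concrete: bool  # crude, hypothetical proxy for "more basic" (step 3b)


# Hypothetical stand-in for a contemporary dictionary.
# The glosses are illustrative placeholders, not quotations from any dictionary.
DICTIONARY = {
    "attack": [Sense("use violence against someone", True),
               Sense("criticize strongly", False)],
    "wipe out": [Sense("clean the inside of something", True)],
    "ardent": [Sense("feeling very strongly about something", False)],
}


def basic_sense(unit: str, contextual: Sense) -> Optional[Sense]:
    """Step 3b: look for a more basic contemporary sense of the unit."""
    if contextual.concrete:
        # In this toy model a concrete contextual sense counts as already basic.
        return None
    for sense in DICTIONARY.get(unit, []):
        if sense.concrete and sense != contextual:
            return sense
    return None


def is_metaphorical(unit: str, contextual: Sense, comparable: bool) -> bool:
    """Steps 3c and 4: mark the unit if a contrasting basic sense exists and
    the analyst judges the contextual sense comparable to it."""
    if basic_sense(unit, contextual) is None:
        return False  # e.g. historical metaphor: no basic sense left to contrast
    return comparable


# Conventional metaphor: both senses are in the dictionary.
attack = is_metaphorical("attack", Sense("criticize strongly", False), True)
# Novel metaphor: the contextual sense is ad hoc, the basic sense is listed.
wipe_out = is_metaphorical("wipe out", Sense("defeat in an argument", False), True)
# Historical metaphor: only the emotion sense survives in the dictionary.
ardent = is_metaphorical("ardent", DICTIONARY["ardent"][0], True)
```

Under these assumptions, attack and wipe out come out as metaphorical and ardent does not, which mirrors the treatment of conventional, novel, and historical metaphor described in this section. The sketch also makes visible how much of the procedure rests on analyst judgement rather than on mechanical lookup.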
What we will do in the present chapter is explain some of the hidden issues and add some of our own details in order to motivate the full description of MIPVU in Chapter 2.

1.3 Aspects of interpretation

In the following sections, we will offer a general methodological discussion of MIP and its implementation in our research with respect to five distinct stages of the empirical cycle: conceptualization, operationalization, data collection, data analysis, and interpretation (Steen 2007). For ease of exposition, we will begin with interpretation. This feeds into conceptualization which will then take us to the more operational stages of metaphor identification. It should be noted that this type of interpretation comes close to what happens in the humanities, but is specific in that it is constrained and made possible by its relation to the other four stages in the empirical cycle, which all have their own requirements for doing research.





A Method for Linguistic Metaphor Identification

Contrary to common practice in cognitive linguistics, the Pragglejaz Group do not aim to identify the precise nature of the underlying conceptual mappings between domains, such as argument and war, or emotions and temperatures, themselves. They identify the linguistic forms of metaphor, not its conceptual structures. In order to identify a word or set of words as metaphorically used, it is often sufficient to be able to say that there are two senses and that they may be related by comparison, or nonliteral similarity (e.g. Crisp 2002: 9–10). This is because the procedure only needs to find a more basic sense than the one that is used as the metaphorical discourse meaning. To determine which conceptual domains these words belong to is neither simple nor necessary and constitutes a research question of its own (Steen 2007). In fact, if precisely identified metaphorical mappings in conceptual structure were incorporated within MIP, this would reduce reliability, for identifying conceptual metaphors is open to much greater disagreement between analysts. As an illustration, consider an expression which refers to Gandhi’s political opponents (cf. Pragglejaz Group 2007). It may be relatively easy to agree that opponents is metaphorically used, but relatively difficult to agree that politics has to be seen as sports or as war. If the identification of metaphorically used words is made dependent on the identification of underlying conceptual structures, disagreement or lack of agreement about conceptual structures (source domains of sports versus war) would also mean disagreement about the identification of words as metaphorically used. The advantage of MIP in such difficult situations, which are quite common, is that it does not throw out the baby of the identified metaphorically used words with the bathwater of the troublesome conceptual domains and mappings. 
The same analysts often have less difficulty in agreeing that a word or expression is metaphorical than in establishing the precise nature of the underlying metaphorical concepts and structures. Other linguists dealing with metaphor in usage have also made a methodological separation between identifying the linguistic forms of metaphor as opposed to specifying its conceptual structures (e.g. Cameron 2003; Charteris-Black 2004). Another happy corollary of the linguistic as opposed to conceptual approach to metaphor identification in discourse is that analysts focusing on the linguistic forms of metaphor do not have to choose between competing models for cross-domain mappings. In cognitive linguistics, there is the two-domain approach (Lakoff & Johnson 1980, 1999) as opposed to the many-space approach (Fauconnier & Turner 2002), the two approaches exhibiting important differences which have been discussed at length (Steen 2007). If metaphor identification by means of MIP were to be founded on the identification of conceptual structures, a choice would have to be made for either of the two models. As a result, the general validity of the method might be called into question by proponents of the other approach.

Chapter 1. Linguistic metaphor identification in usage

The linguistic approach that has been adopted, however, permits the analyst to remain agnostic about conceptual structures. What is more, it also relieves the analyst from the duty of comparing these two cognitive-linguistic approaches to conceptual structure with alternative cognitive-scientific models of metaphor as expressing underlying cross-domain mappings, such as Gentner’s (1982) structure mapping approach. On a related note, the Pragglejaz Group aim their findings to be maximally compatible with, but emphatically distinct from, research into metaphor as part of people’s psychological processes and their products. There is no claim that any of the metaphorically used words identified by the procedure are also actively realized as metaphorical mappings in the individual mind. The idea is to find expressions in language that are potentially metaphorical in cognition, which is meant to suggest that it is in principle possible to connect them to research on psychological processes and their products. Whether metaphor in language is in fact cognitively real in the thought of every language user is an empirical issue that requires its own investigation. MIP and MIPVU, then, involve a number of distinct aspects that are related to respecting the distinctions among diverse areas of research. There is the differentiation between the linguistic and conceptual analysis of metaphor in usage. The linguistic analysis would have to show that metaphorical meaning is indirect meaning which is potentially motivated by similarity or cross-domain mapping, with an emphasis on ‘potentially’; this involves looking at some contrast and comparison between contextual meanings and basic meanings. A subsequent and independent conceptual analysis would then have to show that there are two distinct but comparable conceptual domains (or spaces) which may be linked by a cross-domain mapping. 
And yet another type of analysis (behavioural) would examine the realization of the linguistic forms and conceptual structures of metaphor in the cognitive processes and products in on-going usage. Methodologically, all of these are interpretative issues of what it means to do metaphor identification, which have to be taken into account if metaphor in language is to be found in adequate and accurate ways (Steen 2007). Our current undertaking in this book is to focus on the method for linguistic metaphor identification in usage. As a final note it should be added that this does not mean that we claim that we completely exclude all conceptual issues. All comparison is a conceptual act, including the comparison between word senses; and theoretically metaphor is defined as a mapping across distinct conceptual domains. However, what we do claim is that it is possible to do empirical work with the notion of comparison in such a superficial and language-oriented way that we can broadly agree on some form of nonliteral similarity between uses of words (often conventionalized as distinct and readily available senses) without having to specify which distinct
conceptual domains are mapped on to each other. Moreover, with the Pragglejaz Group we claim that abstaining from precise conceptual analysis is in fact a sound and productive methodological strategy that yields better linguistic data which, in turn, can be used for relatively independent conceptual research.

1.4 Aspects of conceptualization

The theoretical conceptualization of metaphor as a cross-domain mapping leads to a view of metaphor in language as based on indirectness plus similarity. This is what has been captured in the various parts of MIP. The basis of the identification of metaphor has been regarded as a matter of finding indirect meaning in lexical units by both Lakoff (1986, 1993) and Gibbs (1994). Although indirectness is a good starting point for finding metaphor in language, our own research has shown that it is not sufficient. It is both too broad and too narrow. It is too broad because metaphor is not only based on indirectness. There is also the differentiation between the two semantic domains involved in the expression that needs to be bridged by some form of semantic transfer from the one sense to the other, on the basis of some form of similarity or comparison (Pragglejaz Group 2007; cf. Cameron 2003). Thus, The argument collapsed can be given a metaphorical analysis because it involves a contrast between arguments and buildings or physical structures, which may be bridged by constructing a similarity between the two. This is different than another form of indirectness, metonymy, where two senses may be contrasted but where the contrast is bridged by contiguity instead of similarity. Thus, in The White House made the announcement yesterday, there is a contrast between buildings and the people that occupy them, causing a form of indirect meaning. This contrast is resolved by metonymic rather than metaphorical transfer, via the contiguous relationship between houses and their occupants (cf. Steen 2007).
The criterion of indirectness is also too narrow to capture all linguistic forms of metaphor. If metaphor is defined as a mapping across two conceptual domains, it is easy to show that such cross-domain mappings in thought may also be realized by direct language instead of indirect language. One illustration may be provided by the following line from a song by Bruce Springsteen (‘I’m on fire’): Sometimes it’s like someone took a knife, baby, edgy and dull,/And cut a six-inch valley through the middle of my soul. This is a form of a cross-domain mapping between emotions and physical pain which is expressed directly: as listeners, we need to build a mental representation of the text that does include both a knife and a process of cutting as part of the text, for that is what the sentences instruct us to do. This is even explicitly signalled by the use of like. However, it is also clear that subsequent
conceptual analysis has to be carried out to recover the intended metaphorical meaning of this explicit cross-domain mapping: the explicit source-domain words, concepts, and referents have to be connected to the target domain. Such rhetorical figures, including many analogies and most similes, do not use language indirectly, as happens in If you use that strategy, he’ll wipe you out. But they still express metaphor defined as a cross-domain mapping. An inventory of these various forms of metaphor has been proposed by Goatly (1997). This observation suggests that we need to go back to the old criterion of indirect language use and revise it in such a way that other forms of metaphor can also be included within its compass. The key to this revision is to shift the criterion of indirectness from the use of linguistic signs to the use of conceptual structures (Steen 2007: 323). The use of a conceptual domain as a source to understand and talk about another conceptual domain which functions as a target is the true basis for metaphor in the study of usage. It embodies a form of indirectness in conceptualization which exploits the conceptual structure of one domain to conceive of another domain that is the local or global topic of an utterance or message. This type of indirectness in the conceptual structure of discourse may then be linguistically expressed in various ways:

–– One of these ways is by means of metaphorically used language; this is the classic category of metaphor. It may consequently be defined by means of indirectly used language which expresses the indirect conceptualization of some target domain by means of some source domain (he’ll wipe you out).
–– However, indirect conceptualization by means of a cross-domain mapping may also be expressed directly, and this may happen in various forms. Some of these forms may be explicit invitations for comparison, as in Shakespeare’s ‘Shall I compare thee to a summer’s day?’; others may be assertions of resemblance, such as by the preposition like; and still others may be counterfactual presentations of imagined re-categorizations (as in the Bruce Springsteen lines). In all of these cases, and there are more, indirectness in conceptualization is directly expressed by direct language.

Most if not all of these classes of directly expressed metaphor have been studied as manifestations of metaphor in usage, some in cognitive linguistics, most notably in Blending Theory (Fauconnier & Turner 2002). However, they lie outside the scope of MIP, which focuses on indirectly expressed metaphorical meaning. With MIPVU we emphasize that indirect conceptualization by metaphor causes some form of referential and sometimes even topical discontinuity or incongruity in discourse, whether the indirect conceptualization is expressed in direct or indirect language. It introduces an alien conceptual domain into the dominant conceptual domain of the discourse (or discourse segment). An apparent lack of
coherence consequently arises, which has to be resolved by assuming that a mapping from the foreign source domain to the dominant target domain must be performed. Again, this mapping may take place because it is triggered by indirect language use or by direct language, as the case may be. Indirect conceptualization by metaphor may be explicitly signalled in the language, or not, by metaphor markers. It may be restricted to one word within one utterance, or extend across a number of utterances in a row. And it may be explicitly expressed as a contrast between two domains that are both present in the discourse, or remain implicit. In all of these cases, the real problem for maintaining discourse coherence lies at the level of conceptual structure, not linguistic form (Steen 2007). This problem may manifest itself in varying ways in the linguistic structure. MIP has focused on the most frequent and typical one, the metaphorical (that is, indirect) use of lexical units. It has been part of our application of MIP to cater to the other forms of metaphor as well. This is the result of our conceptual definition of metaphor as a cross-domain mapping, in which we follow the basic starting point of cognitive linguistics.

1.5 Aspects of operationalization

Linguistic expressions of underlying cross-domain mappings can be found at many levels of linguistic organization, including morphology (kangaroo-rat, dogfish; cf. Ryder 1994) and syntax (he gave me a headache; cf. Goldberg 1995). Not all of these are relevant for the purpose of identifying those metaphors in discourse that may be presumed to have a function, or the same function, for cognitive processing. Besides, there is so much metaphor around at all of these levels that it is more practical to single out one particular level of linguistic organization that seems most important for this type of research question.
The most popular level in most of this research is the level of the word, or lexical unit, as may be seen from the thousands of examples in the cognitive-linguistic literature. This is also the level at which the Pragglejaz Group have pitched their procedure, MIP. As a result, in order to consistently measure metaphor at one level of usage, lexical units need to be systematically and exhaustively examined for metaphorical use, and annotated as such. All other manifestations of metaphor can consequently be left aside, at least for the moment. The main reason for choosing this unit of analysis is the relatively transparent relationship between words, concepts, and referents which is found in most analyses of metaphor in language and discourse. From a processing perspective, most words may be assumed to activate concepts in memory which postulate referents in discourse. A similar claim cannot be made with equal ease for the linguistic
unit underneath the word, that is, morphemes, or for the linguistic unit above the word, that is, phrases. Although it is clear that the notion of word itself is not unproblematic, nor its relation to concepts and referents completely transparent (cf. Murphy 2002), it does look as if the word provides the best starting point for the analysis of metaphor in usage. The basic argument goes like this. Some of the presumable referents related to the concepts and words used in a stretch of discourse are not part of the current domain of discourse because the words and concepts have been used metaphorically. To infer the understood referents, these words and their activated concepts need the construction of a cross-domain mapping in order for the intended referents to be integrated into a coherent discourse representation (Steen 2007). This is presumably what would happen with defend in the sentence He defends his claims well: cognitive linguists argue that it would instate a ‘war’ referent if it were taken in its basic sense. What we claim is that all words that can be related to metaphor in this way could in theory be candidates for a cognitive cross-domain mapping by language users when they produce or comprehend language. This is an assumption about the potential of language, which is held by many metaphor researchers. Our formulation signals that this approach is intended to be compatible with, and can eventually be fed into, behavioural research on metaphor processing in psycholinguistics, but that that sort of research requires its own empirical testing (Steen 2007). We need one operational criterion by which all relevant units of analysis in language structure, words, are judged for their relation to metaphor as a cross-domain mapping in conceptual structure. As we have argued above, this is the notion of indirectness by similarity.
This criterion, however, is not one monolithic whole: it has been broken down into a number of separate questions by the Pragglejaz Group when they formulated MIP. It turns out that the analyst applying MIP needs to make at least five distinct decisions:

1. What counts as a stable unit of analysis?
2. What is an adequate description of contextual (situation specific) meaning?
3. What is a generally motivated description of basic meaning?
4. What is the degree of distinctness between the two meanings?
5. What is the degree of similarity between the two meanings?
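
As an informal illustration only, the five decisions can be read as inputs to a small decision function. The sketch below is not part of MIP: the class and field names are hypothetical, the meaning glosses are loose paraphrases rather than dictionary quotations, and the graded questions 4 and 5 are assumed to have already been reduced to yes/no answers by the analyst.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """An analyst's answers to the five questions for one word in context.

    All field names are hypothetical; the glosses are informal paraphrases.
    """
    is_lexical_unit: bool      # question 1
    contextual_meaning: str    # question 2
    basic_meaning: str         # question 3
    meanings_distinct: bool    # question 4, reduced to yes/no
    meanings_similar: bool     # question 5, reduced to yes/no


def metaphorically_used(j: Judgement) -> bool:
    """Indirectness by similarity: a distinct basic sense bridged by comparison."""
    return j.is_lexical_unit and j.meanings_distinct and j.meanings_similar


# 'The argument collapsed': contrast bridged by similarity.
collapsed = Judgement(True, "fail as a line of reasoning",
                      "fall down, said of a physical structure", True, True)
# 'The White House made the announcement': contrast bridged by contiguity.
white_house = Judgement(True, "the people who occupy the building",
                        "the building itself", True, False)

print(metaphorically_used(collapsed))    # True
print(metaphorically_used(white_house))  # False
```

The metonymy case shows why indirectness alone is too broad a criterion: the two meanings are distinct, but the function still returns False because the contrast is not resolved by similarity.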

The five questions focus on the precise role of a potentially metaphorical word as a lexical unit, its contextual meaning, its more basic meaning, and the contrast and comparison between these basic and contextual meanings. When analysts disagree about any of these issues, they will differ in deciding what counts as a metaphorically used lexical unit in usage. This will send them back to their analysis and force
them to explain how they apply each of the criteria in the procedure, which in turn may lead to a more uniform understanding and application of the tool. Each of these decisions pertains to data that can often be described or measured in more than one way, which complicates decision making. This versatility of the data has mostly to do with the fact that many language phenomena exhibit categorization effects that inevitably make them better or worse examples of their category. This means that they can, for instance, be seen as either inside or outside that category, or closer to the centre of that category or not, or on the borderline between one category and another. All of these aspects are part of language research, as has been part and parcel of the cognitive-linguistic view of linguistics (e.g. Croft & Cruse 2004; Taylor 2002), and they have to be accommodated when making decisions about metaphor in usage. This is not just the case for metaphor as such, as if it were one monolithic whole, but it also holds for the five component aspects listed just now as constitutive of metaphor-related words in natural discourse. This will take us into issues of data collection in the next section.

But since MIP only caters to metaphorically used words, we still need to extend it by developing a comparable procedure for words that are not used metaphorically as words themselves. One class expresses cross-domain mappings directly, as in the Bruce Springsteen lines. A brief preview of our solution is the following set of instructions:

1. Find local referent and topic shifts. Good clues are provided by lexis which is “incongruous” (Cameron 2003; Charteris-Black 2004) with the rest of the text.
2. Test whether the incongruous words are to be integrated within the overall referential and/or topical framework by means of some form of comparison. Good clues are provided by lexis which flags the need for some form of similarity or projection (Goatly 1997).
3. Test whether the comparison is nonliteral or cross-domain. Cameron (2003: 74) suggests that we should include any comparison that is not obviously non-metaphorical, such as the campsite was like a holiday village. Whenever two concepts are compared and they can be constructed, in context, as somehow belonging to two distinct and contrasted domains, the comparison should be seen as expressing a cross-domain mapping. Cameron refers to these as two incongruous domains.
4. Test whether the comparison can be seen as some form of indirect talk about the local or main referent or topic of the text. (If it is not, we might be dealing with a digression.) A provisional sketch of a conceptual mapping between the incongruous material functioning as source domain on the one hand and elements from the co-text functioning as target domain on the other should be possible. This type of preliminary conceptual analysis is useful because this is a case of direct metaphor where it is impossible to look up the metaphorical meaning of indirectly used words in the dictionary, as is possible for almost all indirect metaphor.
5. If the findings for tests 2, 3, and 4 are positive, then a word should be identified as (part of) a direct form of metaphor.

When this set of instructions is applied to for instance the Springsteen excerpt, it can account for the data: Sometimes it’s like someone took a knife, baby, edgy and dull,/ And cut a six-inch valley through the middle of my soul. Almost all of these words can be seen as participating in a complex local reference shift, away from the love relationship towards a cutting scenario (which in turn has another reference shift inside it). The encompassing reference shift of the cutting scenario has to be incorporated into the overall referential and topical framework of the text on the basis of the criteria explained in these instructions: yes, the cutting scenario is distinct and may in a non-literal way be compared to what happens in the love relationship.

Another class expresses cross-domain mappings implicitly. Here is an example from the British National Corpus (file A9J): ‘Naturally, to embark on such a step is not necessarily to succeed immediately in realising it’. Here step is related to metaphor, and the cohesive element it receives a code for implicit metaphor. This idea may need some explication. In discourse analysis, the discourse would have to show the previous concept (antecedent, ‘step’) instead of the cohesive element, it, and this would make the current proposition containing the cohesive element metaphorical. But the language in the surface text would be implicitly metaphorical, because the language does not signal the need for nonliteral comparison, as is the case with indirect and direct metaphor.
Instead, implicit metaphor is due to the underlying cohesive link (grammatical and/or semantic) in the discourse which points to recoverable metaphorical material. The main point of this section has been to reconsider the operational criterion for metaphor as indirectness by similarity. The Pragglejaz Group have built MIP on the assumption that metaphor in discourse can be identified by looking for indirectly used words which then have to be interpreted by comparison to a more basic sense. However, as we have seen, there are other forms of metaphor which also operate on indirectness by similarity, but not at the level of indirect word use, but at the level of the conceptual structure of discourse. Since we have conceptualized metaphor as a matter of cross-domain mapping in conceptual structure, these are also metaphors, but either expressed directly or implicitly. They hence require an extension of MIP to be identified as linguistic expressions of metaphor. The basis of the extension remains the same, and refers to the operational criterion of
indirectness by similarity in conceptual structure while translating this into appropriately corresponding linguistic terms.

1.6 Data collection

In turning from the operationalization to the actual process of data collection, additional help can be recruited from a number of tools. These, in turn, prompt more specific decisions on how metaphor is identified in language. When considering the contextual and basic senses of lexical units, for instance, one could do worse than standardize data collection by referring to a publicly available description of all of these in one or more dictionaries. The advantages are obvious: decisions are made on the basis of an independently produced description of the language, they can be checked and the research can be replicated, and dictionaries can be compared or combined for various purposes. There are also disadvantages to using dictionaries, which have to do with for instance problems of space and practical solutions for complex items, but these can be redressed in various ways (Steen 2007). In their presentation of MIP, the Pragglejaz Group (2007) have listed and discussed some of the issues revealed by dictionaries, including the problems with the notion of the lexical unit itself, the question of its relation with lemmas versus word classes, and the matter of its relation with multi-word expressions, including idioms (bite off your tongue), polywords (of course), and phrasal verbs (turn on). As will be explained in the following paragraphs, we have followed up on the Pragglejaz approach in our own research, and become explicitly dependent on the corpus-based learners’ dictionary also employed by the Pragglejaz Group, The Macmillan English Dictionary for Advanced Learners (Rundell 2002). With MIPVU, we have developed and followed explicit guidelines for using this dictionary to answer the first four questions of the Pragglejaz procedure, whereas MIP has left these issues open for decision.
Occasionally we have resorted to a second opinion dictionary that is largely comparable to the Macmillan dictionary, The Longman Dictionary of Contemporary English, in circumstances that will be illustrated in subsequent chapters. To us, lexical units are basically word classes, not lemmas; the latter is the view taken by the Pragglejaz Group. We prefer to analyze by word class because word classes have the closest connections with conceptual and referential classes like entities, processes, and attributes. These are fundamental for our overall discourse approach in which presumed referents that are linked to concepts and words play specific roles in projected discourse representations. Thus, we take the noun dog as a lexical unit distinct from the verb to dog, in that the noun posits a default animal referent as opposed to the verb, the default referent of which is some action that
is typical of humans. Therefore, in our procedure the noun cannot provide a basic sense against which any contextual sense of the verb can be identified as metaphorical since these are different lexical units. In our approach, when the verb is used in some usage event, it refers to a process, and the decision whether it does this in a metaphorical or non-metaphorical way depends on the question whether there is a tension between some contextual and some more basic sense of the verb to dog. Of course, the relation between the verb and the noun is clearly metaphorical. However, it is not metaphorical at the level of the use of the lexical unit in the discourse, where it posits a referent of a particular kind. It is metaphorical at the level of the language system, and in particular its word formation system. It is not impossible that the latter source of metaphoricity may also turn out to have an effect on a word’s meaning in discourse, but it is not identical to the way metaphorical meaning is studied in the procedure: MIPVU focuses on word use in context, not on the results of metaphorical word formation processes. If it were to include the latter, it would have to systematically do so for all noun-verb pairs and other word formation products, and not just the few striking ones such as dog/to dog. This could in effect be done by an analysis of this type of dictionary entry as such, and could simply compare all lemmas that exhibit more than one word class in order to check whether one of them might be related to the other one by metaphor. This type of research would have no need to look at words as they are being used in discourse contexts, but would simply be able to assign the label ‘metaphorical’ to all uses of for instance the lemma ‘dog’ used as a verb, depending on the relation of the lemma in one word class to another, more basic use of that lemma in the other word class. 
This type of analysis would hence be of a different order than what is taking place here, where it is context which has to decide which of a lexical unit’s senses is operative, a basic sense or a metaphorical one. Contextual senses of lexical units defined in this way are looked up in the dictionary, and, unless they are novel, which happens extremely rarely, can as a rule all be found explicated there. Then the question arises whether a more basic sense can be found for the same entry. Basic senses are the most concrete and human-oriented senses that can be distinguished. Contrary to what is suggested by MIP, we have left older senses (as listed in for instance the Oxford English Dictionary) outside consideration when determining basic senses. This is because they are commonly not accessible as relevant senses to the contemporary user of English—as is reflected by the absence of some of them from contemporary user dictionaries. Finally, whether contextual and basic senses are distinct enough (question four) can also be reliably measured, by their degree of independence as separate sense descriptions in the dictionary. All of these details will be explicitly formalized in the MIPVU procedure in the next chapter and then illustrated in the subsequent five chapters.
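
To make the dictionary-based steps concrete, here is a minimal sketch of how the lookup of a basic sense and the distinctness check might be modelled. It is an illustration under stated assumptions, not the MIPVU procedure itself: the `Sense` record, its fields, and both helper functions are hypothetical, the glosses are paraphrases rather than quotations from any dictionary, and the `concrete` flag stands in for an analyst's judgement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Sense:
    ident: int          # identifier of a separate sense description
    gloss: str          # informal paraphrase, not a dictionary quotation
    concrete: bool      # analyst's judgement: concrete, human-oriented sense?
    contemporary: bool  # listed in the contemporary users' dictionary?


def basic_sense(senses: Sequence[Sense]) -> Optional[Sense]:
    """Pick a basic sense, ignoring senses found only in historical dictionaries."""
    candidates = [s for s in senses if s.contemporary and s.concrete]
    return candidates[0] if candidates else None


def is_distinct(contextual: Sense, basic: Optional[Sense]) -> bool:
    """Question four: are the two senses separate sense descriptions?"""
    return basic is not None and contextual.ident != basic.ident


# 'ardent': the temperature sense is historical only, so no contrast arises.
ardent = [Sense(1, "burning, hot", True, False),
          Sense(2, "very enthusiastic, passionate", False, True)]
# 'defend': both senses are available to the contemporary user.
defend = [Sense(1, "protect from physical attack", True, True),
          Sense(2, "support a claim by argument", False, True)]

print(is_distinct(ardent[1], basic_sense(ardent)))  # False
print(is_distinct(defend[1], basic_sense(defend)))  # True
```

A positive distinctness check only settles question four; whether the two senses are related by similarity still has to be judged separately.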

What cannot be achieved by resorting to the dictionary is giving a reliable answer to the fifth question listed in the previous section. Similarity, or understanding by comparison, is evidently a matter of degree and typically dependent upon one’s perspective. However, the fact that reality comes in shades does not prevent the researcher from imposing a less fine-grained perspective upon reality. The Pragglejaz Group have used a nominal scale, the most important decision boiling down to a binary distinction between the metaphorical and the nonmetaphorical. A linguistic form is hence judged to be either within or outside the category of metaphorical expressions. This is a reflection of one rather wide angle on metaphorical meaning as opposed to non-metaphorical meaning, but it should be acknowledged that this angle has served its purpose in linguistic research for a long time: the distinction between metaphorical and non-metaphorical meaning has typically been discussed from this binary perspective as a useful way to examine language use, even though at some points lip-service has been paid to gradability. Operationalization of this question in MIP hence involves the intuitive decision on the part of the researcher to adopt a measuring instrument with just two nominal categories (metaphorical versus non-metaphorical) or not. Some analysts have difficulties with accepting this type of approach. They feel that yes/no decisions about metaphor are inappropriate, because metaphor is a graded phenomenon. However, the fact that reality is graded does not mean that it is not useful to make gross orderings of reality into contrasting categories, particularly not when those categories are commonly deemed to be functional for all sorts of cognitive and linguistic processes and their products. 
The objections of such linguists may be countered by two observations. On the one hand, the measurement of reality by nominal scales is not the same as its reification into static categories of all or nothing: refinements can always be made in later stages by those who are interested. On the other hand, if more fine-grained scales of measurement are preferred, they have to be applicable across the board with a demonstrably identical degree of precision and reliability, which is a claim that cannot realistically be defended given the complexity of the data and the fuzziness of most theories. Thus, a rank scale claims that observation is refined enough to order each and every linguistic expression as more or less metaphorical than another, while also allowing for ties. An interval scale would make the same claim, but add the requirement that the distance between the ranks would always be equal for all of the various groups of ranked phenomena. And a ratio scale would claim the same as the interval scale, but add the further claim that the absolute value of each of the intervals would also be known to the researcher and kept under constant control during measurement. Each of these types of measurement is in fact more difficult to make when it comes to the accuracy with which every


single case can be assigned to a particular rank or position on a scale. This does not seem to be a realistic demand for the area of metaphor analysis (Steen 2007).

It is, however, possible to include a practically feasible acknowledgement of this problem. One addition we have made to this binary scale in MIPVU is the category of WIDLII, ‘When In Doubt, Leave It In’, for those cases that are borderline (cf. Scholfield 1995). This, in effect, produces a three-category variable: clear metaphor-related words, metaphor-related words that are WIDLII, and words that are clearly not related to metaphor. In our procedure, WIDLII is the logical result of lack of decision after group discussion of those data that were first analyzed independently by individual analysts and then made available for comments by the other analysts and judged to be problematic. If these cases could not be resolved by subsequent group discussion, they were coded as borderline with the ‘WIDLII’ code, as will be illustrated in the next chapters. We realize that other researchers, who do not work in collaboration with colleagues, need to find another measure for operationalizing such a group of borderline cases, but that does not detract from the methodological point that metaphor identification may profitably utilize a binary or three-category variable.

MIPVU requires analysts to make a series of decisions of the nominal type. Each word has to be judged as being a lexical unit, or not. Questionable cases can be handled as distinct phenomena, so that there are very few remaining cases of doubt. The decision whether a contrast and comparison can be set up between the contextual and basic meanings of a lexical unit also involves a number of ‘yes/no/don’t know’ decisions. As will be detailed in the next chapter, when the answer to the question about comparison is affirmative, the lexical unit is automatically classified as metaphorical as opposed to non-metaphorical.
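The three-category variable described above can be illustrated with a short sketch. It is not part of MIPVU itself: the function name `final_code`, the label strings, and the `discussion_outcome` parameter are hypothetical, and the sketch assumes that a case counts as unproblematic whenever all independent judgements agree.

```python
from typing import Optional, Sequence

# Hypothetical label strings for the three-category variable.
MRW = "MRW"          # clear metaphor-related word
WIDLII = "WIDLII"    # borderline: 'When In Doubt, Leave It In'
NON_MRW = "non-MRW"  # clearly not related to metaphor


def final_code(independent: Sequence[bool],
               discussion_outcome: Optional[bool] = None) -> str:
    """Return the final code for one lexical unit.

    independent: each analyst's independent yes/no judgement
        (True = related to metaphor).
    discussion_outcome: True/False if group discussion resolved the case,
        None if the group could not reach a decision.
    """
    if not independent:
        raise ValueError("at least one independent judgement is required")
    # Assumption: unanimous independent judgements need no discussion.
    if all(independent):
        return MRW
    if not any(independent):
        return NON_MRW
    # Analysts disagreed, so the case went to group discussion.
    if discussion_outcome is None:
        return WIDLII  # unresolved: keep the case in, flagged as borderline
    return MRW if discussion_outcome else NON_MRW
```

A researcher working alone would have to replace the `discussion_outcome` step with some other criterion for doubt, as the text notes.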
1.7 Data analysis

Data are collected in order to be analyzed, and data analysis can take place in many different ways (Steen 2007). In our overall research project on metaphor in discourse, we analyze the data collected by MIPVU in a number of ways, primarily quantitatively with reference to their distribution across registers, word classes, and a number of aspects of discourse. A preliminary report on this analysis will be presented at the end of this casebook in Chapters 9 and 10, in order to demonstrate the great value of spending all this time on high-quality data collection. However, from the present perspective, which is focused on a method for linguistic metaphor identification, data collection may also be followed by an analysis of the quality of the process of data collection itself. In other words, we may also


proceed to analyze the degree of agreement between analysts in applying the procedure to large and varied amounts of data. This is what we have also done in our research (Chapter 8), and this is the aspect of data analysis that finally needs some attention in this introductory chapter. The Pragglejaz Group, and our own group in their wake, have examined reliability in a separate series of specially conducted methodological studies by looking at Cohen’s and Fleiss’s Kappa (Dunn 1989). This is a test statistic which investigates the agreement between judges on a case-by-case basis: it checks interanalyst agreement for each individual case in the total number of items of a sample. It is specifically concerned with the relation between the agreement that might be expected on the basis of mere chance on the one hand and observed agreement between analysts on the other. There are several interpretations of the test. The most critical and conservative interpretation is that kappa only tells you whether agreement between analysts is above chance. The most optimistic interpretation holds that kappa also gives you a measure of the magnitude of between-analyst agreement corrected for chance agreement. Interpretations of the values of kappa vary too, with some researchers saying that a value of 0.6 to 0.8 indicates substantial agreement, while others say that a kappa from 0.7 upwards is satisfactory, two positions that are in fact not completely incompatible. Two studies by the Pragglejaz Group (2007) yielded kappas between 0.6 and 0.8 before discussion. Our own results will be reported in Chapter 8, but they are even better. The value of these findings should be seen in the light of their possible application. 
If metaphor identification is needed to analyze large chunks of discourse in a corpus in order to compare the properties and behaviour of groups of words related to metaphor across genres or speakers or situations of usage, then overall reliability need not be as high as when the aim of the research is to be certain about each and every individual case of metaphorical usage, say for the interpretation of one particular text (Scholfield 1995). When large groups of cases are compared, differences between groups of metaphorically used words merely have to be somewhat bigger in order to come out as statistically significant. High reliability about the value of individual cases may be important for single text analysis, for instance in literary discourse analysis. It may also be relevant for the development of experimental materials on the basis of authentic discourse, as may happen with research on metaphor recognition (Steen 2004). If you wish to examine how metaphor recognition in natural discourse by informants is affected by discourse properties of metaphor, it is essential to have a representative sample of all true cases of metaphor in the study, and unreliable metaphor identification may skew the experimental materials in subtle ways. It has been one of the ambitions of both MIP and of MIPVU to be suitable for application in this type of experimental behavioural setting.
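The relation between observed and chance agreement that kappa captures can be made concrete with a small sketch. It computes Cohen's kappa for two analysts as (observed agreement minus expected agreement) divided by (1 minus expected agreement), where expected agreement is derived from each analyst's own distribution of codes. This is an illustration only, not the code used in the studies reported here: the function name `cohens_kappa` and the example codes are assumptions, and Fleiss's kappa for more than two analysts requires a different computation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two analysts who coded the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both analysts must code the same non-empty item list")
    n = len(a)
    # Observed agreement: proportion of items given the same code by both.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: for each code, the product of the two analysts'
    # marginal proportions, summed over all codes.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented example: ten lexical units coded by two analysts.
analyst_1 = ["MRW", "MRW", "MRW", "non", "non", "non", "MRW", "non", "MRW", "non"]
analyst_2 = ["MRW", "MRW", "non", "non", "non", "non", "MRW", "non", "MRW", "MRW"]
kappa = cohens_kappa(analyst_1, analyst_2)  # observed 0.8, chance 0.5
```

In this invented example the analysts agree on eight of ten items, but half of that agreement would be expected by chance, so kappa comes out at roughly 0.6.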


1.8 Plan of the book

We have made several points about linguistic metaphor identification in usage, taking the Pragglejaz Group’s MIP as our point of reference, but pointing out the reasons why we have developed our own variant called MIPVU:

–– With the Pragglejaz Group, we conceptualize metaphor as a matter of cross-domain mappings in conceptual structure which are expressed in language. However, contrary to the Pragglejaz Group, we do not restrict our attention to indirect expressions of metaphor, but also include direct expressions (other forms of metaphor such as simile, analogy, and so on) and implicit expressions (by substitution and ellipsis); this will be detailed in the next chapter.
–– We operationalize metaphor as indirectness by similarity, or comparison. The Pragglejaz Group have pitched this operationalization at the level of language, testing whether lexical units are used indirectly. We have moved it to the level of conceptual structure, testing whether concepts are used indirectly, in order to cater to other forms of expression of metaphor than indirect language use. This has consequences for the procedure for data collection, as, again, will be explained in the next chapter.
–– With the Pragglejaz Group, we are limiting the application of our procedure to metaphorical meaning to the contemporary language user, and as expressed in lexical units. This means that we do not consider historical metaphor, or metaphor in morphology, syntax, and so on. Moreover, our definition of lexical units is less broad than in MIP, since we have to rely on the distinction between word classes in order to guarantee a consistent discourse perspective on the relation between words, concepts, and referents.
–– We have gone beyond the Pragglejaz Group’s practice by standardizing the data collection process explicitly with reference to a dictionary, with precise guidelines for making the various decisions that are needed in MIP and MIPVU.
This does not mean that our application cannot be used with other dictionaries, but then the standardization would need to be slightly adjusted to those dictionaries.
–– With the Pragglejaz Group, we interpret linguistic metaphor identification as yielding data about the semiotic structure of language in usage events; the focus on language means that we do not aim to specify the nature of underlying conceptual structures, and the focus on semiotic structure means that we do not make claims about cognitive processes and products.

We will now offer a brief preview of the rest of the book. First of all, Chapter 2 will present MIPVU, our detailed set of instructions for linguistic metaphor identification that are an elaboration and refinement of MIP. This is the set of instructions that has emerged from our empirical research and


which has been applied in our empirical work on some 190,000 words of English discourse and some 130,000 words of Dutch discourse. The subsequent chapters in the book present the details of our application of MIPVU. Chapters 3 through 6 discuss four English-language registers: news, conversation, fiction, and academic discourse. These are all part of a corpus-annotation project based on BNC-Baby, a four-million-word sample from the British National Corpus, which contains four sets of one million words for each of the registers. BNC-Baby was developed as a dataset for linguists who wish to work with the new university grammar of spoken and written English published by Biber, Johansson, et al. (1999), which is based on a comparison between these four registers. Our selection of these materials from BNC-Baby was aimed at achieving some connection with this grammatical description of contemporary English so that the relation between metaphor and grammar might benefit from our research. Our analyses in Chapters 3 through 6 pertain to 50,000-word samples from these four sets. Chapter 7 then turns to the two registers comprised in the Dutch-language project, conversation and news. It is based on two equal-sized, specially constructed samples of 50,000 words, with another, historical part with some 30,000 words. The chapter explains the background to the corpus and focuses on the differences between doing this type of work in English and in Dutch. Each of the chapters is largely based on a discussion of the cases that were included in the MIPVU reliability tests, which will be reported on in statistical detail in Chapter 8. This method of presentation in Chapters 3 through 7 enables us to point out truly unproblematic cases, which received the same analyses from all analysts, and to show why the method worked so well for these cases.
This includes looking at clear cases of words that are not related to the expression of metaphor as well as looking at clear cases of words that are related to metaphor. It also permits us to scrutinize those cases that are borderline and make suggestions about these and other unclear cases. In each of these chapters, we will offer a combination of general considerations for finding metaphorical language, technical considerations of method, and more specific reflections on relations with the particular register. These case studies form the backbone of the book, and they present the detailed research that needs to be done before we can proceed to the more general considerations of method and results in Chapters 8 through 10. The last chapters of the book will present our own methodological evaluation of MIPVU and the findings of our research. MIPVU has been formally tested for reliability throughout the research, and a detailed report of these reliability studies can be found in Chapter 8. Chapter 10 contains some of the main findings of the corpus annotations for the English-language project, spelling out how metaphor is distributed across the four registers (conversation, news, fiction, and academic discourse) and the main word classes (adjectives, adverbs, conjunctions,


determiners, nouns, prepositions, verbs, and the remainder). In order to run these analyses, the data needed to be checked for accuracy on a number of points, which we discuss as a separate topic in Chapter 9. The quantitative results of the Dutch-language project will be reported separately on another occasion. In a conclusion we will briefly reconsider the issues raised by MIP and our solution to them in MIPVU. It is our hope that these reflections and their basis in the hands-on research presented in Chapters 3 through 7 will be helpful to other researchers of metaphor. If this casebook inspires other researchers to be equally explicit and systematic about their procedures of data collection, this book will have achieved its goal.

chapter 2

MIPVU: A manual for identifying metaphor-related words

This chapter presents the complete procedure for finding metaphor-related words which has been utilized in our research. The style is in the form of a set of instructions. As announced in Chapter 1, we will report reliability tests for this procedure in Chapter 8, and quantitative empirical results of its application to our materials in Chapters 9 and 10. Qualitative discussions of methodological issues of application can be found in Chapters 3 through 7. The present chapter is intended to be an independent presentation of the procedure as an autonomous tool. It may be used as a reference manual by anyone who aims to find metaphor-related words in usage. The term ‘metaphor-related words’ is used to suggest that the tool aims to identify all words in discourse that can be taken to be lexical expressions of underlying cross-domain mappings.

2.1 The basic procedure

The goal of finding metaphor in discourse can be achieved in systematic and exhaustive fashion by adhering to the following set of guidelines.

1. Find metaphor-related words (MRWs) by examining the text on a word-by-word basis.
   –– For information about whether an expression counts as a word, consult Section 2.2.
2. When a word is used indirectly and that use may potentially be explained by some form of cross-domain mapping from a more basic meaning of that word, mark the word as metaphorically used (MRW).
   –– For information about indirect word use that is potentially explained by cross-domain mapping, consult Section 2.3.


3. When a word is used directly and its use may potentially be explained by some form of cross-domain mapping to a more basic referent or topic in the text, mark the word as direct metaphor (MRW, direct).
   –– For more information about direct word use that is potentially explained by cross-domain mapping, consult Section 2.4.
4. When words are used for the purpose of lexico-grammatical substitution, such as third person personal pronouns, or when ellipsis occurs where words may be seen as missing, as in some forms of co-ordination, and when a direct or indirect meaning is conveyed by those substitutions or ellipses that may potentially be explained by some form of cross-domain mapping from a more basic meaning, referent, or topic, insert a code for implicit metaphor (MRW, implicit).
   –– For more information about implicit meaning by substitution or ellipsis that is potentially explained by cross-domain mapping, consult Section 2.5.
5. When a word functions as a signal that a cross-domain mapping may be at play, mark it as a metaphor flag (MFlag).
   –– For more information about signals of cross-domain mappings, consult Section 2.6.
6. When a word is a new-formation coined, examine the distinct words that are its independent parts according to steps 2 through 5.
   –– For more information about new-formations, consult Section 2.7.

The use of the phrase ‘potentially explained by a cross-domain mapping’ is intentional. It should be read with an emphasis on ‘potentially’. This links up with the tenuous connection between linguistic and conceptual metaphor identification discussed in Chapter 1.

As for the relation with MIP (Pragglejaz Group 2007), points 1 and 2 are essentially the same as MIP. Points 3 and 4 deal with two additions to MIP in the area of other forms of metaphor. Point 5 is a different kind of addition to MIP and includes the identification of signals of metaphor.
And point 6 takes one assumption of MIP to its linguistic conclusion by including instructions for handling new lexical units.

2.2 Deciding about words: Lexical units

The word is the unit of analysis which is examined for metaphorical use. There are other possibilities, such as the morpheme or the phrase, and these can account for


additional metaphor in usage. However, we do not mark these other possibilities, because we can only do one thing at a time. Focusing on the word as the unit of analysis is already a most challenging and complex operation. It is motivated by the functional relation between words, concepts and referents in discourse analysis, described in Chapter 1. A systematic and explicit approach to the relevant unit of analysis is crucial for a consistent and correct quantitative analysis of the data. Lack of clear guidelines may introduce a substantial degree of error and therefore noise into the numbers and patterns obtained. It would undermine detailed quantitative comparison between distinct studies.

For theoretical reasons, we will call the word a ‘lexical unit’. In adopting this terminology, we follow the Pragglejaz Group (2007). When you decide about the boundaries of lexical units, the following guidelines should be adopted.

2.2.1 General guideline

In our project, the data come from the British National Corpus, and we therefore follow most of BNC practice in deciding what counts as a lexical unit. In other projects with other materials, these guidelines may or may not have to be adjusted to the other source, as we shall show for Dutch in Chapter 7. In our research, the dependence on these materials means two things:

1. All words provided with an independent Part-Of-Speech (POS) tag in the corpus are taken as separate lexical units. For instance, prepositions are coded as PRP, nouns are coded as NN, and so on. A full list of tags is available from the BNC website: www.natcorp.ox.ac.uk.
2. All so-called polywords in the corpus are taken as single lexical units. There are a number of fixed multi-word expressions that are analyzed as one lexical unit in the BNC, on the grounds that they are grammatical units which designate one specific referent in the discourse. Examples include a good deal, by means of, and of course. These multi-word expressions are called polywords.
They have special tags and are available in a finite list from the BNC website: www.natcorp.ox.ac.uk. You should follow this practice and, in particular, not examine the parts of these polywords for potential metaphorical meaning.

2.2.2 Exceptions

There are three exceptions to our overall acceptance of BNC practice: phrasal verbs, some compounds, and some proper names.


Phrasal verbs are verbal expressions consisting of more than one word, such as look up or turn on. These are not taken as single lexical units in the BNC, but as independent verbs followed by autonomous adverbial particles. We will not follow this practice, for phrasal verbs function as linguistic units designating one action, process, state or relation in the referential dimension of the discourse. In that respect, they are similar to polywords. You should therefore treat all phrasal verbs as single lexical units: their individual parts do not require independent analysis for potential metaphorical meaning. The phrasal verb as a whole, however, can still be used metaphorically. For instance, setting up an organization is a metaphorical variant of setting up a roadblock. The classification of two or more words as part of one phrasal verb should be marked as such in the data.

The problem with phrasal verbs is their superficial resemblance to prepositional verbs (i.e. a frequent verb-preposition combination) and to verbs followed by free adverbs. The latter two cases should be analyzed as free combinations consisting of two independent lexical units, as opposed to phrasal verbs which should be taken as only one. Again, the motivation for this approach is the assumption of a functional and global correspondence between words, concepts, and referents.

One way to tell these three groups apart is by examining their POS tags in the BNC. Particles of phrasal verbs have received an AVP code (‘Adverbial particle’), prepositions of prepositional verbs a PRP code (‘Preposition’), and freely occurring adverbs an AV0 code (‘Adverb’). These are classifications which have been made independently of any questions about metaphorical use; they are based on a general approach to data analysis, which is a bonus. However, the matter is further complicated in three ways.
Firstly, when we go to the dictionaries used in our research for examining contextual and basic meanings, it appears that they do not distinguish between phrasal verbs and prepositional verbs. They in fact call both types phrasal verbs. An example is look at in a sentence like “it was only when you looked at their faces that you saw the difference”. According to Macmillan this is a phrasal verb, but the BNC code for at is PRP, indicating that it is a prepositional verb. We follow the BNC’s decision, which means that you have to analyze look and at as two lexical units and independently examine their main senses in the dictionary to find their respective basic meanings; the contextual meaning of each of them in their combined use, even as a prepositional verb, however, will be found under the phrasal meaning of the combination. Secondly, some of the verb+particle combinations marked as such in the BNC are in fact not conventionalized phrasal verbs. That is, they are not phrasal verbs


according to the dictionary. An example is look up in a sentence like “she looked up into the sky”. Here up is coded as AVP in the BNC, suggesting that this is a proper phrasal verb. However, the Macmillan dictionary tells us that the contextual meaning – “to direct your eyes towards someone or something so that you can see them” – is not one of the meanings of the phrasal verb (unlike, for instance, “to try to find a particular piece of information”). The contextual meaning, instead, is the result of a free combination of a verb plus an adverb. BNC has probably made a mistake here; the words consequently have to be analyzed as two separate lexical units.

Thirdly, there is the matter of complex phrasal verbs, such as make up for or do away with. These may be easily confused with combinations of simple phrasal verbs with a preposition (make up + for or do away + with). However, they are typically listed as complete, complex phrasal verbs in the Macmillan dictionary, as run-ons after the main verb, and they can be replaced by a synonym (compensate and get rid of). Because of this referential unity, we follow the dictionary for complex phrasal verbs and take the dictionary classification of these complex verbs as single units as our guideline.

Taking all of this into consideration, we have established the following rules for simple phrasal verbs (complex phrasal verbs being recognizable by the criteria above):

a. If the POS tag is PRP then we are dealing with a prepositional verb → analyze the verb and the preposition separately (i.e. two lexical units).
b. If the POS tag is AVP then check in the dictionary whether the combination of verb+particle has been listed as a phrasal verb meaning in the relevant contextual meaning
   –– → if this is the case, then we accept it is a phrasal verb and analyze the combination as one lexical unit;
   –– → if this is not the case, then we do not take the combination to be a conventionalized phrasal verb and therefore we analyze the verb and the particle separately (i.e. two lexical units).
c. If the POS tag is AV0 then we are dealing with a verb followed by a free adverb → analyze as two lexical units.
d. If the POS tag is PRP/AVP then apply the tests below to determine whether we are dealing with a phrasal or a prepositional verb.
e. If the BNC code is clearly wrong (supported by the above criteria or the tests below) then apply the proper analysis and add a comment in the materials stating “incorrect POS tag: PRP not AVP”.
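Rules a through d above amount to a small decision table, which can be sketched as follows. This is an illustration under stated assumptions rather than a tool used in the project: the names `Decision` and `classify_verb_combination` are hypothetical, the dictionary check of rule b is represented by a yes/no value supplied by the analyst, and rule e (overriding a clearly wrong BNC tag) is left to the analyst and not automated.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ONE_UNIT = "one lexical unit"
    TWO_UNITS = "two lexical units"
    APPLY_TESTS = "apply the phrasal/prepositional tests"


def classify_verb_combination(
    particle_tag: str,
    listed_as_phrasal_in_context: Optional[bool] = None,
) -> Decision:
    """Apply rules a-d to a verb plus following word.

    particle_tag: BNC POS tag of the word after the verb.
    listed_as_phrasal_in_context: analyst's finding for rule b, i.e. whether
        the dictionary lists the combination as a phrasal verb in the
        relevant contextual meaning. Only needed when the tag is AVP.
    """
    tag = particle_tag.strip().upper()
    if tag == "PRP":      # rule a: prepositional verb
        return Decision.TWO_UNITS
    if tag == "AVP":      # rule b: depends on the dictionary check
        if listed_as_phrasal_in_context is None:
            raise ValueError("rule b requires the result of the dictionary check")
        if listed_as_phrasal_in_context:
            return Decision.ONE_UNIT
        return Decision.TWO_UNITS
    if tag == "AV0":      # rule c: verb followed by a free adverb
        return Decision.TWO_UNITS
    if tag == "PRP/AVP":  # rule d: ambiguous tag, decide with the tests
        return Decision.APPLY_TESTS
    raise ValueError(f"tag {particle_tag!r} is not covered by rules a-d")
```

With the examples from the text, look at (PRP) yields two lexical units, look up in “she looked up into the sky” (AVP, not listed in that sense) also yields two, and look up in the sense of finding information (AVP, listed) yields one.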


Tests for deciding between phrasal/prepositional verbs

In prepositional verbs:
–– The preposition and following noun can be moved to the front of the sentence, which is not possible with phrasal verb particles (e.g. Up into the sky she looked but not *Up the information she looked).
–– An adverb can be inserted before the preposition (e.g. She ran quickly down the hill but not *She ran viciously down her best friends).
–– The preposition can be moved to the front of a wh-word (e.g. Up which hill did he run? but not *Up which bill did he run?).

In phrasal verbs:
–– The adverbial particle can be placed before or after the noun phrase acting as object of the verb, which is not possible for the prepositional verbs (e.g. She looked the information up but not *She looked his face at).
–– If the noun phrase is replaced by a pronoun, the pronoun has to be placed in front of the particle (e.g. The dentist took all my teeth out > The dentist took them out but not She went through the gate > *She went it through).

Compounds are single lexical units consisting of two distinct parts, which may cause orthographical problems. They can be spelt in three ways: as one word, as two hyphenated words, and as two separate words.

a. When a compound noun is spelt as one word, such as underpass, and can be found as such in the dictionary we treat it as one lexical unit designating one referent in the discourse.
b. When a compound noun is spelt as two hyphenated words and can be found as such in the dictionary, such as pitter-patter, we similarly treat it as one lexical unit. However, if we are dealing with a novel formation unknown to the dictionary, the compound noun is analyzed as two separate units, even though it may have one POS tag in the corpus. Our reason for this practice is that the language user is forced to parse the compound into its two component parts in order to establish the relation between the two related concepts and referents. This also applies to hyphenated compound nouns created through a productive morphological rule but that are not listed as a conventionalized compound in the dictionary (such as under-five).
c. In the BNC, compound nouns that have been spelt as two separate words are not taken as single lexical units, but analyzed as combinations of two independent words which each receive their own POS tags. When such compounds are conventionalized and, again, function as lexical units designating


one referent in the discourse, we will not follow the BNC solution. For then they are like polywords, and should be treated as single lexical units, whose parts do not require analysis for potential metaphorical meaning. The Macmillan dictionary has a tell-tale signal for identifying conventionalized compounds that are spelt as two distinct words: when a fixed expression is taken to be a compound noun, there is primary stress on the first word and secondary stress on the second word (e.g. ˈpower ˌplant). In cases where the Macmillan dictionary treats a multi-word combination as having one meaning, but displays a reversed stress pattern (such as ˌnuclear ˈpower), we do not treat the multi-word expression as a compound noun, and analyze it as consisting of two separate lexical units.
–– Rules a and b also apply to compound adverbs and adjectives, such as honey-hunting. This example is a novel formation unknown to Macmillan. Therefore, following rule b, the adjective is analyzed as comprising two separate lexical units, even though BNC has given it one POS tag.
–– Words may be spelt in more than one way, which may cause problems about the independent status of their components in some cases. An example is when the preposition onto is spelt as two words instead of one. When this happens, we will adhere to the spelling of the dictionary instead of the spelling of the document under analysis, because the dictionary is the more general reference work and related to accepted norms for language users. You should therefore analyze words according to their spelling in the dictionary, not according to their spelling and POS tagging in the corpus.

Proper names appear to form a special group in our analyses. There are several subclasses which we have encountered, which may not all technically qualify as genuine proper names. They will be discussed one by one. In general, however, proper names do not require any specific additional coding.
Our general strategy is to reduce the number of exceptions to POS tagging as provided by the BNC corpus. The solution to annotation problems proposed below is maximally simple: every separate word will be treated as a separate lexical unit, except for the underlined cases.

a. Proper names: all parts of genuine proper names are to be treated in the way of regular POS tagging. That is, Roy Wood and Madame Mattli are coded as two separate words and taken as two lexical units. This can be extended to addresses, with house numbers as well as road names all being cut up into separate lexical units. As a result, New York (in New York Herald Tribune) is also two units.


b. Some proper names have been bestowed on public entities and may appear in the dictionary. If they do, they are to be treated as all other expressions in the dictionary: thus, Labour Party becomes one lexical unit because it has the stress pattern of a compound. The same holds for some titles that appear in the dictionary, such as Pulitzer Prize, which is also treated as one lexical unit on the basis of the stress pattern. In our annotations, these expressions should be treated similarly to phrasal verbs, compounds, and polywords and should therefore receive a code to indicate that the words form single lexical units. Green Paper and White Paper, by contrast, are to be treated as containing two lexical units, because they have rising stress (Green and White would always be marked as related to metaphor). The elements of names of countries (e.g. United Kingdom) and organizations (e.g. United Nations) that have rising stress in the dictionary should also be treated as separate units.
c. Other names and titles do not appear in the dictionary. They are also treated as composites of their independent words, both by the BNC and by us. This accounts for two lexical units in Labour Law, Executive Committee, European Plan, Scarman Report, and even more lexical units in the Student Winter Games, the Henley Royal Regatta, the Criminal Law Revision Committee, House of Oliver, and so on.
d. A separate problem is constituted by genuine titles, that is, titles of texts:
   –– If titles are used as titles, that is, as headings of newspaper articles or chapters and sections of novels and academic writing, they need to be taken on a word-by-word basis. This is because they summarize or indicate content by means of words, concepts, and referents. They are regular cases, if linguistically sometimes odd.
   –– If titles are mentioned, however, to refer to for example a text or a TV programme, they function as names, like proper names.
If they are in the dictionary, check their stress pattern; if they are not, use BNC-Baby as a guide.

2.3 Indirect use potentially explained by cross-domain mapping

Indirect use of lexical units which may be explained by a cross-domain mapping is basically identified by means of MIP, with some adjustments. This means that the following guidelines should be adopted.

1. Identify the contextual meaning of the lexical unit.
   –– For more information, see Section 2.3.1.
2. Check if there is a more basic meaning of the lexical unit. If there is, establish its identity.
   –– For more information, consult Section 2.3.2.
3. Determine whether the more basic meaning of the lexical unit is sufficiently distinct from the contextual meaning.
   –– For more information, see Section 2.3.3.
4. Examine whether the contextual meaning of the lexical unit can be related to the more basic meaning by some form of similarity.
   –– For more information, consult Section 2.3.4.

If the results of instructions 2, 3, and 4 are positive, then a lexical unit should be marked as a metaphor-related word (‘MRW’), which may be made more precise by adding the information that it is ‘indirect’ (as opposed to ‘direct’ or ‘implicit’, see below).

2.3.1 Identifying contextual meanings

The contextual meaning of a lexical unit is the meaning it has in the situation in which it is used. It may be conventionalized and attested, and will then be found in a general users’ dictionary; but it may also be novel, specialized, or highly specific, in which case it cannot be found in a general users’ dictionary. When you identify the contextual meaning of a lexical unit, several problems may arise.

1. When utterances are not finished, there is not enough contextual knowledge to determine the precise intended meaning of a lexical unit in context. In such cases, it may be that the lexical unit has been used indirectly on the basis of a metaphorical mapping, but this is impossible to decide. We therefore discard from metaphor analysis all relevant lexical units in aborted utterances. An example is ‘Yeah I had somebody come round and stuck their bloody …’ The lexical units in the incomplete utterance in question (beginning with stuck) that could or could not have been related to metaphor should be marked as Discarded For Metaphor Analysis (add code ‘DFMA’ to each of them).
2. When there is not enough contextual knowledge to determine the precise intended meaning of a lexical unit in context, it may be that it has been used indirectly on the basis of a metaphorical mapping, but this may be impossible to decide.

a. An example is the use of up to indicate movement towards, where it is possible that the target is either higher (not metaphorical) or not higher (metaphorical) than the speaker. b. Another example is the use of idioms such as gasp for breath or turn your shoulder, approached as three lexical units, where it is possible that the designated action in fact takes place and thereby stands for the emotion (metonymy), or the designated action in fact does not take place so that the phrase is used metaphorically to indicate the concomitant emotion. c. A third example involves anaphora which may be interpreted in more than one way, as in all that in the following example, where a possible metaphorical interpretation is applicable: ‘he said I come to sup be supervisor he said, I don’t know, I don’t wish to learn all that!’

In such cases of lack of situational knowledge but with a potential for metaphorical meaning, you have to treat the word as if it was used indirectly and metaphorically, on the basis of the general rule ‘When In Doubt, Leave It In’ and add the special code ‘WIDLII’. 3. Specialist terminology may constitute a specific case of insufficient contextual knowledge to determine the precise intended meaning of a lexical unit in context. When there is not enough contextual knowledge to determine the specific technical and/or scientific meaning of a word in context, regular dictionaries cannot help. In such cases, it would of course be possible to use other, preferably specialized dictionaries to find out the specific contextual meaning of a term. However, in our project we assume that metaphor is ‘metaphor to the general language user’: if we as general language users cannot establish the meaning of the lexical unit with the contemporary dictionaries alone but the lexical unit could be metaphorical on the basis of some contextual meaning projected from the basic—nontechnical—meaning, we also mark the word as metaphor-related based on ‘WIDLII’. 4. Sometimes the contextual meaning of a lexical unit may be taken as either metaphorical or as not metaphorical. This seems to be the case for many personifications, such as furious debate or this essay thinks. These examples may be analyzed as involving a metaphorical use of furious and thinks, respectively, but they may also be resolved by a metonymic interpretation of the other terms, i.e. debate and essay, in which case furious and thinks automatically turn non-metaphorical. In such cases, the possibility of the metaphorical interpretation should not be lost, and you should mark the relevant ambiguous words furious and thinks as metaphor-related words, and add a comment that this is due to a possible personification.
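The four problem cases above each lead to a fixed coding decision, which can be summarized in a small lookup. The following Python sketch is only an illustration of that bookkeeping: the codes ‘MRW’, ‘WIDLII’, and ‘DFMA’ come from the guidelines, whereas the enum, its member names, and the function name are hypothetical and not part of MIPVU. It assumes the analyst has already judged that a metaphorical reading is possible.

```python
from enum import Enum


class Problem(Enum):
    """Hypothetical labels for the four problem cases of Section 2.3.1."""
    ABORTED_UTTERANCE = 1          # case 1: unfinished utterance
    INSUFFICIENT_CONTEXT = 2       # case 2: situational knowledge is lacking
    SPECIALIST_TERM = 3            # case 3: technical meaning not in the dictionaries
    POSSIBLE_PERSONIFICATION = 4   # case 4: metaphor or metonymy both arguable


def annotate_problem_case(problem):
    """Return (codes, comment) for a lexical unit with a possible metaphorical reading."""
    if problem is Problem.ABORTED_UTTERANCE:
        # Discarded For Metaphor Analysis: no decision about metaphor is made.
        return (["DFMA"], None)
    if problem in (Problem.INSUFFICIENT_CONTEXT, Problem.SPECIALIST_TERM):
        # 'When In Doubt, Leave It In': treat as metaphor-related, flag the doubt.
        return (["MRW", "WIDLII"], None)
    if problem is Problem.POSSIBLE_PERSONIFICATION:
        # Keep the metaphorical reading and record why it is ambiguous.
        return (["MRW"], "possible personification")
    raise ValueError(f"unknown problem case: {problem!r}")
```

For instance, furious in furious debate would be passed as `Problem.POSSIBLE_PERSONIFICATION` and come back as a metaphor-related word with a comment attached.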

2.3.2 Deciding about more basic meanings A more basic meaning of a lexical unit is defined as a more concrete, specific, and human-oriented sense in contemporary language use. Since these meanings are basic, they are always to be found in a general users’ dictionary. A meaning cannot be more basic if it is not included in a contemporary users’ dictionary. From a linguistic point of view, a more basic meaning of a word is its historically older meaning. However, from a behavioural point of view, this definition may not be optimal. Most language users are not aware of the relative ages of the various meanings of most words in the contemporary language. This means that the linguistic notion of basic sense as the historically prior sense has little relevance to the behavioural, in particular cognitive, notion of basic sense. However, it is one of the fundamental claims of contemporary metaphor theory that most of the historically older meanings of words are also more concrete, specific, and human-oriented. This is explained by the cognitive-linguistic assumption of experientialism (Lakoff & Johnson 1980). As a result, concrete meanings are typically also basic meanings from a historical perspective. The still largely programmatic assumption of a connection between historically prior meanings and concrete, specific, and human-oriented meanings makes it possible for us to adopt one practical and consistent general starting point about basic meanings: they can be operationalized in terms of concrete, specific, and human-oriented meanings. This is our general definition for basic meanings. As a result, we will not check the history of each lexical unit as an integral part of our procedure. This is a huge practical advantage, which is based in general cognitive-linguistic practice. Diachronic considerations of basic meanings may only come in when specific problems arise. When attempting to find basic meanings in the dictionary, the following guidelines should be adopted. 1. 
A more basic sense has to be present for the relevant grammatical category of the word-form as it is used in context. This is because a grammatical category in a text specifies a particular class of concept and referent, which may not be altered when looking for basic meanings, for otherwise the basis of comparison is shifted. When the dictionary shows that a word may be used in more than one grammatical category, you hence have to examine the various meanings of the word within its grammatical category. Contextual and basic meanings are therefore contrasted as two alternative uses for the same word form in the particular grammatical role that it has in the text. As a result,

a. the contextual meaning of nouns, verbs, adjectives, adverbs, prepositions, and interjections cannot be compared with the meaning of other word
classes for the same lemma (conversions); for instance, the meaning of shift as a noun should be analyzed irrespective of the meaning of shift as a verb. b. the contextual meaning of verbs used as linking verbs, primary verbs, modal verbs, verbs initiating complex verb constructions such as start, stop, continue, quit, keep, and so on, causative verbs (have, get, and so on), and full verbs cannot be compared with the meaning of the same verbs used in other roles. c. the contextual meaning of verbs used transitively can as a rule not be compared with the meaning of the same verbs used intransitively. d. the contextual meaning of nouns used to designate countable entities can as a rule not be compared with the meaning of the same nouns used to designate uncountable entities.

However, there are a number of complications: 2. When a word may be used in more than one grammatical category, but its description in the dictionary is limited to one of those categories only, you inevitably have to compare the various meanings of the word in the other grammatical categories with reference to that one grammatical category. Example: the contextual and basic meanings of suppression have to be examined with reference to the description of suppress. 3. When verbs are described under a single sense description in the dictionary as both Transitive and Intransitive, then you may compare these Transitive and Intransitive meanings with each other in order to determine whether the contextual meaning may be differentiated from a more basic meaning in the same sense description. 4. Sometimes lexical units have an abstract contextual meaning that is general which has to be contrasted with a concrete meaning that is specialized, for instance because it is limited to a style (e.g. very [in]formal), a subject (business, computing, journalism, law, linguistics, medicine, science, and so on), or period (literary, old-fashioned). In that case, we abide by our general rule for finding basic senses and take the most concrete sense as basic, even if it is specialized. Example: the concrete medical sense of palliate is basic and the general abstract sense of palliate is therefore metaphorical. 5. The reverse of [4] also applies: when a lexical unit with an abstract but specialized contextual meaning has to be contrasted with a concrete but general meaning, we also take the concrete sense as basic. Example: the abstract religious sense of father, mother, and so on is not basic, whereas the concrete general sense is. Therefore the religious senses are metaphorical. 6. When the contextual meaning of a lexical unit is just as abstract/concrete as some of its alternative meanings, we have to check whether there is any

indication of the (original) domain from which the word derives. For instance, there are verbs such as trot and roar which may be applied with equal ease to a range of concrete entities, but the nonhuman, animal origin (basic sense) of the lexical units decides which applications are metaphorical and which are not. 7. However, other lexical units may have a less clear domain of origin, such as the verb ride. It is presented in the Macmillan dictionary as monosemous between animal and artefact. If we suspect that there is a problem with the dictionary description because of its function as an advanced learners’ dictionary, we check the evidence in a second advanced learners’ dictionary, Longman. For instance, the verb to groom does not have distinct senses for people and animals in Macmillan, but it does in Longman; as a result, we rely on Longman to conclude that the two senses are sufficiently distinct. By contrast, transform has one general sense in Macmillan, which is corroborated by the Longman dictionary. 2.3.3 Deciding about sufficient distinctness Metaphorical meanings depend on a contrast between a contextual meaning and a more basic meaning. This suggests that the more basic meaning has to be sufficiently distinct from the contextual meaning for the latter to be seen as potentially participating in another semantic or conceptual domain. The following practical guideline should be followed: 1. When a lexical unit has more than one separate, numbered sense description within its grammatical category, these senses are regarded as sufficiently distinct. 2. When a lexical unit has only one numbered sense description within its grammatical category, this counts as the basic sense and any difference with the contextual sense of the item under investigation will count as sufficient distinctness. 
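The two guidelines for sufficient distinctness amount to a simple rule over the numbered sense descriptions of a dictionary entry. The Python sketch below is one possible reading of that rule, not a part of the procedure itself; the function name, its parameters, and the placeholder sense labels are invented for illustration, and sense descriptions are represented as plain strings.

```python
def sufficiently_distinct(numbered_senses, contextual_sense, basic_sense=None):
    """Apply the two distinctness guidelines of Section 2.3.3.

    numbered_senses  -- sense descriptions listed for the word within the
                        grammatical category in which it is used
    contextual_sense -- the sense the analyst established in context
    basic_sense      -- the more basic sense chosen by the analyst; it is
                        ignored when the entry has only one numbered sense
    """
    if not numbered_senses:
        raise ValueError("the word has no sense description in this category")
    if len(numbered_senses) > 1:
        # Guideline 1: separate numbered senses count as sufficiently distinct.
        # Assumption of this sketch: both senses are taken from the entry.
        return contextual_sense != basic_sense
    # Guideline 2: the single numbered sense is the basic sense, and any
    # difference from the contextual sense counts as sufficient distinctness.
    return contextual_sense != numbered_senses[0]
```

With a hypothetical entry `["sense 1", "sense 2"]`, a contextual "sense 2" and a basic "sense 1" would count as sufficiently distinct; with a single-sense entry, only a contextual meaning that departs from that sense would.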
2.3.4 Deciding about the role of similarity

When you have two sufficiently distinct meanings of a lexical unit and one seems more basic than the other, these senses are potentially metaphorically related to each other when they display some form of similarity. This typically happens because they capitalize on external or functional resemblances (attributes and relations) between the concepts they designate. It is immaterial whether these resemblances are highly schematic or fairly rich. In deciding about a relation of similarity between the contextual and the basic sense of a lexical unit, the following practical guidelines should be followed:

1. When a lexical unit has a general and vague contextual sense which looks like a bleached, abstracted relation of a rather specific and concrete sense, you
should mark the word as metaphorically used when the two senses are distinct enough and can be related via similarity. This is typically the case for senses that may be distinguished as concrete versus abstract. It should be noted that similarity is not the same as class-inclusion, as in the case of synecdoche. Thus, for appeal we have an abstract general sense and a more concrete but also specialized legal sense. If we decide that the latter is basic because it is more concrete, then the general sense of appeal is a case of generalization instead of similarity, and it can therefore be treated as a case of synecdoche instead of metaphor. This should be contrasted with a case like palliate, where we see both generalization and similarity based on metaphorical mapping from concrete (relieve physical pain) to abstract (relieve generally bad situations of their most serious aspects). 2. When a lexical unit has an abstract contextual sense and a sufficiently distinct, concrete more basic sense, but there does not seem to be a relation of similarity between the two even though there does seem to be some sort of relation, check the Oxford English Dictionary to deepen your understanding of the word. In such a case, the two senses may be historically related via a common source which may have disappeared from the language. Checking the OED may explain the strange relation between the current abstract and concrete senses and support the decision not to take the concrete sense as basic for the abstract sense, but instead to take both senses as equally basic because there is no transparent relation of similarity for the contemporary language user. We have seen this for a word like order (‘arrangement’ and ‘bringing about of order by speech act’). 3. When two senses appear to be metonymically related, this does not mean that you should not also consider the possibility that they are metaphorically related at the same time. Sense relations may have more than one motivation. 
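Taken together, Sections 2.3.2 to 2.3.4 feed the rule stated in Section 2.3: a lexical unit is marked as an indirect metaphor-related word only when instructions 2, 3, and 4 all have a positive outcome. A minimal Python sketch of that conjunction follows. The class and function names are hypothetical, and the three boolean fields stand for analyst judgments that the code itself cannot make.

```python
from dataclasses import dataclass


@dataclass
class SenseCheck:
    """Analyst judgments for one lexical unit (hypothetical container)."""
    has_more_basic_meaning: bool   # instruction 2, Section 2.3.2
    sufficiently_distinct: bool    # instruction 3, Section 2.3.3
    related_by_similarity: bool    # instruction 4, Section 2.3.4


def indirect_code(check):
    """Return 'MRW, indirect' when instructions 2, 3, and 4 are all positive."""
    if (check.has_more_basic_meaning
            and check.sufficiently_distinct
            and check.related_by_similarity):
        return "MRW, indirect"
    return None


# palliate: concrete medical sense is basic and related by similarity.
palliate = SenseCheck(True, True, True)
# appeal: the relation is generalization (synecdoche), not similarity.
appeal = SenseCheck(True, True, False)
# order: both senses are treated as equally basic after consulting the OED.
order = SenseCheck(False, True, False)
```

Under these judgments, only palliate would receive the code, which matches the discussion of the three words above.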
2.4 Direct use potentially explained by cross-domain mapping

Directly used lexical units that are related to metaphor are identified as follows:

1. Find local referent and topic shifts.
   –– Good clues are provided by lexis which is “incongruous” (Cameron 2003; Charteris-Black 2004) with the rest of the text.
2. Test whether the incongruous lexical units are to be integrated within the overall referential and/or topical framework by means of some form of comparison.
   –– Good clues are provided by lexis which flags the need for some form of similarity or projection (Goatly 1997).
3. Test whether the comparison is nonliteral or cross-domain.
   –– Cameron (2003: 74) suggests that we should include any comparison that is not obviously non-metaphorical, such as the campsite was like a holiday village. Consequently, whenever two concepts are compared and they can be constructed, in context, as somehow belonging to two distinct and contrasted domains, the comparison should be seen as expressing a cross-domain mapping. Cameron refers to these as two incongruous domains.
4. Test whether the comparison can be seen as some form of indirect discourse about the local or main referent or topic of the text.
   –– A provisional sketch of a mapping between the incongruous material functioning as source domain on the one hand and elements from the co-text functioning as target domain on the other should be possible.

If the findings of tests 2, 3, and 4 are positive, then a word should be marked for direct metaphor (‘MRW, direct’).

2.5 Implicit meaning potentially explained by cross-domain mapping

The previous forms of metaphor were explicit in that there is at least one word in the discourse which comes from another semantic or conceptual domain. Implicit metaphor is different and does not have words that clearly stand out as coming from an alien domain. It comes in two forms, implicit metaphor by substitution and implicit metaphor by ellipsis. Following Halliday and Hasan (1976), metaphor by substitution works through pro-forms such as pronouns, and metaphor by ellipsis works through non-existent words which may be inserted into grammatical gaps. Both types therefore do not exhibit ostensibly incongruous words, but still need to be analyzed as the linguistic expression of metaphor in natural discourse.

When a discourse uses lexical units for the purpose of substitution and thereby still conveys a direct or indirect meaning that may be explained by some form of cross-domain mapping from a more basic meaning, referent, or topic, insert a code for implicit metaphor (‘implicit’). An example is: ‘Naturally, to embark on such a step is not necessarily to succeed immediately in realising it’. Here step is related to metaphor, and the pronoun it is a substitution for the notion of ‘step’ and hence receives a code for implicit metaphor (‘MRW, impl’).

When a text displays ellipsis and still conveys a direct or indirect meaning that may be explained by some form of cross-domain mapping from a more basic meaning or referent than the contextual meaning recoverable from the presumably understood lexical units, insert a code for implicit metaphor (‘implicit’).

An example is but he is, which may be read as but he is [an ignorant pig], when that expression is taken as a description of a male colleague discussed before. The verb is may be coded as a place filler by the code <MRW, impl>. In general, for implicit metaphor, we need one linguistic element of cohesion (which means substitution or ellipsis, including what Halliday and Hasan call ‘reference’) that is not necessarily metaphorical by itself but refers back to a previous word and concept that was metaphorically used. Potential elements of cohesion include third person pronouns, primary and modal verbs, and so on. –– The first step in finding implicit metaphor will therefore be to decide whether a particular linguistic form from a list of potentially cohesive devices has in fact been used for cohesion as opposed to another function. –– The second step is to decide whether the cohesion device is related to another word that was related to metaphor. In principle it is possible for both demonstratives as well as general words such as thing and stuff to refer back to a metaphorically used expression. In that case, they are both indirectly metaphorical (because of their linguistic status) as well as implicitly metaphorical (because of their connection to a metaphorical concept in the text base). For this type of case we should add a code which combines ‘met’ with ‘impl’: ‘metimpl’. Finally, tag questions within the same utterance are not included in our view of cohesion. They are grammatical forms enabling a particular form of asking a question. There is no alternative where the pro-forms in the tag could be replaced by full NPs or VPs. This is why these are not part of cohesion. (However, when parts of utterances are repeated by subsequent speakers in order to ask or confirm or deny what the preceding speaker said, these are core cases of cohesion.)
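The two steps for finding implicit metaphor, together with the combined code for demonstratives and general words, can be laid out as a short decision function. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, the three inputs are analyst judgments, and only the code strings ‘MRW, impl’ and ‘metimpl’ are taken from the guidelines.

```python
def implicit_code(used_for_cohesion, antecedent_is_mrw,
                  also_indirectly_metaphorical=False):
    """Return the code for a potentially cohesive form, or None.

    used_for_cohesion            -- step 1: the form really functions as
                                    substitution, ellipsis, or reference
                                    (tag questions do not qualify)
    antecedent_is_mrw            -- step 2: the word it refers back to was
                                    itself related to metaphor
    also_indirectly_metaphorical -- the form is a demonstrative or a general
                                    word such as thing or stuff that is
                                    indirectly metaphorical in its own right
    """
    if not used_for_cohesion:
        return None
    if not antecedent_is_mrw:
        return None
    if also_indirectly_metaphorical:
        return "metimpl"
    return "MRW, impl"
```

The pronoun it in the step example would be called with `implicit_code(True, True)`, whereas a pro-form inside a tag question would be called with `used_for_cohesion=False` and receive no code.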

2.6 Signals of potential cross-domain mappings

Lexical signals of cross-domain mappings are those words which alert the language user to the fact that some form of contrast or comparison is at play (cf. Goatly 1997).

1. We focus on potential markers of simile and analogy and so on, such as like, as, more, less, more/less … than, comparative inflection plus than, and so on. But we also include more substantial lexical markers such as compare, comparison, comparative; same, similar; analogy, analogue; and so on. Complex mental conception markers are also annotated as metaphor signals; they include regard as, conceive of, see as; imagine, think, talk, behave as if and so on; or simply as if. All of these lexical units are coded with ‘MFlag’.
2. We exclude more general signals of all indirectness, such as sort of, kind of, and so on, since it is not always clear that they signal metaphoricity or other aspects of discourse. We have also excluded what Goatly (1997) calls topic domain signalling, such as intellectual stagnation, since its nature and demarcation were not clear from the beginning of the project.
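Because the signals are a closed set of lexical items, candidate flags can be pre-selected automatically before an analyst decides whether they really signal a cross-domain mapping. The Python sketch below shows one way this might be done. It is not the project's tooling: the function name and the two word sets are assumptions, the sets contain only a subset of the markers listed above, discontinuous markers (see … as) and comparative inflection are not handled, and every hit remains a candidate only.

```python
import re

# Subsets of the markers named in Section 2.6 (illustrative, not exhaustive).
TWO_WORD_FLAGS = {"as if", "regard as", "conceive of", "see as"}
ONE_WORD_FLAGS = {"like", "as", "more", "less", "compare", "comparison",
                  "comparative", "same", "similar", "analogy", "analogue"}


def find_mflag_candidates(text):
    """Return (token_index, marker) pairs for potential 'MFlag' units."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in TWO_WORD_FLAGS:
            # Prefer the two-word marker so that 'as if' is not split.
            hits.append((i, pair))
            i += 2
        elif tokens[i] in ONE_WORD_FLAGS:
            hits.append((i, tokens[i]))
            i += 1
        else:
            i += 1
    return hits
```

Excluded hedges such as sort of and kind of are simply absent from both sets, so they are never returned.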

2.7 New-formations and parts that may be potentially explained by cross-domain mapping

We assume that new-formations, such as honey-hunting discussed above, have to be analyzed as if they were phrases consisting of more than one lexical unit: each part of such new lexical units activates a concept and relates to a distinct referent in the discourse, which both have to be checked for metaphor. As a result, we sometimes have to mark parts of lexical units (morphemes) as indicating metaphorical meaning. The guidelines for finding metaphor-related words in new-formations are a variant on the basic procedure for finding all metaphor-related lexical units described in Section 2.1.

1. Find metaphor-related words in new-formations by going through the text on a word-by-word basis and identifying all new-formations.
   –– A new-formation is a complex lexical unit consisting of at least one independent lexical unit which, as a whole, is not defined in the dictionary.
   –– A special group is formed by specialized technical and scientific terms which may be missing from the regular dictionary and may therefore be seen as new-formations for the general language user.
2. When a lexical unit in a new-formation is used indirectly and its meaning in the discourse may be explained by some form of cross-domain mapping, mark the word as related to metaphor (MRW, indirect).
   –– If you are not sure about indirect word use that is explained by cross-domain mapping, go to Section 2.3.
3. When a lexical unit in a new-formation is used directly and its meaning may be explained by some form of cross-domain mapping, mark the word as direct metaphor (MRW, direct).
   –– If you are not sure about direct use of lexical units that is explained by cross-domain mapping, go to Section 2.4.
4. When a lexical unit in a new-formation implicitly conveys a direct or indirect meaning that may be explained by some form of cross-domain mapping, insert a code for implicit metaphor (‘implicit’).
   –– If you are not sure about implicit indirect meaning that is explained by cross-domain mapping, go to Section 2.5.
5. When a lexical unit in a new-formation functions as a signal that a cross-domain mapping may be at play, mark it as a metaphor flag (‘MFlag’).
   –– If you are not sure about signals of cross-domain mappings, go to Section 2.6.
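The definition of a new-formation in guideline 1 can be approximated mechanically when a machine-readable word list is available. The Python sketch below makes two simplifying assumptions that are not part of MIPVU: complex units are recognized only by a hyphen, and the dictionary is a plain set of lower-case headwords. The toy word list and the function name are invented for illustration.

```python
def is_new_formation(unit, dictionary):
    """Check whether a hyphenated unit looks like a new-formation.

    A unit qualifies when it has more than one part, is not defined as a
    whole in the dictionary, and contains at least one part that is an
    independent lexical unit (a dictionary headword).
    """
    word = unit.lower()
    parts = word.split("-")
    if len(parts) < 2:
        return False          # not a complex unit under this sketch
    if word in dictionary:
        return False          # defined as a whole: a regular lexical unit
    return any(part in dictionary for part in parts)


# Toy word list standing in for a general users' dictionary (hypothetical).
toy_dictionary = {"honey", "hunting", "well-known"}
```

A unit that passes this check would then have each of its parts examined separately by means of guidelines 2 to 5.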

Chapter 3

Metaphor identification in news texts 3.1 Introduction “There is probably no other discursive practice, besides everyday conversation, that is engaged in so frequently and by so many people as news in the press and on television” (van Dijk 1991: 110). As news contributes to building and adapting knowledge and beliefs and “metaphor is an essential part of the way we deal with novel and current events” (Kennedy 2000: 209), news discourse is naturally a particularly rich source of figurative language. It is not surprising that a large body of research on metaphor in news discourse is available. For example, metaphorical language in news texts has widely been studied with the aim of revealing ideologies and persuasive effects in political discourse (e.g. Chiang & Duann 2007; Kitis & Milapides 1997; Musolff 2006; Zinken 2003). Studies have also looked at a number of sub-genres such as immigrant discourse (e.g. Santa Ana 1999), business discourse and financial reporting (e.g. Koller 2004; Charteris-Black 2004) or sports reporting (e.g. Charteris-Black 2004), to name just a few. None of these metaphor studies on news texts focuses on the identification of metaphors themselves. However, soundly and reliably identified linguistic metaphors can legitimize and enhance any ensuing analysis, whether empirical or interpretative. We present how applying the protocol of MIPVU introduced in the previous chapter allows for precise measurement of metaphor in news discourse – a prerequisite for evaluating the quality of an analysis. Illustrating the application of MIPVU to bulk news data moreover creates an awareness of how this method works for the much studied news register, revealing rich connections with register characteristics. Conventional “schemata” or “superstructures” (van Dijk 1988: 26), predicting, for example, the use of headlines or leads, determine the form typical of news discourse. They ease orientation for the reader. 
Images are also an integral part of this type of discourse. Our corpus is, however, plain text. We make no attempt to analyze multimodal metaphor, such as interaction with pictorial metaphors. The language of mainstream newspapers is formal, texts are written in Standard English (or some other standard language of publication), and are consequently easily accessible. News texts are dense in information. The news production
process allows journalists to carefully craft their texts and make precise lexical choices, which contrasts with the constraints of real-time production in for instance conversations (Biber 1988: 104–105). It therefore comes as no surprise that the metaphor identification procedure can be transparently applied to newspaper text. In fact, the Pragglejaz Group (2007) demonstrate the steps of MIP by applying it to a sentence from a news report. For any application of MIP to a substantially larger amount of data, however, one might expect difficulties to arise. Yet the manual analysis of 44,793 words of news in our research project has shown only a small number of cases exhibiting ambiguity and difficulty. They are exceptions and not the rule; but they are worth considering, particularly insofar as they helped in the development of MIPVU. We will highlight several such difficult examples and demonstrate that this minority of cases can still be treated within MIPVU in a systematic and consistent manner. For each of the examples, we offer possible approaches and solutions. First, however, we want to give an impression of the largely smooth application of MIPVU to the news register by mentioning some aspects of our reliability tests (for a detailed report, see Chapter 8). The purpose of these reliability tests was to check inter-analyst agreement for annotating metaphors in different registers. For the news register, 1,415 words have been included in the complete series of tests. 79.9% of the lexical units in news texts have been unanimously coded as not related to metaphor by four independent analysts. Unanimous agreement for metaphor is 15.0%, which is the largest across the four registers. This percentage should not be read as an absolute indicator of the degree of metaphor in the texts, since there is a small percentage of items where no inter-analyst consensus was reached (5.1%). 
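The three reported percentages (79.9, 15.0, and 5.1) add up to 100 because every lexical unit falls into exactly one of three classes: unanimously not related to metaphor, unanimously related to metaphor, or no consensus. The Python sketch below shows how such a breakdown could be computed from four analysts' binary codings. The function name and the ten-unit data set are invented for illustration and do not reproduce the project's reliability data.

```python
from collections import Counter


def agreement_profile(codings):
    """Return the percentage of units in each agreement class.

    codings -- one tuple per lexical unit, holding one boolean per analyst
               (True means 'related to metaphor').
    """
    counts = Counter()
    for votes in codings:
        if all(votes):
            counts["unanimous_metaphor"] += 1
        elif not any(votes):
            counts["unanimous_non_metaphor"] += 1
        else:
            counts["no_consensus"] += 1
    total = len(codings)
    classes = ("unanimous_non_metaphor", "unanimous_metaphor", "no_consensus")
    return {name: 100 * counts[name] / total for name in classes}


# Invented toy data: ten lexical units coded by four analysts.
toy_codings = (
    [(False, False, False, False)] * 7
    + [(True, True, True, True)] * 2
    + [(True, False, True, True)]
)
```

On the toy data, seven of the ten units fall into the first class, two into the second, and one into the third.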
It should be noted that our regular annotation procedure (unlike the reliability tests discussed here) adds another step in which analysts cross-check the annotations of the other team members and make notes when they disagree on their decisions. A group discussion to resolve those cases of disagreement follows, which reduces analyst bias as well as error. The above figures for the reliability test are taken prior to this round of discussion. The figures may point, therefore, to inherent differences among the registers, especially the incidence of difficult-to-treat cases within the metaphor identification procedure. In particular, unlike academic texts, which exhibit a much higher incidence of unclear cases than the news texts, reading newspaper articles does not require much expert knowledge for an understanding of the overall meaning of the text— general world knowledge suffices. Therefore, the contextual meaning of words can be established in the overwhelming majority of cases. An exception may be highly specific news texts such as, for example, financial reports, which may require some form of expert knowledge in financial terminology. Also, some terminology used

in sports reporting is potentially difficult to analyze. Unlike conversations, news texts consist of coherent and full sentences, and therefore there is almost always sufficient information to determine the contextual meaning of each word. The lexical units analyzed can usually be found in the dictionaries used in this project, since specialized terms are the exception rather than the rule. For the metaphorically used words there is a clear contrast between the contextual and the basic meaning and these can easily be understood in comparison with each other. In sum, the low percentage of unclear cases and the overall good results in the reliability tests show that applying each step of the metaphor identification procedure to news texts is generally straightforward. As a consequence, the vast majority of the lexical units in news texts did attain unanimous inter-coder agreement in the test. Most of the exceptions appear to be clear coder errors and could quickly be resolved through group discussion. Coder error ranges from misapplication of the procedure to overlooking metaphors. This is the reason why we set out with news as our first case study in this book. We now illustrate both a case of unanimous inter-coder agreement and a case of disagreement. A clear case is the word valuable.

(1) Professional religious education teachers like Marjorie B Clark (Points of View, today) are doing valuable work in many secondary schools (…) (K58-fragment01)

This adjective has a clear contextual meaning, ‘very useful and important’. The next step is to check whether there is a meaning that is more basic than the contextual meaning. Such a more basic meaning is ‘worth a lot of money’, because it is less abstract and more specific. Both the contextual meaning and the basic meaning are found in Macmillan. The contextual meaning and the basic meaning clearly contrast but can be understood in comparison with each other. Therefore, according to our procedure, valuable must be marked as metaphorically used in this context. Most words in news discourse are similar, in that they offer no problems for the procedure. One exception is formed by some prepositions. Prepositional phrases are common in journalistic writing since they allow for information packaging (Biber 1988), and much news writing is presumably subject to space constraints. In some contexts it is difficult to identify the metaphoricity of some prepositions, even if their basic meaning is clearly spatial. The preposition in as used in the example below is an illustration of a borderline case for which metaphoricity cannot be easily determined:

(2) Professional religious education teachers like Marjorie B Clark (Points of View, today) are doing valuable work in many secondary schools in trying to separate the facts about religion from (…)

(3) In primary schools, class teachers are expected to be polymaths (…)

(4) This attempt to codify religious and moral education in the primary school is a mistake (…)

The difficulty in all three cases lies in deciding on the contextual meaning, which seldom poses any difficulty in the news register. It is unclear whether “in (many) … schools” and “in the primary school” should be interpreted as a place (which would make in literally used), as an activity (which, for some analysts, would make it a candidate for metaphorical use), or as both. There is also an interaction with metonymy. Both of these factors can explain why independent analysts may differ in their judgments, as they did in our reliability test. Similar lines of reasoning have led to analyst disagreement for at in Example (5). Again the issue is whether at is interpreted as referring to an actual place or is construed more broadly.

(5) Jack Kahn graduated with honours at the University of Leeds in 1928 (…) (A9Y-fragment 01)

In general, when a word is possibly used metaphorically but a non-figurative interpretation is equally arguable, we code the word as metaphorically used, so that it is included in the ensuing textual analysis, but add the special tag WIDLII, ‘When In Doubt, Leave It In’, to signal its ambiguity. Whereas we tag such units as ambiguous cases, the original MIP procedure makes no such allowance.

As a general understanding of news texts is not difficult to achieve, and the demarcation of lexical units has not posed any major problems either, the main focus of this chapter will be on the subsequent decisions. This is not to say that these decisions are inherently difficult to make; instead, our aim is to present examples that need refined treatment. Such cases surfaced only when applying our procedure to bulk news data as opposed to the brief examples given in the original Pragglejaz Group paper. While they may seem difficult to solve within MIP, we will show that they can still be approached in a logical and consistent fashion with the help of a more refined metaphor identification procedure. For each of the steps we will discuss a variety of problematic cases that reveal particularly interesting properties, drawing on newspaper articles from a BNC-Baby sample.

3.2 Establishing contextual meanings

One objective of news is to inform the population about world events. The newspapers in our corpus are targeted at the non-expert reader and hence their content is generally easily accessible and clear. The only potentially difficult cases concern highly infrequent specialized terms, novel compounds and novel metaphors, and contextual ambiguity.

Chapter 3. Metaphor identification in news texts 

3.2.1 Specialized terms

Specialized terms tend to occur particularly in the business and sports news subregisters. Some of them have made their way into the dictionaries we use, while others are too specialized and the contextual meaning must be established in a different way. MIP does not give explicit instructions about how to deal with such expert terminology. Consider the highly specific word usage encountered in the following business news report:

(6) (…) the Gooda Walker agency may have overstated its syndicates’ profits between 1981 and 1988 through the use of time and distance policies (…) (AL2-fragment23)

For the general language user, the expression “time and distance policies” is too specialized to determine its precise contextual meaning. It is impossible to locate the contextual meanings of time and distance in any of our dictionaries. At the same time, however, it should in principle be possible to establish the relevant specialized meaning of distance, if only expert knowledge were available. Consideration of such highly infrequent specialized language data for metaphor analysis would require informants who have such knowledge, or, alternatively, a truly specialized dictionary. Looking at the data from the general language user’s viewpoint, as we are doing, the contextual meaning cannot be established, while at the same time a contrast to the more concrete, spatial basic sense of distance cannot be ruled out by the tools at hand. An abstract use of the term is therefore potentially metaphorical. This is why we include the term in our collection of metaphorically used items but tag distance as an ambiguous case (WIDLII).

3.2.2 Novel compounds and novel metaphors

Novel metaphors are said to be abundant in press reports (e.g. Croft & Cruse 2004: 104). While they may be typical in some subgenres of news texts, novel language use is not at all frequent in our overall news corpus, let alone the other registers. Moreover, there is a fine line that distinguishes novel from conventional language use and this line is often difficult to locate. In our research, we have found that absence from the dictionary is a criterion which is easy to use and has to be applied to an estimated one percent of all lexical units classified as related to metaphor. In the following example, state-masonry is a novel lexical unit and cannot be found in the dictionaries.

(7) The masses are being engaged in the craft of state-masonry. (A9J-fragment01)

The assumption is that each word in the novel compound will activate a distinct concept and is related to a separate referent in the projected text world. Readers eventually need to parse novel compounds into their components in order to establish the presumed relation between the two concepts and the two referents. Therefore, because state-masonry cannot be found in the dictionaries as a whole, it is necessary to look up the entry for state as well as for masonry. State is a general word, which can be applied equally to concrete and abstract things as well as physical and mental situations. It is therefore not metaphorically used. The basic meaning of masonry refers to ‘bricks and stone’, which is not the contextual meaning. Since physical building can be compared to abstract constructing, however, it may be classified as related to metaphor. Therefore, state is literally used and masonry is metaphorically used. Branching out from novel compounds, there are some novel metaphors which can be located as lexical items in the dictionary, but whose novel contextual meaning has not made its way there (yet). Only once a metaphor becomes frequently used by a speech community does its metaphoricity become conventionalized to the point that, to the everyday speaker, it seems like a familiar expression (Croft & Cruse 2004: 105). Consider the lexical unit roof in the following excerpt from a newspaper article on the conflict in the Middle East:

(8) A pyramid administrative structure, establishing links from popular committees in villages right up to the Executive Committee of the PLO (in its capacity as a Cabinet), can be established. During the Intifada the people have been engaged in building the side walls. A government would provide the roof which would bring these walls together. (A9J-fragment01)

The contextual meaning of roof is an overarching abstract structure that a government represents; roof is thus metaphorically used in this context. This meaning is not listed in the dictionaries, however, which suggests novel language use. It can be contrasted with the basic meaning ‘the top outer part of a building, temporary structure or vehicle’. Treating cases of novel language use is delicate, however, because it is not always clear when precisely a lexical unit can be called novel. This is demonstrated by the lexeme outskirts in the following sentence:

(9) Walking here, you leave the 20th century behind on the outskirts of the forest and enter the reconstructed emptiness (…) (AHC-fragment60)

Its only meaning in the dictionaries, ‘the areas of a town or city that are furthest away from the centre’, is not the contextual one. In our example, outskirts refers to the areas of a forest that are furthest away from the centre. Assuming sufficient distinctness between these two meanings, this means the lexeme is, according to our definition, used in a novel fashion in the present context. Since it is possible to compare the novel contextual use with the conventional basic use, the word may be classified as related to metaphor. The analyst must, however, keep in mind that dictionaries do not capture all contemporary language use because there is a frequency threshold a meaning needs to pass in order to be considered sufficiently conventionalized (Steen 2007: 100). One option is to accept this type of dictionary as simply one relevant reflection of conventionalization, which captures an important level of the experience of language users. Another possibility is to go for a greater degree of refinement and to check a larger corpus, such as the BNC-World. The decision to be made is what frequency of occurrences marks the appropriate cut-off point between conventionalized and novel uses for the purposes of a particular research project (e.g. Cameron & Deignan 2006: 678). For the present example, a search of outskirts in the BNC-World shows that most items are used in the meaning as described in the dictionaries. Only two out of fifty randomly selected hits (600 in total) were used in a novel way, and none of them was applied to a forest. We therefore choose to follow the dictionary as a general rule and regard outskirts as a novel metaphor in ‘outskirts of the forest’.

The rigorous framework we are applying leads the analyst to mark some cases as metaphors based on supposedly novel usage which may be looked at in other ways when other tools are used. But the dictionaries are used precisely because we do not want to leave the analysis of metaphors on a linguistic level to the analysts’ intuitions. What is important is an awareness of the restrictions that are imposed by using this framework.

3.2.3 Contextual ambiguity

For a number of lexical units the precise intended meaning cannot be determined despite the rich context typically provided in journalistic writing. Consider the following sentence:

(10) But by the time I had turned off the road from Bellingham at Kielder village and driven up the bumpy Forest Drive to East Kielder Farm, (…) (AHC-fragment60)

In this case it is possible that the word up was used indirectly and therefore metaphorically (further along a path), though it may also have been used in a direct, nonfigurative way (a higher location). The journalist does not elaborate on the precise location of East Kielder Farm and thus the analyst lacks sufficient information to disambiguate the meaning of up. Since both interpretations are equally possible, the lexical unit up is tagged as an ambiguous case, comparable to at in Example (5). It is hence also given the code WIDLII and is regarded as potentially metaphorically used.
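The coding decisions discussed so far can be summarized in a short sketch. It is an illustration only and not part of the published procedure: the function names `reading_is_metaphorical` and `code_unit`, their boolean parameters, and the tag strings other than WIDLII are assumptions introduced here.

```python
def reading_is_metaphorical(distinct: bool, comparable: bool) -> bool:
    """A reading counts as metaphorical when the contextual meaning contrasts
    with a more basic meaning and can be understood in comparison with it."""
    return distinct and comparable


def code_unit(metaphorical_reading: bool, literal_reading: bool) -> str:
    """Return an illustrative tag for one lexical unit (labels are hypothetical)."""
    if metaphorical_reading and literal_reading:
        # Both readings are equally arguable: keep the unit in the
        # metaphor data, but flag it as ambiguous.
        return "WIDLII"
    if metaphorical_reading:
        return "metaphor"
    return "not metaphor"


# valuable in (1): contrast and comparison hold, no competing literal reading
print(code_unit(reading_is_metaphorical(True, True), literal_reading=False))
# up in (10): 'further along a path' or 'a higher location'
print(code_unit(reading_is_metaphorical(True, True), literal_reading=True))
# state in (7): a general word, no contrast with a more basic meaning
print(code_unit(reading_is_metaphorical(False, False), literal_reading=True))
```

The sketch deliberately leaves the hard part, judging contrast and comparison with the help of the dictionaries, to the analyst; it only records how those judgments combine into a code.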

The following case, which deals with the judgment of a word’s metaphoricity in connection with money, is more subtle in the sense that it allows for multiple levels of contextual semantic analysis. In the sentences below the issue is whether or not to code the items in italics as related to metaphor:

(11) (…) a charity called Food International, which raises money from the fiercely competitive matrons of Palm Beach (…) (AL0-fragment06)
(The basic meaning of from is ‘starting at a particular place and moving away’.)

(12) You got money, you got fame (…) (A5E-fragment06)
(The basic meaning of got, here used for saying ‘have’ in informal speech, is ‘used for showing possession’. Note that got as used in this context has its own lexical entry in the dictionary.)

(13) (…) until they get any money back (…) (AA3-fragment08)
(The basic meaning of get is ‘to receive something that someone gives you or sends you’. It has been taken from Longman because Macmillan conflates concrete and abstract meaning descriptions. This is one of the circumstances in which we add Longman as a second opinion. For more details on the use of Longman see Section 3.4 below.)

(14) (…) it isn’t by any means clear what the bill will be or where the money will come from. (A7W-fragment01)
(The basic meaning of come is ‘to move or travel to the place where you are’.)

If money is understood to be something concrete, none of these items are metaphorical in the regular sense discussed so far (Example 14 is an exception, because in that case it would lead to possible personification; see the discussion of Examples 19 and 20 below). If, however, money is an abstract concept, they should each be marked as metaphorically used. The issue is this: on the one hand, money is concrete, in the form of coins and bills; but, on the other hand, money is also abstract in virtual environments (e.g. online banking, account balances). Thus this is a prime example of a borderline case. We have resolved it by arguing that, in principle and in the present day and age, money is (still) concretely reclaimable. Therefore from, got, get and come in the examples above are literally used.

Another interesting case is the word system, which, depending on its context, can take on concrete or abstract meaning, or both. The relationship between the basic and the contextual meaning can be literal, metaphorical or metonymic, as seen in the following examples:

(15) PCBs are so difficult to destroy, that Rechem’s emission-monitoring systems are geared to detecting them on the grounds that if you destroy PCBs you destroy everything. (A1U-fragment04)

(16) In systems development nothing is more fundamental than assessing user requirements. (A8R-fragment02)

(17) (…) the practicalities of an alternative voting system (…) (A1F-fragment08)

(18) THIRTY FIVE people died and others were maimed for life in the Clapham rail disaster in December last year because work was done in a slovenly, haphazard way and was then left unchecked. (…) Yet yesterday’s report on the Clapham crash, confirming the picture which emerged throughout the Hidden inquiry, makes an event which seemed at the time totally unexpected look almost inevitable. This was a system hopelessly under strain. (A7W-fragment01)

For sentence (15), Macmillan describes a system as ‘a set of connected things that work together for a particular purpose’. This is the contextual but also the basic meaning. Since the technicalities of a concrete system are described, system is not metaphorically used. The newspaper text in (16) refers, again, to a concrete system (‘a group of computers that are connected to each other’). At the same time this type of system includes an abstract system of ‘an organized set of ideas, methods, or ways of working’ that is part of the concrete system. Since this is a part-whole relationship, the relation is via contiguity and not via similarity, and therefore systems is not used metaphorically in this case. Example (17) clearly describes ‘a method of organizing or doing things,’ and not a concrete system. Therefore, system must be marked as metaphorically used. In the last Example (18), the context allows for an ambiguous interpretation of the word. Since it can be read as either a concrete or an abstract system, we include the item in our metaphorically used data for later textual analysis by marking it as an ambiguous case (WIDLII). The final issue we discuss in the framework of contextual meanings is personification, a phenomenon the analyst frequently comes across in news texts. By means of personification the author’s presence and views can be concealed, creating a sense of objectivity (Caballero 2003: 164). Personification can also disguise the fact that there are actual people responsible for the actions described: “(…) although journalists typically present a news account as an ‘objective’, ‘impartial’ translation of reality, it may instead be understood to be providing an ideological construction of contending truth claims about reality” (Anderson & Nicholson 2005: 158). As two cases in point, consider (19) and (20), where the context allows for two interpretations. Both a metaphorical and a metonymic interpretation of the verb are possible. 
(19) ‘A party can’t even decide its name (…)’ (A7W-fragment22)

(20) (…) the Gooda Walker agency may have overstated its syndicates’ profits (…) (AL2-fragment23)

For instance, in Example (19), the sense description of decide found in Macmillan that is closest to the contextual meaning is ‘to make a choice about what you are going to do.’ The use of the pronoun “you” emphasizes that “deciding” is a human activity. In the present context, the corresponding noun “party” can be interpreted in two different ways. First, the individuals who make up the party can be in focus, in which case party is interpreted metonymically and decide is not used figuratively. As an alternative, the party can be regarded as an abstract group acting as one person. In the latter case decide is metaphorically used since its basic sense is human-related. Because the possibility of metaphorical usage depends on analyst perspective (cf. Low 1999), we code language use of this kind as “possible personification”. MIP does not offer a mechanism for indicating that a lexical unit may have both a metonymic and a metaphorical interpretation. Retaining such words under the code “possible personification” is a feature of MIPVU. We shall return to personification in greater detail in Chapter 5, on fiction.

3.3 Establishing more basic meanings

Establishing the basic meaning of lexical units in news texts is usually simple. The high percentage of nouns in news reports (Biber 1988, 1989) helps because prototypically their meaning is more autonomous than that of, for instance, verbs, which makes it easier to find a basic sense (Pragglejaz Group 2007: 28). The use of words with relatively specific meanings is also reflected in the high type-token ratio that is typical of news texts. A high type-token ratio is an indicator of high lexical variation and results from precise lexical choice that aims at an exact presentation of information (Biber 1988: 105).
Rare challenging cases emerge only when (1) the analysts differ in their intuitions as to the basic meaning, yet find that contemporary dictionaries do not contribute any information that helps to resolve the problem, or (2) the sense descriptions in the dictionaries are derivations of a basic meaning that is no longer familiar to the contemporary language user. We do not regard these challenges as a drawback, using them instead to improve MIP. The overwhelming majority of cases can be resolved by using the Macmillan dictionary, and the Longman dictionary when needed. However, for rare cases, analysts may still disagree on a unit’s basic meaning after lengthy discussion and consulting both dictionaries. For these cases, one recourse is to check the OED in order to achieve better understanding of the historical development of the word. We noted in Chapter 1 that a word’s history, in our own research, is usually disregarded. Nevertheless, in order to treat those cases that cannot be resolved using the contemporary dictionaries alone, the age of a word’s meaning may be considered as a “tiebreaker”. Again, for the bulk of the cases, such a tiebreaker is not needed and the OED is not consulted. But some cases that have been resolved by utilizing the OED include the following.

(21) Drifting between grassy polders to which farmers have to ferry their cattle in punts, or following leafy twisting lanes marked only by rusty signs proclaiming the ‘Venise Verte’, you’re in an all-green, mysteriously silent world; only the occasional fisherman, twitching his rod above the algae-smothered waters, disturbs the stillness. (AHC-fragment61)

The contextual meaning of disturb (‘to do something that stops a place or situation from being pleasant, calm, or peaceful’) is clear. The analysts disagreed, however, about the basic meaning. There are two arguments. The third sense in Macmillan, ‘to make something move’, makes reference to a concrete form of movement, and therefore qualifies as a candidate for the basic meaning. However, analysts may be distracted by the salience of the human-oriented first two senses (‘to interrupt someone and stop them from continuing what they were doing’ and ‘to upset and worry someone a lot’). Longman offers similar sense descriptions and therefore does not solve the quandaries. The OED suggests that all senses are equally basic because the primarily physical sense and the primarily abstract senses appeared roughly at the same time. This led us to conclude that disturb in the above example is not metaphorically used since it is sufficiently close to the third sense, ‘movement’. The unit served in the example below also needed group discussion.

(22) He served with distinction in the child psychiatry section of the Royal College of Psychiatrists (…) (A9Y-fragment01)

The discrepancy of opinions stems from difficulties in settling on the basic meaning. A clear basic meaning is not immediately obvious. Entries in Macmillan (only three are listed) refer to, for example, providing food and drink, doing a job or performing duties, and helping customers to buy goods in a shop. We argue that these senses are instantiations of the same idea, namely to perform some sort of duty. The historically oldest (and here, we argue, basic) meaning is ‘to be a servant; to perform the duties of a servant.’ The contemporary meanings are derived from this basic meaning, but are not in contrastive opposition as long as the action of serving is performed by a human being. Therefore served is non-metaphorical.

The OED is also a useful source when senses seem to be related somehow, but the exact nature of this relationship is unclear. This may indicate that they are derived from a basic meaning that is obsolete. The meanings of issue, as in the following example, illustrate this class:

(23) Parliament urged to think again on housing issue: (A7Y-fragment03)

Macmillan gives the following sense descriptions: ‘a subject that people discuss or argue about, especially relating to society, politics’, ‘a magazine that is published at a particular time’ and ‘a set of things, for example shares in a company, that are made available to people at a particular time’. Since it was hard to decide on a basic meaning, we checked the OED, where it appeared that all senses may be regarded as equally basic, since they developed from the old meaning ‘the action of going, passing, or flowing out; egress, exit; power of egress or exit; outgoing, outflow’. Therefore, none of the currently surviving senses is metaphorically used.

Although we occasionally consider the history of a lexical unit, we do so only as a last resort. Overall, our approach is more explicitly and intentionally synchronic than the Pragglejaz method. A word’s history is only taken into account for rather rare cases of disagreement and uncertainty, namely when more than one candidate for a basic meaning is present and there is no indication of which candidate should take precedence.

3.4 Contrast and comparison

This section describes issues related to comparing and contrasting the basic and the contextual meanings. We present the approach we take when the contextual meaning and the basic meaning can be found in Macmillan but are listed under the same sense description. Subsequently, we describe situations for which it is unclear whether two senses are distinct enough to allow for a mapping, either because the senses are metonymically related or because one of the senses is just a specification of the basic sense.

Metaphorical meanings depend on a contrast between a contextual and a more basic sense. Our main operational criterion for deciding whether two senses are sufficiently distinct is whether the contextual and the basic senses are listed as two separate, numbered sense descriptions in the dictionaries. Sense descriptions subsumed under one single sense are regarded as manifestations of the same meaning. For instance, the third sense description for run in Macmillan, ‘if a machine or engine runs or you run it, it is working’, includes the sub-senses 3a, ‘to start or use a computer program’, and 3b, ‘to own and use a motor vehicle’. These may all be seen as slightly more specific manifestations of the main sense. The third sense as a whole would be held to be monosemous, that is, to have only one meaning. This is also the case for the noun struggle as used in (24):

(24) Ulster, the provincial champions, may well fancy their chances on November 21, but Leinster look certain to face an uphill struggle even though the tourists have rested 13 of the team that beat Wales. (A80-fragment15)

Macmillan gives the following entries: 1: ‘an attempt to do something that takes a lot of effort over a period of time’, 2: ‘a fight or a war’, 2a: ‘an attempt to defeat someone or something, or stop them from having power over you’ and 3: ‘something that takes a lot of physical or mental effort’. The first sense is abstract. The second and third senses conflate physical and mental struggle, which means that the descriptions cannot be easily contrasted relying on Macmillan alone. Struggle demonstrates that one must be aware of the constraints under which dictionary makers operate. Sometimes senses are collapsed, although they might have appeared as two separate sense descriptions had more space been available. In other cases, examples may be simplified for the target audience of learners (Deignan 2005: 63; Steen 2007: 98). However, despite these constraints, we believe that dictionaries – standardized descriptions of language data – are a legitimate tool with which to move away from analyst intuitions towards repeatability of results.

Since the opposition of physical struggle (which would qualify as a basic meaning) and effort (the contextual meaning) does point towards a possible metaphorical tension, we find it useful to check Longman as a second source. Longman does list a separate sense for physical struggle: ‘a fight between two people for something, or an attempt by one person to escape from the other’. As Longman does not combine abstract and concrete senses into one description, we take the view that they are sufficiently distinct. Struggle must therefore be marked as metaphorically used. For a number of cases, however, both dictionaries conflate, for instance, concrete and abstract meanings, as is the case for create:

(25) His father was a rabbi and a biblical text was to create another well known work by the son, ‘Job’s Illness – Loss, Grief and Integration.’ (A9Y-fragment01)

Following intuition, there seems to be an opposition between designing something concrete and making something abstract. Inspection of both dictionaries, however, suggests that the word’s meaning is general, and that anything, irrespective of the level of abstraction, can be created. What initially looked like a possible reduction of polysemy (conflation of a concrete and an abstract sense) turns out to be agreement over monosemy. A similar problem is posed by the verb use, for which, intuitively, there is a contrast between using a tool and using a method:

(26) What criteria would police and immigration officials use in their search for ‘potential terrorists’ on a train (…) (A1F-fragment11)

Again, both dictionaries appear to combine abstract and concrete tools under one heading. Macmillan, for example, gives the following entry: ‘to do something using a machine, tool, skill, method etc in order to do a job or achieve a result’. This particular sense of the verb is monosemous and conventionally employed in both abstract and concrete contexts. Therefore, use in the sense of “using a method” is not metaphorical by the criteria of MIPVU.

It is also possible for a lexical unit to be not metaphorically used despite having separate basic and contextual sense descriptions in the dictionary that are somehow related. We illustrate this class by looking at the word drops in the following excerpt from the leisure pages of the Daily Telegraph:

(27) Now the path ran through heather high above the burn, past circular sheepfolds long disused and over the stony beds of side streams where the grass hung smooth and inviting, concealing ankle-breaking drops. (AHC-fragment60)

The basic sense of drop, ‘a very small amount of liquid with a round shape’, and the contextual sense, ‘a distance down to the ground from a high place’, are related; however, this relationship is one of contiguity and not of metaphor. The object ‘drop’ stands for the distance that it covers before it reaches the ground. Due to this metonymic relationship the two senses are distinct, but they are not understandable by comparison. Therefore, drop is not metaphorically used. In the following example we again address whether two senses are sufficiently distinct, in this case for the lexical unit labour.

(28) (…) low zinc levels may lead to problems in pregnancy, from difficult labour to congenital malformations in children. (A1X-fragment04)

Labour has separate sense descriptions for the contextual and the basic meaning. As demonstrated earlier, two separately numbered sense descriptions often indicate that there is sufficient contrast between two meanings of a word, which may point to metaphorical usage. However, the contrast between the contextual meaning in the present example, ‘the process by which a baby is pushed from its mother’s body during childbirth’, and the basic meaning, ‘work’, is not strong. The process of giving birth is hard “work”. Labour is therefore not metaphorically used – it can be taken as a specification of a more general basic sense (e.g. Geeraerts 1997; Koch 1999).

Macmillan is our major source of reference, which is in accordance with the MIP procedure. As the examples above have demonstrated, however, the use of Macmillan alone is unsatisfactory for some lexemes. MIPVU employs Longman as a second opinion when appropriate. This is done systematically and only when two clearly contrasting meanings are conflated under the same sense description, as well as when it is unclear whether two separate senses are sufficiently distinct.
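The operational criterion of separately numbered sense descriptions can be illustrated with a minimal sketch. It assumes that sense labels look like '3', '3a' or '2b'; the function names `main_sense`, `separately_numbered` and `needs_second_opinion` are hypothetical and introduced only for this illustration. As the cases of drop and labour show, separate numbering is only a first indication, so the sketch covers the dictionary check and not the final judgment about comparison.

```python
import re


def main_sense(label: str) -> int:
    """Extract the main sense number from a label such as '3', '3a' or '2b'."""
    match = re.fullmatch(r"(\d+)[a-z]?", label.strip())
    if match is None:
        raise ValueError(f"unrecognized sense label: {label!r}")
    return int(match.group(1))


def separately_numbered(contextual: str, basic: str) -> bool:
    """True when the two senses fall under different numbered descriptions.

    Sub-senses such as 3a and 3b count as manifestations of sense 3.
    """
    return main_sense(contextual) != main_sense(basic)


def needs_second_opinion(contextual: str, basic: str, seems_contrastive: bool) -> bool:
    """Flag cases where meanings that seem to contrast share one numbered
    description, so that a second dictionary would be consulted."""
    return seems_contrastive and not separately_numbered(contextual, basic)


print(separately_numbered("3a", "3b"))        # run: both under sense 3
print(separately_numbered("1", "2"))          # two separate numbered senses
print(needs_second_opinion("3", "3", True))   # conflated within one description
```

Whether two meanings "seem contrastive" remains an analyst judgment here and is passed in as a plain boolean.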

3.5 Direct metaphor

Metaphor-related words in the news corpus are typically indirectly used. However, this is not the only way cross-domain mappings can surface. Journalistic writing occasionally employs direct language use that still triggers a cross-domain mapping. This cannot be captured by contrasting basic and contextual meaning, however, as is shown in the following example:

(29) IN SYSTEMS development nothing is more fundamental than assessing user requirements. (…) But many system developers are unable to assess requirements properly. They seem to think that you can ask a businessman what his requirements are and get an answer that amounts to a draft system specification. A doctor doesn’t ask his patient what treatment to prescribe. The patient can explain only what the problem is. It is the doctor that provides the remedy. (…) A user may have a deep knowledge of business problems, but knowing little about computers, has no idea how they should be tackled. Yet, analysts are heard asking time and again, ‘Tell me what you want. (…)’ But of course the users don’t know what they want, so they end up getting another duff system. An effective analyst provides the same service to the business as the doctor provides to the patient. (A8R-fragment02)

The italics mark a topic shift from the domain of computers to the medical domain. Because we know how a doctor treats a patient, we can understand how a system developer deals with a user. This comparison of a systems developer to a doctor and the user to a patient triggers a mapping between the two contrastive domains. Within MIPVU, we therefore mark all content words that are part of the topically incongruous stretch of text with a special tag, indicating that the indirect conceptualization is expressed directly – and not indirectly as is the case for most metaphorical language. In this example, the mapping extends over a longer stretch of text. More frequently occurring are less elaborate similes, signalled by words such as like or as, creating a local shift in frame of reference:

(30) For many years Thompson lived in New York in his apartment at the Chelsea Hotel. From there, like a buzzard in its eyrie, he would make forays round the US and abroad (…) (A1H-fragment05).

MIP, which focuses purely on indirectly expressed linguistic metaphor, cannot deal with metaphor-related language use of this kind. When language is used in a direct way but does involve a cross-domain mapping, the coder has to identify this as metaphor-related language, too.
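The marking of a topically incongruous stretch can be pictured with a minimal sketch. It assumes that the analyst has already located the stretch by hand and that each lexical unit carries a coarse word class; the class inventory, the `Unit` record and the `mark_direct` helper are hypothetical names introduced for illustration only and are not part of MIPVU itself.

```python
from dataclasses import dataclass

# Assumption: these coarse classes stand in for "content words" in this sketch.
CONTENT_CLASSES = {"noun", "verb", "adjective", "adverb"}


@dataclass
class Unit:
    text: str
    word_class: str
    direct: bool = False  # True once flagged as directly expressed metaphor


def mark_direct(units, start, end):
    """Flag the content words in units[start:end] as directly expressed metaphor.

    The stretch boundaries are supplied by the analyst; nothing is detected
    automatically here.
    """
    for unit in units[start:end]:
        if unit.word_class in CONTENT_CLASSES:
            unit.direct = True
    return units


# The simile from Example (30): "like a buzzard in its eyrie".
units = [
    Unit("like", "preposition"),
    Unit("a", "determiner"),
    Unit("buzzard", "noun"),
    Unit("in", "preposition"),
    Unit("its", "determiner"),
    Unit("eyrie", "noun"),
]
mark_direct(units, 1, 6)
flagged = [u.text for u in units if u.direct]
```

In this sketch only buzzard and eyrie end up flagged; the function words inside the stretch are left untouched by the helper and would still have to be judged separately by the analyst.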


Nevertheless, within a simile or other form of directly expressed metaphorical comparison, a lexical unit can still be metaphorical, for it may have a more basic meaning than the contextual one that expresses the source domain. For instance, the preposition to in Example (29), “as the doctor provides to the patient”, has a more basic meaning that involves some kind of movement from one concrete spot to another. The contextual meaning of to is abstract. Since the basic and contextual sense can be contrasted but can be understood in comparison with each other, to is metaphorically used – in an indirect way. The analyst must therefore, for each and every lexeme within the stretch of directly used language expressing a cross-domain mapping, apply the steps of the metaphor identification procedure as usual. This may lead to marking a lexical unit as direct and indirect metaphor at the same time. (For more details on directly expressed metaphor refer to Chapter 5.)

There is an important terminological consequence of this extension of MIP into the expression of metaphor by other forms than metaphorical language use, which has been mentioned before. The phrase buzzard in its eyrie is not metaphorical language use in the same way as a word like defend in Lotte defended her thesis. We have therefore adopted the following terminological conventions:

–– Cases like defend, which have turned out to constitute the bulk of metaphor in discourse, can be called metaphorical language use, or metaphorically used word(s); they involve indirect meaning by comparison.
–– Cases like buzzard in its eyrie cannot be called metaphorical language use, or metaphorically used words; they involve direct meaning by comparison. In other words, indirectness in conceptualization through a cross-domain mapping is expressed by direct language.
–– But it is possible to refer to both sets of cases as ‘metaphor-related words’: the words are used in such a way that, in subsequent analysis, they can be related to more specific underlying conceptual structures that are metaphorical. This holds for both defend and for buzzard in its eyrie.

When this is important, we will rely on these terminological distinctions.

3.6 Conclusion

News texts have served as a rich source of data for metaphor analysis. However, we are aware of no previous work focusing on the identification of linguistic metaphors themselves in this type of discourse. Since linguistic metaphors often serve as a basis for further linguistic, conceptual, and communicative analysis, a reliable identification procedure, as well as an understanding of how it works within the news register, is essential.


Linguistic metaphor identification in news articles is relatively straightforward. General world knowledge is sufficient to understand the meaning of a news text, specialized terms are rare and the discourse is coherent. Indeed, only 5.1% of the lexical units in a series of reliability tests, performed by four analysts, did not receive unanimous inter-coder agreement, which is the lowest of all four registers in our data. Of this already low percentage, the majority of cases of disagreement can be attributed to coder error. The application of our procedure to newspaper discourse has unveiled very few difficult or ambiguous cases. These few remaining items, though they may seem challenging at first, can generally be solved in a reliable and consistent manner. For each of the core steps of the identification procedure we have demonstrated a series of difficult examples that have surfaced when applying MIP to bulk news data, along with their possible solutions, which helped create our more elaborate tool for metaphor identification, MIPVU. This is not meant to suggest that our analyses are free of error. Instead, it should be possible to detect remaining errors fairly easily against the explicit set of assumptions formulated in MIPVU. MIPVU differs from MIP in several ways. The unit of analysis is the grammatical word class, not the broader lemma; this is decisive for the selection of relevant contextual and basic senses that need to be distinguished and compared. When the contextual meaning of a word cannot be established using the dictionaries at hand, whether because of its technical use or because of ambiguous context, we retain the unit in our dataset as potentially metaphorical, marked by a special tag, WIDLII. We use Longman as an additional tool – mainly for cases in which it is not clear whether two senses are sufficiently contrastive. 
In a small minority of cases analysts still disagree on the basic meaning of a lexical unit after consulting both contemporary dictionaries: for these rare cases, as well as when the relatedness between polysemous senses is unclear, they may consult the OED to take the historical development of a word into account. A final addition is the consideration of directly expressed metaphor for analysis. Though there are those cases that need a more elaborate decision process, we emphasize that, once an analyst is familiar with MIPVU, metaphoricity can be judged quickly for the majority of lexical items in news texts. The examples offered in this chapter have pointed out that even complex cases can be approached in a systematic and reliable manner. By following a consistent decision process, the number of borderline cases can be kept low, which reduces the level of potential error and noise in subsequent quantitative analysis. We do not see the challenging examples as a setback; rather we have used them to design MIPVU in order to deal with more subtle cases.
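The order in which the dictionaries are consulted, as summarized above, can be written down as a small decision sketch. The function names and flag names below are hypothetical labels for the conditions described in this chapter, and the sketch assumes that Longman is always consulted before the OED; it is an illustration of the decision order, not an implementation of MIPVU.

```python
def dictionaries_to_consult(senses_conflated=False,
                            distinctness_unclear=False,
                            basic_meaning_disputed=False,
                            relatedness_unclear=False):
    """Return the dictionaries an analyst would consult, in order.

    All four flags are analyst judgements; none is computed automatically.
    """
    sources = ["Macmillan"]  # primary reference, in accordance with MIP
    # Assumption: the OED is only reached after both contemporary dictionaries.
    needs_oed = basic_meaning_disputed or relatedness_unclear
    if senses_conflated or distinctness_unclear or needs_oed:
        sources.append("Longman")  # second opinion
    if needs_oed:
        sources.append("OED")  # historical development of the word
    return sources


def provisional_tag(contextual_meaning_established):
    """Keep a unit as potentially metaphorical (WIDLII) when its contextual
    meaning cannot be established; otherwise no special tag is assigned."""
    return None if contextual_meaning_established else "WIDLII"


plain_case = dictionaries_to_consult()
second_opinion = dictionaries_to_consult(distinctness_unclear=True)
disputed_case = dictionaries_to_consult(basic_meaning_disputed=True)
```

Keeping the judgements as explicit flags reflects the point made above: the dictionaries support the analyst's decision, and an explicit record of which condition triggered which source makes remaining errors easier to trace.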

Chapter 4

Metaphor identification in conversation

4.1 The wild world of conversation

In terms of discourse, there has always been a distinction between writing and speech. Many linguists have stressed the salient characteristics of each of the two activities (e.g. Chafe 1994; Clark 1996; Crystal 2003). Of these, Chafe’s theory of ‘consciousness’ (1994) is attractive for a theory of metaphor that clearly separates linguistic form, conceptual structure, and communicative function. Chafe opposes three conditions under which language may be produced and received: speaking, writing and thinking. They all serve to “shape unique flowing experience into already established patterns that language provides […] and overt language – speaking and writing – offers a way to narrow the chasm between independent minds” (p. 41). He identifies the following distinctive features for speech and writing respectively: evanescence of an utterance versus permanence and transportability of a text, difference in tempo, spontaneity versus the deliberate working over of a text, the richness or absence of prosody, the naturalness of speech versus the more artificial nature of writing, and finally, the situatedness of speech versus the desituatedness of writing (pp. 43–44; cf. Clark 1996). One way or another, these features involve and influence the production or processing of language and, more importantly for our current endeavour, the linguistic manifestations of metaphor.

Amongst current studies of metaphor and talk, one of the most holistic approaches incorporating the distinctive features of speech in interaction is Cameron’s discourse dynamics approach (1999, 2003, 2008a, 2008b; Gibbs & Cameron 2008). In this approach, Cameron stresses the interactional and fluid nature of metaphor performance. She adopts metaphor analysis at a local or micro-level (within the discourse event or conversation), but also for different conversations between the same speech partners over a stretch of time (macro-level).
Speakers may negotiate metaphors or develop them by repeating, relexicalizing, explicating or contrasting them. Throughout their conversations, speakers may show the use of recurring ‘systematic’ metaphors. Cameron stresses the important dynamic nature of conversation in which speakers do not simply put their own thoughts into words, but do so while also taking the listener’s perspective into


account (this ‘dialogic’ perspective is adopted from Bakhtin 1981, and similarly adopted by Chafe 1994). In this model, linguistic, cognitive, affective and sociocultural manifestations of metaphor are all incorporated at the same time. Within this approach, Cameron applies a similar distinction between levels of metaphor analysis as the present casebook, by teasing apart three: (1) the shape of a metaphor (or the linguistic level) in order to describe manifestations of metaphor and their linguistic environment, (2) the use of a metaphor (the communicative function), for example to convey an attitude or manage the discourse, and, finally, (3) the cognitive level of metaphor interpretation (the conceptual level). The difference between Cameron’s approach and ours is their focus of research. Cameron is interested in the development of metaphor in talk for individual language users in interaction. Our research focuses firstly on general linguistic patterns of metaphor usage within a wide range of randomly picked conversations collected from different speakers in order to compare this to metaphor in written registers. Before embarking on such analyses, however, coding at the linguistic level has to be finished in an empirically reliable fashion in order to do quantitative work with the data. As we have seen, MIP is a general procedure designed for metaphor research in varying contexts, or “metaphor in the wild” (Pragglejaz Group 2007: 1). This chapter describes the application of MIPVU geared to the ‘real wild’, namely the seemingly unruly ‘jungle’ of on-going conversation. Within our research group we analyzed a sample of some 45,000 words divided over 24 extracts randomly picked from the 1-million-word BNC-Baby corpus of spoken data. 
The BNC-Baby corpus of spoken data consists of 30 randomly selected texts from the bigger BNC corpus and only contains data from the demographically sampled part of the corpus (which is representative of the UK population in terms of age, gender, region, and class). The texts are recorded by selected individuals and contain data produced by different numbers of speakers in different contexts. Their interaction is transcribed per utterance, which indicates a continuous stretch of speech per speech participant (a speech turn). All conversations consist of natural, spontaneous speech in face-to-face interaction, or ‘casual conversation’ (Eggins & Slade 1997: 19). Recordings were for instance made in the work environment, such as when speakers were plucking chickens or preparing packaged food. Other contexts include situations in and around the house, such as participants watching TV or playing a game. At other times, recordings were made while walking outside or paying a visit to McDonalds. The examples offered in this chapter are mainly taken from the three conversation texts used to test the accuracy of our procedure (for a report of the reliability tests, see Chapter 8). They are referred to in this chapter by the names they obtained in the BNC-Baby corpus: KBW-fragment01, KB7-fragment57, KNR-fragment01. The texts specifically concern cleaning out a fishbowl, chit-chat while driving to work, and deciding where


to order food. Where this is deemed useful, they are complemented by additional material from the rest of the 45,000-word annotated spoken language corpus. Some of the examples presented below will be of unambiguous cases illustrating which elements in conversation provide no difficulties for MIPVU. Other examples are borderline, or in-between cases highlighting instances where analysts tend to disagree and why. Generally, the examples focus on those features of conversation that need specific attention when annotating a corpus for empirically based metaphor research. Even though MIPVU analysis focuses solely on linguistic metaphor and does not ask for an interpretation of conceptual structures or communicative functions of the metaphorically used words, these higher levels of analysis will be occasionally discussed in order to place the examples in their register-specific context. The main focus, however, will be on the linguistic data analysis, the MIPVU method and difficult areas for metaphor annotation.

4.2 Illustrating MIPVU in conversation

Perhaps one of the most interesting findings for conversation is that it turns out to be the register with the highest number of lexical units that are not related to metaphor (for details, see Chapter 10). This may partly be explained by the fact that some of the content spoken about is actually present in the immediate real-world context and does not need to be described in metaphorical words. In casual face-to-face interaction people as a rule do not refer to their car as their ‘carriage’ and their chair as their ‘throne’. Another influence may be the nature of the topics discussed. Since topics in the demographically sampled part of the spoken corpus mostly concern random everyday activities, the number of metaphor-related words may be smaller than in more personal and emotional settings. Emotionally charged topics are generally not encountered in the BNC-Baby fragments.
In comparison with the other registers, conversation scores lowest in terms of metaphorically used words. Also, these metaphors are consistently highly conventional metaphors, as will become clear in this chapter. Since we find so few metaphorical words in casual face-to-face conversation, one might wonder why we study them in the first place. The answer is that in order to come to a better understanding of metaphor and its use, no context should be shunned. Our overall view of casual conversation as a context well worthy of study has been succinctly formulated by Eggins and Slade:

[…] despite its sometimes aimless appearance and apparently trivial content, casual conversation is, in fact, a highly structured, functionally motivated, semantic activity. Motivated by interpersonal needs continually to establish who


we are, how we relate to others, and what we think of how the world is, casual conversation is a critical linguistic site for the negotiation of such important dimensions of our social identity as gender, generational location, sexuality, social class membership, ethnicity, and subcultural and group affiliations. In fact, […] casual conversation is concerned with the joint construction of social reality (Eggins & Slade 1997: 6).

Besides simply transmitting pieces of random information, speech participants in casual conversation are with each utterance either deliberately or subconsciously negotiating their social position towards an addressee as well as their topics of speech. Through their word choice speech participants often convey affect or create intimacy by intensifying or evaluating the speech content (see also Carter 2004: 117). A relatively frequently used word class in conversation, for example, is the verb, especially primary and modal verbs (Biber et al. 1999). We often describe our own actions and modalize our messages with directive discourse markers such as ‘I think’, ‘I hope’, ‘I believe’, and modal verbs such as ‘must’, ‘will’, ‘may’ and ‘have to’. Moreover, adverbs often suggest the emotional attitude of a speaker or the imprecision of word choice (through, for example, ‘hedges’), and they occur almost twice as often in the demographic conversation data as in the other BNC registers analyzed in our project (for more details, see Chapter 10). Casual conversation is thereby highlighted as a site for social interaction (see also Coupland 2000).

The main points of interest for the general study of metaphor in casual conversation are then: When do metaphorically used words show up in this context? Where do we find them? In which forms? And for which purposes? One way in which metaphors function in conversations is through so-called ‘involvement’ words. These “offer interactants ways to realize, construct and vary the level of intimacy of an interaction” and “include lexical systems, such as the use of vocatives, […] technical or specialized lexis, slang, and swearing” (Eggins & Slade 1997: 143–144). Nicknames, terms of endearment and swearwords, for example, are often metaphorical.
An example and first illustration of MIPVU for casual face-to-face conversation can be found in the following sentence taken from the BNC-Baby spoken by a mother (Carole, 36) to a two-year-old child (Charlotte). They are leaving home to buy polo shirts:

Example (1) KBH-fragment09

Charlotte: Off we go 〈unclear〉
Carole: Just a minute darling 〈pause〉 it’s alright I 〈pause〉 can afford to buy you a packet of Polos.
Charlotte: Off we go again 〈pause〉 I go home.
Carole: Pet hold mummy’s hand 〈pause〉 hold mummy’s hand, there’s a good girl 〈pause〉 off we go.


Here MIPVU works in a straightforward way. The basic sense of the word pet can be found in the Macmillan dictionary under sense description 1: ‘an animal or bird that you keep in your home and look after’. The contextual meaning, ‘used for talking to someone in a friendly way’, can be found under sense description 3. Even though the definition of the contextual meaning is not as specific as the definition of the basic meaning, it is clear that two distinct domains are concerned. The basic and contextual meaning can be contrasted: a person is not an animal. The basic and contextual meaning can also be understood in comparison with one another: both the person and the pet are, for example, cute. Therefore we are dealing with a metaphorically used word. Apart from realising intimacy through ‘involvement’ words, speech participants often rely on a common ground provided by their immediate surroundings, the topics discussed, and their socio-cultural knowledge. This ‘situated’ context (Chafe 1994) may partially explain the low number of nouns and adjectives in casual conversation. When metaphoric nouns occur in casual conversation, this is often in the context of vague language use, illustrated by the following extract. In this conversation four friends, Jill (21), Lee (23), Sarah (19) and Rachel (19), are discussing what they will have for dinner, probably take-out food, and where to get it. They consider one place in particular where Jill thinks waiting times are not very long:

Example (2) KNR-fragment01

Jill: We didn’t wait for very long in the 〈unclear〉 did we?
Lee: 〈unclear〉
Sarah: When you phone your order and they say twenty past they really mean twenty past don’t they. I mean we arrived at quarter past and she said, Oh you’re a bit early.
Lee: Mm.
Sarah: They just don’t like even attempt to make it before then really do they?
Rachel: Mm.
Jill: Mind you it was that early so they might not have had everything sort of on the go and stuff.

Applying MIPVU, the basic meaning of stuff is sense description 2 in Macmillan: ‘the basic material or substance people use for making something’. The contextual meaning is a phrase following the separate sense descriptions: ‘spoken, used for things that are similar or related to the subject you are discussing’. Consequently, the basic and contextual meaning can be contrasted (the subject is ‘having everything on the go’, which is not a substance but a situation) and understood in comparison with one another (as is the general use of the phrase and stuff) and therefore we are dealing with a metaphorically used word. Existing literature exploring the functions of vague language use (e.g. Channell 1994; Cutting 2007; Jucker et al. 2003; Tannen 1989) describes it as both


an unintentional and an intentional phenomenon related to both production as well as processing of speech. Speech participants may simply be too tired, too lazy or under too much time pressure to produce a less general and vague word. At the same time, vague language can be more strategically used; for example to avoid face threatening acts that may be provoked by more detailed language (Channell 1994; Eggins & Slade 1997), or to construct in-group membership as well as to exclude outsiders (Cutting 2000, 2001). Tannen (1989) identifies it as a signal of high involvement style: “the more work […] hearers do to supply meaning, the deeper their understanding [of an utterance] and the greater their sense of involvement with both text and author” (1989: 23). In a similar vein, Jucker et al. qualify and stuff as a ‘vague category identifier’ through which a speaker can indicate that “the thought she has in mind is more complex than is being expressed and to appeal to the listener to construct the relevant members of the set evoked” or “to maintain the pace of the conversation” (2003: 1748–1749). In terms of metaphor research it is interesting to see how an incomplete abstract thought is materialized as a concrete object. Other examples of vague words that are also potentially metaphorically used are the general noun thing (e.g. “one thing and another” and “next thing I know”) and vague quantifying expressions such as a lot of and a load of (e.g. “Erm it’s a load of rubbish anyway”). Note that in the example one thing and another, MIPVU would code another as place filler for the ellipted noun thing, which is metaphorically used (see Chapter 2 Paragraph 5); it is an example of implicit metaphor. Another generally non-problematic example for MIPVU can be found in the use of demonstratives. Speech participants often refer anaphorically to previously shared topics within the conversation with that and this replacing the full referent. 
This creates a sense of cohesion within the text (Halliday & Hasan 1976). An example can be found in the following extract in which Jill, Rachel, Lee and an unidentified speaker continue their ‘food’ conversation. Lee decides he ‘fancies’ a particular kind of food and asks the other speech participant whether he/she would like the same:

Example (3) KNR-fragment01

Lee: Ooh I fancy that, shall I nip round there and get some?
Speaker: 〈unclear〉
Lee: That was 〈unclear〉 〈pause 6 secs〉 Do you want any?
Speaker: 〈unclear 7 secs〉
Lee: Oh no I just wanted to know if you wanted any that was all.
Speaker: 〈unclear 6 secs〉


In this example, the word that in that was all refers back to the antecedent ‘I just wanted to know if you wanted any’. The basic meaning of that is sense description 2 in Macmillan: ‘the one that you are looking at (SPOKEN)’. The contextual meaning is sense description 1: ‘the one that is known about’. The basic and contextual meaning can be contrasted (being known is not necessarily being visible) and understood in comparison with one another (both elements seem close either in space or time) and therefore we are dealing with a metaphorically used word. Note that the dictionary specifies the basic meaning as being characteristic for ‘spoken language’ (as is the case for previous Example (2)). There is no problem in taking this specific register-related meaning as the basic meaning, for spoken discourse is often argued to be the most basic of all registers (Chafe 1994; Clark 1996; Fillmore 1981; Halliday 1978). Halliday proposes that “[i]t is natural to conceive of text first and foremost as conversation: as the spontaneous interchange of meaning in ordinary, everyday interaction. It is in such contexts that reality is constructed, in the microsemiotic encounters of daily life” (Halliday 1978: 40). It therefore seems only natural for the basic meaning of that (pointing to a space or location) to find its roots there. A final unproblematic example for MIPVU can be found in the coding of the recurring use of spatial relationships to refer to temporal relationships (e.g. Clark 1973; Lakoff & Johnson 1980, 1999; Traugott 1978). In experimental work on metaphor processing, Gentner, Imai and Boroditsky argue that space-time mappings are “pervasive across cultures in artefacts such as clocks, time-lines, drawings, and musical notation” (2002: 537). In everyday language, the mapping manifests itself as a directional time-line, “to capture the sequential order of events” (2002: 538). 
This is illustrated by the following example from KNR-fragment01, in which Sarah talks to Lee about ordering food.

Example (4) KNR-fragment01

Sarah: When you phone your order and they say twenty past they really mean twenty past don’t they. I mean we arrived at quarter past and she said, Oh you’re a bit early.
Lee: Mm.
Sarah: They just don’t like even attempt to make it before then really do they?

Analysts had no trouble identifying the metaphorical use of the prepositions in this extract. For past, the basic meaning is sense description 3 in Macmillan: ‘further than a particular place along a road, path, river etc’. The contextual meaning is sense description 1: ‘after a particular time’. The basic and contextual meaning can be contrasted (time and space are two distinct domains) and understood in comparison with one another (a particular time may be seen as a particular position) and therefore we are dealing with a metaphorically used word.


A similar analysis applies to at (basic sense in Macmillan: ‘used for stating where someone or something is, at a particular place’; contextual sense: ‘used for stating when something happens’) and before (basic sense in The Longman dictionary of contemporary English online: ‘in front of someone or something’; contextual sense in Macmillan: ‘earlier than a particular time’). In all of these cases the tight spatial connection between the prepositions seems to function as a systematic source domain for the abstract dimension of time. As argued by Gentner, Imai and Boroditsky, “[p]ossibly it is the degree of interdependency amongst the meanings of these terms that enforces system-level consistency in these conventional metaphors” (2002: 561). Since MIPVU is only concerned with the linguistic level of metaphor identification, there is no need to further address the significance for processing at this stage. Incidentally, note that the first sentence in this extract contains the expression a bit. This is an example of a polyword: a multi-word expression analyzed as one lexical unit in the BNC on the grounds that it is one grammatical unit designating one specific referent in the discourse. Macmillan also recognizes its specific grammatical function as an adverb or pronoun. We therefore have no problems in not considering bit as a possibly metaphorically used word. To conclude, casual conversation contains the smallest number of metaphorically used words of the four registers. Unproblematic cases for analysis by MIPVU include nicknames that may affectively compare human beings to, for example, animals. In general, however, most metaphorical expressions are less specific, comparing general abstract concepts such as ‘discourse’ and ‘time’ to concrete space through the use of general, vague nouns (stuff and thing), demonstratives (this and that) and prepositions (past, at and before). None of these phenomena presents great difficulties for the application of MIPVU. 
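The decision pattern shared by the examples in this section can be sketched in a few lines. The sketch assumes that the analyst has already looked up the basic and contextual senses and has judged contrast and comparison by hand; the `Judgement` record and the `is_metaphorically_used` helper are hypothetical names, and the code only records and combines those judgements rather than making them.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    unit: str
    basic_sense: str
    contextual_sense: str
    contrast: bool    # analyst's call: are the two senses sufficiently distinct?
    comparison: bool  # analyst's call: can one sense be understood via the other?


def is_metaphorically_used(j):
    """A unit counts as metaphorically used only if both judgements hold."""
    return j.contrast and j.comparison


judgements = [
    # Example (1): a person is not an animal, but both can be seen as cute.
    Judgement("pet",
              "an animal or bird that you keep in your home and look after",
              "used for talking to someone in a friendly way",
              contrast=True, comparison=True),
    # Example (4): time and space are distinct domains; a time as a position.
    Judgement("past",
              "further than a particular place along a road, path, river etc",
              "after a particular time",
              contrast=True, comparison=True),
    # From the news chapter: the contrast is judged not strong, so the
    # comparison judgement (left False here) does not affect the outcome.
    Judgement("labour",
              "work",
              "the process by which a baby is pushed from its mother's body "
              "during childbirth",
              contrast=False, comparison=False),
]
results = {j.unit: is_metaphorically_used(j) for j in judgements}
```

Storing the sense descriptions next to the two judgements keeps each annotation decision open to inspection by another analyst.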
4.3 Challenges to MIPVU

The above examples have shown cases of metaphorically used words straightforwardly identified by MIPVU. Inter-analyst agreement is high because of this easy application of the method. That does not mean, however, that all challenges for metaphor analysis are now met. In order to pinpoint these specific challenges it is essential that analysts are clear about their annotation decisions. The most important issues for conversation will be illustrated below.

4.3.1 Problems with identifying the contextual sense

Because conversation is highly interactive and speech partners often rely on context in its broadest definition to understand each other, determining the correct


interpretation of their speech is not always easy. Metaphor analysts would ideally like to take a peek inside speech participants’ minds in order to draw the right conclusions. Then again, a listener partaking in an on-line conversation is confronted with these same challenges. The difference is that listeners have access to prosodic features, such as length, loudness, pitch, prominence, and voice quality (Cruttenden 1986), which add various kinds of semantic and attitudinal information to the meaning of an utterance. Another source of information is the use of gesture through which speech participants show their intentions and attitudes, their aims, goals and interest, and their emotion (McNeill 1992; Kendon 2004). From a neurolinguistic perspective, Hagoort and Van Berkum argue that “knowledge about the context, concomitant information from other modalities and the speaker are immediately brought to bear on utterance interpretation, by the same fast-acting brain system that combines the meanings of individual words into a larger whole” (2007: 809). Different methods for data collection and transcription of spoken language (Edwards & Lampert 1993) can be adopted that enable different degrees of access to paralinguistic and extralinguistic features. The extracts from the BNC-Baby consist of speech transcriptions that encode a small number of paralinguistic elements such as voice quality, non-verbal but vocalized sounds (such as coughs), significant pauses, phenomena of speech management (truncation, false starts and corrections), unclear passages and overlap within speech turns. Other information consists of the first name, age, social role and local origins of the speech participants, the year in which the recording was made, and a minimal description of the location where the conversation takes place (e.g. ‘at home’) and the activity the speech participants are engaged in (e.g. ‘relaxing’ or ‘having breakfast’). 
Overall, however, the speech context is unknown and has to be retrieved as fully as possible from the text alone; the original recordings are not available to the analyst. This lack of multimodal information does not have to be a disadvantage, though, since it offers analysts the chance to strictly concentrate on what happens at a linguistic level. At a later stage, new analyses may combine linguistic manifestations with paralinguistic elements into a more encompassing multimodal model. At the same time, however, analysts may be forced to deal with sometimes artificially ambiguous situations due to lack of sufficient contextual knowledge. As a result, analysts occasionally can only rely on a hunch about the intended meaning, and multiple interpretations are possible at the same time. Take for example the following sentence from KB7-fragment57. In this extract Stuart (33) and Ann (46) are driving a car and are passing the house of a mutual acquaintance. Stuart would like to show Ann the exact location. (Incidentally, again note that besides the significant lexical units in italics, this example also contains the polyword by now in the final utterance, which is treated as one lexical unit by MIPVU;


see Chapter 2 Paragraph 2.1. Moreover, BNC makes the transcripts anonymous by excluding the last or full name of speech participants.)

Example (5) KB7-fragment57

Stuart: Would you like to see Pat (last or full name)’s house? Ann: Okay Stuart: Show you Pat’s house. 〈pause 6 secs〉 If you look to your left coming into the junction 〈pause〉 say I presume, I don’t know whether, they may even have moved now. See that one, oh you see the one where the trees are? Ann: Yeah Stuart: And the front wall 〈pause〉 that sort of set back one? Ann: Yeah. Stuart: That’s Pat 〈pause〉 you can see it just there, look Ann: Yeah Stuart: oh the one with the 〈pause〉 that’s Pat (last or full name)’s house. That’s if he still live there as I say. Should think they’ve moved by now. 〈pause〉 That’s where he used to live anyway.

In this extract, analysts ran into trouble analyzing the very last occurrence of demonstrative ‘that’. The first four appearances (See that one, that sort of set back one, That’s Pat, and that’s Pat’s house) are all unmistakably non-metaphorically used: the speaker specifically points his listener to the concrete location of Pat’s house by using the literal verbs see and look. The fifth appearance of that (That’s if he still live there as I say) can easily be classified as metaphorical, on the basis of the following arguments. The dictionary has adopted ‘that is’ as a phrase ‘used when explaining more clearly what you have just said’ or ‘used when you are going to correct something you have just said’. The basic meaning of that is ‘the one that you are looking at (SPOKEN)’. The contextual meaning is ‘the one that is known about’. The basic and contextual meaning can be contrasted (being known is not necessarily being visible) and understood in comparison with one another (both elements are close either in space or time) and therefore we are dealing with a metaphorically used word. The final occurrence of that, however, in That’s where he used to live anyway, is less straightforward. Two interpretations are possible: either (1) that is interpreted as being metaphorical, because it refers back to the information uttered by the speaker earlier on, or (2) that is interpreted as being non-metaphorical, because the speaker is still pointing or looking at the actual place. Deciding which interpretation works better is complicated by the context. Although at first the speaker obviously looks at the house (that = “Pat’s house”), the location becomes more and more part of the discourse message (that = “where I said Pat’s house is”). This


gradual change towards the speech event as topic is manifested by the words “as I say” in the second sentence of Stuart’s last speech turn. The problem of deciding whether the lexical unit refers to the immediate spatial situation or the abstract discourse is typical of demonstratives, since they are often used as a substitute for a more specific referent (a whole noun or verb phrase). They function as cohesive as well as economic devices (Biber et al. 1999; Halliday & Hasan 1976). In cases where both a non-metaphorical and a metaphorical interpretation seem valid and a lack of situational knowledge disables disambiguation, we choose to signal the possible metaphorical use of the word. MIPVU offers the code WIDLII as a means to code such systematically ambiguous cases (see Chapter 2 Paragraph 3.1). Difficult cases can thus be collected and identified. Analysts can afterwards decide to include or exclude these WIDLIIs for final analysis. Apart from lack of situational knowledge, another problem is posed by the real-time nature of on-going conversation (and our dependency on taped recordings). This results in all sorts of dysfluencies: people interrupt each other, are distracted, cannot find their words, or simply cannot be understood by the transcriber. This leads to incompleteness of utterances and an even greater lack of contextual knowledge for the analyst. The corpus contains many incomplete utterances of the following kind:

Example (6) KB7-fragment45: colleagues Deidre (44) and Ann (46) discuss being drunk

Deidre: I know, yeah Ann: I’m not keen. Deidre: No I’ve had 〈unclear〉 Ann: Ooh! Ooh! 〈pause〉 That’s close! 〈pause 10 secs〉 Here 〈pause 6 secs〉 Only me about me getting drunk, as if I would dear! Stuart: When? Ann: Anytime 〈pause 6 secs〉 But I remember bloody you coming in on a Sunday morning sometimes you, bloody can hardly 〈unclear〉 the bed! Deidre: Ha ha

Example (7) KB7-fragment45: colleagues Ann (46), Jean (57) and Deidre (44) talk about working the trussing line

Ann: Ann’s had, Merv, John, Teresa, Paul and Vi’s 〈pause〉 all trussing 〈pause〉 done the rest the of my trussing, I’ve done the Char Grill, I’ve done Welcome Break, now they’re on the Tesco. I’ve only got a thousand Sainsburys to do and we’ll be all done by Jean: Nine o’clock


Ann: or not long after 〈pause〉 so why have they been doing that if there’s nothing much to do? Why is John, 〈unclear〉 and Ann and 〈pause〉 Paul and Teresa and all them been on that trussing line, and Merv Deidre: Probably trying to get ahead for that 〈unclear〉, one of those 〈unclear〉. Ann: Oh yeah, but I wonder if we’re gonna all finished by nine o’clock and you’ll have us all in your ha ha 〈pause〉 ha

Example (8) KBW-fragment01: Dorothy (34) and her son Tim (3) discuss rolling up their sleeves before cleaning out a fishbowl

Dorothy: You want to have your sleeves rolled up?
Tim: Yeah er 〈unclear〉 got this shirt on.
Dorothy: Yeah well we’ll just undo the cuff.
Tim: Yeah.
Dorothy: And roll it up. 〈pause〉 And there we are. And this one.
Tim: Is 〈unclear〉 doing it?
Dorothy: Well that’s 〈pause〉 we need Christopher to do it as well don’t we?
Tim: Yeah.
Dorothy: Now whatever you do don’t drip it all over me stereo.

Example (9) KBD-fragment21: colleagues Barry (41) and Alan (38) discuss the work environment

Barry: But more in, but 〈pause〉 what it was 〈trunc〉 we 〈trunc〉 when I was in here, sort of 〈pause〉 five days a week, nine to five job
Alan: Mhm mm.
Barry: it was getting 〈trunc〉 depres 〈trunc〉 it was getting oppressive in that little office down there.
Alan: Oh! I would have thought, yeah. Awful! You’d get a

These incomplete utterances contain unclear stretches, pauses, and truncated words, which are coded as such by the BNC (〈unclear〉, 〈pause〉, or 〈trunc〉). Strikingly enough, such dysfluencies often follow low-content words such as delexicalized verbs (e.g. I’ve had in Example 6 and it was getting 〈trunc〉 depres 〈trunc〉 in Example 9), prepositions (you’ll have us all in in Example 7), and demonstratives (Well that’s in Example 8). These are all highly grammaticalized words that generally introduce lexically more salient information (cf. Chafe 1994). To deal with these unclear cases, the following guidelines have been adopted in MIPVU. If an utterance is unfinished because of unclearness, a pause, or a truncated word, those lexical units in the incomprehensible part of the incomplete utterance that may or may not be potentially metaphorically used are discarded for metaphor analysis. In other words, these specific words are ignored in our total word count since we do not have enough information to complete the utterance. Even though it might be possible to guess the end of a word or utterance, we do not want to add our own interpretation. In Example (8), for instance, Dorothy replies


to Tim’s previous utterance with Well that’s and then pauses. She seems to respond to an idea previously uttered by her son and the demonstrative pronoun that may therefore be interpreted as metaphorically used to refer to an idea instead of a concrete object. This, however, is speculation. In Example (9), Barry describes his office situation as getting 〈trunc〉 depres 〈trunc〉 and restarts by saying it was getting oppressive in that little office down there. What he perhaps meant to say was that the room became depressing, that it was depressive, or perhaps something else related to the verb depress (which may become metaphorically used since its literal concrete sense concerns ‘pressing something down’). This, again, is speculation. In effect, we do not count such a word as a lexical unit, but discard it, adding the code DFMA (Discard For Metaphor Analysis) (see Chapter 2 Paragraph 3.1). In practice, it turns out that this guideline had to be adopted for less than 1% of the lexical units in conversation (for more information, see Chapter 10).

Even when utterances are completed, there are cases where the content remains so unclear that the analyst is not at all able to determine its meaning. Apart from lack of context, this may be the result of the many topics and subtopics a conversation can include, and the many speech participants simultaneously taking part in a conversation. Take the following example from KNR-fragment01 in which the students Lee (23), Rachel (19) and Jill (21), who are all friends, are discussing a TV programme they would like to take part in:

Example (10) KNR-fragment01

Lee: Wheyey. 〈pause 6 secs〉 We should go on that?
Rachel: Why?
Jill: Why 〈unclear〉
Rachel: Yeah but you wouldn’t get chosen
Lee: Yeah we would.
Rachel: You’d set it up and like they’d choose someone else or something.
Unknown speaker: 〈unclear 6 secs〉
Lee: I would.
Unknown speaker: 〈unclear 7 secs〉
Lee: Cos the other two are burks and you wouldn’t.
Unknown speaker: 〈unclear 16 secs〉
Rachel: How old is she? Fifty?
Unknown speaker: 〈unclear 16 secs〉
Lee: Cheek on what?

The extract shows that a fourth, inaudible speaker joins the conversation and seems to converse with Lee alone as the discussion continues. It is not clear whether the topic changes and Lee’s final comment “cheek on what” seems completely random. Creative speculation may lead to an analysis of cheek as an expression of disrespect (‘I’ve had enough of your cheek’ or ‘Pure cheek on my part’). However, this does


not directly follow from the previous utterances and the context is too unintelligible for plausible interpretation. Therefore, cheek and on, which might normally at least be candidates for an ambiguous case of metaphorical use, are here discarded for metaphor analysis. A final example of a dysfluency is when a speaker seems to make a mistake and thereby makes the utterance incomprehensible. Take the following example from the KCV-fragment42 transcript in which Katherine (57) and her husband Patrick (56) are talking to their German friend and student Stefan (25):

Example (11) KCV-fragment42

Patrick: You don’t have a coffee to be going on?
Stefan: Hmm?
Katherine: with the soda
Stefan: They’re terrible tense No just 〈pause 20 secs〉 Vaison doesn’t like sultana. Do you know that?
Katherine: What does he..?
Stefan: He picks all out
Katherine: Oh!
Patrick: Isobel wants
Stefan: He doesn’t like cinnamin goping 〈/sic〉 either
Patrick: Isobel wants to go to Brazil
Stefan: Yes
Patrick: But well I mean it’s rather stupid for how long
Stefan: Three weeks
Patrick: Oh! But can Vaison take the three weeks holiday?

This conversation exhibits many different topics including holiday plans (going to Brazil) and food (e.g. having coffee or disliking sultanas). Stefan’s words cinnamin goping seem to apply to neither. The BNC codes this as an error on the part of the speaker (by adding the code ‘sic’). The unintelligibility may be brought about by Stefan’s German accent (assuming that he has one). One interpretation of the expression could be that ‘he doesn’t like cinnamon topping’ or ‘cinnamon coffee’ (staying within the topic of food). But again, this would be based on sheer speculation. The words that are unclear can therefore not be taken into account for possible metaphorical use and are discarded for metaphor analysis.

4.3.2 Problems with identifying the basic sense

The previous chapter on news texts already demonstrated how our main tool for the identification of contextual and basic senses, the Macmillan dictionary, sometimes conflates senses for practical purposes. This also occurs for senses that are saliently distinct in the conversation register. Take, for example, the word bet, very often used in spoken discourse as a confident way of referring to a possibility.


In the following extract Gordon (61) and Audrey (61), husband and wife, are sitting inside talking to each other. At a certain stage they look outside and wonder who has left the gate to their house open:

Example (12) KBC-fragment13

Gordon: Somebody’s 〈unclear〉. Audrey: Pardon? Gordon: Left the gate open. Audrey: Did you? 〈pause 6 secs〉 I wonder who’s that Gordon: I’ll check. Has a leaflet come through. Audrey: Yes. I bet there has. Tell you what, the Conservative candidate’s not been round has he? Gordon: No he daren’t Audrey: 〈laughs〉

The entry for I bet in the Macmillan dictionary can be found under the verb bet as a fixed phrase I bet/I’ll bet SPOKEN: ‘used for saying that you understand or agree with what someone has just said’. As a verb, however, Macmillan offers only one general sense description: ‘to risk an amount of money by saying what you think will happen, especially in a race or game. You lose the money if you are wrong and win more if you are right’. We have to take this as the basic sense, because it is the only sense. As a run-on (a specific use of the main word added to a sense description of that word in the dictionary), or fixed expression, the dictionary offers ‘bet (that): He bet me £20 that I couldn’t keep quiet for ten minutes’ and then adds the following specification: ‘a) be betting on something; to have a very strong hope that something will happen, so that this influences what you do: House buyers were betting on interest rates continuing to fall’. This loosely corresponds to the more abstract contextual sense of I bet in (12): no racing or games are necessarily involved here. Although the two meanings of the main word differ, Macmillan does not treat them as sufficiently distinct to provide two sense descriptions, opting for one sense including fixed expressions and specifications rather than a polysemous analysis. Based on this particular dictionary entry, we cannot therefore treat bet as possibly metaphorically used. This is when we have to turn to the Longman Dictionary of Contemporary English. It helps us deal with cases where Macmillan does clearly acknowledge different applications of a word (in this case bet), but has not given these descriptions their own demarcated sense descriptions, whether for reasons of economy or space, or to help learners see patterns and connections between senses. This often happens for different uses of a lexical unit where the basic distinction between domains is mainly that of abstract versus concrete use.
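The order of consultation for conflated senses can be illustrated with a minimal sketch. It assumes, purely for illustration, that each dictionary entry is reduced to its list of separately numbered sense descriptions and that "more than one numbered sense" stands in for the analyst's judgement that senses are kept apart; the class `DictionaryEntry` and the function `second_opinion` are hypothetical names, not part of MIPVU or of either dictionary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DictionaryEntry:
    """Hypothetical stand-in for one dictionary's entry for a lexical unit."""
    dictionary: str
    # Only separately numbered sense descriptions; run-ons and fixed
    # phrases listed under a single sense are NOT counted here.
    numbered_senses: List[str] = field(default_factory=list)


def second_opinion(primary: DictionaryEntry, secondary: DictionaryEntry) -> str:
    """Decide which dictionary licenses a comparison of senses.

    Assumption: a count above one is used as a crude proxy for
    'the dictionary distinguishes the senses'.
    """
    if len(primary.numbered_senses) > 1:
        return "compare senses within " + primary.dictionary
    if len(secondary.numbered_senses) > 1:
        # The primary dictionary conflates, the second one does not.
        return "compare senses within " + secondary.dictionary
    # Both dictionaries conflate: no basis for a metaphor analysis.
    return "treat as conflated: no basis for metaphorical use"


macmillan_bet = DictionaryEntry(
    "Macmillan",
    ["to risk an amount of money by saying what you think will happen"],
)
longman_bet = DictionaryEntry(
    "Longman",
    [
        "to risk money on the result of a race, game, competition, or other future event",
        "I bet, I'll bet (spoken)",
    ],
)

decision = second_opinion(macmillan_bet, longman_bet)
print(decision)
```

The sketch only encodes which dictionary is consulted and when; whether two senses are sufficiently distinct and comparable remains an analyst's decision.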
Sperber and Wilson (2008) would refer to this kind of phenomenon as ‘loose talk’, with ‘money betting’ as


the strict sense of the word bet and ‘having a strong hope’ as the more general sense. It would be an example of ‘category extension’, which “involves extending a word with a relatively precise sense to a range of items that clearly fall outside its linguistically-specified denotation, but that share some contextually relevant properties with items inside the denotation” (2008: 91). It must be stressed, therefore, that the second dictionary is not used as a means for obstinate analysts to enforce their own intuitions; rather, it is used to investigate recurrent patterns of conflation within Macmillan. In the present case of bet, the description in Longman indeed looks different. Longman offers as a first sense description: ‘to risk money on the result of a race, game, competition, or other future event [gamble]’. As a second separate sense description it reads: ‘I bet, I’ll bet spoken (a) used to say that you are fairly sure that something is true, something is happening etc, although you cannot prove this, (b) used to show that you understand or can imagine the situation that someone has just told you about, (c) used to show that you do not believe what someone has just told you.’ In effect, concrete and abstract ‘betting’ (typically used in conversation) are distinguished. We therefore allow ourselves to overrule the conflation of Macmillan and analyse the lexical unit as a metaphorically used word. It is important to note, however, that had Longman agreed with Macmillan, we would have treated the conflation in both dictionaries as evidence for the fact that both senses are sufficiently similar and cannot on that basis be considered metaphorically used. The opposite problem also exists. This involves cases where all distinct dictionary entries stay within the same general semantic domain and cannot therefore be compared but are closely related by means of another kind of relationship. 
An example is all right in KCV-fragment42, in which Katherine, Patrick and Stefan are negotiating the best place to spend their holidays:

Example (13) KCV-fragment42

Patrick: Lyon will be alright then the 〈name〉 might be all right but 〈name〉 right over on the Italian branch it might be all right mmm yes

Note that all right is spelt in different ways in this transcript (alright and all right). Since the Macmillan dictionary assigns both spellings the same definition (identifying the first one as generally considered incorrect) this does not affect the analysis. Moreover, all right is considered a fixed multi-word expression by the BNC and is therefore always analyzed as one unit. Misspellings that do affect the analysis, however, will always be analyzed according to their spelling in the dictionary (see Chapter 2 Paragraph 2.2). In this example we have a general contextual meaning (sense 1: ‘satisfactory or fairly pleasant, but not excellent’), which may be contrasted with a more human oriented sense (sense 4: ‘not hurt/ill’). Since both senses are, however, based in human experience (both have to do with feeling and evaluation) it seems odd to prefer the one over the other as more basic. At the


same time, both share a more general sense, also based in human experience, of a situation going well (sense 2: ‘going well or happening successfully’). Because this overarching general sense exists and the other senses do not seem distinct enough, we would rather treat this as a case of generalization without metaphor in any of its different senses. The register of casual conversation can be distinguished from the more restricted register of formal written texts in that there is plenty of room for spontaneous language use and jokes. It also differs in directly relying on interaction between different speech participants with different backgrounds. It seems harder to establish one basic level of language interpretation against which all their utterances are evaluated. This sometimes creates difficulties in the application of dictionary entries for analysis. The following extract is from a previous speech context: Dorothy (34) and her sons Tim (3) and Christopher (5) are cleaning out a fishbowl. Suddenly, Dorothy invites the fish to take part in the conversation:

Example (14) KBW-fragment01

Dorothy: Are we going to do these fish? Tim: I’ll go and use the bucket. Dorothy: Well I’ll come and help you Tim cos we don’t want it falling all over the place. 〈pause〉 Whoops. Careful careful. Oh that’s the empty one. That’s alright. Here’s the full one. Tim: And and this is the empty one. Dorothy: Ooh! Tim: Now, this is. Have you cleaned up? Dorothy: 〈laughs〉 〈pause〉 Hallo Bertie. How are you? 〈unclear〉 Christopher: Mum, do fish like people? Dorothy: Do fishes like people? Don’t know, what do you think? Tim: Mummy Dorothy: Hallo Bertie Edward. Do you like us?

In this example, analysts had difficulties handling the italicized words you and like. The main questions to be asked here are: should fish and people be compared and contrasted, which may result in metaphorical use of both you and like? Or are the domains of fish and people similar enough to make them non-metaphorical? A related question is: metaphorical for whom? Should the text be analyzed relative to speech community norms (the way language is generally used) or relative to individual background knowledge (a child may perhaps be more/less aware of the human-animal distinction) (Cameron 1999: 114–115). The Macmillan dictionary offers the following definitions for you: (1) ‘used for referring to the person or people that you are talking or writing to’ and (2) ‘used for referring to people in general’ (italics added). In the case of you the Macmillan


specifically refers to ‘people’. A second opinion from the Longman dictionary leads to the same result. Judging strictly from the dictionary definitions it could be argued that this is a case of novel metaphor (metaphor that is not yet mentioned in the dictionary) in which the fish is personified. On the other hand it seems reasonable for a dictionary not to specify all the living creatures a pronoun may refer to and to simply mention people as the most salient referent. This is a general pattern found in the Macmillan and Longman dictionaries, also for verbs of motion. In the case of like Macmillan offers the following definitions: (1) ‘to enjoy doing something or to feel that someone or something is pleasant or attractive’ and (2) ‘to prefer to do something in a particular way or prefer to have something done in a particular way’. Contrary to you, in this case no specific actor is mentioned. Again, a second opinion from Longman leads to the same result. Judging by the dictionaries, like could therefore simply be non-metaphorical and applicable to both human beings and animals alike. This interpretation seems to be supported by Christopher’s question Mum, do fish like people? in which he treats the fish as similar to a human being. The description of like in the dictionary, however, seems emotionally loaded and typically human (see enjoy and prefer). Once more, the question is whether animals and human beings can be contrasted and compared. In other words, do animals share human properties or not? In sum, word play, and, more specifically, the fine line between human and animal activity, may challenge the analyst to make difficult decisions for which the dictionary cannot help. Since both interpretations seem equally defensible, the decision here relies on other arguments. Which application of the dictionary definition is adopted should be clearly noted. In our research, we decided to annotate both you and like as personification of the fish, and hence metaphor. 
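Because such decisions need to be noted and ambiguous cases need to remain retraceable, the coding can be kept as simple records. The following is a minimal sketch under stated assumptions: WIDLII and DFMA are the codes used in this chapter, whereas the labels `METAPHOR` and `NOT_METAPHOR`, the class `Annotation` and the function `summarize` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """One coded lexical unit (hypothetical record layout)."""
    unit: str
    fragment: str
    code: str       # "METAPHOR", "NOT_METAPHOR", "WIDLII" or "DFMA"
    note: str = ""  # records which interpretation was adopted and why


def summarize(annotations: List[Annotation], include_widlii: bool = True) -> Dict[str, int]:
    """Count units, leaving DFMA units out of the total word count."""
    counted = [a for a in annotations if a.code != "DFMA"]
    accepted = {"METAPHOR", "WIDLII"} if include_widlii else {"METAPHOR"}
    related = [a for a in counted if a.code in accepted]
    return {
        "units_counted": len(counted),
        "metaphor_related": len(related),
        "discarded": len(annotations) - len(counted),
    }


sample = [
    Annotation("see", "KB7-fragment57", "NOT_METAPHOR"),
    Annotation("that", "KB7-fragment57", "WIDLII", "final 'that': spatial or discourse reference"),
    Annotation("depres", "KBD-fragment21", "DFMA", "truncated word"),
    Annotation("cheek", "KNR-fragment01", "DFMA", "context unintelligible"),
    Annotation("bet", "KBC-fragment13", "METAPHOR", "second dictionary separates the senses"),
    Annotation("you", "KBW-fragment01", "WIDLII", "fish addressed as a person"),
    Annotation("like", "KBW-fragment01", "WIDLII", "human-oriented sense applied to fish"),
]

with_ambiguous = summarize(sample, include_widlii=True)
without_ambiguous = summarize(sample, include_widlii=False)
print(with_ambiguous)
print(without_ambiguous)
```

Keeping the ambiguous cases under their own code means that the same set of records can be analysed with or without them, which is the choice left open to analysts for the final analysis.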
Our criterion for metaphor analysis across registers in general is the accepted norms for general language users. This does not take into account specific interpretations for individual users or particular groups of language users, like children. Children, and their parents interacting with them, often create metaphors involving, for example, the personification of animals and trees. As argued by MacKay (1986), “[p]ersonifications enable children to use familiar (person) concepts to understand other, less familiar (nonperson) concepts—the basic purpose of metaphors” (1986: 96). The created metaphors may perhaps be non-metaphorical to the young language user, but they may come across as potentially metaphorical to the general language user, which is why they should be retained as metaphor-related words. Since the sense description in the dictionary poses questions as to its generalizability (to both human and other animate beings) we annotated the words as metaphorically used on the basis of


the principle of WIDLII, our group of ambiguous cases that can be collected as potentially metaphorical.

4.3.3 Problems with comparing contextual and basic sense – metonymy

One of the inevitable problems for metaphor analysis involves its relationship with metonymy. In cognitive linguistics metaphor is a mapping across two similar but contrastive conceptual domains, whereas metonymic mappings remain within one conceptual domain. Although this seems to be a clear distinction, many different theories exist concerning their relationship. According to Radden (2002), for example, metonymy and metaphor are located along a so-called “literalness-metonymy-metaphor continuum” (2002: 409). This is elaborated on by Geeraerts (2002), who argues that if such a continuum exists there will also be in-between cases (2002: 454). A recent discussion of metonymy and metaphor proposes the idea that in fact, both metaphor and metonymy are distinct parameters which can be present at the same time:

Similarity and contiguity are in fact two independent scales. […] In some cases, the degree of contiguity between the domains is more prominent and relevant than the degree of similarity; these are clear cases of metonymy (e.g. Feyaerts 2000). In other cases, the reverse situation obtains, similarity between the two domains or spaces or categories or concepts being more salient than degree of contiguity (e.g. Warren 2002). There are also cases where the values of the two parameters are approximately equal, so that it is possible to see a conceptual or semantic relation as either or both metaphoric and metonymic. (Steen 2007: 59)

Within conversation there are many cases where both the degree of contiguity and the degree of similarity are simultaneously present, often with a seemingly stronger tendency towards metonymy. Examples include phrasal verbs, delexicalized verbs and ‘creative uses of language’ such as idiomatic expressions and proverbs (Carter 2004). The highly metonymic nature of these expressions can sometimes lead to difficulties in contrasting contextual and basic senses. The following paragraphs will discuss these expressions in more detail. As a rule, phrasal verbs are treated by MIPVU as single lexical units that are similar to multi-word units (see Chapter 2 Paragraph 2.2). They typically have non-decomposable meanings designating one action, process, state or relation in the referential dimension of the discourse. As the Pragglejaz Group admits, however, “difficulties arise for less clear-cut cases, where the meaning of the phrasal verb is more transparently related to its components” (2007: 26). An example of this is the phrasal verb come on in the continuation of KBW-fragment01:


Example (15) KBW-fragment01

Dorothy: Now I think we’ll, that needs washing out doesn’t it, that bridge? It’s pretty 〈pause〉 mucky. Right, put the dirty water in this bucket then with the bridge. Christopher: I want that please. Dorothy: What the jug? Christopher: Mm. Dorothy: I think you’d be better with a cup myself. Christopher: No I want the Dorothy: No come on cos he might go and get the fish in the jug and then we’ll be a mess.

Come on is analyzed as one lexical unit following the general rule in MIPVU concerning phrasal verbs. The most concrete and therefore basic sense of the phrasal verb come on is meaning 2: ‘to arrive on a stage’. As candidates for the contextual sense there are two possibilities, both mentioned as characteristic of spoken language: either ‘used for telling someone to hurry’ or ‘used for encouraging someone to do something such as make a greater effort or stop being sad’. Dorothy may urge Christopher to quickly move along, but she may also try to make him stop whining for the jug. Whereas the first candidate for contextual meaning may still involve moving from one place to another and can be seen as an extension of the basic sense (metonymy), the second interpretation presents a metaphorical relationship through which physical and emotional ‘movement’ are contrasted and compared. Which interpretation to choose is no easy matter and may lead to disagreement between analysts. The literal interpretation is highly prominent, since the speech participants find themselves together in the same space, as their bodily presence attests. Because we are in doubt about the contextual meaning and one of the two interpretations does lead to a metaphorical interpretation, we include the lexical unit as a possibly metaphor-related word by marking it as an ambiguous case (WIDLII). Note again that this provides a way to systematically collect less clear-cut cases. It also makes them easily retraceable for analysis in order to get a better sense of the problem at hand. Besides phrasal verbs, casual conversation contains many idiomatic expressions based on a metonymic derivation. These idioms can serve different goals in conversation. Moon (1998) presents a text-based account of English fixed expressions and idioms based on an analysis of the Oxford Hector pilot Corpus and the Bank of English. She distinguishes their informational (e.g. wide awake), evaluative (e.g. 
the icing on the cake), situational (good morning, put a sock in it), modalizing (e.g. in the short run) and organizational (e.g. by the way) functions, acknowledging that most idiomatic expressions serve more than one purpose at the same time. Wray (2002) highlights their function for spoken interaction as a “reduction of the speaker’s processing effort, the manipulation of the hearer (including the


hearer’s perception of the speaker’s identity), and the marking of discourse structure” (Wray 2002: 101). Drew and Holt identify the use of idiomatic expressions to summarize previous topics and convey personal stance and to indicate topic boundaries in telephone conversations (1988; 1998). All agree that physical, social and interactional context influence the kinds of idiomatic expressions used.

Idiomatic expressions appear in different forms and have different degrees of transparency and fixedness. Moreover, idiomatic expressions typically consist of clauses or even sentences. For these reasons, analysts need to decide whether they will code the idioms as one lexical unit or as a stretch of text with decomposable components that can all be annotated for metaphorical use. According to Sinclair’s idiom principle (1987; 1991) idiomatic expressions are single choices made on the part of the speaker that restrict the range of possible choices existing within an open choice model. In other words, they are fixed phrases simplifying the production of language. Both Moon and Wray, however, acknowledge the difficulty of establishing whether idioms are composed of active or inactive components for language users; in other words, whether or not idioms are completely conventionalized and lexicalized. Wray therefore sees idiomatic expressions as ‘fluid’ rather than ‘set’ (2002: 57). “[B]ecause most, if not all, idioms are decomposable to some extent for speakers”, the Pragglejaz group (2007: 27) decided to treat each component of an idiom as a separate lexical item. MIPVU adopts the same approach for the same reasons. However, the different gradations of decomposability and transparency do influence metaphor analysis. An example of a highly transparent conventional expression is the following, taken from KB7-fragment10, in which Ann, Jill and Stuart inspect a house they may be interested in buying and talk to the current owners about their relationship with the neighbours:

Example (16) KB7-fragment10

Unknown: But they are very very friendly. Very friendly they are. Other unknown: Mm Unknown: I suppose some people might say to them oh you’re being nosy but then that doesn’t bother me because you don’t tell them what you don’t want them to know. Jill: No. Stuart: That’s right. Jill: But it’s nice because if ever you’re poorly then it’s in your welfare isn’t it? Unknown: That’s right yeah Jill: They keep an eye Unknown: They do tend to keep an eye on me anyway. Jill: Neighbourhood watch 〈laughs〉.


The expression keep an eye on can be found in the Macmillan dictionary as a fixed phrase under ‘eye’. It is defined as: ‘to look after someone or something’. Similar to the phrasal verb come on, the metonymic base of the expression keep an eye on is rather prominent: people care for others by closely examining what they are doing. This metonymic interpretation is reinforced by there being a body part included in the expression. And, once again, the on-line context of face-to-face conversations makes the speakers themselves bodily present. Moon (1998) describes the ambiguity of body language idioms for similar expressions (e.g. take a deep breath) where “the literal action implies the metaphorical meaning” (1998: 184). As a result literal and metaphorical interpretations are hard to separate. On a similar note, Cameron (2007) points out the notion of ‘symbolic literalization’ for words that may seem literal (or not metaphorically used), but have through discourse dynamics actually acquired a more symbolic, metaphorical function (2007: 208–211; examples include expressions such as sit down and talk). Context should help in deciding whether the expression is metaphorically used or only metonymically. We are in agreement with Goossens (2002): the main point here is that underlying the metaphor there is an awareness that the donor domain [or source domain] and the target domain can be joined together naturally in one complex scene, in which case they produce a metonymy, of course. The actual contexts into which these items fit will be decisive for the interpretation as either a metonymy or a metaphor from metonymy, with, of course, a fuzzy area where it is difficult to decide which of the two is the more relevant interpretation (2002: 366–367).

The contextual meaning, ‘to look after someone or something’ does not necessarily imply that literal watching is involved. This literal meaning, however, is basic to the metaphorical meaning. For this expression, MIPVU therefore annotates all the significant words (keep, eye, on) that may or may not be metaphorically used as potentially metaphorically used and marks them as an ambiguous case (WIDLII). This solution also applies to the expression there we are in the previously seen KBW-fragment01, cleaning out a fishbowl:

Example (17) KBW-fragment01

Dorothy: You want to have your sleeves rolled up?
Tim: Yeah er 〈unclear〉 got this shirt on.
Dorothy: Yeah well we’ll just undo the cuff.
Tim: Yeah.
Dorothy: And roll it up. 〈pause〉 And there we are. And this one.
Tim: Is 〈unclear〉 doing it?
Dorothy: Well that’s 〈pause〉 we need Christopher to do it as well don’t we?
Tim: Yeah.
Dorothy: Now whatever you do don’t drip it all over me stereo.


Once more, the metonymic relation between the contextual sense and the spatial environment causes problems for analysis. The contextual sense of the expression there we are in the Macmillan dictionary is included as a fixed phrase under there: ‘used when giving something to someone or when you have done something for someone’. In this context there can either refer to a non-metaphorical ‘place’ (sense description 2), namely the current position of the rolled-up sleeve, or to ‘what has just happened’ (sense description 6), in this case the event of rolling up the sleeve. Although the context, in which both speakers are bodily present, highly activates a non-metaphorical interpretation, the ‘symbolic’ interpretation is equally valid. Our procedure marks there as an ambiguous case (WIDLII). Note that the combination of delexicalized verb plus event, have your sleeves rolled up, presents a similar problem. In this case it is clear, however, that the ‘having’ refers to an action. In other words, the action of rolling up the sleeves is turned into a physical possession and have is therefore unambiguously metaphorical. An example of an idiomatic expression where the metaphoric and metonymic values are more equally present is the following, taken from KBH-fragment04. In this extract shop owner Pauline (30+) is complaining about the work ethics of her employees. Adam (36) and another unknown friend try to find a solution:

Example (18) KBH-fragment04

Pauline: He’s got the gall to phone up on Tuesday and say Eileen can’t do a day, we’re going to be short of Eileen
Adam: So the answer is to just phone them
Pauline: I can’t do it
Unknown: I’m not doing it.
Adam: What’s the point of carrying on like that?
Pauline: Well he’s going to carry on like that isn’t he? Not unless someone puts their foot down.

The contextual sense of put your foot down is included in Macmillan under foot, meaning: ‘a) to refuse very firmly to do or accept something’ and ‘b) British to drive much faster’. In this case the metonymic derivation from ‘refusing to move your feet’ towards ‘refusing to accept something’ is, again, rather transparent. However, the actual metonymic act of ‘remaining where you are’ does not have to be acted out literally for the idiom to be applicable. Although the context, in which the speakers are bodily present, highly activates the non-metaphorical basis of the expression, a symbolic interpretation is inevitable. In other words, the metaphorical or ‘symbolic’ parameter has become more salient than the metonymic parameter. In this case, then, the annotation decision is easily made and all the significant lexical units (someone put foot down) are coded as metaphorical words. One final example of an idiomatic expression that complicates annotation is the following from KB7-fragment10. In this extract Jill, Stuart and an unknown


speaker have inspected some houses they may be interested in and are reporting their evaluations:

Example (19) KB7-fragment10

Unknown: You’d got the toilet there 〈pause〉 and behind the door, I you had to sort of squeeze yourself and shut the door, and behind the door was a shower.
Jill: You’re joking.
Stuart: 〈unclear〉
Unknown: No shower curtain mind you.
Jill: Even so no room to swing a cat
Unknown: Terrible

The Macmillan dictionary defines not enough room to swing a cat as ‘used for saying that a room is very small and there is not enough space to live comfortably in it’. The basic meaning seems less clear-cut. It could be literally applicable if you like throwing domestic animals. The original meaning, however, apparently refers to a sailor’s practice of swinging the so-called cat o’ nine tails, a legal instrument of punishment in the navy. Since the dictionary does not provide the origins of such a fixed expression, it is difficult to decide whether, to the contemporary language user, the expression could count as metaphorically or non-metaphorically used. We are dealing with “a loss of the metonymic basis as a result of the conventionalisation process” (Goossens 2002: 374). Moon (1998: 198) ranges this class under fixed expressions that involve exaggerations and implausibility: they may describe theoretically possible situations, but are untrue in the specific context. Therefore, they lean more towards a metaphoric interpretation. Lakoff and Turner (1989) use the metaphor ‘generic is specific’ to make sense of proverbs that describe particular situations to understand a greater and more general category of situations. Because of its semi-proverbial and therefore potentially metaphorical nature, however, we are coding all the content words within the expression as metaphorically used at a linguistic level. All of the above instances are metaphorical, but highly metonymic at the same time. This metonymic quality is even more prominent because of the situational and spatial context in which the words are uttered. The on-going, situated and multimodal nature of conversation emphasizes and underlines the bodily and concrete metonymic basis of many expressions, contributing to the idiom’s transparency. As a result, it is sometimes difficult to judge whether metonymy and metaphor are equally present or whether the basis for metaphor is simply too weak. 
Whenever a metaphoric interpretation seems plausible, MIPVU includes the lexical units as metaphorically used, either as clear-cut or borderline cases (WIDLII).
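
The coding decisions reached for the four idioms discussed above can be collected in a small sketch. The names `Code`, `IDIOM_ANNOTATIONS` and `words_with_code` are hypothetical and not part of MIPVU; the selection of words per idiom follows our reading of the discussion above (for put your foot down only the three idiom words are listed), and the sketch is meant only to show how clear-cut and borderline codes can sit side by side in one annotation layer.

```python
from enum import Enum


class Code(Enum):
    """Illustrative labels; only WIDLII is an actual MIPVU code name."""
    METAPHOR = "metaphor"  # clear-cut metaphorical use
    WIDLII = "WIDLII"      # ambiguous (borderline) case


# Idiom -> {lexical unit: code}, following the decisions described in the text.
IDIOM_ANNOTATIONS = {
    "keep an eye on": {"keep": Code.WIDLII, "eye": Code.WIDLII, "on": Code.WIDLII},
    "there we are": {"there": Code.WIDLII},
    "put your foot down": {"put": Code.METAPHOR, "foot": Code.METAPHOR, "down": Code.METAPHOR},
    "no room to swing a cat": {"room": Code.METAPHOR, "swing": Code.METAPHOR, "cat": Code.METAPHOR},
}


def words_with_code(code):
    """Return all annotated lexical units carrying the given code, sorted alphabetically."""
    return sorted(
        word
        for annotation in IDIOM_ANNOTATIONS.values()
        for word, assigned in annotation.items()
        if assigned is code
    )


if __name__ == "__main__":
    print("Borderline:", words_with_code(Code.WIDLII))
    print("Clear-cut:", words_with_code(Code.METAPHOR))
```

Keeping the borderline cases in the same structure as the clear-cut ones means that a later quantitative analysis can include or exclude them with a single filter.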


4.4 Conclusion

When people speak many things happen at the same time; apart from simply conveying information they also show personal stance and constantly monitor the context of their conversation. Within conversation a relatively small number of words are metaphorically used. Examples of these are nicknames (pet), cases of vague language use (such as thing and stuff), and demonstratives (that). The latter two typically serve to turn discourse into space, making topics tangible. At the same time, they serve the interactive purpose of quickly referring to a topic without providing the listener with superfluous information. Problems for metaphor analysis are often a result of a lack of contextual knowledge. Utterances are either unclear, unfinished or aborted and the analyst is left to fill out the remainder (in our case, without any access to the original recordings). Since this often relies too heavily on guesswork MIPVU includes ambiguous cases (which could arguably be used metaphorically) for analysis and discards those cases where lack of context does not allow for interpretation. These are assigned the codes WIDLII and DFMA respectively. Another problem may arise from the use of the dictionary. At times Macmillan conflates senses which may in fact be used as two distinct ones. In these cases we turn to Longman for a second opinion. The use of the dictionary may also prove difficult in more specific cases of language use, as when humorous utterances compare animals to people (recall the mother and child talking to a fish) and both a metaphorical and a non-metaphorical use are equally justifiable. In these instances, analysts should steer one clear course, always explicitly noting their decision. MIPVU allows for inclusion of these possibly metaphorical words by annotating them as ambiguous cases (WIDLII). The biggest problems for metaphor analysis turn up, however, when the distinction between metaphor and metonymy is involved. 
This happens, for example, with phrasal verbs, but also in more creative language use such as idiomatic expressions. Since metaphor and metonymy are often present at the same time, it is sometimes impossible to distinguish the one from the other. This metonymic nature of many expressions is foregrounded by the speech participants’ physical presence. Deciding whether or not an expression is metaphoric enough to be coded is not always straightforward. In these cases, MIPVU again allows for ambiguous cases to be included for analysis by using the code WIDLII. In this way, MIPVU enables annotation for quantitative research. After such a quantitative linguistic analysis, qualitative linguistic analysis should reveal whether all uses of metaphor are the same, and whether more specific conditions may be identified in which metaphors occur. Corpus research has shown


that metaphors in spoken discourse often appear after so-called ‘tuning devices’ (Cameron & Deignan 2003) or hedges (Poos & Simpson 2002) such as sort of, kind of and like. Cameron points out that metaphors in classroom talk tend to cluster (Cameron 2003). Drew and Holt established the use of idiomatic expressions to manage discourse (1988, 1998). Cameron (2008a, b) identified shifts, redeployment and rephrasing of metaphor. Moreover, the kind of environment speakers find themselves in (a classroom or a more informal conversation), the topic of the conversation, the number of speakers partaking, and the relationship between speakers are all factors that may contribute to a different pattern of metaphor use within one register. Finally, it should be interesting to draw a more complete picture of metaphor in casual conversation by including other modalities, such as gesture, intonation, and the surrounding context for analysis. As pointed out, the lack of such a major part of the context is one of the reasons why transcripts can be difficult to understand. Moreover, metaphor created through the interaction between language and gesture or gesture alone (Cienki & Müller 2008) may increase the number of metaphors found in casual conversation. A similar argument may be made for tone of voice and intonation. Starting from the text alone, however, does allow a close examination of the linguistic component of a message, enabling a focus on one mode at a time and thereby highlighting the manifestation of metaphor in language.

chapter 5

Metaphor identification in fiction

5.1 Introduction

The study of metaphor in literature has a long and rich tradition, going all the way back to classical antiquity, when Aristotle claimed in The Poetics that:

[i]t is a great thing, indeed, to make a proper use of these poetical forms, as also of compounds and strange words. But the greatest thing by far is to be a master of metaphor. It is the one thing that cannot be learnt from others; and it is also a sign of genius, since a good metaphor implies an intuitive perception of the similarity in dissimilars. (Poetics 22, c. 335 BCE)

Aristotle makes a distinction between ‘ordinary words’ and ‘unfamiliar terms’, the latter group consisting of ‘strange words, metaphors, lengthened forms, and everything that deviates from the ordinary modes of speech’ (emphasis added). Though Aristotle encourages the use of metaphors and other unfamiliar terms because they will ‘save the language from seeming mean and prosaic’, he warns that a statement consisting only of metaphors will be a ‘riddle’, one that consists only of strange words, a ‘barbarism’. The notion of ‘deviance’ remains central to literary scholars working within the Formalist tradition, such as Mukařovský (1970), Nowottny (1965), Leech (1969; 2008) and Short (1996). In this tradition, metaphors are considered to be a form of linguistic deviation at the semantic level that is used to create foregrounding effects. Leech (2008) stresses that these deviations in literature are ‘unique’ and ‘meaningful’ rather than ‘unmotivated aberrations’ (2008: 16). He argues that the use of such deliberate forms of linguistic foregrounding defamiliarize our experience, since ‘by the standards of the accepted code (i.e. “literal meaning”) a literary metaphor is a semantic absurdity’ (2008: 21). This point is also made by Short (1996), who claims that metaphors become nonsensical and illogical when taken literally. Mukařovský (1970) points out that poetic language is characterized by the systematicity and consistency of its foregrounding effects; Leech (2008) notes that it is primarily the number and importance of the ‘deviant features’ that characterize literature. Tsur (1987) argues that metaphorical expressions exploit semantic features to create literary effects, and distinguishes between metaphors with a


‘split’ or ‘integrated’ focus: a split focus occurs when ‘attention is focussed on the incongruence between meanings or relationships’, an integrated focus when ‘attention is focussed on their concordant aspects’ (1987: 7). He claims that metaphors with a split focus are typically perceived as ironical or witty, while metaphors with an integrated focus tend to be perceived as emotional or even elevated (1987: 7). What all of the above scholars have in common is that they consider the metaphors found in literature to be superior to the metaphors outside literature; metaphors found in ordinary language are deemed lesser derivations of the unique and creative metaphors in poetic language. As Leech (2008) puts it: ‘[i]n making choices which are not permissible in terms of the accepted code, the poet extends, or transcends, the normal communicative resources of the tongue’ (2008: 30). The same notion of creative genius is present in Lakoff and Turner’s (1989) More than cool reason. Lakoff and Turner argue that poets create novel and original metaphorical expressions from the same conceptual metaphors that underlie conventional metaphorical expressions found in everyday language. They discuss four techniques – extending, elaborating, questioning, and composing – that are frequently employed by poets and conclude that poetic metaphor is more interesting than conventional metaphor, since ‘poets lead us beyond the bounds of ordinary modes of thought and guide us beyond the automatic and unconscious everyday use of metaphor’ (1989: 72). Yet contrary to the scholars discussed above, Lakoff and Turner are part of the cognitive metaphor tradition, which argues that metaphors are not in fact deviant and decorative but an indispensable tool in both language and thought. In Metaphors we live by (1980) Lakoff and Johnson claim that ‘[o]ur ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature’ (1980: 3). 
They show that metaphors are ubiquitous even in ordinary language and argue that conventional metaphorical expressions such as ‘Your claims are indefensible’ and ‘He attacked every weak point in my argument’ (1980: 4) are linguistic realizations of the underlying conceptual metaphor ARGUMENT IS WAR. According to Lakoff and Johnson, such conceptual metaphors structure our perception and behaviour and form an essential part of our culture. This approach therefore sees the metaphors in everyday language as primary and the metaphors in literature as creative and novel exploitations of the same underlying conceptual structures. Semino and Steen (2008) stress that even though these two approaches, which they classify as the discontinuity and the continuity approach (2008: 233–238), are hard to reconcile, each of them provides useful insights into our understanding of metaphor in literature, and we need to take into account ‘both the unique characteristics of particular uses in context, and the way in which particular uses relate to conventional patterns that may reflect shared cognitive structures and processes’ (2008: 244). They point out that studies of metaphor in literature have mostly been


idiographic, relating to specific texts, authors or genres. They also emphasize that we should distinguish between the use, function and effect of metaphor (see also Steen 1994; Steen 2007). People may expect literature to contain more metaphors due to its aesthetic function, but this expectation need not correspond to actual usage. It may be that people are more aware of the metaphors in literature, or that the metaphors found in literature are more prominent or noticeable than those in other genres and registers. For example, studies carried out by Steen (1994) on journalistic versus literary metaphors revealed that the literary metaphors were considered more difficult, positively valued, impolite and unbiased (1994: 202). As pointed out by Semino and Steen (2008), metaphors in literature may be different from those outside literature because of their properties and distribution, because of the way they are treated by authors and/or readers, or because of an interaction between these two parameters (2008: 243). The current chapter will focus on the identification of metaphorically used words in contemporary fiction. All texts were taken from the BNC-Baby corpus. Here follows a list of the selected texts:

BMW: Folly’s child. Tanner, Janet. Century Hutchinson, London (1991).
H9C: The prince of darkness. Doherty, P C. Headline Book Publishing plc, London (1992).
J54: The divided house. Raymond, Mary. F A Thorpe (Publishing) Ltd, UK (1985).
CFY: My beloved son. Cookson, Catherine. Corgi Books, London (1992).
CCW: Crackdown. Cornwell, Bernard. Michael Joseph Ltd, London (1990).
CDB: A fatal inversion. Vine, Barbara. Viking, London (1987).
AC2: Man at the sharp end. Kilby, M. The Book Guild Ltd, Lewes, East Sussex (1991).
CB5: Ruth Appleby. Rhodes, Elvi. Corgi Books, London (1992).
AB9: Death of a partner. Neel, Janet. Constable & Company Ltd, London (1991).
FPB: Crimson. Conran, S. Penguin Group, London (1992).
BPA: The titron madness. Bedford, John. Dales Large Print, Long Preston, N. Yorks (1984).
C8T: Devices and desires. James, P D. Faber & Faber Ltd, London (1989).
FAJ: Masai dreaming. Cartwright, J. Macmillan Publishers Ltd, Basingstoke (1993).
FET: Still life. Byatt, A S. Penguin Group, London (1988).
G0L: The Lucy ghosts. Shah, Eddy. Corgi Books, London (1993).

The selected fiction excerpts were taken from novels that were classified by the BNC as belonging to the “imaginative” domain. Though there is by no means a consensus on the definitions of ‘literature’, ‘novel’, ‘narrative’ and ‘realism’ within literary and linguistic research, these theoretical issues are beyond the scope of the


present chapter. What is more important is that our fiction fragments were not selected because of the metaphors they contain. As mentioned above, metaphors in literature have mostly been studied in specific texts, by specific authors or in specific genres that were selected precisely because they use metaphors in an interesting or unusual way. This does not, however, entail that such uses of metaphor are representative of metaphor in literature or fiction in general. Since the fiction texts in our corpus were selected blindly and randomly, we did not have any presuppositions about the kind of metaphors we would find. This approach will therefore hopefully allow us to see both what is typical of specific texts and what most or all of them have in common. The results of the reliability test (reported in detail in Chapter 8) corresponded to our experience during the annotation and discussion process, namely that the application of MIPVU to fiction texts is generally unproblematic. The mean unanimous agreement for fiction was 92.7%. The mean percentage of lexical units that were unanimously coded as not related to metaphor was 83.4%. The average percentage of lexical units that were unanimously coded as related to metaphor was 9.4%. Those problems that did arise were mostly related to either general difficulties in applying MIPVU to natural data or to specific characteristics of the fiction texts. The fiction texts in our corpus contain fewer metaphorically used words than academic discourse and news texts but more than spontaneous conversations. One interesting issue for further analyses from a discourse-analytical perspective will therefore be to see where the metaphors in fiction occur, and whether fiction is situated between news and conversations due to the fact that it often contains a mixture of narrative and dialogue. When direct speech is presented in novels, authors may try to mimic real-life conversations as much as possible. 
This would entail that the more dialogue a text contains, the more it may resemble conversations rather than narrative prose. Fiction can of course also mimic other registers and genres, such as news reports, diaries, letters, and so on; it is therefore essential that analysts take into account the nature of the excerpts that have been included in the analyses. In the current chapter examples will be used to illustrate issues relating to the identification of metaphor in fiction.

5.2 Straightforward application of MIPVU

The following example illustrates the straightforward application of MIPVU in fiction; italics have been added for the reader’s convenience.

(1) “What do you know about it?” Jenny asked. “You’ve never been in love. I know Matthew doesn’t want to get married – he once said marriage was a trap – but I know he loves me.” (J54-fragment08)


In this example, three lexical units have been analysed as metaphorically used, namely about, in and trap. Applying MIPVU to these three words yields the following results on the basis of Macmillan; these findings correspond with the application of MIP’s step 3a (For each lexical unit in the text, establish its meaning in context), step 3b (For each lexical unit, determine if it has a more basic contemporary meaning in other contexts than the one in the given context), and step 3c (If the lexical unit has a more basic current/contemporary meaning in other contexts than the given context, decide whether the contextual meaning contrasts with the basic meaning but can be understood in comparison with it):

ABOUT (preposition)
a. Contextual meaning: ‘concerning a particular subject’ (Macmillan sense 1)
b. Basic meaning: ‘used for showing movement’ (Macmillan sense 3) and ‘used for saying where someone/something is’ (Macmillan sense 4)
c. Contrast: Yes, the basic meaning is concrete and concerns physical motion and location while the contextual meaning is abstract and concerns discourse and knowledge.
   Comparison: Yes, we can understand discourse in terms of space.

IN (preposition)
a. Contextual meaning: ‘used for describing a particular state, situation, or relationship’ (Macmillan sense 7)
b. Basic meaning: ‘used for showing where someone or something is; inside a container, room, building, vehicle etc.’ (Macmillan sense 1)
c. Contrast: Yes, the basic meaning involves a concrete, physical location whereas the contextual meaning involves an abstract situation or state.
   Comparison: Yes, we can understand being in a particular state or situation in terms of being in a concrete location.

TRAP (noun)
a. Contextual meaning: ‘a bad or unpleasant situation that is difficult to change or escape from’ (Macmillan sense 2)
b. Basic meaning: ‘a piece of equipment used for catching animals’ (Macmillan sense 1)
c. Contrast: Yes, the basic meaning involves a concrete object whereas the contextual meaning concerns an abstract situation.
   Comparison: Yes, we can understand being in an unpleasant situation in terms of being caught in a concrete trap.
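
The three analyses above share one shape, which can be made explicit in a minimal sketch. The class `MipAnalysis` and its field names are hypothetical, introduced here only for illustration; the judgements for contrast and comparison remain analyst decisions that are entered by hand, not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MipAnalysis:
    """One lexical unit analysed with MIP steps 3a-3c (illustrative structure)."""
    lexical_unit: str
    contextual_meaning: str  # step 3a: meaning in context
    basic_meaning: str       # step 3b: more basic contemporary meaning
    contrast: bool           # step 3c: do the two meanings contrast? (analyst's judgement)
    comparison: bool         # step 3c: can one be understood in terms of the other?

    def is_metaphorically_used(self) -> bool:
        # A unit counts as metaphorically used only if both parts of step 3c hold.
        return self.contrast and self.comparison


# The three analyses from Example (1), with sense descriptions abbreviated.
ANALYSES = [
    MipAnalysis("about", "concerning a particular subject",
                "used for showing movement / saying where something is", True, True),
    MipAnalysis("in", "used for describing a particular state, situation, or relationship",
                "used for showing where someone or something is", True, True),
    MipAnalysis("trap", "a bad or unpleasant situation that is difficult to escape from",
                "a piece of equipment used for catching animals", True, True),
]

if __name__ == "__main__":
    for analysis in ANALYSES:
        print(analysis.lexical_unit, analysis.is_metaphorically_used())
```

The sketch records the decision without naming a target-domain equivalent or an underlying mapping, which is all the procedure requires at this stage.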

This analysis shows that for all three lexical units we can conclude that they should be marked as metaphorically used. There are, however, some noticeable differences between these three lexical units. For instance, in the case of the prepositions about and in, the target domain is not expressed. By contrast, we have a—relatively unique—classic A IS B metaphor


with both the target domain (marriage) and the source domain (trap) present in marriage was a trap. Moreover, trap could be replaced by the target-domain equivalent marriage, and about could possibly be replaced by the target-domain equivalent concerning, but finding a replacement term is more difficult for “in love”. The important thing to note, however, is that it is still possible for us to say that in is metaphorically used in this sentence without there being any need at this stage of the analysis to pinpoint an exact target-domain equivalent, or to determine the precise nature of the underlying mapping. This is precisely one of the strengths of MIP and MIPVU: even if a metaphor is so conventionalized that there simply is no other way to express its meaning, we can still show that there is a contextual meaning that is abstract and a basic meaning that is concrete and that these meanings can be contrasted and compared, which entails that the lexical unit has been used metaphorically.

5.3 Interesting issues

As was mentioned before, the identification of metaphorically used words in fiction was predominantly straightforward. Interesting issues were mainly related to three specific features of fiction, namely (1) directly expressed metaphors, (2) character descriptions, and (3) personification.

5.3.1 Directly expressed metaphors

The theoretical basis of both MIP and MIPVU is the global assumption of a one-to-one correspondence between words, concepts and referents. The words on the page evoke concepts, and these concepts in turn designate referents in the projected text world. Metaphorically used words are then identified on the basis of referential incongruity: words are marked as metaphorically used when they activate concepts that indirectly designate their presumed referents, which is then to be resolved by a comparison with the more appropriate referent. 
It should be noted that this is a technical reconstruction: as readers, we are typically not aware of this process when encountering conventional metaphors, and it is debatable whether we always understand them by accessing the corresponding source domains. An example of such a conventional metaphor is Example (1) mentioned in the previous paragraph, where in the expression ‘What do you know about it’ the word about technically relates to a concept that has to do with movement or location, which in turn does not directly designate its intended referent in the text world (which has to do with relations to abstract subjects). Yet one frequent form of metaphor in fiction is directly expressed metaphor, which occurs in the form of similes, analogies and other non-literal comparisons


(see Chapter 10 for number of occurrences in the different registers). These expressions are considered “direct” since the words on the page activate concepts that refer directly to their referents in the text world, that is, the source domain terms are used directly (“literally”) at the linguistic level. An identification procedure that is based on detecting indirect lexical meaning, such as MIP, cannot deal with such manifestations of metaphor in discourse, for in simile there is no formal incongruity. However, direct expressions of metaphor do introduce a new local source domain that has to be incorporated into the surrounding target-domain discourse (Steen 2007). This phenomenon is illustrated by the following two examples. Italics have been added to signal the simile; only the words of the simile are taken into account in our analysis here.

(2) Sara was undressed and ready for bed but Jenny was fully clothed, moving about the room in her harlequin dress like some angry restless dragonfly. (J54-fragment08)

(3) “He’s like a favourite old coat.” (J54-fragment08)

If we apply steps 3a, 3b and 3c of MIP to dragonfly and coat, we obtain the following analysis (sense descriptions have again been taken from the Macmillan dictionary).

DRAGONFLY (noun)
a. Contextual meaning: ‘an insect with a long narrow brightly coloured body and two pairs of transparent wings’ (monosemous in Macmillan)
b. Basic meaning: ‘an insect with a long narrow brightly coloured body and two pairs of transparent wings’ (monosemous in Macmillan)
c. Contrast: No. The contextual and basic meaning cannot be contrasted since they are the same.
   Comparison: No. The contextual meaning is identical to the basic meaning.

COAT (noun)
a. Contextual meaning: ‘a piece of clothing with long sleeves that you wear over your other clothes when you go outside’ (Macmillan sense 1)
b. Basic meaning: ‘a piece of clothing with long sleeves that you wear over your other clothes when you go outside’ (Macmillan sense 1)
c. Contrast: No. The contextual and basic meaning cannot be contrasted since they are the same.
   Comparison: No. The contextual meaning is identical to the basic meaning.

Both dragonfly and coat evoke concepts that directly designate their referents in the text world. This means that for both dragonfly and coat the conclusion of MIP


would be that these lexical units are not metaphorically used since there is no contrast or comparison between the contextual and basic meaning. However, it is also clear that both lexical units do set up a cross-domain comparison—dragonfly between a person and a dragonfly, coat between a person and a coat—by introducing an incongruous local referent into the discourse. The difference between these similes and the above-mentioned example of ‘What do you know about it’ is that the words dragonfly and coat themselves are not used indirectly. The cross-domain mapping occurs in conceptual structure, and is expressed directly at the level of linguistic form. Moreover, such directly expressed metaphors are often—though not necessarily—explicitly signalled by the use of words such as like, as, seem, appear, etc. (for an extensive overview see Goatly 1997). Though such lexical units are not metaphorically used themselves, they are related to metaphor. In order not to lose these directly expressed metaphors and their signals, MIPVU has extended MIP to be able to take these other manifestations of metaphor in discourse on board. In MIPVU, lexical units that directly express metaphors are coded as being Metaphor-Related Words (MRW) of the type “direct”, while indirectly expressed metaphors as identified by MIP are coded as being Metaphor-Related Words of the type “indirect”. The lexical signals that often occur together with directly expressed metaphors are also included; they are coded as being Metaphor-Related Words of the type “flag”, which indicates that the word “flags”, or signals, a metaphor. There are, however, other issues that crop up as a result of the inclusion of similes and other direct forms of metaphor. The first issue is that of scope, that is, which words should be regarded as belonging to the simile. If we take Example (2), then it is clear that the linguistic simile is like some angry restless dragonfly. 
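
The three types of Metaphor-Related Word introduced above can be sketched as a small tagging scheme. The names `MrwType`, `EXAMPLE_1_ABOUT`, `SIMILE_EXAMPLE_2` and `untagged` are hypothetical and serve only as illustration. In the sketch, only the words whose coding has been established so far carry a tag; the remaining words of the simile are deliberately left open.

```python
from enum import Enum


class MrwType(Enum):
    """Types of Metaphor-Related Word named in the text."""
    INDIRECT = "indirect"  # indirectly expressed metaphor, as identified by MIP
    DIRECT = "direct"      # source-domain word used directly, e.g. inside a simile
    FLAG = "flag"          # lexical signal of a metaphor, e.g. 'like'


# 'about' in Example (1) is an indirectly expressed metaphor.
EXAMPLE_1_ABOUT = ("about", MrwType.INDIRECT)

# The simile in Example (2); None marks words whose status is not yet decided.
SIMILE_EXAMPLE_2 = [
    ("like", MrwType.FLAG),
    ("some", None),
    ("angry", None),
    ("restless", None),
    ("dragonfly", MrwType.DIRECT),
]


def untagged(tokens):
    """Return the words that have not yet been assigned an MRW type."""
    return [word for word, mrw_type in tokens if mrw_type is None]


if __name__ == "__main__":
    print(untagged(SIMILE_EXAMPLE_2))
```
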
However, the question arises whether all of the words inside the expression do in fact belong to the source domain; that is, do angry and restless belong to the source domain of dragonfly, or do they belong to the target domain of Jenny, or possibly even to both domains simultaneously? This kind of layering of source and target domains could be argued for angry and restless in Example (2), where angry and restless may be seen to be part of the conceptual domain of humans. This projection of human behaviour and emotions onto the dragonfly would entail that angry and restless could be considered as being metaphorically used inside the simile in relation to dragonfly. Since MIP and MIPVU are primarily concerned with the linguistic level of metaphor analysis, not the conceptual level, one way to resolve this issue is by taking syntax and punctuation into account. In this case, ‘angry’ and ‘restless’ occur after the metaphor flag (‘like’) introducing the source domain as premodifiers to the most important word in the simile, i.e. ‘dragonfly’. Thus, ‘angry’ and ‘restless’ would be coded as belonging to the linguistic simile and therefore as part

Chapter 5. Metaphor identification in fiction 

of the source domain (Metaphor-Related Words of the type “direct”). Comments can then be added to the annotations that ‘angry’ and ‘restless’ are potentially metaphorical within the simile. For a more extensive discussion of how to use punctuation, grammatical function, context and situation models to overcome difficulties in deciding whether individual words are part of similes, see Kaal & Dorst (in press).

Another issue regarding the analysis of similes is the question whether we are in fact dealing with a metaphorical rather than a literal comparison, since the word like can also indicate literal similarity. This has often been noted, for instance by Searle (1993), who speaks of ‘literal similes’ and ‘metaphorical similes’, and by Croft and Cruse (2004), who make a distinction between ‘similes proper’ and ‘statements of similarity’. The underlying problem is whether the domains that are being compared are in fact distinct enough to allow for classification as a mapping between domains. It is not always easy to draw the line between literal similarity and metaphorical similarity, especially when physical appearance or other sensory influences are involved. This is illustrated by the following example taken from one of the reliability tests (italics have been added to the relevant part).

(4) “Jenny, I don’t want to sound like an old auntie, but you are not being very sensible about Matthew.” (J54-fragment08)

During reliability testing, two analysts argued this was a literal comparison and two said it was a metaphorical comparison. Emphasis on the commonality between the speaker and any old auntie, that they are human, might make the comparison non-metaphorical; but focus on the different social roles of the speaker as a friend and old aunties as stereotypical givers of unwanted advice makes it possible to distinguish between two domains, where the speaker is conceptualized in another— metaphorical—role. This decision is further complicated due to the fact that the verb sound can be taken as a linking verb meaning ‘to seem good, bad, interesting, exciting etc. based on what you have heard, read, or know’ (Macmillan sense 1) or as a full verb meaning ‘to produce a sound’ (Macmillan sense 2). The question is then whether actual sound has to be involved, or whether the focus is on the content of what is being said. In any case, at the linguistic level we can conclude that these lexical units are potentially involved in a directly expressed metaphor. The exact nature of the mapping would have to be determined during subsequent conceptual analysis. At the conceptual level, this problem can potentially be resolved by turning to the idea of an “image mapping” (Lakoff & Turner, 1989), or rather a “sound mapping” in this case. Image mappings are metaphors based on comparison regarding

physical appearance, as is also illustrated by the following examples from the fiction corpus (italics added):

(5) Delaney took risks, plummeting feet first through the hatchways, and partly breaking his descent with the handrails, falling like a parachutist, rolling instantly deploying his Uzi. (BPA-fragment14)

(6) You wouldn’t have recognized him, he looked like John the Baptist. (CDB-fragment04)

We can say that Example (5) involves an image mapping in which Delaney’s falling is compared to a parachutist’s falling, with parachutist introducing an incongruous local referent; the resulting cross-domain comparison between Delaney and a parachutist provides the reader with a vivid visualization of the manner in which Delaney was falling. Example (6) can be said to involve a cross-domain comparison between the target domain of modern people and the source domain of biblical people. The result of the comparison is an image mapping that yields a visualization of the physical appearance of the character, i.e. what his hair, beard and clothes look like. By having the reader retrieve this information indirectly via a comparison with a biblical person, the exact nature of his appearance and any inferences to be drawn about the attitude to be taken towards this appearance are left implicit. The methodological question remains, however, whether such examples are concerned with literal or metaphorical external resemblance, and whether the two domains are distinct enough to be compared. In such cases we follow Cameron (2003) in saying that two distinct and ‘incongruous domains’, however weak, should be considered as expressing a cross-domain mapping. By taking on board directly expressed forms of metaphor, the boundary between linguistic and conceptual analysis becomes somewhat blurred, since lexical units involved in directly expressed metaphors do not exhibit the same clear contrast between basic and contextual senses (cf. Kaal & Dorst, in press). Nevertheless, their topical incongruity can serve as a basis for deciding whether these lexical units are related to metaphor. During linguistic metaphor analysis, such examples can be retained as potentially related to metaphor, with the exact nature of the mapping to be determined during subsequent analyses. 
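The three MRW types described in this section can be pictured as a small annotation record. The following Python sketch is only an illustration of the coding decision discussed for the simile in Example (2); the names `MRWType`, `Annotation` and `code_simile` are hypothetical and are not part of MIPVU or of any annotation tool. The decision to leave the determiner some uncoded is an assumption made purely for this sketch, since the section does not discuss it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class MRWType(Enum):
    """Types of Metaphor-Related Word named in this section."""
    INDIRECT = "indirect"  # metaphor as identified by MIP
    DIRECT = "direct"      # directly expressed metaphor, e.g. inside a simile
    FLAG = "flag"          # lexical signal of a metaphor, e.g. 'like'


@dataclass
class Annotation:
    word: str
    mrw_type: Optional[MRWType] = None  # None = not coded as related to metaphor
    comment: str = ""


def code_simile(tokens: List[str], flag: str, source_words: Iterable[str],
                potentially_metaphorical: Iterable[str] = ()) -> List[Annotation]:
    """Code the flag as 'flag' and the chosen source-domain words as 'direct'.

    Which words belong to the source domain is the analyst's decision;
    this helper (hypothetical) only records that decision.
    """
    source = set(source_words)
    uncertain = set(potentially_metaphorical)
    annotations = []
    for token in tokens:
        if token == flag:
            annotations.append(Annotation(token, MRWType.FLAG))
        elif token in source:
            comment = ("potentially metaphorical within the simile"
                       if token in uncertain else "")
            annotations.append(Annotation(token, MRWType.DIRECT, comment))
        else:
            # Assumption for this sketch only: other words stay uncoded.
            annotations.append(Annotation(token))
    return annotations


simile = code_simile(
    tokens=["like", "some", "angry", "restless", "dragonfly"],
    flag="like",
    source_words=["angry", "restless", "dragonfly"],
    potentially_metaphorical=["angry", "restless"],
)
for ann in simile:
    label = ann.mrw_type.value if ann.mrw_type else "-"
    print(f"{ann.word:10} {label:8} {ann.comment}")
```

Keeping the comment separate from the MRW type mirrors the practice described above: the linguistic coding is settled first, while the note that angry and restless may themselves be metaphorical is kept for subsequent conceptual analysis.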
5.3.2 Character descriptions

We noted above that our fiction sample contained a relatively high number of non-metaphorically used words. This may in part be caused by the nature of much fiction, which aims to describe characters, their behaviour and environment as realistically as possible. This type of fiction may therefore have a lower number of metaphorically used words since its content primarily describes concrete people, animals and objects and concrete actions and surroundings. Though news texts and academic discourse also employ a ‘realistic’ presentation style, the content in news texts is often concerned with abstract social, political and economic relations, influences and consequences, while academic texts often deal with highly abstract topics and theories. Not all realistic fiction is necessarily less metaphorical, of course, but when the main aim of a text is to describe what certain characters look like, where they are, and what they are doing, then it should not come as a surprise that many of the words used for these descriptions will refer directly to the referents in the text world.

Leech and Short (2007) mention two main features that play an essential role in the style of realistic fiction, namely verisimilitude and credibility. Verisimilitude concerns the aim of realistic fiction to give us the illusion that we are in the ‘presence of actual individual things, events, people and places’ (2007: 126), while credibility pertains to ‘the likelihood, and hence believability, of the fiction as a “potential reality”, given that we apply our expectations and inferences about the real world to fictional happenings’ (2007: 127). Lodge (1977) categorizes prose as essentially metonymic in nature and poetry as essentially metaphoric in nature, with the addition that realism is at the metonymy end of the scale while Romanticism and symbolism are at the metaphor end. With regard to realistic fiction he argues that:

[w]e would expect the writer who is working in the metonymic mode to use metaphorical devices sparingly; to make them subject to the control of context – either by elaborating literal details of the context into symbols, or by drawing analogies from a semantic field associated with the context. (1977: 113)

The above observations can be illustrated by comparing the following character descriptions from the fiction corpus:

(7) Claudia Cohn-Casson is sitting under an enormous African fig tree nearby, at a camp table spread with a Somali cloth. She has in front of her a typewriter, her recorder and her notes. A servant is preparing lunch on a fire. Claudia hears something. She looks up into the sky in the direction of Ol Doinyo Lengai. (FAJ-fragment17)

(8) Paula’s bag and shoes were patent black leather, her gloves white, and she carried a long walking umbrella neatly furled in its fur-trimmed case. Perfectly groomed from head to toe and with all that assurance, she was ready to take on the world, Arlene thought with satisfaction, for she looked on Paula as her very own creation. The raw materials might have been there before indeed, hadn’t it been she, Arlene, who had spotted them? But the transformation of a leggy young filly into a sleekly beautiful racehorse had been her doing. (BMW-fragment09)

In the first fragment we are only presented with physical descriptions of the character and her surroundings. In this fragment no words were coded as metaphorically used. In the second fragment, however, we see a shift from physical description and observation to mental processes and understanding. This shift corresponds to an increase in metaphorically used words towards the end of the fragment, such as raw materials, filly and racehorse. These examples illustrate that as soon as a text moves beyond a realistic physical description and seeks to describe mental states and psychological developments, to explain causes and consequences, or to provide personal insights or evaluations, then it is likely that the number of metaphorically used words will increase, since we are moving from the world of concrete, physical objects and actions into the realm of emotions, feelings and thoughts. Though there may of course be great differences in personal style between texts and authors, and some authors may prefer a more metaphorical style than others, we would speculate that in realistic fiction authors may try to avoid too complex a metaphorical style. Some may even play with a literal versus a metaphorical style, as has been observed for Salman Rushdie by Heywood, Semino, and Short (2002), or use different styles for different characters so that some characters will use many metaphors while others hardly any, as noted by Semino and Swindlehurst (1996) in the case of One flew over the cuckoo’s nest. Leech and Short (2007) also point out that it is important to realize that ‘whenever a writer uses language, he seizes on some features of “reality” which are crucial for his purpose and disregards others’ (2007: 122). Writers can never describe everything completely; they have to decide which information to include and which to leave out. This means that they have to consciously select how to describe people and places by foregrounding certain features while ignoring others. 
To enhance the illusion of reality, writers often try to create an image that is as vivid and rich as possible. This ties in with the issue of descriptive focus and the choice between descriptions of concrete physical properties or abstract mental and social properties. Since physical features are often linked to mental or emotional features in character descriptions, blurring the boundary between concrete and abstract, it is sometimes hard to decide what the exact contextual sense of a lexical unit is. These difficulties are particularly striking in the analysis of prepositions, as is illustrated by the following example from one of the reliability tests (numbers between brackets indicate the number of analysts out of a total of four that marked the word as related to metaphor).

(9) Even in (4) physique they were very much alike, both being thick in (2) the shoulders and almost of the same colouring […]. (CFY-fragment01)

Here we have two physical descriptions, namely in physique and in the shoulders, yet the first in has unanimously been marked as metaphorically used, while the second in was marked by only two of the four analysts. Although the word physique itself clearly refers to a person’s physical make-up, all four analysts have apparently interpreted in physique as denoting an abstract state rather than a concrete location, so in here has the contextual meaning ‘used for describing a particular state, situation, or relationship’ (Macmillan sense 7) rather than its basic meaning ‘used for showing where someone or something is; inside a container, room, building, vehicle’ (Macmillan sense 1). Part of the difficulty lies in deciding in what way the concreteness of the noun following the preposition determines whether in is still used non-metaphorically. In this case in physique has been treated similarly to the following example from CFY-fragment01:

(10) It was only when you looked at their faces that you saw the difference in (4) both age and expression.

Here difference in age clearly denotes an abstract state, not a physical location, but the coordinated noun expression – like physique – refers to an abstract concept that manifests itself physically: ‘a look on someone’s face that shows what their thoughts or feelings are’ (Macmillan sense 2). This would suggest that though in is clearly metaphorically used in relation to age, it is less clearly so in relation to expression. For in the shoulders in Example (9) this situation is even more complicated: shoulders is a physical location, but the expression in the shoulders does not seem to express a state of physical containment. This makes it hard to decide which contextual meaning of in we are dealing with. This is probably the reason why two analysts still chose to code in as metaphorically used. If we look at two further examples from the same text, we can see that finding a distinct contextual meaning can still leave room for slight differences in application:

(11) There was now no difference in (4) their height although Harry was twenty-one and Joe sixteen.

(12) […] and his grey eyes, which at times seemed colourless, had in (2) their depths a touch of melancholy that had deepened with the years.

For these uses of in, Macmillan offers a separate appropriate contextual meaning: ‘in length/width/height/area etc. used when showing measurements’ (sense 5). Yet apparently the two examples do not work exactly the same way, since in has received four votes in Example (11) and only two in Example (12). In Example (11) the analysts apparently agree that height denotes an abstract state of measurement, but in Example (12) they seem to have had difficulties in deciding whether this is concrete or abstract, though depth is as abstract as height. This difficulty is most likely caused by the presence of the concrete concept eyes, which makes it difficult to decide whether in should be related purely to the abstract notion of depth or whether it metonymically applies to eyes and is therefore related to a physical location rather than an abstract state. This is further complicated by the presence of the abstract concept melancholy, since an abstract concept cannot be physically contained in something. If we compare this to one final example from the same text, we see that a metonymic reading can sometimes take clear precedence:

(13) Joe started when Harry’s elbow caught him in (0) the ribs as he said […].

Here the annotation of in apparently did not raise any problems and was taken to indicate concrete containment in a concrete physical location, even though this may not be complete three-dimensional containment.

The methodological point these examples illustrate is that deciding whether these prepositions are metaphorically used is no simple matter. The problem is not automatically solved by determining whether the noun following the preposition is concrete, since shoulders is as physical and concrete as ribs, and yet in the shoulders was marked as metaphorically used by two analysts while in the ribs was marked by none. This suggests that the concept preceding the preposition also plays a role, which would explain why difference in height is more evidently metaphorical than thick in the shoulders, which in turn is more clearly metaphorical than elbow in the ribs. However, melancholy in their [= eyes] depths was marked by only two analysts, signalling that even with abstract nouns preceding and following the preposition, the situation is not necessarily clear. This analysis shows that we cannot resolve this issue by solely consulting the sense descriptions in the dictionary; an analysis of the relationship between words, concepts, and referents in the text world is also required. This is of course precisely what the very first instruction of MIP aims to guarantee: ‘Read the entire text/discourse to establish a general understanding of the meaning.’ Given that framework, the following observations may be formulated by way of conclusion:

–– Since the preposition in in its basic sense denotes a spatial relation between concrete objects, it follows that both concepts would have to be concrete for a prototypical in-relation to exist. In the case of in the ribs there seems to be at least a sense of two-dimensional containment, indicating a scenario in which one concrete object makes physical contact with or penetrates another concrete object and is therefore momentarily contained by that object.
–– As for abstract concepts such as height, length, and depth, we must take into account that the abstract notions of measurement manifest themselves concretely via the physical objects they apply to. They are therefore often interpreted as being concrete via metonymic inferencing, though it is the object the measurement applies to that is concrete, not the measurement itself. This may entail that if the object the measurement applies to is mentioned explicitly in the preceding text (as was the case for eyes + depths), the metonymic interpretation of depth becomes so foregrounded that in their depths is interpreted as concrete; when in their depths is interpreted as in the depths of his eyes, the concrete containment sense becomes foregrounded as the contextual meaning.
–– Naturally it is also relevant whether the perceived object is concrete, since he had a tear in his eye is more concrete than he had a touch of melancholy in his eye, though people may feel that both situations are visually perceivable.
–– In order to streamline the annotation of such prepositions (and other words) that seem to be in a kind of no man’s land between concrete and abstract, MIPVU has added the additional code of WIDLII (‘When In Doubt, Leave It In’), as we have seen in previous chapters. To reiterate, this allows analysts to mark the preposition as metaphorically used while at the same time indicating that its metaphorical status is unclear, and that the application of the procedure was not straightforward for this particular lexical item.

5.3.3 Personification

Within cognitive metaphor studies, personification is often mentioned in relation to the differences between metaphor and metonymy (e.g. Lakoff & Johnson 1980; Lakoff & Turner 1989; MacKay 1986; Low 1999) but its manifestations in natural discourse have not been analyzed systematically.
During our annotations we found that personification can take many different forms, which differ in conventionality, referential function and interaction with metonymy (see also Chapter 3 with regard to personification in news texts). This section will explore the interaction between personification and character descriptions and between personification and linguistic form, and their influence on the linguistic identification of metaphor by means of MIPVU.

Personification and character descriptions

When it comes to the description of actions, writers can choose to ascribe the action to the character or to focus on a specific body part that is involved in the action, as is illustrated by the following examples.

(14) Their tense, edgy faces watched Delaney closely. (BPA-fragment14)

(15) They reached the main deck, dropping down in a defensive posture, eyes searching the stacked containers. (BPA-fragment14)

The first thing to notice is that these examples are highly metonymic, since faces are used for watching and eyes are used for searching. But metonymy is not a necessary condition for such descriptions:

(16) His gaze came back to George, still sprawled over the control desk. (BPA-fragment14)

(17) Paula’s stomach turned a somersault. (BMW-fragment09)

In Examples (16) and (17) there is no such metonymy involved since gazes are not used for coming back and stomachs are not used for turning somersaults. In these cases the action is therefore more evidently attributed to the body part itself, not to the character to which the body part belongs. There is also a difference in conventionality and deliberateness between the two cases: although come back and turn both have conventional non-human sense descriptions in the Macmillan dictionary, the addition of somersault to turn makes it more creative and deliberate, causing it to have a stronger sense of agency. As soon as metonymy is involved, many analysts favour a purely metonymic interpretation and do not regard such examples as personifications. Yet Radden (2002) and Goossens (2002), among others, have pointed out the possibilities for “metaphor from metonymy” or “metaphtonymy”. And Steen (2007) and the Pragglejaz Group (2007) have stressed that finding metonymy does not exclude finding metaphor, since metaphor and metonymy are interacting and not opposing forces. In relation to his body part data, Goossens (2002) points out that many cases exploit a ‘double possibility’, namely metaphor from metonymy or metonymy only (2002: 357). The examples from the fiction corpus can also be seen as exploiting this double possibility of metonymy (body part standing for person) and metaphor (personification of body part). By attributing actions and qualities to body parts rather than to people, the narrative can be made more active and immediate, creating a kind of zooming-in effect. As Leech and Short (2007) point out, assigning agency to body parts can also be done to suggest that the body part is acting on its own, sometimes even against the wishes of the character. This may be connected with Hamilton’s (2002) study of the personifications of body and

mind in Auden’s poetry, where it is stressed that ‘more than mere metonyms for a person, the body and the mind for Auden here are individual personified beings at odds with one another’ (2002: 416). The analysis of (14) through (17) as personifications may seem strange in isolation, but these body part personifications are part of a larger pattern in fiction; conventionality, co-text and context can play an important role in their noticeability. This is in fact also true of related forms of personification in the other registers, such as This essay argues in academic texts (which can be used to avoid using the personal pronouns I and we) and The White House says in news texts (which can be used to avoid assigning responsibility). In MIPVU the additional code “possible personification” was therefore adopted to make these instances of metaphor based on a possible personification interpretation easier to retrieve (see point 4 in Section 3.1, Chapter 2).

Personification and linguistic form

One specific type of personification can be used to illustrate the different linguistic forms in which personification manifests itself in natural discourse, namely the personification of nature. Leech and Short (2007) discuss this type of personification as a typical feature of literature and use a fragment from Hardy’s Return of the native to show that in such expressions as ‘The sombre stretch of rounds and hollows seemed to rise and meet the evening gloom in pure sympathy’ the italicised expressions suggest ‘the animation or the personhood of natural phenomena’ (2007: 159). Leech and Short note that the verbs ‘are the chief carriers of metaphor’ in this process and that ‘Hardy keeps one foot in the literal world by mixing metaphor with quasi-simile’; since words such as seemed, appeared, apparent, and imagined indicate that we may not be dealing with reality here, Hardy suggests that ‘the anthropomorphism may be only a “manner of speaking”’ (2007: 160).
These two main observations correspond to what we have found in our fiction sample. Three examples of the personification of nature are given below. The first is an example of regular metaphor, the second of metaphor and quasi-simile (signalled by ‘seemingly’), and the third of metaphor and simile (signalled by ‘like’); italics have been added to the relevant parts.

(18) The arrested water shone and danced. (FET-fragment01)

(19) Leaves and yellow blossoms obscured the top of the window, while the bottom was covered by aggressive pink hollyhocks, seemingly determined to fight their way inside. (FPB-fragment01)

(20) This sombre giant – like a defeated proud man – contrasts, when considered in the nature of a living creature, with the pale smile of a last rose on the fading bush in front of him. (FET-fragment01)

The application of MIPVU to these examples brings some unforeseen difficulties to light. When we apply steps 3.1, 3.2 and 3.3 of MIPVU to Example (18) we find that the personification has been conventionalized in the dictionary:

DANCE
a. Contextual meaning: ‘if something dances, it makes a series of quick light movements’ (Macmillan sense 3)
b. Basic meaning: ‘to move your feet and your body in a pattern of movements that follows the sound of music’ (Macmillan sense 1)
c. Contrast: Yes, the contextual meaning is non-human while the basic meaning is human.
   Comparison: Yes, we can understand the movement of water in terms of the movement of a human being.

This means that we would conclude that the word dance has to be marked as metaphorically used. However, if we do the same with aggressive, determined and fight in sentence (19), we find the following sense descriptions in the Macmillan dictionary:

AGGRESSIVE
1. behaving in an angry or rude way that shows you want to fight, attack, or argue with someone
2. someone who is aggressive is very determined to win or be successful
   a. used about plans or methods that are designed to do everything possible to succeed

DETERMINED
not willing to let anything prevent you from doing what you have decided to do
   a. [usually before noun] showing that you are determined to do something successful

FIGHT
1. [intrans or trans] if people fight, they use guns or other weapons against each other
2. [intrans or trans] if people or animals fight, they hit, kick, or bite each other
   a. to hit someone as part of a sport, especially BOXING
3. [intrans] to disagree or argue about something
4. [intrans or trans] to try very hard to prevent something from happening or getting worse
5. [intrans or trans] to try in a very determined way to achieve something
6. [trans] to try very hard not to show a feeling or not to do something you want to do
7. [intrans or trans] to compete in order to win something or get something

These sense descriptions show that for each of these three words, the contextual meaning is essentially the same as the basic meaning, but that it is applied to a plant and not to a person. The contextual and basic meanings are distinct enough to be contrasted and compared because we are in fact dealing with a novel application of the basic sense based on a violation of selection restrictions. It is important to note, however, that the dictionary has not paid any particular attention to the specification of whether these senses are only or typically human. For fight the first sense description includes “if people”, the second “if people or animals”, but for aggressive the first sense description simply says “behaving” while the second says “someone who is”, and the description for determined makes no explicit reference to people, except perhaps implicitly by using “you”. This explicit mention of people may seem irrelevant at first glance, but it is in fact often central in deciding whether a lexical unit can be considered metaphorically used based on a personification. We cannot always rely purely on the sense descriptions to tell us whether a sense is human-only, human and animal, human and animate, or general. World knowledge, encyclopaedic knowledge, usage in context and corpus-based evidence can all be taken into consideration to help the analyst out. When lexical items are included as metaphorically used on the basis of such possible violations of selection restrictions, the MIPVU procedure allows us to add the code “possible personification”, and we may even add “WIDLII” if we feel that our decision to treat the basic sense as human is tentative.

5.4 Conclusion

When it comes to fiction texts, the application of MIPVU to the data is generally straightforward. Due to the realistic nature of the texts in the sample, the excerpts are mainly concerned with describing characters, their surroundings and their actions.
Few problems arise in establishing the basic and contextual meanings of the lexical units. As a result, the percentages for unanimous agreement in the reliability tests were high for both words that are related to metaphor and words that are not related to metaphor. The difficulties that did occur were mostly linked to specific features of fiction, namely directly expressed metaphor, character descriptions, and personification. Since MIP functions on the basis of indirectly expressed metaphor, it cannot account for similes and other forms of directly expressed metaphor, since no contrast or comparison exists between the contextual and basic meanings. These metaphors work on the basis of referential incongruity by introducing a foreign source domain directly into the domain of reference of the discourse. MIPVU includes these forms of metaphor in discourse, leading to a distinction between Metaphor-Related Words (MRW) that are of the type “indirect” (metaphor as identified by MIP), and MRWs of the type “direct” and “flag”.

The character descriptions discussed in this chapter show how the boundaries between concrete and abstract, physical and mental, can at times be blurred, making it difficult to establish the exact contextual meaning of the lexical unit involved (often a preposition). A situational and relational analysis can sometimes help pinpoint the contextual meaning, but fuzzy areas remain. In order to keep these borderline cases on board, MIPVU uses the code WIDLII (‘When In Doubt, Leave It In’) in its annotation system, allowing the analysts to retain individual cases as metaphorically used while signalling that the application of the procedure was not straightforward and the word’s status as metaphorically used is not clear.

Finally, the attribution of actions and qualities to body parts rather than characters creates a type of personification that is based on selection restrictions and is also often highly metonymical. The examples of the personifications of nature in fiction show that personifications can take many different forms: they can be conventionalized or novel, based on separate senses or a violation of selection restrictions, and they can be expressed indirectly (i.e. metaphor as identified by MIP), directly (i.e. similes etc.) or by a mix between these two forms (i.e. quasi-simile). In order not to lose the body part personifications and other personifications based on selection restrictions, MIPVU has added the code “possible personification” to its procedure, which allows analysts to keep such cases in as metaphorically used while signalling their special status. This allows us to analyse them either together with or separate from the other forms of metaphor in discourse.
The examples in this chapter also demonstrate that there is considerable interaction between the different manifestations of metaphor in discourse (direct versus indirect), the character descriptions and personification. The personification of nature often correlates with the use of simile and quasi-simile, requiring an awareness of topical incongruity and underlying conceptual structure. The personification of body parts requires additional attention to the role of selection restrictions and the influence of context, world knowledge and rhetoric. Nevertheless, the difficulties encountered when applying MIP to specific types of natural discourse can be overcome by extending the procedure in a systematic and clearly defined way, as has been done in MIPVU.

Chapter 6

Metaphor identification in academic discourse

6.1 Introduction

Today, metaphor is considered a necessity in scientific and educational discourse, where it is held to have organizing, theory-constitutive, educational, and persuasive functions (cf. Gibbs 1994; Boyd 1993; Cameron 2003; Gentner 1982; Gentner & Jeziorski 1993; Hesse 1966; Holton 1995; Lakoff & Johnson 1999; Low 2008a; Martin & Harré 1982; Ortony 1975; Semino 2008; Turner 1997). While the old, “objectivist” view of science regarded metaphor as distracting or even harmful (e.g. Hobbes 1651, quoted in Semino 2008: 131), the modern philosophy of science at least since Kuhn (1962) has assumed that the paradigms (or models) that underlie modern disciplines are partly structured by metaphorical means. Yet critical voices like Turbayne (1962) may still stress that the use of metaphors potentially leads to “redundancy and confusion” (212) in scientific thought, pointing out that metaphor has a “mythopoeic” function (see also Holton 1995). And works related to critical discourse analysis stress the persuasive (and at times misleading) power of metaphorical mappings in science and politics (e.g. Semino 2008, Chapter 4; Charteris-Black 2004 for societal and ideological factors of metaphor). These are opposite positions, which can be globally linked to an ambivalent view of metaphor in academic discourse throughout history, beginning with Plato, Vico and Nietzsche (see Debatin 1995). However, in contemporary psycholinguistics and artificial intelligence, analogical and metaphorical processes are taken to play a crucial role in scientific discovery (cf. Gentner 1982; Gentner, Bowdle et al. 2001; Gentner & Jeziorski 1993; Hoffman 1985; Sternberg & Davidson 1995; Holyoak & Thagard 1995; Sutton 1993) and the representation of scientific concepts (e.g. Gentner & Gentner 1983 for electricity, Gentner & Grudin 1985 for psychology).
A number of studies unravel the role of metaphor in specific scientific disciplines, such as Brown (2003) for chemistry, Keller (2002) for biology, Lakoff and Núñez (2000) for mathematics, Leary (1990) for psychology, and Pulaczewska (1999) for physics. In linguistic work on metaphor in academic language and education (especially in the fields of discourse analysis and applied linguistics), there has lately been an increase in research on metaphorically used lexical items. A seminal

applied-linguistic work is Cameron (2003), which examines educational discourse involving young learners. Low (2008a) gives a comprehensive overview of the role of metaphor in education, both on the “meta-level” of educational change and the level of natural educational discourse, paying special attention to second language teaching/learning (see also Boers 2003, 2004 and Littlemore & Low 2006 for metaphor and second language teaching/learning). Semino (2008) presents a comprehensive overview of metaphor in “scientific discourse”, both for specialist discourse (including the topic of popularization) and educational materials, supported by her own case studies. Low, Littlemore and Koester (2008) explore the distribution and functions of metaphor in three university lectures, concluding that in this sample linguistic metaphor is short, unconnected and functions primarily locally. General research on academic discourse has also seen a considerable increase over the past two decades (cf. Biber 2006; Flowerdew 2002; Hyland 2006; Peacock & Flowerdew 2001; Paltridge 2004), mostly motivated by applied concerns, which have been triggered by the internationalization of academic studies (e.g. Bruce 2008; Hyland 2009). Biber (2007) offers a comprehensive overview of the features of academic discourse, based on the “exploratory” approach of a multi-dimensional analysis of register variation. Halliday (2004) stresses the diversity of academic discourse, with a heterogeneous array of disciplines and sub-disciplines on the one hand and participants with divergent levels of expertise on the other. As a result, the researcher exploring “scientific discourse” is faced with significant specialization resulting in sociolinguistic variation. This, of course, has implications for claims about “academic discourse”, which should be carefully made (see also Flowerdew 2005 and Hyland 2000). 
In connection with metaphor identification in academic discourse, there are three themes that merit separate attention here. The first concerns simile. The important role that has been assigned to analogy in science and education has led researchers to expect simile-like expressions to be pervasive in academic writing and speech (cf. Low, in press). However, our own corpus-based research presented here suggests that this may not be the case (cf. Low 2008b, in press, Low et al. 2008). More research on similes in academic discourse is needed, especially within specific genres (such as textbooks) or disciplines (such as psychology). MIPVU offers a reliable and valid method for the identification of such cases of “direct metaphor”. The second theme concerns personification. A particular type of this class of metaphor seems to be characteristic of academic texts and may be closely tied to text management (cf. Cameron 2003; Low 1999). This type occurs when a nonhuman entity (referring to some discourse entity, such as a text) is the subject with a verb that requires a human agent. An example is argued in ‘Woolf’s report argued for an improvement in prison conditions’ (example from Macmillan English Learner’s Dictionary). MIPVU comprises a procedure for identifying such cases of personification in academic discourse. The third theme has to do with expectations about specific conceptual metaphors. Research on metaphor in academic discourse based on the Conceptual Theory of Metaphor (CTM) holds that a number of specific conceptual metaphors underlie much of academic discourse, such as ARGUMENT IS WAR (Lakoff & Johnson 1980, see Ritchie 2003 for a critical discussion), ENCODING MEANING IN WRITTEN TEXT IS SPEAKING (Cameron 2008c), or DISCOURSE IS SPACE (cf. Cameron 2003). CTM holds that lexical items like this, on, grounds, rests, and on indicate mappings like DISCOURSE IS SPACE in utterances like ‘This view, as we shall see, has been attacked on the grounds that it rests on the false assumption that […]’ (ECV-fragment05, humanities arts). Below we will show how MIPVU can be applied to the linguistic identification of such lexical items, proposing that conceptual metaphor identification is a separate step in the process of metaphor analysis.
In the following sections we will present a number of cases, taken from different genres and fields of academic discourse. One fragment is from ‘The development of Darwin’s general biological theorizing’, a paper published in Evolution from molecules to men (CMA-fragment01); one fragment from the textbook Lectures on electromagnetic theory (FEF-fragment02); and one fragment from the chapter ‘Bringing fossils back to life’ in Fossils: The key to the past, a textbook on palaeontology (AMM-fragment01). Two smaller parts come from The mind at work, a textbook on ergonomics (CLP-fragment01), and from the monograph Principles of criminal law (ACJ-fragment01). Where an example stems from the texts used in the reliability tests (CMA, CLP, and FEF), coder agreement/disagreement will be an additional source of information.

6.2 Unanimous agreement

We will now turn to a number of illustrations of the unproblematic application of MIPVU. The first sentence stems from natural science (CMA-fragment01).

(1) This chapter surveys the development of his general biological theorizing over that remarkable early period. (CMA-fragment01, natural sciences)

It is typical of the lexis in academic discourse. For one thing, all lexical items are rather abstract and formal. Many belong to word fields associated with academic discourse (chapter, survey, development, biological, theorizing) or have a general or bleached meaning, such as the adjective general, the demonstratives this, that,

and the prepositions of, over. Furthermore, the example contains five cases of conventional metaphor, which accords with the general trend. In the reliability test, the identification of metaphorically used words in this sentence was straightforward, with all metaphorically used items exhibiting unanimous metaphor identification. The metaphor-related words include two demonstratives (this, that), which cater to referencing in writing, an issue we shall return to below. The verb survey manifests a typical metaphorical contrast between a concrete everyday sense and an abstract, formal, academic one. The identification of the contextual meaning of this lexical item yields ‘[FORMAL] to study something’ (Macmillan’s sense no. 4, or ‘MM4’). In the next step we identify the basic meaning as ‘to examine an area of land in order to make a map of it’ (MM3). Since the latter is distinct from, but can be understood in comparison with, the contextual sense, survey is a metaphorically used word. There is a special feature of survey in this context. The tension between abstract and concrete is combined with a tension between non-human and human. That is, the contextual sense of survey has a selection restriction that requires a human agent in subject position, but this is violated by the appearance of a non-human agent. The dictionary provides an example included in the contextual sense (MM4): Professor Arens has surveyed a wide range of tribal cultures. It illustrates the semantic restriction of having to select a human agent for survey in the sense of ‘to study something’. This selection restriction is violated in our example sentence, and can be treated as a case of personification. 
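The comparison just applied to survey can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of MIPVU itself: the names Sense and is_metaphor_related are hypothetical, distinctness is approximated by the two senses carrying different dictionary labels, and comparability is passed in as the analyst's judgment, since nothing in the sketch can decide it automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """A dictionary sense (hypothetical container for illustration)."""
    label: str  # sense label as used in the chapter, e.g. "MM4"
    gloss: str  # sense description quoted from the dictionary


def is_metaphor_related(contextual: Sense, basic: Sense, comparable: bool) -> bool:
    """Return True when the basic sense is distinct from the contextual
    sense and the analyst judges that the two can be related by comparison.

    Assumption: 'distinct' is approximated by different sense labels.
    """
    distinct = contextual.label != basic.label
    return distinct and comparable


# The two senses of 'survey' quoted in the text.
contextual = Sense("MM4", "[FORMAL] to study something")
basic = Sense("MM3", "to examine an area of land in order to make a map of it")

# The analyst's comparability judgment is supplied by hand.
survey_is_mrw = is_metaphor_related(contextual, basic, comparable=True)
print(survey_is_mrw)
```

Keeping the comparability judgment as an explicit argument reflects that this step rests on the analysts' decision; only the bookkeeping around it is mechanical.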
Although personification of this kind seems to be rather typical of academic discourse, frequently being used for text management, there is a fine line between appropriate and inappropriate usage that cannot be transgressed without marking the language as stylistically deficient or conceptually unsound. Low (2005) shows that expressions like “this essay thinks” (which he relates to the conceptual mapping AN ESSAY IS A PERSON) are not accepted by experienced lecturers, while other cases, as we have seen, are perfectly acceptable. It should also be noted that the reported type of personification is closely tied to metonymy and is therefore substantially different from personifications like “each individual cell had to be master of all trades” (from an article in the popularized science journal New Scientist, identified by Low 2005). The latter is a type of personification that seems to be used for distinct functions, such as explanation and entertainment (cf. Low 2005). The next lexical item identified as related to metaphor is development. In previous discussions among the analysts, it had been classified as a borderline case. The decision to regard the item as borderline was most likely prompted by the analysts’ lexical knowledge about concrete instances of development, such as the

growth of a plant. However, there is no such entry to be found in Macmillan. The dictionary rather lists a fairly universal meaning of development: ‘change, growth, or improvement over a period of time’ (MM1). The entry conflates the basic concrete meaning with more encompassing (‘growth of a child as time passes, as it changes and learns to do new things’, MM1a) and more abstract (‘improving the economy […]’, MM1b) meanings. What is more, even the contextual meaning is subsumed under the universal meaning of MM1. Given this strong general sense of development, the lexical item was later re-analyzed as a non-metaphorical item. At the time of the reliability testing, however, the discussion of development was still pending. All analysts indicated their awareness of this status by assigning borderline status. The last metaphorical item included in this sentence is the preposition over. The contextual sense is ‘during a period of time’, which can be contrasted and metaphorically compared to ‘above someone/something’. All coders agreed on this comparison. Over is thus a maximally straightforward instance of metaphor. At this point, we can make the following observations about identifying metaphor in academic discourse. We observed that metaphor in academic writing often involves forms of personification (cf. Low 1999). And, including rather than excluding borderline cases of metaphoricity is important for metaphor identification in academic registers, too. We will now turn to our treatment of a number of less clear cases.

6.3 Lack of agreement

6.3.1 Metaphor identification and specialist terms: Metaphorical to whom?

Our reliability tests show that academic texts (together with news texts) have the highest rate of unanimously identified metaphors of the four registers (see Chapter 8). Most words related to metaphor found in our samples of academic discourse are straightforward cases of conventional metaphor. They are not ambiguous and they are a typical part of academic prose. But academic discourse also exhibits the highest proportion of coder disagreement. This seems to be related to one of the intrinsic qualities of many metaphors in academic discourse, their degree of specialization. There are a number of implications. The British National Corpus reflects the high level of specialization of academic discourse by differentiating between four sub-registers: “humanities, arts”, “natural sciences”, “politics, law, education”, and “social sciences”. The fragments representing academic discourse belong to distinct sub-registers, which have their own specialised vocabulary. This is one axis of specialisation.

In addition, Biber (2006) also distinguishes between various “academic levels (lower division, upper division, graduate)” (2006: 21). This categorization relates to different audiences using words (especially technical terms) with distinct levels of expertise. This is another axis of specialisation. Both types of specialization may cause problems for reliable metaphor identification. Each academic discipline has a specific technical language, which features many possible candidates for metaphorically related words. However, the detailed shades of meanings of technical languages are not part and parcel of the general reader’s lexical knowledge, and correspondingly, cases of disagreement in the reliability test were often from technical vocabulary. The technical meanings of words like scalar (from scalar function in electromagnetics, FEF-fragment02, natural sciences) are not frequent enough to figure in the Macmillan English Dictionary for Advanced Learners (Rundell 2002). This is a special methodological problem of academic discourse, which at first glance does not seem to be resolved by MIPVU’s practice of consulting a usage-based dictionary to support coder decisions. To correctly establish the contextual meaning for technical terms like electrical charge or scalar function, analysts would need to gather information from more encompassing dictionaries, such as the Oxford English Dictionary (OED), or genuinely specialised dictionaries. Our solution for dealing with cases like these was to adopt a general view on metaphor, which means that we assume a general reader. This reader’s knowledge about the meaning of words is taken to correspond with the entries in the Macmillan dictionary, or, as a fallback position, Longman. Decisions should therefore not be based on etymological principles (charge) or solely on specialised dictionaries (scalar). We thus decided to stick to our general identification procedure, and base our decisions primarily on Macmillan.
Since specialized terms do not appear in our dictionaries, and we deliberately did not include an additional step for assessing the specific contextual meaning of a lexical unit in the procedure, we cannot compare the exact contextual meaning to an assumed more basic meaning. However, just like the general language user, analysts do have intuitions about the approximate sense of a technical term, in particular about its abstractness and so on. Therefore, if the contextual sense of a specialized term is not in the dictionary, but there is a sense that fulfils our criteria of being basic, and that can be understood by comparison to the (assumed) contextual sense, we mark the word as a borderline case of metaphor (‘WIDLII’); it is ‘borderline’ because we have not checked the contextual sense against a specialist dictionary.

6.3.2 Metaphor-related words and scientific models

In this section we will examine how our linguistic approach interacts with any knowledge we may have of the structure of underlying metaphorical scientific models. In particular, the question arises whether it can interfere with achieving unanimous agreement. One example of words that seem to indicate a scientific model is (electrical) charge:

(2) It means that neither the magnitude nor the position of the charge varies as a function of time. (FEF-fragment02, natural sciences)

The contextual meaning is sense description 4 in Macmillan, ‘the amount of electricity that something holds or carries’, which is unproblematic. But when we have to identify the basic meaning of the word, it is difficult to make a decision. The summary for this entry looks like this:

1. amount of money to pay
2. when sb is accused
3. an attack running fast
4. amount of electricity
5. amount of explosive
6. sb you take care of
7. ability to cause emotion

Longman provides no further information. Candidates for a basic meaning are the bodily-related ‘an attack by people or animals running very fast towards someone or something’ (MM3) and the concrete ‘an amount of the substance that makes a bomb explode’ (MM5). By adding the label WIDLII to our judgment that this word is related to metaphor, we can signal borderline status. We thereby account for the possibility that the general reader might judge one of these senses basic. The OED features the concrete, physical sense of ‘a (material) load, burden, weight’, which is now obsolete. It also provides us with the approximate date of the first usage of load in the electrical sense; this enables us to infer that, when the term was coined as part of a scientific model, the concrete, physical meaning was still part of the English lexicon. It is possible that today there is still available a model of electricity that works on the basis of an analogy between concrete material loads and less palpable amounts of electricity. This might even be explained in these terms to novices in the field. However, because the senses listed in Macmillan and Longman do not feature this obsolete basic meaning, we cannot mark the lexical item as a metaphorically used word based on an assumed comparison between the relevant concrete and abstract sense. Another example of a technical term hinting at a scientific analogy is natural selection:

(3) For, by 1841 he had worked out not only his theory of the origin of species, natural selection, but also, it seems, his theory of generation (including heredity, variation and so on), pangenesis. (CMA-fragment01, natural sciences)

Macmillan provides an entry for the phrase as a whole: ‘the way in which living things continue to exist as a group or die, according to qualities they have or are able to develop’. However, as happens for all fixed phrases that are not compounds, we have to analyse natural and selection as independent lexical units. In the reliability test, one coder out of four decided that selection was used metaphorically. The basic meaning of selection is ‘the process of choosing one person or thing from a group’, with the examples in Macmillan and Longman suggesting a human agent. The other three analysts coded both elements of the phrase as non-metaphorically used. They apparently did not see that the basic meaning of selection includes a human agent, and can therefore be metaphorically compared to the contextual meaning. Our eventual decision is to mark the second component, selection, as a metaphorically used item. It is possible that individuals who are aware of Darwin’s model might sooner note the metaphorical meaning, for Darwin deliberately compares human selection as practised in plant growth to the genetic advantages of wild species (Young 1988). The following example concerns two related words from a social sciences fragment. The first metaphorically used word, role, stems from the title of the chapter (sentence 4 of the fragment):

(4) The human role as a system controller (CLP-fragment01, social sciences)

The second metaphorical item is stage. It appears several discourse units later, within the context of depicting the historical development of human system control:

(5) At this stage the typical machine operator manipulated machine controls on the basis of data presented on instruments. (CLP-fragment01, social sciences)

In the reliability test, role was a unanimous metaphorical item, whereas for stage two coders decided “related to metaphor” and two “not related to metaphor”. Let us first consider role. Both the contextual and the more basic meanings can be found in Macmillan, are sufficiently distinct, and can be related by comparison. The meaning in Macmillan closest to the contextual meaning is ‘the purpose or influence of someone or something in a particular situation’, while the basic meaning is ‘the character played by a particular actor in a film, play etc: PART’. Role is thus a clear case of a metaphor-related word without major methodological complications. Stage, however, is slightly more complex. At first glance, it seems that the contextual meaning (‘a particular point in time during a process or set of events’) is sufficiently distinct from the basic meaning ‘the part of a theatre where the actors or musicians perform’. However, the fact that two out of four coders did not identify a metaphorical usage hints at possible difficulties. With regard to the basic meaning, Steen, Biernacka et al. (in press) stress that ‘part of a theatre’ is derived

from the historically prior, but now obsolete, sense of ‘raised platform’. Contemporary speakers of English may make sense of metaphorical meanings such as the above through a ‘spatial conceptualization of time’. Since this spatial conceptualization is achieved by a comparison with ‘part of a theatre’, which is the diachronic variant of ‘raised platform’, this would be a case of folk etymology. It should be added here that besides the contextual meanings above, both role and stage have other conventionally metaphorical meanings within academic discourse (e.g. developmental stage theory in developmental psychology, or role theory in social psychology). In fact, a number of metaphorical terms seem to be semantically related to the concept of ‘theatre’ by indirect reference. To mention just a few examples from sociology, this group includes dramaturgy, performance, and script (cf. Goffman 1959). And the scientific analogy underlying developmental stage seems to be based on a spatial concept which is related to ‘raised platform’ (cf. Case 1992). Our identification of the lexical items charge, natural selection, stage, and role as metaphor-related words in academic discourse shows that MIPVU offers specific solutions to different methodological issues. With charge, we have a highly conventionalised technical term whose original mapping is not metaphorical anymore, but for which new candidates for the basic meanings are possible (borderline case). Natural selection is a phrase of which the second component is metaphorically used due to the violation of a selection restriction, which coincides with Darwin’s original analogy. And finally role and stage are used in general ways, which implies that scientific and folk models are in constant contact within academic discourse. 
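The agreement patterns reported for these items can be tallied in a short Python sketch. Only the counts come from the reliability test as described above (role unanimous among four coders, stage two against two, selection one against three). The ordering of the votes, the dictionary name codings and the helper agreement are assumptions made purely for illustration.

```python
from collections import Counter

# True = the coder marked the word as related to metaphor.
# Vote order is hypothetical; only the counts are taken from the text.
codings = {
    "role": [True, True, True, True],
    "stage": [True, True, False, False],
    "selection": [True, False, False, False],
}


def agreement(votes):
    """Summarise one item's codings: counts per decision and unanimity."""
    counts = Counter(votes)
    return {
        "metaphor": counts[True],
        "not_metaphor": counts[False],
        "unanimous": len(counts) == 1,
    }


summary = {word: agreement(votes) for word, votes in codings.items()}

for word, result in summary.items():
    print(word, result)
```

A tally of this kind only flags where coders diverged; the eventual decision for each item still follows from the discussion of its dictionary senses.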
Our reliance on Macmillan (with the backup of Longman) as a resource for identifying the relevant contextual and basic senses offers an operational approach that can separate the contemporary from the diachronic perspective. When we consult the history of lexical senses listed by the OED, we only do so in order to clarify the nature of a particular problem (charge, stage). At the end of the day, all decisions about metaphor are based on the lexical entries in the contemporary corpus-based dictionaries. This systematic utilization of dictionaries facilitates the identification not only of metaphorically used words in general, but also of such words that are potentially related to scientific analogies.

6.3.3 Metaphor-related words and text management

Academic texts exhibit patterns of intertextual and intratextual co-reference (see Eggins & Martin 1997) which are often provided by metaphorically used pronouns and determiners. Francis (1994) describes how nominal groups connect and organize written discourse. Paying special attention to what she calls

“metalinguistic” labels, she describes nominal groups that label the different stages of discourse as writers present their own and others’ arguments. This category of labels includes nouns and adverbs such as point, where and here. These particular lexical items seem to convey comparisons between discursive meanings and spatial senses, thereby imparting structure to the abstract discourse. We will therefore first look at cases of metaphorical uses of demonstrative articles and pronouns and then scrutinize nouns and adverbs. These are quite often easy to identify as metaphorically used, but their very conventionality and one or two other factors may occasionally affect the achievement of unanimous agreement. Demonstratives and pronouns are a crucial device for the construction of cohesion and co-reference (see Eggins & Martin 1997) in academic discourse. They effect a form of concretization, turning discursive topics into tangible objects. One example is the metaphorical use of the demonstrative and pronoun this. The basic sense of this is ‘the one that is here’. It can be metaphorically contrasted with the contextual sense ‘the one that is known’, which is ‘used when you are referring to a particular person, thing, fact etc that has just been mentioned, or when it is obvious which one you are referring to’. Consider the use of this in the following paragraph, which consists of two consecutive sentences, numbered separately:

(6) Fortunately, there is a single antidote effective against both these myths; and that is to start all over again with the most decisive source of Darwin’s new identity, on the voyage, as a committed man of science: his zealous discipleship of Charles Lyell’s (1797–1875) views in geology (including biogeography and ecology).

(7) This antidote is effective against the romantic-individualist myth, because, as a protégé of Lyell, the young Darwin of the Beagle is at once invested with all the intellectual and institutional context that that myth would suppress. (CMA-fragment01, natural sciences)

The lexical item this in sentence (7) was unanimously coded as a metaphorical item in the reliability test. Sentence (7) establishes co-reference with the preceding units of discourse contained in (6) by aligning “this antidote” with both the already mentioned lexical item antidote and the subsequent specification statement (beginning with the anaphoric implicit metaphor that). But, most importantly, in the metaphorical usage of the demonstrative this the abstract sense of antidote (‘something that helps to improve the effects of something bad or negative’, see above) is being referred to as if it was a concrete object: in the basic sense, this is ‘used for referring to the thing that is nearest to you, especially when you are pointing to it’. Here is another example of cohesion. The lexical item this in sentence (11) refers to the three preceding sentences (8 to 10), efficiently reducing all of the information into one pro-form:

(8) The business of designing machines, processes and systems can be pursued more or less independently of the properties of people.

(9) Nevertheless people are always involved, the designer himself is a human being and his product will shape the behaviour of many workers and other users.

(10) More fundamentally, the design activity will be meaningless unless it is directed towards serving some human need.

(11) In spite of all this, the design process itself is often thought about and executed without any formal considerations about people. (CLP-fragment01, social sciences)

In the reliability test, three analysts decided that this in sentence (11) was a metaphorically used word, while one did not. Given that the contextual use of the pronoun this is a clear case of metaphor for our procedure, this is an error that might be due to the high level of conventionalization of the lexical item in this kind of usage: the metaphorical use of the delexicalized items this, that, and the plural forms these and those can even be missed by trained analysts and are probably almost invisible to the untrained eye. MIPVU is thus an excellent tool for sharpening the analytical view of such constructions. In Example (1) above, we are dealing with a slightly different metaphorical usage of the lexical item this. In ‘this chapter’, this abstractly “points” to what is currently relevant in the discourse (the contextual meaning in Longman being ‘used to talk about the present situation’). This is another distinct sense that can be understood by comparison with the basic, deictic sense. It should be mentioned that there also is a metonymic dimension present here, for this also refers to the materialized text the reader is currently seeing, and, in a way, possibly touching with their hands (related to Longman’s contextual meaning ‘spoken: used to talk about a thing or person that is near you, the thing you are holding, or the place where you are’). Yet, since metaphor and metonymy are not mutually exclusive, our identification of a metaphorically used word is not undermined by this finding. Just as with the demonstratives, the following group of lexical items conveys comparisons between spatial senses and discursive meanings, for which some might wish to apply the cross-domain mapping DISCOURSE IS SPACE when they turn to conceptual analysis. The following examples are general devices for text management. Consider viewpoint in (12):

(12) From the narrow accountancy viewpoint, people are a cost and it is desirable to keep this cost as low as possible. (CLP-fragment01, social sciences)

In the reliability test, this lexical item was a straightforward, unanimously identified metaphor-related word. The contextual sense is an abstract ‘way of considering something’ (MM1). The basic meaning is clearly spatial: ‘a place from which you can see or watch something’ (MM2). Both meanings are thus sufficiently distinct
from each other and can be metaphorically compared, even though it is clear that we need to include a touch of metonymy to get from the physical location to the concrete act of seeing for the basic sense. It is interesting to examine whether the construction point of view behaves similarly. (13) Thus, as with biological theories, crime is seen as pathological (a disease), as something to be looked at from the medical point of view. (B17-fragment02, social sciences)

In contrast to viewpoint, point of view is monosemous in Macmillan. The contextual sense is abstract: ‘a way of judging a situation based on a particular aspect’. However, based on the stress pattern, point of view is not one lexical unit, in the form of a compound, such as stock market. Instead, it is a fixed phrase, which needs to be analyzed as a collection of separate lexical units, following MIPVU’s general guideline for units of analysis (see 2.2.1). When all constituents of the phrase are analyzed as separate lexical items, both point and view are classified as metaphorical. Yet another contextual sense of point that is typical of academic discourse is illustrated in the following fragment: (14) This brings the discussion to a crucial point: […]. (ACJ-fragment01, politics, law, education)

Here, the contextual meaning is ‘a particular stage in a process’ (MM3a), a sub-entry of ‘a particular moment in time’ (MM3). It is sufficiently distinct from and can be compared with the concrete spatial sense ‘sharp end of something’ (MM7). This is another metaphorical use of point, but for different reasons than the ones discussed above. When we connect Francis (1994) to our metaphor identification research, we can assume that viewpoint, point of view, and point in the contexts above cater to lexical cohesion in labelling (meta-)linguistic acts. (12) and (13) thus label comments on the ways of considering or judging a subject, while (14) marks a stage in the discursive progress. The fact that these lexical items are indirectly used suggests that metaphorical word usage has a function for text management. The spatial basic senses seem to “ground” the abstract acts of judging/labelling and stage-marking.

A second type of “spatial” word is demonstrated by the contextual sense of where, which is used in a similar way as here. Both indicate an abstract situation in discourse:

(15) The analysis draws throughout on the work done in the last decade by Gruber (1974), Herbert (1974, 1977), Ghiselin (1975), Ruse (1975a, b; 1979),
Schweber (1977, 1980), Kottler (1978), Manier (1978), Sulloway (1979, 1982a, b), Kohn (1980), Ospovat (1981), and Sloan (1983a, b) and is derived from studies by the present writer (Hodge 1982, 1986; Hodge & Kohn, 1986) where full reference is made to the documentary sources and secondary literature. (CMA-fragment01, natural sciences)

In the reliability test, analysts disagreed (three coders identified where as a metaphor-related word, one coder as a non-metaphor-related word). The contextual meaning of where in Macmillan is ‘used for asking about or referring to a situation or a point in a process, discussion, story etc’ (MM3); the basic sense is ‘in or to what place’ (MM1) or ‘in or to a particular place’ (MM2). We thus have a clear spatial basic meaning, contrasted with the abstract (discursive) situation. However, the fact that coders disagreed makes us conscious of the fact that written discourse is always tied to its material basis: the printed text on paper (or screens). We can speculate that the disagreeing analyst coded where as a non-metaphorical item because they took the given references as concrete objects (palpable texts), thus assuming that the contextual meaning was the basic meaning, and not identifying a contrast between the discourse sense and the concrete location.

The lexical item here can be found in sentences like the following one, stemming from a physics textbook:

(16) We have used here a mathematical theorem stating that the line integral of a gradient depends only on the end-points and not on the connecting path. (FEF-fragment02, natural sciences)

With the background knowledge that the sentence belongs to the genre “textbook”, we identify the contextual meaning as corresponding to ‘at this point in a process, discussion, or series of events’ (MM3). It can be compared to the basic meaning ‘in or to this place’ (MM1). We are thus looking at another discernible metaphorical contrast between a discursive meaning and a spatial sense. However, when we know that the textbook comprises a series of transcribed lectures, we may have to adjust our decision about the contextual sense: the fragment belongs to a hybrid genre between the spoken and written modes of language, which makes it possible to consider the above sentence as spoken language. Thus, a non-metaphorical interpretation of the contextual meaning of here becomes possible, denoting the actual location of speaker and audience. MIPVU’s solution for cases of lack of situational knowledge but with a potential for metaphorical meaning is to treat the word as if it was used indirectly and metaphorically. Thus, the possible spatial contextual sense does not rule out the metaphorical contextual meaning mentioned above, and the lexical item becomes a borderline case (WIDLII).

The metaphorical usage of the demonstratives, of the content words viewpoint, point of view, and point, as well as of where and here refers to abstract phenomena
in and of discourse in a similar way as to concrete locations in space. We may therefore assume some lexical content for these otherwise ‘inherently unspecified (…) element[s]’ (Francis 1994: 83). Naturally, the analyses presented here aim to depict the identification of metaphorically used lexical items, and only secondarily to hint at the question whether and how a widely assumed conceptual mapping (DISCOURSE IS SPACE) manifests itself on the linguistic level. What has been found in this respect is that the linguistic identification of metaphorically used words does not conflict with the assumption of a systematic mapping between the domain DISCOURSE and the domain SPACE.

6.3.4 Metaphor-related words in extended contexts

In this section, we will identify a number of metaphorical word usages belonging to the same stretch of academic discourse. We will suggest that identifying metaphor-related words with MIPVU offers a useful vantage point for developing further ideas about the interaction between and the discourse functions of lexical units. We will examine indirect usage of the items myth and antidote, both found in CMA-fragment01, a biological text on Darwin’s biography. In concert with that, we will identify a number of connected implicit metaphors, a rare phenomenon that MIPVU can help to detect (see Section 2.5). Below, the first examined metaphor-related word, myth, is found in sentence (19), with the preceding context in sentences (17) and (18) being vital for establishing the contextual meaning:

(17) The analysis draws throughout on the work done in the last decade by Gruber (1974), Herbert (1974, 1977), Ghiselin (1975), Ruse (1975a, b; 1979), Schweber (1977, 1980), Kottler (1978), Manier (1978), Sulloway (1979, 1982a, b), Kohn (1980), Ospovat (1981), and Sloan (1983a, b) and is derived from studies by the present writer (Hodge, 1982, 1986; Hodge & Kohn, 1986) where full reference is made to the documentary sources and secondary literature.

(18) Such a survey can serve more than mere biographical curiosity, and a final section will suggest how it may clarify some issues of current interest to historians, to philosophers and to biologists.

(19) It can also free us from many mistaken myths about Darwin himself.

This stretch belongs to the introductory part of the fragment, where the author first establishes the theoretical foundation for his argument (17), then goes on to underline the value of the analysis (18), which in sentence (19) culminates in ascribing to the survey a potential power to free “us” from “many mistaken myths” about Darwin. We will focus on the noun myth, which has the contextual sense ‘something that people wrongly believe to be true’. The basic sense is ‘an ancient traditional story about gods, heroes, and magic’. The basic sense is thus sufficiently
distinct from and can be understood in comparison with the contextual meaning of ‘wrong belief ’; myth is a clear case of metaphor. The metaphorical usage of myth is accompanied by at least two other rhetorical devices – alliteration (‘many mistaken myths’) and pleonasm (the contextual meaning myth denotes ‘something that people believe’ with the additional property ‘wrongly’; the adjective mistaken produces a pleonasm). The basic sense of myth may have been employed as an invitation to the reader to think of particular beliefs about Darwin and his life in terms of ‘gods, heroes, and magic’, with ‘ancient’ and ‘traditional’ being further aspects. A strategic choice of words is an option since irrational or even magical beliefs are not likely to be embraced by the intended academic audience of this text. Thus, we tentatively conclude that the usage of myth in the given context is very likely to have a deliberately persuasive function, inviting the reader to follow the author’s argumentation and interpretation of the facts. This interpretation can be related to the identification of other metaphorically used items in the adjacent context. In the five sentences below, which directly follow sentence (19), a series of interlinked implicit metaphors (see Section 2.5) can be spotted. These metaphors do not clearly stand out as coming from an alien domain, but still convey an indirect meaning that can potentially be explained by a figurative cross-domain mapping. In this case, implicit metaphor works by substitution of the metaphor-related words myth and myths: (20) These myths mostly trace to his own misleading reminiscences later in life, and have been relentlessly reaffirmed since, at the 1959 centennial symposia for example and in the 1978 BBC-TV series on Darwin; but they are nonetheless discredited by the scholarly industry now grown up around the rich manuscript archive from Darwin ‘s early years (Kohn, 1986). 
(21) One is the romantic, really Wordsworthian, individualist myth so dear to the literary guardians of English national cultural stereotypes.

(22) It depicts the young Darwin as a lone, sporting gentleman, an amateur beetle-collector seeing nature as she really is by simply looking with the clear gaze of genius, unimpeded by any scientific training, theological prejudice, professional ambition and so on.

(23) Another is the Whiggish, anachronistic myth that Darwin’s general biological thought consists of a molecule comprising just two atoms: the idea of evolution and the idea of natural selection.

(24) It depicts his early intellectual development as reducing to two moments of discovery, whereby he moves from having no coherent ideas to having just those ideas.

All instances highlighted with italics co-refer to the metaphorically used words myths and myth. The pronouns they, it, one and another are implicitly metaphorical by substitution. All of them substitute myth(s), conveying an indirect metaphorical meaning, and therefore receive the label ‘implicit metaphor’. It might be surprising that coders did not spot the implicit metaphors in the reliability test, but it should be noted that implicit metaphor is so rare as well as hard to notice because of its implicit nature that coders who are not specifically looking for it may easily overlook it. Reporting this finding here should thus function as an additional motivation to spot implicit metaphorical meanings. Troubleshooting during the wrap-up phase of the annotation (see Chapter 9) has shown that MIPVU offers a rigorous instrument for the identification of implicit metaphor. Subsequent to this extended co-referential structure relating to myth, sentence (25) contains antidote, an explicit, but indirect case of a metaphor-related word that may possibly exhibit a persuasive function as well. The noun here appears for the first time, thus in the sixth sentence after the introduction of myth (sentence 20 above). It is followed by another implicit metaphor, the pronoun that. (25) Fortunately, there is a single antidote effective against both these myths; and that is to start all over again with the most decisive source of Darwin’s new identity, on the voyage, as a committed man of science: his zealous discipleship of Charles Lyell ‘s (1797–1875) views in geology (including biogeography and ecology). (CMA-fragment01, natural sciences)

The contextual sense of antidote is ‘something that helps to improve the effects of something bad or negative’. It is sufficiently distinct from the basic sense, ‘a substance that prevents a poison from having bad effects’ and can be compared to it. Not surprisingly, it was unanimously coded as metaphor-related in the reliability test. The pronoun that in sentence (25) is an implicit metaphor by substitution. It substitutes the noun antidote that has been identified as metaphor-related before, and therefore receives the label ‘implicit metaphor’.

When relating the meaning of antidote to our observations about the discourse function above, we note that here the author presents ‘something that helps to improve the effects of something bad or negative’ directly after a cluster of metaphorical items co-referring to myth. By implication, the views criticised as myths might not only appear as ‘bad or negative’, but bear further connotations related to the basic sense of antidote (‘a substance that prevents a poison from having bad effects’). It should be briefly mentioned here that research on linguistic stance can be used as support for such an assumption: Charles (2003) examines the use of nouns for stance construction in thesis writing with regard to encapsulation by determiners, as is the case for these myths (20, 25) or this antidote (7) above, concluding
that this use of nouns is an important resource for convincing argumentation and stance expression. Discussing this in more detail would go beyond the scope of this chapter, but it is clear that further analysis is needed to connect metaphorical word usage to the rich, rhetorically marked text structure and against the background of genre. This might deliver more definitive evidence for persuasive discourse function and stance construction. Such an analysis can rely on MIPVU as a reliable and accurate tool for detecting different types of metaphoricity, such as indirect and direct word usage and explicit and implicit metaphorical meaning.

The next and last example of this chapter displays both direct and possibly indirect word usage related to metaphor. The sentence stems from a chapter in Fossils: The Key to the Past, a textbook on palaeontology. The chapter is entitled Bringing Fossils Back to Life.

(26) “Poplar leaves have an elegant outline resembling that of an Arab minaret.” (AMM-fragment01, natural sciences)

The first lexical unit discussed here is elegant, with the contextual meaning of ‘elegant places and things are attractive because they are beautiful in a simple way’. We might intuitively look for a more basic meaning related to human entities, and actually find ‘an elegant person is attractive and graceful in their appearance and behaviour’. However, this sense is signalled as not being sufficiently distinct from the contextual sense by both Macmillan (where it is subordinate to our contextual meaning), and Longman (where elegant is actually monosemous: ‘beautiful, attractive, or graceful’). Therefore, elegant is not related to metaphor.

In sentence (26), we identify a local referent shift, from poplar leaves to Arab minaret. The lexical units Arab and minaret are incongruous with the overall topics of palaeontology in general and poplar leaves in particular. However, we see that the incongruous lexical units can be integrated within the overall referential framework by means of comparison, signalled by the verb resemble (‘to be similar to someone or something, especially in appearance’), used in present participle form. In the next step, we see that the comparison is nonliteral or cross-domain, with the outline of the fossilized leaves belonging to the domain of plants and the outline of a minaret belonging to a highly salient object (minaret) from the distinct domain of (religious) architecture. The [outline] of an Arab minaret thus indicates the source domain, compared to the target domain expression outline of poplar leaves, with elegant being a property of both. We therefore identify resembling as an MFlag and code the entire collocation that of an Arab minaret as a ‘direct metaphor’, labelling the source domain of the provisionally sketched conceptual mapping as ‘architecture’.

Direct use of lexical units related to metaphor may frequently be related to a didactic function in academic discourse, especially within the present genre, that
of the textbook. In the above example the direct metaphor probably has the goal of facilitating visualization for the (novice) reader. In view of the need for further corpus-linguistic exploration of the distribution of similes and simile-like utterances in academic discourse (cf. Low in press), MIPVU also offers a reliable procedure for the identification of direct metaphors in discourse.

6.4 Conclusion

In this chapter, we have shown how MIPVU serves the identification of various cases of linguistic metaphor in academic discourse. Our primary goal was to run the procedure for a variety of case studies of academic discourse. We demonstrated how MIPVU accounts for the particularities of the register with the aim of providing researchers of metaphor in academic discourse with a reliable and fine-tuned tool.

One prominent feature that distinguishes academic discourse from the other registers at this level of analysis is the comparatively high proportion of cases of lack of unanimity in our reliability tests. In absolute terms, the instances of lack of agreement are still rather small in number, but in comparison with performance in news, fiction, and conversation, their relative frequency is striking. We interpret this finding as at least partly reflecting the specific nature of academic discourse, in particular the technical vocabulary of particular disciplines and the role that expertise might play in the usage of specific lexical items. In our group discussions, we observed that differences in prior knowledge and/or intuitions about contextual and basic meanings affected individual decisions. This eventually reinforced our policy of assuming a general reader, with the systematic utilization of a corpus-based learner’s dictionary, as a norm. Recruiting specialised and diachronic dictionaries was not practicable for our particular goal, which after all is to produce annotations in a corpus of a reasonable size that is not limited to academic discourse.
Other studies using MIPVU for metaphor identification in the academic register might, however, benefit from gathering information from more encompassing sources. Since highly conventionalized scientific terms are often metaphorical due to diachronic variation and change, including the etymological dimension might be one possible variation on MIPVU.

There is not one “academic discourse”, but a number of specialized subfields with different metaphorical word usages. Within our research, we could account for this fact by employing the labels given by BNC, but not much more. Further research is needed here to examine the specific metaphorical word usage in different subdomains of academic discourse. Similarly, only little work on the relation between academic discourse(s) and popular science has so far been based on
word-by-word examinations of metaphor (e.g. Knudsen 2003; Low 2005; Semino 2008; and for a slightly different approach the work by Nerlich and colleagues, e.g. Larson, Nerlich & Wallis 2005; Nerlich & Halliday 2007). Our range of examples includes straightforward identification of words related to metaphor on the one hand and cases that demand special methodological attention on the other. We have shown how MIPVU caters to specific and less frequent instances of metaphorical word usage in academic discourse, such as implicit meaning and direct metaphor. Cases can also be roughly related to discourse functions, with some technical terms indicating scientific models (charge, natural selection, role, stage), other lexical items possibly related to strategic word choice for persuasion (myth, antidote), and yet other words with “spatial” basic senses (this, that, viewpoint, point of view, point, where, here) serving the creation of textual cohesion. However, we are aware that only a full-fledged discourse-linguistic analysis can provide statistically grounded evidence on the discourse functions of metaphorically related words.

Chapter 7

Metaphor identification in Dutch news and conversations

7.1 Introduction

MIP was originally developed to analyse metaphor use in English discourse, and MIPVU was based on that starting point when we analysed texts from the BNC-baby corpus. When we began to look at metaphor in Dutch, the question arose whether MIPVU could be adopted wholesale or whether adjustments were needed. This chapter reports on our experiences in this area, in a research project on metaphor in Dutch discourse. Parallel to the English language project we have set up a Dutch component in which a corpus containing Dutch news texts and conversations has been analysed for words related to metaphor. To be able to carry out a similar systematic analysis for Dutch discourse, MIPVU was further specified where necessary in order to be applicable to Dutch as the target language. These specifications did not so much concern the different steps in the procedure, as some of the detailed information on multi-word units and the use of a Dutch dictionary.

This chapter will therefore give an overview of the methodological issues in the Dutch research project. It will establish that, as a rule, MIPVU can be directly applied to Dutch discourse. However, the chapter will also point to problems that have cropped up in relation to MIPVU. The procedural problems will predominantly be language-specific, and will highlight some of the differences that occurred between Dutch and English. In addition, register-specific issues will be raised to discuss semantic issues in relation to the comparison between news discourse and spontaneous speech. These issues will show that, although we are dealing with a language other than English, register-specific problems can occur in similar manners as reported in Chapters 3 and 4; these can be accommodated by applying adjustments that are related to the Dutch language.
The procedure that has functioned as a basis for metaphor identification in the Dutch corpus takes as a point of departure the same assumptions as were made for the grammatical structures and lexical characteristics of English. The premises that were specific to English have been explicated in different reports on the identification method (cf. Pragglejaz Group 2007; Steen, Biernacka et al., in press), where linguistic phenomena like phrasal verbs, compounds, new-formations, grammatical words and word class are all discussed in relation to the method.

But although the languages of English and Dutch overlap on many fronts, there are also important differences. Crucial grammatical dissimilarities have been reported in contrastive grammars of Dutch and English. Aarts and Wekker (1987), for instance, point to the differences in the occurrence and use of demonstrative pronouns and relative pronouns, and indicate differences between certain prepositions in Dutch and English. Other linguistic differences have been discussed in a number of studies on Dutch constructions, such as Booij’s work on constructional idioms (with fixed prepositions) and on separable complex verbs (cf. Booij 2002a, b; Verhagen 2005). As the discussions in Sections 7.3 and 7.4 below will show, some of these essential grammatical features of Dutch have proved to influence the way a method like MIPVU can be applied. Despite the differences in linguistic features between English and Dutch, a brief glance at the agreement percentages from the reliability tests reveals that the analysis of Dutch discourse yields roughly the same figures as for English, both on the general level of language and on the specific level of registers. These figures give a first indication that MIPVU can work in similar ways for the two languages. For each reliability test, three native speakers of Dutch, all members of the two research projects and co-authors of this book, carried out the metaphor identification tasks. On average, 80% of all lexical units were unanimously judged to be non-metaphorical; 83.5% for conversation and 76.2% for news. On average, 12.1% were unanimously judged as metaphor-related; 8.7% for conversation and 15.7% for news. This means that there was an overall lack of unanimous agreement on the metaphorical status of lexical units for less than 10% of the total number of units. 
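The statement that fewer than 10% of lexical units lacked unanimous agreement follows from the two unanimity percentages by simple subtraction. The short Python sketch below makes that arithmetic explicit. The dictionary and function names are illustrative and not part of MIPVU, and the per-register residuals are derived here on the assumption that the register percentages are computed over the lexical units of each register, which the text does not state explicitly.

```python
# Unanimity percentages as reported in the text (percent of lexical units).
unanimous_non_metaphor = {"overall": 80.0, "conversation": 83.5, "news": 76.2}
unanimous_metaphor = {"overall": 12.1, "conversation": 8.7, "news": 15.7}


def lack_of_unanimity(register):
    """Percentage of units on which the three analysts did not all agree."""
    remainder = 100.0 - unanimous_non_metaphor[register] - unanimous_metaphor[register]
    return round(remainder, 1)


# Derived values, not figures reported by the authors.
residuals = {name: lack_of_unanimity(name) for name in unanimous_non_metaphor}

for name, value in residuals.items():
    print(f"{name}: {value}% without unanimous agreement")
```

Under these assumptions each residual stays below the 10% mentioned above; the more detailed reliability results remain those presented in Chapter 8.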
These figures are comparable to the percentages reported for the reliability tests carried out on the English materials, and any differences that may already strike the reader as interesting may be due to various factors, as we shall discuss below. More detailed reliability results are presented in Chapter 8.

The next sections will highlight some of the difficulties that we encountered with respect to the methodological issues and particular characteristics of Dutch discourse. These difficulties had to do with the operational design of the corpus, with the lexicographical tools used for the procedure, and with grammatical differences that influenced parts of the method. In addition to that, however, the great benefits of the availability of a method such as MIPVU will be highlighted, illustrated by examples of clear agreement.

7.2 Operational issues

7.2.1 The corpus: News and conversation

In spite of the fast rise of corpus linguistics, the large majority of corpora still consist of English discourse. In the absence of a fully coded corpus of Dutch texts
that could cater to our needs, we had to design one ourselves. In total, about 100,000 words of recent conversation transcripts and recent news texts were needed. Half of the corpus, some 50,000 words of conversation, was taken from the existing CGN corpus (Corpus Gesproken Nederlands [corpus of spoken Dutch]). This corpus was designed at the University of Nijmegen between 1998 and 2004, and consists of nine million words of spoken Dutch in different settings, such as interviews, meetings, television broadcasts and spontaneous speech (Oostdijk 2002). We randomly selected 29 conversations classified as spontaneous speech, adding up to roughly 50,000 words in total. The existing corpora of Dutch news texts do not cover a broad selection of newspapers but predominantly focus on one newspaper or one region, and thus were not suitable for our kind of research. We therefore decided to select 50,000 words of digitally available news texts from 2002 (in Lexis Nexis), spread over five national newspapers (Algemeen Dagblad, NRC Handelsblad, Telegraaf, Trouw and de Volkskrant) and different newspaper sections, to get a good overview and a general representation of newspaper texts. One concomitant problem with designing the news part of the corpus ourselves was that whereas the conversation transcripts were enriched with additional information such as part-of-speech tags and lemma tags, the newspaper texts did not contain this kind of vital information. We therefore had them lemmatised and tagged with the same tagging system used for the CGN corpus to enable appropriate comparison (Van den Bosch, Busser, et al. 2007). The final result was a corpus of XML texts enriched with additional codes such as part-of-speech tags and lemma tags. The CGN tagging system was designed for transcripts of spoken language, but was applied to written discourse in our situation. 
This inevitably yielded a somewhat less reliable result for part-of-speech and lemma assignment than the conversation transcripts, with incorrect codes on different levels. However, in the majority of the cases, this did not significantly interfere with the metaphor analysis. In cases where part-of-speech tags or lemmas were incorrect and of importance for the identification of metaphor, a comment was added to the word in question. In Example (1), for instance, the word geregeld was tagged by the system as the past participle of the verb regelen.

(1) Winkelwagens en stalen platen tussen de rails zorgen geregeld voor levensgevaarlijke situaties.
    ‘Shopping trolleys and steel plates between the rails care regularly for life-threatening situations’
    (Algemeen Dagblad)
    ‘Shopping trolleys and steel plates between the rails regularly create life-threatening situations’

Although geregeld is in some contexts the past participle of the verb regelen ‘arrange’, it is also an adverb or adjective in other contexts. In the case of (1), geregeld is the adverb with the meaning ‘regularly’. If in this case we stuck to the information
given in the part-of-speech tag and the lemma tag and saw this instance as the past participle of the verb, then it might in principle be analysed as a metaphorically used word. This is because the verb regelen has a more concrete sense (that of physical arrangement of objects) than what is meant in the context of (1). However, the tagging system assigned the wrong part of speech and lemma to the lexical unit, so that we need to examine the adverb meanings of geregeld. As a result, the word is not identified as related to metaphor. In cases like (1), where an incorrectly assigned part-of-speech tag can cause confusion with respect to metaphor analysis, we have added a comment which denotes the correct part of speech and lemma.

7.2.2 Van Dale dictionary and its implications

It is important to clarify, in the procedure, the use of the dictionary and its contents. We have used the electronic version of Van Dale Groot woordenboek der Nederlandse taal (Den Boom & Geeraerts, 2005) as our reference tool, in particular for finding contextual meanings and basic meanings of lexical units. This is a historically based reference dictionary. The ideal dictionary for our kind of research would have been a corpus-based dictionary. MIPVU makes use of Macmillan, a dictionary based on the large World English Corpus. This dictionary represents British English language as it is used in everyday speech and writing, being based on naturally produced contemporary language which has been collected in the World English Corpus. For Dutch, there is no such thing as a corpus-based dictionary, so we are forced to make use of the best alternative that is available, Van Dale.

There are some key differences between a corpus-based dictionary like Macmillan and the historically-based Van Dale dictionary with regard to lemma entries. Firstly, Van Dale is a dictionary that includes archaic word meanings as well as obscure meanings that are rarely used.
Since we do not want to include archaic meanings in our analysis, we add an explanation to the Dutch version of MIPVU that all meaning descriptions labelled verouderd (archaic) must not be taken into account. Rare uses of words that have not been labelled verouderd, by contrast, are assumed to be present in the language of the current user, even if they are rare. These have thus been taken into account when looking for possible metaphorical meaning. In some cases, however, a seemingly basic meaning turns out to be a rarely used meaning. An example of this is the Van Dale description of the verb ontwikkelen (‘develop’). One of the listed meanings was ontvouwen, loswikkelen (‘unfold, unwrap’), with the example sentence de jonge blaadjes ontwikkelen zich (‘the young leaves unfold themselves’). This sense description in itself seems fairly concrete, as is the example sentence. Yet, corpus research will most certainly show that if and when a sentence like de jonge blaadjes onwikkelen zich is used, the apparent

concreteness of the definition becomes questionable. Although ontwikkelen is a frequently used verb, also in combination with plants, it is often used in the sense of ‘growing’ and ‘coming into existence’, the more general meaning of ontwikkelen. So even though the verb ontwikkelen appears frequently in the vicinity of jonge blaadjes or a word denoting similar parts of plants, the idea that it is then used in a concrete sense of unfolding is quite implausible.

The main difference between a historically based dictionary and a corpus-based dictionary can be illustrated by comparing the description given above of ontwikkelen to the description of develop in Macmillan. For develop, the literal action of unfolding a concrete object is not overtly present in either one of the sense descriptions. The first definition, ‘if people, animals, or plants develop, they change or grow as they get older’ has to do with plants and other living entities, but describes the process in a general manner. Here, it is impossible to see a more basic sense that has to do with plants or leaves that are literally unfolding, although that stage might be involved in the whole process of developing. One reason why Macmillan does not overtly distinguish the sense of ‘unfold/unwrap’ is perhaps that it is not found in such a detailed way in the World English Corpus. In addition, all senses of develop in relation to leaves and plants amount to the general meaning of growing from or into something. This meaning of develop is similar to Dutch ontwikkelen, which in essence appears most frequently in the sense argued above. Since the Dutch Van Dale dictionary is not based on a large corpus, the entries cannot be checked in such a manner. We should therefore take into consideration the possibility that the word ontwikkelen has the different meanings described in Van Dale, and base our judgement of basic and contextual meanings on these entries.
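The rule on dictionary labels stated above (senses labelled verouderd are excluded, while rare but unlabelled senses stay in play) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sense` record, the `candidate_senses` function and the toy entry are hypothetical, and only the wording ontvouwen, loswikkelen is quoted from the text; it does not reflect the actual data format of the electronic Van Dale.

```python
from dataclasses import dataclass

@dataclass
class Sense:
    number: int
    definition: str
    labels: frozenset = frozenset()  # usage labels attached to this sense

def candidate_senses(senses):
    """Keep every sense except those labelled 'verouderd' (archaic).

    Rare senses without that label are kept, following the rule in the text.
    """
    return [s for s in senses if "verouderd" not in s.labels]

# Hypothetical toy entry; senses 2 and 3 are invented placeholders.
ontwikkelen = [
    Sense(1, "ontvouwen, loswikkelen"),
    Sense(2, "placeholder for the general sense of growing"),
    Sense(3, "placeholder for an archaic sense", frozenset({"verouderd"})),
]

kept = candidate_senses(ontwikkelen)
print([s.number for s in kept])  # [1, 2]
```

The filter deliberately does nothing about rarity: as the discussion of ontwikkelen shows, a rarely used sense remains a candidate for the basic meaning as long as the dictionary does not mark it as archaic.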
Another important feature of Van Dale that can cause problems when searching for metaphor-related words is the fact that some entries have been defined by nominalisations. As a consequence, it is sometimes hard to decide whether the noun, like the verb, has one clearly basic meaning and several derived, possibly metaphorical, meanings, or whether it has simply one general and vague meaning. This can yield problems in finding a clear basic meaning, and has been a reason for disagreement among the analysts in the reliability tests that we carried out. An example of such a vague description is the noun aanpak (‘approach’), where the definition simply is the action of the verb, het aanpakken, wijze van aanpakken (‘the approaching, manner of approaching’). Looking up the verb aanpakken in the dictionary, we see that it has a clear basic meaning and clear derived, and in some contexts metaphorical, meanings. However, it is difficult to make a decision on the uses and meanings of the noun aanpak if it is described in such a general way. It could be said that it clearly derives from the verb, which has both concrete and abstract meanings, and it could be concluded that the same can then be said of

the noun. However, it is also possible to say that the noun has one general meaning, that it is therefore monosemous, and that it is then not necessary to make a distinction between concrete and abstract uses. If the last option is chosen, some possible metaphorical meanings may be lost. The reliability tests show that the analysts are not always in agreement about how to deal with these nouns. In addition to the vague meaning descriptions, this sometimes also has to do with the influence of native speaker intuitions. In the case of aanpak, for instance, the analysts may follow the one general meaning without going to the verb, because intuitively they judge that the noun is always used in an abstract way. For other nouns, for instance vervolg (‘continuation’) with the definition het vervolgen (‘process of continuing/prosecuting’), analysts may say that the noun can be and is used frequently in different concrete and abstract contexts, and it is thus necessary to refer to the verb senses to establish a basic and contextual meaning. In order to be systematic and to get reliable results from the procedure, we have decided to look only at the definitions of the nouns in instances like aanpak and vervolg, and not to include the senses of the verbs from which they derive in our metaphor analysis. This decision is in accordance with the system proposed by MIPVU. Even though nominalisations like these can encompass concreteness to some extent, they have been defined by Van Dale as including all senses of the verb, and are seen as more or less general or vague. Thus, all instances of aanpak and vervolg in the corpus have been taken as used in such a sense.

7.3 Linguistic issues: Complex words and fixed expressions

7.3.1 Separable Complex Verbs

One specifically Dutch grammatical construction that is relevant to metaphor identification in discourse concerns separable lexical units. In the majority of the cases, Dutch lexical units consist of one word. But there is one important class of cases in Dutch where two words separated from each other by an indefinite number of other lexical units can still form one lexical unit: these are the so-called Separable Complex Verbs. These lexical units require their own treatment in the procedure. Separable Complex Verbs (henceforth ‘SCVs’) consist of two components, a particle and a verb. These components form a single word in the infinitive form, but are separated from each other in certain contexts. The following sentence contains a separated SCV, which is italicized:

(2) ingaan ~ Zondag gaat de dienstregeling in.
    in-go ~ Sunday goes the timetable in.
    ‘be effective ~ The timetable is effective from Sunday.’
    (de Volkskrant)

Note that the particle is preverbal in the infinitive, but postverbal in the finite form. In general terms SCVs are similar to the phenomenon of phrasal verbs in English, and pose similar problems concerning their fixed characteristics and the unit of analysis (cf. Pragglejaz Group 2007). Although this is true for both syntactic and lexical characteristics (Dutch SCVs are also frequently non-transparent in meaning), the fact that the Dutch verbs are separable changes their analysis. The examples below will illustrate this in more detail.

For all SCVs, the corpora do not contain technical indications in the part-of-speech tags and lemmatisation that show that the parts of an SCV, gaat and in in (2), are part of one lexical unit. The corpus documentation on the CGN tags briefly touches on the prepositions functioning as the non-verbal particles of SCVs, but they have not been given a unique tag that could have made them easy to recognise. Thus, it is not possible to distinguish an SCV’s particle from a preposition solely on the basis of part-of-speech tags. They have to be distinguished, therefore, with reference to dictionary information. The dictionary lists common SCVs as one unit. We therefore adopt the dictionary as our prime source, and analyse the parts of ingaan in (2) as constituting one lexical unit. This entails that ingaan is to be judged as one whole for its possible metaphorical meaning, and that it is taken to relate to one concept and designate one referent. SCVs can also be distinguished from frequently collocating verbs and prepositions on syntactic grounds. Then arguments and complements can offer clear indications. The nonverbal component is part of an SCV when it is an adposition, and when the verb is transitive. In that case, the particle cannot be a preposition since it does not function as the head of a prepositional phrase.
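The two criteria just described (the dictionary lists the combination as one unit, and the particle does not head a prepositional phrase) can be sketched as a candidate search over a lemmatised clause. The sketch rests on several assumptions: the tag names are simplified and hypothetical rather than actual CGN tags, `SCV_HEADWORDS` is an invented stand-in for the dictionary, the prepositional-phrase check is a crude approximation of the syntactic argument, and only finite main clauses (verb before particle) are handled. It yields candidates for the analyst to confirm, not decisions.

```python
from typing import List, NamedTuple, Tuple

class Token(NamedTuple):
    lemma: str
    pos: str  # simplified, hypothetical tags: "V", "PREP", "DET", "N"

# Hypothetical stand-in for the SCV headwords listed in the dictionary.
SCV_HEADWORDS = {"ingaan", "omdraaien", "meezitten"}

def scv_candidates(clause: List[Token]) -> List[Tuple[int, int, str]]:
    """Return (verb index, particle index, SCV lemma) triples for one clause."""
    found = []
    for i, verb in enumerate(clause):
        if verb.pos != "V":
            continue
        for j in range(i + 1, len(clause)):
            part = clause[j]
            if part.pos != "PREP":
                continue
            lemma = part.lemma + verb.lemma  # particle is preverbal in the infinitive
            if lemma not in SCV_HEADWORDS:
                continue
            # Crude syntactic check: a preposition that opens a prepositional
            # phrase is followed by a determiner or noun, so it is no particle.
            nxt = clause[j + 1].pos if j + 1 < len(clause) else None
            if nxt in ("DET", "N"):
                continue
            found.append((i, j, lemma))
    return found

# Clause (2): Zondag gaat de dienstregeling in.
clause_ingaan = [Token("zondag", "N"), Token("gaan", "V"), Token("de", "DET"),
                 Token("dienstregeling", "N"), Token("in", "PREP")]
# A clause in which om heads a prepositional phrase: De aarde draait om de zon.
clause_pp = [Token("de", "DET"), Token("aarde", "N"), Token("draaien", "V"),
             Token("om", "PREP"), Token("de", "DET"), Token("zon", "N")]

print(scv_candidates(clause_ingaan))  # [(1, 4, 'ingaan')]
print(scv_candidates(clause_pp))      # []
```

The second clause shows why the dictionary lookup alone is not enough: omdraaien is a listed headword, yet om is rejected because it introduces a noun phrase.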
Two examples of a combination of draaien (‘turn’) and om (‘around’) will illustrate the major syntactic differences between an SCV and a prepositional verb (invented examples of frequently occurring combinations of a verb and a preposition).

SCV omdraaien (around-turn; ‘turn’)

(3) De man draait de knop om
    The man turns the switch around
    ‘The man turns the switch’

Prepositional verb draaien om (turn around; ‘revolve around’)

(4) De aarde draait om de zon
    The earth turns around the sun
    ‘The earth revolves around the sun’

In (3) we are dealing with the transitive verb omdraaien, defined in Van Dale as ‘draaiend bewegen, wenden, van stand doen veranderen’ (‘move while rotating, turn, change position’). Draaien and om are separated when the verb is used in a

head clause, and together take two arguments, the subject de man and the direct object de knop. The word om should be seen as part of a complex verb (cf. Booij 2002b; Blom 2005). Sentence (4) contains the intransitive verb draaien, defined in Van Dale as ‘zich rond een as of een middelpunt bewegen’ (‘move around an axis or central point’). It takes one argument in the form of the subject de aarde, and is directly followed by the prepositional phrase om de zon. The preposition om collocates with the verb, but is head of the prepositional phrase. In the case of draait om de zon, draaien and om collocate; they co-occur frequently in similar contexts. However, they are not part of one complex verb; the preposition om is the head of the prepositional phrase om de zon, with the noun phrase de zon as the complement. The importance for metaphor analysis is illustrated by (5).

(5) ‘Het kost best wat moeite de knop om te draaien’.
    ‘It costs quite some effort the switch around to turn’.
    ‘‘It takes quite some effort to flick the switch’’.
    (Algemeen Dagblad)

This sentence, from one of the corpus texts, occurs in a context in which a goalkeeper talks about switching mentally from one situation to another. It contains the same SCV as in Example (3), omdraaien, but in a different sense. It is a metaphorically used word, where the abstract target domain of the mind is represented in terms of the concrete source domain of objects. Following our metaphor identification procedure, the SCV omdraaien in (5) is analysed as one metaphorically used word. If we took the verb draaien as one unit and the particle om as another, this could lead to different (and, in our opinion, inaccurate) judgments about metaphorical language use. In particular, we might code draaien (‘to turn’) as metaphorically used, since objects are not literally turned, and om (‘around’) as metaphorically used as well, since objects do not literally change directions. If we disregarded the concept of SCVs and took all separate words as separate units of analysis at all times, this would increase our final set of observed words related to metaphor. Again, there is a clear connection between the instructions on the treatment of phrasal verbs in MIP and MIPVU and what is being discussed here; they are considered in equal measure as one lexical unit when analysed for their metaphorical meaning. Since SCVs are ubiquitous in the Dutch language too, they need to be carefully identified and analyzed. An example of a non-metaphorical SCV can be found in the sentence below:

(6) Na een week van tegenslag zat het Richard Groenendaal (…) eindelijk een keer mee.
    After a week of setbacks sat it Richard Groenendaal (…) finally once with.
    ‘After a week of setbacks Richard Groenendaal (…) finally had a bit of luck.’
    (de Volkskrant)

Example (6) contains the SCV meezitten, separated in the head clause into the verbal part zat (past tense of zitten ‘sit’) and the particle mee ‘with’. According to Van Dale, meezitten is monosemous, and defined as ‘gunstig zitten, goed gaan’ (‘be favourable, go well’). Zat and mee are here identified as two parts of one complex verb. If they are identified as two separate units of analysis, we could ascribe possible metaphorical meaning to one or both parts, since they are then not used in their basic sense, but in a more abstract way. However, since they are separate parts combining into one SCV, we do not take the words separately; and since the SCV has one abstract sense in the dictionary, it is not coded as related to metaphor. It is thus important to add information in the procedure on the treatment of SCVs as complex units to keep a consistently analyzed data set. This essentially comes down to adding an explanatory section on SCVs and their features.

7.3.2 Polywords

There are a number of problems that arise when dealing with the somewhat related issue of polywords (e.g. of course in English, and met name [‘in particular’] or af en toe [‘now and then’] in Dutch). Whereas the BNC has added a list of fixed multiword expressions that have to be analysed as one lexical unit, and has coded the elements as such in the corpus, the Dutch corpus does not contain such a list and corresponding codes. In addition to the lack of a part-of-speech tag for polywords in the corpus, the Van Dale dictionary does not indicate when a combination is to be identified as a polyword either. The dictionary does explain the meaning of fixed multiword expressions such as met name under the head word entry name, which is an old inflection of naam (‘name’), but it does not include separate entries for these frequent polywords. To produce such a list from scratch, based on our own intuitions, lay outside the scope of our research.
Our procedure, together with the dictionary and the part-of-speech tags in the corpus, therefore requires that we analyse the separate parts of any potential multiword expressions that may qualify as polywords as separate units, which produces an artificial increase in the overall number of lexical units in the data in comparison with the English materials. An additional problem that arises from this decision is that in most cases it is extremely difficult to establish the contextual meaning of some parts, or all parts, of the multiword expressions, due to the grammaticalization of these expressions. If we fill in the different steps of the procedure for both met and name, it becomes

clear that it is difficult to establish a contextual meaning for the separate words when they are used as parts of a polyword-expression. Take for instance met name in the following sentence:

(7) (…) in de vorm van begrotingsoverschrijdingen, met name in de zorg.
    (…) in the shape of budget overspending, with name in the care.
    ‘(…) in the form of overrunning the budget, particularly in the care sector.’
    (NRC)

For the lexical unit met, establishing the meaning in the context above is complicated. This use of met is not mentioned in the list of possible meanings under the preposition met. We only come across it as a run-on under the entry of name. The closest possible meaning that is listed in the dictionary under the preposition met is 7 ‘ter aanduiding van een begeleidende omstandigheid, van de wijze waarop iets geschiedt, de gezindheid waarmee iets gepaard gaat’ (‘as an indication of an attendant circumstance, the manner in which something occurs, the inclination associated with something’), with examples such as ‘met opzet iets doen’ (‘do something on purpose’). However, this definition does not correspond accurately with met in met name. If we assume that met displays some abstract meaning in our example, then determining if met has a more basic contemporary meaning is less difficult, and in fact yields two possibilities. The basic meaning of met is easy to find in the dictionary, described in sense 1: ‘(ter aanduiding van een vereniging of begeleiding) in gezelschap van’ (‘(as an indication of a joining or an accompaniment) in the company of’). In addition, sense 8 can also be seen as basic: ‘ter aanduiding van het werktuig, het middel waarmee iets geschiedt, door middel van’ (‘as an indication of the instrument, the tool with which something is done, by means of’). The next step in the procedure is to decide whether the contextual meaning contrasts with the basic meaning but can be understood in comparison with it. Since we cannot find a suitable contextual meaning in the dictionary, it is impossible to find a contrast as well as a comparison between the basic and the contextual meaning. Since there is no clear contextual meaning with which to contrast and compare the basic meaning of met, we cannot safely conclude that met in the context in (7) is metaphorically used.
Thus, we would not mark the preposition met as metaphorical since we cannot find a suitable contextual meaning. For the lexical unit name, both the contextual and the basic meaning are easy to establish. As far as the contextual meaning is concerned, the dictionary lists only one meaning, which is a description of the use in fixed expressions: ‘verbogen vorm van ‘naam’, alleen in vaste verb.’ (‘inflected form of ‘name’, only in fixed combinations’). The examples in Van Dale of the fixed expression in which name is used refer to the basic meaning of the uninflected form of the word, naam, which is ‘woord waarmee een persoon of zaak wordt aangeduid, hetzij als categorie of

als individu’ (‘word that denotes a person or entity, either as a category or as an individual’). For met name in particular, the dictionary makes explicit that name here refers to the basic meaning of its uninflected form, being used to name the entities to which met name refers. The contextual meaning of name should be seen as a specific use of the noun naam, the inflected form, and not so much a different sense of the word. The next step of the procedure consequently has a straightforward solution: the basic and contextual meanings are the same in this case, since the word name is used only in fixed expressions. Therefore, there is no need to contrast and compare them. And since the lexical unit is only used in fixed expressions such as met name, name in the context in (7) is not metaphorically used. This analysis suggests that the demarcation of lexical units can be problematic when we have to deal with the separate units of polywords such as met name. However, since this research project does not have any technical means to recognise two or more words as one polyword, the only systematic solution that we can come up with is to analyse the separate parts as separate units, and to go through the procedure of metaphor identification for each of these units. When interpreting the findings, we then have to take into account a small measure of artificial increase in the number of lexical units that has resulted from this treatment. The clear documentation of polywords in English in the BNC list is of no help here either, since we can assume that polywords and their forms are language-specific. Although the entries in the dictionary suggest that it is not possible to see a contrast and comparison for either met or name, two of the three analysts involved in the reliability tests coded met as a metaphor-related word, and one analyst coded name as a metaphor-related word in (7). 
In the case of met, an explanation could be that even though there is no clear contextual meaning, the analysts still believe that it is so different from the basic, concrete meaning that there could be a possible mapping between the overarching domains of abstractness and concreteness. In the case of name, the analyst might have interpreted the basic meaning of naam as different from the description for name and concluded that this is very concrete in comparison to the abstract use in met name, basing her conclusion on the potential for mapping between abstract and concrete. Polywords, as opposed to SCVs, thus remain problematic in the present research; the elements are analysed as separate units in accordance with the identification procedure, but the possibly metaphorical status of the units remains difficult to establish. What the previous sections have illustrated is that MIPVU and MIP, which have both been developed to analyse English discourse, need adjustments when they are applied to a language other than English. MIPVU is explicit in its use of lexical tools and in the treatment of specific lexico-grammatical features of English, which has been explained in Chapter 2. These explicit instructions do not all apply to a different language, such as Dutch, and can thus not be taken over literally when doing metaphor identification in a language different from English. Important

tools such as dictionaries work differently for Dutch, and can complicate matters. The same holds for typical Dutch lexico-grammatical features, which should be treated in specific ways in relation to metaphor identification. These problems are not only present in Dutch discourse; similar issues will occur when analysing discourse in other languages. Yet the most important conclusion remains that the basic steps that MIP and MIPVU present can be equally applied to languages other than English. Although it is necessary to specify certain operational issues and language-specific points, the basic steps of finding the contextual and basic meanings of a lexical unit, and establishing the possible contrasts and comparisons between them can be done in similar ways. The section below will illustrate this by discussing some applications to Dutch in more detail.
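The order of these basic steps, as applied to the cases discussed in this section, can be summarised in a small Python sketch. It is a simplification under stated assumptions: the function name `mipvu_steps`, its return labels and the meaning paraphrases are hypothetical, and the judgements of contrast and comparison are supplied by the analyst as inputs rather than computed.

```python
from typing import Optional

def mipvu_steps(contextual: Optional[str], basic: Optional[str],
                contrast: bool = False, comparable: bool = False) -> str:
    """Order of the basic steps; contrast/comparable are analyst judgements."""
    if contextual is None:
        # No contextual meaning can be established, as for met in (7).
        return "unmarked"
    if basic is None or basic == contextual:
        # No distinct, more basic meaning, as for name in (7) or meezitten in (6).
        return "not metaphor-related"
    if contrast and comparable:
        return "metaphor-related"
    return "not metaphor-related"

# Paraphrased meanings, used only to illustrate the control flow.
met_in_7 = mipvu_steps(None, "in gezelschap van")
name_in_7 = mipvu_steps("inflected form of naam", "inflected form of naam")
omdraaien_in_5 = mipvu_steps("switch mentally from one situation to another",
                             "draaiend bewegen, wenden, van stand doen veranderen",
                             contrast=True, comparable=True)

print(met_in_7, "|", name_in_7, "|", omdraaien_in_5)
```

The language-specific adjustments discussed above sit outside this function: they determine what counts as one lexical unit and which dictionary senses are available as inputs, while the sequence of steps itself stays the same.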

7.4 Dutch metaphor analysis: Agreement and disagreement

7.4.1 Dutch discourse and agreement

As the percentages in the introduction indicate, the three analysts who have worked on the Dutch texts in the reliability tests agree about the status of more than 90% of the words. The overall percentage of unanimous agreement on the status of the lexical units is roughly the same for both registers; news obtained an agreement percentage of 91.9 and conversation an agreement percentage of 92.2. These percentages of unanimously coded words, both metaphor-related and not metaphor-related, are slightly lower than for the news and conversation registers in the English language project. This may be due to the fact that the analysts encountered the operational problems listed above, having to make the transition from the procedure intended for English to the procedure adjusted for Dutch. In addition, grammatical differences discussed in the sections above were also influential. The numbers still give a clear indication of how well the procedure works for another language than English. This section will highlight a number of points of agreement in relation to the procedure for Dutch.

A clear example where complete agreement was reached is the use of the word slachtoffer (‘victim’) in certain contexts. In one of the news texts used for the reliability tests, the word slachtoffer occurred in the following context:

(8) Het slachtoffer van de schietpartij in de Bloedstraat zou volgens ooggetuigen in gezelschap zijn geweest van een man (…).
    The victim of the shooting in the Bloedstraat must according to eyewitnesses in the company be been of a man (…).
    ‘The victim of the shooting in the Bloedstraat apparently was, according to eyewitnesses, in the company of a man (…).’
    (Telegraaf)

In English, slachtoffer would be the counterpart of victim in this context. In Dutch, however, slachtoffer can be used in another concrete sense; it is also the term for a sacrificial animal, related to different religious traditions. This sense of slachtoffer can be seen as more basic; it is older and was the original meaning of the word. Since both senses in the dictionary are concrete and animate, but occur in different domains, the oldest meaning is taken as basic. All three analysts agreed for slachtoffer in this context, and in similar contexts, that it was a metaphor-related word. This is based on the idea that the two senses belong to different and distinct domains, and they took the domain of religion, and thus the meaning of sacrificial animal as basic. This agreement might not have been reached if analysts had analysed this sentence without the procedure; slachtoffer in the sense of victim is more commonly used than in the sense of sacrificial animal, even though the first is a derivation of the latter. The basic steps of the procedure thus resulted in unanimous agreement for this particular case in Dutch.

A similar pattern of agreement occurs for one particular demonstrative, namely dat (‘that’). Although not all instances of dat were agreed on in the reliability tests, the large majority were analysed in the same manner by all three analysts, and in both registers. In contrast, the analysts had more difficulty with the other demonstratives, and in particular with die (‘that’). Two examples of dat in context can be found in the following utterance from one of the conversations:

(9) toen ging ik dat ‘ns uitproberen maar eer thee afgekoeld is op drinktempe dat duurt tien minuten.
    then went I that once try out but before tea cooled down is at drinking temp- that takes ten minutes.
    ‘then I went to try that out but before tea is cooled down to drinking temp- that takes ten minutes.’
    (CGN-fn000745)

All coders analysed the two instances of dat as related to metaphor, taking as the basic meaning the first sense in the dictionary, ‘wijst iets aan dat zich niet in de onmiddellijke nabijheid van de spreker bevindt’ (‘pointing to something that is not in the direct neighbourhood of the speaker, but is farther removed in space (as opposed to dit [this])’). Since the definition refers to objects in a space removed from the speaker, the concrete meaning was seen as the basic meaning. In the above context, both instances of dat refer to a previously mentioned action, the drinking of tea and the cooling down of tea respectively. The contextual sense is then the third sense in the dictionary, dat referring to something previously mentioned (‘ter aanduiding van iets dat tevoren genoemd is’). All analysts saw a contrast and comparison between these two senses, and between the domain of objects and space, and words and discourse.

The examples of slachtoffer and dat illustrate how, with an explicit procedure, different analysts tend to come to the same decision. The decision is reached regardless of the exact relationship between different senses and domains, and regardless of the possible underlying mapping between the two senses. This is precisely the point of the procedure: to come to an initial conclusion about the metaphor-related status of a word without directly having to explain what the exact underlying mapping and the analogy could be. In basic terms this should be doable in the same way for many languages.

7.4.2 Dutch discourse and disagreement

This section will detail some of the cases of disagreement in the reliability tests. Overall, the same patterns of disagreement were found throughout the process of identification, but the examples to be discussed here have been taken from the reliability tests, so as to be able to point to problems more accurately. The percentages mentioned in the introduction showed that the analysts disagree about the relation to metaphor for less than 10% of the lexical units in the texts. A closer look at where disagreement occurs shows that we can distinguish a number of problematic areas. It also reveals that there seems to be a difference in disagreement for the two registers: the majority of the cases of disagreement in news texts occur in delexicalized verbs such as hebben (‘to have’), whereas the disagreement in conversations predominantly occurs in demonstratives and other function words such as pronouns. One form of disagreement that occurs in both registers arises from the difficulty of establishing distinct meanings for some nouns and verbs. The major areas of disagreement will be explained in the following sections, and will be linked to the register they belong to. In addition, it will be shown that these problems, some of which have also been discussed in previous chapters, occur in both the English and the Dutch texts.
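The agreement and disagreement percentages referred to here count a lexical unit as agreed only when all three analysts assign it the same status. A minimal sketch of that count is given below; the function name `unanimous_agreement` and the toy codes are invented for illustration and are not data from the reliability tests.

```python
from typing import Sequence

def unanimous_agreement(codes: Sequence[Sequence[bool]]) -> float:
    """Percentage of lexical units on which all analysts gave the same code."""
    if not codes:
        raise ValueError("no lexical units to compare")
    unanimous = sum(1 for unit in codes if len(set(unit)) == 1)
    return 100.0 * unanimous / len(codes)

# Invented toy data: one row per lexical unit, one column per analyst;
# True means the analyst coded the unit as related to metaphor.
toy_codes = [
    (True, True, True),
    (False, False, False),
    (True, True, False),   # two of three analysts: counted as disagreement
    (False, False, False),
]

print(unanimous_agreement(toy_codes))  # 75.0
```

Both unanimous outcomes count towards agreement: units that all analysts mark as metaphor-related and units that none of them mark. A two-against-one split, as reported for several of the examples that follow, counts as disagreement.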
Delexicalized verbs in news

An area where the three analysts repeatedly disagreed is the area of delexicalized verbs. As the examples below will point out, disagreement here seems attributable to problems with establishing a clear contextual meaning for a word, specifically when the verb occurs in a more or less fixed expression such as gaan om (‘to involve’) in Example (10) or when the concreteness is unclear, as in Example (11).

(10) Volgens een ingewijde gaat het om een gewelddadige groep criminelen.
     According to an insider goes it about a violent group criminals.
     ‘According to an insider it involves a violent group of criminals.’
     (Algemeen Dagblad)

In this sentence the analysts were in disagreement about the metaphorical status of gaat (‘to go’). Two of the three analysts coded gaat as a metaphor-related word. A reason for this lack of agreement is that the analysts did not agree on the contextual meaning of the verb gaan when used in a fixed expression (but not an SCV or phrasal verb) such as gaan om. It is difficult to establish what gaan means in this case, and the person who did not code it as metaphor-related decided that the meaning of gaan here has faded to such an extent that it is not possible to see a mapping between the basic concrete sense of movement and a possible abstract target domain. The verb gaan in Example (10) has become more or less delexicalized, and has lost a clear and related meaning in a context like this. The difficulty in establishing a clear contextual meaning is the main reason for disagreement between analysts on the metaphorical status of delexicalized verbs. Example (11) is of a slightly different order:

(11) Het drietal heeft de Poolse nationaliteit.
     ‘The three people have the Polish nationality.’
     (Algemeen Dagblad)

Again, two of the three analysts coded heeft as a metaphor-related word. In this case, ascribing metaphorical meaning to heeft depends on how the analysts interpret Poolse nationaliteit. Since nationality is part of a person it might be seen as not related to metaphor. But it is also possible to say that it is something abstract that cannot be physically obtained or touched. Then heeft refers to an abstract form of having, where there is no question of a concrete object. In that case, the verb would be analyzed as metaphorically used. It seems that the issue of delexicalized verbs is specific to the news texts in the corpus. This may have to do with the occurrence of these verbs in fixed expressions. Such expressions as gaan om (‘to involve’) and others like te maken hebben met (‘to have to do with’) typically introduce an event or entity into a particular situation. The expressions typically describe what content we are dealing with (such as ‘violent criminals’ in Example (10)) and the context of the situation in the news report. The conversations used for the research are spontaneous ones which do not often report a particular event or situation in this way and therefore do not contain fixed expressions that introduce these referents. A question that analysts have to answer when dealing with these verbs, and one which can be answered in different ways, is whether it is possible to still speak of one basic meaning for frequent verbs such as gaan or hebben, and, more importantly, if these verbs have a clear contextual meaning when they are used in fixed expressions (cf. Cameron 1999; Deignan 2005, Chapter 2). Another problem analysts have to deal with is whether the objects of the verbs are concrete enough to say that these verbs are used in their concrete sense, something that is also discussed by Cameron (1999, 2003) in relation to the metaphorical meanings of some verbs.

Demonstrative die in conversations

Demonstratives are used frequently in conversations, usually in a deictic way. In principle, the demonstratives die, dat, deze and dit will be analyzed in a similar manner as their English counterparts discussed in Chapter 4. Van Dale describes the basic meaning of demonstratives in similarly concrete terms as Macmillan. For example, it gives as a basic meaning of die (‘that’): ‘referring to something further removed in space from the speaker than when using deze (‘this’).’ The words behave in the same manner in English and Dutch, and were analyzed in the same manner, but their analysis produced different results in the two projects. In particular, the demonstrative dat received high agreement figures, whereas die accounted for most of the disagreement that occurred in the conversation tests. Although it is true that there is no one-to-one correspondence between the demonstratives in Dutch and English, they do have roughly the same functions in conversations: in both languages, they are demonstrative determiners or demonstrative pronouns used in a deictic manner (see for a comparison: Aarts & Wekker 1987: 150–153; and for an overview of their functions in English and Dutch respectively: Biber et al. 1999 and Haeseryn, Romijn et al. 1997).

So let us examine some of the reasons for disagreement in the Dutch texts. Consider Examples (12) and (13). The numbers in brackets show how many analysts (three in total) coded die as a metaphor-related word.

(12) - oh ja die(1) ik vertelde daarnet die(3) mevrouw die(0) (CGN-fn000745)
     - oh yes that I told just now that lady that
       ‘oh yes that I was telling just now that lady who’

In sentence (12), disagreement occurred for the first die because the referent of the deictic pronoun was difficult to establish, partly due to the fact that the speaker hesitated before continuing the utterance in a similar manner. One analyst decided that, regardless of the hesitation and the aborted utterance, the demonstrative referred to mevrouw in the resumed utterance. However, the other analysts decided the reference was unclear, and disregarded the demonstrative for metaphor analysis altogether: they added the code DFMA. As we also saw in Chapter 4, hesitations and repetition such as in (12) are characteristic of spontaneous speech. Consequently, disagreement occurred in cases where it was difficult to establish the referent of a demonstrative. Another problem is illustrated by (13):

(13) - één van die(3) twee mevrouwen waar ‘k boeken voor haal (CGN-fn000745)
     - one of that two ladies where I books for get
       ‘one of those two ladies for whom I get books’
     - die(1) gehandicapt is
     - that disabled is
       ‘who is disabled’

In Example (13), the second die was coded as a metaphor-related word by one analyst. However, in this particular context, die is not a demonstrative pronoun but a relative pronoun. As Van Dale explains, relative pronouns are purely functional, and do not have meaning content. This particular case of die should thus not lead to a comparison with the demonstrative pronoun die. The interesting difference between English and Dutch in this respect, and possibly one of the reasons why there is more disagreement about the status of these Dutch demonstratives, is that die as a relative pronoun refers to both objects and persons. In English, when referring to a person, the relative pronoun used is usually who, and not that (see Aarts & Wekker 1987: 156). The reliability tests have shown that the two word classes in which die can occur are occasionally confused by some of the analysts. This can be seen in (13), where the second die is a relative pronoun and should thus not be coded as metaphorically related. The fact that die can occur in two different word classes, and that the referent is occasionally unclear, may have contributed to the disagreement figures.

Contrasting and comparing

A more detailed look at the areas of disagreement shows that some can be traced back to the procedural steps of contrasting and comparing meanings. It has become apparent that some of the less frequent words in news texts, the ones describing relatively abstract concepts, can cause trouble when it comes to determining either clear contrasts between contextual and basic meanings or a clear comparison between the two. One problem has to do with the level of distinctness between the numerous senses in the dictionary. When the following lexical units are analysed according to the procedure, a problem can arise for the word in italics:

(14) (…) de grootste hoeveelheid drugs ooit onderschept in Nederland. (Algemeen Dagblad)
     ‘(…) the largest quantity of drugs ever intercepted in The Netherlands.’

The contextual meaning of onderscheppen is listed under 1a, ‘(iem. of iets) beletten voort te gaan of zijn doel te bereiken’ (‘prevent someone or something from continuing or reaching his/its goal’). This is a slightly different meaning than the basic one, which is ‘(iem.) op zijn weg tegenhouden’ (‘to stop/obstruct someone on his way’) since it includes objects that can be intercepted. The two senses can clearly be compared with each other, since they describe the same action. Whether onderscheppen in (14) can be seen as metaphorically used, however, depends on whether it is possible to distinguish between intercepting a person and intercepting an object as distinct but comparable meanings. It could be argued that these

descriptions are two instances of the same actions, and that onderschept is therefore not metaphorically used. In the reliability test, the word onderschept was coded as a metaphor-related word by one analyst. The most obvious explanation of this analysis is the distinction made in the dictionary between ‘iemand onderscheppen’ and ‘iets onderscheppen’. In principle, the verb denotes the same action in both cases, but the object is different. The question arises if the objects, a person and a thing, can be seen as two distinct types of entities, which could result in a sufficiently distinct meaning between the two uses. In some analyses, we distinguish between persons and objects and say that they belong to two different domains. These cases, however, often involve human characteristics and emotions and are often related to personification. In the case of Example (14), both objects can be seen as concrete, and the action of the verb remains largely the same in each situation.

In other cases, two meanings of a unit can be contrasted with each other but it is difficult to set up a comparison. Consider for instance the following sentence:

(15) Uit afgeluisterde telefoongesprekken concludeert justitie dat
     From monitored phone conversations concludes police that
     de betrokkenheid van de twee ambtenaren (...) (Algemeen Dagblad)
     the involvement of the two civil servants (...)
     ‘From monitored phone conversations the police have concluded that the involvement of the two civil servants (…)’

The contextual meaning of the unit is ‘het betrokken-zijn in of bij iets’ (‘the involvement in or with something’). As a basic meaning of the unit we can take sense 2 in the dictionary, ‘bewolktheid’ (cloudiness), since that sense is related to a natural phenomenon which can be observed and which has a spatial characteristic. It is thus the most concrete sense of betrokkenheid. The two senses are sufficiently distinct and can thus be contrasted on different levels; the contextual sense is very abstract, and the basic sense denotes a concrete weather-related state. It is difficult, however, to see a comparison between the two senses. There does not seem to be one element of involvement that can be compared in a way to cloudiness. So although we can clearly find contrasts between the two senses, it is not possible to compare one or more elements with each other. One explanation for the lack of a comparison can be that the two instances of betrokkenheid stem from two different instances of the verb betrekken from which it derives. Betrekken in the weather sense is intransitive, whereas betrekken in the involvement sense is transitive. Even though Van Dale does not list these instances as separate entries, we can still see them as such. Therefore, the conclusion will be that betrokkenheid in (15) is not metaphorically used. The lexical unit betrokkenheid was coded as a metaphor-related word by one analyst. The disagreement among the analysts for this example results from the

apparent contrast between the two senses. Two analysts decided that the two meanings cannot be compared. The remaining analyst may have seen a possible mapping, possibly between the highest domains of abstractness and concreteness, and coded it as a possible metaphor. Sometimes it is difficult to pinpoint what analysts can make of the word definitions given in Van Dale.

The problems listed above are general problems that occur in both registers analysed in the Dutch project (and in all registers analysed in the English project). They seem to be problems related to the nature of metaphorical language rather than to the procedure of metaphor identification. The two problems listed in this section deal with the distinctness of senses and the comparison between senses, issues that always need to be addressed when doing metaphor research, regardless of the procedure of identification that is used.

The preposition van in news and conversation

In some cases it proved difficult to agree on the basic meaning of prepositions. One major point of discussion has been the analysis of the preposition van (of or from). Although the reliability tests do not point to van as a problematic case, other texts analyzed by more than one analyst have shown that van is a problem area. What makes van different from the other prepositions in Dutch is that it seems to have gradually grammaticalized; the content of the function word has faded in some uses of van. It is still possible, however, to distinguish basic meanings of van from abstract meanings, and basic uses from abstract uses. It becomes complicated when the abstract uses of van seem more or less grammatical, as is the case in many of the descriptions of van in Van Dale. The preposition has the striking number of fifty-five sense descriptions, which should already be an indication as to how difficult it can be to come to a conclusion about possible metaphorical meanings.
One basic meaning of van that can be distinguished in the dictionary is the first: pointing to an object, person or place from which someone or something moves or is moved or taken away. Below is an example of this use from the corpus:

(16) Er volgde een wilde achtervolging van Maastricht naar Valkenburg (…). (NRC)
     ‘There followed a wild chase from Maastricht to Valkenburg (…)’

Example (16) provides a relatively clear case of the concrete use of van. An example of a very functional, almost empty form of van can be found in (17):

(17) (…) maar eens polsen of Wim Kok, in geval van kabinetsdeelname,
     (…) but see if Wim Kok, in case of cabinet participation,
     niet bereid is (…) (Algemeen Dagblad)
     is not willing (…)
     “let’s see if Wim Kok, should he join the Cabinet, is not willing …”

The construction occurs when van is used as a connection between two phrases of which the first is more general and the second is a further specification or a concrete object. As can be seen in the translations of (16) and (17), the Dutch preposition van can be translated into from or of in English in the concrete sense, and into of in the abstract sense. From in English has a distinct basic sense that is concrete and other senses that can be metaphorically derived from the basic sense, but of in English has lost its basic sense and is seen as a function word that does not have metaphorically derived meanings anymore (Lindstromberg 1998: 195). This is different from how van is used in the Dutch language.

The analysis of van becomes problematic when it is used in a more abstract, grammatical manner. In some cases, it is possible to identify a metaphorically derived sense, where the contextual meaning can be contrasted and compared with one of the basic senses, as in Example (18):

(18) in de tweede helft van het jaar (Algemeen Dagblad)
     in the second half of the year

Here, van is used in an abstract way, functioning within the domain of time. This use can be contrasted and compared with the basic sense of van in a sentence like (17); it is used in an abstract domain, but both define a part of something. In other cases, the use of van is so abstract that it is difficult to see an element that can be compared with the basic sense. That type of use is illustrated in (19):

(19) Jullie willen daar uh ook van alles nog veranderen? (CGN-fn000259)
     You want there eh also of everything still change?
     ‘You also still want to change all kinds of things there?’

In (19) it is difficult to pinpoint exactly what van means when used in combination with alles. The problem concerns the exact contextual meaning and thus a contrast and a comparison. Overall, it is not easy to establish when one use of van is abstract but still comparable and contrastable to one of the basic senses, and when van has moved towards a more grammatical and functional use. This is the reason why analysts do not agree at all times about the status of van in different instances in the texts. What is clear, though, is that van is pervasive in the news texts, and that the decisions made on the possible metaphorical status of some of the uses will therefore greatly influence the data. A separate analysis of van on a conceptual level could give a better overview of how the preposition works in varying contexts. The examples also show that particular word categories can cause more problems than others when applying the procedure of metaphor identification, and that particular lexical units, constructions and meanings can be language-specific.

7.5 Conclusion

The sections above have given an overview of various issues concerning Dutch texts and metaphor identification. These include questions of how to adopt and adjust a procedure that was originally designed to analyze English discourse. As the agreement percentages have shown, Dutch language materials can be analyzed in a similarly reliable way as English language materials. However, this can only be done if analysts take into account a number of operational differences and language-specific issues. The slightly lower percentage of overall agreement about the metaphor-related words and the non-metaphorical words for Dutch in comparison to English is due to a number of factors. Important in relation to the procedure is the lack of specific part-of-speech tags and the use of the historically based dictionary Van Dale. In addition, Dutch contains linguistic features that need to be treated differently from those described and explained in the original procedure, in particular SCVs and polywords. The reliability tests have shown that these features, or some of them, regularly have a link with metaphorical language, which can influence the judgment of metaphor by analysts, and therefore affect the overall number of metaphor-related words. It is important to explain these issues in more detail in the procedure.

However, even if the matters mentioned above are incorporated into the procedure in an explicit manner, it remains impossible to eliminate disagreement entirely. This is due to the analyst being an individual and having his or her own intuitions. As pointed out in the discussion of disagreement above, one analyst can see a possible mapping between two distinct meanings where another analyst decides that the meanings are either not distinct or not comparable on one or more levels. This individual aspect will remain at all times, since meaning descriptions in a dictionary can be interpreted in varying ways.
However, since operational issues dealing with how to use the corpus, the tags and the dictionary are clearly specified in the procedure, disagreement between analysts can be reduced to a minimum. Additionally, a detailed procedure like the one we have used will produce reliable results for metaphor identification if it is used in the same way by the different analysts, and will produce a more uniform analysis than when basing decisions about metaphorically used words on intuition alone.

This chapter has illustrated that a procedure such as MIPVU, which takes the English language as its basis, can also be used for Dutch discourse, albeit with slight alterations and specifications. Part of the success is due to the resemblance between the two languages on the lexico-grammatical level. Dutch, like English, carries meaning in words, and lacks great inflectional influences. A language like

German, where inflection is more important and influential, may pose different problems for procedures such as MIP and MIPVU. In addition, languages which have a more complex system of word formation and compounding, where different morphemes carrying meaning can be combined, will need a procedure that is able to identify these elements. For Dutch, however, MIPVU produces reliable results, and generates an extensive and workable data set of metaphor-related words.

chapter 8

Reliability tests

8.1 Introduction

The primary motivation for this research and book concerns the issues of the validity and reliability of linguistic analysis, in particular of the identification of linguistic expressions that may be said to be related to metaphor. Researchers have an obligation to show that their data and findings are not figments of their imagination but independent entities in the world out there. When we claim that we have found patterns of metaphor in a corpus of about 190,000 words, as we will do in Chapter 10, serious questions may be asked about the validity and reliability of those patterns, and it is our responsibility to show that these questions can be answered in satisfactory ways.

That research on language requires human interpretation is clear, but such interpretation needs to be turned into the observation and measurement of ‘things out there’ if it is to count as good science. Methods and techniques are independent tools which take observation out of researchers’ heads and place it into some intersubjective, operational arena. That is where observed phenomena can be subjected to critical discussion and evaluation. The explication of our coding scheme in a written form with an exhaustive set of coding instructions is one element in producing such a technique for metaphor identification. But the calibration and testing of the tool is another crucial element in proposing a tool for measurement.

However systematic and explicit our set of instructions and criteria may aim to be, it is useless if it cannot be shown to guide analysts in the same way. Either the tool should lead to substantial agreement between observers, or it does not function as a tool. Concomitantly, if a tool is reliable in this way, it may be held to obtain reproducible results that are based in the same observation of reality. It is the purpose of the present chapter to examine how our procedure fares against these desiderata.
The extent to which a number of analysts agree in making repeated binary decisions for any set of materials can be determined in at least two principally different ways (Dunn 1989; Scholfield 1995). One type of analysis examines the overall degree of difference between individual researchers. It measures the total number of cases (i.e. lexical units) that analysts have marked as related to metaphor, and then compares the proportions between metaphorical and non-metaphorical

cases across analysts. If the differences between the proportions are too great to be due to chance alone, that is, if there is a statistically significant relation between metaphor identification and individual analysts, the analysis is not seen as sufficiently reliable. Such an outcome suggests that metaphor identification is related to the bias and performance of (groups of) individuals. The question of analyst bias can be addressed by computing a test statistic called Cochran’s Q (e.g. Dunn, 1989). It can be used to measure the importance of the differences between the metaphor analysts. If Cochran’s Q becomes statistically significant, the reliability of the procedure is compromised by the individual differences among the analysts. One problem with this particular type of reliability measurement, however, is that it does not look at lexical units that are potentially metaphorical as individual cases. Even if there were a statistically significant difference between individual researchers in terms of the total numbers of cases that they identify as related to metaphor, it could still be possible for all or most researchers to agree about a core group of cases while having different opinions about another group of more marginal cases. Thus, some words in a text might be consistently marked as metaphorical by all analysts, whereas other words would be judged in a less consistent manner. Indeed, it is a common assumption in methodology that there will always be a large group of clear cases that everybody can agree on in most classification tasks. Analyzing the data as potentially metaphorical individual cases would give more weight to differences among metaphorically used words instead of among analysts. The appropriate coefficient of agreement for this measurement is kappa (e.g. Artstein & Poesio, 2008). There are several variants, the most important of which, traditionally, are Cohen’s Kappa and Fleiss’ Kappa. 
The difference between the two measures is that the former can only gauge the degree of agreement between pairs of analysts, whereas the latter can analyze agreement across larger sets of analysts. What the measures share is their correction for chance agreement between analysts: if a set of data displays a particular percentage of metaphor-related words (‘MRWs’), there is a related magnitude of chance that analysts will obtain fortuitous agreement. Kappa corrects for this level of chance and measures how often analysts agree when chance is taken out of the equation.

One problem with kappa is its interpretation. Kappas range between −1 and +1, −1 suggesting that analysts perform below the level of chance. The problem lies with the range of positive values, for it is unclear which magnitude of a kappa should be accepted as adequate. Cut-off points have been suggested at various points, the most frequently used of which are 0.66 and 0.80. However, the meaning of these cut-off points in empirical terms is unclear. That is why we will simply assume a pragmatic position: we report our kappas and, together with our set

of instructions and protocol of analysis, offer this approach as one way of doing metaphor identification. If other researchers wish to test our findings or emulate them in new research, they then at least know the target of reliability that has been set in the current project. Given the prime interest in a reliable description of the nature of the linguistic items and not in the performance of the human analysts, most linguistic studies that assess agreement across individual analysts on some research topic have adopted the second method. For instance, Markert and Nissim (2003) have reported Cohen’s Kappa for assessing the reliability of their (selective) method of metonymy identification. However, we believe that the first type of analysis by means of Cochran’s Q also provides critical information, and so report both types below for comparison purposes. In all of these considerations, it should also be pointed out that these reliability tests examine agreement between individual analysts before discussion. In all of the tests below, analysts were given a set of materials that they had to analyze on their own. This procedure was followed to carry out a methodological test, in this chapter, which checks the extent to which individual performance on the basis of MIPVU leads to comparable results. In our empirical work for the overall research project, however, individual analysis is only the first step of a more elaborate protocol, in which additional checks and discussions are held to reduce the inevitable degree of error that is part and parcel of this type of research.
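The second type of analysis can be made concrete with a small computation. What follows is a minimal sketch of Fleiss' kappa for a table in which every lexical unit has received one code from every analyst. The function name and the toy data are hypothetical and serve illustration only; the kappas reported in this chapter were not produced with this code.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of rows, one row per lexical unit.

    Each row holds one category label per analyst
    (here 1 = related to metaphor, 0 = not related to metaphor).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every unit needs a code from every analyst")

    categories = sorted({label for row in ratings for label in row})
    # counts[i][c] = number of analysts who assigned category c to unit i
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean proportion of agreeing analyst pairs per unit
    p_obs = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement, derived from the overall share of each category
    shares = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_exp = sum(p ** 2 for p in shares)
    if p_exp == 1:
        raise ValueError("kappa is undefined when only one category is used")

    return (p_obs - p_exp) / (1 - p_exp)


# Toy data (invented): four lexical units coded by three analysts
toy = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 0]]
print(round(fleiss_kappa(toy), 3))
```

In the toy data the analysts disagree on one unit out of four. Raw agreement between pairs of analysts is about 0.83, but once chance agreement is subtracted the coefficient drops to about 0.66, which illustrates why percentages of agreement and kappa should be read side by side.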

8.2 Method

All tests reported below were set up in the same manner. One or more excerpts were selected from the same set of text files that were annotated for the main research project. The length of the files was chosen in such a way that text excerpts could offer a sufficiently broad range of cases that had to be classified. Except for the first tests in the English-language project, the range of materials was also related to the different content domains that the analysts had to deal with. The application of the procedure may raise different problems in different domains, and one of the questions was whether it was equally reliable in conversation, news, fiction, and academic discourse.

All analysts were instructed to annotate the excerpts according to the procedure. They had to do this by using the Word file version of the excerpt, which did not exhibit the additional information that is part of the files in the BNC-Baby. Analysts thus worked on clean text files and had to attach a relevant code at the end of a lexical unit in the Word document. In terms of our standard methodology this is also different from what happens during the real annotation process, which

is dependent on special software that contains drop-down menus for the insertion of various codes. The reliability tests can therefore be seen as a stripped-down version of the genuine annotation process, with fewer distracters but also with less help, for instance in the form of POS tags. If POS tags were needed, analysts could consult the original BNC files for more information. Analysts had to submit their annotated Word files to the principal investigator, who was in charge of the statistical analysis. Texts were transformed into an SPSS database, with every lexical unit in a separate row, and with separate columns for individual analysts. Whenever a lexical unit had received a code for being related to metaphor, this was entered into the database as a 1 (one). All other lexical units automatically received a 0 (zero). This database was then used to compute Cochran’s Q (with SPSS 15) and Fleiss kappa (with an on-line programme developed by Philippe Bonnardel, at http://kappa.chez-alice.fr/kappa_intro.htm). The classification of the data into simple binary codes for statistical purposes raises a number of issues. The first issue concerns the unit of analysis. Since the procedure requires checking of polywords, phrasal verbs, and compounds, these may have to be marked up as such in the reliability test, as well. However, since these issues occurred very seldom in the test materials, and since they can be decided on the basis of fairly objective criteria, they were not included in the reliability test as an issue for statistical evaluation. In the day-to-day practice of our empirical research, all of these issues were monitored fairly systematically, and, as will be shown in the next chapter, degree of error across the complete corpus turned out to be low. The second issue has to do with the judgment that lexical units could not be sufficiently interpreted to permit a clear judgment regarding metaphoricity. 
These are the words that need to be given the code DFMA, discard for metaphor analysis. This problem, too, occurred too seldom to be usefully incorporated into a statistical analysis in these reliability tests. And again, our empirical research resolved the issue of DFMAs during discussion so that the importance of subjecting their identification to statistical analysis became even smaller.

A similar story holds for the identification of metaphor flags, or MFlags. Even though they are theoretically equally important to the two main categories of words related or not related to metaphor, they hardly ever occur in the data. And if they are present, they rarely raise controversy. For this reason their role was also ignored in the statistical analysis of the reliability tests.

A fourth issue has to do with the annotation of cases as WIDLII. This classification suggests that an analyst recognizes the use of a particular lexical unit as previously classified as WIDLII, and recorded as such in a special lexical database. Since these comments on the borderline status of a particular lexical unit will always be discussed in the group, their role in individual analysis is less important

for our purposes; and since their occurrence is relatively infrequent, it is easier to leave them out of consideration. An additional check was performed in the last reliability test of the English language project, in which all WIDLIIs were given a separate code in the database, next to the binary distinction between metaphorical and not metaphorical, and this showed that the overall kappa for this three-fold classification was 0.80.

The following reliability tests are therefore based on simple sets of binary judgments, whether a lexical unit is or is not related to metaphor. The judgments come from three or four independent judges. All other issues in their protocols are disregarded for the reasons set out above. The tests hence focus on the reliability of metaphor identification in its simplest possible form, on the basis of the extensive set of instructions reported in Chapter 2.

8.3 Results and discussion: English-language research

8.3.1 Study 1

The first reliability test was conducted ten weeks after the start of the complete research project. It was based on a draft version of the procedure as reported in Chapter 2. Analysts were the four Ph.D. students originally starting out on the English-language project (Biernacka, Dorst, Kaal, and López-Rodríguez) and the one Ph.D. student working on the Dutch language project (Pasma). The materials were text FC1 from BNC-Baby, a sample containing about 2440 words from a periodical, The Weekly Law Reports 1992 Volume 3. It is in the domain of social science. Here follows a small portion of the text. It contains indications of the number of analysts who had marked a particular word as related to metaphor; the three underlined multi-word expressions were all demarcated as such by analyst 4:

In12345 May 1987 the debtor, who had carried on4 the business of running123 a nursing home, sold the business as a going3 concern3 and went to live in the Canary Islands.
No debts remained outstanding3, apart from4 a tax liability in3 excess of4 £500,000. In12345 1988 the debtor made12345 an offer to settle34 the tax liability which was not accepted, and in12345 February 1991 the Inland Revenue presented4 a bankruptcy petition3. The registrar in12345 bankruptcy made12345 an order13 allowing service4 of the petition3 on12345 the debtor in the Canary Islands. On12345 the hearing1 of the petition3 he decided that the debtor had ‘carried on4 business in England and Wales’ within12345 a period of three years ending with14 the presentation4 of the petition3 for the purposes of section3 265(1) (c) (ii) of the Insolvency Act 1986 and made23 the order13 sought45.
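An excerpt annotated in this way can be turned into the kind of database described in Section 8.2, with one row per unit and one column of 0/1 codes per analyst. The sketch below assumes that the digits attached to a word are the identifiers of the analysts who coded it as related to metaphor. It is a simplification: it treats every alphabetic word as a unit, skips free-standing numbers such as dates, and does not handle comma-separated identifiers. The function and pattern names are hypothetical.

```python
import re

# A word followed by zero or more analyst digits, e.g. "In12345" or "going3".
# Free-standing numbers such as "1987" contain no letters and are skipped.
TOKEN = re.compile(r"([A-Za-z]+(?:[-'’][A-Za-z]+)*)(\d*)")


def to_binary_rows(annotated_text, n_analysts):
    """Return (word, codes) pairs, with one 0/1 code per analyst."""
    rows = []
    for word, digits in TOKEN.findall(annotated_text):
        coded_by = {int(d) for d in digits}
        codes = [1 if a in coded_by else 0 for a in range(1, n_analysts + 1)]
        rows.append((word, codes))
    return rows


sample = "as a going3 concern3 and went to live. In12345 1988 the debtor made12345 an offer to settle34"
for word, codes in to_binary_rows(sample, n_analysts=5):
    print(f"{word:10s} {codes}")
```

Each resulting row corresponds to one line of the database: a 1 in column n means that analyst n marked the unit as related to metaphor, and all other cells are 0.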

The analysis was aborted, however, since there were too many problems with aspects of the procedure. Analysts raised questions in the materials, for instance about the status of legal terms such as petition in the English language as a whole as well as in the Macmillan dictionary. Since the procedure was not clear enough, it made little sense to pursue the reliability test. We decided to explicate even more aspects of the decision process, and then do a new reliability test for the improved procedure.

8.3.2 Study 2

The second reliability test was held three months later, when the procedure was almost completely in its present, final form. The materials for the test were taken from The mind at work (by W.T. Singleton, Cambridge University Press, Cambridge, 1989). This is a sample in BNC-Baby containing about 40,742 words from an academic book (domain: applied science). Our excerpt contained 713 words, and focused on human resources and system design.

The findings for the test are reported in Table 8.1. Of all 713 lexical units, 79.1% were coded unanimously as not related to metaphor, and 12.1% were coded unanimously as related to metaphor. In all, the four analysts unanimously agreed on 91.2% of all cases while doing their analyses completely independently of each other. In other words, only 8.8% of the data were given different interpretations by the four analysts.

The reliability test that examines agreement between analysts on an item-by-item basis, and which corrects for the role of chance, yielded a Fleiss’ kappa of 0.83. This magnitude is in the highest category of all divisions made in the literature. It suggests that there is substantial and adequate agreement, and that the procedure produces reliable results.

Table 8.1. Across four independent analysts (1, 2, 3, 4) for one text

File ID in BNC  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
                lexical units  Not MRW  MRW           Total                                   (df=3)
CLP (acad.)     713            79.1     12.1 (n=86)   91.2   0.83       114       130       8.36*

* p < 0.05
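To make the agreement statistic concrete, the following is a minimal sketch of how Fleiss' kappa can be computed for codes assigned by several analysts. It is an illustration only: the function name `fleiss_kappa`, the 0/1 coding (1 = related to metaphor) and the ten toy items are assumptions made for this example, not the project's data or software.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of codes (one per analyst)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({code for item in ratings for code in item})
    # How many analysts chose each category, per item
    counts = [Counter(item) for item in ratings]
    # Observed agreement: share of agreeing analyst pairs, averaged over items
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_cat)
    # Note: undefined (division by zero) if every code in the data is identical
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data: 10 lexical units coded by 4 analysts (1 = MRW, 0 = not MRW)
toy = [[0, 0, 0, 0]] * 7 + [[1, 1, 1, 1]] * 2 + [[1, 1, 0, 0]]
kappa = fleiss_kappa(toy)
print(round(kappa, 2))
```

In this toy set nine of ten items are coded unanimously, yet kappa comes out lower than 0.90 because the statistic discounts the agreement that would be expected by chance given how rare the metaphor code is.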

The test checking for analyst bias showed a significant Cochran’s Q (df = 3, Q = 8.36, p < 0.05). This suggests that the difference between the lowest scoring analyst, who identified 114 lexical units as related to metaphor, and the highest scoring analysts, who identified 130 lexical units as related to metaphor, is too great to be due to chance variation between analysts. In other words, part of the variation between analysts in metaphor identification in the reliability test is due to analyst bias.

In the empirical research project, a text, after analysis, would next be posted on a website and then checked by the other analysts. These would comment on mistakes and controversial decisions. For instance, one analyst missed the demonstrative this in the phrase ‘In spite of all this’, and this would surely be corrected in the subsequent rounds of discussion. The role of analyst bias is always reduced in the empirical work. In the methodological tests, however, we thought it would be most informative to examine the baseline of analyst behaviour. The difference between the minimal and maximal score of metaphor identification, that is, between 114 and 130, is only 16. If the 8.8% of non-unanimous cases were to be discussed by all four analysts, the difference between the highest and lowest scores for metaphor would soon become smaller, as it always has in practice.

8.3.3 Study 3

Another reliability test was held on another portion from the same excerpt two months later. It was performed in order to be able to report first results at a conference, where we wished to claim that Study 2 was not just a case of good luck. This time the excerpt was longer, containing 1180 lexical units, and the topic, even though still in human resources, was shifted to manpower planning. An example from the materials and data is given below:

From[1234] the narrow[1234] accountancy viewpoint[1234], people are a cost[23] and it is desirable to keep[1234] this[1234] cost[2] as low[1234] as possible. In[1234] these[1234] terms[3] it is very difficult to justify[1], for example[1,4], sending[2] a member[134] of staff on[1234] a training[1] course[1234]. The training[1] requires expenditure and so also does the replacement for the person away[3]. Where[124] is the return[1234]? The return[1234] is actually in[1234] the improved human resource[23] but this[1234] is not readily measurable[2] in[1234] terms[3] which accountants use[1234].

The results of the reliability test are shown in Table 8.2.

Table 8.2. Across four independent analysts (1, 2, 3, 4) for one text

File ID in BNC  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
                lexical units  Not MRW  MRW           Total                                   (df=3)
CLP (acad.)     1180           74.6     13.5 (n=159)  88.1   0.79       209       246       23.22***

*** p = 0.001
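The analyst-bias test reported in these tables can be illustrated with a short sketch of Cochran's Q for binary codes. The function name `cochrans_q`, the toy ratings and the constant name are assumptions for illustration only; the critical value 7.815 is the standard chi-square cut-off for df = 3 at the 0.05 level.

```python
def cochrans_q(ratings):
    """Cochran's Q for binary codes; ratings[i][j] is analyst j's code for item i."""
    k = len(ratings[0])  # number of analysts
    col_totals = [sum(row[j] for row in ratings) for j in range(k)]  # MRWs per analyst
    row_totals = [sum(row) for row in ratings]  # analysts coding each item as MRW
    n = sum(row_totals)  # all positive codes together
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    return numerator / denominator


# Hypothetical toy data: 12 lexical units, 4 analysts (1 = MRW, 0 = not MRW)
toy = (
    [[1, 1, 1, 1]] * 3
    + [[0, 0, 0, 0]] * 5
    + [[1, 1, 1, 0]] * 2
    + [[1, 0, 0, 0]]
    + [[1, 1, 0, 0]]
)
q = cochrans_q(toy)

CRITICAL_05_DF3 = 7.815  # chi-square critical value, df = 3, alpha = 0.05
biased = q > CRITICAL_05_DF3
print(round(q, 2), biased)
```

Only the items on which analysts disagree contribute to the denominator, which is why a handful of one-sided disagreements can already push Q past the threshold.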

Unanimous judgments about lexical units that were not related to metaphor covered 74.6% of all cases. Unanimity regarding metaphorical cases was achieved for 13.5% of all lexical units. Together, this means that this reliability test exhibits unanimous agreement for 88.1% of all cases. This is slightly lower than in the previous test, which is reflected in the statistics. Fleiss’ kappa has dropped one point under 0.80, and Cochran’s Q has become vastly more significant. However, kappa is still comparatively high. The difference between the minimum and maximum scores for the identification of MRWs is 37 out of 246, which is 15 percent. In all, then, these findings confirm the impressions of the previous reliability test, that agreement is substantial and adequate, but also raise the question whether the slight drop in performance is accidental or more serious.

8.3.4 Study 4

The next test was aimed at checking the development in performance and, more importantly, at running a test across the entire range of the materials. It was the first time that all four registers were included, and the question arose whether the procedure works equally well for all four. Moreover, this test was also the last test for the original team that started out on the project: by the time the present test was run, it had become clear that two of the four original analysts were to abandon the project by the end of the first year. It was therefore essential to have data about the performance of the first team across all of the four registers. As a result, the amount of data in the test also displays a considerable increase: the first test contained 713 words and the second one 1180, but the present test comprises no fewer than 1940 lexical units, divided across four excerpts. The excerpts come from the following BNC files:

–– Academic, CMA: Evolution from molecules to men. Bendall, D S (ed.), Cambridge University Press, Cambridge (1985), 43–565. Sample containing about 40,043 words from a book (domain: natural sciences).
–– Conversation, KBW1: 62 conversations recorded by ‘Dorothy’ (PS087) between 13 and 20 March 1992 with 25 interlocutors, totalling 19,706 s-units, 115,332 words, and over 13 hours 0 minutes 42 seconds of recordings.
–– Fiction, CFY: My beloved son. Cookson, Catherine, Corgi Books, London (1992), 85–221. Sample containing about 38,926 words from a book (domain: imaginative).
–– News, A7W: The Guardian, electronic edition of 1989-11-08: Home news pages. Guardian Newspapers Ltd, London (1989-19-19). Sample containing about 25,255 words from a periodical (domain: world affairs).

The results of the reliability test are displayed in Table 8.3. The small dip in reliability between tests 2 and 3 does not appear to be indicative of a trend. Overall reliability is good. The academic text displays a kappa of 0.89: the dip in the previous test may have been due to random variation, perhaps because of a slightly more difficult text or less optimal performance by the analysts or both. But there is no indication that it signals a negative development.

Table 8.3. Across four independent analysts (1, 2, 3, 4) for four texts

File ID in BNC  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
                lexical units  Not MRW  MRW           Total                                   (df=3)
CMA (acad.)     516            74.6     19 (n=98)     93.6   0.89       110       119       5.31
KBW (conv.)     498            92.9     2.2 (n=11)    95.1   0.70       13        31        25.65***
CFY (fict.)     428            84.4     7.9           92.2   0.81       47        52        2.18
A7W (news)      501            82.8     12.2 (n=61)   95.0   0.89       69        79        8.41*
Total           1940           83.6     10.5 (n=204)  94.1   0.86       241       277       24.33***

* p < 0.05; *** p = 0.001
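The conversation row of Table 8.3 pairs very high unanimous agreement with the lowest kappa. The sketch below illustrates, on invented counts, how this can happen when the metaphor code is rare. Everything in it is an assumption for illustration: the function name `binary_fleiss_kappa`, the simplification that every disputed item is split evenly between analysts, and the item counts, which are not the project's data.

```python
def binary_fleiss_kappa(n_all_no, n_all_yes, n_split, raters=4):
    """Fleiss' kappa when items are unanimous 'no', unanimous 'yes', or split evenly."""
    n_items = n_all_no + n_all_yes + n_split
    half = raters // 2
    # Pairwise agreement is 1 for unanimous items and lower for an even split
    p_split = (half ** 2 + (raters - half) ** 2 - raters) / (raters * (raters - 1))
    p_bar = (n_all_no + n_all_yes + n_split * p_split) / n_items
    # Overall share of 'yes' codes determines the agreement expected by chance
    share_yes = (n_all_yes * raters + n_split * half) / (n_items * raters)
    p_e = share_yes ** 2 + (1 - share_yes) ** 2
    return (p_bar - p_e) / (1 - p_e)


# Two hypothetical excerpts of 500 units, both with 95% unanimous decisions
low_prevalence = binary_fleiss_kappa(n_all_no=464, n_all_yes=11, n_split=25)
high_prevalence = binary_fleiss_kappa(n_all_no=414, n_all_yes=61, n_split=25)
print(round(low_prevalence, 2), round(high_prevalence, 2))
```

With identical raw agreement, the excerpt with few metaphorical items yields a kappa of about 0.63 and the other about 0.87, since chance agreement is much higher when nearly everything is coded as not related to metaphor.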

On the contrary, the findings show that the procedure works well for all four registers. When the four analysts independently examine four excerpts of about 500 words each, they come to exactly the same conclusion for no less than 94.1% of the data. For the 1940 lexical units in our test, that means that 1826 cases are treated in identical ways between four analysts, whether they are related to metaphor or not. Since the range of percentages goes from a low 92.2% for fiction to a high 95.0% for news texts, there does not seem to be one register which is especially problematic. None of these findings were self-evident before we ran these tests.

The relatively low kappa for conversation is nonetheless noteworthy: it is 0.70, whereas the other kappas are around 0.80 (fiction) or even approach 0.90 (academic and news). Close inspection of the data shows, however, that this low result for the conversation excerpt may be due to the rather low percentage of metaphor-related lexical units in the materials: only 11 lexical units were unanimously judged to be related to metaphor, while the range of minimum and maximum scores for metaphor identification, going from 13 to 31, suggests that there may only be a small group of clear cases, with perhaps just as large a group of borderline cases. With such a small group of potentially metaphorical cases, any difference in opinion will be magnified in the statistics. This finding will be thrown into relief by the next tests.

When we look at analyst bias, we may see a similar effect of the different registers. There are two texts, the academic text and the fiction text, which have a very low difference between the minimum and maximum scores for relation to metaphor. As a result, Cochran’s Q does not become significant, so that analyst bias does not play a role in these two texts, whereas it does in the conversation and in the news text. It is possible that this means that the academic and fiction text were so clear in their use of metaphor that the biases of individual analysts simply never came into play. This only goes to show that care needs to be taken when findings like these are to be interpreted.

8.3.5 Study 5

Reliability test 5 was crucial for the reason that it was the first test with the new research team. Two new Ph.D. students had been recruited and trained from October 2006 on, and the question arose whether the method could be transferred in a short period of time so that the empirical work could continue at the same level of quality. If the new team achieved just as great a degree of agreement as the old team, they would be living proof that our procedure does not just depend on the heads of a number of specific individuals, but has a life of its own which can be taught to other linguists. Since this is precisely one of the points of the bigger research project, the stakes were high for reliability test 5.
The following fragments were used for excerpting:

–– Conversation: 60 conversations recorded by ‘Ann’ (PS02G) between 28 November and 5 December 1991 with 35 interlocutors, totalling 16,243 s-units, 103,997 words, and over 13 hours 5 minutes 36 seconds of recordings.
–– News: The Guardian, electronic edition of 1989-12-10: Foreign news pages. Guardian Newspapers Ltd, London (1989-19-19). Sample containing about 3,705 words from a periodical (domain: world affairs). The topic of our excerpt is a country diary.
–– News: The Guardian, electronic edition of 1989-12-13: News and features. Guardian Newspapers Ltd, London (1989-19-19). Sample containing about 258 words from a periodical (domain: social science). The topic of our excerpt is an obituary.

The results of test 5 are shown in Table 8.4.

Table 8.4. Across four independent analysts (1, 4, 6, 7) for three texts

File ID in BNC  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
                lexical units  Not MRW  MRW           Total                                   (df=3)
KB7 (conv.)     377            95       3.2 (n=12)    98.2   0.87       13        16        2.48
A9G (news)      279            79.6     12.2 (n=34)   91.8   0.84       41        49        7.16
A9Y (news)      249            77.5     16.9 (n=42)   94.4   0.91       48        52        2.35
Total           905            85.4     9.7 (n=88)    95.1   0.88       107       116       4.50

The new team performs as well as the old team. In fact, their percentage of total unanimous judgments is slightly higher (95.1%) than in test 4 (94.1%). Moreover, there is less difference between the four analysts when it comes to individual bias: Cochran’s Q does not reach significance in any of the three text excerpts. The difference between the minimum and maximum number of lexical units scored as related to metaphor across all 905 words is a mere 9 cases. Analysts have clearly performed in highly comparable ways.

This is also true when the analysis takes into consideration case-by-case variation. Kappa reaches its highest value in all reliability tests so far. This is partly based on the excellent score for the conversational data (0.87): the low score for conversation in the previous test may indeed have been due to the particular nature of the materials. In all, the findings for the new team are at least as good as those of the old team. The procedure can be transferred to new analysts without difficulty. A brief illustration of its application to the conversational data follows below, the different digits referring to the identities of ‘old’ and ‘new’ analysts:

Seat-y belt on Stuart. better put her seat-y belt on then hadn’t she? In case mister police-y man decides to stop us and say why haven’t you put your seat-y belt on Mrs M Mrs woman? Sorry, sir. It’s your responsibility to ensure that your passengers have got[4] their safety belt on. Cos that[1467] would be nice[6] wouldn’t it[6]? To be fined twelve quid for not having a safety belt on and on[1467] top[1467] of[1467] the ninety quid and the forty quid and one thing and another.

The bulk of these data is not related to metaphor, while four cases are unanimously judged to be metaphorical.

8.3.6 Study 6

A final test was run at the end of the empirical research to ensure that the level of quality at that point had not decreased. The test also contained samples from all four registers, to show that the new team worked equally well across all four of them. The scope of the test was therefore just as large as the one of test 4, and it was interesting to see whether the high quality level of test 5 could be reproduced in test 6. Four texts were sampled from the following BNC-Baby files:

–– Academic: Lectures on electromagnetic theory. Solymar, L, Oxford University Press, Oxford (1984), 5–118. Sample containing about 26,854 words from a book (domain: applied science).
–– Conversation: 3 conversations recorded by ‘206’ (PS4XN) [dates unknown] with 5 interlocutors, totalling 455 s-units, 2,979 words (duration not recorded).
–– Fiction: The divided house. Raymond, Mary, F A Thorpe (Publishing) Ltd, UK (1985), 1–236. Sample containing about 37,548 words from a book (domain: imaginative).
–– News: The Scotsman: Religious affairs stories. u.p. Sample containing about 3,994 words from a periodical (domain: belief and thought).

The findings of test 6 are displayed in Table 8.5.

Table 8.5. Across four independent analysts (1, 4, 6, 7) for four texts

File ID in BNC  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
                lexical units  Not MRW  MRW           Total                                   (df=3)
FEF (acad.)     534            73.4     14.0 (n=75)   87.4   0.79       102       126       17.07***
KNR (conv.)     602            87.2     6.1 (n=37)    93.3   0.78       44        66        30.39***
J54 (fict.)     401            82.3     11.0 (n=44)   93.3   0.85       52        61        6.57
K58 (news)      384            77.9     19.5 (n=75)   97.4   0.96       77        85        16.03***
Total           1921           80.4     12.0 (n=231)  92.5   0.85       282       317       20.36***

*** p = 0.001

Total unanimity across the four registers is 92.5%: 1777 lexical units out of the total of 1921 were independently classified in the same way by all four analysts. Analyst bias is significant as measured by Cochran’s Q, with a minimum metaphor score of 282 versus a maximum score of 317 (a difference of 35 cases, or 11% of 317). Agreement measured on a case-by-case basis, however, with the help of Fleiss’ kappa, is adequate and substantial, again: the average kappa across all four registers is 0.85. There are two relatively lower kappas for conversation (0.78) and academic texts (0.79), and one extremely high kappa for news (0.96). The final reliability test therefore demonstrates that the procedure has kept producing reliable results for the new set of analysts until the very end of the annotation process.

8.3.7 General discussion of the English language tests

A series of five reliability tests was carried out to measure the degree of agreement among independent linguists who had to identify lexical units related to metaphor in four registers. The tests showed that there was analyst bias which affected the findings. This has been discussed as a manageable problem: in the encompassing empirical research project, analyst bias in terms of error and preferences for more or less metaphor identification is neutralized by additional discussion of the annotated data.

The tests also showed that there was a consistently high level of agreement when the data were analyzed on a case-by-case basis: kappas were invariably high, and on average situated in the mid 0.80s. This indicates substantial and adequate agreement. This interpretation is supported by the figures for unanimous agreement, which ranged from a low 88.1% to a high 95% of all lexical units in the sample texts. It is important to note that all of these figures hold across four rather different registers. They were moreover produced by two different teams of four researchers each, in which two members were constant.
These figures are better than for the two texts analyzed for methodological purposes by the Pragglejaz Group (2007). It is hard to exaggerate their importance. As has also been noted by the Pragglejaz Group, metaphor research has been hampered by a lack of reported reliability, making it less clear what has been measured, exactly, and how well. Our detailed set of instructions and our explicit statistical tests of agreement demonstrate that the present research is extremely clear about the object and quality of measurement. This enables other researchers to engage with our findings in a critical but especially precise fashion, so that subsequent work may be closely connected to the present work.

8.4 Results and discussion: Dutch-language research

Additional tests of the reliability of our metaphor identification procedure were carried out for the Dutch language project. They followed the same method and procedure as the English-language tests. One important difference, however, is the lack of a corpus-based learner dictionary for Dutch, so that we had to resort to an explanatory dictionary which also includes historical aspects. This may create more disagreement than for English, as has been detailed in Chapter 7.

The tests took place on three different occasions during the period of annotation and were spread over one year (April 2006, December 2006, and March 2007). The tests were carried out by three analysts, all of whom were native speakers of Dutch. One of these three analysts is the Ph.D. student who is in charge of the Dutch language research project (Pasma), while the other two analysts are analysts 1 and 4 of the English language research project (Kaal and Dorst). For each test, samples of some 500 words of news and 500 words of conversation were individually analyzed by the researchers. The materials can be characterized as follows:

Conversations
–– Test 1: Sample of roughly 600 words from 1 conversation (fn000259), duration unknown; 2 interlocutors, parent-child relation (aged 25-34 and over 55).
–– Test 2: Sample of roughly 620 words from 1 conversation (fn000745), duration unknown; 2 interlocutors, colleagues (both aged over 55).
–– Test 3: Sample of roughly 580 words from 1 conversation (fn008413), duration unknown; 2 interlocutors, friends (both aged over 55).

News
–– Test 1: Algemeen Dagblad: the national news section, 3 May 2002 (Lexis Nexis). One complete news text containing 522 words; the topic is a drug smuggling affair.
–– Test 2: NRC Handelsblad: frontpage, 3 July 2002 (Lexis Nexis). One complete news text containing 500 words; the topic is the formation of a new government.
–– Test 3: Telegraaf: frontpage, 4 March 2002 (Lexis Nexis). One complete news text containing 499 words; the topic is a shooting incident in Amsterdam.

The results of the test are presented in Table 8.6. The numbers are slightly lower there, because of the inclusion of headers and so on as well as words annotated with DFMA.

Table 8.6. Three reliability tests across three independent analysts (1, 2, 5)

8.6a Test 1

File type  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
           lexical units  Not MRW  MRW           Total                                   (df=2)
Conv.      559            82.8     10.2 (n=57)   92.9   0.80       67        83        11.13**
News       521            81.0     10.2 (n=53)   91.2   0.77       76        79        0.30
Total      1080           81.9     10.2 (n=110)  92.1   0.79       143       162       7.08*

* p < 0.05; ** p < 0.01

8.6b Test 2

File type  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
           lexical units  Not MRW  MRW           Total                                   (df=2)
Conv.      549            84.0     7.7 (n=42)    91.7   0.74       56        69        10.30**
News       508            70.3     21.5 (n=109)  91.8   0.86       126       133       1.76
Total      1057           77.4     14.3 (n=151)  91.7   0.82       182       202       10.18**

** p < 0.01

8.6c Test 3

File type  Number         Percentage unanimous          Fleiss’ κ  Min MRWs  Max MRWs  Cochran’s Q
           lexical units  Not MRW  MRW           Total                                   (df=2)
Conv.      523            83.7     8.1 (n=43)    91.8   0.78       45        77        50.74***
News       498            77.4     15.4 (n=77)   92.8   0.86       90        101       5.52
Total      1021           80.7     11.7 (n=120)  92.4   0.83       135       178       29.66***

*** p < 0.001

On average, 79.9% of all the lexical units were unanimously judged as nonmetaphorical, and 12.2% were unanimously judged as possibly metaphorical. This means that there was an overall lack of agreement on the metaphorical status of lexical units for 7.9% of the total number of cases. Judging solely from the figures, we can see that the percentages of the total unanimously scored units are extremely similar for news and conversation, for all three reliability tests: the lowest total unanimous score is 91.2 (test 1, news) and the highest is 92.9 (test 1, conversation). These figures are even more homogeneous than in the English-language tests.

Analyst bias displays an interestingly consistent pattern. In all three news texts, the data are apparently so clear that no differences among the analysts can emerge. In the conversations, however, there is a systematic effect of analyst bias on the metaphor identification data. The average findings between the two registers then turn out to be influenced by analyst bias.

Kappas are high again: their mean is above 0.80. They are in the same regions as in the English language test. On average, news texts demonstrate a slightly higher degree of agreement than conversation, which, again, is comparable to the English language materials.

On the whole, the Dutch-language results may be said to mirror the English-language tests. Reliability is high, solid, and consistent across the three tests. It is gratifying that this is the case for a different language, with a different dictionary, and yet another combination of analysts. The efforts we put into the development of the procedure and its methodological test appear to have paid off.

8.5 Conclusion

Reliability tests were conducted throughout the entire period of annotation, to examine the degree of agreement between analysts when they had analyzed their materials independently of each other (before discussion). This happened over a period of almost two years, between three different combinations of analysts, for two languages, and four registers. The test of our method is not limited to a single instance.

Reliability was good. Measured by Fleiss’ kappa, the mean value was about 0.85 for the English language data, and 0.82 for the Dutch language data. On average, the analysts achieved unanimous agreement before discussion about some 92% of all cases. This result may be regarded as a target for future metaphor research on authentic data.

Analyst bias was significant. This suggests that analysts are human after all. In our research, this was alleviated by the overall protocol of analysis, which is described in Chapter 10. When researchers work in isolation, other remedies may have to be put in place.

Overall, MIPVU offers a tool for metaphor identification that is explicit and systematic enough to elicit high levels of agreement between individual analysts. It has been possible to transfer the expertise it represents fast and effectively to other analysts. And it is based on the Pragglejaz method, MIP, which has its roots in generally accepted assumptions about metaphor and its use in discourse. The broader coverage of MIPVU than MIP should make MIPVU attractive to all researchers who aim to find a broad range of types of expressions of metaphor in discourse. This chapter has shown that MIPVU is also reliable and, given motivation and time, relatively easy to acquire.

Chapter 9

From method to research
Cleaning up our act

This chapter reviews the ways in which we cleaned up our act after the stage of corpus annotation was finished. Even though our procedure has been made explicit (Chapter 2) and shown to exhibit a high degree of reliability (Chapter 8), this does not mean that the analysis is perfect. Looking for errors and correcting them is an important part of doing quantitative research. Perhaps even more crucial is the process of estimating the margins of error that doubtlessly remain. This is what the present chapter aims to do. It is the final stage of preparing the data for quantitative analysis, which will be reported in Chapter 10.

9.1 Lexical units

The first part of our procedure that needs to be checked concerns the demarcation of lexical units as the relevant unit of analysis. In our approach, lexical units are the linguistic structures that may qualify as Metaphor-Related Words, or MRWs. Even though it is true that metaphorical use may also be found at levels below the lexical unit (morphemes), above lexical units (phrases), and even ‘around’ lexical units (constructions), we have disregarded these levels, for theoretical reasons explained in Chapter 1. To recapitulate, lexical units are the level of linguistic organization that is most closely related to the level of conceptual structures involved in cross-domain mappings: words activate concepts which apply to referents in direct ways (non-metaphorically) or in indirect ways via cross-domain comparison (metaphor). However, the precise delimitation of the notion of lexical unit is not without problems, and has alerted us to a number of issues that we have addressed in our procedure.

When troubleshooting for consistency in handling lexical units in our annotated materials, we looked at the three main problem areas explicitly engaged in our instructions: phrasal verbs, polywords, and compounds. Each of these classes of linguistic forms can be counted as one or more units, and errors hence affect the total number of lexical units in the data, which in turn influences
how many lexical units may be identified as related to metaphor. An estimate of the margin of error will evidently be informative if we wish to assess the reliability of our overall frequencies of particular linguistic forms of metaphor.

In reporting the analyses we have carried out in this area, we are using the uncorrected figures of the overall database that results from converting the sampled BNC-Baby text files into Excel and SPSS data files. This conversion turns running linear text of all BNC text files into columns in which each lexical unit has its own row. Distinct lexical units hence constitute separate cases in the database, and display different types of information about their properties, including, most importantly, the POS tag automatically coming with the BNC as well as our own annotation for relation to metaphor inserted during the project. Other information includes the sentence, text excerpt, and register to which each lexical unit belongs.

Conversion from the BNC-Baby files to our own database included taking a number of decisions:

1. We removed all punctuation marks from the data in the Excel and SPSS files: they receive a separate POS tag in the corpus and are therefore treated as separate cases in the overall figures, but they do not qualify as relevant units of analysis (lexical units) in our data.
2. We also ignored all cases consisting of genitive ’s: these are also separate cases in POS tagging in the BNC-Baby corpus, but they do not qualify as separate lexical units in our analysis, either. The overall corpus contains 887 cases consisting of a separate morphological expression of the genitive, and these have been temporarily deselected from the data file.
3. We retained all polywords that were accorded polyword POS tags in BNC-Baby. An example is of course. Words like these were not split up into two or three separate lexical units, but included as single cases. The reason for this decision was maintaining maximal agreement with the independent criterion for lexical units utilized by the BNC, which was compatible with our own theoretical assumptions about lexical units and their use in discourse.
4. Orthographical contractions were all split up into their separate components: expressions like you’re and that’s were therefore each treated as consisting of two lexical units: you and ’re, and that and ’s, respectively. This was decided because they also display two separate POS tags in BNC-Baby and because this practice was compatible with our theoretical assumptions about lexical units and their use in discourse.
5. For the purpose of the analysis in the present chapter, we treated all phrasal verbs and compounds as consisting of separate units. We did so in order to be able to check whether individual expressions had been correctly treated as either part of phrasal verbs and compounds, or as genuinely individual expressions. It should be noted that phrasal verbs as well as compounds are taken as single units in the ultimate database used in Chapter 10.

Each of these decisions has minor effects on the total numbers reported in this chapter and this book: the total number of items analyzed in what follows below is 190,148 cases. In Chapter 10, by contrast, which presents the definitive and cleaned up database, all phrasal verbs and compounds have been collapsed to single cases. This explains the slightly lower number of items in the database that this book is about (N = 187,964): this is the true number of lexical units according to our operational definition in the procedure.

9.1.1 Phrasal verbs

Analysis of the 190,148 cases in the raw data showed that there were 38,405 POS tags that were verbal (20.2%). Of these verbal forms, 993 cases (2.6% of all verbs) had been classified during our metaphor annotation process as part of a phrasal verb. This had typically been based on the co-presence of a word coded as Adverbial Particle in the BNC. Some phrasal verbs displayed further combination with prepositions (PRP), and when these combinations were checked they turned out to be correctly coded as (complex) phrasal verbs. They include cases like come back to, where the preposition to is part of the phrasal verb.

A small group of 25 distinct verbs (i.e. types, not tokens) had 10 or more quotations per word (tokens), accounting for 620 cases out of the total of 993 verbs marked up as participating in phrasal verbs. These verbs were bring, carry, come, cut, find, get, give, go, hang, hold, look, make, pick, point, put, run, set, shut, sit, sort, stand, take, turn, work, and write. Each of these verbs can be combined with more than one adverbial particle to form a phrasal verb.
Quantitative analysis showed that the most popular particles used with these verbs were up, out, down, back, on, off, in, over, and round. Each of the four researchers in charge of annotating the BNC-Baby corpus looked at one subset of these 25 most popular verbal parts of the phrasal verbs in our data. They checked all occurrences (tokens) of the verbs (types) in question, irrespective of whether these occurrences were phrasal or not, and answered the following two questions:

1. Has the verb been coded as part of a phrasal verb or not?
2. Is the coding correct?
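The selection of frequent verbs for checking, as described above, can be sketched as follows. This is a hedged illustration rather than the procedure actually run on the Excel and SPSS files: the function name `select_frequent_verbs`, the threshold parameter and the toy list of lemmas are assumptions made for the example.

```python
from collections import Counter


def select_frequent_verbs(phrasal_verb_lemmas, min_tokens=10):
    """Return verb types with at least min_tokens phrasal tokens, and their coverage."""
    counts = Counter(phrasal_verb_lemmas)
    frequent = {lemma: n for lemma, n in counts.items() if n >= min_tokens}
    # Share of all phrasal-verb tokens that the selected types account for
    coverage = sum(frequent.values()) / len(phrasal_verb_lemmas)
    return frequent, coverage


# Hypothetical toy input: one lemma per token coded as part of a phrasal verb
toy_lemmas = ["go"] * 12 + ["get"] * 10 + ["mull"] * 5 + ["sip"] * 3
frequent, coverage = select_frequent_verbs(toy_lemmas)
print(sorted(frequent), round(coverage, 2))
```

With the figures reported in this section, the 25 selected verbs cover 620 of the 993 phrasal-verb tokens, which is a coverage of about 62%.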

The results of this check were as follows. The 25 verbs had 3,535 occurrences in the entire corpus, including, for instance, 806 instances of get and 827 instances of go. Of these 3,535 occurrences, 8% were now judged to have an incorrect code for either phrasal or non-phrasal use. For instance, there was a high number of mistakes for go. This was partly due to the fact that go back and go out were sometimes ignored as phrasal verbs, even though they should be phrasal verbs according to our criteria. Another factor was that go down and go up in the sense of physically moving are not phrasal whereas they sometimes were treated as such. The automatic analysis in BNC-Baby of phrasal verbs as indicated by the co-presence of Adverbial Particle, and our own interpretation of that analysis on the basis of our own instructions, has led to an error margin of eight per cent. All errors that were found during troubleshooting were corrected. Since the sample of 25 verbs covers almost two thirds of the phrasal verbs in the corpus, we may interpret this sample as fairly representative. In other words, there might in fact still be some eight percent error in the remaining 373 phrasal verbs, which makes for about 30 cases. This is 0.00016% of the complete corpus. 9.1.2 Polywords We checked a selection of all polywords that, according to the polyword list coming with the BNC, should be marked as polywords. These lexical units should also end up as single units in our database. We focused on those polywords which would contain a metaphorical element if they were not regarded as polywords: for example, in a great deal, great could be judged to be metaphorical if the expression was taken as three separate words. We therefore needed to check for separate occurrences of the word great, in order to test whether they might turn out to be part of the polyword a great deal but had not been automatically coded as a polyword in BNC-Baby. 
A similar example would be in touch with, also to be classified as a polyword by the BNC. Our strategy in examining for error in polywords was therefore to run checks of the isolated headwords (touch or deal, respectively), as well as of the polyword expression (in touch with, and a great deal). In looking at these phenomena, two questions were answered:

1. Has the linguistic form been coded as part of a polyword or not?
2. Is the coding correct?

The four analysts divided the following arbitrary sample of polywords and their related headwords: a great deal, a good deal, by means of, by no means, by way of, under way, depending on, due to, in touch with, out of touch with, fed up, for the
most part, given that, in answer to, in face of, in contact with, in (the) light of, in view of, with a view to, in(to) line with, out of line with, on account of, on board, on top of, provided that, providing that, seeing as, seeing that, supposing that. They checked a total of 1,346 occurrences of these polywords and their related headwords. They found that 48 cases involved polywords, while the rest were free uses of the headword. They also found that the BNC-Baby annotations for polyword status did not contain any mistakes. One caveat needs to be added here: we did not examine polywords whose potentially metaphorical element was a preposition, such as at all, at last or at least. Given the high degree of fixedness and recognizability of these expressions, as well as the high degree of accuracy of the automatic BNC analysis with the above sample, we felt that these cases did not need additional checking. With this minor reservation in mind, we may conclude that the automatic analysis of polyword expressions in BNC-Baby looks almost entirely fail-proof for our sample from the corpus.

9.1.3 Compounds

Our strategy for checking the analysis of compounds in BNC-Baby was as follows. We looked at all lemmas which we had manually coded as participating in a multiword expression, with the exception of all phrasal verbs and adverbial particles. This yielded a sample of 817 cases (tokens). Then we omitted all phrases that had received a code for metaphorical use, since this indicates that they were seen as either novel compounds with one metaphorical element, or as conventional compounds that were used metaphorically as a compound. The reason for this removal from present consideration is that these cases are fairly conspicuous and have had to be checked in the dictionary before they could end up in our database as both a compound and partly or wholly metaphorical.
The resulting sample contained 184 types and 319 tokens of lemmas participating in compounds that were not phrasal verbs. Examples include office as in office block, or opera as in opera house. All lemma types were then divided between the four researchers and checked for the following two questions:

1. Has the linguistic form been coded as part of a compound or not?
2. Is the coding correct?

Of the 319 tokens in the list, 26 cases turned out to be coded as a compound whereas in actual fact they were not (8%). Examples include next door (Adv + N), treated as a compound in the BNC corpus whereas it was not according to our criteria. More striking, however, was the finding that quite a few cases of participation of isolated words in attested conventional compounds had been missed by the
tagging programme in BNC-Baby, and subsequently by ourselves. These included Nikkei Index, pine needle, police force, and power station. The discrepancy between the previous treatment in the BNC-Baby, on the one hand, and the findings in the present examination, on the other, may be due to our reliance for troubleshooting on the listings of compounds offered in the Macmillan dictionary: this is based on a different corpus than the BNC, and may also have used slightly different criteria for showing a multi-word expression as a compound. Since Macmillan was our standard for annotating the corpus, we preferred to retrospectively include in our sample as compounds the compounds missed in this way. For the lemma types checked in this part of our troubleshooting, the total number of 319 tokens was increased by 140, an increase of 44%. Our conclusion is that compound identification in the corpus is not as secure as we would like it to be. Even though all mistakes revealed by the above troubleshooting procedure have been corrected, there may be other cases of compounds for other lexical unit types which we happen to have missed. However, given the total numbers of some 816 tokens and some 500 types of compounds, in a complete corpus of about 190,000 words, we are looking at a negligible influence on the correct number of cases of lexical units (less than 50% error margin for less than 0.5% of all lexical units in the data). For those who are particularly interested in the metaphorical use of compounds, however, our analysis is less reliable than we would have liked. Since our strategy is not to code parts of compound words, unless they are novel, the incorrect treatment of some compounds as phrases instead of compounds can only lead to a slight overestimation of lexical units and therefore number of MRWs, not an underestimation. In other words, we will at least not have excluded any cases. 

9.1.4 Conclusion

Our check of three potentially troublesome areas in lexical unit identification yielded positive results. For phrasal verbs, there is an error margin of eight percent; error has been removed for two thirds of the data, the remaining unchecked data accounting for an error margin of about 0.016% for the entire corpus. For polywords, no problems were found. For conventional compounds, a relatively large number of tokens were found to have been missed, but all observed errors were removed and the overall number of compounds in the complete corpus is small; the remaining error margin is 0.25% for the complete corpus. The overall error margin of our number of lexical units for analysis in the corpus is therefore less than 0.3%, and reliability may be judged to be high.
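The margins summarized above can be retraced with a few lines of arithmetic. The following is a minimal sketch, not part of the original procedure: the variable and function names are illustrative assumptions, and the inputs are the rounded figures quoted in Sections 9.1.1 and 9.1.3. Note that about 30 cases out of some 190,000 lexical units is a proportion of roughly 0.00016, that is, about 0.016 percent.

```python
# Figures are taken from Section 9.1; all names below are illustrative.
CORPUS_SIZE = 190_000            # "about 190,000" lexical units in the corpus
PHRASAL_ERROR_RATE = 0.08        # error rate observed in the checked sample
UNCHECKED_PHRASAL_VERBS = 373    # phrasal verbs outside the 25-verb sample
COMPOUND_TOKENS = 816            # "some 816" compound tokens
COMPOUND_ERROR_CEILING = 0.50    # "less than 50% error margin" (upper bound)


def percent_of_corpus(cases: float, corpus_size: int = CORPUS_SIZE) -> float:
    """Express a number of cases as a percentage of all lexical units."""
    return 100.0 * cases / corpus_size


# Expected residual errors among unchecked phrasal verbs (about 30 cases).
phrasal_cases = PHRASAL_ERROR_RATE * UNCHECKED_PHRASAL_VERBS
# Upper bound on miscoded compound tokens under the stated ceiling.
compound_cases = COMPOUND_ERROR_CEILING * COMPOUND_TOKENS

phrasal_pct = percent_of_corpus(phrasal_cases)
compound_pct = percent_of_corpus(compound_cases)
overall_pct = phrasal_pct + compound_pct

print(f"phrasal verbs: ~{phrasal_cases:.0f} cases, {phrasal_pct:.3f}% of corpus")
print(f"compounds:     <{compound_cases:.0f} cases, <{compound_pct:.2f}% of corpus")
print(f"overall:       <{overall_pct:.2f}% of corpus")
```

Under these inputs the compound bound stays below the 0.25% quoted in the text, and the combined figure stays below 0.3%.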

9.2 Words classified as not analyzable or borderline

With all lexical units clearly demarcated, two groups of cases needed to be checked which can also affect the overall findings. One group concerns those cases which could not be analyzed as metaphorical or not metaphorical due to lack of sufficient contextual information about their meaning. These cases were given the code DFMA (“Discard For Metaphor Analysis”) and are to be removed from the data when these are subjected to statistical analysis. Another group concerns those cases where the procedure did not yield a sufficiently clear outcome, and which were kept for consideration as metaphorical on the grounds of WIDLII (“When In Doubt, Leave It In”). We will now describe how both groups were handled in our check for errors.

9.2.1 DFMAs

In news, there was one case of DFMA. It was caused by a typo in the original materials: AL5-fragment03 has the excerpt “he can keep keep a grip”, of which the second keep was classified as DFMA. In the academic texts, 8 words were marked as DFMA. One of them was an error in the materials comparable with the news texts; all other cases were French words that were not part of conventionalized British English. In conversation, 397 cases were classified as DFMA. In spite of sustained efforts to interpret their function in context, they remained unclear, as indicated by the DFMA code. Illustrations of these decisions have been given in Chapter 4, Section 4.3.1. When these cases are discarded, they reduce the total size of the conversation corpus by 0.8%. This suggests that, after group discussion among the analysts, less than 1% of spoken interaction is not intelligible enough for metaphor analysis. Therefore, all DFMA codes were retained in the dataset without further checks. The lexical units exhibiting this code were excluded from the data set, however, when it was subjected to statistical analysis (Chapter 10).

9.2.2 WIDLIIs

Of all 26,686 lexical units marked as related to metaphor, there were 1,998 cases (tokens) which had the additional status of WIDLII, suggesting that they are questionable or borderline. Since the overall corpus has 190,148 cases, this is about 1% of all cases. When we relate the 1,998 WIDLIIs to all 26,686 cases marked as MRW (‘Metaphor-Related Word’), they make up 7.5% of all MRWs. In other words, when our procedure is applied according to our protocol, one in every 13 to 14 MRWs
is an explicitly marked borderline case. Examples have been given throughout the previous chapters. The raw data in the complete corpus (N=190,148) may consequently be seen as consisting of three groups of words: clearly non-metaphorical words (n = 163,295, or 85.9%), WIDLIIs (n = 1,998, or 1.1%), and clear MRWs (n = 24,688, or 13%). There hence remains a small group of borderline cases between metaphorical and non-metaphorical use after discussion by the group of analysts. (It should be noted that there is also a very small fourth group in the raw data, Metaphor Flags, n=167, which will be discussed separately, below.) This division between two clear groups (85.9% as non-metaphorical, and 13% as metaphorical) with a 1.1% borderline group which constitutes some 7.5% of the smaller, metaphorical group, does not seem counter-intuitive. Therefore we have kept all WIDLII cases as WIDLII. They constitute the formally delimited bulk of the degree of error for all MRWs. Their own degree of error will be returned to in Section 9.4 below. Their borderline status may make them especially eligible for further study of the nature of metaphor. We are not claiming that these words will always have to be treated as WIDLII. What we are saying is that, at this moment, our application of our procedure has not given a clear-cut answer about their status, which may signal something about their structure or use, or their relation to the procedure as is, that is interesting for further scrutiny.

9.2.3 Conclusion

In one register, conversation, just less than 1% of all cases had to be discarded for metaphor analysis, on the grounds of lack of a clear contextual meaning. In the complete corpus, 1% of all cases was analyzed as metaphorical with the added status of borderline (WIDLII). This questionable inclusion happened on the grounds of a maximally generous application of our procedure in order to collect as many cases of potential metaphorical meaning as possible.
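As a cross-check on the breakdown reported in Section 9.2.2, the group sizes can be summed and converted to shares of the corpus. This is a small sketch using only the counts quoted in the text; the dictionary keys and variable names are illustrative assumptions.

```python
# Group sizes as reported in Section 9.2.2; labels are illustrative.
groups = {
    "non-metaphorical": 163_295,
    "WIDLII": 1_998,        # borderline metaphor-related words
    "clear MRW": 24_688,
    "MFlag": 167,           # metaphor flags, the small fourth group
}

corpus_size = sum(groups.values())
# Share of each group in the complete corpus, in percent.
shares = {name: 100.0 * n / corpus_size for name, n in groups.items()}

# All metaphor-related words: borderline plus clear cases.
mrw_total = groups["WIDLII"] + groups["clear MRW"]
widlii_share_of_mrw = 100.0 * groups["WIDLII"] / mrw_total

print(f"corpus size: {corpus_size}")
for name, pct in shares.items():
    print(f"{name:>16}: {pct:.1f}%")
print(f"WIDLII as share of all MRWs: {widlii_share_of_mrw:.1f}%")
print(f"one WIDLII in every {100.0 / widlii_share_of_mrw:.1f} MRWs")
```

The four groups add up to the stated corpus size of 190,148, and the two metaphor-related groups add up to the 26,686 MRWs.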
Both findings are findings after discussion among the analysts. They need to be taken into account when error margins for our results are estimated, and the latter group (WIDLII) will receive a little more scrutiny below.

9.3 Classes of metaphor and metaphor signals

Part of the annotation process had been the classification of some types of MRWs as ‘direct’ and ‘implicit’, all other metaphor-related words turning, by default, into plain ‘metaphorical language’ (cf. Steen 2007). The figures for these special categories turned out to be extremely low: there were 27 MRWs initially
coded as ‘implicit’, and 386 as ‘direct’. ‘Direct’ MRWs participate in simile and other expressions of comparisons that present the vehicle of comparison in direct, independent terms. This can hence be connected to the 167 lexical units annotated as metaphor flags, for these would be the typical markers of similes which exhibit direct language use when it comes to the expression of the source domain, as we have seen in the chapter on fiction. Similes also often have more than one lexical unit expressing the vehicle of comparison (or the source domain, in cognitive-linguistic terms), as was also illustrated in the chapter on fiction. Intuitively, therefore, the proportion between 386 ‘direct’ cases and 167 metaphor flags (‘MFlags’) appears to make sense.

The low number of metaphor flags, and the correspondingly low proportion of ‘direct’ metaphor, might give cause for suspicion that potential candidates had been missed. This is why we posed two questions:

1. Which lemmas have been coded as MFlags, and can we check a reasonable sample of their other occurrences for mistakes?
2. Which other lemmas might act as potential MFlags, and can we check a reasonable sample of their other occurrences for mistakes?

The answer to the first question yielded 167 lexical units marked up as MFlags, with 57 tokens of like and 29 of as, together accounting for 51.5% of all MFlags. Other popular MFlags were much less frequent: as if (7), of (6), so-called (3), with (3). All other MFlags had only two or one citations, and these included the new-formations canyon-like, micawber-like, pillar-like, tent-like, toy-like, wave-like as well as handle-shaped and L-shaped.

There was one group of MFlags which were clearly incorrect as a group. A small number of topic-domain markers (Goatly, 1997), which had been explicitly excluded from consideration in the procedure, had been inadvertently included in the annotated files.
Since it was unclear how many of this type of MFlag had been missed as a result of our strategy to exclude them, we decided to change the items involved back from MFlag to not MFlag. Examples include human in human resources (five times) and emotional in emotional somersaults (one occurrence). Their correction in the data set for quantitative analysis reduced the total number of MFlags in the raw data to 151 lexical units.

A selection of the lexical units involved in this set of MFlags was collected and about half of their occurrences were checked for classification as MFlag, not related to metaphor (non-MRW), and related to metaphor (MRW). The items included were the following: apparently, appearance, as, as if, as though, constitute, metaphorical, mistake, remind, reminiscent, resemble, shape, term. A total of 102 cases were examined. The results showed 5 cases that were incorrectly coded,
either as MFlag whereas they were not (3 cases), or the other way around (2 cases). The most ironic example of this is the following data, where ‘metaphorically’ had not been coded as MFlag:

He was not that kind of priest. He had metaphorically shaken a large fist impotently at some looming energy-field. (FET-fragment01)

With 97 out of 102 potential MFlag tokens displaying no errors in the annotation, this suggests that all lexical units displaying at least one use as MFlag in our corpus have been fairly reliably analyzed.

A selection of other terms that could also have been used as MFlags but did not appear to be annotated as such in our corpus was then also examined, in order to check that they had not been inadvertently missed. The following terms were included: analogue, analogy, comparative, compare, comparison, figurative, imagine, resemblance, similar, taste. A total of 72 quotations (potential MFlag tokens) were checked, and only 1 case was found that was incorrectly coded, but not with respect to metaphor signalling: it had been missed as an MRW. All other lexical units had to do with non-metaphorical analogues, comparisons, resemblances, and so on, as in the following example:

He would have recognized her from her strong resemblance to her brother although she looked the elder by some years. (C8T-fragment01)

In other words, many other lexical units that are potentially eligible for use as MFlags do not display an MFlag quotation in our data set, and appear to have been reliably excluded as such. In all, of all 174 potential MFlags examined in this troubleshooting stage, only 5 items had been coded incorrectly with reference to the MFlag decision. It may therefore be concluded that the error margin for MFlags was less than 5%. It should be recalled that this applies to a restricted interpretation of MFlags, which leaves topic domain markers or more general markers of indirectness aside (Goatly 1997; Steen 2007). These other signals may now be studied in a more targeted and systematic fashion on the basis of the present annotated corpus. In addition, the potentially signalling use of the genitive construction by means of ’s or the preposition of was not tested either, which may also require some more detailed attention on a larger scale.

The error margin for classifying lexical units related to metaphor as implicit expressions of metaphor was separately examined and led to the conclusion that implicit metaphor had not been handled correctly. We did a separate round of analysis in which we re-analyzed all potential cases in all of the data. We did so by checking all cases of a list of 36 potentially cohesive words. This list included
modal verbs, primary verbs, expressions such as one, another, and so on, and comprised about 16% of all data. We decided whether each token of these types was indeed used for cohesion or not, and if it was, whether its cohesive use was implicitly metaphorical. Reliability estimates between pairs of raters of truly cohesive use of these potentially cohesive devices in a test sample of over 2,000 words yielded kappas of on average 0.79. For all samples of written text, agreement about the subsequent decision for implicitly metaphorical use between four analysts was 100%, but for conversation it was substantially lower. On the basis of this test, even more explicit instructions were formulated, and all data were divided by register and then re-analyzed by one analyst each. A final sample of about 1,000 cases per register was analyzed by the principal investigator, and this led to the same reliability results between an individual analyst and the principal investigator. In all, then, reliability of implicit metaphor is roughly equal to reliability for indirect metaphor. The number of cases eventually identified as implicitly metaphorical rose from 27 to 291. In sum, troubleshooting took place for classifications of metaphor and for MFlags. The identification of signals of metaphor led to a double check: first, a sample was examined of all lexical units which displayed at least one use in our data as an MFlag; and second, another sample was scrutinized of other lexical units which could have been used as an MFlag but had not been coded as such during our annotation process. Overall reliability was high, reaching over 95% correctness. The identification of implicit metaphor led to an even more encompassing exercise, involving a reconsideration of some 16% of all data which were selected as potentially cohesive. 
For these cases, it was first decided whether they were truly cohesive, and then, if they were cohesive, the question was addressed whether they involved cohesion with a metaphorical antecedent; if this was the case, they were marked as implicitly metaphorical. This led to a substantial increase in the number of implicit metaphors, raising their number to roughly the same level as direct metaphor.
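The "over 95% correctness" reported for MFlags follows from the two troubleshooting samples described earlier in this section. The sketch below recomputes the error rates from the counts given in the text; the class and variable names are illustrative assumptions, and the single missed MRW in the second sample is not counted because it did not concern the MFlag decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One troubleshooting sample of potential MFlag tokens."""
    label: str
    checked: int
    mflag_errors: int

    @property
    def error_pct(self) -> float:
        return 100.0 * self.mflag_errors / self.checked


# Counts as reported in Section 9.3.
attested = Sample("lemmas with at least one MFlag use", checked=102, mflag_errors=5)
candidates = Sample("other potential MFlag lemmas", checked=72, mflag_errors=0)

# Pool both samples to obtain the overall figure.
combined = Sample(
    "all potential MFlags examined",
    checked=attested.checked + candidates.checked,
    mflag_errors=attested.mflag_errors + candidates.mflag_errors,
)

for sample in (attested, candidates, combined):
    print(f"{sample.label}: {sample.mflag_errors}/{sample.checked} "
          f"= {sample.error_pct:.1f}% error")
```

Both the first sample on its own and the pooled sample of 174 tokens stay below the 5% error margin stated in the text.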

9.4 Individual metaphor-related words

The 26,686 lexical units identified as MRWs are tokens. They can be reduced to 3,611 types, in the form of lemmas. This is a crude reduction, based on the notion of lemma used by the BNC, which conflates word classes. For instance, the lemma OPEN includes open as an adjective and as a verb. If the notion of word class is adopted as a norm of reference, which is our own position, there are considerably more metaphorical types, since many metaphorically used lemmas in our data
belong to more than one word class. An estimate of this proportion will be given below. Yet for the problem that we are going to address in the next paragraphs, the notion of lemma will also suffice, and since it is automatically available from the corpus, we will utilize it for present purposes.

The bulk of the 3,611 types related to metaphor had only one or two metaphor-related tokens, or lexical units: 1,971 lemmas annotated for metaphor occurred once, and 539 lemmas occurred twice. If we add the next two ranks of three and four quotations per lemma, 261 lemmas had three metaphorical tokens, and 154 lemmas had four metaphorical tokens. When we add these figures up, we observe that no fewer than 2,925 out of 3,611 types, that is 81%, have only one to four tokens in our entire corpus. The total number of tokens for these 2,925 types is 4,457 (out of a total of 26,686 MRWs). This means that 81% of all metaphor-related types accounts for 16.7% of all metaphor-related tokens.

At the other extreme, the top eleven lemmas in terms of number of tokens were in (n=1,938), to (n=1,029), that (n=903), on (n=883), with (n=833), this (n=702), have (n=509), at (n=455), from (n=413), about (n=394), by (n=353). Adding up these figures for tokens, they account for 8,412 tokens out of the total of 26,686. This means that eleven lemmas (0.3% of all metaphor-related types) account for 31.5% of all metaphor-related tokens. In other words, almost one in every three metaphor-related words in our sample of authentic discourse is an instantiation of one of these eleven lemmas.

Given this uneven spread of MRWs across the data, the question arose how a sensible strategy could be developed for estimating the degree of error for individual lexical items.
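The skew in the distribution described above can be retraced from the reported counts. The following sketch is illustrative only: the names are assumptions, the token total for the low-frequency lemmas is taken as reported rather than recomputed, and eleven lemmas out of 3,611 come to roughly 0.3% of all types.

```python
TOTAL_TYPES = 3_611      # metaphor-related lemmas
TOTAL_TOKENS = 26_686    # metaphor-related words (MRWs)

# Number of lemmas having exactly 1, 2, 3 or 4 metaphor-related tokens.
low_frequency_lemmas = {1: 1_971, 2: 539, 3: 261, 4: 154}
LOW_FREQUENCY_TOKENS = 4_457   # token total for these lemmas, as reported in the text

# Token counts of the eleven most frequent metaphor-related lemmas.
top_eleven = {
    "in": 1_938, "to": 1_029, "that": 903, "on": 883, "with": 833, "this": 702,
    "have": 509, "at": 455, "from": 413, "about": 394, "by": 353,
}

low_types = sum(low_frequency_lemmas.values())
low_type_pct = 100.0 * low_types / TOTAL_TYPES
low_token_pct = 100.0 * LOW_FREQUENCY_TOKENS / TOTAL_TOKENS

top_tokens = sum(top_eleven.values())
top_type_pct = 100.0 * len(top_eleven) / TOTAL_TYPES
top_token_pct = 100.0 * top_tokens / TOTAL_TOKENS

print(f"{low_types} types ({low_type_pct:.0f}%) cover {low_token_pct:.1f}% of tokens")
print(f"{len(top_eleven)} types ({top_type_pct:.1f}%) cover "
      f"{top_tokens} tokens ({top_token_pct:.1f}%)")
```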
An overall proportionate check of a representative sample was one option, but deemed less necessary on account of the general control exercised over our analysis by the explicit instructions, the protocol for discussion, and the successful reliability tests. Another option seemed more interesting, which will now be explained.

9.4.1 Rationale

A selection of all 3,611 distinct metaphor-related lemmas (roughly types, but see above) was included in a specially kept lexical database. In particular, the database contains some 1,250 entries, distinguished by word class. One lemma, such as open, consequently had two entries in the lexical database, one for the adjective and one for the intransitive verb. (The transitive form of the verb open did not happen to have an entry: apparently it did not occur in the corpus, or, if it did, it did not raise any discussion during metaphor analysis.) There are hence fewer lemmas as identified by the BNC in our lexical database than the 1,250 entries based on
word class. An estimate of this proportion on the basis of visual inspection is that there are about 1,000 BNC-type lemmas. From one perspective, all the items in the lexical database could be seen as potentially problematic. They ended up in the database because they elicited explicit comments when their analysis was checked during discussion by the group of analysts in our regular annotation process, and these comments were then recorded in the database for reference during subsequent annotation and discussion. They sometimes required solutions that were felt to be forced. The most negative estimate, therefore, of the margin of error in our findings would have to include all of these cases and reckon with approximately 1,000 lemmas as being potentially debatable out of all 3,611 lemmas marked by us for metaphor. However, this estimate is much too harsh. Many entries in the lexical database are records of decisions which are not fundamentally controversial. They were for instance also kept for archiving and later consultation after they had been discussed in the group because one of the analysts had selected them for closer inspection. The question that needs to be answered, therefore, with respect to individual MRWs, is precisely how the WIDLIIs in the corpus relate to their description in the lexical database. Their relation to the lexical database could throw more light on the question whether the number of WIDLIIs is an adequate estimate of the overall group of controversial cases in our data. If it is, this would pitch our overall potential margin of error for all MRWs at some 7.5% (the proportion of WIDLIIs in all MRWs). Given the high reliability figures reported in the previous chapter, this in fact looks like a fairly adequate estimate of the error contained in our data, but we shall now attempt to bolster this intuition with some more information. 

9.4.2 Method

A sample of 10% of all MRW tokens whose lemmas were coded WIDLII was selected for evaluation by two of the analysts. The adoption of tokens instead of types has to do with our focus on metaphor in discourse: each token has its own specific, situated contextual meaning, which can raise distinct problems for its analysis with reference to the lexical database. The adoption of WIDLII was motivated above, by the inherently problematic status of these items. They should be representative of a major proportion of any truly controversial decisions recorded in the lexical database. Therefore, a check of a representative sample of tokens coded as WIDLII, rather than of any or all types, was deemed appropriate. Every tenth token of all 1,998 cases of WIDLII was included in the sample to be checked. This produced a sample of 159 types, and 201 tokens. Most of the
selected types were the only WIDLII case for their lemma. But sometimes there were more WIDLII cases for a particular lemma, as with some frequent prepositions. In such cases, the types to be examined were randomly selected from all WIDLIIs for that lemma. For each token, the question was raised whether the type (lemma) of the token had been included as an entry in the lexical database.

9.4.3 Results and discussion

Of the 201 cases of WIDLII, 39 were not included in the lexical database (19.5%). There were no disagreements between the two analysts for this analysis. If we extrapolate this, this means that 19.5%, or about 400, of all 1,998 WIDLII senses or uses decided during our discussions have not been explicitly recorded in the database. They can therefore not have been re-used on other occasions, which increases the chance of error in other, corresponding uses of the same lemma that have not been coded WIDLII. This finding suggests that we need to include a 20% error margin for the figure of 7.5% of borderline cases (WIDLIIs) in proportion to all lexical units identified as related to metaphor (MRW).

We may conclude that the number of WIDLIIs offers a fairly adequate view of the group of borderline cases that may in fact be experienced as incorrect by others. The 20% error margin in WIDLIIs means that we need to reckon with a maximal increase from 7.5 to 9% WIDLIIs as the true figure for borderline cases in our data. This seems to be the most radical estimate of our error margin. It boils down to the statement that maximally 1 in every 11 MRWs is borderline and potentially controversial. The other ten cases marked as MRW should be correct, according to our procedure. That these figures should not be worse than this is confirmed by a reconsideration of our reliability analyses.

9.4.4 Post hoc corrections of individual lexical items

At the end of the process of annotation and troubleshooting, 27 entries in the lexical database were corrected in various ways.
These entries were chosen, more or less by chance, simply because they stood out as errors in the eyes of the analysts at the end of the annotation and troubleshooting processes. All annotations in the data were adjusted accordingly.
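The extrapolation in Section 9.4.3, from the checked sample of WIDLIIs to an adjusted share of borderline MRWs, can be written out step by step. This is a minimal sketch under the figures reported there; the names are illustrative assumptions, and because the text rounds the intermediate values (19.5%, 20%, "about 400"), the unrounded results differ slightly.

```python
TOTAL_MRW = 26_686        # all metaphor-related words
TOTAL_WIDLII = 1_998      # MRWs with borderline status
SAMPLE_TOKENS = 201       # every tenth WIDLII token, as reported
NOT_IN_DATABASE = 39      # sampled tokens whose lemma had no database entry

# Share of sampled WIDLIIs that were never recorded in the lexical database.
missing_rate = NOT_IN_DATABASE / SAMPLE_TOKENS

# Extrapolate from the sample to all WIDLIIs in the corpus.
extrapolated_missing = missing_rate * TOTAL_WIDLII

# Treat the missing rate as an error margin on the observed WIDLII share.
widlii_pct = 100.0 * TOTAL_WIDLII / TOTAL_MRW
adjusted_pct = widlii_pct * (1.0 + missing_rate)
one_in = 100.0 / adjusted_pct

print(f"missing from database: {100.0 * missing_rate:.1f}% of the sample")
print(f"extrapolated to corpus: ~{extrapolated_missing:.0f} WIDLIIs")
print(f"WIDLII share of MRWs: {widlii_pct:.1f}% -> up to {adjusted_pct:.1f}%")
print(f"at most about 1 in every {one_in:.0f} MRWs is borderline")
```

The adjusted share comes out just under 9%, which matches the "1 in every 11" reading given in the text.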

9.5 Conclusion

When language research is based on the annotation of large amounts of corpus data, difficulties that remain concealed in less encompassing research get revealed.
In order to deal with such problems, we have followed a protocol for analysis which includes an explicit set of coding instructions, and we have collected data on the agreement between analysts and reported these in a series of reliability tests. However, the story does not end there, and troubleshooting has revealed further details about the degree of error persisting in our data. Some systematic errors were detected and removed, and remaining margins of error were estimated. The upshot of this exercise is as follows:

1. At the level of the raw data provided by the BNC, with special attention to phrasal verbs, compounds, and polywords, a margin of error of 0.3% should be taken into account. The corpus has around 190,000 lexical units, and this might have to be adjusted by some 600 words more or less.
2. At the level of lexical units to be discarded for metaphor analysis (DFMAs), 1% of all cases in the conversations have been marked and retained as such, so that the total number of lexical units of the conversation subcorpus has to be reduced by 1%.
3. At the level of borderline cases between metaphorical and non-metaphorical lexical units (WIDLIIs), 1% of all lexical units in the complete corpus have been given this classification, which is about 7.5% of all words identified as related to metaphor. We have retained this group as is, and subsequent analysis has shown that there may be a 20% error margin for this group. This suggests that, in actual fact, the number of borderline cases for metaphoricity comprises 9% instead of 7.5% of all lexical units identified as related to metaphor.
4. For the class of lexical units flagging the presence of metaphor (MFlags), reliability was about 95%, which suggests that we need to reckon with a margin of error of at most 7 cases for all approximately 150 MFlags in the corpus.
5. The error margin for lexical units related to metaphor as implicit expressions of metaphor was separately examined and led to a separate round in which we re-analyzed all potential cases in all of the data. This involved revisiting about one sixth of the data and led to a substantial increase in the number of implicit metaphors, making reliability of implicit metaphor roughly equal to reliability for indirect metaphor.

The observation of metaphor is a form of linguistic categorization designed to measure the extent and nature of a phenomenon of language use. Since this is a process that is quite fallible, it needs to be supported by as many checks and balances as is possible. If it cannot, caution needs to be exerted in the assumptions that are made about the quality of the analysis and the interpretations that can be based on it.

In our separate post-hoc processes of correction, most of the analyses turned out to have a small degree of error, estimates of which ranged between 0.3% for complex lexical units and 20% for borderline cases of metaphor-related words. Only one truly serious issue was discovered and redressed, which concerned the application of the category of implicit metaphor. The precise demarcation of distinct categories made it possible to resolve this problem in a targeted and selective manner.

chapter 10

Metaphor in English discourse
A corpus-linguistic approach

10.1 Introduction

The bulk of the research reported in this book was carried out in the framework of the programme ‘Metaphor in discourse: linguistic forms, conceptual structures, and cognitive representations’ (Netherlands Organization for Scientific Research, NWO, vici programme 277-30-001). This was a five-year project in which four Ph.D. students (Dorst, Herrmann, Kaal, and Krennmayr) and one principal investigator (Steen) collaborated in a multi-stage research programme. The first phase of the programme involved the annotation of the four samples from BNC-Baby for metaphor, as we have described in the previous chapters. This research was based on a method for linguistic metaphor identification that was systematically developed, made explicit, applied, and tested on a broad range of materials, and this methodological part of the research has constituted the focus of the present case book. The format we have adopted allows for a great degree of methodological precision and empirical illustration so that the book may be useful for other students of metaphor who wish to apply our approach for their own purposes.

The present chapter, however, reports on some of the overall quantitative findings of the metaphor identification stage of the research. It affords an idea of the effect of this type of methodological care on empirical research. The costs of our approach are high in terms of time because of the required precision, but we hope that our findings can show that it has been worth the trouble. Indeed, other scholars will probably only be prepared to go to the same lengths if they can be convinced that their efforts will eventually pay off.

The overall goal of the complete programme is to answer the question which metaphors are used in which forms and discourse contexts in which registers and for which purposes.
This question was addressed by conducting corpus research aimed at providing a set of linguistic and conceptual profiles for metaphor that characterize its nature and use in the four registers studied in the program: conversation, news, academic discourse, and fiction. Experimental research was also carried out to investigate aspects of these metaphor profiles in cognitive processing and in its products, cognitive representations. The findings of these two types of
studies will be used to offer provisional answers to the main research question about metaphor in discourse. The corpus work itself can be divided into three stages. The first stage has been the main topic of this book, and involves the identification of all the words in the materials that may be classified as related to metaphor. The second stage looks at the conceptual nature of the metaphors expressed by the metaphor-related words. And the third stage concentrates on linguistic and discourse properties of the metaphor-related words in order to describe patterns of distribution and function. The empirical findings of all of these stages will mainly be reported in separate Ph.D. theses (due by the beginning of 2011) that each focus on one distinct register. Methodological and theoretical aspects are discussed in separate publications, such as Steen (2007, 2008, in press a, b). The work on stages two and three of the corpus-linguistic research has a fair degree of individual focus on distinct registers for each of the Ph.D. researchers, but the first stage of the project involves a unique group project in which all the researchers worked closely together. Each Ph.D. researcher used the linguistic metaphor identification procedure MIPVU to annotate materials from all four registers, so that all researchers got a good feel for metaphor across the entire range of the data. And all the researchers examined their colleagues’ analyses in order to check for potential problems. The details of the protocol will be reported below in the methods section. The research as reported in this chapter, then, concerns the identification of a number of relations between language use and metaphor defined as a conceptual phenomenon involving a cross-domain mapping. 
As we have noted before, language use is measured at the level of lexical units, and the linguistic expression of metaphor is divided into three main categories: metaphor-related words (‘MRWs’), words flagging that other words are MRWs (MFlags), and words that are not related to metaphor (non-MRWs). Although metaphor was defined as a conceptual phenomenon in the programme as a whole, its linguistic identification in this first stage does not depend on conceptual analysis in the tradition of cognitive linguistics, which is viewed as a separate stage (Pragglejaz Group 2007; Steen 2007). This independence of the linguistic identification stage from subsequent linguistic and conceptual analysis is advocated as a methodological strength of the approach, as has been indicated throughout this book. The distribution of the metaphor categories in language use has been examined across samples of materials taken from the four registers discussed in Chapters 3 through 6. These materials have been independently annotated for word classes by automatic POS tagging which is part and parcel of the British National Corpus. The rest of the chapter will now report on the methods and techniques utilized in the research, and present and discuss the overall findings of the analyses.

There is one aspect that is not included in the present chapter, and that is the parallel project on metaphor in Dutch discourse (Chapter 7) which was carried out by Pasma. That project was designed along the same lines as the English-language project, but there are some important differences. The main findings are comparable to the English-language results, but they will be reported elsewhere.

10.2 Method

10.2.1 Materials

The materials were taken from the four-million-word sample BNC-Baby, itself excerpted from the 100-million-word British National Corpus. BNC-Baby was chosen because it was developed to offer a set of language materials that were parallel with the data described in the university grammar of spoken and written English by Biber et al. (1999). This focus facilitates the description of metaphor in four registers that have been well studied from a grammatical point of view. The partial annotation of these files for the semantic component of metaphor is a novel contribution to linguistic research, and the final product of the annotated subcorpus will also be published.

The selection of the files was prepared with the help of Dr James Cummings from the Oxford Text Archive, who split the files up into separate fragments defined by the highest section division in the texts. The original plan of the project was to annotate ten percent of each file in the corpus. However, because the number of lexical units related to metaphor was higher than expected, and because our protocol was more time-consuming than planned, we managed to analyse excerpts from only half of the files. Selected fragments were taken from the beginning, middle, and end of the complete BNC-Baby files. Some files were discarded because their content was too difficult: it is impossible to identify metaphorical lexical units if the contextual meaning of a stretch of discourse is unclear. Other files were discarded because they were too short and therefore too deviant from the average length of the excerpts. Even though these criteria were clear from the start, it turns out in retrospect that they were not followed completely consistently. A detailed description of the corpus can be found in Appendix A.

10.2.2 Tools

The Macmillan English Dictionary for Advanced Learners (Rundell 2002) was the main tool we used for making decisions about lexical units, contextual meanings, basic meanings, and distinctness of contextual and basic meanings. The reasons
for using this type of dictionary, and Macmillan in particular, are that they are recent and corpus-based (cf. Pragglejaz Group, 2007). As described in the instructions for the procedure (Chapter 2), we also used a second dictionary in order to have a second opinion about specific types of problems. This was the Longman Dictionary of Contemporary English. An informal test at the beginning of the project, comparing the description and application of about 100 lexical units, showed that there was no essential or systematic difference between the two dictionaries for our purposes. We therefore somewhat arbitrarily fixed Macmillan as our first dictionary, to be supplemented by Longman only in cases of doubt. We also referred to the Oxford English Dictionary at times, usually to achieve a deeper understanding of the semantic structure of a lexical unit. Only very seldom did we in fact use the OED to make a final decision.

10.2.3 Technique

Procedure

An explicit set of instructions was developed and fixed at the beginning of the research, as reported in Chapter 2. The starting point of this set was provided by MIP, the Metaphor Identification Procedure published by the Pragglejaz Group (2007), and this has remained the core of our own procedure, MIPVU. The main changes to MIP involved the following two features:

1. the detailed explication of many aspects of the decision-making process regarding lexical units and the identification of metaphorically used lexical units;
2. the addition of new sections on other forms of metaphor (direct and implicit metaphor), novel compounds, and MFlags.

As the research went on, the instructions were selectively improved and refined, but the basic procedure has remained unchanged after revision since the first reliability test.
Reliability

As reported in Chapter 8, reliability tests were conducted throughout the entire period of annotation, to examine the extent of agreement between analysts when they had analyzed their materials independently of each other (before discussion). Reliability was good. Measured by Fleiss’ kappa, the mean value was about 0.85. On average, the analysts achieved unanimous agreement before discussion for some 92% of all cases. Even though analyst bias was significant, this was alleviated by the overall protocol of analysis, which is described next.
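For readers who wish to run a comparable reliability check, the following is a minimal sketch of how Fleiss' kappa and the share of unanimous cases can be computed. It is not the procedure or software used in the project: the function names and the toy ratings (ten lexical units judged by four analysts as MRW or non-MRW) are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """ratings[i][j] = number of analysts who put case i in category j."""
    n_cases = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every case needs the same number of ratings")
    n_categories = len(ratings[0])
    # Share of all judgements that fall into each category.
    p_category = [
        sum(row[j] for row in ratings) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Observed pairwise agreement per case, averaged over cases.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(p_case) / n_cases
    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_category)
    return (p_observed - p_expected) / (1 - p_expected)


def unanimous_share(ratings: Sequence[Sequence[int]]) -> float:
    """Proportion of cases on which all analysts chose the same category."""
    n_raters = sum(ratings[0])
    return sum(1 for row in ratings if max(row) == n_raters) / len(ratings)


# Hypothetical toy data: columns are (MRW, non-MRW), four analysts per case.
TOY_RATINGS = [
    [4, 0], [0, 4], [0, 4], [3, 1], [0, 4],
    [4, 0], [0, 4], [0, 4], [0, 4], [1, 3],
]

print(round(fleiss_kappa(TOY_RATINGS), 2), unanimous_share(TOY_RATINGS))
```

The sketch counts judgements per category rather than tracking which analyst made which judgement, since Fleiss' kappa does not depend on analyst identity.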

Protocol

The set of instructions for annotation is the basis of our identification research, but it should be seen in the context of our overall approach to the materials. We handled the texts on the basis of the following protocol:

1. Excerpts were selected from the BNC-Baby by the principal investigator and entered into an administrative database;
2. Ph.D. students selected the excerpts assigned to them and produced an individual annotation;
3. The individual annotation had to be posted in a trimmed web version (which excluded a lot of the information present in the BNC-Baby for better readability) on an intranet website for comments by the other Ph.D. students;
4. The other Ph.D. students had to go through the trimmed web versions of the work of their colleagues and post comments and queries;
5. All Ph.D. researchers and the principal investigator had group meetings about the comments on the trimmed web versions, referring to the details of the procedure and to previous decisions about specific cases, which had been recorded in a special lexical database; they made final verdicts about problematic cases, which were recorded in the trimmed web version.
6. The annotations in the individually analyzed files were subsequently corrected on the basis of the web version;
7. The final annotations were then stored in a separate folder;
8. Any decisions about problematic cases were recorded in a special lexical database, for future reference.

An example of a web version after discussion is presented below (A7T, fragment 01).

But were it not for the <mrw type=“met” status=“WIDLII” morph=“n” TEIform=“seg”>perceived can you really see ‘might’? economic might of a more unified Economic <mrw type=“met” morph=“n” TEIform=“seg”>Community, like group? ‘living together’ = basic? the dozen or so nations of the Pacific Rim, who have been <mrw type=“met” status=“PP” morph=“n” TEIform=“seg”>meeting in Australia, might not have <mrw type=“met” status=“PP” morph=“n” TEIform=“seg”>bothered.

4.2 perceive: it seems both MM and LM conflate concrete and abstract senses and remain mostly within the abstract domain. I think not M. A
    4.2.1 yes, not M. GVAP
4.3 community: I think so. And you do not have to be together to form this kind of economic community, so M. A
    4.3.1 based on LM, M. GVAP

All text in italics is from the BNC-Baby, with annotations added in angular brackets. The underlined questions inserted into the annotated BNC-Baby fragment are queries posted by the individual analyst into the annotated document; they alert the other Ph.D. students to potential problems and are meant to elicit discussion. Underneath the annotated text, new comments made by the other analysts can be found about specific lexical units. They contain the abbreviations ‘MM’ and ‘LM’ for the dictionaries we use, Macmillan and Longman, respectively. Comments are signed by the initial of the analyst who posts the comment. They are numbered by utterance number, and responses to comments can be added, with further indentation and another number being added. In this case, two responses can be seen, that are signed ‘GVAP’, our code for ‘Group Verdict After Pragglejazzing’.

It should be noted that this protocol reduces the (significant) effects of individual analyst bias reported in Chapter 8 to virtually zero. That this is dependent on the group dynamics is also clear. But the basis of the identification procedure lies in the reliable individual case-by-case analyses anyway, as was shown by Fleiss’ Kappa. Therefore, what we are dealing with here is the further increase of consistency against the background of the systematic and explicit set of instructions presented in Chapter 2 as MIPVU.

Troubleshooting

A report on the results of the post-hoc troubleshooting was offered in Chapter 9. Some systematic errors were detected and removed, and remaining margins of error were estimated. The upshot of this exercise is as follows:

1. For the prior identification of phrasal verbs, compounds, and polywords by BNC, a margin of error of 0.3% should be taken into account.
2. One percent of all lexical units in the conversations have been discarded for metaphor analysis on account of their lack of intelligibility in context.
3. There may be a 20% error margin for the group of lexical units classified as metaphorical on the basis of WIDLII (When In Doubt, Leave It In).
4. For the class of lexical units flagging the presence of metaphor (MFlags), agreement was about 95%.
5. The error margin for classifying lexical units related to metaphor as direct expressions of metaphor was not separately examined since the behaviour of these words is closely connected to the behaviour of MFlags (see previous point).
6. The error margin for classifying lexical units related to metaphor as implicit expressions of metaphor was separately examined and led to a separate round in which we re-analyzed all potential cases in all of the data. This made reliability of implicit metaphor roughly equal to reliability for indirect metaphor.

10.2.4 Preparation of final database

After all annotated files had been corrected for errors discovered during the stage of troubleshooting reported in Chapter 9, they were converted into an SPSS database, with technical assistance from Onno Huber of the ICT group in the Arts faculty at VU university, Amsterdam. Separated lexical units that needed to be treated as single units (compounds, phrasal verbs, and polywords) were collapsed into single cases. A small number (n=18) of conversion problems were detected upon visual inspection of the database, and corrected. These corrections, to the extent that they were needed, have also been fed back into the annotated BNC-Baby files. In the following analyses, all DFMAs and genitive cases have been deselected. The total number of cases that remain in the SPSS database is 186,688.

10.3 Results and discussion: Initial exploration

10.3.1 Main metaphor categories

The main data set was first divided into the three categories that were functionally most important: metaphor-related words (MRWs), words that were not related to metaphor (non-MRWs), and words that functioned as metaphor flags (MFlags). Their distribution was found to be as follows: total number of words = 186,688; non-MRWs = 161,105 (86.3%); MRWs = 25,442 (13.6%); and MFlags = 141 (0.1%).

On average, one in every seven and a half words is related to metaphor. Since independent clauses may be very roughly and intuitively supposed to have an average length of about eight words, this amounts to the conclusion that every independent clause has on average one MRW. Metaphor is indeed ubiquitous. But there is also the contrast with the other 86.3% of non-MRWs, which demonstrates that by far the greater part of our language use is not metaphorical, or more precisely, not related to metaphor. An accurate and exact indication of this proportion has so far been lacking, and the present findings may be a first indication of the true role of metaphor in language use, at the level of lexical units, in contemporary language use in English.

It is true that there may be far more metaphor in language, for instance from a historical perspective, since many words may have senses whose origins lie in historical polysemy that has a metaphorical basis (Steen 2007). Similarly, there is additional metaphor in contemporary language use at the level of morphology or phraseology, for instance because morphemes have metaphorical relations to other morphemes in attested words, such as English overstatement. And languages that have a different interaction between vocabulary and morpho-syntax than
Germanic languages may require a slightly different type of measurement than our approach to lexical units has facilitated. But all of these aspects have been excluded from our analysis of metaphor in discourse, as we have focused on the lexical item as our unit of analysis.

Another striking finding is the low number of words that function as signals of metaphor in discourse. It is true that we have employed a restricted definition of signals of metaphor, to the effect that only markers were included for similes and figurative comparisons and analogies and the like. However, it is also true that, theoretically, these are the most conspicuous alternative rhetorical forms of metaphor. In terms of empirical research, they have been accorded a lot of attention recently in debates about psycholinguistic models of metaphor processing (e.g. Gentner & Bowdle, 2001; Bowdle & Gentner, 2005; Glucksberg, 2001; Glucksberg & Haught, 2006; Steen, 2007, 2008). The widespread interest in simile might have led to an expectation that there are quite a few of them in the data. Our research suggests otherwise.

The first findings of our corpus approach to metaphor have yielded new insights. The proportion of words related to metaphor to those not related to metaphor is about one to seven or eight, an estimate which has not been advanced on equally reliable grounds before now. In addition, the number of signals for similes, analogies, comparisons, and other explicitly flagged forms of metaphor, which are also typically direct, is extremely low. These findings form the starting point of the following more detailed initial explorations of the data.

10.3.2 Simple and complex lexical units, and borderline cases

One important issue in our procedure for metaphor identification concerns the delimitation of lexical units. Although most lexical units are single words, there are some notorious borderline cases including polywords, phrasal verbs, and compounds, as we have amply illustrated.
Polywords are treated as single lexical items by the POS tagging programme in BNC-Baby, but phrasal verbs and compounds are not: they are split into their component words and each of them is assigned a separate POS tag. Yet phrasal verbs and compounds function as single lexical units in our theoretical framework, because of their unitary referential function in discourse where they designate single entities, attributes, or relations. Therefore we have given all phrasal verbs and compounds an additional annotation in our database, signalling that they are complex lexical units, as opposed to all other lexical units that are simple. It turns out that practically all of these complex lexical units are attested and can be found in the dictionary. They can hence be classified as metaphorically or not metaphorically used in basically the same way as simple lexical units. However,
they are less easy to identify as complex lexical units (their error margin is 8% for all phrasal verbs and 44% for all compounds, see Chapter 9), and their classification as related or not related to metaphor has sometimes caused vexing problems during our group discussions. Given these considerations, it is informative to report the number of these cases, in order to obtain a view of their impact on the relation between lexical units and metaphorical and non-metaphorical language use in discourse.

Table 10.1 provides an overview. It has an additional feature for metaphor-related words, which have been divided into clear cases and borderline cases, which were marked as ‘WIDLII’. In the table, each row displays four types of information:

1. the frequencies of a particular category actually observed in the data (‘obs count’);
2. the frequencies of a particular category expected on the basis of chance, given the total number of observed cases for a specific group of lexical units crossed with a particular metaphor category (‘exp count’);
3. the percentage of a particular metaphor category for one specific group of lexical units (row %);
4. the degree of deviation of the observed frequencies from the expected frequencies, measured by a standardized unit for the complete data set (‘std res’).

Table 10.1. Relation of lexical units to metaphor, divided by lexical complexity

Lexical units              Relation to metaphor
                           Non-MRWs     WIDLII      MRWs        MFlags     Total

Simple       Obs count     158,687      1,765       23,119      127        183,698
             Exp count     158,524.7    1,801.7     23,232.8    138.7      183,698
             Row %         86.4         1.0         12.6        0.1        100
             Std res       0.4          −0.9        −0.7        −1.0

Complex      Obs count     2,418        66          492         14         2,990
             Exp count     2,580.3      29.3        378.9       2.3        2,990
             Row %         80.9         2.2         16.2        0.5        100
             Std res       −3.2         6.8         5.9         7.8

Total        Obs count     161,105      1,831       23,611      141        186,688
             Exp count     161,105      1,831       23,611      141        186,688
             Row %         86.3         1.0         12.6        0.1        100
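The expected counts and standardized residuals in Table 10.1 follow from the observed counts alone. The sketch below recomputes them under the usual definitions (expected count = row total × column total / grand total; standardized residual = (observed − expected) / √expected). The variable and function names are our own, and the code is an illustration rather than the SPSS procedure actually used.

```python
from math import sqrt

CATEGORIES = ["non-MRW", "WIDLII", "MRW", "MFlag"]

# Observed counts taken from Table 10.1.
OBSERVED = {
    "simple": [158_687, 1_765, 23_119, 127],
    "complex": [2_418, 66, 492, 14],
}


def expected_and_residuals(observed):
    """Return {group: {category: (expected count, standardized residual)}}."""
    rows = list(observed.values())
    grand_total = sum(sum(row) for row in rows)
    column_totals = [sum(column) for column in zip(*rows)]
    result = {}
    for group, row in observed.items():
        row_total = sum(row)
        cells = {}
        for category, obs, column_total in zip(CATEGORIES, row, column_totals):
            expected = row_total * column_total / grand_total
            cells[category] = (expected, (obs - expected) / sqrt(expected))
        result[group] = cells
    return result


TABLE = expected_and_residuals(OBSERVED)
for group, cells in TABLE.items():
    for category, (expected, residual) in cells.items():
        print(f"{group:8s} {category:8s} exp={expected:10.1f} std res={residual:5.1f}")
```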

In the group of simple lexical units, 158,687 lexical units have been categorized as not related to metaphor; this is 86.4% of all simple lexical units. This number is a little higher than what may be expected by chance: the expected frequencies are 158,524.7, the difference between the two figures amounting to 162.3 cases.
The degree of deviation of the observed frequencies from the expected frequencies is therefore expressed by a positive standardized residual, which is quite small, 0.4. This figure may be compared with other figures in the table, such as the standardized residual for the same metaphor category in complex lexical units. That is much larger and negative (−3.2), which indicates that complex lexical units have fewer non-MRWs than might be expected by chance. The difference between expectation (2,580.3) and observation (2,418) in complex lexical units is the same in absolute figures as for simple ones, 162.3, but proportionately it is much greater: 162.3 is about six percent of 2,580.3, but only about 0.1 percent of 158,524.7.

A chi-square analysis can test whether these deviations are due to chance or whether they are great enough to conclude that there is some statistically significant association between the two variables that can explain the patterns. The chi-square analysis is technically allowed, because there is only one cell out of eight that has an expected value that is lower than 5. The test statistic shows that there is a significant association between the two variables (χ2(3) = 153.9, p < 0.001), but that the effect size is small (Cramér’s V = 0.03, p < 0.001). To interpret this finding, we need to examine the standardized residuals in Table 10.1: the greater they are, the more important the deviation of a particular category in a particular group of lexical units is from what may be expected by chance, given the overall proportions between groups of lexical units and metaphor categories. Critical values for standardized residuals are 1.96 for significance at 0.05, and 2.58 at 0.01. Since we are dealing with large groups of data which may lead to technically significant findings that are not ‘real’ but due to sample size (Type I error), we will adopt the more stringent significance level of 0.01.
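The test statistic and effect size reported above can be checked against the observed counts in Table 10.1. The following sketch computes Pearson's chi-square and Cramér's V using only the standard library; the function names are our own, and the critical value 16.27 (chi-square, 3 degrees of freedom, α = 0.001) is quoted from standard tables rather than computed.

```python
from math import sqrt

# Observed counts from Table 10.1: rows = simple, complex;
# columns = non-MRW, WIDLII, MRW, MFlag.
OBSERVED = [
    [158_687, 1_765, 23_119, 127],
    [2_418, 66, 492, 14],
]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    grand_total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


def cramers_v(statistic, n, n_rows, n_columns):
    """Effect size: sqrt(chi2 / (N * (min(rows, columns) - 1)))."""
    return sqrt(statistic / (n * (min(n_rows, n_columns) - 1)))


STAT, DF = chi_square(OBSERVED)
N = sum(sum(row) for row in OBSERVED)
V = cramers_v(STAT, N, len(OBSERVED), len(OBSERVED[0]))

CRITICAL_001_DF3 = 16.27  # tabulated critical value, assumption stated above
print(f"chi2({DF}) = {STAT:.1f}, significant at 0.001: {STAT > CRITICAL_001_DF3}")
print(f"Cramer's V = {V:.2f}")
```

With almost 187,000 cases the statistic is large even though V is small, which is why the text reads the individual standardized residuals rather than the overall test alone.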
As can be seen, all values for the complex lexical units deviate from expectation by chance, in both positive and negative ways. This is why there is a statistically reliable interaction between metaphor and lexical complexity. Of all lexical units in our data, 1.6% (2,990 cases) are complex. These include all polywords marked as such by BNC-Baby, which in fact comprise almost half of all complex lexical units in our data (1458 cases). The other half include phrasal verbs and compounds, which were not coded as complex in the corpus but could be identified as such on the basis of the dictionary. In all, however, multi-word units defined in the way we have done, that is, with respect to their distinct referential role in the discourse, are extremely infrequent in comparison with simple lexical units. The relation of the metaphor categories to lexical complexity is as follows. In contrast to simple lexical units, complex lexical units display a clearly positive association with metaphor, borderline metaphor (‘WIDLII’), and metaphor
flag, and a corresponding negative relationship with non-metaphor: metaphor is more closely associated with complex lexical units than with simple ones. Yet it should also be noted that the 492 clear cases combined with the 66 borderline cases marked as WIDLII only produce 558 complex lexical units that are related to metaphor, out of a total of 25,442 lexical units that are related to metaphor. This is 1.8% of all metaphor-related words. Even if we take into account the maximum error margin for compounds (44%), based on the discrepancy between BNC-Baby and the Macmillan dictionary, and the true error margin for phrasal verbs (8%), based on the convergence between BNC-Baby and the Macmillan dictionary (see previous chapter), there should be no more than at most a roughly estimated 700 complex lexical units that are used metaphorically. Given this quantitatively marginal role, we decided to ignore the difference between simple and complex lexical units for the main analysis below. Future research may fine-tune our view of the role of this category of metaphorical language. For now we think its overall importance may be decreased for grasping the more general patterns of metaphor-related lexical units in discourse, since it constitutes only 1.8% of these patterns. The other preliminary issue that was addressed in this initial exploration of the data was the role of borderline cases that had been explicitly marked up as such by the code WIDLII, ‘When In Doubt, Leave It In’. This code was assigned to those cases that, after initial independent annotation by one analyst, and subsequent online commenting by colleagues, were not resolved by live group discussion between all analysts. Cases eventually marked as WIDLII represent the truly problematic cases in our data, and their annotation as such explicitly signals them as an interesting group for further research. 
Table 10.1 also displays the distribution of WIDLII over simple and complex lexical units, contrasted with secure cases of MRWs versus non-MRWs and MFlags. There are 1,831 cases classified as WIDLII out of a total of 25,442 MRWs, which is 7.2%. Intuitively this looks like an acceptable band of borderline cases in as complex a field as metaphor identification. When we take into account the error margin calculated for the WIDLII category in Chapter 9, this band may be estimated to be in fact slightly broader, amounting to some 9% at the maximum. But all in all, the set of lexical units marked as WIDLII seem to constitute a valid group of borderline cases that warrant further attention as such. Yet there is a substantial difference between the two classes of lexical units distinguished in this table, on the one hand, and the importance of borderline metaphorical status on the other. For simple lexical units, the group of borderline cases does not have a significant standardized residual from what may be expected by chance, but for complex lexical units, it does. In complex lexical units, there are
almost half as many borderline cases more than in simple lexical units. These findings reinforce the idea formulated above that, in our data, complex lexical units are more problematic than simple lexical units. For now, however, we may conclude that the analysis of borderline cases supports our decision to ignore the role of complex lexical units as a separate group in our data, simply because they are harder to interpret and constitute a very small group in the data. However, their relatively more problematic status in terms of reliable analysis does not mean that they should be removed from the dataset: even if all 66 complex lexical units that are WIDLII turned out to be not metaphorical, which is unlikely, this cannot be expected to have a major effect on the overall behaviour of all 1831 WIDLIIs in the statistical analyses.

10.4 Results and discussion: Main analysis

10.4.1 Metaphor across register and word class

The initial exploration of the data has led to the following conclusions:

1. Given the extremely small numbers, MFlags will be disregarded as a separate group of cases; in the following statistical comparisons, they will be taken as part of the group of non-MRWs.
2. Given the small numbers and the comparatively greater error margin, complex lexical units will be disregarded as a separate group of cases from now on; they will be seen as one part of the encompassing group of all lexical units.
3. Given the important theoretical role of the group of WIDLIIs, and their sufficient size, they need to be examined further as a separate group of cases; they will be seen as an intermediate group between non-MRWs on the one hand and clear MRWs on the other.

We will now proceed to examine the association between the three adjusted categories of ‘relation to metaphor’ (non-MRWs, WIDLII, MRWs) and the four registers of news, conversation, fiction, and academic discourse. A chi-square analysis was performed to test whether there was a statistically significant association between the two variables. The chi-square analysis is technically allowed, because there are no cells that have an expected value lower than 5. The test statistic shows that there is a significant association between the two variables, but that the effect size is modest (χ2(3) = 2,968.0, p < 0.001; Cramer’s V = 0.09, p < 0.001). The findings of the analysis are shown in Table 10.2.

Table 10.2. Relation of lexical units to metaphor, divided by register

Register       Type of information    Relation to metaphor                      Total
                                      Non-MRW      WIDLII      MRW

Academic       Obs count              40,194.0     496.0       8,624.0       49,314
               Exp count              42,593.4     483.7       6,236.9       49,314
               % register             81.5         1.0         17.5          100
               Stnd residual          −11.6        0.6         30.2

Conversation   Obs count              44,247.0     437.0       3,250.0       47,934
               Exp count              41,401.5     470.1       6,062.4       47,934
               % register             92.3         0.9         6.8           100
               Stnd residual          14.0         −1.5        −36.1

Fiction        Obs count              39,355.0     410.0       4,883.0       44,648
               Exp count              38,563.3     437.9       5,646.8       44,648
               % register             88.3         0.9         10.8          100
               Stnd residual          4.2          −1.5        −10.7

News           Obs count              37,450.0     488.0       6,854.0       44,792
               Exp count              38,687.7     439.3       5,665.0       44,792
               % register             83.6         1.1         15.3          100
               Stnd residual          −6.3         2.3         15.8

Total          Obs count              161,246.0    1,831.0     23,611.0      186,688
               Exp count              161,246.0    1,831.0     23,611.0      186,688
               % register             86.4         1.0         12.6          100
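The standardized residuals in Table 10.2 can be read off against the critical values given earlier (1.96 for significance at 0.05, 2.58 at 0.01). The sketch below applies that reading mechanically to the residuals as reported in the table; the dictionary layout and the function name are our own choices for illustration.

```python
CRITICAL_05 = 1.96
CRITICAL_01 = 2.58

# Standardized residuals as reported in Table 10.2.
REPORTED_RESIDUALS = {
    "academic": {"non-MRW": -11.6, "WIDLII": 0.6, "MRW": 30.2},
    "conversation": {"non-MRW": 14.0, "WIDLII": -1.5, "MRW": -36.1},
    "fiction": {"non-MRW": 4.2, "WIDLII": -1.5, "MRW": -10.7},
    "news": {"non-MRW": -6.3, "WIDLII": 2.3, "MRW": 15.8},
}


def significance(residual):
    """Classify a standardized residual by the strictest level it reaches."""
    size = abs(residual)
    if size >= CRITICAL_01:
        return "p < 0.01"
    if size >= CRITICAL_05:
        return "p < 0.05"
    return "n.s."


LEVELS = {
    register: {category: significance(value) for category, value in cells.items()}
    for register, cells in REPORTED_RESIDUALS.items()
}

for register, cells in LEVELS.items():
    print(register, cells)
```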

Again, the most important information is expressed by the standardized residuals: the larger they are, the more important the deviation of a particular category in a particular register. It is therefore convenient to simplify interpretation by establishing that, by this criterion, the category of WIDLII is not very important. The table shows that all registers display approximately 1% of WIDLII. The percentage of borderline cases is roughly equal for all four registers. It is true that the proportion of WIDLIIs in news is significantly deviant from what may be expected by chance, but not at a significance level of α = 0.01. Moreover, in absolute numbers, the deviation is so small that it pales in comparison with what we find elsewhere. In particular, the distribution of the MRWs versus the non-MRWs is quite variable across the four registers: academic has the highest proportion of MRWs (17.5%), conversation the lowest (6.8%), and news (15.3%) and fiction (10.8%) are in between; proportions of non-MRWs for all registers are inversely related. Academic discourse has the highest percentage of MRWs. This may be due to its relatively abstract content, which may prompt the frequent use of metaphorical language use. Superficially the most likely candidate for a large number of MRWs, fiction, is only in third place, at a considerable distance behind both

academic and news discourse. The experience of metaphor as typical of fiction may therefore be due to other factors than just the number of MRWs, such as the attitude of the reader and the nature of the metaphors themselves (cf. Semino & Steen, 2008; Steen, 1994; Steen & Gibbs, 2004). It is noteworthy, too, that the three written registers in combination display an average percentage of MRWs of 14.5, which is twice as large as that found in spoken interaction. Since conversation has a lower degree of informative purpose than writing (Biber 1988, 1989), the question arises whether the distribution of metaphor is related to this aspect of discourse. This interpretation may be even more relevant to the types of spoken interaction included in the present sample from BNC-Baby: these do not include business meetings, job interviews, or other professional verbal interactions, but only comprise casual conversations. Informative discourse in general, by contrast, may also require many metaphorical mappings to express more abstract content, as in academic discourse. This may explain some of the difference between the written and spoken data in our corpus. The reference to Biber leads on to one of the main ideas behind our research, that metaphor does not only interact with register, but also with the interaction between register and word class. This is a well-known fact about the nature of both registers and word classes which has been extensively documented in his work (Biber 1988, 1989; Biber et al. 1999). Indeed, the very choice of our materials from BNC-Baby was partly motivated by this consideration. In other words, we need to examine our data for the presence of three-way interactions between metaphor, register, and word class. If there are any, this would prevent all direct interpretations of main associations between metaphor and register. 
Instead, such higher-order interactions would force us to look at the distribution of the metaphor classes across word classes per register, and then synthesize these observations into a more complex picture. In order to check for this possibility, a three-way frequency analysis was performed to develop a log-linear model of metaphor distribution across registers and main word classes. The predictors were register (academic, conversation, fiction, and news) and word class (Adjective, Adverb, Conjunctions, Determiner, Noun, Preposition, Verb, Remainder; the latter contains articles, existential there, number words, pronouns, and so on). There were 186,688 lexical units that had been coded as not related to metaphor (non-MRW), borderline (WIDLII), or related to metaphor (MRW), but the statistically required assumption that all cells displayed an expected frequency greater than 5 was violated. Therefore all borderline cases marked as ‘WIDLII’ were combined with all cases marked as clear metaphor for present purposes. Stepwise selection by simple deletion of effects using SPSS16 produced a model that included all higher-order effects. The model had a likelihood ratio of χ2(0) = 0, p = 1, indicating a good fit between observed frequencies and expected

frequencies generated by the model. Even though there are significant main effects of register and of word class, these cannot be taken on their own because of the significant highest-order interaction between metaphor, register, and word class (χ2(21) = 890.95, p < 0.001). As a result, we need to go on and carry out separate tests for a more precise picture of the relations between metaphor and word class within distinct registers. This is also where the frequencies and percentages of non-MRWs and MRWs divided by word class will be reported. Let us briefly comment on the significance of the three-way interaction between metaphor, register, and word class. This finding suggests that ‘metaphor in language’ when divided between four registers is not just one homogeneous phenomenon: metaphor in language is in fact the sum of a set of frequencies of MRWs distributed across (in this case) eight distinct word classes, and the interaction shows that the components of this sum vary in statistically reliable ways between the four registers in the sample, as we shall detail in a moment. In other words, the total percentages of MRWs per register mask the fact that they are composed of groups of MRWs per word class that exhibit substantially diverging distributions across the four registers. If the role of word class is taken as sufficiently important to be included as a factor in the explanation of metaphor in discourse, as is desirable because of independently produced findings on register and word class in research such as Biber’s, then this variation should also be taken seriously and cannot be ignored. To put this differently, observed significant main effects cannot be simply discussed without adding further caveats and details if we wish to keep on board our interest in lexical units as groups of different word classes that interact with register.
Yet another way of explaining how this is important is to consider the fact that if particular word classes were taken out of the complete picture, the overall comparison between metaphors in the four registers would look different. Similarly, observations about MRWs across word classes are abstractions across the totals of four different registers, in which these patterns work in significantly different ways. ‘Metaphor in language’ cannot just be looked at as a matter of metaphor in word classes, either, as happens, for instance, in Cameron (2008c), who claims that English has a tendency ‘to place metaphoricity in the verb’. The present findings remind us that the picture offered by that perspective ignores the functional variation of metaphor across word classes that is due to register. Again, the three-way interaction suggests that if one or more registers were removed from the comparison, the overall patterns between the word classes would be affected and possibly altered.

Register and word class

In order to break down the three-way interaction between metaphor, register, and word class, we followed the lead provided by Biber (1988, 1989). We therefore first needed to check whether the materials in our metaphor

corpus are sufficiently representative of the four registers of academic discourse, journalism, fiction, and conversation as described in that research. Biber and Conrad (2001: 185) present an overview of the relation between 23 registers and the most important dimension of register variation, ‘Involved versus Informational Production’. This dimension accounts for a large proportion of the linguistic variation between registers, grouping a wide range of linguistic features into two complementary co-occurrence patterns. There is one set of registers that exhibit ‘involved production’ and are characterized by the conspicuous presence of one group of linguistic features, including private verbs, present-tense verbs, first and second person pronouns, questions, discourse particles, and contractions, and the relative absence of another group of linguistic features, including nouns, prepositions, and attributive adjectives. In contrast, there is a second set of registers that are based on ‘informational production’ which are characterized by the relative absence of the former group of features and the conspicuous presence of the latter group. Of our four registers, face-to-face conversation is in the highest regions of the involved side of the dimension, and consequently low on informational production; academic prose and news are in the highest regions of the informational end of the dimension, and consequently low on involved production; and fiction (including romance, mystery and adventure, and general fiction) centres around the middle of the dimension. These scores reflect the interactional, online and spoken production of conversation as opposed to the transactional, revised and written production of academic and news texts. Since fiction typically contains dialogue as well as narrator text, it may be seen as an amalgam of both types of production. 
Biber’s findings for the dimension of involved versus informational production therefore suggest that the four registers in our research can be seen as three points on one continuum, with conversations at the involved end of the scale, news and academic texts at the informational end, and fiction in the middle. If we look at the major word classes that can be distinguished in discourse, these positions roughly correspond with a marked use of nouns, prepositions and adjectives for the informational end, that is, for news and academic texts, versus a marked use of verbs and the remainder category (including existential there, number words, pronouns, and so on) for conversations, while fiction would be in between these extremes for all of these word classes. A first test of the representativeness of our materials may therefore be performed by crossing the eight major word classes we have distinguished with the four registers and testing whether these predictions are borne out. A two-way frequency table was constructed crossing the variables of register (with academic texts, conversation, fiction, and news texts) with word class (with eight categories: adjectives, adverbs, conjunctions, determiners, nouns, prepositions, verbs, and remainder). Table 10.3 offers details about observed and

expected frequencies, percentages of observations per register, and standardized residuals. A chi-square analysis shows that there is a significant association between the two variables: χ2(21) = 18,213.74, p < 0.001; Cramer’s V = 0.18, p < 0.001.

Table 10.3. Relation of lexical units to register, divided by major word class

Word class           Academic   News     Fiction   Conversation   Total
Adjectives    Obs    4659       3760     2969      1750           13138
              Exp    3470       3152     3142      3373           13138
              %      35.5       28.6     22.6      13.3           100
              SR     +20.2      +10.8    −3.1      −27.9
Adverbs       Obs    2503       2183     2839      4290           11815
              Exp    3121       2834     2825      3033           11815
              %      21.2       18.5     24.0      36.3           100
              SR     −11.1      −12.2    +0.3      +22.8
Conjunctions  Obs    3028       2437     2498      2401           10364
              Exp    2737       2486     2478      2661           10364
              %      29.2       23.5     24.1      23.2           100
              SR     +5.5       −1.0     +0.4      −5.0
Determiners   Obs    6743       5700     4961      4195           21599
              Exp    5705       5182     5165      5545           21599
              %      31.2       26.4     23.0      19.4           100
              SR     +13.7      +7.2     −2.8      −18.1
Nouns         Obs    13342      12930    9648      5582           41502
              Exp    10962      9957     9925      10656          41502
              %      32.1       31.2     23.2      13.4           100
              SR     +22.7      +29.8    −2.8      −49.2
Prepositions  Obs    6463       5135     4228      2479           18305
              Exp    4835       4391     4377      4700           18305
              %      35.3       28.1     23.1      13.5           100
              SR     +23.4      +11.2    −2.3      −32.4
Verbs         Obs    8147       7869     9788      12158          37962
              Exp    10027      9108     9078      9747           37962
              %      21.5       20.7     25.8      32             100
              SR     −18.8      −13.0    +7.4      +24.4
Remainder     Obs    4429       4778     7717      15079          32003
              Exp    8453       7678     7653      8217           32003
              %      13.8       14.9     24.1      47.1           100
              SR     −43.8      −33.1    +0.7      +75.7
Total         Obs    49314      44792    44648     47934          186688
              Exp    49314      44792    44648     47934          186688
              %      26.4       24.0     23.9      25.7           100
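
The chi-square statistic and Cramer's V reported for Table 10.3 can be approximated from the observed counts. The sketch below is an illustration in plain Python rather than the original SPSS analysis; the helper name `chi_square` and the data layout are assumptions made for this example. It computes Pearson's chi-square as the sum of (observed − expected)² / expected over all cells, and Cramer's V as √(χ² / (N × min(rows − 1, columns − 1))).

```python
import math

REGISTERS = ["academic", "news", "fiction", "conversation"]

# Observed counts from Table 10.3: one row per word class, columns in REGISTERS order.
COUNTS = {
    "adjectives":   [4659, 3760, 2969, 1750],
    "adverbs":      [2503, 2183, 2839, 4290],
    "conjunctions": [3028, 2437, 2498, 2401],
    "determiners":  [6743, 5700, 4961, 4195],
    "nouns":        [13342, 12930, 9648, 5582],
    "prepositions": [6463, 5135, 4228, 2479],
    "verbs":        [8147, 7869, 9788, 12158],
    "remainder":    [4429, 4778, 7717, 15079],
}


def chi_square(rows):
    """Return Pearson chi-square, degrees of freedom and Cramer's V for a two-way table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            chi2 += (obs - exp) ** 2 / exp
    n_rows, n_cols = len(rows), len(col_totals)
    df = (n_rows - 1) * (n_cols - 1)
    v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, df, v


chi2, df, cramers_v = chi_square(list(COUNTS.values()))
print(f"chi2({df}) = {chi2:.2f}, Cramer's V = {cramers_v:.2f}")
```

With eight word classes and four registers the table has (8 − 1) × (4 − 1) = 21 degrees of freedom, which matches the value reported with the test statistic.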

The patterns of distribution may be interpreted according to the predictions that can be derived from the work by Biber and his colleagues. It is clear that nouns, prepositions and adjectives are used much more often in academic and news texts than in conversation, and that fiction occupies a middle position. Vice versa, it is also clear that the remainder category and verbs are used much more frequently in conversations than in academic and in news texts, again with fiction in the middle. Determiners seem to correlate with the heavy nominal component of the informational end of the scale, and display the mirror image of the category labelled remainder. The idea of a complementary distribution for the two sets of word class features across specific registers is clearly borne out by the data. Other dimensions of register may also be employed here to gain better insight into the typical linguistic nature of our four registers. For instance, the third dimension of Biber’s (1988, 1989) multifactor-multifeature model concerns the expression of reference as dependent on or independent of the situation of the utterance. It is partly marked by the conspicuous use of all sorts of time and place adverbials for the situation-dependent end of the scale: the use of here and now, for instance, is highly characteristic of this side of the third dimension. Situation-dependent reference is frequently used in conversation and in fiction, according to Biber (1988), but not in printed news and academic texts. This accords with the frequencies for adverbs that we find in Table 10.3. Of course, not all adverbs are of this type, but it looks as if the distribution of adverbs across registers can be partly attributed to this particular dimension.
In sum, most patterns are as expected by Biber, or compatible:
– Academic texts have a high number of nouns, prepositions, adjectives, and determiners, and a low number of verbs, adverbs, and remainder
– News texts have a high number of nouns, prepositions, adjectives, and determiners, and a low number of verbs, adverbs, and remainder
– Fiction is in between academic and news on the one hand and conversation on the other, most probably because of the mix of narrative (resembling news) and dialogue (resembling conversation); it is roughly characterized by a high number of verbs and a low number of adjectives
– Conversation has a low number of nouns, prepositions, adjectives, and determiners, and a high number of verbs, adverbs, and remainder.

These findings suggest that our sample for metaphor analysis extracted from BNC-Baby roughly behaves in accordance with the expectations that can be derived from Biber’s corpus, so that it is representative of the way in which these registers have been described in Biber (1988, 1989). The absolute numbers of MRWs that will be reported below consequently have to be interpreted with reference to the general association of particular word classes with specific registers; the latter, in turn, are taken as a reflection of the typically involved or informational production of a specific register, its situation-dependent or independent reference, or possibly other properties labelled by the poles of Biber’s dimensions. The two-way relations between metaphor and register in Table 10.2 as well as register and word class in Table 10.3 have offered us a first glimpse of the distribution of metaphor in discourse. In particular, for metaphor and register, it is clear that academic discourse has the highest preference for MRWs, while conversation has the lowest, with fiction and news occupying intermediate positions. We have also noted that these are simplifications. They need to be corrected by examining the interaction between the factors of register and word class in relation to metaphor. This is what will happen in the following analyses.

10.4.2 Metaphor across word class in four distinct registers

Separate chi-square analyses were performed for each of the four registers, to break down the three-way interaction between metaphor, register, and word class. In order to prepare for the comparisons between registers at the end of this section, the order of discussion will be slightly different than hitherto.
We will now begin with the register containing the greatest number of metaphor-related words, which is academic discourse, and work our way down through news and fiction to conversations. This will make it easier to observe shifts in distribution as we go along, and it will naturally lead on to a final comparison in one table in the same order. The results for academic discourse are displayed in Table 10.4. There was a significant association between word class and relation to metaphor: χ2(7) = 4,879.22, p < 0.001; Cramer’s V = 0.32. The average percentage of lexical units related to metaphor is 18.5, which makes academic texts the most metaphorical register in our corpus: one in every five to six lexical units is related to metaphor. As hinted above, this may be due to the overall informative and abstract nature of academic discourse, which may require more metaphorical expressions than other types of discourse.

Table 10.4. Relation of lexical units to metaphor in academic texts, divided by major word class

Word class    Type of information   Non-MRW    MRW       Total
Adjectives    Obs count             3841       818       4659
              Exp count             3797.2     861.8     4659
              % in register         82.4       17.6      100.0
              Stnd residual         0.7        −1.5
Adverbs       Obs count             2251       252       2503
              Exp count             2040.0     463.0     2503
              % in register         89.9       10.1      100.0
              Stnd residual         4.7        −9.8
Conjunctions  Obs count             2987       41        3028
              Exp count             2467.9     560.1     3028
              % in register         98.6       1.4       100.0
              Stnd residual         10.4       −21.9
Determiners   Obs count             6199       544       6743
              Exp count             5495.7     1247.3    6743
              % in register         91.9       8.1       100.0
              Stnd residual         9.5        −19.9
Nouns         Obs count             10997      2345      13342
              Exp count             10874      2468      13342
              % in register         82.4       17.6      100.0
              Stnd residual         1.2        −2.5
Prepositions  Obs count             3713       2750      6463
              Exp count             5267.5     1195.5    6463
              % in register         57.5       42.5      100.0
              Stnd residual         −21.4      45.0
Verbs         Obs count             5892       2255      8147
              Exp count             6640.0     1507.4    8147
              % in register         72.3       27.7      100.0
              Stnd residual         −9.2       19.3
Remainder     Obs count             4312       117       4429
              Exp count             3609.7     819.3     4429
              % in register         97.4       2.6       100.0
              Stnd residual         11.7       −24.5
Total         Obs count             40192      9122      49314
              Exp count             40192.0    9122.0    49314.0
              % in register         81.5       18.5      100.0

The largest group of lexical units that is related to metaphor, in absolute numbers, is the group of prepositions, with 2750 cases. It is followed by nouns (2345) and verbs (2255). As we shall see, this is the only register in which there are more metaphor-related nouns than verbs, which may have to do with the abstract nature of the content domain of academic texts. Together, the three largest groups account for 7350 lexical units that are related to metaphor, out of a total of 49,314 lexical

units in the academic subcorpus (14.9%). The three largest groups hence comprise 80.6% of all metaphor-related lexical units in the academic texts. In relative numbers, it is also the prepositions which are most often metaphorical: no fewer than 42.5% are related to metaphor. This is probably due to the temporal and abstract uses that prepositions are often put to. The second most frequent group of metaphor-related words in terms of percentages is verbs, with 27.7%. Adjectives are in third place, with 17.6%. In all, academic discourse is characterized by the highest proportion of metaphor-related words of all four registers. In absolute terms, it is dominated by metaphorical uses of prepositions, nouns, and verbs. In relative terms, too, it is prepositions and verbs, with adjectives, which have the most frequent metaphorical use. Table 10.5 presents the results for the register of news. It shows a significant association between word class and relation to metaphor (χ2(7) = 4,252.00, p < 0.001; Cramer’s V = 0.31). A total number of 7,342 (16.4%) of all 44,792 lexical units were related to metaphor. On average, one in every six words in news texts may be used to express a metaphorical mapping. As was mentioned above, this makes news texts much more metaphorical than fiction and conversations, but still less metaphorical than academic texts.

Table 10.5. Relation of lexical units to metaphor in news, divided by major word class

Word class    Type of information   Non-MRW    MRW       Total
Adjectives    Obs count             2969       791       3760
              Exp count             3143.7     616.3     3760
              % in register         79.0       21.0      100.0
              Stnd residual         −3.1       7.0
Adverbs       Obs count             1942       241       2183
              Exp count             1825.2     357.8     2183
              % in register         89.0       11.0      100.0
              Stnd residual         2.7        −6.2
Conjunctions  Obs count             2415       22        2437
              Exp count             2037.5     399.5     2437
              % in register         99.1       0.9       100.0
              Stnd residual         8.4        −18.9
Determiners   Obs count             5361       339       5700
              Exp count             4765.7     934.3     5700
              % in register         94.1       5.9       100.0
              Stnd residual         8.6        −19.5
Nouns         Obs count             11229      1701      12930
              Exp count             10810.6    2119.4    12930
              % in register         86.8       13.2      100.0
              Stnd residual         4.0        −9.1
Prepositions  Obs count             3177       1958      5135
              Exp count             4293.3     841.7     5135
              % in register         61.9       38.1      100.0
              Stnd residual         −17.0      38.5
Verbs         Obs count             5697       2172      7869
              Exp count             6579.2     1289.8    7869
              % in register         72.4       27.6      100.0
              Stnd residual         −10.9      24.6
Remainder     Obs count             4660       118       4778
              Exp count             3994.8     738.2     4778
              % in register         97.5       2.5       100.0
              Stnd residual         10.5       −23.8
Total         Obs count             37450      7342      44792
              Exp count             37450.0    7342.0    44792.0
              % in register         83.6       16.4      100.0

In terms of absolute numbers, the following picture emerges. The word classes most frequently related to metaphor in the register of news are verbs, prepositions, and nouns. Verbs exhibit 2172 cases that are related to metaphor, prepositions 1958, and nouns 1701. Together, they account for 5831 cases, in a data set of 44,792 (which is 13.02%). In news texts, the three word classes of verbs, prepositions and nouns combined account for 79.42% of all MRWs. In relative terms, it is the prepositions, again, that have the highest percentage of metaphor-related cases: 38%. Verbs and adjectives are runners-up again, just as with academic discourse: 27.6% of all verbs in news texts are related to metaphor, in contrast with 21.0% of all adjectives. In all, then, news discourse is the register that has the second highest number of metaphor-related words. In absolute figures, it is dominated by verbs, prepositions and nouns, which is comparable to academic texts, with the exception that verbs and nouns have switched position. Percentage-wise, prepositions, verbs, and adjectives have the largest groups of metaphor-related cases. The behaviour of these word classes makes news fairly comparable to academic discourse, too. For fiction, another two-way frequency table was run, with relation to metaphor and word class as the two crossed variables again (see Table 10.6). There was a significant association between word class and relation to metaphor: χ2(7) = 3,473.98, p < 0.001; Cramer’s V = 0.28. Out of 44,648 lexical units, 5,293 were related to metaphor (11.9%). This makes fiction the least

metaphorical of the three written registers included in our study, even though it is still more metaphorical than the conversations. One possible explanation of this position is that fiction also contains dialogue, so that it may in fact constitute a hybrid register between written and spoken language use (cf. Biber et al. 1999: 16).

Table 10.6. Relation of lexical units to metaphor in fiction, divided by major word class

Word class    Type of information   Non-MRW    MRW       Total
Adjectives    Obs count             2394       575       2969
              Exp count             2617.0     352.0     2969
              % in register         80.6       19.4      100.0
              Stnd residual         −4.4       11.9
Adverbs       Obs count             2575       264       2839
              Exp count             2502.4     336.6     2839
              % in register         90.7       9.3       100.0
              Stnd residual         1.5        −4.0
Conjunctions  Obs count             2473       25        2498
              Exp count             2201.9     296.1     2498.0
              % in register         99.0       1.0       100.0
              Stnd residual         5.8        −15.8
Determiners   Obs count             4583       378       4961
              Exp count             4372.9     588.1     4961
              % in register         92.4       7.6       100.0
              Stnd residual         3.2        −8.7
Nouns         Obs count             8632       1016      9648
              Exp count             8504.2     1143.8    9648
              % in register         89.5       10.5      100.0
              Stnd residual         1.4        −3.8
Prepositions  Obs count             2817       1411      4228
              Exp count             3726.8     501.2     4228
              % in register         66.6       33.4      100.0
              Stnd residual         −14.9      40.6
Verbs         Obs count             8233       1555      9788
              Exp count             8627.6     1160.4    9788
              % in register         84.1       15.9      100.0
              Stnd residual         −4.2       11.6
Remainder     Obs count             7648       69        7717
              Exp count             6802.2     914.8     7717
              % in register         99.1       0.9       100.0
              Stnd residual         10.3       −28.0
Total         Obs count             39355      5293      44648
              Exp count             39355.0    5293.0    44648.0
              % in register         88.1       11.9      100.0

In absolute numbers, verbs, prepositions, and nouns are the most prominent categories in the data again. There are 1555 verbs related to metaphor, 1411 prepositions, and 1016 nouns. Together, these three word classes account for 3982 cases out of a total of 44,648, which makes 8.92%. They make up 75.23% of all words in fiction that display a relation to metaphor. In relative terms, however, verbs have dropped to third place, exhibiting 15.9% of metaphor-related uses. They are preceded by adjectives, which have 19.4% metaphor-related cases, and prepositions (33.4%). In all three written registers, therefore, it is the prepositions that are relatively most often used in relation to metaphor, while verbs and adjectives are the two content word classes that are runners-up. Fiction is the register that, of the three written registers included in our study, displays the lowest number of lexical units that are related to metaphor. This may be due to the amount of dialogue that is typically part of fiction, which may make a lot of fiction comparable to conversations. However, in terms of use of word classes for metaphorical purposes of expression, the rank order of most popular word classes is more or less the same in fiction as in news and academic discourse. The most striking deviation from the general patterns for written discourse seems to lie in the comparatively less frequent metaphorical use of prepositions in fiction, which may have to do with their relatively more important spatial use in imaginative narrative. This is also reflected in the absolute numbers, where the largest metaphor-related word class in fiction is verbs rather than prepositions.

Table 10.7. Relation of lexical units to metaphor in conversations, divided by major word class

Word class    Type of information   Non-MRW    MRW       Total
Adjectives    Obs count             1517       233       1750
              Exp count             1615.4     134.6     1750
              % in register         86.7       13.3      100.0
              Stnd residual         −2.4       8.5
Adverbs       Obs count             3969       321       4290
              Exp count             3960.0     330.0     4290
              % in register         92.5       7.5       100.0
              Stnd residual         0.1        −0.5
Conjunctions  Obs count             2366       35        2401
              Exp count             2216.3     184.7     2401.0
              % in register         98.5       1.5       100.0
              Stnd residual         3.2        −11.0
Determiners   Obs count             3541       654       4195
              Exp count             3872.3     322.7     4195
              % in register         84.4       15.6      100.0
              Stnd residual         −5.3       18.4
Nouns         Obs count             5121       461       5582
              Exp count             5152.6     429.4     5582
              % in register         91.7       8.3       100.0
              Stnd residual         −0.4       1.5
Prepositions  Obs count             1641       838       2479
              Exp count             2288.3     190.7     2479
              % in register         66.2       33.8      100.0
              Stnd residual         −13.5      46.9
Verbs         Obs count             11048      1110      12158
              Exp count             11222.8    935.2     12158
              % in register         90.9       9.1       100.0
              Stnd residual         −1.7       5.6
Remainder     Obs count             15044      35        15079
              Exp count             13919.1    1159.9    15079
              % in register         99.8       0.2       100.0
              Stnd residual         9.5        −33.0
Total         Obs count             44247      3687      47934
              Exp count             44247.0    3687.0    47934.0
              % in register         92.3       7.7       100.0

For conversations, the results can be found in Table 10.7. There was a significant association between word class and relation to metaphor: χ2(7) = 4,178.54, p < 0.001; Cramer’s V = 0.30. The total number of MRWs is 3,687 (7.7%). This suggests that, on average, one in every 13 to 14 words is related to metaphor in conversations. Compared against the three written registers, this means that conversations exhibit just over half the number of MRWs compared to the three types of written discourse included in the investigation, news, fiction, and academic texts. In absolute numbers, verbs, prepositions, and determiners constitute the three largest groups of lexical units related to metaphor. Verbs form the most frequent category, with 1110 cases, followed by prepositions (838) and determiners (654), which have pushed nouns back to fourth position (461). The rise of verbs, in absolute numbers, through academic, news, and fiction texts is confirmed in conversations, where verbs have now even surpassed prepositions as the general top category. But it is quite possible that this is simply due to the larger share that all verbs have in conversations, a point which we saw confirmed above in the two-way table between register and word class (Table 10.3). The total number of metaphor-related lexical units of the three word classes of verbs, prepositions, and nouns amounts to 2409, which constitutes five percent of the total number of lexical units in all conversations in

the sample. Against the total number of lexical units related to metaphor in conversations, these three word classes account for 65.34% of all MRWs in conversations. Percentage-wise, the regular top category of prepositions appears to be most often related to metaphor in conversations too. Prepositions exhibit 33.8% of metaphor-related cases, but are followed by a new runner up, determiners, with 15.6%. In third place we find the adjectives, with 13.3%. Verbs are even lower, with just 9.1% of all uses in conversation related to metaphor, a figure that is very close to nouns (8.3%) and adverbs (7.5%). In all, then, the interaction between metaphor, register and word class exhibits the following pattern for conversations. One out of every thirteen to fourteen lexical units used in conversations is related to metaphor, which is about half of what happens in the three written registers examined in this corpus. Proportionately, prepositions and nouns are used considerably less frequently as metaphors in conversations than elsewhere, but it is especially verbs that are unpopular as expressions of metaphor in conversations. As a result, determiners, adjectives and adverbs gain in relative weight when it comes to the function of expressing metaphorical meanings in conversation. At the same time, because conversations are inherently characterized by a high frequency of verbs, it is still possible for verbs to dominate the absolute numbers of MRWs for the conversations that were investigated. It should be added that this is true for the present sample of conversations, which may represent only one part of the much broader range of spontaneous spoken interaction. Moreover, these patterns should be interpreted in the light of the patterns found in the other three registers, which are all written as opposed to spoken. 
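
With the four per-register tables in hand, the test of the three-way interaction reported earlier can be illustrated. The source analysis was run in SPSS; the sketch below is not that procedure. It swaps in iterative proportional fitting to fit the model that contains all two-way associations but no three-way term, and then computes the likelihood-ratio statistic G² = 2 Σ obs × ln(obs / fitted) against the observed counts from Tables 10.4 to 10.7. All names (`TABLES`, `margin`, `fit_no_three_way`) are illustrative assumptions, and whether the result matches the reported χ2(21) = 890.95 to the decimal is something to verify rather than assume.

```python
import math

WORD_CLASSES = ["adjective", "adverb", "conjunction", "determiner",
                "noun", "preposition", "verb", "remainder"]

# (non-MRW counts, MRW counts) per register, in WORD_CLASSES order (Tables 10.4 to 10.7).
TABLES = {
    "academic": ([3841, 2251, 2987, 6199, 10997, 3713, 5892, 4312],
                 [818, 252, 41, 544, 2345, 2750, 2255, 117]),
    "news": ([2969, 1942, 2415, 5361, 11229, 3177, 5697, 4660],
             [791, 241, 22, 339, 1701, 1958, 2172, 118]),
    "fiction": ([2394, 2575, 2473, 4583, 8632, 2817, 8233, 7648],
                [575, 264, 25, 378, 1016, 1411, 1555, 69]),
    "conversation": ([1517, 3969, 2366, 3541, 5121, 1641, 11048, 15044],
                     [233, 321, 35, 654, 461, 838, 1110, 35]),
}

# obs[(metaphor, register, word_class)] -> observed count
obs = {}
for r, columns in enumerate(TABLES.values()):
    for m, column in enumerate(columns):
        for w, count in enumerate(column):
            obs[(m, r, w)] = float(count)


def margin(table, axes):
    """Sum `table` over every axis that is not listed in `axes`."""
    out = {}
    for key, value in table.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + value
    return out


def fit_no_three_way(table, sweeps=200, tol=1e-6):
    """Fit all three two-way margins by iterative proportional fitting."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    targets = {axes: margin(table, axes) for axes in pairs}
    fitted = {key: 1.0 for key in table}
    for _ in range(sweeps):
        biggest_change = 0.0
        for axes in pairs:
            current = margin(fitted, axes)
            for key in fitted:
                sub = tuple(key[a] for a in axes)
                new = fitted[key] * targets[axes][sub] / current[sub]
                biggest_change = max(biggest_change, abs(new - fitted[key]))
                fitted[key] = new
        if biggest_change < tol:
            break
    return fitted


fitted = fit_no_three_way(obs)
g2 = 2.0 * sum(n * math.log(n / fitted[key]) for key, n in obs.items())
df = (2 - 1) * (len(TABLES) - 1) * (len(WORD_CLASSES) - 1)
print(f"G2({df}) = {g2:.2f}")
```

The degrees of freedom follow from the design: (2 − 1) × (4 − 1) × (8 − 1) = 21. A G² well above the critical value for 21 degrees of freedom indicates that the two-way associations alone cannot reproduce the observed cell counts, which is what the significant three-way interaction expresses.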
10.5 General comparison and conclusion

10.5.1 General comparison

There is a three-way interaction between the variables of ‘relation to metaphor’ (no relation or relation), register (conversation, fiction, news and academic) and word class (adjective, adverb, conjunction, determiner, noun, preposition, verb and remainder). A two-way frequency table revealed major relations between register and metaphor, but this main effect conceals significant three-way interactions: different registers exhibit different relations between metaphor and word class, both in absolute as well as in relative terms. Different word classes similarly exhibit different relations between metaphor and register. But since our perspective in this research is that of discourse, we have pursued the first route into the data, exploring the various patterns of distribution of metaphor across word class per register. This approach reflects our concern with metaphor as a semantic device in usage events. It has to be opposed to the study of metaphor as a conventionalized sign

in language as a semiotic system, which might primarily be concerned with the behaviour of metaphor in distinct word classes. Registers are taken here as language varieties which reflect a number of functional aspects of language use for distinct genres, including domain, setting, purpose, and so on (Steen 1999, in press c). It is, moreover, well-known that registers are partly constituted by their own use of word classes (as well as other linguistic features that are ignored here). Douglas Biber, in particular, has described all four registers under examination in this study, paying ample attention to the varying use and co-occurrence patterns of the distinct word classes included in our analysis (e.g. Biber 1988, 1989; cf. Biber & Conrad 2001). This is the background against which we can attempt to understand the interactions between register, word class, and relation to metaphor which have been revealed in our own research. The absolute numbers of MRWs have to be related to the general increases and decreases of particular word classes in specific registers; the latter, in turn, are taken as a reflection of their typically involved or informational production, or situation-dependent or independent reference, the most relevant dimensions of Biber’s multidimensional – multifeature model for register. This may throw some light on the variation in absolute numbers between the three major word classes of prepositions, nouns, and verbs, on the one hand, and their relation to metaphor, on the other. For instance, the fact that these three word classes account for 80% of all MRWs in news and academic texts, 75% in fiction, and 66% in conversations may now be seen as at least partly due to the frequent role of prepositions and nouns in informational production, which is characteristic of academic and news texts, is much less important in fiction, and goes against the nature of conversations. 
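
The rounded shares cited above (80%, 75% and 66%) can be recomputed from the counts in Tables 10.4 to 10.7. The following sketch is illustrative only; the dictionary layout and the function name `top_three_shares` are assumptions made for this example. For each register it adds up the metaphor-related prepositions, nouns and verbs and expresses the sum as a percentage of all MRWs and of all lexical units in that register.

```python
# Metaphor-related prepositions, nouns and verbs per register (Tables 10.4 to 10.7).
MRW_COUNTS = {
    "academic":     {"prepositions": 2750, "nouns": 2345, "verbs": 2255},
    "news":         {"prepositions": 1958, "nouns": 1701, "verbs": 2172},
    "fiction":      {"prepositions": 1411, "nouns": 1016, "verbs": 1555},
    "conversation": {"prepositions": 838, "nouns": 461, "verbs": 1110},
}

# (all MRWs, all lexical units) per register, from the Total rows of the same tables.
TOTALS = {
    "academic": (9122, 49314),
    "news": (7342, 44792),
    "fiction": (5293, 44648),
    "conversation": (3687, 47934),
}


def top_three_shares(register):
    """Return (combined count, % of all MRWs, % of all lexical units)."""
    combined = sum(MRW_COUNTS[register].values())
    all_mrw, all_units = TOTALS[register]
    return combined, 100 * combined / all_mrw, 100 * combined / all_units


for register in MRW_COUNTS:
    combined, of_mrw, of_units = top_three_shares(register)
    print(f"{register:13s} {combined:5d}  {of_mrw:5.2f}% of MRWs  "
          f"{of_units:5.2f}% of lexical units")
```

The drop in the share of these three word classes from the written registers to conversation is the pattern that the reference to informational production is meant to explain.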
When determiners are added to this overall pattern, the scope of this explanation increases even more. In general, then, if a word class is typical of, and has an important function for, a particular class of discourse, it is consequently more frequent and therefore also more prominent in comparison to other word classes. This will naturally raise the absolute number of metaphor-related lexical units in that class. Many of the above findings about relations between word classes and metaphor can hence be explained with reference to the functional variation of these word classes in general. A lot of metaphor in discourse therefore becomes rather unremarkable. The crucial question becomes: which metaphorical uses of which word classes in which registers may indeed be regarded as extraordinary or special? Two answers may be given: first, we have a special situation if some word classes are related to metaphor in ways that go against the distributional grain; that is, if a word class is related positively to a register but its metaphorical use is negatively related to that register, or the other way around. And second, another type of special situation arises when the distribution of a word class noticeably exceeds

the proportions that might be expected on the basis of the general patterns. These issues will now finally be checked with reference to a table which combines the standardized residuals which were presented in the previous section in one overview (Table 10.8).

Table 10.8. Standardized residuals for relation of lexical units to metaphor per register, divided by major word class

                 Academic            News                Fiction             Conversation
                 Non-MRW       MRW   Non-MRW       MRW   Non-MRW       MRW   Non-MRW       MRW
Adjectives           0.7      −1.5      −3.1       7.0      −4.4      11.9      −2.4       8.5
Adverbs              4.7      −9.8       2.7      −6.2       1.5      −4.0       0.1      −0.5
Conjunctions        10.4     −21.9       8.4     −18.9       5.8     −15.8       3.2     −11.0
Determiners          9.5     −19.9       8.6     −19.5       3.2      −8.7      −5.3      18.4
Nouns                1.2      −2.5       4.0      −9.1       1.4      −3.8      −0.4       1.5
Prepositions       −21.4      45.0     −17.0      38.5     −14.9      40.6     −13.5      46.9
Verbs               −9.2      19.3     −10.9      24.6      −4.2      11.6      −1.7       5.7
Remainder           11.7     −24.5      10.5     −23.8      10.3     −28.0       9.5     −33.0

When we consider the figures in Table 10.8, it is striking that some word classes tend to have an overuse of metaphor-related cases and an underuse of cases that are not related to metaphor, whereas other word classes display the opposite pattern. If there were no exceptions to this overall pattern across our four registers, there would just be a two-way interaction between metaphor, word class, and register, in which some word classes tend to display a relatively high metaphorical usage whereas others do the opposite. However, there are exceptions which are related to specific registers, so that we have to become more specific. Let us begin with the cases of relatively frequent metaphorical use in our data, which can be found in prepositions and verbs; both have significantly positive standardized residuals for metaphorical use across all four registers. Their nonmetaphorical counterparts all have significantly negative standardized residuals, with one exception, verbs in conversation, which is close to being significant. This exception signals that there is no underuse of non-metaphorical verbs in conversation. The positive standardized residuals, by contrast, signal that, in each register, prepositions and verbs are consistently used more often in metaphorical ways than may be expected by chance when we look at all eight word classes together. For prepositions, this is probably a result of the fact that they have basic meanings rooted in space whereas their use often concerns abstract relations, including temporal ones, as is well known.
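The reading of Table 10.8 described above can be reproduced mechanically. The following is a minimal sketch, not the authors' own procedure: it assumes a two-sided test at α = 0.01 (a cut-off of roughly 2.58) and labels each cell as overuse, underuse, or neutral. The names `RESIDUALS`, `label`, and `profile` are hypothetical, and since the exact criterion applied to each cell is not restated here, borderline cells may come out differently under another cut-off.

```python
from statistics import NormalDist

REGISTERS = ("academic", "news", "fiction", "conversation")

# Standardized residuals from Table 10.8 as (non-MRW, MRW) pairs,
# in the order academic, news, fiction, conversation.
RESIDUALS = {
    "adjectives":   [(0.7, -1.5), (-3.1, 7.0), (-4.4, 11.9), (-2.4, 8.5)],
    "adverbs":      [(4.7, -9.8), (2.7, -6.2), (1.5, -4.0), (0.1, -0.5)],
    "conjunctions": [(10.4, -21.9), (8.4, -18.9), (5.8, -15.8), (3.2, -11.0)],
    "determiners":  [(9.5, -19.9), (8.6, -19.5), (3.2, -8.7), (-5.3, 18.4)],
    "nouns":        [(1.2, -2.5), (4.0, -9.1), (1.4, -3.8), (-0.4, 1.5)],
    "prepositions": [(-21.4, 45.0), (-17.0, 38.5), (-14.9, 40.6), (-13.5, 46.9)],
    "verbs":        [(-9.2, 19.3), (-10.9, 24.6), (-4.2, 11.6), (-1.7, 5.7)],
    "remainder":    [(11.7, -24.5), (10.5, -23.8), (10.3, -28.0), (9.5, -33.0)],
}

# Assumption: two-sided significance level of 0.01.
ALPHA = 0.01
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)


def label(residual, critical=CRITICAL):
    """Classify one standardized residual against the cut-off."""
    if residual > critical:
        return "over"
    if residual < -critical:
        return "under"
    return "neutral"


def profile(word_class):
    """Map each register to a (non-MRW label, MRW label) pair."""
    return {
        register: (label(non_mrw), label(mrw))
        for register, (non_mrw, mrw) in zip(REGISTERS, RESIDUALS[word_class])
    }


if __name__ == "__main__":
    for word_class in RESIDUALS:
        print(word_class, profile(word_class))
```

Under these assumptions, prepositions come out as underused non-metaphorically and overused metaphorically in every register, while verbs in conversation show a neutral non-metaphorical cell next to metaphorical overuse, which matches the exception noted above.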


By contrast, there are other word classes which are characterized by relatively infrequent metaphorical usage. Thus, the remainder category and the conjunctions systematically display significant positive standardized residuals for non-metaphorical usage, which suggests an overuse of that category in comparison with metaphorical usage. Vice versa, for both the remainder category and the conjunctions, the metaphor-related cases are heavily underrepresented, as is signalled by the large negative standardized residuals. This may be seen as a natural reflection of their heavily grammatical, that is, skeletal semantic content, which makes it virtually impossible for these two groups to display a contrast between basic non-metaphorical and contextual metaphorical meanings. An exception may be formed by, for instance, the metaphorical use of conjunctions such as where at the beginning of abstract clauses in educational or scientific texts.

Four categories remain which do not completely follow one of these two patterns:

1. Adjectives are closest to the first pattern, with frequent metaphorical usage in news, fiction, and conversation, and underrepresentation of non-metaphorical adjectives in the same registers; but this word class is completely neutral for metaphorical and non-metaphorical usage in academic discourse.

2. Determiners are very much like the second pattern, behaving like the conjunctions and remainder category, with one clear exception. Metaphorical use of determiners is consistently underrepresented in the three written registers, while non-metaphorical use is consistently overrepresented in the same registers, just as with conjunctions and the remainder category. However, in conversation, this pattern is turned on its head, metaphorical use of determiners being overrepresented whereas non-metaphorical use is underrepresented. This probably has to do with the large number of demonstratives such as this, that, these, those that are used metaphorically in spoken interaction (see Chapters 4 and 7).

3. Nouns are also like conjunctions and the remainder category in their underuse of metaphorical cases, but this word class is exceptional in two ways: first of all, it does not exhibit a lack of metaphor-related nouns in conversation, which may be compared to the situation with determiners discussed just now; and secondly, it does not display a contrastive overuse of non-metaphorical nouns, except in news. Moreover, the underuse of metaphorical nouns in academic discourse only borders on being significant if we stick to the level of α = 0.01, which is desirable because of the large number of cases. In this way, nouns are the most irregular, that is, unpredictable, word class of all the word classes examined so far.

4. Adverbs are somewhat similar to nouns and determiners. They display systematic underuse of metaphorical cases in all three written registers. In conversations, the metaphorical use of adverbs is neutral. But adverbs display systematic overuse of non-metaphorical items in only two of the four registers: academic discourse and news texts.

Together with nouns and determiners, adverbs are the main reason why there is a three-way interaction between metaphor, word class, and register, although it should not be forgotten that verbs and adjectives do not behave in one-hundred percent predictable ways either. In all, then, there are three word classes that are quite regular (prepositions, which tend to be relatively metaphor-related, and conjunctions and remainder, which tend to be relatively non-metaphorical) as opposed to five word classes that are less regular. It is striking that the less regular word classes comprise the four main content word classes. These may be substantive semantic reflections of different aspects of each of the four registers.

Given the prominent role of nouns, adverbs, verbs, and adjectives in relation to register, we will now attempt to interpret the three-way interaction between metaphor, word class and register for these word classes from a functional perspective, setting out from the division between registers. For academic discourse, this means that we have to consider the positive relation between metaphor and verbs, the negative relation between metaphor and adverbs, and the neutral relation between metaphor and adjectives as well as nouns. It is easiest to begin with the neutral relation between metaphor and adjectives and nouns, since this simply signals that metaphor has no influence on the use of either of these two word classes in academic writing: non-metaphorical and metaphorical use are as expected by chance for these word classes in this register.
This should be contrasted with the positive relation between metaphor and verbs, which suggests that the use of verbs in academic writing is more metaphorical than may be expected by chance. However, this is a finding which goes for almost all registers, which makes them somewhat comparable to prepositions. There is one exception, though, which disrupts this general, word-class oriented interpretation, and that is the finding that in conversation there is no concomitant underuse of non-metaphorical verbs. This might indicate that verbs in writing have a different relation to metaphorical and non-metaphorical use than verbs in spoken interaction, but this suggestion requires further research. For now, the generally acknowledged relatively abstract nature of many verbs may be the cause of their proportionately high metaphorical use, particularly in written discourse. This leaves us with the negative relation between metaphor and adverbs. Apparently academic discourse prefers non-metaphorical uses of adverbs over metaphorical uses. This also holds for the other two written registers, news and fiction, in contrast with conversation, but there is a difference too: in academic


discourse and in news, the relative underuse of metaphorical adverbs is mirrored by an overuse of non-metaphorical adverbs, but this conjunction disappears in fiction, while in conversation, adverbs are used metaphorically and non-metaphorically completely according to chance. This is why it is interesting to run a brief check of the most popular metaphorical and non-metaphorical adverbs in academic discourse (see Tables 10.9 and 10.10).

The ten most popular non-metaphorical adverbs are listed in Table 10.9. These ten types comprise 618 tokens (cases) out of 2503 adverbs in academic writing, which is one quarter (24.7%). It is, moreover, clear that they are almost exclusively non-metaphorical, with only very few metaphorical uses for just two types. This should be contrasted with the most popular metaphorical adverbs, listed in Table 10.10. These ten types comprise 182 tokens (cases) out of 2503, which is 7.3%. In addition, for these types there is a one-to-two proportion between non-metaphorical and metaphorical use: metaphorical uses of these adverbs can clearly be contrasted with non-metaphorical uses of the same adverbs, employing more basic meanings. There is a clear difference between the two groups of cases.

When we examine the content of the most popular metaphorical adverbs, it may be suggested that they to some extent have to do with the management of discourse in terms of space. Thus, the metaphorical use of here, far, where, further, above, over, under and about might all be relatable to such a mapping. Further qualitative research will have to test the validity of this interpretation, which, at this moment, is still highly tentative, superficial, and potentially problematic.

Table 10.9. Top ten non-metaphor-related adverbs in academic texts

Adverb      Non-MRW   MRW
More            106     0
Also             82     0
How              69     0
Only             62     0
So               62     1
Most             55     0
However          47     0
Very             43     0
Both             41     0
Then             41     9
Total           608    10

By contrast, many adverbs that are exclusively non-metaphorical are sentence adverbs, including also and however, or adverbs of degree and comparison, including more, so, most, and very. These adverbs seem to have semi-grammatical functions that hardly lend themselves to metaphorical exploitation. The fact that the latter are more frequent than the former in academic writing may point to the limited role of the metaphorical mapping of discourse as space in comparison with the more ubiquitous need for adverbial assistance to express relations between sentences, clauses, and phrases. Again, qualitative research will have to demonstrate the validity of this type of interpretation; however, to throw this approach into relief, it should be remembered that prepositions display a relative bias towards metaphorical use that adds up to almost 40% across all four registers, a fact which clearly has to do with the frequently needed temporal and abstract uses of prepositions as heads of prepositional phrases.

Table 10.10. Top ten metaphor-related adverbs in academic texts

Adverb      Non-MRW   MRW
Here              3    23
Far               2    18
Where             8    18
Further           1    14
Above             0     9
Over              0     9
Then             41     9
Under             0     8
About             3     7
Directly          2     7
Total            60   122
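The token shares quoted for the two academic top-ten lists follow directly from the counts in Tables 10.9 and 10.10. As a small check, offered as a sketch rather than as part of the original analysis, the arithmetic can be written out as follows; the names `TOP_NON_MRW`, `TOP_MRW`, `totals`, and `share` are hypothetical, and the figure of 2503 adverbs is the one given in the text.

```python
# (non-MRW, MRW) token counts per adverb, copied from Tables 10.9 and 10.10.
TOP_NON_MRW = {
    "more": (106, 0), "also": (82, 0), "how": (69, 0), "only": (62, 0),
    "so": (62, 1), "most": (55, 0), "however": (47, 0), "very": (43, 0),
    "both": (41, 0), "then": (41, 9),
}
TOP_MRW = {
    "here": (3, 23), "far": (2, 18), "where": (8, 18), "further": (1, 14),
    "above": (0, 9), "over": (0, 9), "then": (41, 9), "under": (0, 8),
    "about": (3, 7), "directly": (2, 7),
}

# Total number of adverbs in the academic sample, as stated in the text.
TOTAL_ADVERBS = 2503


def totals(table):
    """Return the column totals (non-MRW, MRW) of one top-ten table."""
    non_mrw = sum(non for non, _ in table.values())
    mrw = sum(m for _, m in table.values())
    return non_mrw, mrw


def share(table, corpus_total=TOTAL_ADVERBS):
    """Percentage of all adverb tokens covered by the ten types in a table."""
    non_mrw, mrw = totals(table)
    return round(100 * (non_mrw + mrw) / corpus_total, 1)


if __name__ == "__main__":
    print(totals(TOP_NON_MRW), share(TOP_NON_MRW))
    print(totals(TOP_MRW), share(TOP_MRW))
```

Note that then appears in both lists with the same counts (41 non-metaphorical and 9 metaphorical tokens), so the two groups overlap by one type.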

We can now turn to news. Here we have seen that there is a positive relation between metaphor and verbs and adjectives, and a negative relation between metaphor and nouns as well as adverbs. Since we just discussed adverbs in academic discourse, let us briefly examine the most popular metaphorical and non-metaphorical adverbs in news. The ten most popular non-metaphorical adverbs in news texts are listed in Table 10.11. These ten types comprise 511 tokens (cases) out of a total of 2183 adverbs in news, which is almost one quarter (23.4%). This is just a little lower than in academic texts. The top non-metaphorical adverbs are also almost exclusively non-metaphorical, just as in academic texts. The contrast with the most popular metaphorical adverbs in news can be observed by inspecting Table 10.12. These ten types comprise 152 tokens (cases) out of 2183, which is 7.0%. Again, this is just a little lower than in the academic texts. For these types, too, there is a one-to-two proportion between non-metaphorical and metaphorical use, just as in academic texts. The clear difference between the two groups of cases in news is almost exactly identical to what we saw in academic discourse.

Table 10.11. Top ten non-metaphor-related adverbs in news

Adverb      Non-MRW   MRW
So               68     2
Only             64     0
More             63     0
Now              63     0
Also             56     0
Then             43     2
Just             42     0
Most             39     0
Too              36     0
Even             33     0
Total           507     4

The identity of both groups of adverbs in news is highly comparable to the ones in academic texts. Some of the adverbs occurring in academic texts but absent from the top ten in news do occur in the top twenty, so that overlap is quite large. As for the content, however, it makes less sense to suggest that the most popular metaphorical adverbs in news have to do with the management of discourse in terms of space. News discourse is not characterized by this type of discourse management, so that further qualitative research will have to examine the nature and function of these metaphor-related adverbs in news. By contrast, the exclusively non-metaphorical adverbs can probably be interpreted in the same way as in academic discourse: news texts, too, require sentence adverbs such as also, too and even, or adverbs of degree and comparison, including more, so, and most. Again, qualitative research will have to demonstrate the validity of this type of interpretation.

Table 10.12. Top ten metaphor-related adverbs in news

Adverb      Non-MRW   MRW
About             1    20
Far               2    16
Up                5    12
Over              1     9
Around            1     8
Back              1     8
Down              6     7
Here             16     7
There            17     7
Together          1     7
Total            51   101


The relative underuse of metaphorical nouns in news is mirrored by an overuse of non-metaphorical nouns, a conjunction which was lacking in academic discourse. This suggests that news may be more oriented towards non-metaphorical representations of events, including agents, circumstances, and so on. Even though news is often concerned with the representation of social and cultural events and processes in politics and society, these can apparently be designated by abstract terms that are non-metaphorical, such as government, law, debate, election, vote, and so on. This is possible in spite of the fact that news, on the whole, is a register which has a high proportion of metaphor-related words; nouns do not appear to make a substantial contribution to this finding.

By comparison, verbs and adjectives are important word classes when it comes to the degree of metaphor in news. For verbs, this may be due to the same general tendency towards metaphorical overuse mentioned above when we discussed academic writing. However, it should also be noted that the degree of overuse of metaphor-related verbs in news texts is quite high. It is possible that this may be due to the same phenomenon as the frequent metaphorical use of adjectives: the surplus of metaphor in these two word classes might be caused by a deliberate colouring of news text by journalists, who wish to pimp up their texts. One example of this type of writing may be found in the following excerpt, where presumably deliberate metaphor use has been highlighted by bold:

But the resulting mixture of hymns, folksy tunes and recitatives at times of intoxicating banality was a sensation. (A1H fragment 05)

When we look at fiction, there is a positive relation between metaphor and verbs as well as adjectives, and a negative relation between metaphor and adverbs as well as nouns. The positive relation between metaphor and verbs as well as adjectives is quite comparable to what we have seen for news. Part of this finding may therefore be the result of the same underlying mechanism, the deliberate use of metaphor as a rhetorical strategy to enliven the style of the fictional text. An illustration of this possibility may perhaps be found below:

No golden light bathed the red brick of the house. It no longer looked mellow. Beautiful, yes, but severe somehow and to Adam’s heightened awareness reproachful. (CDB, fragment 4)

The behaviour of nouns and adverbs is somewhat similar to what happens in news texts too, but there is an important difference. For both nouns and adverbs, the negative relation with metaphorical use is not balanced by a corresponding positive relation with non-metaphorical use. That is, when nouns and adverbs are used non-metaphorically, they behave according to statistical chance, but when they are used metaphorically, they are used significantly less often than might be expected


according to that same statistical chance. The question therefore arises whether they ‘suffer’ from the positive relation between metaphor and verbs and adjectives, or, what is more, whether they are in fact somewhat avoided when it comes to metaphorical use in fiction. Since there is no theoretical ground for postulating the latter assumption, the former seems a little more likely. The final register is that of conversation. It is characterized by positive relations between metaphor and adjectives and verbs, and neutral relations between metaphor and adverbs and nouns. This pattern raises interesting questions about the tenability of the above suggestion of deliberate metaphor use in fiction and news, for the same word classes display an overuse of metaphor in conversation whereas there does not seem to be a clear theoretical motivation, again, for postulating a rhetorical convention to this effect. In particular, we do not know of any position arguing that verbs and adjectives are used deliberately in conversation for rhetorical reasons, in contrast with other word classes such as nouns and adverbs. The explanation of the overuse of metaphorical meaning in verbs and adjectives in conversation, but also in fiction and news, therefore has to be given further attention. In that reconsideration, it needs to be remembered that conversation is the only register of the four in our study that displays a neutral relation between verbs and non-metaphorical use, whereas the other three registers all exhibit a balance between overuse of metaphorical verbs and underuse of non-metaphorical verbs. Adverbs and nouns, by contrast, behave according to chance. Whether they are used metaphorically or not, their frequencies are due to what may be expected on the basis of the general relation between word class and register. For conversation, therefore, metaphor is special when it comes to the use of verbs and adjectives. 
10.5.2 Conclusion

Overall, the three-way interaction between metaphor, register and word class revealed in this chapter may now be looked at in the following way. In general, much of the variation in relations between the three variables can be accounted for from the perspective of the natural and functional variation between word classes across registers. These independent distributional patterns appear to explain the bulk of the three-way interactions between metaphor, register and word class. If the interaction between register and word class is taken for granted and temporarily fixed, as it were, the bulk of metaphor can be seen as a constant function of the four registers, with academic discourse having the highest incidence of metaphor-related words, conversation the lowest, and news and fiction coming in between. Much of what is happening here, it seems, can also be explained from general functional characteristics of the corpora, which predominantly have to do with the relation of the four registers to involved versus informational production.


However, influence is also exerted by situation-dependent versus independent reference, and abstract versus non-abstract information. On each of these dimensions, metaphor can be seen to play a role of its own. The most basic of these seems to be informational production: fiction, news, and academic discourse have more need, it seems, for metaphorical mappings to express content than does involved production, which is more interactional than transactional; metaphor is naturally used to think about many things in terms of something else. Against this background, the special use of metaphor in some registers may be identified with more precision. Thus, it looks as if journalists use metaphor-related verbs and adjectives in ways that go beyond the functional variation of these word classes, which may point to a special rhetorical strategy of deliberate metaphorical writing. A similar story may hold true for fiction. These and comparable phenomena can now be looked for in the data. These findings support the value of doing linguistic metaphor identification in the ways that we have advocated in this book. This type of precise, exhaustive and quantified cross-register comparison has never been entertained before, and this was due to the lack of reliable data. Linguists have not been able to provide estimates of this kind before. Our results are reliable, relatively valid, and have a high degree of reproducibility because of the explicitness of the procedure and the protocol. We hope that this research may therefore act as a model and source of inspiration for much future work that will doubtlessly point out what shortcomings there are. But that is precisely the point.

chapter 11

The quality of evidence
From MIP to MIPVU

This book has addressed the quality of linguistic evidence. Its aim has been to show that it is possible to set out from the findings of contemporary metaphor research and develop a tool for large-scale metaphor identification in natural discourse that is valid and reliable. Validity has to do with the attempt to measure what counts as metaphor in the bulk of linguistic research on metaphor today; and reliability has to do with the attempt to be as accurate as possible in that undertaking. Both of these attempts make a number of assumptions:

–– They assume that it is possible and valuable to measure metaphor in language, an assumption that is not shared by all metaphor researchers (cf. Glucksberg & Haught 2006).
–– They also assume that it is possible and worthwhile to distinguish metaphor from other phenomena in language, an assumption that is not shared by all metaphor researchers either (cf. Sperber & Wilson 2008).
–– And they assume that it is possible and useful to measure metaphor in language without becoming involved in substantial conceptual analysis, which is a proposition that is not shared by many cognitive linguists.

Since the combination of these three assumptions is evidently not shared by all linguists, the quality of linguistic evidence turns out to be a controversial affair. There is one group of linguists who may be expected to recognize themselves in the above assumptions. These would be the discourse analysts, stylisticians, applied linguists and sociolinguists who go to conferences like RaAM, the association for Researching and Applying Metaphor. They have exhibited an interest in MIP that signals the need for a tool for metaphor identification that has the following properties:

–– it can deal with metaphor in real language data (as opposed to the experimental materials preferred by Glucksberg and Haught);
–– it can actually distinguish metaphor from other forms of loose talk, if it is agreed that metaphor is a form of loose talk in the first place (as opposed to Sperber and Wilson);
–– and it can concentrate on language forms in discourse without having to make questionable assumptions about related conceptual structures (a position which has also been advocated by Cameron & Low 1999, Charteris-Black 2004, and Deignan 2005).

Our book has attempted to be maximally systematic and explicit about the assumptions we do endorse. The focus has lain on the operational side, but its theoretical motivation is just as explicit and systematic. This is partly due to the foundation of our work in the MIP proposal by the Pragglejaz Group (2007). But MIP itself is only one tool for metaphor identification, which needs to be positioned in a much more complex field of metaphor research, as has been shown in Steen (2007). For more information on the above three, apparently controversial, assumptions involved in our approach to metaphor identification, we refer the reader to these two publications, the most relevant points of which were presented and discussed in Chapter 1.

Some of our tenets are the same as the ones specified for MIP by the Pragglejaz Group (2007), but we have also found cause to deviate in some areas:

–– With the Pragglejaz Group, we focus on what counts as metaphorical meaning to the contemporary language user, expressed in lexical units. This means that we consider neither historical metaphor nor metaphor in morphology, phraseology, and syntax. We have gone beyond the Pragglejaz Group in carefully defining the limitations of this global position: polywords, phrasal verbs, and novel compounds have all received their own treatment from the same principle expressed in the first sentence of this bullet point.
–– With the Pragglejaz Group, we interpret linguistic metaphor identification as pertaining to the semiotic analysis of language structure in usage events; this means that we do not consider the nature of underlying conceptual structures, and do not make claims about cognitive processes and products.
–– With the Pragglejaz Group, we conceptualize metaphor as a matter of cross-domain mappings in conceptual structure which are expressed in language. The Pragglejaz Group have restricted their attention to indirect expressions of metaphor. Our approach also includes other linguistic forms of conceptual metaphor, or ‘metaphor in thought’, such as simile, analogy, and so on.
–– With the Pragglejaz Group we operationalize metaphor as indirectness by similarity, or comparison. The Pragglejaz Group have pitched this operationalization at the level of language, testing whether lexical units are used indirectly. We have moved it to the level of conceptual structure, testing whether concepts are used indirectly, in order to cater to the other forms of expression of metaphor than indirect language use mentioned just now. This does not just pertain to simile and analogy and the like, but also to implicit metaphor, by substitution or ellipsis.
–– We have gone beyond the Pragglejaz Group in making explicit our dependence on the theoretically assumed functional relation between words, concepts and referents. This has led to a narrower definition of what counts as a lexical unit than in MIP. In particular, we take lexical units to be constrained by word class, not lemmas. This restriction leads to more systematic comparisons between contextual and basic senses of words as they are used in discourse, that is, in relation to the classes of concepts and referents that they are supposed to evoke in a particular text or situation (entities, attributes, relations, and so on).
–– We have gone beyond the Pragglejaz Group’s practice by assuming that a standardized process of data collection is to be preferred. We have formulated explicit and precise guidelines for using the dictionary when making the various decisions that are needed in MIPVU. This makes it possible to be more systematic and explicit about the delimitation of lexical units, contextual meanings, basic meanings, and contrasts between the latter two.

These are the most important assumptions that have driven our research. They are presented here as being open to critical debate between linguists who are serious about obtaining high-quality evidence for metaphor in discourse. This evidence is needed if linguists wish to stay in tune with, or even make substantial contributions to, the advances made in the cognitive and social sciences regarding metaphor in cognition, communication, and culture (cf. Gibbs 2008).

The proof of the pudding is in the eating. A method may be as explicit and systematic as may be desired from a methodological point of view, but it still does not amount to much if it does not work.
From the present point of view, a method works if it yields reliable results, that is, if it guides analysts to the same conclusions over large numbers of diverging data. Then it may be claimed that analysts have made the same observations when attempting to measure the same phenomenon. The statistical reliability tests reported in Chapter 8 have demonstrated that this is the case for MIPVU. The amount of training and work that has gone into achieving these results has been out of the ordinary. But we believe that it has produced an instrument that is useful and, after some training, relatively easy to use. Problem cases will always remain, because of remaining theoretical and operational details as well as inherent issues in the data. But the large majority of the materials appear to be very manageable. Our practice has also shown that acquisition of the tool is possible—its insights can be transferred to other analysts. We hope that one future product of


our research can be a release of the portion of BNC-Baby that we annotated for metaphor, so that other students of metaphor can train and test themselves with our findings. This will make it possible to discuss the weaknesses of MIPVU that still remain, and to improve our views of the subtle details of metaphor identification in natural discourse.

Even though there are these positive results to be reported about the general methodology of MIPVU, this is not meant to convey the impression that our approach has been infallible. Errors will always remain, both of a systematic and of an erratic kind. The entire point of developing a tool like MIP and MIPVU is to reduce error to a minimum, so that observation gains in quality. But unless MIPVU can be automated and its output be validated as 100% correct by a broad forum of researchers (which in fact may not be impossible in principle), instruments are applied by people and error will remain. What we do hope to prevent, by putting all our methodological cards on the table, is too hasty a dismissal of our efforts as a failure when researchers point to errors. Since there will always be errors, we believe that critiquing our method should go hand in hand with offering a better alternative. Such an alternative should be comparably valid from a theoretical point of view, that is, it should aim to capture the same phenomena in language. If such an alternative can then show that it is able to analyse the same amount of data with the same or better reliability, then MIPVU has a problem. We see this as an interesting challenge to the linguistic community of metaphor researchers.
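Comparing the reliability of two procedures presupposes some measure of agreement between analysts. The sketch below illustrates one common chance-corrected statistic, Cohen's kappa, for two analysts who each label lexical units as metaphor-related or not. This choice of statistic is an assumption made for illustration only and is not a restatement of the reliability tests reported in Chapter 8; the function name `cohen_kappa` and the label sequences are invented and do not come from the annotated corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two analysts' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Proportion of lexical units on which both analysts agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each analyst labelled at random
    # while keeping their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy decisions for six lexical units (not data from this study).
analyst_1 = ["MRW", "non", "non", "MRW", "non", "non"]
analyst_2 = ["MRW", "non", "non", "non", "non", "non"]
kappa = cohen_kappa(analyst_1, analyst_2)

if __name__ == "__main__":
    print(f"kappa = {kappa:.3f}")
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance; an alternative procedure could be compared with MIPVU by computing the same statistic over the same set of lexical units.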

Appendix

Overview of annotated files from BNC-Baby

Academic

File ID | Total number of words in BNC file | Total number of divisions in BNC file | ID number of file division coded | Number of lexical units in data
A6U | 27,329 | 6 | 2 | 2,814
ACJ | 37,678 | 2 | 1 | 4,189
ALP | 25,632 | 4 | 1 | 2,253
AMM | 39,563 | 2 | 2 | 3,866
AS6 | 30,938 | 4 | 1 | 3,366
AS6 | Id | Id | 2 | 2,840
B17 | 34,305 | 3 | 2 | 1,608
B1G | 38,559 | 2 | 2 | 3,006
CLP | 40,742 | 2 | 1 | 3,368
CLW | 38,714 | 1 | 1 | 3,748
CRS | 40,250 | 3 | 1 | 2,044
CTY | 34,131 | 5 | 3 | 3,434
EA7 | 25,531 | 3 | 3 | 2,771
ECV | 40,343 | 7 | 5 | 3,847
EW1 | 41,695 | 2 | 1 | 3,708
FEF | 26,854 | 4 | 3 | 2,703
Total | 522,264 | NA | NA | 49,561

Conversation

File ID | Total number of words in file | Total number of divisions in file | ID number of file division coded | Number of lexical units in data
KB7 | 103,997 | 60 | 10 | 3,072
KB7 | Id | Id | 31 | 3,161
KB7 | Id | Id | 45 | 2,830
KB7 | Id | Id | 48 | 2,983
KBC | 31,337 | 13 | 13 | 3,641
KBD | 58,087 | 25 | 7 | 3,124
KBD | Id | Id | 21 | 2,779
KBH | 47,995 | 63 | 1 | 436
KBH | Id | Id | 2 | 1,227
KBH | Id | Id | 3 | 165
KBH | Id | Id | 4 | 1,838
KBH | Id | Id | 9 | 714
KBH | Id | Id | 41 | 616
KBJ | 11,137 | 26 | 17 | 1,083
KBP | 27,179 | 15 | 9 | 2,666
KBW | 115,332 | 62 | 4 | 1,712
KBW | Id | Id | 9 | 1,351
KBW | Id | Id | 11 | 1,670
KBW | Id | Id | 17 | 2,295
KBW | Id | Id | 42 | 2,655
KCC | 5,311 | 2 | 02 | 836
KCF | 21,898 | 30 | 14 | 1,305
KCU | 49,751 | 9 | 02 | 3,347
KCV | 32,714 | 50 | 42 | 2,495
Total | 504,738 | NA | NA | 48,001

Fiction

File ID | Total number of words in file | Total number of divisions in file | ID number of file division coded | Number of lexical units in data
AB9 | 42,247 | 8 | 3 | 4,221
AC2 | 37,662 | 10 | 6 | 3,045
BMW | 42,584 | 9 | 9 | 4,584
BPA | 37,769 | 19 | 14 | 2,920
C8T | 41,117 | 2 | 1 | 2,877
CB5 | 41,727 | 2 | 2 | 2,818
CCW | 40,408 | 4 | 3 | 2,083
CCW | Id | Id | 4 | 1,958
CDB | 38,169 | 6 | 2 | 2,703
CDB | Id | Id | 4 | 1,907
FAJ | 42,500 | 23 | 17 | 4,058
FET | 35,526 | 7 | 1 | 4,222
FPB | 41,894 | 1 | 1 | 4,119
G0L | 43,292 | 1 | 1 | 3,377
Total | 484,895 | NA | NA | 44,892
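The totals reported for the Fiction files can be cross-checked with a few lines of arithmetic. The sketch below copies the word counts and lexical-unit counts from the Fiction overview; it assumes that "Id" in the word-count column means "same file as the row above", so that each file's word count is added only once. The names `FICTION_ROWS` and `column_totals` are introduced here purely for illustration.

```python
# Rows copied from the Fiction overview: (file ID, words in file, lexical units).
# Assumption: "Id" marks a further coded division of the file listed above,
# so it is stored as None and the file's word count is counted only once.
FICTION_ROWS = [
    ("AB9", 42247, 4221),
    ("AC2", 37662, 3045),
    ("BMW", 42584, 4584),
    ("BPA", 37769, 2920),
    ("C8T", 41117, 2877),
    ("CB5", 41727, 2818),
    ("CCW", 40408, 2083),
    ("CCW", None, 1958),
    ("CDB", 38169, 2703),
    ("CDB", None, 1907),
    ("FAJ", 42500, 4058),
    ("FET", 35526, 4222),
    ("FPB", 41894, 4119),
    ("G0L", 43292, 3377),
]

# Totals as printed in the "Total" row of the Fiction overview.
REPORTED_WORDS = 484895
REPORTED_UNITS = 44892


def column_totals(rows):
    """Return (total words over distinct files, total lexical units)."""
    words = sum(w for _, w, _ in rows if w is not None)
    units = sum(u for _, _, u in rows)
    return words, units


words, units = column_totals(FICTION_ROWS)
print(words == REPORTED_WORDS, units == REPORTED_UNITS)
```

Under this reading of "Id", both column sums agree with the printed totals for Fiction.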


News

File ID | Total number of words in file | Total number of divisions in file | ID number of file division coded | Number of lexical units in data
A1E | 9,916 | 17 | 1 | 584
A1F | 8,909 | 20 | 6 | 87
A1F | Id | Id | 7 | 269
A1F | Id | Id | 8 | 111
A1F | Id | Id | 9 | 223
A1F | Id | Id | 10 | 178
A1F | Id | Id | 11 | 222
A1F | Id | Id | 12 | 62
A1G | 10,242 | 31 | 26 | 405
A1G | Id | Id | 27 | 593
A1H | 3,108 | 6 | 5 | 935
A1H | Id | Id | 6 | 724
A1J | 13,981 | 40 | 33 | 813
A1J | Id | Id | 34 | 605
A1K | 1,905 | 3 | 2 | 1,012
A1L | 1,849 | 2 | 1 | 1,074
A1M | 4,910 | 5 | 1 | 1,113
A1N | 14,770 | 49 | 9 | 698
A1N | Id | Id | 18 | 812
A1P | 2,595 | 4 | 1 | 647
A1P | Id | Id | 3 | 653
A1U | 4,198 | 5 | 4 | 1,892
A1X | 3,322 | 5 | 3 | 145
A1X | Id | Id | 4 | 194
A1X | Id | Id | 5 | 279
A2D | 1,042 | 7 | 5 | 1,039
A31 | 3,492 | 3 | 3 | 699
A36 | 6,173 | 9 | 7 | 546
A38 | 3,254 | 3 | 1 | 756
A39 | 2,355 | 3 | 1 | 257
A3C | 8,522 | 13 | 5 | 1,031
A3E | 1,858 | 4 | 2 | 233
A3E | Id | Id | 3 | 778
A3K | 3,500 | 11 | 11 | 1,227
A3M | 3,007 | 6 | 2 | 887
A3P | 8,032 | 14 | 9 | 947
A4D | 3,167 | 4 | 2 | 1,246
A5E | 5,411 | 8 | 6 | 1,080
A7S | 5,414 | 8 | 3 | 848
A7T | 8,720 | 16 | 1 | 951
A7W | 25,255 | 55 | 1 | 1,734
A7Y | 10,862 | 9 | 3 | 895
A80 | 10,608 | 26 | 15 | 585
A8M | 3,595 | 7 | 2 | 313
A8N | 12,014 | 19 | 19 | 653
A8R | 6,735 | 7 | 2 | 851
A8U | 8,816 | 18 | 14 | 832
A98 | 6,769 | 12 | 3 | 593
A9J | 3,705 | 2 | 1 | 1,505
AA3 | 9,084 | 15 | 8 | 757
AHB | 17,314 | 52 | 51 | 901
AHC | 39,523 | 82 | 60 | 1,116
AHC | Id | Id | 61 | 1,097
AHD | 4,236 | 10 | 6 | 303
AHE | 1,236 | 3 | 3 | 315
AHF | 27,457 | 73 | 24 | 1,202
AHF | Id | Id | 63 | 1,311
AHL | 2,552 | 5 | 2 | 447
AJF | 6,472 | 14 | 7 | 669
AL0 | 5,143 | 9 | 6 | 532
AL2 | 9,361 | 50 | 16 | 410
AL2 | Id | Id | 23 | 413
AL5 | 2,523 | 5 | 3 | 827
Total | 356,912 | NA | NA | 45,116


Index

A abstract 1, 36, 38, 45, 47–48, 50–53, 55–56, 58, 66, 68, 71, 75–76, 91–92, 97–101, 106, 109–111, 113, 116–119, 131–132, 134–137, 141, 143–146, 187, 195–196, 201–203, 210–212, 214, 216, 218 academic discourse 22, 90, 97, 107–112, 115–116, 118, 120, 123–125, 151, 183, 194–196, 198, 201, 203–204, 206, 211–218 activate 12, 48, 92–93, 167 adjective 31, 45, 109, 121, 129, 177–178, 196, 208 adverb 28–30, 68, 129–130, 196, 208 adverbial particle 28, 30, 169–171 affect 64, affective 62 affectively 68 agreement 7–8, 20, 44–45, 55, 59, 68, 90, 105, 109, 111, 113, 116, 124, 128, 132, 138–139, 141–142, 147, 149–151, 154, 156, 158, 161, 164–165, 168, 177, 181, 186, 188 ambiguity 44, 46, 49, 82 ambiguous 34, 46–47, 49, 51, 59, 69, 71, 74, 79–80, 82–83, 85, 111 analogy 21, 40, 108, 113, 115, 140, 176, 220–221 annotation 3, 22, 31, 44, 63, 68, 83, 85, 90, 100–101, 106, 122, 151–152, 161–162, 164, 167–169, 174, 176–177, 179–180, 183, 185–187, 190, 193 applied linguistics 3, 107

B basic meaning 6, 13, 25–26, 33, 35–37, 39, 45, 48, 50–56, 58–59, 65, 67, 70, 84, 91–94, 99, 104–105, 110, 112–114, 117, 119, 123, 130–131, 136–137, 139, 141–142, 144–145 sense 6, 8, 13, 15, 17, 35, 37–38, 47, 52, 54, 56, 65, 68, 74–75, 79–80, 100, 105, 116, 118–122, 131, 135, 144, 146 bias 44, 150, 154–155, 158–159, 161, 164, 186, 188, 214 BNC(-Baby) 22, 27–32, 46, 49, 62–64, 68–70, 72, 74, 76, 89, 124, 127, 135, 137, 151–157, 159–160, 168–172, 177–179, 181, 183, 185, 187–190, 192–193, 196, 201, 222–223 borderline case 45, 50, 110, 112, 115, 119, 174 C categorization 2, 14, 181 category 11, 14, 18–19, 35–37, 66, 76, 84, 116, 154, 182, 191–193, 195, 198, 200, 207–208, 211 chance 20, 150, 154–155, 180, 191–193, 195, 210, 212–213, 216–217 character descriptions 92, 96–98, 101, 105–106 cognitive linguistics 1–2, 8–9, 11–12, 14, 35, 79, 175, 184 processing 12, 183 cohesion 40, 66, 116, 118, 125, 177

cohesive 15, 40, 71, 176–177 communicative function 61–62 comparison 6, 8–11, 13–15, 18–19, 21, 27, 35, 38–40, 45, 54, 56–58, 65, 67, 70, 91–96, 104–105, 110–115, 117, 121, 123, 136–137, 139, 143–146, 167, 175–176, 213, 215, 220 compare 6, 11, 14–17, 20, 35–36, 39–40, 48–49, 59, 68, 76–78, 80, 85, 92, 95–96, 105, 111–112, 114, 118–119, 122–123, 136–137, 143–146, 176 complex verb 29, 36, 128, 132, 134–135 compound 27, 30–32, 46–48, 114, 118, 127, 152, 166, 168–169, 171–172, 181, 186, 188–193, 220 concept 8, 11, 12–16, 21, 27–28, 30, 32, 35, 37, 39–41, 48, 50, 68, 78–79, 92–93, 99–101, 107, 115, 133, 143, 167, 220–221 conceptual domain 8–10, 11, 37, 39, 79, 94 mapping 8, 14, 110, 120, 123 metaphor 1–2, 8, 26, 88, 109, 220 structure 8–9, 11–13, 15–16, 21, 58, 61, 63, 88, 94, 106, 167, 183, 220 Conceptual Theory of Metaphor (CTM) 109 conceptualization 7, 10–12, 57–58, 115 concrete 1, 5, 17, 35–38, 47–48, 50–51, 53, 55–56, 58, 66, 68, 70, 73, 75–76, 80, 84,

 Index 91–92, 97–101, 106, 110–111, 113, 116, 118–120, 130–132, 134, 137, 139, 141–142, 144–146, 187 conflate 50, 55–56, 74, 85, 111, 177, 187 conflation 55, 76 construction 36, 116–118, 122–123, 128, 132, 146, 176 content 32, 46, 57, 63–64, 72–73, 84, 95, 97, 119–120, 141, 143, 145, 185, 195–196, 206, 211–213, 215, 218 content domain 151, 202 context 5–6, 14, 17, 33–35, 39, 45, 48–52, 56, 59, 62–65, 67–70, 73–74, 77, 81–86, 88, 91, 95, 103, 105–106, 110, 114, 116, 118, 120–121, 130, 132, 134, 136–139, 143, 173, 183, 188 contextual meaning 5–6, 9, 13, 28–29, 33–37, 39, 44–48, 50, 52–59, 65, 67, 70, 76, 80, 82, 91–93, 99, 101, 104–106, 110–115, 117–121, 123, 130, 132, 135–137, 140–141, 143–144, 146, 174, 179, 185, 221 sense 6, 17, 37–38, 56, 58, 68, 75, 80, 83, 96, 98, 110–112, 116–120, 122–123, 139, 144 contiguity 10, 51, 56, 79 contrast 6–7, 9–10, 12–13, 19, 37, 40, 45, 47–48, 54–56, 91, 93–94, 96, 104–105, 110, 119, 136–137, 139, 143, 145–146, 211, 221 conventional metaphor 88, 92, 110–111 conventionalization 49, 84, 117 conventionalized 6, 9, 28–31, 33, 48–49, 81, 92, 104, 124, 173, 208 conversation 22, 44, 61–69, 71, 73–74, 76–77, 79–80, 84–86, 90, 124, 127–129, 138, 141–142, 145, 151, 156–162, 164, 173–174, 177, 181, 183, 188, 194–196, 198–201, 203, 205–208, 210–213, 217, 223–224

co-reference 115–116
corpus 15–16, 20, 22, 27, 30–31, 43, 46–47, 49, 57, 62–63, 71, 80, 85, 89–90, 96–97, 102, 105, 108, 111, 115, 124, 127–135, 141, 145, 147, 149, 152, 162, 167–174, 176, 178–181, 183–186, 190, 192, 196, 198, 201, 208
cross-domain mapping 9–15, 25–26, 32, 38–39, 40–42, 57–58, 94, 96, 117, 121, 166, 184

D
data analysis 7, 19–20, 28, 63
data collection 7, 14, 16, 19, 21, 23, 69, 221
demonstrative 40, 66, 68, 70–73, 85, 109–110, 116–117, 119, 128, 139–140, 142–143, 155, 211
determiner 23, 115, 142, 196, 198–200, 202–203, 205, 207–112
deviation 87
diachronic 35, 115, 124
dictionary 6–7, 15–18, 21, 28–38, 41, 45, 47–50, 52, 54–56, 59, 65, 67–68, 70, 74–78, 82–85, 93, 100, 102, 104–105, 109–112, 115, 124, 127, 130–131, 133, 135–139, 143–145, 147, 154, 162, 164, 171–172, 185–186, 188, 190, 192–193, 221
dimension 28, 79, 108, 117, 124, 198, 200–201, 209
direct metaphor 15, 26, 39, 41, 57–58, 108, 123–125, 177
directly expressed metaphor 11, 58–59, 92, 94–95, 105
disagreement 8, 44–46, 54, 59, 80, 109, 111–112, 131, 138, 140–144, 147, 162, 180
Discard For Metaphor Analysis (DFMA) 33, 73, 85, 142, 152, 162, 173, 181, 189
discourse 1, 4–5, 8, 11–17, 19–22, 25, 27–28, 30–31, 39, 41, 43, 45, 58–59, 61–62, 64, 67–68, 70–71, 74, 79, 81–82, 85–86, 90–91, 93–94, 97, 100–101, 103, 105–112, 114–120, 122–125, 127–129, 132, 137–140, 147, 151, 165, 168, 178–179, 183–185, 190–198, 201, 203–204, 206–209, 211–222
discourse analysis 15, 20, 27, 107
distinct 9, 14–17, 26, 30–31, 33, 37–39, 41, 48, 54–56, 65, 67, 74–77, 85, 95–96, 99, 105, 110, 114, 117–118, 121–123, 139–140, 143–144, 146–147, 168, 201
distinctness 13, 37, 48, 143, 145, 185
Dutch discourse 22, 127–128, 138, 140, 147, 185

E
educational discourse 107–108
ellipsis 21, 26, 39–40, 221
evidence 4, 37, 76, 105, 123, 125, 219, 221
experimental 20, 67, 183, 219
expert knowledge 44, 47
expertise 108, 112, 124
extension 76, 80

F
familiar 48, 52, 78
fiction 87, 89–90, 92, 96–98, 102–103, 105–106, 151, 156–158, 160–161, 175, 194–196, 198–201, 203–213, 216–218, 224
foregrounding 87, 98
foregrounded 85, 101
forms of metaphor 8, 10–12, 15, 21, 26, 39, 94, 96, 105–106, 168, 186, 190
function word 145–146

G
genre 119, 123
gesture 69, 86

I
identification 1–5, 7–10, 19–21, 26, 107–112, 137–138, 149–153, 155–156, 164–165, 172, 177
idiom 81, 83–84
image mapping 95–96
implicit metaphor 15, 26, 39–40, 42, 66, 116, 121–122, 176–177, 181–182, 186, 188, 221
incongruity 11, 92–93, 96, 105–106
indirectness 10–11, 13, 15–16, 21, 41, 58, 176, 220
indirect
  meaning 6, 9–10, 26, 39, 42, 58, 121
  metaphor 15, 58, 177, 181, 188
indirectly expressed metaphor 105
interaction 61–64, 67, 77, 80, 173
interactional 61, 81, 198, 218
interactive 68, 85
interpretation 2–3, 7, 34, 46, 51–52, 62–63, 69–72, 74, 77–78, 80, 82–85, 101–103, 119, 121, 149, 212–215
involvement 64–66

J
journalism 36, 198
journalistic 45, 49, 57, 89

L
language user 7, 9, 21, 30, 34, 38, 40–41, 47, 52, 78, 84, 112, 220
lexical unit 5–6, 12–13, 16–17, 19, 27–42, 73, 80–81, 132–138, 167–168, 170, 172, 221
lexicalized 81
linguistic form 6, 12, 18, 61, 94, 101, 103
linguistic metaphor 21, 57, 63, 96, 220
literature 87–90, 103
loose talk 75, 220

M
measure 12, 19–20, 150, 161, 181, 219, 221
measurement 2–3, 18, 43, 100–101, 149–150, 161, 190

metaphor
  flag 26, 42, 94, 192
  processing 13, 67, 190
MFlag 26, 41–42, 123, 175–177
metaphor identification 1–5, 7–9, 19–21, 26, 43–46, 58–59, 61, 68, 87, 107–112, 118, 124, 127–128, 132, 134, 137–138, 145–147, 149–151, 153, 155, 158, 161–162, 164–165, 183–184, 186, 190, 193, 218–220, 222
metaphorical language 6, 22, 43, 57–58, 134, 145, 147, 174, 193, 195
metaphorically used 4–5, 8–9, 11, 13–14, 20, 25, 38, 40, 45–56, 58, 63–68, 70, 72–73, 75–76, 78, 82, 84–85, 89–92, 94, 96–101, 104–107, 110, 113–117, 120–122, 130, 134, 136–137, 141, 143–144, 147, 150, 177, 186, 190
metaphorically used word 58, 65, 67–68, 70, 76, 110, 113–114, 117, 130, 134
Metaphor Related Word (MRW) 25–26, 33, 39–41, 80, 94, 106, 114, 117, 119–120, 122, 137, 139, 141–144, 154–155, 157, 159–160, 163, 173, 175–176, 179–180, 187, 189, 195–196, 202–203, 205–206, 210, 213–215
metonymy 10, 34, 46, 79–80, 82, 84–85, 97, 101–102, 110, 117–118, 151
metonymic 10, 34, 50–52, 56, 79–80, 82–85, 97, 100–102, 117
MIP 4–18, 20–21, 23, 26, 32, 44, 46–47, 52, 56–59, 62, 91–94, 100, 105–106, 127, 134, 137–138, 148, 165, 186, 219–222
MIPVU 5, 7, 9, 11, 16–17, 19–23, 25, 43–44, 52, 56–57, 59, 62–69, 71–72, 79–82, 84–85, 90–92, 94, 101, 103–106, 108–109, 112, 115, 117–120, 122–125, 127–128, 130, 132, 134, 137–138, 147–148, 151, 165, 184, 186, 188, 219, 221–222
model 2, 62, 69, 113–114, 196–197, 200, 209
monosemous 37, 54, 56, 93, 118, 123, 132, 135
morpheme 26
morphology 12, 21, 189, 220
morphological 30, 168
multimodal 43, 69, 84
multi-word expression 31, 68, 76, 171–172

N
news 22, 43–47, 51–52, 57–59, 74, 90, 97, 101, 103, 111, 124, 127–129, 138, 140–141, 143, 145–146, 151, 156–164, 173, 183, 194–196, 198–201, 203–204, 206–218, 225–226
novel compounds 46–48, 171, 186, 220

O
on-line 69, 82
operationalization 7, 12, 16, 18, 21, 220

P
part of speech (pos-tag) 27, 129, 133, 147
performance 124, 150–151, 156–157
personification 34, 50–52, 78, 92, 101–106, 108–111, 144
persuasive function 121–122
phrasal verb 28–30, 79–80, 82, 141, 169
phrase 26, 30, 34, 71, 75, 82–83, 114–115, 118, 133–134
phraseology 189, 220
polyword 68–69, 135–137, 168, 170–171
possible personification 34, 50, 52, 103, 105–106
PP 61, 187
Pragglejaz Group 4–10, 12–13, 15–16, 18, 20–21, 26–27, 44, 46, 52, 62, 79, 81, 102, 127, 133, 161, 184, 186, 220–221
Pragglejaz method 54, 165

preposition 11, 28–31, 45, 58, 91, 99–101, 106, 111, 133–134, 136, 145–146, 169, 171, 176, 196, 208
production 44, 61, 66, 81, 198, 201, 209, 217–218
pronoun 30, 52, 68, 73, 78, 116–117, 122, 142–143
proposition 15, 219
proverbs 79, 84
psycholinguistics 2–3, 13, 107
psychology 1, 107–108, 115
psychological 9, 98

R
realism 89, 97
reference 15, 36, 57, 105, 115–116, 119, 142
referent 11–14, 16–17, 21, 26–28, 30–32, 35, 38–39, 48, 92–94, 100, 123
register 168, 177, 184, 194–210, 212, 216–218
relation 13, 17, 37–38, 51, 83, 100, 140, 158, 168, 191–192, 194, 201, 203–204, 206, 209, 212, 214, 216–217
reliability 4, 8, 20, 22, 44, 111, 128, 149–159, 161–164, 177–181, 186, 219, 221–222
rhetorical form 102

S
scale 18–19
scientific analogy 113, 115
selection restriction 105–106, 110, 115
semantic 10, 37, 50, 76, 87, 110, 185–186, 208, 211–212
semiotic 21, 209, 220
similarity 8–10, 13–16, 18, 21, 33, 37–38, 79, 95, 220
simile 21, 40, 58, 93–95, 103, 106, 108, 124, 175, 190, 220–221
source domain 11–12, 14, 39, 93–95, 105, 123, 175
specialization 111–112
specialized terms 45–47, 59, 112
specification 54, 56, 75, 116
speech 61–62, 64, 69, 71, 77, 87, 90, 127, 130, 142
spoken language 63, 69, 129, 205
stance 85, 122–123
steps 44, 46, 59, 93, 104, 138–139
substitution 21, 39–40, 121–122, 221
synchronic 54
synecdoche 38
syntactic 133
systematic metaphor 61

T
talk 61–62, 86
target domain 11–12, 15, 39, 91–92, 94, 123
technical term 112–113, 115
text 5, 10, 14–15, 20, 57, 66–67, 86, 92–93, 97–98, 115, 117–119, 150–151, 158, 168
text management 108, 110, 115, 117–118
thought 1–2, 9–10, 88, 220
topic 11, 14, 26, 38–39, 41, 57, 86, 175–176
transparent 81, 83
transparency 81, 84

U
unanimous 44–45, 59, 90, 105, 109–110, 113, 116, 128, 138–139, 154–157, 159–161, 163–164, 186
understanding 18, 88, 98
unit of analysis 12–13, 26–27, 59, 152, 167, 190
usage 1–2, 4, 6, 8–9, 11–14, 20–21, 25, 52, 56, 62, 114, 116–117, 119–120, 123–125, 210–211, 220
utterance 11–12, 62, 64, 69, 72–74, 142, 200

V
vague language 65–66, 85
validity 2, 8, 149, 213–215, 219
variable 19
variation 52, 155, 157, 159, 197–198, 209, 217–218
vehicle 175
violation 105–106, 115

W
When In Doubt, Leave It In (WIDLII) 19, 34, 46, 49, 59, 71, 79–80, 85, 101, 105–106, 113, 152, 173–174, 179–180, 187–188, 191–196
word class 16–17, 59, 127, 177–179, 194, 196–212, 217, 221
world knowledge 44, 59, 105–106

In the series Converging Evidence in Language and Communication Research the following titles have been published thus far or are scheduled for publication:

14 Steen, Gerard J., Aletta G. Dorst, J. Berenike Herrmann, Anna A. Kaal, Tina Krennmayr and Trijntje Pasma: A Method for Linguistic Metaphor Identification. From MIP to MIPVU. 2010. xi, 238 pp.
13 Pütz, Martin and Laura Sicola (eds.): Cognitive Processing in Second Language Acquisition. Inside the learner's mind. 2010. vi, 373 pp.
12 Zlatev, Jordan, Timothy P. Racine, Chris Sinha and Esa Itkonen (eds.): The Shared Mind. Perspectives on intersubjectivity. 2008. xiii, 391 pp.
11 Lewandowska-Tomaszczyk, Barbara (ed.): Asymmetric Events. 2008. xii, 287 pp.
10 Steen, Gerard J.: Finding Metaphor in Grammar and Usage. A methodological analysis of theory and research. 2007. xvi, 430 pp.
9 Lascaratou, Chryssoula: The Language of Pain. Expression or description? 2007. xii, 238 pp.
8 Plümacher, Martina and Peter Holz (eds.): Speaking of Colors and Odors. 2007. vi, 244 pp.
7 Sharifian, Farzad and Gary B. Palmer (eds.): Applied Cultural Linguistics. Implications for second language learning and intercultural communication. 2007. xiv, 170 pp.
6 Deignan, Alice: Metaphor and Corpus Linguistics. 2005. x, 236 pp.
5 Johansson, Sverker: Origins of Language. Constraints on hypotheses. 2005. xii, 346 pp.
4 Kertész, András: Cognitive Semantics and Scientific Knowledge. Case studies in the cognitive science of science. 2004. viii, 261 pp.
3 Louwerse, Max M. and Willie van Peer (eds.): Thematics. Interdisciplinary Studies. 2002. x, 448 pp.
2 Albertazzi, Liliana (ed.): Meaning and Cognition. A multidisciplinary approach. 2000. vi, 270 pp.
1 Horie, Kaoru (ed.): Complementation. Cognitive and functional perspectives. 2000. vi, 242 pp.
