Discourse In The Professions: Perspectives From Corpus Linguistics (Studies in Corpus Linguistics, SCL 16)

Author: Ulla Connor | Thomas A. Upton

Discourse in the Professions

Studies in Corpus Linguistics

Studies in Corpus Linguistics aims to provide insights into the way a corpus can be used, the type of findings that can be obtained, the possible applications of these findings as well as the theoretical changes that corpus work can bring into linguistics and language engineering. The main concern of SCL is to present findings based on, or related to, the cumulative effect of naturally occurring language and on the interpretation of frequency and distributional data.

General Editor
Elena Tognini-Bonelli

Consulting Editor
Wolfgang Teubert

Advisory Board
Michael Barlow, Rice University, Houston
Graeme Kennedy, Victoria University of Wellington
Robert de Beaugrande, Federal University of Minas Gerais
Geoffrey Leech, University of Lancaster
Douglas Biber, North Arizona University
Anna Mauranen, University of Tampere
Chris Butler, University of Wales, Swansea
John Sinclair, University of Birmingham
Sylviane Granger, University of Louvain
Piet van Sterkenburg, Institute for Dutch Lexicology, Leiden
M. A. K. Halliday, University of Sydney
Michael Stubbs, University of Trier
Stig Johansson, Oslo University
Jan Svartvik, University of Lund
Susan Hunston, University of Birmingham
H-Z. Yang, Jiao Tong University, Shanghai

Volume 16 Discourse in the Professions: Perspectives from corpus linguistics Edited by Ulla Connor and Thomas A. Upton

Discourse in the Professions Perspectives from corpus linguistics

Edited by

Ulla Connor Thomas A. Upton Indiana University-Purdue University Indianapolis

John Benjamins Publishing Company Amsterdam/Philadelphia

The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.

Cover design: Françoise Berserik Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

Library of Congress Cataloging-in-Publication Data

Discourse in the Professions : Perspectives from corpus linguistics / edited by Ulla Connor and Thomas A. Upton.
p. cm. (Studies in Corpus Linguistics, ISSN 1388–0373 ; v. 16)
Includes bibliographical references and indexes.
1. Sublanguage--Data processing. 2. Discourse analysis--Data processing. I. Upton, Thomas A. (Thomas Albin) II. Title. III. Series.
P120.S9C666 2004
418’.00285--dc22 2004055952
ISBN 90 272 2287 8 (Eur.) / 1 58811 573 9 (US) (Hb; alk. paper)

© 2004 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA

Contents

Introduction
Ulla Connor and Thomas A. Upton, Editors

Section I

The argument for using English specialized corpora to understand academic and professional language
Lynne Flowerdew

Section II

Stylistic features of academic speech: The role of formulaic expressions
Rita C. Simpson

Academic language: An exploration of university classroom and textbook language
Randi Reppen

A convincing argument: Corpus analysis and academic persuasion
Ken Hyland

Section III

// æ so what have YOU been WORking on REcently //: Compiling a specialized corpus of spoken business English
Martin Warren

TOOK // à did you // ä from the miniBAR //: What is the practical relevance of a corpus-driven language study to practitioners in Hong Kong’s hotel industry?
Winnie Cheng

“Invisible to us”: A preliminary corpus-based study of spoken business English
Michael McCarthy and Michael Handford

Legal discourse: Opportunities and threats for corpus linguistics
Vijay K Bhatia, Nicola N. Langton and Jane Lung

Section IV

The genre of grant proposals: A corpus linguistic analysis
Ulla Connor and Thomas A. Upton

Rhetorical appeals in fundraising direct mail letters
Ulla Connor and Kostya Gladkov

Framing matters: Communicating relationships through metaphor in fundraising texts
Elizabeth M. Goering

Pronouns and metadiscourse as interpersonal rhetorical devices in fundraising letters: A corpus linguistic analysis
Avon Crismore

Introduction
Ulla Connor and Thomas A. Upton, Editors

Corpus linguistics is a relatively new approach to language studies that has the potential to revolutionize the teaching and learning of discourses for specific purposes. The first major wave in corpus linguistics, starting in the 1960s, focused on developing large corpora that represent a wide range of language use so as to make general observations about how people structure and use language – spoken and written. The Brown corpus of written American English (press reporting, fiction, government documents) contained one million words; its British counterpart, the Lancaster-Oslo-Bergen corpus, consisted of one million words of British English texts. The London-Lund corpus, an example of a spoken language corpus, included 500,000 words of spoken British English from various genres.

The second wave of general corpora, which started in the 1980s, has produced mega-corpora such as the 450 million-word Bank of English corpus, the 100 million-word Cambridge International Corpus, and the 40 million-word Longman Spoken and Written English Corpus. The 100 million-word British National Corpus (BNC) is made up of 90% written text, and the American National Corpus, which is currently under construction, is intended to model the composition of the BNC. Unlike many early corpora, both contain complete texts rather than sections of texts. Corpora of texts of over a billion words are also being developed using websites and newswire texts as data sources.

Corpus analysis techniques throughout the decades, utilizing such large general corpora, have provided evidence about recurring language patterns and about the lexical, grammatical, and lexico-grammatical aspects of language use. Such study of language patterning has been valuable for constructing general grammars and dictionaries, for example. Very importantly, linguists involved in the building of computerized corpora in the last five decades have provided a powerful argument for studying actual language use rather than elicited language samples provided by native-speaker intuition, which were typically used until recently.

While general corpora are important and provide a critical foundation for the study of language structure and use, they are less conducive to analyzing language use in specific academic and professional situations. Consequently, there is now a strong and growing interest in compiling specialized corpora that focus on specific types of genres within specific contexts. Instead of being compiled for the representativeness of language across a large number of communicative purposes, specialized corpora often focus on one particular genre (e.g. research papers, letters of business requests) or a specific situation (e.g. academic lectures, office communication in business).

As the chapters in this volume show, corpora compiled for specific academic and professional purposes have some advantages for the teacher and student in applied linguistics. First, many specialized corpora include complete texts for a specific purpose instead of sample sections of texts. This allows for top-down analyses that utilize insights from text linguistics and genre analysis to structure the examinations. Second, because specialized corpora are often small and collected by the analyst, these corpora often include more contextual information about the communicative situation than larger, general corpora. This is useful for context-sensitive analyses. Observations made on the function and use of language for particular purposes can be used for training and development, helping newcomers to the genre understand and use appropriately its key features and giving experts tools for improving and enriching the language they use.

This book presents the field of specialized corpus analysis through individual contributions by a number of the leading researchers working in this area. The contributions have been selected to show the breadth and depth of the special corpus field. The breadth is obvious in the variety of different professional and academic discourses – oral and written, “standard English” and international English – that are treated by the chapters; the depth is reflected by the wide variety of analyses that can be done on any particular corpus, as showcased by the multiple studies done on the ICIC Fundraising Corpus in Section IV.

After Flowerdew’s overview chapter on the use of specialized corpora for the study of specific genres, each ensuing chapter, in order to make the chapters accessible for researchers, teachers and trainers of English for Specific Purposes, provides an explicit description of the contribution of the corpus linguistics approach to the study of the discourse in question. Each chapter gives an overview of the specific corpus used in the research, including a description of the type, size, and collection process of the corpus, along
with the rationale for its development. The second common goal is to explain the type of corpus analysis that was done as well as the benefit that this type of analysis offers to understanding the discourse of academic and professional genres in general. Last, each chapter includes a discussion of the implications of the analysis for teaching in English for Academic Purposes (EAP) or English for Specific Purposes (ESP) settings. A wide range of topics, genres, and approaches has been included in order to provide the reader with a variety of concepts and practical outcomes. Despite the varied approaches, the contributions together make a powerful argument for the use of specialized corpora in the research and teaching of situated, authentic language for academic and professional purposes.

The book is divided into four sections. In Section I, Lynne Flowerdew examines the emergence of specialized corpora over the last ten years and provides a powerful argument for using specialized corpora to understand academic and professional language. She first makes a case for using general corpora to understand language systems as a whole and discusses the major contributions of some of the significant general corpora. She then gives the rationale for using specialized corpora to understand academic and professional language based on two major premises: the unsuitability of general corpora for these purposes and the methodological advantages of using specialized corpora. A major portion of the chapter deals with the presentation of exemplary specialized corpora as well as future directions in the building of such corpora. The chapter ends with a set of helpful guidelines for building a specialized corpus.

Moving from Flowerdew’s comprehensive overview, Section II presents and explains corpora and their use in academic settings. The three chapters in this section explore the uses of a corpus of academic spoken English (MICASE), a corpus of spoken and written academic language (the TOEFL 2000 Spoken and Written Academic Language corpus), and a corpus of academic research articles. The first chapter in this section, by Rita Simpson, showcases the MICASE corpus and presents an analysis of high-frequency formulaic expressions in the corpus. She examines the pragmatic functions these expressions serve in academic speech and compares students’ and professors’ use of these expressions in both monologic and interactive settings. Simpson’s chapter serves as a remarkable demonstration of how a well-designed corpus approach considers the functions of linguistic features in texts, not just their forms.

In the second chapter in this section, Randi Reppen summarizes and extends research on the TOEFL 2000 Spoken and Written Academic Language corpus,
focusing on two spoken and two written registers (i.e. classroom lectures, labs/in-class sessions, textbooks, and course packs). Reppen chose Douglas Biber’s Linguistic Dimension 1: Involved vs. Informational production (Biber 1988) and the variable of “lexical bundles” to study the four registers in her corpus sample. Her analysis shows that through specialized corpus linguistics it is possible to provide rich, accurate descriptions of language use. The results showed that all four registers present linguistic challenges that are not found in other registers which students need to learn. This chapter is an excellent introduction to the potential uses of this impressive corpus.

The third chapter in this section, by Ken Hyland, discusses how a specialized corpus of research articles has been used to explore rhetorical practices in academic persuasion across a range of disciplines. Using a corpus of 240 published papers (1.3 million words) from engineering, marketing, philosophy, sociology, applied linguistics, physics, and biology, he defines in the study three key elements as constituting persuasion in academic writing – citations, interaction, and self-mention – and analyzes their use across disciplines. Hyland shows through such a corpus analysis that it is not enough for writers to identify a valued disciplinary issue and report their study of it; they also have to demonstrate its significance and locate it within a disciplinary context through these three elements. Ken Hyland is a leading researcher in the area of corpus-based academic discourse, and this chapter is another clear indication of the benefits of such an approach to teachers and researchers in the field.

Section III presents corpus analyses of professional and business discourses. Two chapters discuss the design and compilation of the Hong Kong Corpus of Spoken English as well as ways to analyze it for the optimum impact for ESP training. The two other chapters in the section describe research using a corpus of spoken business English and a corpus of written legal language.

Martin Warren introduces the Hong Kong Corpus of Spoken English (HKCSE), which – when completed – will have four sub-corpora each consisting of half a million words of conversation, business discourse, academic discourse, and public discourse, respectively. Altogether 200 hours of spoken discourses have been collected with participants consisting of Hong Kong Chinese (first language Cantonese) and either native speakers of English or speakers of languages other than Cantonese. This is the largest corpus of naturally-occurring spoken business English collected in Hong Kong and is unique in many ways, including the fact that the corpus is both orthographically and prosodically transcribed. The chapter describes the discourse intonation
system used to mark the prosodic elements or the intonation of the speakers and shows how the study of prominence, tone, key, and termination – the key elements of the system – is valuable when interpreting the pragmatic and intercultural aspects in the business sub-corpus of the HKCSE. Another key element of the chapter deals with the design of this corpus of professional discourse. Warren shows how the HKCSE emphasizes the importance of including the professional insiders in collecting and analyzing professional discourse. Principles articulated by Srikant Sarangi for successful data collection and analysis of professional discourse were carefully followed in the HKCSE for the purpose of the most effective collaboration with practitioners. The chapter does a remarkable job in articulating key features of effective collaboration in the compilation of specialized professional corpora.

In her chapter, Winnie Cheng provides a fascinating analysis of the Hong Kong business corpus with the intent of showing the practical relevance of the corpus to the practitioners in Hong Kong’s hotel industry. The discourse of checking out of the hotel is studied using word frequency lists and collocations. This leads to looking at move structure, intonation, and pragmatic context. From a language training point of view, the findings showed the need to address issues related to lexico-grammatical accuracy as well as intonational appropriacy.

Also in this section, Michael McCarthy and Michael Handford introduce CANBEC, the Cambridge and Nottingham Business English Corpus. The corpus currently stands at ca. 900,000 words of spoken business British English recorded in a variety of business meeting settings. The study reported in this chapter used a subcorpus of 250,000 words from CANBEC and compared those data to casual conversation using key word, collocational, and “cluster” analysis, supplemented with a qualitative analysis. The findings suggested that comparative data are indeed valuable; spoken business English is found to share aspects of both common conversational English and academic data. The chapter concludes with a valuable discussion about the pedagogical implications of such analyses of spoken business English corpora. The authors underscore the point that existing teaching materials may not reflect accurately the language and interactional complexities that their natural business language data display.

The last chapter in this section, by Vijay Bhatia, Nicola Langton, and Jane Lung, looks at written legal language. The authors first assess the usefulness of available resources in corpus linguistics for analyzing written legal discourse. They point out the almost standardized use of forms of qualifications or hedging and their syntactic positioning in legal discourse. These patterns are easily identified using standard quantified corpus analysis techniques, as a sample analysis in the chapter shows, and are teachable to writers. The authors further show that some legal discourse offers challenges for the researcher and the teacher due to complex features of intertextuality and interdiscursivity, features that make legal texts difficult to understand and master.

Section IV contains four chapters devoted to the description of the two-million-word corpus of written fundraising discourse compiled at the Indiana Center for Intercultural Communication (ICIC). The computerized corpus consists of the most important fundraising genres – direct letters, case statements, grant proposals and annual reports – 927 pieces in all, collected from 236 U.S. nonprofit organizations. Fundraising discourse has been a relatively neglected area in linguistic research unlike the discourse of business and law, for example. As fundraising for philanthropic purposes gains importance around the world, its linguistic study should also increase.

The first of these chapters, by Ulla Connor and Thomas Upton, presents an analysis of grant proposals written for the purposes of nonprofit fundraising. The authors point out that the writing of grant proposals is of primary importance to non-profit organizations working in the arts, education, and human services. The goal of the study reported in the chapter was to use a textlinguistic approach in corpus analysis by combining a rhetorical “moves” analysis with a multidimensional linguistic analysis to develop a linguistic profile of the genre as well as identify linguistic features that characterize the rhetorical moves in the proposals. By combining a corpus-based linguistic dimensional analysis with a Swalesean genre analysis, they were able to present a more accurate and detailed description of the genre of non-profit grant proposals.
The three other chapters in this section investigate the corpus of fundraising letters as part of the ICIC corpus. Ulla Connor and Kostya Gladkov also chose a textlinguistic/rhetorical approach for the study of the 245 fundraising letters (over 190,000 words) in the corpus. Based on rhetorical theories of persuasion, they developed a system of 19 appeals consisting of logos, ethos, and pathos. A high interrater reliability was achieved when this system was applied by hand to the corpus of the letters. Quantitative analyses describe the distribution of the appeals in the letters across the fields of nonprofit organizations. The expectation was that fundraising letters would contain the ethical and emotional appeals rather than rational ones. Yet, the findings showed that
overwhelmingly, independent of the field, the letters favor the rational appeal, or logos. The chapter concludes with the discussion of preliminary computerized key word analyses of the data that chart directions for further manipulations of the data.

Elizabeth Goering uses a computer analysis to identify ways in which metaphors are used to create relationships in fundraising letters. She used relational communication theory to come up with the constructs of dominance and affiliation that generated the major metaphor types in the letters (e.g. friend, savior, partner, and investor). Goering’s method of identifying the metaphors in the texts is probably new to most corpus linguists. She uses a software package for qualitative analysis of text, called QSR Nud*ist, to search for words after an initial hand-coded pilot analysis. Her findings showed that overwhelmingly the partnership or friendship metaphors are invoked, but there are interesting differences in the use of metaphor types across fields. Goering’s chapter provides an alternative perspective on the study of specialized corpora from a discipline outside of linguistics.

The last chapter in the section is by Avon Crismore, whose research on metadiscourse is world-renowned. In this study, she applies her theories to the study of the fundraising letters in the ICIC corpus and compares the use of interpersonal pronouns and other metadiscoursal features in letters from health and human agencies with letters from educational organizations. The findings confirmed the similarity of the writers from the two fields regarding the most frequently used interpersonal pronouns. Both sets of writers were aware of personal and interpersonal pronouns as effective rhetorical devices for credibility and affective appeals for persuasion. The results of metatextual analyses, however, revealed some interesting differences in the use of these features by the two groups of writers. Crismore’s chapter ends with an exceptionally useful discussion about the uses of corpus analysis for the teacher of fundraising writing as well as the teacher of ESL.

The chapters in this section show how a corpus can encourage interdisciplinary inquiry of language use as the authors use strong theoretical orientations of their specific fields – rhetoric and communication studies – in tackling the ICIC corpus.

The papers collected in this volume represent some of the most up-to-date research on corpus design and analysis of professional and academic discourse, spoken and written. Together, these chapters serve as an introduction to the
types of research that can be done with specialized corpora and how specialized corpora can be used for practical purposes like teaching and training. The myriad of questions posed, the variety of methods used to collect data, and the selection of analyses conducted speak to the versatility, strength, and potential of specialized corpora in the study of language genres.

We would like to express our gratitude to many individuals who helped in the preparation of this book. We wish to thank Elena Tognini-Bonelli, general editor, and Kees Vaes, managing editor at Benjamins, for their encouragement and assistance throughout the editing of this book. We are grateful for an anonymous reviewer’s generous and helpful comments on an earlier draft of this book. We also wish to thank Jing Gao, Kyle McIntosh, and Toula Vasilopoulos, research assistants at the Indiana Center for Intercultural Communication at IUPUI, for their diligent editorial work in preparing this book for publication. Finally, we greatly appreciate the outstanding scholarship and wonderful cooperation of the authors who worked quickly and diligently to write their chapters. It has been a pleasure working with all of these people.

Indianapolis, Indiana
July 2004
Ulla Connor and Thomas A. Upton

Section I

The argument for using English specialized corpora to understand academic and professional language
Lynne Flowerdew
Hong Kong University of Science and Technology

Introduction

Before putting forward the argument for using specialized corpora to understand academic and professional language, I will first of all make a case for using general corpora to understand the language system as a whole by examining the underlying principles for using corpus linguistic techniques. I will argue that as general corpora have proved to be extremely useful for understanding how naturally-occurring language operates, then by the same token, specialized corpora can also prove to be of value in understanding academic and professional language. The rationale for using specialized corpora to understand academic and professional language is based on two premises: the unsuitability of general corpora for these express purposes, and the methodological advantages inherent in using specialized corpora to understand language, advantages which do not always pertain to general corpora. At the same time, I will also consider some drawbacks of specialized corpora and suggest how these can be resolved. Having built what I hope is a strong case for using specialized corpora to study discourse in the academy and the professions, I will then look more closely at how various corpus linguists interpret the term specialized corpora. Next, I will give a broad overview of the types of academic and professional corpora compiled to date with suggestions for future development in this burgeoning field. I will conclude this section with a discussion of a set of guidelines to be considered for the building of specialized corpora.

The rationale for using general corpora to understand language

Computerized general corpora, as their name implies, are classified as such because they consist of different types of spoken and written texts. Compiled in the 1960s, the one million word Brown corpus of written American English and its British counterpart, the one million word Lancaster-Oslo-Bergen (LOB) corpus, both consist of 2,000-word samples of 500 texts which are spread across 15 categories, including press reportage, learned and scientific writings and science fiction. The 500,000-word London-Lund corpus, compiled from the mid-seventies to mid-eighties as the spoken component of the Survey of English Usage corpus, is an example of a general corpus of spoken English, comprising various spoken genres such as conversations and radio broadcasts. In their day, such corpora were considered large-scale, but by today’s standards they would be judged relatively small-scale (see Taylor et al. 1991 for an overview of these corpora).

A few decades later, the profile of general corpora has changed considerably, not only in terms of their size but also their internal composition. These corpora, referred to by Kennedy (1998:45) as second generation mega-corpora, are an amalgamation of different types of spoken and written texts of both British and American English. Examples of such corpora are the 450 million word Bank of English corpus, which evolved from the Collins Cobuild corpus (Sinclair 1987a), the 100 million word Cambridge International Corpus and the 40 million word Longman Spoken and Written English Corpus. The 100-million word British National Corpus (BNC) is another general corpus, dedicated to British English and made up of 90% written text. The American National Corpus (ANC), currently under construction, is intended to model as much as possible the texts in the BNC. In addition to their size, another way in which these recent corpora differ from the first generation general corpora is that they contain full texts rather than 2,000-word samples, which has implications for any subsequent analysis. We now seem to be moving into the era of giga-corpora with the release by the Linguistic Data Consortium of the English Gigaword Corpus, a corpus of English newswire text of over a billion words.

Computerized general corpora have therefore been around for almost half a century. Although their profile has changed enormously, facilitated by technological advances in database storage and text-retrieval systems, the underlying theoretical rationale for using corpora to understand language remains essentially unchanged. Corpus analyses provide attested examples of recurring
language patterns, which are based on empirical data rather than on introspection or on data gathered through elicitation techniques. Much of the research carried out using the first generation corpora focused on lexical, grammatical or lexico-grammatical aspects (cf. Kjellmer’s 1990 research on patterns of collocability in the Brown corpus; Altenberg’s 1993 study of recurrent verb-complement constructions in the London-Lund corpus). More recent corpus-based work also considers language from this phraseological perspective (e.g. Partington 1998; Hunston and Francis 2000), but from a broader lexico-grammatical base also focussing more on meaning. For example, Hunston and Francis (2000) note that verbs which share the same patterning also have aspects of meaning in common and thus view lexis and grammar as interdependent. As Barlow (1996) points out, this corpus-based empiricist approach to language reveals phraseological patterns which could not be accounted for in the Chomskyan view of language:

Some words simply like to occur near to each other, a fact that is unaccounted for within the generative paradigm, which has focused on creativity in the sense of a free association of lexical items, governed only by the general constraints of the grammar. (Barlow 1996:15)

e main criticisms that have been levelled against corpus-based approaches to linguistics emanate from the Chomskyan camp. However, the corpus-based approach to understanding language, which focuses on what is actually said or written (i.e. performance-based) should not be seen as denying the value of the rationalist approach of Chomskyan linguistics where language is understood in terms of what can be said or written (competence-based) derived from introspective reflection and elicitation techniques. Several leading researchers in the field (Conrad 2000; Granger 2002; Johansson 1991; Stubbs 2001) have emphasised that corpus linguistics is but one tool for understanding language and that this source of evidence for understanding language behaviour can also be supplemented with introspection and elicitation techniques. In fact, Stubbs, whose corpus work mainly focuses on the study of meaning, views native-speaker intuition of value in the area of semantics and pragmatics where “intuitions are strong and stable” (Stubbs 2001:71). Neither are the routinized patternings uncovered through corpus linguistic techniques necessarily incompatible with the notion of creativity associated with the more paradigmatic Chomskyan approach to language; Stubbs (2001) demonstrates how creativity can also be accommodated within the routines of phraseology.1 In the foregoing discussion, I have made a case for using general corpora

Lynne Flowerdew

for understanding language patterns. Moreover, the wealth of corpus-based research that has been carried out on general corpora in the last few years and the applications of corpus data from the more recent mega-corpora to inform the construction of general grammars and dictionaries (see the Collins COBUILD English Language Dictionary (Sinclair 1987b) and the Longman Grammar of Spoken and Written English (Biber et al. 1999)) testify to their overwhelming importance in exploring and understanding the language system and exploiting the findings for general pedagogic purposes. I would maintain that the rationale for using general corpora to understand language also applies to specialized corpora. Such corpora can play an equally important role in understanding language of a more specific academic and professional nature, as general corpora may not be suited for this role as explained below.

The rationale for using specialized corpora to understand academic and professional language

The argument for using specialized corpora to understand the kind of academic and professional language described in this volume hinges on the argument as to why general corpora may not be suitable for investigating specialized language. Although the general-purpose corpora reviewed above contain a wide variety of written and spoken genres, which can themselves be regarded as comprising specialized sub-corpora, such sub-corpora may not be appropriate for understanding various types of specialized language for several reasons. First, general-purpose corpora have been compiled for their representativeness of the language as a whole and carefully balanced among the different types of text for reception and production to reflect their importance in the culture, which means that there will be a limited representation of some genres. For example, fiction is accorded more weighting in the BNC than poetry or drama because more fiction is read and published than the other two categories of imaginative writing in British culture (Aston 2001). As Aston cautions, general mixed-reference corpora have been compiled for the purpose of inferring generalisations about the language as a whole, or about broad categories of texts, not for the purpose of researching language patterns in specialised corpora, which, if not figuring significantly in British culture, would not be accorded much weighting in the general corpus.

The argument for using English specialized corpora

Secondly, even though a general corpus might contain a specialized subcorpus of a suitable size, logistically it may be difficult to access such a subcorpus within a general corpus as the search fields have not been set up with this purpose in mind. This was the case initially with the BNC (Burnard 2002). However, Lee (2001) provides a reclassification of the text types within the BNC to accommodate searches for specialised corpora within a sub-domain. A third point is that some types of discourse are not easily accessible for compilation, as pointed out by Connor and Upton in the introduction to this volume. For example, spoken corpus data are generally regarded as more time-consuming to collect than written corpus data, and therefore may only comprise a small proportion of a general corpus. In fact, Burnard (2002) acknowledges that pragmatic and economic concerns are the main reasons why spoken material only comprises 10% of the BNC. Besides spoken data, those discourses which are not in the public domain, i.e. occluded genres (Swales 1996), are also difficult to gain access to because of their semi-confidentiality. As Connor and Upton mention, such examples would include service encounters and nonprofit letters of grant proposals, which they stress are significant discourses in professional communication. A fourth, important reason why general corpora may not furnish suitable data for exploration of academic and professional language is that some general corpora comprise text segments of 2,000 words rather than full texts, which has implications for analysis. Text segments are, in the main, useable for investigation of individual lexical or grammatical items, but unsuitable for more top-down genre-based analyses where the discourse functions of lexicogrammatical items are examined within different sections of a text (cf. Connor et al. 2002; Gledhill 2000; Thompson 2000; Upton and Connor 2001; Upton 2002).
Bowker and Pearson (2002) also advise using full texts when compiling specialized corpora as the location of specific terms or concepts in a text may be particularly relevant for their full meaning, which echoes Hoey’s (1997) call for examining whether words are associated with a particular positioning in the overall textual organization. Specialized corpora are therefore best suited in terms of their relevance for the purpose of understanding specific types of academic and professional language as general corpora may not be appropriate for this function on account of their internal composition.

Methodological issues

Specialized corpora also have other inherent advantages from a methodological perspective over general corpora, thus further strengthening the argument for using specialized corpora for investigation of language structure and use. These methodological aspects are discussed in detail below. One major argument against using corpus data to make predictions about language has been put forward by Widdowson (1998, 2002), who maintains that corpus data are but a sample of language, as opposed to an example of authentic language, because the data is divorced from the communicative context in which it was created: “the text travels but the context does not travel with it”. Widdowson’s point is a valid one as it cannot be denied that in order to fully and accurately interpret the corpus data it is necessary to be cognizant of the role that the context of situation and context of culture play in shaping the discourse under investigation. This is where I see the value of working with specialized corpora where the analyst is probably also the compiler and does have familiarity with the wider socio-cultural dimension in which the discourse was created (Flowerdew, L. 2003). The compiler-cum-analyst can therefore act as a kind of mediating ethnographic specialist informant to shed light on the corpus data. As Aston (2002:11) notes: “It is much easier to interpret concordances or numerical data if you know exactly what texts a corpus consists of, since this allows a greater degree of top-down processing”. One drawback of working with general corpora is that these lend themselves mainly to quantitative analyses where only broad generalizations on the language data can be made on account of the size of such corpora.
Although, as McCarthy and Carter (2001) point out, there is great value in demonstrating statistical evidence over many millions of words and broad contexts in general corpora (the findings of which have been of immense use for the construction of pedagogic grammars and dictionaries), this is but one type of corpus-based analysis. As a general principle, more qualitative-based analyses tend to be carried out on specialized corpora as their size and composition make them more manageable for qualitative studies. The more detailed examination of concordance lines with recourse to both the linguistic co-text and extralinguistic contextual features afforded by qualitative procedures also provides a rich source of data to complement the more quantitative-based studies. Another criticism that has been levelled against corpus-based methods is that the very methodology itself, in the form of concordance and keyword

searches, limits the analysis to a somewhat atomized, bottom-up type of investigation of the corpus data. As Swales (2002) maintains, this analysis is at odds with the more top-down kind of process types of genre-based analyses (Swales 1990) where the starting point is with the macrostructure of the text with a focus on larger units of text in the form of move structure analysis rather than sentence-level lexico-grammatical patterning. However, this criticism is, in part, obviated by the corpus linguistic techniques which can be applied in particular to specialized corpora. In an article (Flowerdew, L. 1998), based on a suggestion by Leech (1991), I advocated that more work could be carried out on the semantic or discourse level of corpus data through inserting tags “to indicate the generic ‘move structures’ such as background, scope, purpose in the introductory sections of reports” (Leech 1991:549). However, this kind of tagging would have to be done manually and would therefore be well-nigh impossible to carry out on very large-scale corpora running into millions of words. But it is possible to apply these more top-down tagging procedures to specialized corpora, which tend to be smaller, and this has been done with promising results by corpus researchers such as Connor et al. (2002), Thompson (2000), and Upton (2002). For example, Connor and Upton, working from a modified version of Bhatia’s (1993, 1998) move structure analysis for delineating the rhetorical structures of fundraising discourse, have drawn up a series of seven prototypical move structures (e.g. introduce the cause and/or establish credentials of organization; solicit response) for tagging rhetorical stretches of text in direct-mail letters from philanthropic organizations.
Here, we have an example of what Sinclair (2001:xi) refers to as the early human intervention (EHI) method – as opposed to the late or delayed human intervention (DHI) associated with large-scale corpus analysis – where the analysts have a clear goal at the outset and thus construct a corpus and decide on the methodology with a specific purpose in mind. A key methodological aspect of specialized corpora is the comparative nature of many of the investigations, which we do not find to nearly such an extent in the literature on general corpora. As Sinclair (2001:xii) states: “The main investigation technique that is used here [i.e. for investigating small, specialised corpora], and in most EHI studies, is comparison: comparison uncovers differences almost regardless of size”. Sinclair then goes on to give examples of parallel corpora, which are used in translation studies where the source texts and translated texts are aligned with respect to each other (see Bowker and Pearson 2002; Granger et al. 2003; Tognini-Bonelli 2001 for more

detailed discussion on using parallel corpora for translation). Many other studies of a contrastive nature have been carried out using sets of specialized corpora, which are particularly prominent in the area of learner writing where non-native speaker (NNS) corpora are compared with native speaker (NS) corpora (see Pravec 2002 for a review of learner corpora). In sum, the potential for using specialized corpora for understanding the discourse of the academy and professions is a sine qua non on two accounts: it is unlikely that general corpora can provide adequate linguistic evidence in these domains, and secondly specialized corpora allow for more top-down, qualitative, contextually-informed analyses than those carried out using general corpora. However, specialized corpora are not without their limitations, which are discussed below.
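The manual move tagging described in this section can be illustrated with a minimal sketch in Python. It assumes a hypothetical inline markup, `<move type="..."> ... </move>`, which is not the tagset of any of the studies cited; the move labels are loosely modelled on the examples quoted above, and the sample letter is invented for illustration. The sketch simply counts how many words fall under each move, the kind of figure an analyst might want once a small corpus has been tagged by hand.

```python
import re
from collections import Counter

# Hypothetical inline markup (an assumption, not a published tagset):
#   <move type="LABEL"> ... </move>
MOVE_PATTERN = re.compile(r'<move type="([^"]+)">(.*?)</move>', re.DOTALL)

def words_per_move(tagged_text):
    """Return a Counter mapping each move label to its total word count."""
    totals = Counter()
    for label, span in MOVE_PATTERN.findall(tagged_text):
        # Whitespace tokenisation is a deliberate simplification.
        totals[label] += len(span.split())
    return totals

# Invented sample letter, tagged by hand for illustration only.
sample = (
    '<move type="introduce_cause">Every winter, families in our city '
    'go without heat.</move> '
    '<move type="establish_credentials">We have served them since 1950.</move> '
    '<move type="solicit_response">Please send your gift today.</move>'
)

counts = words_per_move(sample)
for label, n_words in counts.items():
    print(f"{label}: {n_words} words")
```

Because the tags are inserted manually, a sketch of this kind is only practical for corpora small enough to be read in full, which is the point made above about specialized corpora.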

Some caveats in using specialized corpora

Several corpus linguists have raised issues concerning the size and representativeness of specialized corpora as well as the generalizability of their findings. In fact, these are thorny issues which have also been widely debated in the literature on corpus studies in general, and to which there seem to be no easy answers. As many researchers have pointed out, there is no ideal size for a corpus. The size is dependent on the needs and purposes of the investigation, and often, pragmatic factors such as how easily the data can be obtained come into play, i.e. the compiler has to fall back on non-probability sampling techniques involving judgement and convenience (Meyer 2002:44). That being said, a specialized corpus should be of adequate size such that there is a sufficient number of occurrences of a linguistic structure or pattern to validate a hypothesis. Another vexing issue related to the size of a corpus concerns the question of representativeness. As Tognini-Bonelli (2001:57) points out: “We should always bear in mind that the assumption of representativeness ‘must be regarded largely as an act of faith’ (Leech 1991:27), as at present we have no means of ensuring it, or even evaluating it objectively.” This issue of corpus representativeness could be regarded as more crucial as far as specialized corpora are concerned on account of the fact that the representativeness of specialized corpora is usually measured by reference to external selection criteria (i.e. by/for whom the text is produced, what is its subject matter), which are

regarded as somewhat subjective. On the other hand, Williams (2002) sees one way round this dilemma by making a case for using internal selection criteria based on lexical items, which he argues is a more objective means of ensuring the representativity of specialized corpora. However, even though the specialized corpus may be statistically representative of the discourse under investigation, the very nature of qualitative-based approaches to corpus analysis means that we may not be able to draw generalisations from them with the same amount of certainty that we can for quantitative-based analyses. Gavioli (2002) draws attention to this potential tension between the need to ensure that the corpus size is sufficient to obtain large enough quantities of specific language features and the necessity of checking to what extent these features are generalizable within a wider domain. Gavioli notes that her students tended to overgeneralise specific language features of a small ESP corpus of medical texts on hepatitis to a wider domain, which can obviously be misleading. However, one way in which the findings from specialized corpora could be validated is to use a sub-section of a general corpus for comparison purposes. See L. Flowerdew (2003) for an account of how the seven-million word Applied Science component of the BNC was used for checking to what extent the findings from a 250,000-word corpus of technically-oriented writing also occurred in a larger-scale sub-corpus, chosen for its resemblance in text type to the specialized corpus. Having justified the use of specialized corpora, in the following section I will now shift focus and attempt to clarify what is meant exactly by specialized corpora as they have been variously defined.
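Checking a small corpus against a much larger sub-corpus, as in the validation step described above, depends on putting raw counts on a common scale. The following is a minimal sketch in Python of frequency normalization per million words. The two corpus sizes are those mentioned in the text; the raw counts of 50 and 700 are invented for illustration and do not come from any of the studies cited.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

# Corpus sizes from the text; the raw counts are invented for illustration.
SPECIALIZED_SIZE = 250_000      # corpus of technically-oriented writing
REFERENCE_SIZE = 7_000_000      # Applied Science component of the BNC

specialized_rate = per_million(50, SPECIALIZED_SIZE)
reference_rate = per_million(700, REFERENCE_SIZE)

print(f"specialized: {specialized_rate:.1f} per million words")
print(f"reference:   {reference_rate:.1f} per million words")
```

With these invented figures the feature looks fourteen times rarer in the small corpus on raw counts, yet it is twice as frequent once both counts are normalized, which is why raw counts from corpora of different sizes should not be compared directly.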

Definitions of the term specialized corpora

In the literature on corpus linguistics, we often find the expression small specialized corpora used, but in this section I will deliberately refrain from using the term small to describe specialized corpora for the following reasons. First, small itself can be open to interpretation, but there seems to be general agreement that a corpus of up to 250,000 words can be considered as small. For Aston (1997), small corpora are in the range of 20,000–200,000 words. Of course, an optimum corpus size is related to the linguistic item under investigation, but that is another matter. It may also be misleading to always equate small with specialized for the reason that while the notion of small corpora entails

the notion of specialized, as embodied in the studies described in Ghadessy et al. (2001) Small Corpus Studies in ELT, the term specialized corpora does not necessarily imply that the corpora in question are small. Some specialized corpora run into millions of words such as the 5-million word Cambridge and Nottingham Corpus of Discourse in English (CANCODE) (see Hughes and McCarthy 1998; McCarthy 1998; McCarthy and Carter 2002). The CANCODE corpus is specialized in the sense that it consists of conversations taking place in informal settings such as shops, offices and educational institutions. Other specialized corpora such as those summarized in J. Flowerdew (1996:101) comprise just thousands of words (cf. the Economics Corpus of 20,749 words, Mparutsa et al. 1991). Moreover, it is also necessary to examine in what sense corpora can be classified as specialized. De Beaugrande (2001:11) views a specialized corpus as “delimited by a specific register, discourse domain, or subject matter.” The most comprehensive definition can be found in Hunston (2002):

Specialised corpus: A corpus of texts of a particular type, such as newspaper editorials, geography textbooks, academic articles in a particular subject, lectures, casual conversations, essays written by students etc. It aims to be representative of a given type of text. It is used to investigate a particular type of language. Researchers often collect their own specialised corpora to reflect the kind of language they want to investigate. There is no limit to the degree of specialisation involved, but the parameters are set to limit the kind of texts included. For example, a corpus might be restricted to a time frame, consisting of texts from a particular century, or to a social setting, such as conversations taking place in a bookshop, or to a given topic such as newspaper articles dealing with the European Union (Hunston 2002:14).

Although Hunston lists learner corpora as other types of corpora, these can also be encompassed within the definition of specialised above. According to Kennedy’s (1998:20) description of specialised corpora, major types to be included in this category would include those compiled for studies of regional or sociolinguistic variation such as dialect corpora, regional corpora, nonstandard corpora and learner corpora. Whilst acknowledging the value of these types of specialized corpora, it is also necessary to underscore the increasing importance of various types of academic and professional specialized corpora, which are reviewed briefly in the following section. Another aspect to consider regarding specialized corpora is that the larger-scale specialized corpora can themselves be divided into smaller specialised

sub-corpora. For example, the 1.5 million word Michigan Corpus of Academic Spoken English, MICASE, can be divided into specialized sub-corpora such as lectures, seminars, service encounters and tutorials (see Powell and Simpson 2001 for a detailed description of the classification of classroom and non-classroom speech events). The term specialized can therefore be either inclusive or exclusive in nature depending on the type of specialized corpus under investigation. Specialized corpora are also referred to as special-purpose corpora by some corpus linguists (Bowker and Pearson 2002; Meyer 2002) as a reflection of the fact that corpora are always designed with a particular purpose in mind. While some features, such as written / spoken; diachronic / synchronic; historical / contemporary, can be considered as applicable to both general and specialized corpora, other features could be regarded as defining specialized corpora. Below, I summarize the parameters by which corpora can generally be considered as specialized, with examples to illustrate these. Although the parameters in Table 1 below are presented as discrete categories, there is, in fact, an overlap between some of them; for example, the communicative purpose under Contextualization is also an aspect of genre. Likewise, subject matter is also closely aligned with the type of text / discourse under investigation.

Table 1. Parameters for defining corpora as specialized

Parameters                                Details / Examples

Specific purpose for compilation:         To investigate particular grammatical, lexical, lexicogrammatical, discoursal or rhetorical features

Contextualization:                        Setting (e.g. lecture hall)
                                          Participants (role of speaker / listener; writer / reader)
                                          Communicative purpose (e.g. promote, instruct)

Size:
  whole corpus                            1–5 million words
  subcorpus or small-scale corpus         20,000–250,000 words

Genre:                                    Promotional (grant proposals, sales letters)

Type of text / discourse:                 Biology textbooks, casual conversations

Subject matter / topic:                   Economics, the weather

Variety of English:                       Learner, non-standard (e.g. Indian, Singaporean)

Overview of developments in specialized corpora

This section first provides a broad overview of developments of specialized corpora which are of relevance to academic and professional discourse and then makes some predictions as to ways in which the field is likely to develop during the next few years.

Academic discourse e use of specialized corpora for specific academic purposes originated in the late 1980’s with the work of Tim Johns, whose pioneering work using corpora drawn from the fields of plant biology and engineering showed the huge potential of corpora and concordancing techniques for language analysis and classroom practice (Johns 1991, 1994). is tradition continues over a decade later with research on the written discourse of academics from various scientific fields having been carried out by Hyland (2002), Oakey (2003) and Tribble (2002), among many others. Register variation across different academic disciplines is discussed in Biber et al. (1998). One must also acknowledge the valuable contribution that the findings from learner corpora have played in understanding written interlanguage of a general academic nature. Various studies from a contrastive perspective, of grammar, lexis and discourse in learner corpora, are reported in Granger (1998).2 Turning now to the question of speech corpora, in the past couple of years it is quite noticeable that more attention has been paid to the compilation of spoken academic corpora for the development of appropriate teaching materials for non-native students. e British Academic Spoken English Corpus (BASE) at present consists of academic lectures with provision for a sub-corpus of seminars to be included in the future (Nesi 2000), and MICASE comprises a variety of classroom and non-classroom speech events (Simpson and Swales 2001). Yet, another large-scale initiative is underway to construct a corpus of spoken language from academic settings for the development of TOEFL exam tasks (Biber et al. 2001).3

Professional discourse

Specialized corpora have not only been compiled for understanding various types of language in the academy, but have also been built to shed light on the

prototypical language patterns and functions of various professional domains, most notably in the business field. Various discourses belonging to the promotional genre have received attention in corpus-based studies: persuasive appeals in direct mail letters were investigated by Connor and Gladkov (this volume); the use of evaluative statements in product reviews was researched by Bowker and Pearson (2002), and the salient language of project proposals was examined by Tribble (2000). Charteris-Black (2000) investigated the use of metaphor in a corpus of The Economist magazine. During the past couple of years, more research has been undertaken on contrastive corpus-based studies of professional vs. apprentice discourse. For example, Upton and Connor (2001) have analysed the politeness strategies used by Americans, Finns and Belgians in letters of application. Hewings and Hewings (2002) compared the use of anticipatory ‘it’ clauses in a corpus of published journal articles taken from the field of Business Studies with a corpus of MBA student dissertations written by non-native speakers. The Hong Kong Corpus of Spoken English (HKCSE) has yielded valuable insights into the language used in service encounters, providing useful data for training purposes of staff in the service industries (see Cheng and Warren 2000, and the individual articles by Warren and Cheng this volume).

Future pathways

In contemplating the future directions specialized corpora are likely to take, I consider that these developments are dependent on the following factors: the increasing realisation of the importance of the ethnographic dimension and extralinguistic content for interpretation of corpus data; the availability of corpora; currently under-researched discourse genres and varieties of English; and the changing face of English for professional communication in the era of globalisation (cf. Seidlhofer’s work 2000, 2001 on the Lingua Franca Corpus). In view of the fact that specialized corpora are relatively easy to compile and both researchers and classroom practitioners sometimes prefer to work with corpora which fit their own socio-cultural context, I would expect there to be an expansion in the compilation of localised corpora. Such an approach would then allow the analyst to better consider the ethnographic dimension of corpus analysis given their familiarity with the socio-cultural context in which the text was created. Moreover, as many corpus studies are now recognising the importance of the ethnographic dimension (Hyland 1999, 2000) and extralinguistic information in the analysis of corpus data, it is expected that more use will be made of sound and video recordings of spoken corpus data (see Nesi’s 2000 compilation of academic lectures) to examine the effect of paralinguistic and prosodic features on communication. The web is a vast repository of corpus data and is increasingly being mined for compilation and exploitation as an electronic bank of specialized corpora (Foucou and Kubler 2000; Meyer et al. 2003; Ooi 2001, 2002; Pearson 2000; Renouf 2003).4 Ease of accessibility is therefore a determining factor in the construction of specialized corpora using web sources. However, see the discussion on the Corpus Linguistics list (http://www.hit.uib.no/corpora/) on the legal issues of downloading web-based sources for corpus linguistics research. At the same time, efforts are needed to compile specialized corpora which are less accessible because they belong to the category of occluded genres, leading to certain types of discourse being under-researched. Section 4 of this volume reports on various studies of philanthropic discourse, a genre which hitherto has received negligible attention in corpus-based studies. A particularly under-researched area to date, most probably because it is also an ‘occluded genre’, is that of legal discourse, so the article by Bhatia et al. (this volume) is a welcome contribution to the field. There are also some lacunae in the compilation of specialized corpora for particular varieties of English. It has been noted above that a substantial amount of research has been carried out on learner English of a general academic nature, most notably on the argumentative writing of undergraduate students. More initiatives on building learner corpora targeting specialised English, such as the Indiana Business Learner Corpus (IBLC) described in Connor et al.
(2002), would seem to be another area which merits attention in the future. Another feature about most of the investigations into learner corpora to date is that they are synchronic studies; more research of a diachronic nature would therefore seem to be another area for future exploitation. In fact, Connor (2002:506) calls for the field of contrastive rhetoric “to study texts diachronically to identify the evolution of patterns and norms.” Granger (2002) emphasises the usefulness of CLC (Computer Learner Corpora) in Second Language Acquisition studies where diachronic studies are the norm, which reflects the growing interdisciplinarity of the field. Over the last decade or so, the dominance of native-speaker norms has been called into question by sociolinguists such as Kachru (1990) and World

Englishes (i.e. non-standard varieties of English) are now being given more recognition. Developments in corpus linguistics have reflected this changing socio-linguistic scene with the compilation of the International Corpus of English (ICE) (see Greenbaum 1996), comprising varieties of English worldwide such as the one-million ICE-GB component (Nelson et al. 2002) and the ICE-HK sub-corpus (Bolton et al. 2004). Corpus linguists with an interest in the field of professional communication are also questioning the traditional distinction made between NS and NNS varieties of English. Seidlhofer (2000, 2001) proposes the notion of an English as a Lingua Franca (ELF) corpus on the grounds that native speaker norms are no longer so applicable in the era of globalisation of English, where English is used as a lingua franca for professional communication largely among non-native speakers. A recently instigated corpus project is the compilation of the Vienna Oxford International Corpus of English (VOICE), which at present consists of various speech events captured in international business situations. The exploration of how successful professional communication is achieved by speakers whose first language is not English will no doubt receive more coverage in future studies.

Guidelines for building a specialized corpus

This final section presents a set of general guidelines for building a specialized corpus and the considerations that need to be taken into account.

– What is the purpose for building a specialized corpus?
Specialized corpora are constructed with an a priori purpose in mind. The purpose determines, to a large extent, the other points below.

– What genre is to be investigated?
In the broadest sense, genre could be subdivided into academic and professional genres. Article introductions, student dissertations and laboratory reports would fall within the domain of academic genres. Sales letters and product promotion are examples of genres in business settings, and legislative provisions and cases are examples of genres in a professional setting (see Bhatia 1993).

– How large should the specialized corpus be?
The appropriate size for a specialized corpus is highly dependent on the phenomenon that is being investigated. The lower the frequency of the feature one wishes to investigate, the larger the corpus should be. Conversely, if the intention is to investigate the more common features of language such as grammatical items, the corpus can be smaller. In fact there is no optimum size, but what is of paramount importance is that the size of the specialized corpus must be closely matched with the features under investigation. As suggested by Barnbrook (1996), a pilot survey could be carried out initially to establish whether the corpus yields sufficient data for analysis.

– Is the specialized corpus representative of the genre?
It has already been pointed out that representativeness is not a clear-cut issue. However, specialized corpora are generally considered as representative of the genre under investigation if they contain numerous texts from a variety of authors so that no one authorial style would dominate and typical lexical or grammatical patterns would be revealed. In general, full texts are preferred over extracts to ensure all features of the genre are accounted for.

– How will data be collected?
The data collection for specialized corpora is often highly dependent on the genre under investigation. If an occluded genre is targeted for collection, the compiler may well have to rely on a combination of judgement and convenience sampling as the data may only be available from a limited number of sources, which may affect the size and representativeness of the specialized corpus (see Meyer 2002 for a discussion on different types of sampling frames). It goes without saying that it is much easier to compile specialized corpora from texts which are already in electronic form, such as in CD-ROM format, in on-line databases or available on the Web (see Bowker and Pearson 2002 for details on compiling corpora from electronic sources).

– How will the specialized corpus be tagged / marked up?
Specialized corpora are not always tagged, but their size makes them more amenable to manual tagging.
Many of the well-known learner corpora, such as the International Corpus of Learner English (ICLE), have been marked up with an error tagset to aid future analysis of the corpus (see Dagneaux et al. 1998). More often, though, a specific tagset is devised according to the purpose of the investigation (e.g. Upton and Connor 2001). Also, depending on the purpose of the analysis, the tags can be syntactic, semantic, discourse-based or rhetorical. However, some corpus linguists prefer to work with untagged text as they maintain that one should approach the data without any pre-conceived notions to arrive at a full exploitation of the corpus data. According to Tognini-Bonelli (2001:73), annotation (i.e. tagging) entails the danger that the data will be made to fit the theory, thereby insulating some of the data from analysis. As a rebuttal of this position, Aarts (2002) makes the point that if one focuses on the annotation process rather than the result, one would be testing, rather than merely applying, the annotation scheme, and changes could be made where needed.
– What kind of reference corpus would be suitable to contrast with the specialized corpus? It has been noted that many of the analyses involving specialized corpora, especially learner corpora, rely on some kind of reference corpus for contrastive analysis purposes. However, Leech (1998:xix) points out that caution is in order when comparing a learner corpus with a reference corpus, where it tends to be the norm to isolate variables in the learner corpus in terms of which the reference corpus differs. Leech cautions against always attributing these variables to the native languages concerned as they may be due to other factors such as differences in cultural context, or in educational institutions and practices. A reference corpus, such as the BNC, which is usually much bigger than the specialized corpus and of a general nature, is often used as a yardstick for comparison purposes. As Scott (http://www.hit.uib.no/corpora/) points out, however, there are several modes of comparison, which, again, relate to the research questions posed on the specialized corpus. Scott mentions that one text could be compared with the set of all the texts in the specialized corpus if the purpose is to determine whether it exhibits a high degree of features characteristic of the genre under investigation. In the past, it has been customary to use such large-scale corpora as the BNC for contrastive purposes, but in future it is expected that more use may well be made of international corpora or corpora targeting different varieties of English for reference purposes, given the increasing use of English as a lingua franca.
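
The size guideline above (the rarer the feature, the larger the corpus) comes down to simple arithmetic on a pilot count. The following is a minimal sketch rather than a procedure from the chapter: the function names, the pilot figures and the target of 100 examples are all hypothetical, and it assumes the feature occurs at roughly the same rate in the full corpus as in the pilot sample.

```python
import math


def rate_per_million(hits: int, corpus_tokens: int) -> float:
    """Normalized frequency of a feature: occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 1_000_000 / corpus_tokens


def tokens_needed(rate_pm: float, target_hits: int) -> int:
    """Corpus size (in tokens) expected to yield target_hits occurrences,
    assuming the pilot rate holds for the rest of the genre."""
    if rate_pm <= 0:
        raise ValueError("rate_pm must be positive")
    return math.ceil(target_hits * 1_000_000 / rate_pm)


# Hypothetical pilot survey: 12 occurrences in a 60,000-token sample.
pilot_rate = rate_per_million(12, 60_000)      # 200 per million
print(tokens_needed(pilot_rate, 100))          # tokens needed for ~100 examples
# A feature ten times rarer needs a corpus ten times larger.
print(tokens_needed(pilot_rate / 10, 100))
```

An estimate of this kind is only a planning aid; a feature that clusters in a few texts or authors may still be under-represented at the estimated size.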

Concluding remarks e retrospective dimension of this chapter has shown how far the field has advanced in the last decade for expanding the linguistic knowledge-base of academic and professional discourse through investigation of various specialized corpora; the prospective dimension holds promises of exciting developments in future endeavours. At the same time, the guidelines for building specialized corpora have highlighted the ever-changing landscape in this burgeoning field.


Notes

1. See McEnery and Wilson (1996), Meyer (2002) and Stubbs (2001) for more detailed discussion on the status of corpus linguistics vis-à-vis Chomskyan linguistics.
2. See, for example, the articles by Lorenz on overstatement in advanced learners’ writing, Virtanen on the use of direct questions in argumentative student writing, Biber and Reppen on the use of complement clauses by elementary students; Aijmer’s study on modality in advanced learners’ written interlanguage in Granger (2002), and Altenberg and Granger’s (2001) article on the grammatical and lexical patterning of ‘make’ in non-native student writing.
3. See L. Flowerdew 2002 for further discussion on the uses of written and spoken corpora for general and specific academic purposes.
4. See also Bowker and Pearson (2002) for useful suggestions on exploiting web sources for investigating LSP, language for special purposes.

References

Aarts, J. 2002. “Review of ‘Corpus linguistics at work.’” International Journal of Corpus Linguistics 7 (1):118–123.
Aijmer, K. 2002. “Modality in advanced Swedish learners’ written interlanguage.” In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger (ed.), 55–76. Amsterdam: John Benjamins.
Altenberg, B. 1993. “Recurrent verb-complement constructions in the London-Lund Corpus.” In English Language Corpora: Design, Analysis and Exploitation, J. Aarts, P. de Haan and N. Oostdijk (eds), 227–245. Amsterdam: Rodopi.
Altenberg, B. and Granger, S. 2001. “The grammatical and lexical patterning of make in native and non-native student writing.” Applied Linguistics 22:177–194.
Aston, G. 1997. “Small and large corpora in language learning.” In Practical Applications in Language Corpora, B. Lewandowska-Tomaszczyk and J. Melia (eds), 51–62. Łódź: Łódź University Press.
Aston, G. 2001. “Text categories and corpus users: a response to David Lee.” Language Learning & Technology 5 (3):73–76.
Aston, G. 2002. “The learner as corpus designer.” In Language and Computers: Studies in Practical Linguistics, B. Kettemann and G. Marko (eds), 9–25. Amsterdam: Rodopi.
Barlow, M. 1996. “Corpora for theory and practice.” International Journal of Corpus Linguistics 1 (1):1–37.
Barnbrook, G. 1996. Language and Computers. Edinburgh Textbooks in Empirical Linguistics. Edinburgh: Edinburgh University Press.
Bhatia, V.K. 1993. Analysing genre: language use in professional settings. London: Longman.


Bhatia, V.K. 1998. “Generic patterns in fundraising discourse.” New Directions for Philanthropic Fundraising 22:95–110.
Biber, D. and Reppen, R. 1998. “Comparing native and learner perspectives on English grammar: A study of complement clauses.” In Learner English on Computer, S. Granger (ed.), 145–158. London: Addison-Wesley-Longman.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow, UK: Longman.
Biber, D., Reppen, R., Clark, V. and Walter, J. 2001. “Representing spoken language in university settings: The design and construction of the spoken component of the T2K-SWAL Corpus.” In Corpus Linguistics in North America, R. Simpson and J. Swales (eds), 48–57. Ann Arbor: University of Michigan Press.
Bolton, K., Gisborne, N., Hung, J. and Nelson, G. (forthcoming 2004). The International Corpus of English Project in Hong Kong. Amsterdam: John Benjamins.
Bowker, L. and Pearson, J. 2002. Working with Specialized Language: A practical guide to using corpora. London: Routledge.
Burnard, L. 2002. “A retrospective look at the British National Corpus.” In Language and Computers: Studies in Practical Linguistics, B. Kettemann and G. Marko (eds), 51–70. Amsterdam: Rodopi.
Charteris-Black, J. 2000. “Metaphor and vocabulary teaching in ESP economics.” English for Specific Purposes 19:149–165.
Cheng, W. and Warren, M. 2000. “The Hong Kong Corpus of Spoken English: Language learning through language description.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 133–144. Frankfurt am Main: Peter Lang.
Connor, U. 2002. “New directions in contrastive rhetoric.” TESOL Quarterly 36 (4):493–510.
Connor, U., Precht, K. and Upton, T. 2002. “Business English: Learner data from Belgium, Finland and the U.S.” In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung and S. Petch-Tyson (eds), 175–194. Amsterdam: John Benjamins.
Conrad, S. 2000. “Will corpus linguistics revolutionize grammar teaching in the 21st century?” TESOL Quarterly 34 (3):548–560.
Dagneaux, E., Denness, S. and Granger, S. 1998. “Computer-aided error analysis.” System 26:163–174.
De Beaugrande, R. 2001. “Large corpora, small corpora, and the learning of language.” In Small Corpus Studies and ELT, M. Ghadessy, A. Henry and R. Roseberry (eds), 3–28. Amsterdam: John Benjamins.
Flowerdew, J. 1996. “Concordancing in language learning.” In The Power of CALL, M. Pennington (ed.), 97–113. Houston, TX: Athelstan.
Flowerdew, L. 1998. “Corpus linguistic techniques applied to textlinguistics.” System 26 (4):541–552.


Flowerdew, L. 2002. “Corpus-Based Analyses in EAP.” In Academic Discourse, J. Flowerdew (ed.), 95–114. London: Longman.
Flowerdew, L. 2003. “A combined corpus and systemic-functional analysis of the Problem-Solution pattern in a student and professional corpus of technical writing.” TESOL Quarterly 37 (3):489–511.
Foucou, P-Y. and Kübler, N. 2000. “A web-based environment for teaching technical English.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 65–73. Frankfurt am Main: Peter Lang.
Gavioli, L. 2002. “Some thoughts on the problem of representing ESP through small corpora.” In Language and Computers: Studies in Practical Linguistics, B. Kettemann and G. Marko (eds), 293–303. Amsterdam: Rodopi.
Gledhill, C. 2000. “The discourse function of collocation in research article introductions.” English for Specific Purposes 19 (2):115–135.
Ghadessy, M., Henry, A. and Roseberry, R. (eds) 2001. Small Corpus Studies and ELT. Amsterdam: John Benjamins.
Granger, S. (ed.) 1998. Learner English on Computer. London: Longman.
Granger, S. 2002. “A Bird’s eye view of learner corpus research.” In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, S. Granger, J. Hung and S. Petch-Tyson (eds), 3–33. Amsterdam: John Benjamins.
Granger, S., Lerot, J. and Petch-Tyson, S. (eds) 2003. Corpus-based Approaches to Contrastive Linguistics and Translation Studies. Amsterdam: John Benjamins.
Greenbaum, S. (ed.) 1996. Comparing English Worldwide: The International Corpus of English. Oxford: Oxford University Press.
Hewings, M. and Hewings, A. 2002. “‘It is interesting to note that…’: A comparative study of anticipatory ‘it’ in student and published writing.” English for Specific Purposes 21 (4):367–383.
Hoey, M. 1997. “From concordance to text structure: new uses for computer corpora.” In Practical Applications in Language Corpora, B. Lewandowska-Tomaszczyk and J. Melia (eds), 2–23. Łódź: Łódź University Press.
Hughes, R. and McCarthy, M. 1998. “From sentence to discourse: discourse grammar and English language teaching.” TESOL Quarterly 32 (2):263–287.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Hunston, S. and Francis, G. 2000. Pattern Grammar. Amsterdam: John Benjamins.
Hyland, K. 1999. “Disciplinary discourses: writer stance in research articles.” In Writing: Texts, Processes and Practices, C. Candlin and K. Hyland (eds), 99–121. London: Longman.
Hyland, K. (ed.) 2000. Disciplinary discourses: Social interactions in academic writing. London: Longman.
Hyland, K. 2002. “Activity and evaluation: Reporting practices in academic writing.” In Academic Discourse, J. Flowerdew (ed.), 115–130. London: Longman.
Johansson, S. 1991. “Times change, and so do corpora.” In English Corpus Linguistics, K. Aijmer and B. Altenberg (eds), 305–314. London: Longman.
Johns, T. 1991. “Should you be persuaded: two examples of data-driven learning.” ELR Journal (New Series) 4:1–16.


Johns, T. 1994. “From printout to handout: grammar and vocabulary teaching in the context of data-driven learning.” In Perspectives on Pedagogical Grammar, T. Odlin (ed.), 293–313. Cambridge: Cambridge University Press.
Kachru, B. 1990. The Alchemy of English: The Spread, Functions and Models of Non-native Englishes. Urbana: University of Illinois Press.
Kennedy, G. 1998. An Introduction to Corpus Linguistics. London: Longman.
Kjellmer, G. 1990. “Patterns of collocability.” In English Language Corpora: Design, Analysis and Exploitation, J. Aarts and W. Meijs (eds), 163–178. Amsterdam: Rodopi.
Lee, D. 2001. “Genres, registers, text types, domains and styles: clarifying the concepts and navigating a path through the BNC jungle.” Language Learning and Technology 5 (3):37–72.
Leech, G. 1991. “The state of the art in corpus linguistics.” In English Corpus Linguistics, K. Aijmer and B. Altenberg (eds), 8–29. London: Longman.
Leech, G. 1998. “Preface to ‘Learner English on computer.’” In Learner English on Computer, S. Granger (ed.), xiv–xx. London: Addison-Wesley-Longman.
Lorenz, G. 1998. “Overstatement in advanced learners’ writing: Stylistic aspects of adjective intensification.” In Learner English on Computer, S. Granger (ed.), 53–66. London: Addison-Wesley-Longman.
Mparutsa, C., Love, A. and Morrison, A. 1991. “Bringing concord to the ESP classroom.” In Classroom Concordancing, T. Johns and P. King (eds), 115–134. Birmingham: Birmingham University.
McCarthy, M. 1998. Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
McCarthy, M. and Carter, R. 2001. “Size isn’t everything: Spoken English, corpus and the classroom.” TESOL Quarterly 35 (2):337–340.
McCarthy, M. and Carter, R. 2002. “Ten criteria for a spoken grammar.” In New Perspectives on Grammar Teaching in Second Language Classrooms, E. Hinkel and S. Fotos (eds), 51–75. Hillsdale, NJ: Lawrence Erlbaum.
McEnery, T. and Wilson, A. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Meyer, C. 2002. English Corpus Linguistics. Cambridge: Cambridge University Press.
Meyer, C. et al. 2003. “The World Wide Web as Linguistic Corpus.” In Corpus Analysis: Language Structure and Language Use, P. Leistyna and C. Meyer (eds), 241–254. Amsterdam: Rodopi.
Nelson, G., Wallis, S. and Aarts, B. 2002. Exploring Natural Language. Working with the British Component of the International Corpus of English. Amsterdam: John Benjamins.
Nesi, H. 2000. “A corpus-based analysis of academic lectures across disciplines.” In Language across Boundaries, J. Cotterill and A. Ife (eds), 201–218. London and New York: Continuum.
Oakey, D. 2003. “A corpus-based study of the formal and functional variation of a lexical phrase in different academic disciplines in English.” In Using Corpora to Explore Linguistic Variation, R. Reppen, D. Biber and S. Fitzmaurice (eds), 111–129. Amsterdam: John Benjamins.


Ooi, V. 2001. “Investigating and teaching genres using the World Wide Web.” In Small Corpus Studies and ELT, M. Ghadessy, A. Henry and R. Roseberry (eds), 175–203. Amsterdam: John Benjamins.
Ooi, V. 2002. “From Shakespeare to Hungarian EFL writing: using WWW corpora to motivate student learning.” In Corpus Studies in Language Education, M. Tan (ed.), 163–177. Bangkok: IELE Press.
Partington, A. 1998. Patterns and Meanings: Using Corpora for English Language and Teaching. Amsterdam: John Benjamins.
Pearson, J. 2000. “Surfing the Internet: teaching students to choose their texts wisely.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 235–239. Frankfurt am Main: Peter Lang.
Powell, C. and Simpson, R. 2001. “Collaboration between corpus linguists and digital librarians for the MICASE web search interface.” In Corpus Linguistics in North America, R. Simpson and J. Swales (eds), 32–47. Ann Arbor: University of Michigan Press.
Pravec, N. 2002. “Survey of learner corpora.” ICAME Journal 26:8–14.
Renouf, A. 2003. “WebCorp: providing a renewable data source for corpus linguists.” In Extending the scope of corpus-based research: New applications, new challenges, S. Granger and S. Petch-Tyson (eds), 39–58. Amsterdam: Rodopi.
Seidlhofer, B. 2000. “Mind the gap: English as a mother tongue vs. English as lingua franca.” Vienna English Working Papers 9 (1):52–69.
Seidlhofer, B. 2001. “Closing a conceptual gap: The case for a description of English as a lingua franca.” International Journal of Applied Linguistics 11 (2):133–159.
Simpson, R. and Swales, J. (eds) 2001. Corpus Linguistics in North America. Ann Arbor: University of Michigan Press.
Sinclair, J.McH. 1987a. Looking Up: An account of the COBUILD Project. London and Glasgow: Collins ELT.
Sinclair, J.McH. (ed.-in-chief) 1987b. Collins COBUILD English Language Dictionary. London: HarperCollins.
Sinclair, J.McH. 2001. “Preface to small corpus studies and ELT.” In Small Corpus Studies and ELT, M. Ghadessy, A. Henry and R. Roseberry (eds), vii–xv. Amsterdam: John Benjamins.
Stubbs, M. 2001. Words and Phrases: corpus studies of lexical semantics. Oxford: Blackwell.
Swales, J. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Swales, J. 1996. “Occluded genres in the academy.” In Academic Writing, E. Ventola and A. Mauranen (eds), 45–58. Amsterdam: John Benjamins.
Swales, J. 2002. “Integrated and fragmented worlds: EAP materials and corpus linguistics.” In Academic Discourse, J. Flowerdew (ed.), 150–164. London: Longman.
Taylor, L., Leech, G. and Fligelstone, S. 1991. “A survey of English machine readable corpora.” In English Computer Corpora, S. Johansson and A-B. Stenström (eds), 319–354. Berlin: Mouton de Gruyter.
Thompson, P. 2000. “Citation practices in PhD theses.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 91–101. Frankfurt am Main: Peter Lang.


Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins.
Tribble, C. 2000. “Genres, keywords, teaching: towards a pedagogic account of the language of project proposals.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 75–90. Frankfurt am Main: Peter Lang.
Tribble, C. 2002. “Corpora and corpus analysis: New windows on academic writing.” In Academic Discourse, J. Flowerdew (ed.), 131–149. London: Longman.
Upton, T. 2002. “Understanding direct mail letters as a genre.” International Journal of Corpus Linguistics 7 (1):65–85.
Upton, T. and Connor, U. 2001. “Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre.” English for Specific Purposes 20 (4):313–329.
Virtanen, T. 1998. “Direct questions in argumentative student writing.” In Learner English on Computer, S. Granger (ed.), 94–106. London: Addison-Wesley-Longman.
Widdowson, H.G. 1998. “Context, community and authentic language.” TESOL Quarterly 32 (4):705–716.
Widdowson, H.G. 2002. “Corpora and language teaching tomorrow.” Keynote lecture delivered at 5th Teaching and Language Corpora Conference, Bertinoro, Italy, 29 July.
Williams, G. 2002. “In search of representativity in specialised corpora.” International Journal of Corpus Linguistics 7 (1):43–64.


Rita C. Simpson

Formulaic expressions in academic speech

Section II


Stylistic features of academic speech: The role of formulaic expressions

Rita C. Simpson
The University of Michigan

Introduction e study of formulaic language has quite a rich and growing tradition in the literature of several sub-fields of linguistics, including language teaching, second language acquisition, psycholinguistics, and corpus linguistics. Pawley and Syder (1983) and Nattinger and DeCarrico (1992) were two important early pieces of work that set the stage for this line of research, and Wray’s more recent work (1999, 2000, 2002) has done much to advance the field. From a corpus-based standpoint, Biber et al. in the 1999 Longman Grammar of Spoken and Written English discuss what they call lexical bundles in some detail, comparing academic prose to general conversation, and additional work in that tradition has been done more recently by Cortes et al. (2002) and Cortes (2002). Altenberg (1998) also conducted a corpus-based study of recurrent word combinations in the London-Lund Corpus of spoken English. Schmitt and Carter (2004) discuss a number of issues related to defining and researching formulaic language, especially as these relate to the acquisition and processing of such expressions. A recurring theme throughout the literature on formulaic language is that an ability to understand and use formulaic language appropriately is a key to native-like fluency. In fact, according to Fernando (1996:234), “No translator or language teacher can afford to ignore idioms or idiomaticity if a natural use of the target language is an aim.” However, Pawley and Syder (1983) make a strong case for the daunting nature of the task learners face in figuring out which grammatically possible utterances are commonly used by native speak-

38

Rita C. Simpson

ers – that is, which are idiomatic – and which utterances, though grammatically possible, are not native-like. Wray (1999) supports this claim, adding that the absence of formulaic sequences in learners’ speech results in unidiomaticsounding speech. Nattinger and DeCarrico (1992) take this argument a step further and present a typology and pragmatic analysis of what they call lexical phrases, along with a number of suggestions for incorporating them into the L2 curriculum. A few other general findings from the previous research on formulaic language are relevant to the current study. First, several researchers note that conversation is oen regarded as extremely formulaic, making use of a large number of recurring expressions. Biber et al. (1999) found that general conversation contains a higher proportion of formulaic language than academic prose. Nattinger and DeCarrico (1992) note that academic lectures are full of formulaic phrases that function as important directional signals. ey also claim that a great many of the lexical phrases that occur in transactional discourse (such as lectures) do not occur in interactional discourse (such as casual conversation) – a claim which necessitates corpus-based research such as the present study to confirm. McCarthy and Carter (in press) discuss high-frequency clusters in conversation as critical units of interaction, and argue that it is in pragmatic categories that we are likely to find the reasons why so many strings of words are recurrent. All the research so far on academic speech indicates that it is full of various signposting and metalinguistic markers. 
In fact, a growing number of studies on the Michigan Corpus of Academic Spoken English (MICASE) relate in some way to the topic of formulaic language: namely, Swales and Malczewski (2001) on “New Episode Flags” like ‘okay, so now’; Mauranen’s studies of metalanguage (2001) and formulae (2003); Poos and Simpson’s (2002) study of the hedges ‘kind of’ and ‘sort of’; and Swales’ (2001) article on ‘point’ and ‘thing’, in which he mentions phrases like ‘at this point’ and ‘the thing is’. Other researchers have studied formulaic language in academic writing (Cortes et al. 2002; Cortes 2002; Oakey 2002), and have found that academic writing is rich in discourse structuring and stance expressions, some of which overlap with other spoken and written registers, and others of which seem particularly characteristic of academic prose.

The present study analyzes the role of high frequency formulaic sequences in academic speech, specifically in MICASE (Simpson et al. 2002). First, approximately 200 expressions are identified on the basis of both frequency in the corpus and structural coherence as phrasal units. These formulaic expressions fall primarily into the category of what Wray and Perkins (2000) identify as discourse structuring sequences, which they argue aid both comprehension and production. Because academic discourse is a complex and sometimes technical register, it is especially rich in such discourse structuring devices, such as expressions used for focus, summarizing or paraphrasing, sequencing the discourse, or referring to previous or upcoming portions of the talk.

The quantitative analysis begins by presenting some cross-corpus frequency comparisons of the formulaic expressions, followed by comparisons across speaker categories and situations of use. The cross-corpus comparisons are used to discover which high-frequency formulaic expressions are shared by different spoken genres, and which are more typical of academic speech. Because some of these phrases constitute an important component of the repertoire of academic speech, we would expect them to be more frequent in the speech of proficient academic speakers (i.e. professors) as compared to the speech of students, who are less experienced in this speaking style. The comparisons across speaker categories thus explore one aspect of the larger question of how spoken academic registers develop. Finally, I examine the meanings and functions of a few very characteristic academic formulas from a pragmatic perspective, and discuss implications of the findings for the teaching of English for Academic Purposes (EAP).

The motivation for this research stems from two parallel interests. First, it represents an extension of earlier research on the frequencies, forms, and functions of idioms in academic speech (Simpson and Mendis 2003). In that research, we adopted a very narrow definition of idioms, concentrating on relatively low-frequency, opaque, metaphorically rich expressions such as ‘ring a bell’ and ‘a dime a dozen’.
But a number of high-frequency phrases, such as ‘in other words’, ‘and so on’, and ‘more or less’, came to my attention at that time as equally idiomatic, but somehow representative of a different lexical category. A related research interest motivating the present study is whether and how we can identify lexical chunks or formulae that typify an academic speaking style. This study thus attempts to find out if there is a set of stock phrases that academics across the spectrum tend to rely on in their speech. So this research builds on the previous work on idioms, and extends that research by attempting to describe and characterize one aspect of style in academic speech as it relates to idiomaticity in a broad sense.


Methodology The corpus is research is based on the MICASE, a spoken language corpus of approximately 1.7 million words (200 hours) of contemporary university speech recorded at the University of Michigan between 1997 and 2001. Speakers represented in the corpus include faculty, staff, and all levels of students, and include both native and non-native speakers (comprising 88% and 12% of the overall speech, respectively). e data collection for the corpus involved recording entire speech events sampled across student levels and academic divisions, including a variety of non-classroom academic speech events as well as the more traditional academic speech genres such as lectures, seminars, and class discussions. e impetus for the development of MICASE came from the teaching and testing divisions of the University’s English Language Institute, which envisioned such a corpus primarily as a resource for the development of materials for the speaking and listening components of their EAP courses and tests. MICASE is one of the few publicly available, web-searchable corpora; the online interface, which permits browsing or searching the corpus according to a number of speaker and speech event variables, can be found at http: //www.hti.umich.edu/m/micase.

The comparison corpora

As mentioned above, the quantitative part of this study begins with a comparison of the frequencies of formulaic expressions across several different corpora of speech. Three corpora were chosen for comparison purposes: the Corpus of Spoken Professional American English (CSPAE), the Bank of English National Public Radio subcorpus (NPR), and the Switchboard Corpus (SWB). These corpora were chosen because they were the only sizable corpora of spontaneous spoken American English available at the time of the study. Of these three corpora, the one that is most similar to MICASE in terms of speech genre is CSPAE. This is a two-million-word corpus consisting of one million words of speech from White House press conferences and one million words of faculty committee meetings. The NPR corpus consists of over three million words of news radio broadcasts from National Public Radio. Switchboard is a corpus of casual phone conversations, approximately 30 minutes each, recorded between strangers who were recruited specifically for the purpose of constructing the corpus and were given suggested topics of conversation. As it is the only corpus containing casual conversation, it is important for comparative purposes; but since it represents an unusual, contrived situation, it is less than ideal as an example of naturally occurring speech.

Analytical procedures e methods of analysis used in this study are firmly grounded in a corpusbased approach. is approach involves, first of all, a text analysis program that can generate frequency statistics for sequences of words in the corpus, and secondly, a concordance program that shows all the occurrences of a particular phrase in its surrounding context. Using these methods allows for a detailed comparison of different genres based on quantitative evidence, and also permits more in-depth qualitative analysis of certain items chosen on the basis of those quantitative findings. is approach is thus ideally suited for the analysis of different language genres, both from a cross-corpus and a corpus-internal comparative perspective. A concordance program is simply a tool that allows the analyst to examine a relatively large number of examples of a given lexical or grammatical item in context, and thus it fits in well with a more traditional qualitative discourse analytic approach. Ultimately, the most revealing insights into professional discourse – or any particular language genre – will be gained from a closer look at the texts, the speakers, and the situational variables; quantitative analysis alone can never provide a satisfactory picture, especially when one of the goals of the research is to make the findings applicable to language teaching. e first step in the analysis was to select the high frequency expressions in MICASE to be investigated. Once this list was compiled, the next step was to compare the frequencies in MICASE against each of the three comparison corpora. en two different within-corpus comparisons were conducted – a comparison between the interactive and monologic subsets of MICASE, and comparisons between the speech of professors versus students. 
In addition, several of the expressions that were found to be of interest on the basis of the quantitative comparisons were also examined in context, to discover and illustrate the pragmatic functions these expressions are fulfilling in spoken academic discourse.

The units of analysis for this study were frequently occurring expressions of three, four, or five words, which I refer to as high frequency formulaic expressions. The minimum frequency used as a cutoff point was 20 tokens per million words (or 34 total tokens in MICASE). Although using three-, four-, and five-word phrases and 20 tokens per million as selection criteria was based on previous research, this remains somewhat arbitrary. That is to say, there is not enough research on these expressions to provide any independent confirmation of a minimum frequency level at which such recurring strings of words become formulaic or perceptually salient, and different researchers use different cutoff levels: Cortes (2002) uses 20 occurrences per million, Biber et al. (1999) use 10 per million, and McCarthy and Carter (in press) use only 4 per million, because they wanted to include 6-word expressions in their analysis, and these are much rarer than shorter strings. The cutoff point is in fact determined more by the practical limitations and goals of the research; a cutoff point of twenty per million provides sufficient numbers of recurring expressions, but not so many that comprehensive research becomes unwieldy. The list of all three-, four-, and five-word sequences occurring at least 20 times per million in MICASE was generated using the WordList program of the WordSmith Tools concordancing program.

In addition to this minimum frequency level as a basis for selecting which formulaic expressions to analyze, I applied the notions of structural and idiomatic coherence to further narrow the set of expressions investigated. Structural coherence refers to the syntactic composition of the word string; idiomatic coherence is essentially an intuitive notion (see Wray 2002 for an argument in favor of reinstating native-speaker intuition in research on formulaicity). So only strings that constitute complete syntactic units or sentence stems, or that intuitively look, sound, and feel like idiomatically independent expressions, were included in the set.
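
The frequency-based first pass described above (every three-, four-, and five-word sequence at or above 20 tokens per million) can be illustrated with a short script. This is a minimal sketch under stated assumptions, not a reconstruction of WordSmith's WordList: the tokenizer, the function names and the toy sentence are hypothetical, and the sketch counts sequences across speaker turns and sentence boundaries, which a real analysis would probably want to prevent.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words, including
    internal apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def ngram_counts(tokens: list[str], sizes=(3, 4, 5)) -> Counter:
    """Count every contiguous word sequence of the given lengths."""
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def min_tokens(per_million: float, corpus_tokens: int) -> int:
    """Raw-count equivalent of a per-million cutoff for a given corpus size."""
    return round(per_million * corpus_tokens / 1_000_000)


def frequent_ngrams(tokens: list[str], per_million: float = 20.0,
                    sizes=(3, 4, 5)) -> dict[str, int]:
    """Keep only sequences whose count meets the per-million cutoff."""
    threshold = per_million * len(tokens) / 1_000_000
    return {gram: count
            for gram, count in ngram_counts(tokens, sizes).items()
            if count >= threshold}


# 20 per million in a 1.7-million-word corpus corresponds to 34 raw tokens.
print(min_tokens(20, 1_700_000))  # 34

# Toy illustration only; the cutoff is inflated because the sample is tiny.
sample = tokenize("in other words we can say that in other words it works")
print(frequent_ngrams(sample, per_million=150_000))
```

The output of such a pass is only a candidate list; the structural and idiomatic coherence criteria still have to be applied by hand.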
Examples of syntactically complete units include prepositional phrases (at the end, in the past), noun phrases (a lot of people, the first thing, something like that), verb phrases (to make sure, look at this), or entire clauses (I can’t remember, does that make sense). Examples of sentence stems include: I think that, I don’t have, and do you know. And examples of idiomatically independent expressions include discourse marker strings such as well you know, or focusing expressions such as the thing is, or it turns out (that). These selection criteria differ from the definition of lexical bundle discussed in Biber et al. (1999) and Cortes (2002), which includes any recurring multi-word sequence, regardless of coherence as a single unit. So, in my selection process, I included the phrases and in fact, in terms of, and you know what I mean, but

Formulaic expressions in academic speech


not and in fact you, in terms of the, or you know what I. Other examples of strings that are incomplete or span two syntactic units and thus were omitted include: one of the, this is a, look at the, what do you, end of the, of the things, that you have, in the same, part of the, some of the. Although this decision to eliminate certain recurring word strings from the analysis implies and stems from a view of formulaic expressions as lexical chunks that have some coherence and salience in the language as units, it will be evident at the end of this paper that this decision is not only a theoretical and methodological one, but is also pedagogically motivated.

The entire list of three-, four-, and five-word strings in MICASE occurring at least 20 times per million words, or a total of at least 34 tokens in the entire corpus, included almost 1800 expressions (1611, 157, and 11 three-, four- and five-word strings, respectively), but of these, only 224 expressions were classified as structurally coherent or idiomatically complete.1 Using the list of 224 expressions as a starting point, I then looked up the frequencies of each expression in the three comparison corpora described above. From these comparisons, I derived another shorter list of expressions that occurred significantly more often in MICASE than in at least two of the other three corpora. I used a log-likelihood estimate (hereafter, LL), a statistical measure useful for comparing frequencies of lexical or grammatical items in language corpora, as the basis for deciding whether or not an observed frequency difference was significant (cf. Rayson and Garside 2000; Oakes 1998). Similarly, I then compared the frequencies of each of the expressions from the main list in the monologic and interactive subcorpora of MICASE. Again, expressions that were significantly more frequent in either subcorpus were identified using the LL statistic.
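The LL comparison can be sketched as follows, using the standard two-corpus log-likelihood formula: expected counts are derived from the pooled frequency, and the statistic is twice the sum of observed × ln(observed / expected). This is a minimal sketch rather than the exact computation behind the results reported here. The function names are hypothetical, the MICASE size of about 1.7 million words is an assumption, and the comparison-corpus count and size in the example are invented for illustration. The threshold 6.63 is the chi-square critical value for one degree of freedom at p < .01.

```python
import math

CRITICAL_P01 = 6.63  # chi-square critical value, 1 degree of freedom, p < .01


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood statistic for one item's frequency in two corpora."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def significantly_more_frequent(count_a, size_a, count_b, size_b):
    """True if the item is relatively more frequent in corpus A and LL passes."""
    higher_rate = count_a / size_a > count_b / size_b
    passes = log_likelihood(count_a, size_a, count_b, size_b) >= CRITICAL_P01
    return higher_rate and passes


# 368 tokens in MICASE (size assumed) against a hypothetical comparison
# corpus with 82 tokens in 2,000,000 words.
print(significantly_more_frequent(368, 1_700_000, 82, 2_000_000))
```

Because the statistic itself does not indicate which corpus has the higher rate, the sketch checks the direction of the difference separately.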
To compare the speech of students and professors, I used the MICASE online search facility, which permits searches according to speaker characteristics. However, because the speech of students overall in MICASE is predominantly interactive, whereas the professors’ speech is both monologic and interactive, comparisons using the entire corpus largely reflect this contextual variable. Therefore, I again narrowed the subcorpora, comparing monologic speech by students (i.e. student presentations) with monologic speech by professors. The final part of the analysis involved identifying, on the basis of the above comparisons, a few expressions that appeared to be quintessential academic formulae, and examining them in context from a pragmatic perspective. Starting from a set of expanded concordance lines, the expressions were analyzed


in context to determine their pragmatic functions. In other words, once certain phrases have been identified as being typical of academic speech, the final question of this research is: what functions are those expressions performing in the discourse that make them such prevalent and valuable lexical items in the repertoire of academic speakers?

Results

Cross-corpus comparative frequencies

As already stated, the first step in this research was to find out which expressions occur most frequently in MICASE, and of these, which expressions are more frequent than in the three comparison corpora. Table 1 shows the 20 most frequent three-, four-, and five-word expressions found in MICASE using the criteria discussed above. A number of these expressions, however, are also very frequent in other spoken corpora. For example, five of these top 20 expressions (and I think, but I think, I think that, I don’t think, I don’t have) are significantly more frequent in both CSPAE and in SWB; I don’t know, a little bit, and something like that are more frequent in SWB; and in terms of is more frequent only in CSPAE. So, in order to find out which expressions are typical of academic speech in particular, and not just characteristic high frequency expressions in any speech genre, I first looked for the expressions that were significantly more frequent in MICASE than in all three of the comparison corpora, using the LL statistic, with a significance level of p = .01. A total of 54 of the 224 expressions fell into this category, and these are listed in Tables 2 and 3, along with the comparative frequencies in each of the other three corpora for the 20 most frequent expressions; only four of these (shown with an asterisk in Table 1) also appear in the top 20 overall. All the expressions in Table 3 occur in at least 15 different transcripts, and most of them (35 of the 54) occur in 30 or more transcripts, so on the whole these expressions are fairly widely distributed throughout the corpus. However, some of them (listed in the table in parentheses) had a highly skewed distribution, with almost half or more of the tokens occurring in one speech event.
The expressions in this category are: the next one (33 of 78 tokens from a colloquium in which the speaker was referring frequently to her slides); in your mind and point of view (18 of 34, and 66 of 133 tokens, respectively, from a philosophy


Table 1. 20 most frequent 3-, 4-, and 5-word formulaic expressions in MICASE

Expression            Total tokens   Frequency/million
I don’t know          1519           882
a little bit          669            389
in terms of           550            319
I don’t think         503            292
I think that          482            280
you can see*          368            214
and I think           328            191
do you think          258            150
I don’t know if       256            149
the same thing        235            137
in other words        229            133
at the end            229            133
something like that   220            128
and so on*            216            125
do you know           212            123
what I mean*          194            113
I don’t have          179            103
the same time         173            101
but I think           173            101
in this case*         165            96

* = expressions comparatively more frequent in MICASE, as shown in Table 2.

Table 2. Top 20 expressions significantly more frequent in MICASE than in all 3 comparison corpora

                       Frequency/million
Expression             MICASE   CSPAE   NPR   SWB
you can see            214      41      26    36
and so on              125      59      28    23
what I mean            113      9       4     49
in this case           96       34      34    8
I was like             85       1       3     31
look at it             85       63      7     52
you don’t know         84       31      11    43
so you have            82       19      7     35
point of view          77       55      28    24
you know what I mean   75       2       1     33
all of these           71       38      24    12
the first one          67       33      8     32
so we have             65       36      5     35
what I’m saying        64       23      4     27
look at this           60       32      8     6
and in fact            59       19      17    24
in the book            59       7       12    3
it doesn’t matter      57       5       5     26
do you see             47       24      9     14


Table 3. List of all 54 expressions more frequent in MICASE than in all 3 comparison corpora (using Log-likelihood statistic, at p = .01)

all of these           I’m gonna (going to) go   look at this          what do you mean
and in fact            in a minute               oh my god             what does that mean
and so on              (in some sense)           on the bottom         what happens if
and you can see        in the book               on the left           what happens is
as you can see         in the class              on the right          what I mean
both of these          (in the environment)      point of view         what I’m saying
do you mean            in the same way           so that’s why         why don’t you
do you see             in this case              so we have            you can see
does that make sense   in this class             so you get            you could say
exactly the same       (in your mind)            so you have           you don’t know
how are you            it doesn’t matter         something like this   (you know what I mean)
how do you know        it turns out              the first one         you know what I’m saying
I was like             it turns out that         (the next one)
I’ll show you          look at it                the reason why

seminar); in the environment (24 of 34 tokens from an ecology class); in some sense (26 of 47 tokens from an office hour session); and you know what I mean (56 of 129 tokens from a peer-led study group). Thus, although these expressions occur in a number of transcripts in the corpus, their high frequency is a result either of one speaker’s idiolect (i.e. in some sense and you know what I mean), or of the content of one particular speech event, and therefore they cannot necessarily be considered representative academic formulae. In addition to the expressions that occurred more frequently in MICASE than in all three comparison corpora, another 85 expressions were identified as being more frequent compared to at least two of the comparison corpora. The majority of these were equally or more frequent in CSPAE, the corpus most similar to MICASE, and some of the typically interactive formulae were more frequent in SWB; none of the expressions were more frequent in the NPR corpus. For reasons of space, this list is not included in its entirety here; however, some of the expressions are included in Table 5 below, in the section on pragmatic categories.
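The skew in distribution noted for the parenthesized expressions can be made concrete by computing the share of each expression’s tokens that comes from its single most productive speech event. The sketch below uses the token counts reported in the text; the function names and the 0.40 threshold (standing in for "almost half or more") are assumptions introduced for illustration, not criteria stated in the study.

```python
# (tokens in the single most productive speech event, total tokens),
# as reported in the text.
REPORTED = {
    "the next one": (33, 78),
    "in your mind": (18, 34),
    "point of view": (66, 133),
    "in the environment": (24, 34),
    "in some sense": (26, 47),
    "you know what I mean": (56, 129),
}


def single_event_share(top_event_tokens, total_tokens):
    """Proportion of an expression's tokens found in one speech event."""
    return top_event_tokens / total_tokens


def skewed(reported, threshold=0.40):
    """Expressions whose top speech event holds at least `threshold` of tokens."""
    flagged = {}
    for expression, (top, total) in reported.items():
        share = single_event_share(top, total)
        if share >= threshold:  # threshold is an assumed, illustrative value
            flagged[expression] = round(share, 2)
    return flagged


for expression, share in skewed(REPORTED).items():
    print(f"{expression}: {share:.0%} of tokens from one speech event")
```

A measure like this complements the count of transcripts in which an expression occurs: an expression can appear in many transcripts and still owe its overall frequency to one speaker or one event.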


Corpus-internal comparative frequencies

The next set of results shows the distribution of expressions across the monologic and interactive subsets of MICASE. These subcorpora consist of approximately 700,000 words each; the designation as monologic or interactive is based on a categorization scheme devised when the corpus was built.2 Table 4 shows those expressions (from the list of 54 shown in Table 3) that are significantly more frequent in either the monologic or interactive subsets, as well as those expressions that occur equally in both subsets. It is interesting, though not surprising based on existing research findings, that when contrasting the monologic and interactive subcorpora, there are more than twice as many expressions that are more frequent in the interactive speech events than in the monologic ones. A total of more than twenty expressions are not significantly more frequent in either subcorpus; in fact, it is this set of expressions that could be considered the most distilled core of expressions that typify academic speech as represented by MICASE, since they are relatively evenly distributed throughout the corpus. The final set of quantitative comparisons involved comparing formulaic expressions in the speech of students and professors. For this part of the analysis, I did not rely only on the short list generated from the cross-corpus comparisons, but rather started from the master list of 224 expressions. First, I compared the professors’ and students’ use of the expressions in the entire corpus. These results are shown in Figures 1 and 2. It is indeed noteworthy that the most frequent differentiating expression for professors is the phrase and so on, and the second most frequent corresponding expression for the students is something like that: phrases that carry very similar meanings relating to the expression of vagueness.
While professors also use something like that, just not as frequently as the students, students rarely use and so on, which is one of the highest frequency phrases in the study; it occurs in 50 transcripts, with a total of 216 tokens (or 127 per million). As is obvious from these figures, many of the students’ high-frequency expressions have something to do with interactivity explicitly, as in I have a question, does that make sense, I don’t remember, or you know what I’m saying. This is because it is primarily in the interactive segments of academic speech that students get to talk. Several of the expressions also clearly reflect the students’ role as inquirers or receivers of knowledge rather than suppliers of knowledge; examples include I don’t understand, what do you mean, I have


Table 4. Comparisons of monologic and interactive sub-corpora

Expressions more frequent in Monologic (718,000 words):
do you see, does that make sense, exactly the same, in the environment, in this class, in your mind, something like this, the reason why, what do you mean, why don’t you

Expressions more frequent in Interactive (704,000 words):
how are you, I was like, I’ll show you, in a minute, in some sense, in the book, in the class, in this case, it doesn’t matter, look at this, on the left, point of view, so that’s why, so you have, the next one, what happens is, what I mean, what I’m saying, you can see, you know what I mean, you know what I’m saying

Expressions not significantly more frequent in either subcorpus:
all of these, and in fact, and so on, and you can see, as you can see, both of these, do you mean, how do you know, I’m gonna go, in the same way, it turns out (that), look at it, oh my God, on the bottom, on the right, so we have, so you get, the first one, what does that mean, what happens if, you could say, you don’t know

Figure 1. Expressions favored by professors, entire corpus

Figure 2. Expressions favored by students, entire corpus

no idea, I was wondering, and that makes sense. In order to compare students’ and professors’ use of formulaic expressions in more contextually similar situations of use, I further narrowed the subcorpus to only monologic speech events; this included mostly lectures for the professors, and student presentations for the students. These speech events provide data in which the students are also transmitting information in larger segments, so the question is how many of the same formulaic sequences favored by professors in lecture discourse would show up for the students in their monologic speech samples. These resulting subcorpora were rather small (397,000 words for the professors, and 104,000 words for the students), so the results here are tentative, especially for the students, but Figures 3 and 4 illustrate the comparisons for expressions favored by students versus professors in monologic speech. The expression I was like, with 146 tokens overall, and 141 of them from students, is the most common expression differentiating students from professors, and deserves mention here. Although this phrase may seem like the antithesis of an academic expression, because there are a significant number of students represented in the corpus (44% of the total speech, in fact) and this expression is strongly associated with speakers in those age ranges, it is not surprising that it shows up as differentiating MICASE from the other three spoken corpora, none of which have significant numbers of teenagers or young adults. Thus, if academic speech is defined, as it has been for the purposes of

Figure 3. Expressions favored by professors, monologic speech

Figure 4. Expressions favored by students, monologic speech

MICASE research, as speech produced in a university environment in speech events that are in some way related to the main educational mission of the university, then this phrase is indeed part of the repertoire of academic speech, as reflected by students. It is, however, obviously not a significant part of the repertoire of professors, the most prototypical academic speakers.

Pragmatic analysis

Overview of functional categories

In this section, I consider the most prevalent pragmatic functions represented by the high frequency academic expressions, and then examine several of the expressions more closely, to discover exactly how they are being used by speakers in academic contexts. The formulaic expressions in MICASE fall into a wide range of functional categories, including a number of discourse organizing categories, such as summarizing, temporal sequencing, focusing, and meta-discourse, as well as categories related to interactional functions, such as explaining or demonstrating, giving instructions or advice, eliciting feedback, and asking questions. Table 5 shows the most salient pragmatic categories represented by these high frequency formulaic expressions that differentiate academic speech from other spoken genres. These functional categories are broken down into two major subsets (discourse organizing functions and interactional functions) plus a miscellaneous category. Although many of the expressions do not fit neatly into one category, and some of them are multi-functional, several groups of expressions do fit into specific and indeed predictable functional categories, given the contextual features of most academic speech. Expressions falling into the discourse organizing functions represent the largest number of formulaic expressions in academic speech, but quite a few expressions fall into the interactional categories as well. This reflects both the composition of the corpus (since it includes not only highly informational lectures, but also a significant proportion of interactive speech events) and also the fact that even though lectures are generally monologic and not explicitly interactive, speakers often use various linguistic strategies to make the discourse more interactive. Discourse organizing expressions that are crucial for presenting and organizing effective classroom discourse include focusers and


introducers, meta-discourse phrases, enumerators and temporal sequencers, markers of contrast and comparison, cause-effect markers, and summarizers. Under the interactional functions, expressions for explaining and demonstrating, giving advice and instructions, as well as numerous question formulae are all widespread in the corpus. Perhaps the most ubiquitous functional category apart from question stems is a rather unsurprising set of expressions — the spatial organizers and locatives. While obviously important in terms of their propositional content, these expressions are not as pragmatically multifaceted, since they simply reflect the most common, immediate physical centers of attention such as books, classes, and the university, as well as the necessity in academic discourse for frequent deictic spatial references.

Analysis of selected expressions in context

In this section I turn to a small selection of phrases for a more detailed analysis of their contextual environments in order to further elucidate their functions in academic speech. These expressions were chosen on the basis of the results of the quantitative analysis, as well as the range and salience of the functions they seem to be performing. All but one of the following expressions were significantly more frequent in MICASE than in all three of the comparison corpora, and were equally distributed between the monologic and interactive subcorpora.

I’m gonna (going to) go e first expression in this section is one that initially seemed unlikely to appear on a list of academic formulaic expressions; it is not obvious at first glance why I’m gonna go would be comparatively more frequent in academic speech. However, a look at the 50 examples from MICASE shows that nearly half of the uses of this expression have to do with discourse or task management, as in the expressions I’m gonna go over/through/into (something), meaning to discuss or present something in the class. e examples below illustrate this use: (1) I’m gonna go through and give some examples (2) I’m gonna go through this pretty fast (3) if I have time I’m gonna go over question three and five from the problem set.


Table 5. Selected pragmatic functions (bold = significantly different in all 3 comparison corpora; italics = significantly different in 2 comparison corpora)

PRIMARILY DISCOURSE ORGANIZING FUNCTIONS
Focusers, introducing examples: in terms of, the problem is, the thing is, and in fact, the point is, with respect to, the question is, what happens is
Meta-discourse expressions: like I said, I was like, we’re talking about, when we talk, as I said, what you’re saying, what I’m saying, you could say
Enumerators, temporal sequencers: first of all, the first one, in a minute, at the beginning, the next one, to begin with
Contrast and comparison, linking: (at) the same time, the same thing, exactly the same, not the same, the same way, in the same way
Cause-effect markers: and that’s why, so that’s why, the reason why
Summarizers: in other words, it turns out (that)

PRIMARILY INTERACTIONAL FUNCTIONS
Questions, sentence stems: I have a question, do you mean, how do you know, why don’t you, do you see, what do you mean, any other questions, does that make sense, what does that mean, do you want, how are you, what happens if, what do you think
Explaining/demonstrating: and you can see, so we have, so you get, you could say, as you can see, so you have, you can see, I’ll show you
Commands, instructions, advice: take a look, look at it, to make sure, keep in mind, look at this, look at that

MISCELLANEOUS FUNCTIONS
Spatial organizers, locatives: in the book, on the left, at the end, in the/this class, on the right, at the university, on the bottom, on the web, on top of
Hedges, mitigators: in a sense, in some sense, in some ways, in a way, more or less
Generalizers, vagueness markers: (and) stuff like that, and so on, sort of like, (or) something like that, something like this, it seems like


(4) I’m gonna go a little bit into theory

Other similar uses have more to do with task management or the immediate sequencing of the unfolding discourse, as in examples (5) through (8):

(5) I’m gonna go to roman numeral twenty-eight
(6) I’m going to go back to your chart that you have in your handout
(7) I’m gonna go back and say something I forgot to say
(8) I’m gonna go ahead and put the verb for the quiz on the board

In these uses (as opposed to more literal, locative uses, such as I’m gonna go to Paris) this phrase forms the core of a meta-discursive expression, alerting the listeners to what is about to follow. Although the locative uses, as well as those describing a physical action to follow (e.g. I’m gonna go type it, I’m gonna go have some hot tea) occur in the corpus, the explanation for the comparatively high frequency of this expression in academic speech lies in its meta-discursive uses.

you could say

Another phrase in the category of meta-discourse is the expression you could say. Like I’m gonna go, you could say is equally represented in both the monologic and interactive subcorpora, and is used in a variety of ways. First, it can be used to introduce a possible explanation or answer that gets refuted or qualified:

(9) it seems to me that about eight out of ten movies that come out are centered in Los Angeles. now you could say oh that’s because it’s easier to film there and they don’t have to travel and it’s cheaper. but part of it as well is that...
(10) and you could say, that that rosebush was pretty hardy, to be able to endure all of this stuff, and if you said that, you’d probably be right in some vernacular sense, but you wouldn’t be right as a gardener, unless what the rosebush really had done, was survive the cold winter.
(11) in some way you could say she was really acting out of principle. on the other hand...

A related function is that of offering a hedged explanation; in both of these uses, the phrase, together with another hedge or downtoner, adds a tentative, mitigated tone to the statement.


(12) well, no you could say a reasonable strategy is to stop bidding on your inputs, maybe
(13) so he’s been, in some ways you could say he’s been institutionalized since he was three months old

The phrase is also used as a polite way to offer advice or a suggestion, as in:

(14) I think you could make it, somewhat broader than that. I think you could say that after fourteen, twenty-five or something like that…
(15) well you could say inactivity at one auction instead of quiescence

And, finally, it can be used to introduce a hypothetical example:

(16) so for example, uh, you could say if we’re just sitting here I mean the earth’s going around the sun, so there’s...
(17) for example you could say select children

Each of these pragmatic functions (introducing an explanation to be qualified or refuted, introducing a hypothetical example, hedging, and offering advice) is a crucial and characteristic function of academic discourse. Thus, when the expression is analyzed in context with the help of a concordancer, it becomes obvious why it figures so prominently in academic speech.
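The kind of concordance view that such an analysis starts from can be approximated with a simple keyword-in-context (KWIC) routine. The sketch below is an assumption-laden illustration, not the concordancer used in this study: the function name, the character-based context window, and the sample text (stitched together from examples 13 and 17) are all hypothetical.

```python
import re


def kwic(text, phrase, width=40):
    """Return (left context, match, right context) for each occurrence."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


# Sample text assembled from two of the examples above (illustrative only).
sample = (
    "so he's been, in some ways you could say he's been institutionalized "
    "since he was three months old. for example you could say select children"
)

for left, hit, right in kwic(sample, "you could say"):
    print(f"{left:>40} [{hit}] {right}")
```

Aligning the search phrase in a central column makes recurring left and right collocates, such as in some ways or for example, easy to spot by eye, which is the starting point for sorting occurrences into pragmatic functions.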

(it) turns out (that)

Although the specific formulaic sequences that occurred most frequently are it turns out and it turns out that, the phrase also occurs as that turns out, and turns out, as well as turned out. This phrase is a highly idiomatic expression for summarizing or concluding a segment of discourse, or revealing an end result, with several predictable contexts of use. First, it is used to announce results in mathematical or statistical computations:

(18) It turns out that the average is the same.
(19) It turns out that the S values are not very reliable.
(20) that turns out to be quite a lot of money
(21) if your p-value turns out to be so small that you’d reject

It is also used to present other types of findings or discoveries, sometimes (examples 25–27) with a particularly evaluative connotation to the finding:

(22) turns out that leaves with teeth on them are very common...
(23) often it turns out that if you ignore ‘em and don’t answer your colleagues and so on for a while they go away,
(24) turns out to be false
(25) effect of T-V violence which seems like a fairly simple thing to study, and turns out it’s very complex,
(26) turns out to be a raging imbecile
(27) which turns out to be very difficult to do

A closely related usage of this phrase that is also common in ordinary conversation, and thus perhaps more familiar, is that of introducing a resolution or continuation of a narrative episode:

(28) so he and dad went to the library and they asked the librarian for pictures of old Celtic uniforms the basketball team, and it turns out that the project was he was supposed to find Celtic costumes.
(29) It turns out that the sword is stolen.

The use of the formula turns out and its associated variants with the types of statements shown in examples (22)-(27) above transforms them from mere statements of fact to statements about the discovery of these facts; this phrase thus contributes to a kind of dramatic presentation of the information, enlivening the talk for the listeners, in a rhetorical strategy akin to storytelling. Compare, for example, (22) with the bald statement ‘leaves with teeth on them are very common,’ or (26) with ‘he’s a raging imbecile.’ The use of turns out as a lead-in phrase seems to both personalize the statement and add a sense of anticipation, first of all, by implying that the speaker was somehow involved in discovering or ascertaining the fact that follows, and hinting at some sort of insider knowledge or expertise, and secondly, by lending an element of dramatic suspense to the delivery.

the thing is

Swales (2001:46), in his article on the metadiscursive or discussive uses of the words point and thing in MICASE, mentions the thing is as a common three-word lexical bundle. This expression in its discussive sense functions pragmatically as a focuser, prefacing and drawing attention to the ensuing comment or statement. However, a closer examination of the contexts in which this phrase occurs reveals a more complex pragmatic profile. First, it is often used when negating, contrasting, or qualifying, and simultaneously emphasizing, a crucial point:

(30) so i’m moving with the velocity here. but the thing is i’m not moving with the average velocity, right?
(31) the thing is here we are not doing the T-star version we’re not going further and going through that cuz...

It is also used for explaining a problem, complication, or complex situation:

(32) the thing is that, that you have to make the sculpture so it can be free standing. that’s a kind of a problem. you’ve gotta get it balanced right
(33) the thing is, maximum size is, i- is rather a nebulous thing and it’s rather difficult to determine.
(34) but the thing is (xx) it’s only offered, during the times that physics oneforty is offered

Perhaps the most interesting usage is illustrated by the longer excerpts in (35) and (36), in which this expression is used while arguing a point interactively. Excerpt (35) is an example from a composition class of a student struggling with a small detail about rules of punctuation, in which she questions and challenges the instructor. He in turn responds to her question “Are you sure?” by launching into a slightly more detailed explanation, and prefaces the crux of the argument with the thing is, in order to draw attention not only to the content of the following point, but also to his conviction about the importance and validity of his explanation.

(35) Instructor: uhuh. this stuff goes inside, unless you’ve got a citation to include in your sentence [Student: okay.] this stuff goes outside.
Student: of quotations?
Instructor: right.
Student: always?
Instructor: always.
Student: see that’s totally new to me. are you sure?
Instructor: [LAUGHS] it isn’t actually. [Studs: LAUGH] um, here’s why uh you can, → the thing is if you add a comma here and it’s your comma and not Foucault’s comma, you know you still need the comma so, it’s alright. right? th- y- that’s like it’s sliding. it’s it’s_ technically you’re adding something to Foucault’s text.


Excerpt (36) is especially interesting because it occurs in a very heated debate among students, and speaker S12 uses this expression in an unsuccessful attempt to get the floor. The fact that it doesn’t succeed doesn’t make the phrase any less meaningful; in fact, it shows how the phrase can function on its own as a unit, used as a mechanism for trying to get the floor in order to introduce a crucial piece of information during an argument.

(36) S12: no it no well, no it doesn’t. it really doesn’t it just says, this is what he did. and then it says why he did it. and it lets you draw your own conclusion.
S6: there you go.
S8: consider this. [LAUGHS] like, Bush has tons of things coming_ with the whole Iraq situation hello possible war, and then like the [S3: [LAUGHS] and Cheney might die. ] the you know, [Studs: LAUGH] Cheney, issue and
S6: yeah and he he might die of heat exhaustion so [Studs: LAUGH] (xx)
S8: and the the the economy thing, i think it’s important to know how, the country feels about him right now. more important, than talking about carbon dioxide.
→ S12: but the thing is
S4: but how they feel is gonna change because th- he’s going back on what his whole political campaign like represented (and stuff)
S2: that wasn’t_ his entire platform [S4: but it ] was not based on this

look at it, look at this Finally, the phrases look at it and look at this warrant closer analysis because of the particular ways this expression is used in academic speech. ese two are among the top 20 expressions comparatively more frequent in MICASE (see Table 2), with an overall frequency of 85 tokens per million for look at it, and 60 per million for look at this. ese phrases have several different meanings and pragmatic functions; one is obviously the literal meaning of look, used for example in art classes, science labs, and any speech event where there is a heavy reliance on visuals. We find look at it paired with locatives (on a graph, on the board, on a map, on the web, and under the microscope), and with adverbial or prepositional modifying expressions: carefully, close up, from multiple angles,

Formulaic expressions in academic speech


from the front. Look at it/this is also used to mean ‘read’, as in
(37) do we wanna write it down to look at it later?
(38) (if) you’d like me to take a look at it, just hand it in.

These uses obviously have some additional connotation of considering, evaluating, or reading carefully or critically, as opposed to just reading. Related to this meaning is the use of look at it or look at this to mean ‘examine, study, or investigate’ (examples 39–41), a meaning which is understandably quite common in academic discourse.
(39) we’re gonna take a look at it, we’re gonna investigate it
(40) I’m writing this down so that you can look at it, compare them
(41) I can give you some references, if you wanna look at it some more

A subset of this usage, which comprises the most frequent meaning set for this phrase, is that which implies considering or regarding something from a distinct perspective. Examples 42–47 illustrate a few of these uses. Excerpt (47) is particularly interesting because the phrase is used twice in close succession (once with it and once with this) to present two contrasting ways of considering the issue at hand, and in both instances the phrase is followed by a lengthy, complex clause, rather than a simple prepositional phrase or noun phrase as in the other examples.
(42) let’s look at it another way
(43) it was more important to look at it from a broader perspective
(44) it’s the only way to look at this particular issue
(45) that’s a really interesting way to look at it
(46) (you can) look at it in a very broad sense or you can look at it in a more narrow sense
(47) now again you can look at this as, you know, we live in a world where producers are, conceptualizing women as, dependent on men, not likely to be employed, and so on. remember though you can also look at it as, men’s characters, may be disproportionately shown as being employed

The ability to examine, perceive, consider, and evaluate facts, issues, situations, and so forth from multiple angles, and, crucially, to comment on these views, is a fundamental pragmatic function in academic discourse. Thus, these examples with the phrase look at it/this again illustrate the ways a simple and


Rita C. Simpson

seemingly transparent formulaic expression is used to introduce meanings that lie at the heart of discourse in the academy.

Discussion and implications for teaching

The research for this study began by identifying a list of all three-, four- and five-word formulaic expressions in MICASE occurring above a specified frequency range. Following from that, approximately one-fourth of the expressions from that list were found to be significantly more frequent in MICASE than in three comparison corpora of other speech varieties, and thus particularly characteristic of academic speech. The study has also identified which of those expressions are more likely to occur in monologic academic discourse and which are more likely to be used in interactive academic situations. Although a number of expressions common in the interactive sections of MICASE are also found in correspondingly high frequencies in other types of interactive speech, many of them reflect aspects of the interaction typical of an academic setting. This research has also illuminated some interesting differences in the formulaic expressions used by professors versus those used by students. The comparison of similarly monologic speech episodes showed that a few phrases that professors seem to rely on extensively are noticeably absent from the students’ monologic speech, indicating that formulaic expressions are indeed part of the learning curve for the development of an academic speaking style. Finally, this research has examined the high frequency, characteristically academic formulaic expressions from a functional pragmatic perspective, showing that the most common functions can be broken down into two broad categories – functions related to the organization and structuring of discourse, and functions related to interactivity. There is a constant interplay between these two overarching characteristics of academic speech, which is by nature an information-rich genre, but in which interaction between the participants is also of paramount importance, and the formulaic expressions identified here serve to highlight these dual pragmatic features.

There are numerous potential applications of this research to ESL teaching, though many questions remain as to the effectiveness and importance of incorporating explicit instruction of formulaic expressions into an ESL curriculum. Certainly, it would be worthwhile for teachers to become aware of the


most common formulaic expressions – both those highly frequent in multiple speech genres, and those exceptionally frequent in academic speech. As a result of studies like this one and other related ones, we are now in a position to point students toward those utterances that are most idiomatic for certain common functions in speech. And, I would argue, in answer to a question posed by Swales (2001:36), that even such a seemingly inelegant phrase as the thing is is worth teaching to non-native speakers, in part because it is so ubiquitous, but more importantly, because it is so poly-pragmatic as to be an extremely valuable discursive device, and furthermore, unlike, for example, the phrase I was like, it is not strongly associated with younger speakers or students. Thus, at the very least, these lists of expressions provide useful starting points for any EAP speaking or listening course that aims to teach students colloquial, idiomatic language. To this end, the methodology for identifying formulaic expressions employed in this research is important in that it relies on notions of salience and coherence of the expressions as units, since perceptually salient, idiomatically or structurally coherent expressions are far more applicable for pedagogical purposes than chunks derived purely on the basis of frequency and co-occurrence statistics.

Even more relevant to pedagogical aims, the pragmatic analysis shows that when high frequency expressions such as the ones presented here are examined in context, shades of meaning and pragmatic subtleties become readily apparent. For many phrases, these functions are not necessarily transparent from the literal meaning of the phrase in isolation. For example, the thing is may be thought of as simply a type of focusing device, but numerous authentic examples serve to highlight its pragmatic richness within that broad function in a way that is not possible based on intuition alone. Excerpts such as those shown in examples (30) to (36) above can be presented to students to illustrate this phrase as it is used to introduce complexities, to negate or qualify a prior statement, or as an argumentation device. This is just one example of an expression that EAP teachers might overlook, underestimating both its teachability and its importance in academic speech. Similarly, you could say, it turns out, look at it, and I’m gonna go are phrases that are pragmatically richer than intuition alone might suggest, and the examples provided here from MICASE could easily be adapted for classroom use. These expressions are all relatively simple on the surface, and might be dismissed by an EAP teacher as not necessary to an academic speaking curriculum, either because their meanings are


thought to be self-evident to advanced learners, or because they are not seen as discoursally important enough; the results of this research, however, provide evidence to the contrary. One possible approach to incorporating these phrases into a curriculum would be to proceed from the various pragmatic functions, showing examples in context of different phrases being used for very similar functions, or conversely, showing how closely related expressions convey subtly different meanings. One important caveat in teaching these kinds of expressions, however, is that students should be cautioned against the tendency to latch onto and thus overuse any given phrase once they understand it and can use it appropriately. For this reason, it would be preferable to expose students not only to the most common expression for a given function, but also to a variety of related phrases.

The comparisons between students and professors could also be used for purposes of rhetorical consciousness raising, to demonstrate different ways of saying the same thing (e.g., and stuff like that, or something like that, and so on, and so forth), and to illustrate how these expressions are distributed and used differently among different speaker groups. Graduate student instructors (TAs), many of whom in U.S. institutions are non-native speakers, must often choose between a more formal register (approximating professors’ norms) and a more informal one; highlighting the differences can give non-native speakers some insights into the socio-pragmatics of certain conventions that they might not otherwise pick up on.

All of these expressions are valuable items for EAP students to learn for both listening and speaking. And, since they occur across the whole range of academic divisions, they need not be presented in subject-specific classes or contexts. These phrases are used as discourse structuring or organizing devices; for demonstrating, emphasizing, and hedging; for interactional purposes; and also sometimes as fillers. They are often crucial linking phrases between segments of the propositional content of utterances. As such, they contribute to idiomaticity and fluency in multiple ways, and are thus important items to include in an EAP curriculum.


Notes

1. I stress here that the motivation for narrowing the data set in this way was primarily for the purposes of identifying a limited subset of highly frequent items to compare with other corpora, and not to present precise criteria for defining what is and is not a formulaic sequence. It is certainly true that there are a number of borderline cases that could arguably have been included under my notion of ‘idiomatically independent expressions,’ but were not, and vice-versa. However, the resulting set of over 200 expressions is neither too restricted nor too broad for the purposes and scope of the research undertaken here.

2. These categories, called ‘primary discourse mode’ in the MICASE coding scheme, are based primarily on turn length. In addition to the monologic and interactive categories, there is a third category – mixed – for a smaller subset of transcripts that fall in the middle of this spectrum, but that category was omitted for the purposes of this comparison.

References

Altenberg, B. 1998. “On the phraseology of spoken English: The evidence of recurrent word-combinations.” In Phraseology: Theory, Analysis, and Applications, A.P. Cowie (ed.), 101–122. Oxford: Clarendon Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Cortes, V. 2002. “Lexical bundles in Freshman composition.” In Using Corpora to Explore Linguistic Variation, R. Reppen, S.M. Fitzmaurice and D. Biber (eds), 131–145. Amsterdam and Philadelphia: John Benjamins.
Cortes, V., Jones, J. and Stoller, F. 2002. “Lexical bundles in ESP reading and writing.” Paper presented at TESOL conference, Salt Lake City, April 2002.
Fernando, C. 1996. Idioms and Idiomaticity. Oxford: Oxford University Press.
Mauranen, A. 2001. “Reflexive academic talk: Observations from MICASE.” In Corpus Linguistics in North America: Selections from the 1999 Symposium, R.C. Simpson and J.M. Swales (eds), 165–178. Ann Arbor, MI: University of Michigan Press.
Mauranen, A. 2003. “‘It seems to me you’re saying’: Formulae in academic speech.” Paper presented at AAAL conference, Arlington, VA, March 2003.
McCarthy, M. and Carter, R. (in press). “This that and the other: Multi-word clusters in spoken English as visible patterns of interaction.” TEANGA: Irish Journal of Applied Linguistics.
Nattinger, J.R. and DeCarrico, J.S. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
Oakes, M. 1998. Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
Oakey, D. 2002. “Formulaic language in English academic writing.” In Using Corpora to Explore Linguistic Variation, R. Reppen, S. Fitzmaurice and D. Biber (eds), 111–129. Amsterdam: John Benjamins.


Pawley, A. and Syder, F.H. 1983. “Two puzzles for linguistic theory: Nativelike selection and nativelike fluency.” In Language and Communication, J.C. Richards and R.W. Schmidt (eds), 191–226. New York: Longman.
Poos, D. and Simpson, R. 2002. “Cross-disciplinary comparisons of hedging: Some findings from the Michigan Corpus of Academic Spoken English.” In Using Corpora to Explore Linguistic Variation, R. Reppen, S.M. Fitzmaurice and D. Biber (eds), 3–23. Amsterdam and Philadelphia: John Benjamins.
Rayson, P. and Garside, R. 2000. “Comparing corpora using frequency profiling.” In Proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), 1–6. Hong Kong.
Schmitt, N. and Carter, R. 2004. “An overview of formulaic sequences.” In The Acquisition and Use of Formulaic Sequences, N. Schmitt (ed.). Amsterdam: John Benjamins.
Simpson, R.C., Briggs, S.L., Ovens, J. and Swales, J.M. 2002. The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan. URL: http://www.hti.umich.edu/m/micase/
Simpson, R.C. and Mendis, D. 2003. “A Corpus-based Study of Idioms in Academic Speech.” TESOL Quarterly 37 (3):419–441.
Swales, J. 2001. “Metatalk in American academic talk: The cases of point and thing.” Journal of English Linguistics 29 (1):34–54.
Swales, J. and Malczewski, B. 2001. “Discourse management and new episode flags in MICASE.” In Corpus Linguistics in North America: Selections from the 1999 Symposium, R.C. Simpson and J.M. Swales (eds), 145–164. Ann Arbor, MI: University of Michigan Press.
Wray, A. 1999. “Formulaic language in learners and native speakers.” Language Teaching 32:213–231.
Wray, A. 2000. “Formulaic sequences in second language teaching: Principle and practice.” Applied Linguistics 21:463–489.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Wray, A. and Perkins, M.R. 2000. “The functions of formulaic language: an integrated model.” Language and Communication 20:1–28.

Academic language


Academic language: An exploration of university classroom and textbook language

Randi Reppen
Northern Arizona University

Introduction

Until recently, there have been few studies that explored both spoken and written academic discourse in university settings. In a recent TESOL Quarterly article, Biber et al. (2002) provide a linguistic description of both spoken and written academic language using Multi-dimensional analysis with over 67 linguistic features based on the TOEFL 2000 Spoken and Written Academic Language (T2K-SWAL) corpus. This corpus contains ten different registers including service encounters (e.g. interactions in settings ranging from the bursar’s office to the coffee shop), office hours, classroom lectures, textbooks, syllabi, brochures, and university webpages, thus providing a tremendous resource for describing the language that students encounter in a university setting and also for exploring linguistic variation in different university settings (e.g. classroom teaching vs. labs/in-class sessions or classroom teaching vs. textbooks). In addition to the TESOL Quarterly article, an ETS (Educational Testing Service) monograph (Biber et al. 2004) provides a complete description of the corpus and describes the different types of analysis, ranging from lexical to syntactic, across the different registers in the corpus.

This chapter summarizes and extends recent research on the T2K-SWAL corpus, focusing on two spoken and two written registers (i.e. classroom lectures, labs/in-class sessions, textbooks, and course packs) and using Dimension 1 of Biber’s 1988 Multi-dimensional analysis (Biber et al. 2002) and lexical bundle information based on Biber, Conrad and Cortes (2004). Even though university students spend many of their waking hours in classes, or reading materials related to those classes, to date it has not been


possible to provide a thorough linguistic description of the language that students encounter in university settings, since no corpora existed. Recent advances in corpus linguistics, and in technology, have enabled researchers to construct more complete descriptions of academic discourse using large corpora of academic language. At least two major corpora of academic language now exist: the 1.5 million word Michigan Corpus of Academic Spoken English (MICASE) (Simpson and Swales 2001), and the 2.7 million word T2K-SWAL corpus (Biber et al. 2002; Biber et al. 2001; Biber et al. 2004).

This corpus-based research complements previous studies that have focused on particular aspects of academic language. For many years, researchers have explored academic language (both written and spoken) from the perspective of rhetorical structures (e.g. Atkinson 1999; Berkenkotter and Huckin 1995; Cazden 1988; Swales 1990), or by investigating particular linguistic features in texts (e.g. Ferguson 2000 on conditionals; Hyland 1994, 1996a, 1996b on hedges; Swales et al. 1998 on imperatives). Other studies have described particular linguistic features in certain types of academic discourse, such as verb classes used in research articles (e.g. Hunston 1995; Thompson & Ye 1991; Williams 1996), complex noun phrase structures of scientific texts (e.g. Halliday 1988; Varantola 1984), or discourse markers in classroom discourse (e.g. Flowerdew and Tauroza 1995; Khuwaileh 1999; Nattinger and DeCarrico 1992). These studies have provided an important foundation for our knowledge about academic language, and now with the recent advances in corpus-based studies it has been possible to create a thorough linguistic description of the language found in university settings and to see how many linguistic features work together to create the texts, both spoken and written, that students encounter.

Overview of the corpus is section provides a summary of the T2K-SWAL corpus and the methodology used in the collection of texts. For a full description that includes detailed counts for each register in the corpus and counts of the linguistic features in each register, readers can refer to Biber et al. (2004). Descriptions can also be found in Biber et al. (2001) and the TESOL Quarterly article (Biber et al. 2002). e T2K-SWAL Corpus was designed to be used for two major purposes:1) to study the patterns of language use found in academic registers; and 2) to develop procedures for assuring that the language used in TOEFL Exam tasks


is representative of real life language use. Related to the first purpose, analyses based on the corpus can investigate a wide range of research issues relating to the linguistic characteristics of academic texts; these include vocabulary distributions, the use of collocations and idioms, grammatical characteristics, syntactic complexity, informational density, and rhetorical organization patterns. In all cases, the design of the corpus allows research to be undertaken from a register perspective. That is, each register (such as textbooks, course syllabi, lectures, or service encounters) can be studied in relation to the full range of other academic spoken and written registers. In addition to text studies, the corpus will be used to develop diagnostic tools to ensure that test stimuli represent the same range of variation and linguistic (lexical and grammatical) complexity that students encounter regularly in academic life.

There are at least four major advantages to adopting a corpus-based approach for these research purposes:

1) It allows for the adequate representation of naturally-occurring discourse, including representative text samples from a range of academic registers. Thus, corpus-based analyses can be based on long passages from each text, multiple texts from each register, and a full range of spoken and written registers.
2) The (semi-) automatic linguistic processing of texts using computers enables analyses of much wider scope than otherwise feasible. With computational processing, it is possible to undertake a comprehensive linguistic characterization of a text, analyzing a wide range of linguistic features.
3) There is much greater reliability and accuracy for quantitative analyses of linguistic features. That is, computers do not become bored or tired – they will count a linguistic feature in the same way every time it is encountered.
4) Cumulative results and accountability are possible. Subsequent studies can be based on the same corpus of texts, or additional corpora can be analyzed using the same computational techniques.

Design and methodology e two most important considerations in the design of corpora are 1) the representation of the full range of register diversity within the target discourse domain, and 2) the size of the sample from each register (number of texts and


number of words per text). At the same time, these considerations must be balanced by practical considerations relating to the amount of time and resources available for corpus construction. In the design of the T2K-SWAL corpus, we tried to balance four desirable criteria (see Biber 1990, 1993a, b; Biber, Conrad and Reppen 1994, 1996, 1998):

1) The corpus must be as large as possible;
2) The corpus must be as representative as possible (of registers, disciplines, levels of education, universities, teaching styles, etc.);
3) The corpus must be collected, transcribed/scanned, edited, tagged, and tag-edited within a two year period;
4) The corpus should be constructed in a cost-efficient manner.

Unfortunately, these criteria can conflict with one another, so that the resulting design is a compromise among the four. Because the T2K-SWAL corpus was developed for a specialized domain, its 2.7 million word size is more than adequate for analyses of major grammatical features. Biber (1990) shows that the distribution of common grammatical features is stable across text samples as small as 1,000 words. The T2K-SWAL corpus is also large enough for analyses of common vocabulary patterns, although it would need to be extended for collocational studies of less common words.

With respect to the representation of diversity, the design samples the range of differences found in academic language use. Although there are gaps not included in the corpus (e.g. aspects of regional or demographic variation – see below), the corpus does represent the various types of spoken academic language that students can expect to encounter on virtually any American university campus. These design goals and sampling methods do not allow a full representation of demographic variation in the U.S. (regional, ethnic, social, or other). However, that was not a feasible goal for this project. For example, there are 18 major speech areas in the Eastern United States alone, each associated with a different regional dialect. In addition, it is possible to identify numerous urban dialects within the country, as well as several well-defined social and ethnic dialects within most of those urban centers. Attempting even a minimal representation of these dialect patterns would have required a project many times larger than this project (requiring a much larger research team and many years of effort). Similarly, the corpus does not attempt a full representation of the kinds of academic institutions found in the U.S. Any attempt to do that would


also have resulted in a much larger project. However, although the corpus does not achieve full demographic / institutional representativeness, it does avoid obvious skewing for these factors. Further, the design achieves a kind of face validity in relation to these factors by recognizing and including at least some of the major differences. The corpus materials were collected from four major regions in the U.S.: West Coast, Rocky Mountain west, Mid-west, and the Deep South. Further, materials were collected from different types of academic institutions: a teacher’s college (California State, Sacramento), a mid-size regional university (Northern Arizona University), an urban research university (Georgia State), and a Research 1 university with a national reputation in agriculture and engineering (Iowa State). Thus, although this design does not represent the full range of demographic or institutional variation in the U.S., it is also not skewed towards any particular regions or types of institution.

The T2K-SWAL Corpus was designed to be relatively large (2.7 million words), as well as representative of the range of the academic registers that students must listen to or read. The register categories in the corpus are sampled from across the full range of spoken and written activities associated with academic life. However, given the constraints of time and budget, it was not feasible to attempt a comprehensive sampling of all distinctions (e.g., including language from all disciplines sampled from the full range of American universities, etc.). Instead, we have focused on a finite number of registers from five selected disciplines (Natural Science, Humanities, Business, Engineering, and Social Science), representing the range of variability that students will encounter in academic life. The sampling of all register categories was split between undergraduate and graduate level texts. We collected texts at four academic sites: Northern Arizona University; Iowa State University; California State University at Sacramento; and Georgia State University. We did not attempt to achieve exact equivalence of sampling from the academic sites. Rather, our goal was to avoid marked skewing that might result from texts being sampled from a single university setting, although we do not anticipate any systematic differences across universities.

Table 1 gives the composition of the entire T2K-SWAL corpus, even though in the remainder of the chapter the focus is on four sub registers of the T2K-SWAL Corpus: two spoken and two written (Spoken: class sessions, and labs/in-class groups; Written: textbooks and course packs). The relative weight given to each register category reflects the initial analysis of the extent to which a given kind of language actually occurs in university


Table 1. Composition of the T2K-SWAL Corpus (The registers marked † are the focus of this chapter.)

Register                      # of texts     # of words

Spoken:
  Class sessions †            176            1,248,811
  Classroom management*       (40)              39,255
  Labs/In-class groups †      17                88,234
  Office hours                11                50,412
  Study groups                25               141,140
  Service encounters          22                97,664
  Total speech:               251 (+40)      1,665,516

Written:
  Textbooks †                 87               760,619
  Course packs †              27               107,173
  Course management           21                52,410
  Other campus writing        37               151,450
  Total writing:              172            1,071,652

TOTAL CORPUS:                 423 (+40)      2,737,168

*Classroom management texts are extracted from the ‘class session’ tapes.

settings, and the difficulty in collecting the register. Class-management discussion is one of the smallest sub registers with only 39,255 words. When it occurs at all, speech from this register takes place only at the beginning and end of class sessions, and typically takes only 1–2 minutes. Apart from this category, all ‘texts’ in the corpus are intact, representing the complex interweaving of tasks found in normal spoken discourse. However, the class-management talk found in class sessions was copied into separate ‘sub-texts’, to allow this register to be studied in more detail.

Text collection e collection of the written texts to include in the corpus was fairly straight forward. Course syllabi, textbooks, reading packets, tests, and printed campus information (both paper and electronic) were collected and scanned. However, the spoken portion of the corpus presented several challenges. Capturing naturally-occurring discourse was a primary criterion for texts to be included


in the corpus. One major obstacle to collecting natural spoken language is the presence of research assistants in spoken settings, which is often intrusive and likely to result in somewhat artificial discourse. As a result, student participants were asked to carry tape recorders and record academic speech as it occurred spontaneously in the classes and study groups that they attended. High quality, natural interactions were obtained using this approach; the major disadvantage is that the interactions were not observed first-hand and thus there is no detailed information about the setting and participants.

In general, students were the primary participants. They were recruited to record lectures, study groups, and other academic conversations. Faculty members were recruited to help with the recording of office hours, and university staff for service encounters. Participants were evenly distributed across the five disciplines (Natural Science, Humanities, Business, Engineering, and Social Science) and three levels of education (lower division undergraduate, upper division undergraduate, graduate). Also, specific sub-disciplines (e.g. chemistry, philosophy, psychology) were targeted to enable register comparisons at a more specific level. Student participants recorded the class sessions and study groups that they were involved in during a two-week period, keeping a log of speech events and participants to the extent that was practical. University staff members were recruited from offices and areas that regularly interact with students, such as the registrar, housing office, bookstore, and library.

All texts in the corpus are coded with a header to identify content area and register. Spoken texts were transcribed using a consistent transcription convention (see Edwards and Lampert 1993), and to the extent possible, speakers were distinguished and some demographic information supplied in the header for each speaker (e.g. their status as instructor vs. student). All texts were edited to ensure accuracy in transcribing and scanning. Then, all texts were grammatically annotated using an automatic grammatical tagger (developed and revised over a 10 year period by Biber). The grammatical tags were subsequently edited using an interactive grammar-checker, to assure a high degree of accuracy for the final annotated corpus (see Biber, Conrad and Reppen 1998).

Overview of corpus analysis

Two types of analyses were used to provide the information used in this chapter. First, Multi-dimensional analysis allows us to look at 67 linguistic features


and the interaction of those features in the texts included in the corpus. The second type of analysis is the identification of four-word lexical bundles. These two types of analyses provide different insights as to the language that students encounter in university texts, both spoken and written. The Multi-dimensional analysis provides a large-scale picture of overall linguistic patterns, while exploring the lexical bundles provides a closer look at specific groups of words.
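As a rough illustration of what identifying four-word lexical bundles can involve, the following Python sketch counts recurring four-word sequences, normalizes them per million words, and keeps those that recur across several texts. It is a minimal sketch under stated assumptions, not the procedure used for the T2K-SWAL analyses: the function names, the whitespace tokenization, and the thresholds `min_per_million` and `min_texts` are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n=4):
    """Return every contiguous n-word sequence in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_bundles(texts, n=4, min_per_million=10.0, min_texts=2):
    """Map each recurring n-word sequence to its rate per million words.

    The cut-offs are illustrative assumptions, not values from the chapter.
    """
    counts = Counter()
    text_range = defaultdict(set)  # which texts each sequence occurs in
    total_words = 0
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive whitespace tokenization
        total_words += len(tokens)
        for gram in ngrams(tokens, n):
            counts[gram] += 1
            text_range[gram].add(text_id)

    bundles = {}
    for gram, count in counts.items():
        rate = count * 1_000_000 / total_words
        if rate >= min_per_million and len(text_range[gram]) >= min_texts:
            bundles[" ".join(gram)] = rate
    return bundles


sample_texts = [
    "i do not know if that helps",
    "well i do not know if it works",
]
for bundle, rate in sorted(lexical_bundles(sample_texts).items()):
    print(f"{bundle}: {rate:.0f} per million words")
```

Requiring a sequence to occur in more than one text is one simple way to keep a single speaker's or author's habits from dominating the list; a real study would set both thresholds with the corpus size in mind.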

Multi-dimensional analysis is section will provide an overview of the methodology used in Multi-dimensional analysis (for a detailed description see Biber 1988; Biber, Conrad and Reppen 1998; Conrad and Biber 2001). To create the original Multi-dimensional model, the texts in the corpus described in Biber (1988) were tagged (i.e. grammatically annotated) and interactively edited for accuracy. A factor analysis was then used to identify linguistic features that had strong co-occurrence patterns. e groups of linguistic features that were identiWed through the factor analysis are shown in Table 2. It is important to understand that 67 linguistic features were tagged in all of the texts, and then a factor analysis was performed to identify which features co-occurred. e features that were identiWed as having a statistically signiWcant co-occurrence load onto factors. In this analysis, Wve factors were identiWed. e Wve factors, or groupings, of linguistic features have a complementary distribution pattern. at is, if there are a high number of features from the positive end there will be a low number of features from the negative end. Aer the Wve factors were identiWed, they were assigned descriptive labels through a process of looking at the linguistic features that co-occurred on that factor and the distribution of the texts in that factor. Once labeled, the factors are referred to as Dimensions. ese groupings of linguistic features reXect diVerent functions being accomplished by the texts (e.g. Involved vs. Informational production, Narration, etc.). is type of analysis characterizes texts by co-occurrences of linguistic features along diVerent dimensions, hence the name Multi-dimensional analysis. Table 2 below shows the Dimension labels and the linguistic features associated with those dimensions. In order to use Mutli-dimensional analysis with the T2K-SWAL corpus, all the texts in the corpus were tagged and interactively edited for accuracy. 
en the linguistic features were used to compute a Dimension score for each text. For example, each text in the corpus has a count or score for each of the

Academic language

Table 2. Summary of Biber’s 1988 Factor Analysis

Dimension 1: Involved vs. Informational production
Positive features: private verbs; that-deletion; contractions; present tense verbs; second person pronouns; do as pro-verb; analytic negation; demonstrative pronouns; general emphatics; first person pronouns; pronoun it; be as main verb; causative subordination; discourse particles; indefinite pronouns; general hedges; amplifiers; sentence relatives; wh-clauses; final prepositions
Negative features: nouns; word length; prepositions; type/token ratio; attributive adjectives

Dimension 2: Narrative vs. Non-narrative discourse
Positive features: past tense verbs; third person pronouns; perfect aspect verbs; public verbs; synthetic negation; present participial clauses
Negative features: None


Randi Reppen

Table 2. Cont.

Dimension 3: Situation-dependent vs. elaborated reference
Positive features: time adverbials; place adverbials; adverbs
Negative features: wh-relative clauses in object position; pied piping constructions; wh-relative clauses in subject position; phrasal coordination; nominalizations

Dimension 4: Overt expressions of persuasion
Positive features: infinitives; prediction modals; suasive verbs; conditional subordination; necessity modals; split auxiliaries
Negative features: None

Dimension 5: Non-impersonal vs. impersonal style
Positive features: None
Negative features: conjuncts; agentless passives; past participial adverbial clauses; by passives; past participial postnominal clauses; other adverbial subordination

linguistic features on the dimensions. The counts for the features are then added together. If a dimension has negative features, the negative features are subtracted from the positive features to compute a score for that text along that particular dimension. Using the text scores, mean scores for various registers (e.g. textbooks, undergraduate lectures, etc.) can be computed. The mean scores are used to plot the registers along the dimensions. Of the five dimensions identified in Biber’s 1988 study, and in Biber et al. (2002, 2004), this chapter will focus on only one – Dimension 1: Involved vs. Informational production. This dimension was selected because it best captures important aspects related to literacy and text production (e.g. carefully produced, as in edited written material vs. produced under time constraints, as in face-to-face conversations).
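The scoring procedure just described (add the counts for the positive features, subtract the counts for the negative features, then average the text scores within a register) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the three-feature subsets of Dimension 1, and the counts are invented and are not T2K-SWAL data. The full procedure in Biber (1988) also normalizes and standardizes the counts before they are summed; that step is left out here.

```python
from statistics import mean

# Hypothetical subset of the Dimension 1 features from Table 2.
POSITIVE = ("private verbs", "contractions", "second person pronouns")
NEGATIVE = ("nouns", "prepositions", "attributive adjectives")


def dimension_score(counts, positive=POSITIVE, negative=NEGATIVE):
    """Add the positive-feature counts and subtract the negative-feature counts."""
    pos = sum(counts.get(feature, 0) for feature in positive)
    neg = sum(counts.get(feature, 0) for feature in negative)
    return pos - neg


def register_means(texts):
    """Average the text scores within each register.

    `texts` is a list of (register, feature_counts) pairs.
    """
    scores = {}
    for register, counts in texts:
        scores.setdefault(register, []).append(dimension_score(counts))
    return {register: mean(values) for register, values in scores.items()}


# Invented counts for illustration only; these are not T2K-SWAL figures.
texts = [
    ("class lecture", {"private verbs": 3, "contractions": 4,
                       "second person pronouns": 2, "nouns": 2, "prepositions": 1}),
    ("class lecture", {"private verbs": 1, "contractions": 5, "nouns": 2}),
    ("textbook", {"private verbs": 1, "nouns": 8, "prepositions": 5,
                  "attributive adjectives": 2}),
]

for register, score in register_means(texts).items():
    print(register, score)
```

In this toy input the lecture texts come out with positive scores and the textbook text with a negative one, which is the direction of the contrast the chapter reports for Dimension 1; the magnitudes carry no meaning.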

Dimension 1

Following the procedures described above, dimension scores were computed for class lectures, labs/in-class sessions, textbooks, and course packs. Figure 1 below plots the mean dimension scores for these academic registers along Dimension 1. Five registers from Biber’s 1988 study (i.e. face-to-face conversation, personal letters, general fiction, science fiction and official documents) are also plotted on this dimension. These other registers were plotted to provide a principled means of comparing the academic registers with some of the speech and writing that is encountered in everyday life.

Along Dimension 1, in Figure 1 below, the two spoken and two written academic registers are clearly separated. This separation was reflected in all the spoken and written registers of the T2K-SWAL corpus. This clear separation between all the spoken and written registers in the corpus is surprising given that the spoken and written registers each contain a range of texts that reflect different goals and production circumstances (Biber et al. 2002). Course packs and textbooks have almost identical scores and plot very near official documents. These dense texts use long words and lots of nouns and prepositions, all features typical of informational production along Dimension 1. In the excerpts below, one from a geology textbook and one from a geology lecture, the following features, characteristic of the two ends of Dimension 1, have been highlighted: nouns (bolded), prepositions (underlined), and first and second person pronouns and the pronoun it (italicized).

(1) Geology textbook excerpt
The eruptive sequence may be most easily studied at volcanoes such as Kilauea on the Island of Hawaii that do not ordinarily erupt violently.


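Figure 1. Mean scores of University registers along Dimension 1. The vertical scale runs from 50 (Involved production) at the top to –20 (Informational production) at the bottom. Registers are listed from top to bottom; T2K-SWAL registers carry their mean scores, and comparison registers are given by their approximate position on the scale.

Labs/In-class sessions (45.8): T2K-SWAL, spoken
Face-to-face conversation (near 35): comparison register
Class lectures (27.7): T2K-SWAL, spoken
Personal letters (near 20): comparison register, written
General fiction (near 0): comparison register, written
Science fiction (near –5): comparison register, written
Course packs (–16.1): T2K-SWAL, written
Textbooks (–16.3): T2K-SWAL, written
Official documents (near –20): comparison register, written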


Much can be learned about the eruptive process by studying such volcanoes, but it is understood that there are limits to extrapolating experience from such volcanoes in an effort to understand the eruptive sequence at volcanoes that erupt more violently. (64 words)

(2) Geology lecture excerpt
I showed you pictures from southern Ontario up in the Sudbury basin actually the rocks that were the big mounded rocks that were eroded there were actually rocks that were formed from this glacial event so they’re actually if you take an old piece of glacial till and liquefy it and compact it and turn it into a rock it’s called a tillite. (63 words)

From these short excerpts that typify the different ends of Dimension 1, we see that the textbook excerpt (1) has a dense use of features that are typical of informational production, such as the high use of nouns, prepositions and long words, coupled with an absence of personal pronouns and contractions. In addition, the words used in the textbook excerpt are also longer than those found in the classroom lecture. The excerpt from the classroom lecture (2) has fewer nouns and prepositions than the textbook excerpt. The classroom lecture also has two features not found in the textbook excerpt – the use of pronouns (both personal pronouns and it) and the use of contractions. Both of these features are associated with the positive end of Dimension 1 and are characteristic of involved production.

Classroom lectures with more student involvement, such as the excerpt below from an education class lecture, highlight even more clearly the differences between the language of textbooks and the language in the classroom. Again, nouns (bolded), prepositions (underlined), and first and second person pronouns and the pronoun it (italicized) have been highlighted. In addition to these features, notice the number of questions used in this excerpt, not only questions that are formed with wh-question markers, but also those without any overt question markers (e.g. For reading?).

(3) Education lecture excerpt (speaker IDs 1: the professor 2: a student)
1: What are some of the alternatives that, could be generated for that kid? What kind of placements could you look at?
2: For reading?
1: Huh?
2: For reading you said?
1: Yeah reading and math, they, they qualified in learning disabilities.


2: Couldn’t they go to, like um, the resource room?
1: Yeah.
2: For math and reading class?
1: You could do that you could, you could consider options, such as resource for math and reading. What’s the first option though under the law you have to face?
1: General classrooms.
2: [unclear]
1: OK. Those are specific modifications. And that’s part of it. What we’re kind of gearing toward now is where will they get the services? You gotta first consider general education.

Even though this excerpt is not lexically dense or syntactically complex, it could be very challenging to follow because of the rapid interchange and the lack of structurally complete units, particularly if a student is a second language speaker. In contrast, the excerpt below from a botany textbook is structurally complete, but it is quite complex structurally and has an extremely high use of technical terms, thus presenting yet another challenge to the students reading the text.

(4) Textbook excerpt
Some leaves are accompanied by a pair of small scale-like or leaf-like structures known as stipules which are attached either to the petiole base or to the twig, one on either side of the petiole. These usually drop off as the leaf expands. Plants with stipules are stipulate; those without them are estipulate. A given leaf persists on the tree for one season of growth or two or more years and then will eventually drop after development of an abscission layer at the junction of stem and leaf.

Along Dimension 1, textbooks and course packs most closely resemble official documents, which are known for their use of dense language, complex structures, and an absence of features that involve the reader (e.g. personal pronouns, questions, contractions). At the other end of Dimension 1, the spoken university registers are linguistically more similar to face-to-face conversation. However, by looking at the content of the lectures, it is easy to see that the information load of the lectures is much greater than that of normal face-to-face conversation. Even though the excerpt above from a geology lecture (2) uses personal pronouns and contractions, it also contains definitions or explanations of technical terms (e.g. tillite) and also uses technical terms such as glacial event and glacial till. So, even though the manner of presentation is more linguistically similar to features found in face-to-face conversation, the information presented is not similar to the information frequently conveyed in face-to-face conversation.

The analysis of how the academic registers of the T2K-SWAL corpus plot along Dimension 1 reveals that although the language that students encounter in university settings is somewhat similar to some registers they may have encountered (e.g. official documents), it is also distinct from those registers encountered in everyday life (e.g. face-to-face conversation). The written registers are informationally dense and have complex linguistic structures associated with them. The spoken university registers, although linguistically similar to face-to-face conversation, differ greatly from most conversations in their goals. In most face-to-face conversation, presentation of self and interpersonal information are the most common goals. However, in the classroom setting, presentation of factual information is the goal. Even though the linguistic forms may be familiar, in that they are similar to those in conversation, the information that is being delivered in classroom lectures often involves the use of technical terms and the delivery of factual information. This mismatch between a familiar format and very different goals may present a challenge to students and is worth further investigation.

Lexical bundles

“Lexical bundles are simply sequences of word forms that commonly go together in natural discourse” (Biber et al. 1999:990). In most instances, lexical bundles cross structural boundaries (e.g. I don’t know why..., in the case of…). Lexical bundles are identified solely on the basis of the recurrence of the pattern of words, a task that would be impossible without the use of computers. Imagine trying to identify recurring patterns of words and then keeping track of all the instances when the words appeared in this sequence. To do this for just this chapter would be a monumental task; now imagine doing this for a corpus of several million words.

Biber, Conrad, and Cortes (2004) examined the T2K-SWAL corpus and identified the major structural patterns of four-word lexical bundles that occurred at least twenty times per million. After identifying the different structural patterns, they counted the number of different expressions that occurred within each pattern. For example, in the pattern ‘verb phrase with active verb’ in classroom teaching, there were thirty-four different expressions that occurred in this category, while in textbooks only three different expressions were found in this category. Table 3 shows the fifteen major structural patterns of bundles that occur at least twenty times per million.

Table 3 shows that there is much greater variety in the lexical bundles that occur in classroom teaching than in those found in the textbook register. All fifteen structural types, except one (i.e. verb phrase with passive verb), are represented in the classroom teaching register, while in the textbook register, only eight of the fifteen structural types are represented. The variety of lexical bundles found in classroom language is much greater than that found in textbook language. In the seven patterns where both classroom and textbook language are represented, in all but one category (i.e. prepositional phrase expressions) classroom language shows much greater variety in the number of different expressions. Many of the types of lexical bundles that are found only in the classroom teaching discourse are ones that reflect involvement, or encourage students to be active participants in the lecture or at least active in processing the information being presented. The examples below come from classroom discourse and show several types of lexical bundles that are found in classroom discourse and that are rare or completely absent in textbooks.

If-clause fragments:

(5) Graduate geology lecture
That’s a sub-tidal model. Now, if we look at ancient tidal deposits...

(6) Undergraduate education lecture
If, now I mean this, if you have a question about your project or about the grade on it, I don’t have a problem with you writing a note at the top of that. And I’ll look back over anything.

1st person pronoun + clause fragment:

(7) Graduate geology lecture
... I don’t know whether it was his interview talk or the last one, when he was looking at those uh I guess little terraces off the coast of Alaska? I don’t know if any of you saw that he was looking at these progradational phases taking place.


Table 3. Distribution of lexical bundle types across classroom teaching and textbooks; all lexical bundles occur more than 20 times per million words (adapted from Biber, Conrad, and Cortes 2004). Each cell gives the number of different bundles in the pattern / the percentage of the register total; -- marks a pattern with no bundles in that register.

Lexical bundle pattern                    Example               Classroom teaching  Textbooks
1st/2nd person pronoun + clause fragment  I don’t know if       74/29%              --
3rd person pronoun + clause fragment      it’s going to be      19/8%               --
Verb phrase with active verb              take a look at        34/13%              3/4%
Verb phrase with passive verb             is based on the       --                  4/6%
Yes/no question fragments                 do you want to        5/2%                --
WH question fragments                     what does that mean   6/2%                --
WH-clause fragments                       what I want to        9/4%                --
If-clause fragments                       if you have a         11/4%               --
To-clause fragments                       to come up with       17/7%               6/9%
That-clause fragments                     that there is a       8/3%                1/1%
Noun phrase + of-phrase fragment          one of the things     35/14%              20/30%
Other noun phrase expressions             a little bit more     13/5%               8/12%
Prepositional phrase expressions          at the end of         18/7%               23/35%
Comparative expressions                   as well as the        4/2%                1/1%
TOTAL                                                           253/100%            66/100%

Yes/no question fragment:

(8) Graduate education lecture
1: And then someone pointed out a couple more mistakes in the course pack. Um, but I can’t remember your name. I’m sorry. .. Leslie, um do you want to tell us what you found that was wrong with the course pack?
2: Uh, um the second article, halfway [unclear words].
1: and that was my, just a mistake with the copying. And I’ll give you those pages.

The category prepositional phrase expressions (see excerpts below) is the only structural pattern where textbook language had a greater variety of expressions than classroom language. Given the extremely high number of prepositions used in academic writing, this finding is consistent with what would be expected. The examples in the Multi-dimensional analysis in the section above also brought out the frequent use of prepositions as a feature strongly associated with written academic texts.


Prepositional phrase expressions:

(9) Botany textbook
In pine cones, which mature at the end of two, or rarely three, seasons, the apophysis terminates in a small protuberance called the umbo.

(10) Geology textbook
A mass of new basalt was deposited on top of the 1938 deposit. At the same time, melting of the ice created a crater in the surface of the glacier and a large volume of water. This new meltwater flowed under the ice to enlarge the sub-glacial lake.

Biber, Conrad, and Cortes (2004) also found that classroom teaching not only had a greater frequency and variety of lexical bundles but that the range of functions (e.g. marking stance, organizing discourse) performed by the lexical bundles was much greater than the range found in textbooks or in conversation and academic prose. One factor that may contribute to the high use of lexical bundles in classroom teaching is the range of tasks performed in classroom teaching. Classroom teaching delivers content information and also attempts to be engaging to the students. These two tasks place a great and somewhat conflicting demand on the person delivering the lecture. The frequent use of lexical bundles may provide important clues for the students listening to and processing the lectures (Biber, Conrad and Cortes 2004).

The patterns of lexical bundles found in classroom language may have useful pedagogical implications for ESP courses. At the least, teaching the patterns and the possibilities that can occur in classroom lectures can leave learners better prepared. This pre-packaging of information, or of the structures used to present information, can help the listener by reducing the processing load. However, in reading textbooks or course packs, we see that there are not so many frequent patterns or groups of lexical bundles to help with written academic discourse. The linguistic diversity found in academic writing in the Multi-dimensional analysis is also reflected in the patterns found in the lexical bundles.
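The identification step described at the start of this section, finding recurring four-word sequences and keeping those that reach a frequency threshold per million words, can be sketched in a few lines. This is an illustrative sketch only, not the procedure used by Biber, Conrad, and Cortes (2004): the function names, the whitespace tokenization, and the sample sentences are assumptions, and any further criteria the original study applied are not modeled.

```python
from collections import Counter


def four_word_sequences(tokens):
    """Return every run of four consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def lexical_bundles(texts, min_per_million=20.0):
    """Return four-word sequences whose rate meets the per-million threshold.

    `texts` is a list of token lists; sequences never span two texts.
    The result maps each sequence to its rate per million words.
    """
    counts = Counter()
    total_words = 0
    for tokens in texts:
        total_words += len(tokens)
        counts.update(four_word_sequences(tokens))
    if total_words == 0:
        return {}
    rates = {seq: n * 1_000_000 / total_words for seq, n in counts.items()}
    return {seq: rate for seq, rate in rates.items() if rate >= min_per_million}


# Invented sample, tokenized by whitespace (an assumption for illustration).
sample = [
    "i don't know if you saw that".split(),
    "well i don't know if that helps".split(),
]

# The sample has only 14 words, so the threshold is raised far above 20
# to separate the one repeated sequence from the sequences that occur once.
bundles = lexical_bundles(sample, min_per_million=100_000)
for sequence in bundles:
    print(" ".join(sequence))
```

With a corpus of realistic size the default threshold of 20 per million would be used as is; the raised value here only compensates for the tiny sample.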

Conclusion and application to teaching

From both the Multi-dimensional analysis and the investigation of structural patterns found in lexical bundles it is clear that the language found both in the classroom and in textbooks – and course packs – presents linguistic challenges that are not found in other registers. In textbooks and course packs the combination of complex linguistic structures and the use of technical vocabulary creates texts that are difficult to process. The reader must not only have control of the vocabulary, but also be able to untangle long complex sentences, which often have many prepositional phrases layered in the sentence. On the other hand, in classroom teaching, the use of technical vocabulary also presents a challenge, but so does the format. Since the academic lectures may on the surface appear to be similar to everyday spoken language, listeners may be caught off guard as they realize that, unlike everyday conversation, the lectures and discussions are laden with information. There are, however, a few patterns of lexical bundles that can help the listener process information more efficiently.

From the information presented in this chapter, we see that through corpus linguistics it is possible to provide rich, accurate descriptions of language use. The use of corpora and computers allows researchers to explore complex issues related to language use and the association of linguistic features with different situations of language use. The information from corpus-based studies of language use can provide valuable contributions in the areas of ESL/EFL teacher training and materials development. Research can describe how the use of particular linguistic features varies across different situations, or how sets of linguistic features work together in different registers. All this information can contribute to the knowledge base used to shape the materials that are used to teach language learners. Teacher trainers for ESL/EFL programs can use information from corpus-based studies of academic language to inform their classroom teaching, thus better preparing teachers to meet the needs of their ESL/EFL students, many of whom will study at American universities.

References

Atkinson, D. 1999. Scientific discourse in sociohistorical context: The philosophical transactions of the Royal Society of London, 1675–1975. Mahwah, NJ: Lawrence Erlbaum.
Berkenkotter, C. and Huckin, T. 1995. Genre knowledge in disciplinary communication. Hillsdale, NJ: Lawrence Erlbaum.
Biber, D. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. 1990. “Methodological issues regarding corpus-based analysis of linguistic variation.” Literary and Linguistic Computing 5:257–269.
Biber, D. 1993a. “Representativeness in corpus design.” Literary and Linguistic Computing 8:243–257.
Biber, D. 1993b. “Using register-diversified corpora for general language studies.” Computational Linguistics 19:219–241.
Biber, D., Conrad, S. and Cortes, V. 2004. “Take a look at Lexical bundles in university teaching and textbooks.” Journal of Applied Linguistics 25:401–435.
Biber, D., Conrad, S. and Reppen, R. 1994. “Corpus-based approaches to issues in Applied Linguistics.” Journal of Applied Linguistics 15:169–189.
Biber, D., Conrad, S. and Reppen, R. 1996. “Corpus-based investigations of language use.” Annual Review of Applied Linguistics 16:115–136.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus linguistics: Exploring language structure and use. Cambridge: Cambridge University Press.
Biber, D., Conrad, S., Reppen, R., Byrd, P. and Helt, M. 2002. “Speaking and writing in the university: A multi-dimensional comparison.” TESOL Quarterly 36:19–48.
Biber, D., Conrad, S., Reppen, R., Byrd, P., Helt, M., Clark, V., Cortes, V., Csomay, E. and Urzua, A. 2004. Representing Language Use in the University: Analysis of the TOEFL 2000 Spoken and Written Academic Language Corpus. Princeton, NJ: ETS.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow, UK: Longman.
Biber, D., Reppen, R., Clark, V. and Walter, J. 2001. “Representing spoken language in university settings: The design and construction of the spoken component of the T2K-SWAL Corpus.” In Corpus linguistics in North America: Selections from the 1999 Symposium, R. Simpson and J. Swales (eds), 48–57. Ann Arbor: University of Michigan Press.
Cazden, C. 1988. Classroom discourse: The language of teaching and learning. Portsmouth, NH: Heinemann.
Conrad, S. and Biber, D. (eds) 2001. Variation in English: Multi-dimensional Studies. Harlow, UK: Longman.
Edwards, J. A. and Lampert, M. D. (eds) 1993. Talking Data: Transcriptions and Coding in Discourse Research. Hillsdale, NJ: Lawrence Erlbaum.
Ferguson, G. 2000. “If you pop over there: A corpus-based study of conditionals in medical discourse.” English for Specific Purposes 20:61–82.
Flowerdew, J. and Tauroza, S. 1995. “The effect of discourse markers of second language lecture comprehension.” Studies in Second Language Acquisition 17:435–458.
Halliday, M. A. K. 1988. “On the language of physical science.” In Registers of written English, M. Ghadessy (ed), 162–178. London: Pinter.
Hunston, S. 1995. “A corpus study of some English verbs of attribution.” Functions of Language 2:133–158.
Hyland, K. 1994. “Hedging in academic writing and EAP textbooks.” English for Specific Purposes 13:239–256.
Hyland, K. 1996a. “Talking to the academy: Forms of hedging in science research articles.” Written Communication 13:251–281.
Hyland, K. 1996b. “Writing without conviction? Hedging in science research articles.” Applied Linguistics 17:433–454.
Khuwaileh, A. A. 1999. “The use of personal pronouns: Role relationships in scientific journal articles.” English for Specific Purposes 18:121–138.
Nattinger, J. R. and DeCarrico, J. S. 1992. Lexical phrases and language teaching. Oxford: Oxford University Press.
Simpson, R. C. and Swales, J. M. 2001. “North American perspectives on corpus linguistics at the millennium.” In Corpus linguistics in North America: Selections from the 1999 Symposium, R. Simpson and J. Swales (eds), 1–14. Ann Arbor: University of Michigan Press.
Swales, J. M. 1990. Genre analysis: English in academic and research settings. Cambridge: Cambridge University Press.
Swales, J. M., Ahmad, U. K., Chang, Y. Y., Chavez, D., Dressen, D. F. and Seymour, R. 1998. “Consider this: The role of imperatives in scholarly writing.” Applied Linguistics 19: 97–21.
Thompson, G. and Ye, Y. 1991. “Evaluation in the reporting verbs used in academic papers.” Applied Linguistics 12:365–382.
Varantola, K. 1984. On noun phrase structures in engineering English. Unpublished dissertation. University of Turku, Finland.
Williams, I. 1996. “A contextual study of lexical verbs in two types of medical research report: Clinical and experimental.” English for Specific Purposes 15:175–197.

Corpus analysis and academic persuasion


A convincing argument: Corpus analysis and academic persuasion

Ken Hyland
University of London

Academic discourse and scientific explanation

There is a widespread belief that academic discourse is a unique form of argument because it depends upon the demonstration of absolute truth, empirical evidence or flawless logic. Its persuasive potency is seen as grounded in rationality and based on exacting methodologies, dispassionate observation, and informed reflection. Academic writing, in other words, represents the discourses of Truth (Lemke 1995:178). It provides an objective description of what the natural and human worlds are actually like and this, in turn, serves to distinguish it from the socially contingent. We see this form of persuasion as a guarantee of reliable knowledge, and we invest it with cultural authority, free of the cynicism with which we view the partisan rhetoric of politics and commerce.

This view receives its strongest support from those who champion the explanatory methods of the hard sciences. Science is held in high esteem in the modern world precisely because it is seen to provide a model of rationality and detached reasoning. The label scientific confers reliability on a method and prestige on its users; it implies all that is most objective and empirically verifiable about academic knowledge. For these reasons it has been imitated by the fields of human and social inquiry, such as sociology and linguistics, which are often considered softer and thus less dependable forms of knowledge. Underlying this realist model is the idea that knowledge is built on the non-contingent pillars of impartial observation, experimental demonstration, replication, and falsifiability. Consequently, scientific papers are persuasive because they communicate independently existing truths which originate in our direct access to phenomena in the external world. The text is merely the channel which allows scientists to relay observable facts.

But scientific methods provide less reliable bases for proof than commonly supposed. Although we rely on induction in our everyday lives, believing that the bus we take to work will pass by at 8am tomorrow if it has passed at 8am every day for the past week, it has received short shrift from philosophers of science. They argue that observation does not supply a secure basis for science because by moving from observations of particular instances to general statements about unobserved cases, scientists introduce uncertainty. The widely accepted alternative – Popper’s Falsification model, which puts theories through experimental testing and replaces those that are defective with more verifiable ones – is similarly unreliable. It is simply not possible to conclusively falsify a hypothesis because the observations that form the basis for the falsification must be expressed in the language of some theory, and so will only be as reliable as that theory. That is, all reporting occurs within a pragmatic context and in relation to a theory which fits observation and data in meaningful patterns, so there is no secure observational base upon which any theories can be tested (Chalmers 1982). The problem for both inductivism and falsification is therefore that interpretation depends on the assumptions the scientist brings to the problem (e.g. Kuhn 1970). Observations are as fallible as the theories they presuppose, and so cannot provide a solid foundation for the acceptance of scientific claims.
As the physicist Stephen Hawking (1993:44) notes, a theory may describe a range of observations, but “beyond that it makes no sense to ask if it corresponds to reality, because we do not know what reality is independent of a theory.” Texts cannot therefore be seen as accurate representations of what the world is really like because this representation is always filtered through acts of selection and foregrounding. To discuss results and theories is not to reveal absolute proof; it is to engage in particular forms of persuasion.

In fact, because all reporting involves the interpretation of observations and data, knowledge can only emerge from a disciplinary matrix. Writers cannot step outside the beliefs or discourses of their social groups to find justifications for their research that are somehow objective. They must draw on principles and orientations from their cultural resources to organize their work, and this grounds academic persuasion in the conventional textual practices for producing agreement. Simply, if truth does not reside in an external reality, then there will always be more than one plausible interpretation of any piece of data, and this plurality of competing explanations shifts attention to the ways that academics argue their claims. Academic corpora play a crucial role here by helping to show how research findings are rhetorically transformed into academic knowledge.

Because writers can only guide readers to a particular interpretation rather than demonstrate proof, readers always have the option of refuting their interpretations. At the heart of academic persuasion, then, are writers’ attempts to anticipate possible negative reactions to their claims. To do this they must display familiarity with the persuasive practices of their disciplines, encoding ideas, employing warrants, and framing arguments in ways that their potential audience will find most convincing. They also have to convey their credibility by establishing a professionally acceptable persona and an appropriate attitude, both to their readers and their arguments. In sum, persuasion in academic articles, as in other areas of professional life, involves the use of language to relate independent beliefs to shared experience. Writers galvanise support, express collegiality, resolve difficulties, and negotiate disagreement through patterns of rhetorical choices which connect their texts with their disciplinary cultures.

In this chapter I use a corpus of research articles to explore three key elements of persuasion in academic writing, looking at citation, interaction and self-mention. These are all important realisations of the research writer’s concern for audience and have been the subject of speculation and interest by rhetoricians and linguists for some years. In fact, the ways that writers establish their credibility (or create an ethos) and consider readers’ potential attitudes to the argument (pathos) date back to Aristotle. This chapter therefore seeks to ground these timeless concepts in the actual behaviour of real writers and the ways they engage their disciplinary peers in webs of interaction and persuasion.

A research article corpus

A corpus approach brings a distributional perspective to linguistic analysis by providing information about the relative frequency of items and the ways they are used, pointing to systematic tendencies in the selection of meanings. Corpora thus reduce the burden of evidence that is often placed on intuitions to show how particular grammatical and lexical choices are regularly made.


Ken Hyland

More than this, however, corpus analyses show that writing is characterised by impressive regularities of pattern with endless variation (e.g. Sinclair 1991; Stubbs 1996), and that lexical and grammatical features are not only likely to vary across registers (e.g. Biber et al. 1999), but also across disciplines as language using communities (Hyland 2000). The corpus on which the work in this chapter is based recognizes this need for diversity and was compiled to represent a broad cross-section of academic practice. It reflects the belief that all texts reveal their writers’ assumptions about their readers, shaped by prior texts, repeated experience, and by orientations to certain conventions. It assumes that persuasion is not simply accomplished with language, but with language that demonstrates legitimacy as writers draw on institutional practices which appeal to readers from within the boundaries of their discipline. The corpus has been used to study a range of features including citations (Hyland 2003), directives (Hyland 2002a), questions (Hyland 2002b), authorial pronouns (Hyland 2002c), and engagement features (Hyland 2001).

The corpus comprises 240 published papers, three from each of ten leading journals in eight disciplines. The fields were mechanical engineering (ME), electrical engineering (EE), marketing (Mk), philosophy (Phil), sociology (Soc), applied linguistics (AL), magnetic physics (Phy) and molecular biology (Bio). The journals were nominated by discipline informants as among the leading publications in their fields, and the articles were chosen at random from current issues. These texts were scanned into a machine-readable form, producing an electronic corpus of 1.3 million words (Table 1).

While frequency and collocational data provide descriptions of existing practice, telling us what writers do, to stop here runs the risk of reifying conventions rather than explaining them. The text data were therefore supplemented with interviews with experienced researcher/writers from the target fields to obtain participant perspectives on disciplinary practices. These typically began with detailed examinations of text extracts to explore what writers had tried to achieve with specific choices. These discourse-based interviews (Odell, Goswami, and Herrington 1983) seek to make explicit the tacit knowledge or strategies that writers and readers bring to acts of composing or reading. The interviews then moved to more general observations which focused on subjects’ impressions of disciplinary practices, but allowed them to raise any other relevant issues. These were conducted using a semi-structured format of open-ended prompts (Cohen, Manion, and Morrison 2000), allowing subjects

Table 1. Text corpora

Disciplines              Texts      Words
Molecular Biology           30    143,500
Mechanical Eng              30    114,700
Electronic Eng              30    107,700
Magnetic Physics            30     97,300
Total ‘Hard’ fields        120    463,200

Disciplines              Texts      Words
Sociology                   30    224,500
Philosophy                  30    209,000
Marketing                   30    214,900
Applied Linguistics         30    211,400
Total ‘Soft’ fields        120    859,800

to respond to texts as readers with insider community understandings, while also discussing their own discoursal preferences. Interviews were recorded and transcribed and then analysed recursively, looking for key ideas and patterns across respondents.

An approach to analysis

The corpus was searched for specific features related to particular aspects of writer-reader interaction using WordPilot 2000 (Milton 1999), a text analysis and concordance program. Analysis of discourse features is necessarily time-consuming and labor-intensive, involving several passes through the data and careful checking of each item in its larger sentential or textual context to ensure that each case represents an example of the target function. Concordances are important as they display all occurrences of a feature in its immediate co-text, usually with instant access to the wider text, which enables functions to be identified and ambiguities clarified. Checking concordance lines is therefore a recursive procedure to narrow down, expand and combine initial general categories. In each of the analyses reported here I made several sweeps of the data


to weed out irrelevant examples, ensure accurate counts and identify recurring pragmatic functions. It should be clear that there is nothing particularly esoteric in this. The chapters in this book show that corpora are approached in many ways, but I use a corpus to assist, rather than drive, research and there is nothing here that could not feasibly, if more tediously, be done with a pencil. This is a way of coding data that, while influenced by the researcher’s theoretical knowledge and experience, ensures that the categories are relevant to the research issues and, as far as possible, emerge from the data itself. The approach produces categories that are:

– Conceptually useful: in that they help to answer research questions
– Empirically valid: in emerging from the data itself
– Analytically practical: being easy to identify, specific, and non-overlapping
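The concordance checking described in this section can be illustrated with a small keyword-in-context (KWIC) sketch. It is not WordPilot 2000 and makes no claim about how that program works; the function name `kwic`, the `width` parameter and the sample text are assumptions introduced purely for illustration. It only shows the general idea of displaying each hit in its immediate co-text so that an analyst can then judge its function by hand.

```python
import re


def kwic(text, pattern, width=40):
    """Return (left context, keyword, right context) for every match of pattern."""
    hits = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Slice up to `width` characters of co-text on either side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(0), right))
    return hits


# Two sentences taken from example (8) in this chapter.
SAMPLE = ("We seem, in fact, to be faced with a dilemma. "
          "Suppose we say that static images can depict movement.")

if __name__ == "__main__":
    for left, keyword, right in kwic(SAMPLE, r"\bwe\b", width=25):
        print(f"{left:>25} [{keyword}] {right}")
```

A display of this kind only retrieves candidate items; deciding whether a given "we" is inclusive or exclusive still depends on reading each line in context, as the chapter stresses.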

I should point out that coding needs careful validation to avoid simply reflecting the researcher’s preconceptions. Swales (1981:13) recognized the danger of this twenty years ago when he cautioned that “the discourse analyst labels something as x and then begins to see x occurring all over the place.” To avoid this kind of bias, samples of data were always coded by another rater working independently. The goal here is to ensure that there is a high degree of interrater agreement, that is, that both analysts see the same thing.

Similar procedures were used with the interview material, recursively passing through the tapes and transcribing what seemed to be key aspects. In some cases the qualitative data analysis programme WinMax Pro (1998) was used for cross-referencing and drawing connections between interviews from different fields. These were then related to the text data to illuminate the patterns observed there. The objective was to bring insiders’ understandings of what it is they do when they read and write in their disciplines to the analysis. Participant accounts are suggestive of writers’ experiences of the activities they routinely engage in and add considerably to what we are able to say about texts, allowing us to see the factors which might contribute to disciplinary meanings.
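The chapter does not say which statistic was used to assess interrater agreement. As one common option, the sketch below computes Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. The labels ("target" for an instance of the function being coded, "other" for an irrelevant hit) and the two rating lists are invented for illustration and are not data from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's label proportions.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten concordance lines by two independent raters.
RATER_A = ["target", "target", "other", "other", "target",
           "other", "target", "target", "other", "other"]
RATER_B = ["target", "target", "other", "target", "target",
           "other", "target", "other", "other", "other"]

if __name__ == "__main__":
    print(round(cohen_kappa(RATER_A, RATER_B), 2))
```

In this invented sample the raters agree on eight of ten items, while chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.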


Academic persuasion and disciplinary practice

Corpus analysis shows that these disciplinary meanings are achieved through regularities in the rhetorical conventions of reporting which are, at the same time, influenced by the knowledge making practices of the disciplines. Discoursal conventions are persuasive because they are significant carriers of the epistemological and social beliefs of community members. So while individual disciplines, and sub-groupings within disciplines, have their own preferences concerning theoretical approaches, explanatory procedures, research techniques, rhetorical practices, and so on, analysis suggests that we can see knowledge as collectively constructed within the broad cognitive and procedural understandings of hard and soft knowledge domains.

The concept of hard and soft fields carries connotations of clear-cut divisions, risking reductionism by packing a multitude of complex abstractions into a few simple opposites. But this categorization is directly related to established disciplinary groupings (Becher 1989), and gains support from studies which suggest that it may actually represent participant actors’ own perceptions of their practices (Biglan 1973; Kolb 1981; Hyland 1999, 2000). While the hard-soft distinction is by no means clear cut, it does offer a useful way of examining general similarities and differences between fields.

The hard knowledge disciplines can be seen as predominantly analytical and structuralist, concerned with quantitative model building and the analysis of observable experience to establish empirical uniformities. Explanations thus derive from precise measurement and systematic scrutiny of relationships between a limited number of controlled variables. Knowledge is characterised by relatively steady cumulative growth, problems emerge from prior problems and there are fairly clear-cut criteria of what constitutes a new contribution and how it builds on what has come before (Becher 1989; Hyland 2000).

Soft knowledge disciplines, in contrast, often address the influence of human actions on events. Variables are therefore more varied and causal connections more tenuous. These fields tend to employ synthetic rather than analytic inquiry strategies and exhibit a more reiterative pattern of development with less scope for reproducibility (Becher 1989; Kolb 1981).

These polar distinctions obviously cannot capture the full complexity of disciplinary differences, but they do provide a useful basis for identifying dimensions of variability between fields. They are especially valuable if we picture them as the extreme ends of a continuum along which disciplines and


their sub-fields are arrayed with varying degrees of correspondence to either end. The important issue here is not whether some disciplines are entirely one or the other, but whether these distinctions have effects which are reflected in writers’ preferred patterns of persuasion. In what follows I will examine some of these patterns, drawing on the corpus discussed above to detail a number of the linguistic and rhetorical practices by which academics demonstrate their professional credibility and the value of their work to their disciplines.

Connecting to textual frameworks: Citation as persuasion

One of the most obvious strategies for situating research within disciplinary expectations is through appropriate citation practices (Hyland 1999; Thompson and Ye 1991). Citation is central to the social context of persuasion as it helps provide an intertextual framework for new work, allowing the writer to construct an effective justification for an argument and demonstrate the novelty of his or her position (Gilbert 1977). By acknowledging previous research, writers are able to display an allegiance to a particular community or orientation, create a rhetorical gap for their research, and establish a credible writer ethos (Swales 1990). In sum, citation is a major indication of a text’s dependence on a disciplinary context, helping writers to demonstrate familiarity with the field and establish a persuasive epistemological and social framework for their arguments.

Corpus analysis shows, moreover, that the frequency and use of citation differ according to different rhetorical contexts, influenced by the ways particular disciplines see the world and tackle research. Table 2 gives an idea of these variations in an 80-paper sub-corpus of the corpus discussed above, consisting of one paper from each of the journals sampled (Hyland 1999). The figures show that the articles in philosophy, sociology, marketing and applied linguistics together comprised two thirds of all the citations in the corpus, twice as many as the science disciplines, with engineering and physics well below the average.

An important feature of hard knowledge is that research occurs within an established theoretical framework which provides the imperative and explanatory schema for new findings (Kolb 1981; Kuhn 1970). Writers are able to presuppose a certain amount of background and to coordinate research using a highly standardized code in place of an extensive system of references (cf.

Table 2. Comparison of citations by discipline (Hyland 1999)

Rank  Discipline               Av. per paper  Per 1000 words  Total citations
1     Sociology                        104.0            12.5            1,040
2     Marketing                         94.9            10.1              949
3     Philosophy                        85.2            10.8              852
4     Biology                           82.7            15.5              827
5     Applied Linguistics               75.3            10.8              753
6     Electronic Engineering            42.8             8.4              428
7     Mechanical Engineering            27.5             7.3              275
8     Physics                           24.8             7.4              248
      Totals                            67.1            10.4            5,372

Bazerman 1988). Citation is therefore a means of integrating new claims into a scaffolding of already accredited facts. References are often sparse and tend to be tightly topic-bound which helps to closely define a specific context of knowledge and contributes to a sense of linear progression. In the soft disciplines, however, this kind of linearity and predictability is relatively rare as writers retrace others’ steps and draw on a literature which is more dispersed and open to greater interpretation. Readers cannot be assumed to possess the same knowledge and writers often have to pay greater attention to elaborating a context through citation. The more frequent citations in the soft texts therefore suggest greater care in firmly situating research within disciplinary frameworks, reconstructing the literature to demonstrate a plausible basis for their claims.

In addition to the greater frequency of citation in the soft fields, these writers also give more prominence to the cited author through use of integral structures and by placing authors in subject position:

(1) Weinstein (1993) suggests that the critical thinking movement may well be part of … (AL)
    Sherin (1990) argues that police agencies establish triage systems whereby… (Soc)
    Baumgartner and Bagozzi (1995) strongly recommend the use of … (Mkt)

Writers in the hard disciplines, on the other hand, tend to reduce the role of the author with non-integral and numerical-endnote formats:


(2) Furthermore, it has been shown [103] that the fundamental dynamic range of … (EE)
    As already observed by others [17], T1 was found to be … (Phy)
    Refs [12–19] work out the theory of spatial kinematic geometry in fine detail. (ME)

One reason for this is that persuasion at the hard end of the continuum tends to suppress the actions of human actors in constructing knowledge and to emphasize the authority of scientific procedure. Downplaying the perspective of human judgement in the interpretation of data gives the impression of nature revealing itself directly. So by removing the agent, writers remove any implication of human intervention, with all suggestion of personal interest, social allegiance, faulty reasoning and other distorting factors. It also suggests that the person who publishes a claim is immaterial to its accuracy, encouraging the idea that scientific persuasion is based on writers discovering truth, not making it.

At the softer end of the continuum, however, in the humanities and social science papers, persuasion requires high author visibility. Knowledge is constructed through a personal dialogue with peers, rather than by extending the thread of knowledge from previously established truths. The extensive use of citation and author mention thus helps to achieve a high degree of personal involvement among actors while positioning the writer in relation to views that he or she supports or opposes. This was made clear by several of my disciplinary informants during the interviews:

Citing allows you to debate with others, the questions have been around a long time, but you hope you are bringing something new to it. You are keeping the conversation going, adding something they haven’t considered. (Phil interview)

I’ve aligned myself with a particular camp and tend to cite people from there…. It’s a kind of code, showing where I am on the spectrum. Where I stand. (Soc interview)

Persuasion in the humanities and social science articles also utilizes far more, more varied, and more argumentative reporting verbs than in the hard sciences, reflecting persuasive practices which more readily regard explicit interpretation, speculation and complexity as legitimate aspects of knowledge. This is most apparent in the greater use of citation verbs involving verbal expression (see 3) and cognition (concerned with thought and perception – see 4). Both of these types facilitate qualitative arguments:


(3) Baddeley proposes a tripartite system of working memory,… (AL)
    As Hinde (1979) points out, many unhappy marriages remain intact because of… (Mkt)
    Jacoby accuses American intellectuals of a turn to conservatism,... (Soc)

(4) Acton (1984) sees preparing students psychologically as a… (AL)
    Aguirre and Baker (1990) conclude that racial discrimination has become… (Soc)
    Donnelan believes that for most purposes we should take….. (Phil)

In contrast, the physics and engineering papers together contained only nine cognition verbs, thereby masking the role of author interpretation in the research process. Instead, choices in the hard sciences emphasize acts of research, placing a persuasive accent on real-world activities to convey an experimental explanatory schema. Knowledge is more likely to be shown as proceeding from laboratory activities rather than the interpretive operations of researchers:

(5) The reasons for this are examined in detail by Yeo et al (1990), … (Bio)
    …a “layer” coupled-shot finline structure was studied by Mazur [7] and Tech et al [8] ... (Phy)
    ... using special process and design [42], or by adding [101], or removing [83] a mask. (EE)
    Finally, Eto et al. (1994) reviewed and analyzed the contents of several indicators,.. (ME)

Citation thus plays an explicitly persuasive role in academic writing, and corpus analysis helps show some of the ways that textual conventions are not simply stylistic proclivities, but represent distinctions in how knowledge is typically negotiated and confirmed in academic communities. The results here show far higher use of manifest intertextuality in the soft knowledge fields and suggest an evidential schema more dependent on the establishment of an explicit integration of new and existing material.
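The integral versus non-integral distinction discussed in this section can be approximated with surface patterns. The following is a rough sketch rather than the coding scheme used in the study: the regular expressions and the function name `classify_citation` are assumptions made for illustration. They treat an author name followed directly by a bracketed year as integral, and numeric or fully parenthetical references as non-integral. Unattributed integral forms such as "Baddeley proposes" in (3), which carry no year, are missed, which is one reason every hit would still need manual checking.

```python
import re

# Author name (optionally "X and Y" or "X et al") followed by "(year)".
INTEGRAL = re.compile(
    r"[A-Z][\w-]+(?:\s+(?:and\s+[A-Z][\w-]+|et al\.?))?\s+\((?:19|20)\d{2}[a-z]?\)"
)
# Numeric endnote style such as [103] or [12–19].
NUMERIC = re.compile(r"\[\d+(?:\s*[–,-]\s*\d+)*\]")
# Whole reference inside parentheses, e.g. (Becher 1989; Hyland 2000).
PARENTHETICAL = re.compile(r"\([A-Z][^()]*?(?:19|20)\d{2}[a-z]?[^()]*\)")


def classify_citation(sentence):
    """Label a sentence 'integral', 'non-integral' or 'none' by surface form."""
    if INTEGRAL.search(sentence):
        return "integral"
    if NUMERIC.search(sentence) or PARENTHETICAL.search(sentence):
        return "non-integral"
    return "none"


if __name__ == "__main__":
    examples = [
        "Weinstein (1993) suggests that the critical thinking movement may well be part of",
        "Furthermore, it has been shown [103] that the fundamental dynamic range of",
    ]
    for sentence in examples:
        print(classify_citation(sentence), "|", sentence)
```

The two printed examples are taken from (1) and (2) above; the first is labelled integral and the second non-integral.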

Interaction and engagement: Reader-oriented features

Another significant dimension of persuasion in research papers is the writer’s projection of the perceptions, interests, and needs of a potential audience into a discourse. Any text anticipates a reader’s response and itself responds to a


larger discourse already in progress, so argument incorporates the active role of an addressee and is understood against a background of other views on the same theme in prior texts (Bakhtin 1986). This is most obviously achieved when writers use explicit text features to address readers directly.

A list of 85 items providing potential surface feature evidence of reader engagement based on previous literature was compiled, and their patterns and frequencies explored in the 240-article corpus. These showed the use of inclusive, second person, and indefinite pronouns and asides to address readers directly as participants in an argument, effecting interpersonal solidarity and membership of a disciplinary in-group. The main purpose of these reader appeals seems to be primarily interpersonal and acknowledges the need to meet readers’ expectations of inclusion. Another group of features, mainly questions, directives and references to shared knowledge, are used to pull the audience into the discourse at key points and guide them to particular interpretations. This second purpose is more concerned with rhetorically positioning the audience, recognizing the reader’s role as a critic and potential negater of claims by predicting and responding to possible objections and alternative interpretations. While these two broad functions are not always distinct, they help to show more clearly the uses of rhetorical persuasion and to compare the patterns of engagement across disciplines.

Table 3 summarizes the distribution of devices initiating such interactions across disciplines, with reader pronouns and directives amounting to over 80% of all features. The results show some interesting cross-discipline similarities, but most obvious are the disciplinary variations, where, for example, philosophers employed ten times more devices than biologists. In general, more reader-oriented markers were found in the discursive soft fields, particularly reader pronouns, questions and asides. This symmetry was upset by the physicists who joined philosophers, sociologists and applied linguists in a relatively high use of inclusive we pronouns and explicit references to shared assumptions. Directives tended to comprise the highest proportion of features in the hard sciences. I will briefly discuss the most frequent engagement features, pronouns and directives, here.

Readers are most explicitly addressed as discourse participants by the use of personal pronouns, most commonly inclusive we. The clearest acknowledgement of the reader’s presence, second person you and your, occur only rarely, suggesting that writers generally seek to reduce distance from their audience,

Table 3. Frequency of reader features per discipline (per 10,000 words) (Based on Hyland 2001)

Discipline   Reader pronouns  Directives  Questions  Shared knowledge  Asides  Totals
Philosophy             110.1        26.1       14.4               9.9     2.2   162.7
Sociology               22.5        15.8        6.7               4.2     1.8    51.0
App Ling                19.1        19.5        4.9               5.5     1.4    50.3
Marketing               11.3        12.6        3.3               3.8     1.4    32.4
Physics                 20.9        21.1        1.0               5.2     0.3    48.5
Elect Eng                9.5        29.0        0.0               3.9     0.0    42.3
Mech Eng                 4.5        19.9        0.9               3.0     0.1    28.4
Biology                  1.1        13.0        1.0               1.3     0.0    16.4
Overall                 28.9        19.0        5.0               4.9     1.1    58.9

minimizing any implication that the writer and reader are not closely linked as members of the same disciplinary community. Where we do find second person and indefinite pronouns, then, writers use them to construct both the writer and the reader as participants with similar understanding and goals. It also sets up a dialogue between equals in which the potential point of view of the reader is woven into the fabric of the argument, articulating the thoughts and counter-claims of fellow professionals. The persuasive nature of this strategy often extends into explicitly spelling out the conclusions the writer wants the reader to draw:

(6) The reader will note the use of the passive voice when referring to …. (AL)
    To this end, we remind the reader that in the case of the nonrelativistic hydrogenic atom a similar situation occurs. (Phy)
    Furthermore, one has to consider that splice variants may alter the transactivation…. (Bio)

Laying stress on their membership, their joint affiliation to a community-situated pursuit of knowledge is an important way that writers give persuasive weight to their texts, as my informants pointed out:

Part of what you are doing in writing a paper is getting your readers onside, not just getting down a list of facts, but showing that you have similar interests and concerns. That you are looking at issues in much the same way they would, not spelling everything out, but following the same procedures and asking the questions they might have. (Bio interview)

I picture an ideal reader. Someone who is curious about the same kinds of issues, motivated by the same problems. I try and make that clear in the way I write, as if I am talking to a colleague, to someone I know. (Soc interview)

In particular, inclusive we is heavily used to bind writer and reader together as members of a disciplinary in-group:

(7) Classical electromagnetic theory [9] tells us that a couple of potentials, A, V may be … (EE)
    …on what basis do we (who call ourselves applied linguists) decide to include … (AL)
    We know, however, it is only in the last few years that Weber and Simmel have … (Soc)

But while the inclusive pronoun presupposes a certain communality, it can also be employed to guide readers towards a preferred interpretation, shading into explicit positioning of the reader. While stressing the involvement of writer and reader in a shared journey of exploration, it is always clear who is leading the expedition:

(8) Now that we have a plausible theory of depiction, we should be able to answer the question of what static images depict. But this turns out to be not at all a straightforward matter. We seem, in fact, to be faced with a dilemma. Suppose we say that static images can depict movement. This brings us into conflict with Currie’s account,…… (Phil)

While pronouns work persuasively to establish solidarity and engage readers in the discourse, other strategies, most often directives, draw readers into the text in order to position them. Directives instruct the reader to perform an action or to see things in a way determined by the writer (Hyland 2002a) and are typically realised in three main ways: by the presence of an imperative (see 9); by a modal of obligation addressed to the reader (see 10); and by a predicative adjective expressing the writer’s judgement of necessity/importance controlling a complement to- clause (see 11):

(9) Consider now the simple conventional reflection effect in a magnetic interface (Phy)
    Note that the regular-verb experiments constitute the only relevant test…. (AL)

Figure 1. Categories of directives

(10) What we now need to examine is whether there is more to constancy than this. (Phil)
     …..we must identify the principal screws Sx and Sp. (ME)

(11) As marketers, however, it is important to understand how the information …. (Mkt)
     Hence it is necessary to understand the capacitive coupling of the devices to the metal gates. (Phy)

There is a clear reader-oriented focus to these statements, recognizing the dialogic dimension of research writing and directing readers to some action or understanding. But while often seen as an imposition on addressees, it is clear that this is not always the case. An analysis of co-texts in the corpus reveals that directives can be classified according to three main types of activity they direct readers to engage in (Figure 1). Textual acts refer readers to another part of the text or to another text; Physical acts instruct readers to engage in either a research process or real world action; Cognitive acts steer readers to certain lines of thought, either by leading them through a line of reasoning, elaborating an argument, or emphasizing a point (Hyland 2002a).

It is difficult to pick out clear disciplinary patterns from these functional distributions in the corpus. The summary in Table 4 shows a noticeable division between hard and soft fields in the proportion of directives collocated with physical acts, and in fact over 80% of all cases occurred in the science texts. The soft disciplines, with the exception of philosophy, contained more textual directives, which metadiscoursally guide readers through a discussion. Only biology in the hard sciences contained high frequencies of these uses, and these mainly steered readers to tables or examples within the same text. It is worth noting here that directives comprised 61% of all the reader-oriented devices in the hard fields, compared with only 25% in the soft fields.

Table 4. Summary of directive functions by discipline (%)

Discipline        Textual  Physical  Cognitive
Biology              55.7      22.2       22.1
Physics              24.4      29.8       45.8
Electronic Eng       11.6      40.0       48.4
Mechanical Eng       13.1      33.6       53.3
Av Hard Fields       26.2      31.4       42.5

Discipline        Textual  Physical  Cognitive
Marketing            52.2       8.2       39.6
Philosophy           16.3       2.8       80.9
App Ling             55.3      10.0       34.7
Sociology            68.1       2.5       29.4
Av Soft Fields       48.0       5.9       46.1

Directives are therefore a major rhetorical feature in the sciences, partly because they offer an economy of expression highly valued by information-saturated scientists, but also because they allow writers to engage and lead an audience through an argument to a particular conclusion without expressing a clear authorial identity:

(12) Note the transverse stress acts to fracture the monolith along the flow direction. (ME)
     The analysis given in our paper should be considered in the context of a …. (Phy)
     It is necessary to take into account the dT’/dUp derivative when calculating the … (EE)


Outside philosophy, cognitive directives are less frequently employed in the soft fields, perhaps because requiring readers to act or see things in a certain way more clearly violates the fiction of equality in published research writing. A greater number of directives thus tend to be citational in the soft fields, a less threatening role than those which explicitly tell readers how to interpret an argument.

Taken together, these features are important ways of situating academic arguments in the social interactions of members of disciplinary communities. Through their use of directives, personal pronouns, interjections, questions, and so on, we can recover something of how writers construct their readers by drawing them into both a dialogue and a relationship. Once again, these features represent relatively conventional ways of making meanings, and the considerable disciplinary variations help elucidate different contexts for interpretation, showing how writers and readers make connections, through texts, to their disciplinary cultures.
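The three surface realisations of directives described in this section (imperative, modal of obligation addressed to the reader, and predicative adjective controlling a to- clause) can be sketched as a first-pass search. The verb, modal and adjective lists below are short hypothetical samples, not the search items used in the study, and the function name `directive_type` is likewise an assumption. A pattern of this kind over-generates (for example, "we should be able to" in (8) would be flagged), so each hit would still need checking in context.

```python
import re

# Hypothetical sample lists; a real study would use a fuller inventory.
IMPERATIVE_VERBS = ("consider", "note", "see", "suppose", "recall", "let")
IMPERATIVE = re.compile(r"(?:%s)\b" % "|".join(IMPERATIVE_VERBS), re.IGNORECASE)
MODAL = re.compile(
    r"\b(?:we|you|one)\s+(?:now\s+)?(?:must|should|ought to|need to|has to|have to)\b",
    re.IGNORECASE,
)
ADJECTIVE = re.compile(
    r"\bit is (?:important|necessary|essential|crucial) to\b", re.IGNORECASE
)


def directive_type(sentence):
    """Return the surface realisation of a candidate directive, or None."""
    text = sentence.strip()
    if IMPERATIVE.match(text):  # imperative verb in sentence-initial position
        return "imperative"
    if MODAL.search(text):
        return "modal of obligation"
    if ADJECTIVE.search(text):
        return "predicative adjective"
    return None


if __name__ == "__main__":
    for sentence in (
        "Note that the regular-verb experiments constitute the only relevant test",
        "What we now need to examine is whether there is more to constancy than this.",
        "Hence it is necessary to understand the capacitive coupling of the devices to the metal gates.",
    ):
        print(directive_type(sentence), "|", sentence)
```

The three printed sentences come from examples (9), (10) and (11); sorting the hits into textual, physical and cognitive acts is an interpretive step that this sketch does not attempt.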

Self-mention and academic promotion

A final feature of research article persuasion I want to touch on briefly here is the extent to which writers explicitly intrude into their discourse to assert their personal involvement and professional credibility. In addition to supporting their arguments with reference to prior work and engaging readers in appropriate ways, writers must also control the level of personal projection in their texts. This not only contributes to how writers display their disciplinary competence, but also helps ensure that readers recognize their individual contribution and their assertion of academic priority.

Perhaps the clearest indication of the writer’s self-presentation is the use of self-citation and first person pronouns. Once again, corpus analysis reveals disciplinary uses which suggest that choices are at least partially influenced by the social practices of academic disciplines. Table 5 shows the results of a study of all exclusive first person pronouns (I, me, my, we, us, and our) and cases of self-citation in the 1.3 million word corpus (Hyland 2002d). Clearly, academic writing is not the faceless prose it is often depicted to be and, along with abstraction and high information content, human agents are integral to their meaning. Overall, there were roughly 28 expressions of self-mention in each paper; 81% of these were pronouns (predominantly we and I), 16% were self-citations, and 2% were other mentions of the authors of the paper. Once again, we see that what constitutes admissible argument differs between communities. Self-mention is particularly dense per 10,000 words in physics, marketing, and biology, and while mechanical engineers may refer to themselves far less often, they rely heavily on self-citation in linking their work into the disciplinary fabric.

Table 5. Frequency of self-mention (per 10,000 words)

Discipline        Totals  Citations  Mentions
Biology             56.2       22.6      33.6
Physics             49.2        8.7      40.5
Electronic Eng      49.0       11.9      37.1
Mechanical Eng      26.5       11.3      15.2
Av Hard Fields      45.7       14.4      31.3

Discipline        Totals  Citations  Mentions
Marketing           61.3        6.9      54.4
Philosophy          52.7        3.1      49.6
App Ling            51.8        4.5      47.3
Sociology           47.1        6.8      40.3
Av Soft Fields      53.2        5.4      47.8

When we ignore text length and look at raw scores, we find that some 69% of all cases of self-mention occurred in the humanities and social science papers, with an average of 38 per article, compared with only 17 in science and engineering (Table 6). This was largely due to the much greater use of first person pronouns in the soft disciplines. Self-citation is a prominent feature of the science and engineering papers where it made up almost 11% of all references, compared with only 5% in the soft fields, and constituted 60% of all expressions of self-mention. Self-citation is obviously an important means of demonstrating one’s disciplinary credibility, and is perhaps the strongest demonstration a writer can make to establish his or her claim to be seen as an important player in a field and to have work taken seriously. The frequency variations of self-citation however also indicate conventions which reflect underlying differences in disciplinary research practices. The fact that issues in the humanities and social sciences tend to be comparatively diverse and detached from immediately prior developments means

that there are perhaps fewer opportunities for self-citation. As I noted above, references in sciences and engineering tend to be tightly bound to a particular research topic. This is mainly because scientists tend to participate in highly discrete and specialised areas of research, partly as a result of the heavy investments in specialised know-how and technical equipment that hard knowledge production often requires, and partly because of the rapid expansion of knowledge. These factors coerce scientists into a niche of expertise from where they can make precise contributions, allowing writers to draw on their own work to a considerable extent:

(13) A paper in biology is not just a one off bit of isolated research. Projects tend to be expensive and may take a long time to set up and produce anything important. What we write up probably reports a piece of research that may be going on for years. We are continuously building on what we’ve done. (Bio interview)
     We aren’t just blowing our own trumpets here. There just aren’t that many people doing work in this particular field. (Phy interview)

Table 6. Frequency of self-mention per text (%) by field type

Domain          Totals         Citations     Self-Mentions
Hard fields     17.6 (31.6)    5.6 (59.1)    12.1 (26.1)
Soft fields     38.1 (68.4)    3.9 (40.9)    34.2 (73.9)
Overall         27.9 (100)     4.7 (100)     23.2 (100)
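The contrast drawn above between frequencies normalised per 10,000 words (Table 5) and raw counts per text (Table 6) comes down to a simple calculation. The following Python sketch is illustrative only: the function name `per_10k` and the article length of 6,000 words are assumptions made for the example, not figures from the study.

```python
def per_10k(count: int, text_length_words: int) -> float:
    """Normalise a raw count to occurrences per 10,000 words."""
    if text_length_words <= 0:
        raise ValueError("text length must be positive")
    return count / text_length_words * 10_000


# Hypothetical article: 28 self-mentions (the per-paper average reported above)
# in an assumed 6,000-word text.
rate = per_10k(28, 6_000)
print(f"{rate:.1f} self-mentions per 10,000 words")
```

Normalising in this way lets texts of different lengths be compared on the same scale, which is why the per-10,000-word figures and the per-text figures need not rank fields identically.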

There are also substantial differences in how first person pronouns are employed across disciplines, both in overall frequency and in preferred patterns of use. Table 6 shows that almost 70% of all cases occurred in the humanities and social science papers, once again reflecting the different ways academics in different fields conduct research and persuade readers to accept results. Generally speaking, hard knowledge tends to be universalistic and conceptual, so research usually consists of conducting experiments to propose solutions to specific disciplinary problems. Here writers can rely on familiar procedures and relatively clear criteria, allowing them to downplay their personal role in the research and highlight the phenomena under study, the replicability
of research activities, and the generality of the findings. An impersonal style subordinates their own voice to that of nature and suggests research outcomes would be the same irrespective of the individual conducting it. In contrast, the different objects and methods of study in soft fields mean that self-mention is a valuable strategy for conveying an appropriate degree of confidence, reliability, and authority. Arguments are more explicitly interpretive and the success of authors in gaining acceptance for their claims depends to a larger extent on their ability to invoke an intelligent, credible and engaging persona (Hyland 2000). The use of self-mention is here related to the desire to present oneself as an informed and reliable colleague, strongly identifying oneself with a particular view to gain credit for one’s individual perspective or research decisions.

In addition to the frequency of self-reference, the points at which writers choose to make themselves visible in their texts also have considerable rhetorical importance, indicating what they are prepared to make commitments to and what they seek to claim credit for (Hyland 2003). The analysis revealed four main purposes, listed here with some examples from the corpus:

Stating a goal or outlining the structure of the paper
(14) In this article we re-examine the two-dimensional particle in a box and derive the…. (Phy)
     In section 1, I shall explain how PDP works. In sections II-IV, I shall consider three ….(Phil)

Explaining a procedure
(15) We analysed the effect of the thermal couplings on the properties of an operational amplifier (EE)
     We transferred the proteins onto nitrocellulose membranes and incubated with … (Bio)

Stating results or making a claim
(16) We have demonstrated that MCP can be used to form single- and multiple-helical … (ME)
     We found that more subjects mentioned beneficial and imagery attributes ….(Mkt)

Elaborating an argument
(17) But my point here is that these laws are not enough for a complete vindication of Relevance. (Phil)
     It is in this spirit that I offer my own contribution to the debate. I want to set out a slightly…. (Soc)

Table 7. Functions of self-mention (%)

Function                     Total          Bio   Phy   EE    ME    Phil   Soc   AL    Mkg
                             Raw     %
Explaining a procedure       400     38     57    46    50    49    5      26    39    44
Stating results or claim     273     26     19    19    15    18    30     28    25    26
Elaborating an argument      220     21     15    17    20    14    41     20    25    19
Stating a goal/structure     158     15     9     18    14    18    24     26    11    11
Totals                       1051    100    100   100   100   100   100    100   100   100
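The percentage column of Table 7 can be recovered from the raw totals it reports (400, 273, 220 and 158 out of 1,051). A minimal Python sketch, in which the variable and function names are chosen here purely for illustration:

```python
# Raw totals for each function of self-mention, as reported in Table 7.
raw_counts = {
    "Explaining a procedure": 400,
    "Stating results or claim": 273,
    "Elaborating an argument": 220,
    "Stating a goal/structure": 158,
}


def to_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Convert raw counts to whole-number percentages of the overall total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


percentages = to_percentages(raw_counts)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The same conversion, applied to the counts for each discipline separately, would yield the discipline columns of the table.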

Table 7 shows that over half of all uses of the first person in the hard papers were related to setting out the methods used, while in the soft fields it was also often employed to present results and arguments. There was an overall tendency for the first person to collocate with verbs conveying reasoning and possibility in the humanities and social sciences, where explicit self-mention seeks to establish a confident personal authority in elaborating arguments. In the hard sciences, self-mention was heavily associated with describing research activities. Author prominence here reminds readers that personal judgments have been made as a way of asserting the writer’s professional credentials:

(18) Rather than attempt to prove the frequency matching concept mathematically, we elected to model the dynamic process occurring in a pulse combustor, and then….. (ME)
     To assist in the interpretation of serial sections, we used a Kontron image analyzer and … (Bio)
     We made the electrical connection to the microcoils with fine gold wire and silver epoxy. (EE)

More than in any other function, however, the use of self-mention to personally stake a claim suggests a conscious strategy to manage the reader’s awareness of the writer’s role. This is where writers can choose either to construct an explicitly accountable stance or to conceal the interpretative practices behind their accounts. By strongly linking themselves to their claims, writers are able to solicit recognition for both.

(19) I suggest that this arises largely because of the extreme powerlessness of … (Soc)
     In short, we demonstrate that what consumers know about a company can influence their ... (Mkt)
     For the study of ageing in society, I would argue that they can not give us that access. (AL)

To summarise, despite the strong feelings it often generates among teachers and textbook authors, self-mention is important because it plays a crucial role in mediating the relationship between writers’ arguments and their discourse communities. It allows academics to emphasise their personal credibility and their contribution to the discipline by linking themselves closely to their work to create an identity as both disciplinary servant and persuasive originator. The fact that self-citations are higher in the sciences and first person mentions in the humanities and social sciences once again reflects the very different contexts in which knowledge is constructed.

Conclusions and teaching implications

The issue of how persuasion is accomplished in research writing has been the subject of long philosophical debate (e.g. Pera and Shea 1991). Part of this debate has involved the extent to which epistemic and rhetorical factors can be distinguished: whether it is possible to separate truth-construction from the consensus achieved by techniques of persuasion. In this chapter I have argued that induction and falsification are poor resources for gaining access to natural and human realities, and that knowledge has to be seen as a rhetorical construct. I hope to have shown how corpus analysis has contributed to our understanding of the ways persuasion is socially created in research articles, a view which also, of course, has important implications for teachers of English in academic and professional contexts.

First, this view emphasises that persuasion depends on overcoming numerous rhetorical problems. Writers must not only identify a valued disciplinary issue and report their study of it, they must also demonstrate its significance
and locate it within a disciplinary context through citation, adopt an appropriate authorial stance towards it, and engage effectively with readers. Essentially, successful academic writing depends on the individual writer’s projection of a shared professional context and the construction of effective social interactions. This places language, or rather language that carries credibility, at the heart of learning to become a member of a disciplinary community. Simply, students and teachers cannot regard writing as an activity tacked on to the real business of research, a mere writing up of something which happens elsewhere in the lab or the library. Corpus analysis shows us that the linguistic features we teach are no more regularities of academic style than they are a representation of reality. It encourages us to assist learners to embed their writing in a particular social world which is reflected and conjured up through recognized discourses.

Second, corpus analysis confirms that the discourses of the academy do not form an undifferentiated, unitary mass but comprise a variety of subject-specific literacies. It reveals that the ways writers present their arguments, control their rhetorical personality, and engage their readers reflect the different social and epistemological preferences of their disciplines (Hyland 2000). By showing learners that literacy is relative to the beliefs and practices of social groups, teachers are able to provide them with a way of understanding the discoursal diversity they encounter at university. This view helps teachers to reveal the variability which labels such as academic English disguise and which, by divorcing language from context, mislead learners into believing that academic literacy is an autonomous and non-contestable way of participating in academic communities. In short, analyses of this kind underline the fact that academic writing does not involve mastering a set of transferable rules, but manipulating rhetorical options in ways that readers will find persuasive. Specific instruction in these practices is essential for students to develop the skills they need to participate in particular academic contexts.

Third, and more specifically, the analyses discussed here have stressed the interactive and interpersonal dimensions of academic writing. Effective academic writing depends on appropriate language choices, but we often tend to focus our teaching on those options that affect meaning rather than those that give an impression of the writer or help negotiate claims with readers. I have sought to stress the importance of acknowledging the active role of readers and engaging them in community-specific ways, both to build a credible argument
and to construct a disciplinary context. Increasingly we are learning that such interpersonal aspects of writing are not simply an optional extra to be brushed up when students have gained control of summarizing, synthesising, handling referencing conventions, and so on; they are central to academic argument and to university success.

Fourth, and finally, the analyses suggest the value of instruction which promotes rhetorical consciousness raising, both of students’ own writing in order to critically evaluate their practices, as well as the expectations of their disciplines and the features to be found in expert texts. Subject teachers can be helpful here in providing students with interview data on their practices and impressions of disciplinary conventions (Johns 1997). Centrally however, consciousness-raising must involve a focus on texts, and this can be achieved by students conducting mini-analyses of features in the genres they have to write, most simply with a highlighter pen, or by using classroom concordancers such as WordPilot 2000 or MonoConc. This allows students to discuss how target features are used, why they are used, and how these uses differ from field to field, from the guidelines offered in textbooks, and from expert writing. The purpose of these tasks is not to turn students into linguists, but to stimulate their curiosity and direct their attention to features of writing in their disciplines, enabling them to recognize both the choices available to them and their impact.

The use of specialized corpora has provided important insights into the ways that discursive practices are used to accomplish persuasion in research articles. We are beginning to realise that the features writers select are always relative to a particular audience and social purpose, and their success in achieving these purposes ultimately depends on analyzing readers and engaging with them in appropriate ways. Through the study of features such as citations, self-mention, directives, personal pronouns, and so on, we can recover something of how writers construct their readers by drawing them into a dialogue, learn how writers make connections through texts to their disciplinary cultures, and assist students to communicate effectively.
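The kind of mini-analysis suggested above, in which students look at a target feature in its immediate context, can be approximated with a few lines of Python. This is a minimal keyword-in-context sketch and not a description of how WordPilot 2000 or MonoConc work; the function name `kwic`, the context width, and the crude letters-only tokenisation are all assumptions made for illustration. The sample text is taken from examples (14) and (15).

```python
import re

# Exclusive first person pronouns counted as self-mention in the chapter.
SELF_MENTION = {"i", "me", "my", "we", "us", "our"}


def kwic(text, keywords=SELF_MENTION, width=3):
    """Return (left context, keyword, right context) for each keyword hit.

    Tokenisation is deliberately crude: runs of letters and apostrophes only,
    so numerals and punctuation are dropped.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() in keywords:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = (
    "We analysed the effect of the thermal couplings. "
    "In section 1, I shall explain how PDP works."
)
for left, keyword, right in kwic(sample):
    print(f"{left:>25} | {keyword:^3} | {right}")
```

Because matching is case-insensitive, a sketch like this would also count "US" used as an abbreviation; a classroom exercise could ask students to spot and discard such false hits by hand.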

References

Bakhtin, M. 1986. Speech Genres and Other Late Essays. Austin: University of Texas Press.
Bazerman, C. 1988. Shaping Written Knowledge. Madison: University of Wisconsin Press.
Becher, T. 1989. Academic Tribes and Territories: Intellectual Inquiry and the Cultures of Disciplines. Milton Keynes: SRHE/OUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson.
Biglan, A. 1973. “The characteristics of subject matter in different scientific areas.” Journal of Applied Psychology 57 (3):204–213.
Chalmers, A. F. 1982. What is This Thing Called Science? 2nd ed. Milton Keynes: OUP.
Cohen, M., Manion, L. and Morrison, K. 2000. Research Methods in Education. 5th ed. London: Routledge.
Gilbert, G. 1977. “Referencing as persuasion.” Social Studies of Science 7:113–122.
Hawking, S. 1993. Black Holes and Baby Universes and Other Essays. New York: Bantam.
Hyland, K. 1999. “Academic attribution: Citation and the construction of disciplinary knowledge.” Applied Linguistics 20 (3):341–367.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. London: Longman.
Hyland, K. 2001. “Bringing in the reader: Addressee features in academic articles.” Written Communication 18 (4):549–574.
Hyland, K. 2002a. “Directives: Argument and engagement in academic writing.” Applied Linguistics 23 (2):215–239.
Hyland, K. 2002b. “What do they mean? Questions in academic writing.” Text 22 (4):529–557.
Hyland, K. 2002c. “Authority and invisibility: Authorial identity in academic writing.” Journal of Pragmatics 34 (8):1091–1112.
Hyland, K. 2002d. “Humble servants of the discipline? Self-mention in research articles.” English for Specific Purposes 20 (3):207–226.
Hyland, K. 2003. “Self-citation and self-reference: Credibility and promotion in academic publication.” Journal of American Society for Information Science and Technology 54 (3):251–259.
Johns, A. M. 1997. Text, Role and Context: Developing Academic Literacies. Cambridge: Cambridge University Press.
Kolb, D. A. 1981. “Learning styles and disciplinary differences.” In The Modern American College, A. Chickering (ed.), 232–255. San Francisco: Jossey Bass.
Kuhn, T. 1970. The Structure of Scientific Revolutions. 2nd ed. Chicago: University of Chicago Press.
Lemke, J. 1995. Textual Politics: Discourse and Social Dynamics. London: Taylor and Francis.
Milton, J. 1999. WordPilot 2000. Hong Kong: Compulang.
Odell, L., Goswami, D. and Herrington, A. 1983. “The discourse-based interview: a procedure for exploring the tacit knowledge of writers in non-academic settings.” In Research on Writing: Principles and Methods, P. Mosenthal, L. Tamor and S.A. Walmsley (eds), 221–236. New York: Longman.
Pera, M. and Shea, W. (eds). 1991. Persuading Science: The Art of Scientific Rhetoric. Canton, MA: Science History Publications.
Sinclair, J. 1991. Corpus Concordance and Collocation. Oxford: Oxford University Press.
Stubbs, M. 1996. Text and Corpus Analysis. Oxford: Blackwell.
Swales, J. 1981. Aspects of Article Introductions (Aston ESP Research Report 1). Birmingham: University of Aston.
Swales, J. 1990. Genre Analysis: English in Academic and Research Settings. Cambridge: Cambridge University Press.
Thompson, G. and Ye, Y. 1991. “Evaluation of the reporting verbs used in academic papers.” Applied Linguistics 12:365–82.
WinMax Pro 1998. Text analysis software for the social sciences. Berlin: Kuckartz.

Section III

// ↘ so what have YOU been WORking on REcently //: Compiling a specialized corpus of spoken business English

Martin Warren
The Hong Kong Polytechnic University

Introduction

In Hong Kong, a corpus of spoken discourse, the Hong Kong Corpus of Spoken English (HKCSE), consisting of four sub-corpora is being compiled (see Table 1). The HKCSE, when completed, will consist of 200 hours of spoken discourses, including approximately 2 million words transcribed both orthographically and prosodically. The participants are made up of Hong Kong Chinese (first language Cantonese) and either native speakers of English or speakers of languages other than Cantonese. Place of birth, age, gender, occupation, educational background, time spent living or studying overseas (for the Hong Kong Chinese) and mother tongue have been noted for all participants.

This chapter describes issues relating to the design of the sub-corpus of spoken business discourses (termed the Business Corpus or the Corpus from here on), the data collection and the transcription processes – in particular the prosodic transcription – of the data. The paper also describes, by means of an example, how the “corpus-driven” analysis (Tognini Bonelli 2002:75) of the data was influenced by collaboration between the corpus stakeholders and how the findings have proved to be of value, though admittedly not always of equal value, to all of the parties involved in the Business Corpus project.

Table 1. The sub-corpora of the Hong Kong Corpus of Spoken English (HKCSE)

Sub-corpus (size)                    Contents
Conversations (0.5m words)           Approximately 50 hours of naturally occurring conversations recorded in homes, restaurants, cafés, cars etc.
Business Discourses (0.5m words)     Approximately 50 hours of meetings, service encounters, workplace presentations, job interviews etc.
Academic Discourses (0.5m words)     Approximately 50 hours of lectures, seminars, supervisions, student presentations, telephone interviews etc.
Public Discourses (0.5m words)       Approximately 50 hours of public speeches, forum discussions, radio and television broadcasts etc.

Corpora of spoken discourse

Until recently, corpora consisting of spoken discourse have been largely neglected. Compilers of corpora have tended to neglect the most common form of language use in favor of the form which is most easily gathered. The vast majority of corpora are collections of written texts and even those which purport to be a mixture of written and spoken data are predominately written. For example, the three largest corpora of the English language, the 450-million-word Bank of English (Sinclair 1987), the 100-million-word British National Corpus (BNC) (Aston and Burnard 1998), and the 100-million-word American National Corpus (ANC) (Fillmore et al. 1998), as well as the 5-million-word Longman Spoken and Written English (LSWE) Corpus (Biber, Conrad and Reppen 1998), only devote 10%–15% of their corpora to spoken English.

Exceptions to these are the few specialised spoken corpora that are being compiled around the world. For example, the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) is a collection of 5 million words of spoken English recorded between 1995 and 2000 (see, for example, Carter and McCarthy 1997). The Michigan Corpus of Academic Spoken English (MICASE) (Simpson et al. 2002) housed in the English Language Institute (ELI) at the University of Michigan consists of approximately 1.7 million words of academic speech from across the university. In China, a 500-hour spoken Chinese corpus of situated business discourse in the Beijing area (SCCSD BJ–500) is being compiled under the auspices of the Chinese Academy of Social Sciences (Gu 2002). In Great Britain, a corpus of native-speaker to native-speaker spoken business English is currently being collected in a variety of semi-formal and informal corporate contexts by a Nottingham University team (see McCarthy and Handford this volume).

The business corpus of the HKCSE

The Business Corpus of the HKCSE was initiated in 1998 by a project team based in the English Department at the Hong Kong Polytechnic University. The Business Corpus is unique in being the largest corpus of naturally-occurring spoken business English collected in Hong Kong. It consists of a variety of business discourses, including workplace presentations, meetings, job interviews, placement interviews, service encounters and informal office talk. Data were collected from a range of professional and business contexts, including hotels, the airport, private companies, government departments, academic institutions, and so on. The speakers captured in the Corpus, reflecting the reality of spoken business English in Hong Kong, are a mixture of native speakers of English, non-native speakers of English whose mother tongue is Cantonese, and other non-native speakers of English.

Researching professional discourse

From the outset, issues of corpus design were fundamental. The research team has recognized that the needs and expectations they have with regard to the Business Corpus may at times be very different from, though not necessarily incompatible with, those of the practitioners who give permission for the spoken data to be collected. These practitioners comprise both the managers and administrators in the business organisations and, on occasion, academics teaching applications-oriented undergraduate programs aimed at preparing future practitioners in business, or teaching professionals and professional educators who provide training in business English communication skills to various levels of staff in the professional workplace. In his discussion of discourse practitioners as a community of interprofessional practice, Sarangi (2002:99) points out how important it is for discourse researchers to study not only “how language mediates professional activities” but also “what constitutes professional knowledge and practice beyond performance.” In other words, Sarangi (2002:99) emphasizes the importance of an understanding of “professional practice and knowledge representations from their insiders’ perspective”. He raises three issues for researchers to consider when collecting and analyzing professional discourse, namely accessing, problematizing and interpreting professional discourse (Sarangi 2002:100–103). Accessibility refers to the on-going problem for corpus linguists to gain
access to business and professional data. Salience/problem identification is the mutual identification of salient issues and problems. Coding/interpretability/articulation is necessary so that the researcher, through collaborating with practitioners, gains insider knowledge in order to better interpret the data. These issues present a useful conceptual framework for those compiling specialized business corpora. It has been of prime importance to the project team that, in designing the corpus of business data and in collecting and analyzing such professional discourse, the differing needs of the stakeholders (those in the context where the spoken data has been gathered) are addressed in relation to the issues of “(1) accessibility; (2) salience/problem identification; (3) coding/interpretability/articulation” (Sarangi 2002:100).

The first issue, accessibility, concerns the on-going problem for the research team to gain access to naturally-occurring spoken business data. The collection of these kinds of data by individuals outside of the business organization is invariably seen as potentially problematic in terms of compromising business confidentiality. The researchers’ experience of compiling this Business Corpus is that it has been by far the most difficult of the four sub-corpora making up the HKCSE to collect. It cannot be taken for granted that the business community will open up their world to corpus linguists in the name of academic research alone. Even when permission was granted by the business organization for the research team to collect data, it was always made clear that the organization would retain the right to censor or delete any data collected. Regarding use of the data, the business organizations were asked to give their permission to the research team for the data to be used for other academic research purposes.
Following on from accessibility, one of the first questions that the researcher needs to address is who the stakeholders are and what benefits might accrue to them from the collection and analysis of the data. In the case of this Business Corpus there are up to five main stakeholders, and they include specialist business/management departments of the university, the management of the collaborating organization, the employees of the collaborating organization, full-time and part-time students on various ESP courses at the university, and finally, the research team. Participants in the discourses that make up the Business Corpus, for example those in meetings, presentations and informal workplace talk, were usually employees of the organization. In the case of interviews and service encounters, however, participants also included people from outside the organization concerned. In all situations,
all of the participants were required to sign a consent form prior to any recording, and they were assured of their personal anonymity along with that of the collaborating organization. This same consent form also served to survey the participants in terms of the personal details listed earlier.

The second issue to think about when collecting and analyzing professional discourse, mutual identification of salient issues or problems, is most useful in the initial stages prior to data collection (Sarangi 2002). In reality, however, actual problems have to be identified through a process of discussion and negotiation. Very often, it is difficult for the different stakeholders to share a common view of the data and its purposes. Sarangi (2002) describes these same problems as those of establishing identity and expertise and of meaningful problematisation. In our experience, business organizations are usually willing to become a site for data collection if there is a perceived need for English language training for their employees and the data can be seen to contribute to language needs analysis and/or a source of authentic learning and teaching materials.

The third issue is about coding, interpretability, or articulation (Sarangi 2002). Before data are collected, it is necessary for the researcher to gain insider knowledge in order to make sense of the data. In this project, the data collection procedure usually started with the research team negotiating with the management of an organization about the purpose of collecting spoken business data. Once permission was granted, the research team would meet with senior representatives of the business organization to further review and agree upon the project aims. Detailed plans and schedules were drawn up by the research team and were later approved by the business organization. Then actual data collection was carried out by a member of the research team, usually the research assistant employed on the research project.
In a few instances, the research assistant spent some time rotating from one department to another, observing each department and keeping journals about systems, communication and participant profiles. During the period of observation, the research assistant drew up plans for recordings based on sites where English was spoken across a cross-section of the organization’s functions. This period of observation and orientation was found to be essential to make sense of the data collected at a later stage. Related to coding/interpretability/articulation (Sarangi 2002:100–103), a general observation made by the research assistant was that internal meetings were usually especially problematic for an outsider to understand. Another interesting observation was what the research team calls the observer’s/participants’ paradox. Throughout
the data collection process, the research assistant, who presented herself to employees as a university-based researcher and described the project aim as identifying training needs, was faced with different kinds of reactions. Some employees viewed the research assistant and her role with suspicion, some saw her as a co-worker, while others perceived her as some kind of an ombudsman between them and the management. Other employees, in the presence of the research assistant, used English in preference to Cantonese. On occasion, employees may have behaved more politely and been more voluble when being recorded. These kinds of role conflicts are to be expected when collecting data of this kind and need to be taken into account.

Prosodic transcription of the data

The Business Corpus is rare in that it is both orthographically and prosodically transcribed. The basic orthographic transcription of spoken data is very time-consuming and costly, but these factors are magnified many-fold when prosodic transcription is added. It is well-known that it is both difficult and time-consuming to prosodically transcribe naturally-occurring data, and it requires inter-rater reliability measures to ensure the quality of the transcription. In this regard, the prosodic transcriptions of the HKCSE were subjected to rigorous cross-checking involving three individuals, and further quality assurance was provided by a consultant to the project with many years of experience in transcribing and analyzing discourse intonation. The fact that the intonational behaviours of the speakers can be analysed, it is argued, makes the Business Corpus especially rich as both a research resource and a source of English language learning and teaching materials.

While the orthographic transcription of spoken data is well established and the conventions quite well-known, the number of spoken corpora that are also prosodically transcribed is very small (see, for example, the London-Lund corpus, Svartvik 1990) and thus the representation of prosodic features in corpus data is less standardized. In this chapter, therefore, it is worth devoting some space to describing the prosodic labelling system adopted by the team in Hong Kong which will also serve to demonstrate the additional information that becomes available to the corpus linguist when the intonation of the speakers is added to the transcription.

The discourse intonation system developed by Brazil (1985, 1997) was chosen to prosodically transcribe the HKCSE because the primary concern of the research team is to analyse the corpus data in terms of discoursal, pragmatic and intercultural communication phenomena. This system is especially useful for those seeking to explore the data in terms of the communicative value of intonation. Below, the system used is briefly described with examples drawn from the Business Corpus.

In the prosodic transcription system developed by Brazil (1997), speakers can select from four independent systems: prominence, tone, key and termination (see Table 2) within a tone unit. In discourse intonation, a tone unit is taken to mean a stretch of speech with one tonic segment comprising at least a tonic syllable, but which may extend from an onset (first prominent syllable) to the tonic (final prominent syllable) (Hewings 1990:136). Each of the independent systems is a source of “local meaning” (Brazil 1997:xi) by which Brazil seeks to underline that these are moment by moment judgements made by speakers based on their assessment of the current state of understanding operating between the participants. In other words, Brazil’s system eschews the notions that intonation conveys fixed attitudinal meanings or is associated with particular grammatical structures. It also needs to be borne in mind that intonation alone, let alone one particular choice within the four systems, is not the sole conveyor of discourse meaning. When looking at intonation, the researcher at the same time has to be mindful of all of the other possible contributing factors in the ongoing negotiation of meaning between discourse participants. These four systems are represented typographically in the Business Corpus using the following transcription conventions:

Tone unit      // .… //
Prominence     UPPER CASE LETTERS
Tone           ↘↗ (fall rise); ↗ (rise); ↘ (fall); ↗↘ (rise fall); → (level)
Key            high – written above the line
               mid – written on the line
               low – written below the line
Termination    high – written above the line and underlined
               mid – written on the line and underlined
               low – written below the line and underlined
(Brazil 1997)

122 Martin Warren

Table 2. Intonation choices available to speakers

System        Choice
Prominence    prominent/non-prominent syllables
Tone          rise-fall, fall, rise, fall-rise, level
Key           high, mid, low
Termination   high, mid, low

(Adapted from Hewings and Cauldwell 1997:vii)

Each of the systems will now be explained using examples from the Business Corpus.

Prominence

Brazil (1997:23–25) states that prominence is used as a means of distinguishing those words which are situationally informative. In this conceptual framework, the assigning of prominence is not fixed on the basis of grammar or word-accent/stress; it is a choice made by the speaker in context. For Brazil (1997), speakers have available to them two paradigms: existential and general. The existential paradigm is the set of possibilities that a speaker can choose from in a given situation. The general paradigm is the set of possibilities that is inherent in the language system. The choice of prominence in naturally-occurring spoken discourse is made when the speaker chooses from the existential paradigm that is available at that point in the discourse. It needs to be added that not every syllable in a word has to be made prominent for the word to have the status of prominence in a tone unit. Speaker decisions within the prominence system are made on the basis of the status of individual words (Brazil 1997:39). The other three systems in discourse intonation, tone, key and termination, are not attributes of individual words but of the tonic segment (i.e. that section of the tone unit that falls between the first and the last prominent syllable).

In Extract 1, the speakers are engaged in informal office talk and speaker B¹ has earlier commented on the fact that he has not seen his colleague for a while, and then asks what she has been doing.


Extract 1
B: // ↘ so what have YOU been WORking on REcently // (HKCSE)

In this utterance (which comprises one tone unit), speaker B chooses to make you, working and recently prominent because, in this context of interaction, it is at these points in his utterance that existential paradigms occur. Speaker B is asking about his colleague’s work and not somebody else’s, and so you, as opposed to they or I etc., is chosen to be prominent. Similarly, working, rather than studying, for example, is made prominent and then recently, rather than now, yesterday, etc. Conversely, so, which in this context substitutes for “I haven’t seen you for a while”, is assumed to be given information and is non-prominent.
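The transcription conventions lend themselves to simple automatic processing. As a minimal sketch only, and not part of any HKCSE tooling, the following Python function splits a transcribed line into tone units at the // boundaries, reads the tone from the leading arrow and lists the words that carry prominence. It assumes that the tone arrows are typed as the Unicode characters ↘, ↗ and →, that the speaker label has already been removed, and that any word containing an upper-case letter is prominent (so capitalized proper nouns and anonymized names would be over-counted). The names TONE_SYMBOLS and parse_tone_units are illustrative.

```python
# Assumed Unicode stand-ins for the tone arrows used in the transcripts.
TONE_SYMBOLS = {
    "↘↗": "fall-rise",
    "↗↘": "rise-fall",
    "↘": "fall",
    "↗": "rise",
    "→": "level",
}


def parse_tone_units(line):
    """Split a transcribed line into tone units and describe each one."""
    units = []
    # Tone units are delimited by double slashes.
    for chunk in line.split("//"):
        words = chunk.split()
        if not words:
            continue
        # The tone arrow, when present, is the first token of the tone unit.
        tone = TONE_SYMBOLS.get(words[0])
        if tone is not None:
            words = words[1:]
        # Simplifying assumption: a word is prominent if any letter is upper case.
        prominent = [w for w in words if any(c.isupper() for c in w)]
        units.append({"tone": tone, "words": words, "prominent": prominent})
    return units


extract_1 = "// ↘ so what have YOU been WORking on REcently //"
for unit in parse_tone_units(extract_1):
    print(unit["tone"], unit["prominent"])
# prints: fall ['YOU', 'WORking', 'REcently']
```

Applied to Extract 1, the sketch reports a single tone unit with a fall tone and the prominent words you, working and recently, in line with the analysis above. Key and termination are not handled, since pitch level is shown by vertical placement and underlining rather than by characters in the line.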

Tone

In discourse intonation a particular communicative value is associated with each of the five possible tones. A tone is the pitch movement that begins at the tonic syllable (i.e. the last prominent syllable in a tone unit) and is denoted in the transcripts by an arrow or arrows at the start of the tone unit. Any spoken discourse proceeds on the basis of a considerable amount of shared knowledge between discourse participants (Brazil 1985:109), and it is for the speaker to decide moment-by-moment whether what he is saying is shared or not. Speaker-hearer convergence is not something which a speaker can be certain of, and so speakers make their choices on the basis of what they assume the common ground to be at any particular point in the discourse. Speakers, according to Brazil, basically have a choice between fall-rise/rise tones and fall/rise-fall tones. Brazil (1997:68–70) calls the former the referring tones and the latter the proclaiming tones. When a speaker chooses the referring tones he effectively indicates that this part of the discourse will not enlarge the common ground assumed to exist between the participants. Choosing the proclaiming tones, on the other hand, shows that the area of speaker-hearer convergence is about to be enlarged. The finer distinctions (Brazil 1997:86–98) between these tones are given below:

– fall-rise tone (↘↗) indicates that this part of the discourse will not enlarge the common ground assumed to exist between the participants.
– rise tone (↗) reactivates something which is part of the common ground between the participants.
– fall tone (↘) shows that the area of speaker-hearer convergence is about to be enlarged.
– rise-fall tone (↗↘) indicates addition to the common ground and to speaker’s own knowledge at one and the same time.

There is a fifth tone, the level tone, in Brazil’s (1997:146) system, which is associated with tone units that precede an encoding pause or with otherwise “truncated” tone units. It should be noted that the choice of tone, as with other linguistic options, rests with the speaker, and the decision to present information as shared or new is based on a subjective assessment and is also open to exploitation should the speaker choose to do so.

According to Brazil (1997:82–98), there are tone choices which may be characterized as being participant specific in specialized discourse types (i.e. discourses other than conversations). The decision to choose one of the two referring tones or the two proclaiming tones is dependent in part on the role relationships of the participants in the discourse. Thus in discourse types where one speaker is dominant, in the sense of having greater responsibility for the discourse and greater freedom in making linguistic choices, that speaker monopolises the fall-rise/rise choice (Brazil 1985:129–32). This observation would apply to the teacher in classroom talk, the interviewer in an interview, the doctor in a doctor/patient consultation, and so on. Likewise, although the rise-fall tone is by far the least prevalent of the tones, Brazil again claims that, in a discourse in which the participants are of unequal status, it tends to be the dominant speaker(s) who makes this selection.

Extract 2
b: // ↗ we USED to HAVE four // ↘ and NOW we have TWELVE // (HKCSE)

In extract 2, speaker b is engaged in informal office talk and is describing the change in staffing in her department. As she does so, we can see that she uses a rise tone to state the previous staffing configuration as she judges this to be reactivation of shared knowledge between her and the hearer, and then she chooses a fall tone to state the current staffing situation which is perceived by her in this context to be new for the hearer.


Extract 3
b: // ↘↗ because NOW we offer the DAY for FIVE HUNdred dollar // → so AFter the FIVE hour // ↘ we CHARGed ONE dollars per HOUR // (HKCSE)

Extract 3 is from a business meeting and speaker b is responding to the chair, who has asked what the current charging arrangement is. Speaker b begins the utterance with a fall-rise tone, assuming that this information is shared by the hearer. The second tone unit is spoken with a level tone as it is incomplete, and then the last tone unit is spoken with a fall tone and, in the mind of the speaker at least, is new information to the hearer.

Extract 4
a: // → BUT er // ↘ it’s NO room aVAIlaBLE // ↘ toNIGHT // (HKCSE)

Extract 4 is taken from a service encounter at an airport information counter. A customer has asked if he can book a room at the airport hotel, and speaker a begins her response hesitantly with but spoken with a level tone followed by a filled pause. For the remainder of the utterance proclaiming tones (fall) are chosen by the speaker as the information is new to the hearer.
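The communicative values attached to the five tones in this section can be collected in a small lookup table. The following Python sketch is illustrative only: the glosses paraphrase the descriptions given above, and the names TONE_VALUES and describe_tone are assumptions introduced here rather than part of Brazil’s system or of the corpus.

```python
# Communicative values of the five tones, paraphrased from this section.
# The dictionary layout and the names are illustrative assumptions.
TONE_VALUES = {
    "fall-rise": ("referring", "will not enlarge the assumed common ground"),
    "rise": ("referring", "reactivates part of the common ground"),
    "fall": ("proclaiming", "enlarges the area of speaker-hearer convergence"),
    "rise-fall": ("proclaiming", "adds to the common ground and to the speaker's own knowledge"),
    "level": (None, "precedes an encoding pause or marks a truncated tone unit"),
}


def describe_tone(tone):
    """Return a one-line gloss for a tone name."""
    category, value = TONE_VALUES[tone]
    if category is None:
        # The level tone is neither a referring nor a proclaiming tone.
        return f"{tone}: {value}"
    return f"{tone} ({category}): {value}"


# Tone sequence of Extract 3: fall-rise, level, fall.
for tone in ["fall-rise", "level", "fall"]:
    print(describe_tone(tone))
```

Such a table only records the default values. As noted above, the choice of tone rests with the speaker and is open to exploitation, so any gloss produced in this way still has to be checked against the context of interaction.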

Key and termination

The last two systems concern pitch level choices available to speakers and are best looked at in combination. According to Brazil (1997:40–66), speakers can choose from a three-tier system (high, mid and low, which are written above, on or below the line in transcriptions respectively) in terms of the relative “key” at the onset of a tone unit, that is, the first prominent syllable in the tone unit. The choice of key is made on this first prominent syllable, and whether the speaker selects high, mid or low will affect the meaning of what is said. High key selection has contrastive value, mid key has additive value and the selection of low key has equative value, that is, with the meaning “as to be expected” (Brazil 1985:75–84).

Lastly, Brazil states that the speaker also chooses pitch level again at the end of the tonic segment on the tonic syllable (i.e. the last prominent syllable in the tone unit, which is underlined in the transcripts), and he terms this system “termination” (Brazil 1997:11). Again, this is a three-tier system of high, mid and low, and when transcribed they are written above, on or below the line respectively. By means of this choice, the speaker can seek to constrain the next speaker to respond if s/he selects high or mid termination and, due to the seeming preference for “pitch concord” (Brazil 1985:86) found in spoken discourse across turn boundaries, the next speaker frequently ‘echoes’ the termination choice of the previous speaker in her/his choice of key. If the speaker chooses low termination, no attempt is made to elicit a response, which leaves the next speaker to initiate a new topic or the discourse to come to a close.

The local meaning of selecting high or mid termination varies according to the functional value of what is being said and can be briefly summarized based on three broad scenarios. In the case of yes/no questions (Brazil 1997:54–55), the choice of high termination carries the meaning that adjudication is invited from the hearer while mid termination seeks concurrence. In wh-type questions (Brazil 1997:56), high termination carries the meaning that “an improbable answer is expected” and mid termination is a “straightforward request for information,” while in declaratives, the choice of high termination denotes the meaning “this will surprise you” and mid termination the meaning “this will not surprise you” (Brazil 1997:58).

Extract 5
B: // ↘ and WHEN does your CONtract END //
a1: // → CONtract END in // ↘ TWO months //
a2: // ↘ oKAY //
(HKCSE)

Extract 5 is taken from a job interview in which speakers B and a2 are members of the interviewing panel, chaired by a2, and speaker a1 is the interviewee. Speaker B asks the interviewee a question regarding her current contract, employing mid key on when, which has additive value, and mid termination on end, which constrains the interviewee to respond and has the local meaning of a straightforward request for information. Speaker a1 begins by selecting mid key on contract, which is an example of pitch concord as it meshes with the mid termination of the previous speaker. Then, in the next tone unit, the speaker selects high key and high termination on two, which carries the meaning of “against expectations and surprisingly.” Interestingly, speaker a2, the panel chair and therefore designated dominant speaker, is the one to follow up on the interviewee’s response with okay, selected with low key, which has equative value, and low termination. The selection of low termination does not prospect a response from the hearer and here denotes the closing of the topic, and the topic subsequently changes.

The above serves as a brief introduction to the way in which the Business Corpus is being transcribed for its intonation. As far as the writer is aware, no other spoken corpus has so fully adopted Brazil’s discourse intonation notations and so the Business Corpus along with the rest of the HKCSE will be unique in this regard. However, Brazil’s discourse intonation system has been employed quite widely and successfully in the development of learning and teaching materials aimed at pronunciation and intonation skills (see, for example, Bradford 1988; Brazil 1994; Cauldwell 2002; Chun 2002; Hewings and Goldstein 1999).
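The values attached to key and termination in this section can likewise be summarised as a lookup table. The following Python sketch is illustrative only: the glosses paraphrase the summary given in this section, the function and variable names are assumptions introduced here, and the choices for Extract 5 are entered by hand from the analysis above because pitch level cannot be read off a single line of text.

```python
# Values of key, as summarised in this section.
KEY_VALUE = {"high": "contrastive", "mid": "additive", "low": "equative"}

# Local meanings of high and mid termination by utterance type.
TERMINATION_MEANING = {
    ("yes/no question", "high"): "adjudication is invited",
    ("yes/no question", "mid"): "concurrence is sought",
    ("wh-question", "high"): "an improbable answer is expected",
    ("wh-question", "mid"): "straightforward request for information",
    ("declarative", "high"): "this will surprise you",
    ("declarative", "mid"): "this will not surprise you",
}


def local_meaning(utterance_type, key, termination):
    """Gloss the key and termination choices made in one tone unit."""
    if termination == "low":
        # Low termination does not prospect a response, whatever the utterance type.
        termination_gloss = "no response is prospected"
    else:
        termination_gloss = TERMINATION_MEANING[(utterance_type, termination)]
    return KEY_VALUE[key], termination_gloss


def pitch_concord(previous_termination, next_key):
    """True when the next speaker's key echoes the previous speaker's termination."""
    return previous_termination == next_key


# Extract 5: B's question (mid key, mid termination), then a1's answer on two
# (high key, high termination), then the concord between B and a1.
print(local_meaning("wh-question", "mid", "mid"))
print(local_meaning("declarative", "high", "high"))
print(pitch_concord("mid", "mid"))
```

The table treats each choice in isolation. In the analyses that follow, the local meaning is always read off the combination of prominence, tone, key and termination in context, which a lookup of this kind cannot capture.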

Involvement of stakeholders in corpus analysis

As has been discussed earlier, analysis of the Business Corpus as a whole, and of the various discourse types contained within it, is at times problematic for the research team unless it works closely with the stakeholders involved. In this section of the chapter, an example of how this worked in practice with one particular discourse type is given and its wider implications are discussed.

A number of discourses contained in the Business Corpus were collected in Hong Kong hotels and, as usual, the research team needed to further classify these discourses in terms of discourse type (e.g. meeting, presentation, informal office talk, etc.). This procedure was unproblematic except for one discourse which seemed to elude the team’s existing list of discourse types. After the research assistant sought clarification from the hotel employee concerned and the hotel management, it was found that the discourse was an interesting mix of several genres between what the hotel terms a “hotel ambassador” and a hotel guest. This hybrid genre or multi-genre discourse is part service encounter, part pseudo-conversation and part hotel sales and promotion talk; it is termed hotel ambassador discourse in this study and is possibly a new business genre. In the analysis that follows, it is shown that without input from the stakeholders it would have been difficult for the researcher to reach a firm understanding of the status of the discourse within the business organization, let alone make recommendations in terms of staff language training needs.

Hotel ambassador discourse: A multi-genre interaction

The discourse begins as an ordinary service encounter with the female hotel guest, speaker A, buying a stamp with the assistance of speaker a, a female member of the hotel staff. The discourse also involves speaker a speaking in Cantonese to a male member of the hotel staff, speaker b, to deal with the business of selling a stamp. Field notes usefully state that speaker a was in the lobby area of the hotel and was not behind the front office reception desk. The discourse then changes to a genre which is similar to conversation in terms of collaboration in the development of topic, turn-taking organization, overlapping talk, topical content, and humour. This genre is best described as a pseudo-conversation as it is clear that participating in a conversation is not the main purpose of the talk for at least one of the speakers, speaker a. The extracts from the discourse are analyzed below using the information available as a result of the prosodic transcription of the Business Corpus.

Analysis of discourse intonation

The intonational behavior of the two speakers in the extracts below shows how the additional information contained in the Business Corpus can enrich the analysis of the texts within it. Extract 6 begins with speaker A engaged in a pseudo-conversational segment as she responds to having been asked where she went in downtown Hong Kong.

Extract 6
45 A: // ↘↗ er we went SHOPping // → AND then we WENT to THE //
46 ↘ r____ hoTEL // ↘ for a DRINK //
47 a: // ↘ R_____ hoTEL //
48 A: // ↘ YEA //
49 a: // ↘ WHY you no // ↘ CHOICE OUR hoTEL //
50 A: ((laughs))
51 a: // ↘ THIS is [a GOOD one ALso // ↘ BETter than r____ hoTEL //
52 A: // → [IT’S //
53 ↘ i PROmise YOU //
54 A: ((laughs))
55 ((a and b speak Cantonese))
56 A: ((laughs)) // ↘ they WERE very FRIENDly [downSTAIRS here // ↗ the [BAR //
57 a: // ↘ [YEA // ↘ [YES //
58 a: // → WE’RE // ↗ five STARS // ↘ hoTEL ALso //
59 A: // ↘ YEA // ↘ it’s a GREAT hoTEL // (laughs)
(HKCSE)

Speaker A begins her utterance with a fall-rise tone as she takes the information that she went shopping to be common ground, and then treats the visit to a rival 5-star hotel for a drink as new information, choosing a fall tone. Throughout, she uses mid key, which has additive value as she lists what she did downtown, and mid termination, which presents the information as unsurprising. On line 49, the genre shifts as speaker a asserts her role as hotel ambassador and promoter of the hotel, and the genre becomes one of promotional talk. Speaker a asks why speaker A had chosen to have a drink in the rival hotel rather than this hotel. Almost every word in the utterance is made prominent, and she uses high key, which has contrastive value implying that A’s actions have gone against expectations, and high termination on hotel. This constrains A to respond with the expectation that any response is to be viewed as improbable. Speaker A responds with an awkward or slightly nervous laugh.

Speaker a continues by stating that this hotel is also good. Speaker a’s choice of prominence on this, good and also (line 51) in combination with a fall tone assumes this is new to the hearer. In addition, speaker a’s choice of high termination on also suggests that, surprisingly, the rival hotel is not the only good hotel to go to for a drink. In this utterance speaker a exploits all four discourse intonation systems to provide additional meaning and help drive home her point. Speaker A begins to speak, but speaker a continues with similar effect on line 51 in terms of intonation choices. Speaker a now chooses a fall tone to say that the hotel is not just as good as the rival but better, said with prominence. She then chooses to make promise and you prominent, with high termination on you denoting that this is unexpected, which constrains the hearer to respond. Speaker A laughs awkwardly and, once more, is probably somewhat taken aback by the sudden switch from pseudo-conversation to hotel promotion talk.

After a brief aside while the two members of the hotel staff say something to one another in Cantonese, speaker A laughs again and comments on the friendliness of the staff at the bar, which is possibly an attempt to shift the topic and deflect the implied criticism from the hotel ambassador that she has not patronized the hotel’s bar. This tactic does not seem to placate speaker a, who continues to promote the hotel over the rival hotel visited by speaker A (line 58). Speaker a chooses high key and high termination when she says we’re on line 58, which carry contrastive value and the meaning that this will be unexpected information respectively, and then chooses a rise tone, which reactivates common ground, for five stars. This utterance ends with mid termination on also, with the meaning that this is unsurprising information, although it is perceived by the speaker as new to the hearer as it is said with a fall tone. This choice of intonation by speaker a further underlines that this hotel, just like the rival hotel, has five stars. The effect of all that speaker a has said can be seen when speaker A agrees with her, and chooses high key and high termination for yea (line 59), which in this case means “contrary to what you might have thought, surprisingly, I agree with you.” Extract 7 begins with speaker a once more promoting the hotel.


Extract 7
192 a: // ↘ CHEAper RATE // ↘ GOOD SERvice // ↘ HOW to FIND //
193 A: ((laughs))
194 a: // ↗ SO // ↘ ONly R_____ A_____ hoTEL // ((laughs))
195 A: // ↘ of COURSE // ↘ that’s RIGHT // ((laughs)) // → erm // ((laughs))
196 // ↘ then HOW could you beCOME SUCH a good SALES person //
197 a: // ↗ SALES // ↘ NO //
198 A: // ↘ NO //
199 a: // ↘ [I’M reCEPtionist // ↘ ALso //
200 A: // ↘ [proMOtion // ↘ YEA // ((laughs))
201 a: // → BUT // ↘ er toDAY I’M r_____ amBASsador //
202 A: // ↘ oh YEA //
203 a: // → WALK aROUND the hoTEL // → AND // → talKING // ↘ aNOther GUESTS //
204 A: // ↘ OH // ↘ that’s NICE // ....
(HKCSE)

This time speaker a points out that the combination of low prices and good service at the hotel is not found anywhere else (line 192). She chooses prominence on the important message-bearing words and uses a fall tone for all three tone units as she perceives this information to be new to the hearer. The choice of high key on good is interesting as it serves to contrast good service with cheap rates. Once more, speaker A gives an awkward laugh and then, on line 194, speaker a continues her promotional work by first reactivating common ground by selecting a rise tone on so and then offering as new information that the only option, with only made prominent, is her hotel, all said with mid key (on so and only), which carries additive value, and mid termination (on so and hotel), which has the meaning of “this information will come as no surprise.” Speaker A agrees with speaker a on line 195 with a fall tone and mid key on course, which is an example of pitch concord. In the same utterance she chooses additive mid key and mid termination on right and so presents the information as expected.

Speaker A then shifts back to pseudo-conversation when she asks how could you become such a good sales person on line 196. Her choice of high termination on sales is interesting as it has the meaning that the response is anticipated to be improbable. Is she here implying by means of irony that speaker a is overdoing the sales talk? Speaker a takes the question at face value and replies that she is not sales with a rise tone, reactivating what speaker A has just said, and no is said as new information with a fall tone. Her use of mid key has additive value in the sense of factual information and her choice of mid termination indicates that what is said is within expectations. There is then overlap as speaker a says that she is receptionist also while speaker A suggests she is promotion, followed by yea said with low key, which carries the sense of self-evident, and low termination, which does not prospect a response from the hearer. When, on line 201, speaker a states that she is today’s hotel ambassador, she chooses a fall tone as the information is new to the hearer, but she does not predict the surprise reaction from speaker A in her own choice of key and termination, which are both mid. Speaker A responds with oh yea, and her choice of high key has contrastive value; high termination conveys a surprised evaluation, and her use of a fall tone again does not assume common ground.

Similarly, after speaker a explains her duties as ambassador (line 203), with a similar choice of intonation to that used in her previous utterance (line 201), speaker A responds with oh and that’s nice, with both the oh and nice having high key and high termination with contrastive value and thus conveying the information as surprising. It seems that the hotel guest has inferred that she is in the presence of some kind of sales/promotion figure from the hotel and is surprised to find that the member of staff is in fact an ambassador. This could simply be because such a role is relatively rare (it has only recently been instituted in this hotel) or because the hotel guest feels that she is being harassed by a saleswoman rather than being pampered by an ambassador.


Later in the discourse, speaker a asks A if she intends to return to the hotel in the future.

Extract 8
214 a: // ↘ SO // ↘ YOU will // ↘ come BACK // ↗ our hoTEL aGAIN //
215 A: // ↘ of COURSE // ↘ [of COURSE // (laughs)
216 a: // ↘ [YES // ↘ reMEMber ME // ↘↗ i GIVE you
217 the NICE // ↘ NICE NICE room for YOU //
218 A: // ↘ Okay // ↘ [GOOD //
219 a: // ↘ [Okay //
(HKCSE)

Speaker a’s choice of intonation is again interesting to analyze and provides evidence that speaker a is being quite persistent in her promotion of the hotel. On line 214, she makes you prominent and chooses high key for contrastive value. The first three tone units are uttered with a fall tone as they are considered by the speaker as new information, while our hotel again is said with a rise tone as this is the repetition of common ground. The use of mid termination in this indirect yes/no question from speaker a invites the hearer to concur with her. This combination elicits from speaker A the response of course said twice, the second time with low key, with the meaning of self-evident, and low termination, which does not prospect a response. As speaker A is speaking, speaker a overlaps, making speaker A’s choice of low termination, which might have served to end the topic, redundant. On line 216, speaker a begins with a yes, said with high key and high termination, and then proceeds to urge A to remember her and promises A a nice room on her next visit, with prominence on remember, me, give, nice and you and with the additive value of mid key and mid termination, which presents the information as expected. To this utterance speaker A gives a minimal response okay and good with mid key and mid termination, followed by speaker a saying okay in a similar fashion. It is doubtful whether speaker a has the authority to make such promises, but her choice of intonation adds to the effect that she is badgering the guest, not unlike an over-zealous shop assistant trying to make a sale.

In extract 9 there is another example of a pseudo-conversation shifting to hotel promotion talk.

Extract 9
296 a: // → HAVE you er // → GO to the THIRD floor OUR // → HEALTH club // ↗
297 for the GYM // ↗ [to SWIM //
298 A: // ↘ [NO // ↘ i HAVEn’t // ↘ NO // ↗ was it [NICE
299 a: // → [YEA // ↘ YES //
300 very [NICE // → JUST // ↘↗ THIRty Hong KONG DOllars // → you CAN er //
301 A: // ↗ [YEA //
302 // → use THE // ↘↗ faCILity // → for THE HEALTH club // →
303 and THE // ↘ SWIM pool //
304 A: // ↘ OH // ↘ that’s pretty [GOOD //
305 a: // ↘ [YES // → AND we HAVE the // ↘↗
306 BEAUty // ↘ maSSAGE ALso //
307 A: // ↘ maSSAGE up there //
308 a: // ↘ YEA //
309 A: // ↘ OH // ↗ that’s NICE //
(HKCSE)

On line 296, speaker a asks A what looks like a neutral question when she asks if A has been to the health club. Her use of intonation shows that she assumes the listing of the gym and swimming pool is reactivating common ground after the health club has been mentioned. The choice of high key on go with contrastive value suggests that speaker a thinks that while A might know of its existence, she will not have actually been to the health club. Speaker A confirms that she has not gone and chooses contrastive high key and high termination to indicate that this new information is surprising. Her choice of a rise tone for was it nice anticipates confirmation rather than information from speaker a, which is what it receives on line 299. Speaker a effectively adds to the positive evaluation of the health club when she adds very nice (line 300) with a fall tone and mid key, which carries additive value, and mid termination with the meaning that this information is unsurprising. Speaker a then exploits the use of high key on just (line 300) to contrast the price (approx. US$4) with the range of facilities, also said with high termination to denote that this will surprise the hearer.

In the rest of the extract the choice of high key and high termination by both speakers is interesting. Speaker A follows up with yea (line 301) said with high key and high termination and a rise tone. On line 304, speaker A chooses contrastive high key and high termination on oh to indicate surprise, and in the next utterance on line 306 the use of high termination on also by speaker a serves to underline that there is, surprisingly, even a beauty massage at the health club. In the remaining utterances in the extract, it can then be seen how the employment of high termination both constrains the hearer to respond and induces pitch concord as the speakers employ this high termination choice.

It is interesting to contrast the different intonation choices made by speaker A when she says that’s nice on line 309 compared to extract 7. Here it is said with a rise tone indicating the reactivation of common ground with mid key and mid termination, giving it additive value and presenting it as an unsurprising assessment rather than as new and surprising information with contrastive value, as in extract 7 (line 204).

In extract 10, speaker a is once more engaged in promotional talk.

Extract 10
324 a: // ↘ reTURN TRIP // ↘ would you reTURN TO our hoTEL // ↘ YES //
325 A: // ↘ P____ // ↘ YEA //
(HKCSE)

Speaker a asks if speaker A would return to the hotel the next time she is in Hong Kong. Speaker a’s choice of intonation is particularly noteworthy at the end of her utterance. Her choice of high termination on yes with a fall tone has the effect of trying to elicit a positive rather than an anticipated negative response from speaker A. It can be seen that the strategy works when on line 325 speaker A says yea, selecting contrastive high key and high termination, which denotes that it is against expectations.

Discussion

In the above analysis of discourse intonation, there is evidence to suggest that the speakers are observing the intonational system as described by Brazil (1985, 1997). This is as true for the non-native speaker as it is for the native speaker. In other words, both speakers appear to be in control of their intonational choices, and the choices made make sense locally. However, this is not to say that at a deeper level the hotel ambassador’s use of intonation is appropriate given her role. It has been shown that speaker a seems, through her choice of intonation, to be adding an additional level of meaning at the local level which makes her promotional talk rather insistent and forceful at times. Given that she is working in a 5-star hotel which claims to place the comfort and needs of its guests above all else, it is unlikely that the hotel ambassador’s main function is to badger guests about the use of the hotel’s facilities or to keep pressuring them to return. It could be argued that analysing the topic development of this discourse together with the intonation choices made might well lead to the recommendation that hotel ambassadors be more strategically trained as to what they say and how they say it in order to promote the hotel more subtly and effectively. What is also clear is that the precise status of this discourse and its appropriacy would be impossible to determine without the involvement of the stakeholders.

Careful corpus design and the collaboration of all of the stakeholders in the data collection and data analysis processes for the Business Corpus described here are essential in order for the data to be more fully exploited and understood. It has been shown that the field notes gathered by the research assistant along with follow-up questions regarding the status of some discourses are crucial if the data are to be meaningfully analyzed. These are methodologies more usually associated with case studies involving relatively small amounts of data, but such methods and large-scale corpus-driven studies are not mutually exclusive. Corpus-driven studies allow for both quantitative and qualitative studies of the data to be made and, for both kinds of study, information pertaining to the situatedness of the data is invaluable. The rationale for collecting the data, whether it is for pure or applied research or consultancy, will also have an effect on the analytical framework adopted, as will the sites of data collection. Collecting a large corpus of specialized business data does not mean that one must minimize the importance of context, participants and interaction. Rather, a specialized corpus, such as the Business Corpus, can promise the comparison and consolidation of a multiplicity of findings (see, for example, Cheng and Warren 2003) and help to reduce the inherent tensions in the dissemination of results and the scope of intervention and change.

Implications for learning and teaching business English

The potential of the Business Corpus as a resource for learning and teaching has yet to be tapped, but it clearly has uses. Here we have business English as it is really spoken in naturally-occurring discourses. What is more, the mix of speakers reflects the reality of English use in the business world, where most of the speakers of English are not native speakers of English. Corpora such as the Business Corpus provide useful models for what is said and how across a wide range of business functions, which in most ESP textbooks are based on the intuitive notions of the authors. The learning and teaching of discourse intonation has yet to find its way into mainstream English language learning and teaching materials (Chun 2002:199), but where it has been introduced (e.g. Cauldwell 2002), examples drawn from real instances of language use can serve as models for learners to discuss and replicate. It has been shown here that intonation is a set of choices, and this fact needs to be made clear to learners. Activities which illustrate the choices available and the effects those choices have on local meaning are an obvious way to proceed. For example, Bradford (1988) encourages learners to experiment with different intonational choices and to discuss their effect on the meaning potential of utterances. Such activities, supported with examples taken from a corpus, would be a useful addition for the language learner.

Conclusions is chapter has described the design of the sub-corpus of business discourses contained in the HKCSE and the processes and considerations involved in collecting the data. e ways in which the data have been transcribed have been described, especially the prosodic transcription which is a unique characteristic of this specialized corpus. e need to collaborate with the various stakeholders in order to make sense of the data and subsequent findings has been illustrated with an example of a complex genre that would have been difficult to interpret without such collaboration taking place. Extracts of this discourse were also analyzed in terms of the intonational choices made by the speakers and it was shown that this kind of analysis can have useful implications for the stakeholders. It also underscores the ways in which the intonational decisions made by speakers are based on the need for speakers to add situation-specific meanings to words or groups of words in real time. Finally, the benefits to be gained from such a corpus and its prosodic transcription for both researchers and English language learners and teachers have been discussed.


Acknowledgments e work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administration Region (Project No. B-Q396). anks are due to Richard Cauldwell who was the consultant to the project and has so enthusiastically shared his knowledge of intonation with the research team.

Notes

1. In the HKCSE, Hong Kong Chinese speakers are denoted with lower case letters and all other speakers with upper case letters.

References

Aston, G. and Burnard, L. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. New York: Cambridge University Press.
Bradford, B. 1988. Intonation in Context. Cambridge: Cambridge University Press.
Brazil, D. 1985. The Communicative Value of Intonation. Birmingham: English Language Research.
Brazil, D. 1994. Pronunciation for Advanced Learners of English. Cambridge: Cambridge University Press.
Brazil, D. 1997. The Communicative Role of Intonation in English. Cambridge: Cambridge University Press.
Carter, R. and McCarthy, M. 1997. Exploring Spoken English. Cambridge: Cambridge University Press.
Cauldwell, R. 2002. Streaming Speech: Listening and Pronunciation for Advanced Learners of English. Birmingham: Speechinaction.
Cheng, W. and Warren, M. 2003. “The use of intonation to assert dominance and control across different genres.” Eighth International Symposium on Social Communication, Discourse & Dialog. Centre for Applied Linguistics, Santiago de Cuba, January 20–24, 2003 (2):1325–1330.
Chun, D.M. 2002. Discourse Intonation in L2: From Theory and Research to Practice. Amsterdam: John Benjamins.
Fillmore, C., Ide, N., Jurafsky, D. and Macleod, C. 1998. “An American National Corpus: A Proposal.” Proceedings of the First International Language Resources and Evaluation Conference, Granada, Spain:965–70.
Gu, Y.G. 2002. “Towards an understanding of workplace discourse: a pilot study for compiling a spoken Chinese corpus of situated discourse.” In Research and Practice in Professional Discourse, C. Candlin (ed.), 137–185. Hong Kong: City University of Hong Kong.
Hewings, M. 1990. “Patterns of Intonation in the Speech of Learners of English.” In Papers in Discourse Intonation (Discourse Analysis Monograph 16), M. Hewings (ed.), 130–144. Birmingham: English Language Research.
Hewings, M. and Cauldwell, R. 1997. “Foreword.” In The Communicative Role of Intonation in English, D. Brazil, v–vii. Cambridge: Cambridge University Press.
Hewings, M. and Goldstein, S. 1999. Pronunciation Plus: Practice Through Interaction. Cambridge: Cambridge University Press.
McCarthy, M. and Handford, M. this volume. “‘Invisible to us’: A preliminary corpus-based study of spoken business English.”
Sarangi, S. 2002. “Discourse practitioners as a community of interprofessional practice: some insights from health communication research.” In Research and Practice in Professional Discourse, C. Candlin (ed.), 95–133. Hong Kong: City University of Hong Kong.
Simpson, R.C., Briggs, S.L., Ovens, J. and Swales, J.M. 2002. The Michigan Corpus of Academic Spoken English. Ann Arbor, MI: The Regents of the University of Michigan.
Sinclair, J. (ed.) 1987. Looking Up: An Account of the COBUILD Project in Lexical Computing. London and Glasgow: Collins ELT.
Svartvik, J. (ed.) 1990. The London-Lund Corpus of Spoken English: Description and Research. Lund: Lund University Press.
Tognini Bonelli, E. 2002. “Functionally complete units of meaning across English and Italian: Towards a corpus-driven approach.” In Lexis in Contrast: Corpus-based Approaches, B. Altenberg, S. Granger (eds), 73–96. Amsterdam: John Benjamins.


// → did you TOOK // ↗ from the miniBAR //: What is the practical relevance of a corpus-driven language study to practitioners in Hong Kong’s hotel industry?

Winnie Cheng
The Hong Kong Polytechnic University

Introduction

Corpus linguistics dates back to the 1960s (see Sinclair et al. 1969) and, due to its dependence on computer technology, has come into its own since the 1980s with the growth of multi-million word corpora. To date, corpus-driven research has produced important new findings in the fields of linguistics and applied linguistics (see Biber et al. 1998; Hunston 2002; McEnery and Wilson 1996; Sinclair 1991). Corpus-driven research emphasizes that theoretical statements are a product of the evidence from the corpus (Tognini-Bonelli 2002:75). Most importantly, these studies examining corpus data have helped researchers to identify patterning that differs from traditional models of the English language, and have demonstrated the shortcomings of relying solely on intuitive models of language in use. Spoken corpora have been and remain far rarer than written ones, largely due to the difficulties in obtaining and transcribing spoken data, but native speaker varieties of English (most noticeably, British English and American English) have been collected and analyzed (see, for example, Aijmer 1996; Biber et al. 1999; Sinclair 2001).

A corpus allows a combination of quantitative and qualitative analysis. With corpus analysis, new patterning is much easier to detect through quantitative analysis than with qualitative analysis alone. There will remain phenomena which are often discoursal, pragmatic or intonational in nature, and these kinds of discourse behavior may require looking in detail at transcripts. A corpus of spoken discourses is ideally suited to such an approach, as both the transcripts and original recordings are available for qualitative analysis. However, the terms corpora and corpus linguistics, which have become widely known among researchers of linguistics and applied linguistics, have yet to acquire common currency among English language teachers and students.

This chapter describes initial quantitative corpus-driven (Tognini-Bonelli 2002:75) research into business discourses which can lead the corpus linguist to explore the data in more qualitative ways. The chapter also shows the ways in which such a combination of quantitative and qualitative research approaches can offer detailed and comprehensive language descriptions which, in turn, have implications for language teaching and learning.

The discourses analyzed in this chapter come from a corpus of approximately 50 hours of spoken business data housed in the English Department of the Hong Kong Polytechnic University. The corpus of business discourses amounts to something in the order of 500,000 words of naturally occurring English in Hong Kong, most of which have been both orthographically and prosodically transcribed. It is a sub-corpus of the Hong Kong Corpus of Spoken English (HKCSE) (Cheng and Warren 1999, 2000). A number of the discourses contained in the corpus of spoken business English were collected in Hong Kong hotels. The hotel sector is an obvious site for collecting examples of English spoken by Hong Kong Chinese in a business context. The hotel industry is a major sector of Hong Kong’s economy with very promising growth potential. According to the Hong Kong Tourist Board (2002), in 2002 Hong Kong received around 16.5 million visitors, an increase of 20.7% over the 13.7 million received in 2001.

An overview of the hotel project

The data described and analyzed in this chapter were collected by a project team between 1999 and 2000 in a 5-star hotel in Hong Kong. The project team comprised ESP providers of the English Department and colleagues in the School of Hotel and Tourism Management (HTM) of the Hong Kong Polytechnic University. Through the HTM collaborators’ contacts with the local hotel industry, the project team gained access to a 5-star hotel for data collection. The hotel granted permission for the project team to have access to various sites of interaction in its different departments during a period of three months.

An obvious site for spoken data collection is the hotel reception desk. The project team spent two days at the reception desk collecting data by recording interactions between the front office staff and hotel guests. The team set up two Sony digital cameras, one focusing on the guest and the other on the front office staff. A Sony MD recorder was used as back up. When recording at the reception desk, the team found that background noise was considerable, and so they adjusted the microphone in order to obtain spoken data of as good a quality as possible. Prior to recording, both the front office staff and guests were asked for their permission to be recorded. If they agreed, they would sign a consent form, which clearly explained that, at any time after the recording was made, if they changed their mind about having their interaction used in research, they could contact the project team to ask for the data to be removed. All front office staff agreed to be recorded. Forty percent of the guests agreed to be recorded. None of the participants contacted the project team to request deletion of their recorded interactions.

The hotel checking out discourses

One kind of spoken discourse collected at the Reception Desk of the hotel was a speech event that we are all familiar with: checking out of the hotel at the end of a stay. Checking out is viewed by the hotel industry as a key discourse in terms of front office staff both displaying customer care skills and conveying the hotel’s overall philosophy towards its guests. As probably the last discourse interaction between a hotel and its guests, checking out is also seen as important because it will be the last impression departing guests take away with them. The hotel checking out discourses consist of six recordings which amount to approximately fourteen minutes of spoken data (see Appendix 1 for transcriptions). All of the data were recorded in the same five-star hotel in Hong Kong. The front office staff, all of whom were male, were different individuals.

Analysis of the hotel checking out discourses

Word frequencies

As a corpus and applied linguist, the researcher was interested in analyzing data collected from the hotel in countless different ways, but what about the practitioners of the hotel industry? For them, the benefits of such a collection of spoken texts tend to be judged mainly in terms of its practical relevance to enhancing successful communication in the real world. Five groups of practitioners relating to the hotel industry were identified in the initial stages of the study. These were the ESP providers, colleagues in HTM, hotel owners and management, frontline hotel staff, and the applied/corpus linguist such as the researcher herself. In this chapter, examples are given of the ways in which the analysis of checking out discourses can be of practical relevance to the main practitioners in Hong Kong’s hotel industry. The examples discussed cover a range of linguistic, discoursal, pragmatic and prosodic analyses. These ways of examining the checking out discourse led to qualitative analysis that involved looking at each checking out text from the perspectives of collocation (Sinclair 1991, 2001; Hoey 1991; Hunston 2002), structural organization of the text-genre (Bhatia 1993), the pragmatics of intercultural communication, and discourse intonation (Brazil 1985, 1997). The methods of analysis will also be discussed to highlight the interplay between quantitative and qualitative methods available to the corpus linguist.

One of the first things carried out was to subject the checking out discourses to Wordsmith Tools (Scott 1999)¹, which is software designed to interrogate a corpus in a variety of ways. First, a word frequency list was generated. An examination of the word list produced some interesting findings which resulted in a variety of lines of inquiry. Given the nature of the data, it is hardly surprising that check (N = 5), checking (N = 2) and out (N = 7) appear in each of the checking out discourses. As checking out of a hotel almost invariably involves settling the bill and paying for the stay, a number of related words are high on the frequency list. These words include card (N = 11), credit (N = 7), visa (N = 5), bill (N = 4) and pay (N = 4). Also, the social deictic expression sir (N = 8) is to be expected in service encounters of this kind involving a service provider interacting with male customers in the context of a five-star hotel. Similarly, the occurrence of thank you (N = 11) is to be expected in such service encounters.

Apart from these words, another lexical item, minibar, became the focus of further investigation due to its unexpected and frequent occurrence (N = 6) relative to other lexical items associated with checking out of the hotel. The relatively frequent occurrence of this word, which occurs in all of the checking out discourses, was unexpected, unlike that of the other words mentioned above. The researcher decided to pursue this particular word further by analyzing it in a variety of ways. So the word minibar, along with other words with a high frequency, was the subject of a concordance search. These initial ways of examining the discourses (i.e. word frequency list and concordancing) led to forms of more qualitative analysis with respect to the occurrence of minibar. Each of the six checking out texts was examined from discoursal, pragmatic, lexical-grammatical and intonational perspectives.
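The two initial steps described above, a word frequency list followed by a concordance search, can be approximated outside a dedicated concordancer. The following is a minimal Python sketch rather than the procedure used in the study: the tokenizer, the function names `word_frequencies` and `concordance`, and the tiny sample (the six minibar questions quoted in this chapter, without prosodic marking) are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Illustrative sample only: the six "minibar" questions quoted in this chapter,
# keyed by discourse ID. The full transcripts are in Appendix 1 of the chapter.
UTTERANCES = {
    "B001": "did you purchase anything from the minibar",
    "B004": "have you get the minibar key",
    "B006": "did you took from the minibar",
    "B007A": "er have you got the minibar key sir",
    "B007B": "do you have the key of the minibar",
    "B008": "did you get any drink from the minibar",
}

def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(texts):
    """Return a Counter of token frequencies over all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

def concordance(texts, node, span=3):
    """Return (text_id, left, node, right) tuples with up to `span` words of context."""
    lines = []
    for text_id, text in texts.items():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == node:
                left = " ".join(tokens[max(0, i - span):i])
                right = " ".join(tokens[i + 1:i + 1 + span])
                lines.append((text_id, left, node, right))
    return lines

# Frequency list, most frequent items first.
freq = word_frequencies(UTTERANCES.values())
for word, n in freq.most_common(5):
    print(f"{word:10} {n}")

# Key-word-in-context lines for the node word "minibar".
for text_id, left, node, right in concordance(UTTERANCES, "minibar"):
    print(f"{text_id:6} {left:>22}  {node}  {right}")
```

On a real corpus the same two functions would be run over the full orthographic transcripts; the frequencies reported in this chapter come from those transcripts, not from this six-utterance sample.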

Collocation

Collocational studies have become synonymous with concordancing (e.g. Hunston 2002; Sinclair 1991, 2001). Often when concordance lines are studied, they are taken from a written corpus, and so who has written the word being searched is not the subject of reporting or discussion. What is interesting in the checking out discourses is that at times it is clear that certain words in this particular type of service encounter are predictable in terms of who does or does not utter them. In the case of the social deictic sir, it is only said by the front office staff, and it is used when the front office staff greet, thank, ask a question or request the guest to do something. Similarly, visa card, credit card and bill are typically introduced only by the front office staff, and they frequently collocate with your. The phrasal verb check out or checking out is found in four of the six discourses, and in three of the four is spoken by the front office staff. There are no clear collocational patterns when the front office staff utter these words, except that in the two instances found in B008, check out is collocated with you. In B006, the utterance is “checking out” (line 1); in B007A, “it is just checking out right” (line 4); and in B008, the utterances are “you check out now” (line 1) and “sir I will do for you er check out for you.” When it is the guest who initiates the topic of checking out, the phrasal verb is prefaced with I want to (“um I want to check out eight two two one,” line 1, B001). Another high frequency verb in these discourses is thank, which collocates with you in all cases and with very much / so much in three instances. The guests use this word five times and the front office staff three times. One reason that the collocates for thank are limited is that most of the time thank you (very / so much) comprises the entire utterance.
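The observation that certain items are predictable in terms of who utters them amounts to a frequency count broken down by speaker role. A minimal Python sketch of that breakdown follows, using only the check out / checking out utterances quoted above; the role labels, the regular expression and the function name `discourses_by_role` are assumptions introduced here for illustration.

```python
import re
from collections import defaultdict

# (discourse ID, speaker role, utterance) triples quoted in the section above.
# The role labels are assigned by hand for this illustration.
QUOTED = [
    ("B006", "staff", "checking out"),
    ("B007A", "staff", "it is just checking out right"),
    ("B008", "staff", "you check out now"),
    ("B008", "staff", "sir I will do for you er check out for you"),
    ("B001", "guest", "um I want to check out eight two two one"),
]

# Matches both "check out" and "checking out".
PATTERN = re.compile(r"\bcheck(?:ing)? out\b")

def discourses_by_role(utterances, pattern):
    """Map each speaker role to the set of discourses in which that role utters the pattern."""
    found = defaultdict(set)
    for discourse, role, text in utterances:
        if pattern.search(text.lower()):
            found[role].add(discourse)
    return dict(found)

by_role = discourses_by_role(QUOTED, PATTERN)
for role, discourses in sorted(by_role.items()):
    print(role, sorted(discourses))
```

Counting discourses rather than raw tokens mirrors the way the finding is reported above (four discourses, three of them with the staff as speaker).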
Turning now to the concordance lines for minibar, it can be seen that, again, this is the preserve of the front office staff: no departing guest in the data ever utters this word. The fact that it is the staff that monopolize much of the vocabulary associated with checking out suggests that they are in control of the topic and the progress of the discourse. Also, all instances of minibar collocate with the definite article, half of the occurrences with the word key, and all instances with you. Another finding is that minibar has strong colligational properties (Francis 1993; Hunston 2002), as it is contained in an interrogative in all of the six instances. Regarding collocational associations of minibar in the checking out discourses, in all instances the is the adjacent collocate to the left. The word key collocates with minibar in three instances, two to the right and one to the left. It should be noted that while the minibar key is mentioned, the room key is never mentioned, even though they are both of the same reprogrammable type and the room key might be perceived as the more important of the two. In addition, minibar collocates with the verbs purchase, get, took, got, and have to the left. So, while minibar was unexpected in the frequency of its occurrence, it was predictable in terms of who says it, the syntactic structure in which it occurs, and its collocational associations. Minibar was proving to be an increasingly interesting source of study, and so it was decided to pursue minibar yet further by examining where it occurs in the discourse.
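The left and right collocates reported above can be checked mechanically against the six minibar questions quoted in this chapter. The sketch below is a Python illustration under stated assumptions: the collocational window is taken to be the whole utterance (the chapter does not state the span used), each collocate is counted once per utterance, and the function names `collocates_by_side` and `adjacent_left` are hypothetical.

```python
from collections import Counter

# The six "minibar" questions as quoted in this chapter (orthographic form only).
UTTERANCES = [
    "did you purchase anything from the minibar",
    "have you get the minibar key",
    "did you took from the minibar",
    "er have you got the minibar key sir",
    "do you have the key of the minibar",
    "did you get any drink from the minibar",
]

def collocates_by_side(utterances, node):
    """Count, per utterance, the words found to the left and to the right of `node`."""
    left, right = Counter(), Counter()
    for text in utterances:
        tokens = text.lower().split()
        if node not in tokens:
            continue
        i = tokens.index(node)             # first occurrence of the node word
        left.update(set(tokens[:i]))       # each collocate counted once per utterance
        right.update(set(tokens[i + 1:]))
    return left, right

def adjacent_left(utterances, node):
    """Return the word immediately to the left of each occurrence of `node`."""
    words = []
    for text in utterances:
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token == node and i > 0:
                words.append(tokens[i - 1])
    return words

left, right = collocates_by_side(UTTERANCES, "minibar")
print("left :", left.most_common())
print("right:", right.most_common())
print("adjacent left:", adjacent_left(UTTERANCES, "minibar"))
```

A fixed span of four words either side, common in collocation studies, would miss you in B001, where it stands five words to the left of minibar; this is why the whole utterance is used as the window here.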

The foregrounding of “minibar” in the discourse

An examination of each of the discourses revealed that the topic of the minibar in the checking out discourses tends to be foregrounded in the discourse by the front office staff, except for B008 where it is to be found in the middle of the discourse (the reasons for this exception are discussed later in the chapter).

The language of the “minibar” utterances

The six minibar utterances were also examined in terms of use of language and, in particular, politeness forms (Brown and Levinson 1987) employed by the front office staff. Politeness forms refer to strategies employed by the staff to best achieve their communicative goals in the context of handling the topic related to the minibar. The question in B001, “did you purchase anything from the minibar” (line 17), is characterized by an absence of honorifics, of hedging through the choice of modal verbs, and of please. In B004, the front office staff’s question “Mister T_ (pause) have you get the minibar key” (line 3), which lacks a please and hedging, coupled with the fact that it is the first utterance of the interchange, makes the speaker sound rather abrupt towards this guest. The question also contains a grammatical mistake (“have … get”). In B006, the front office staff’s question “did you took from the minibar” (line 4) seems to imply that the guest is under suspicion of possibly having taken things from the minibar. Apart from the grammatical problem (“did … took”), there is no use of hedging, honorifics or the ubiquitous please seen in the teaching materials. Similarly, the receptionist in B007A does not make use of please or hedging in his question “er have you got the minibar key sir” (line 6). Those in B007B and B008 make scant use of honorifics, please and hedging in their questions: “do you have the key of the minibar” (line 3, B007B) and “did you get any drink from the minibar” (lines 9–10, B008).

The checking out discourse B001 is different from the other five. In B001, while minibar is used and some of the four steps are performed, they are seemingly secondary to the interest and concern shown towards the guest. The exception to this is the foregrounding of the minibar, although at least this time the guest is asked if he did “purchase” rather than “took” something from the minibar. The analysis has, therefore, revealed both lexical-grammatical problems and pragmatic failure (Thomas 1983) in the minibar utterances.

Structural organization

Each of the checking out discourses was analyzed for its move-structure with a view to understanding the structural organization of the text-genre (Bhatia 1993). In the first checking out discourse (B001), the minibar topic is foregrounded when the front office staff asks the guest “did you purchase anything from the minibar”. This is followed by the staff showing interest and concern towards the guest when he asks “is everything okay within the” (line 23), “and you’re going home now” (line 28), and “yea so you feel very tired” (line 35) while he is preparing something for the guest to sign. On line 43, the front office staff asks about payment (“and by the way are you going to handle the account by your visa card”) and for the guest’s signature on the payment slip. When business is over, the staff continues to address interpersonal rather than transactional goals when he asks the guest about his next stop on his trip.

In B004, the front office staff starts the checking out discourse by asking the guest “have you get the minibar key”. Then he brings up the topic related to the bill and says how much the guest should pay, followed by checking that the guest would settle the bill by a Visa card and asking the guest for a credit card. When the guest queries why the front office staff was asking for his credit card again, he provides the reason. B006 involves the same hotel but a different member of the front office staff. Again, the minibar is foregrounded, followed by the staff asking the guest to sign. Similarly, in B007A the minibar key is foregrounded when the front office staff asks the guest “er have you got the minibar key sir”. (Here the staff has used a wrong social deictic form to address the female guest.) In fact, the staff pursues his topic by paraphrasing his question about the minibar key. Then he queries the guest about her use of room service and explains to the guest how much she should pay for using room service. Then the front office staff asks the guest about payment, and whether she would need a receipt for the payment. In B007B, the minibar key is also foregrounded (“do you have the key of the minibar”). This is followed by the front office staff explaining to the guest about a change made by an airline company. In the last checking out discourse (B008), the topic of the minibar is foregrounded (“did you get any drink from the minibar”), followed by the staff’s question about payment (“er do you prefer to handle by cash or credit card”).

Table 1 shows the structural organization of the six checking out discourses. Nine moves have been identified, three of which (i.e. moves 2, 3, and 7) are worth paying particular attention to.

Table 1. Structural organization of the checking out text-genre

Move-structure
1 Exchange of greetings
2 Establish the purpose of the service encounter
3 Settling minibar account
4 Hotel staff shows interest and concern towards guest
5 Settling room service account
6 State the amount of the hotel bill
7 Ask how the guest would like to pay the hotel bill
8 Ask the guest to sign
9 Ask whether the guest needs the receipt

(Columns: B001, B004, B006, B007A, B007B, B008. The Yes entries marking which moves occur in each discourse cannot be aligned with their rows in this copy.)

First of all, the second move, Establish the purpose of the service encounter, occurs four times in the six checking out discourses. Second, the third move, Settling minibar account, is both an obligatory move and the first move of the checking out discourses once it has been established that the guest is checking out of the hotel. Third, the seventh move, Ask how the guest would like to pay the hotel bill, is also frequent and found in four of the six discourses. The four utterances that realize this move are “and by the way are you going to handle the account by your Visa card” (line 43, B001), “you’ll settle by your er Visa card sir” (lines 11–12, B004), “did you pay by your Visa card” (line 21, B007B), and “er do you prefer to handle by cash or credit card” (line 17, B008).

Consequently, minibar is not only an unexpectedly frequently occurring word with predictable colligational and collocational properties; it also constitutes an obligatory move in the text-genre and typically a foregrounded move in the discourses examined. The lexical, grammatical and pragmatic analyses of the language used in the minibar utterances have indicated a need for language training of the front office staff. Elsewhere in the paper, the issue of whether or not the minibar should even be mentioned by the front office staff during routine check-outs, and where it should occur in the discourse, if at all, is discussed. Let us now look at the intonational features of these minibar moves.
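Returning briefly to the move analysis, the notion of an obligatory move can be made concrete as a move that occurs in every discourse of the text-genre. The Python sketch below encodes only the two distributions that the prose states explicitly (move 3 in all six discourses; move 7 in B001, B004, B007B and B008). It is deliberately not a reconstruction of Table 1, and the names `MOVE_OCCURRENCES` and `obligatory_moves` are hypothetical.

```python
DISCOURSES = ["B001", "B004", "B006", "B007A", "B007B", "B008"]

# Only occurrences stated explicitly in the prose are encoded here;
# the remaining cells of Table 1 are intentionally left out.
MOVE_OCCURRENCES = {
    3: set(DISCOURSES),                    # Settling minibar account
    7: {"B001", "B004", "B007B", "B008"},  # Ask how the guest would like to pay the hotel bill
}

def obligatory_moves(occurrences, discourses):
    """Return the moves that occur in every one of the given discourses."""
    everything = set(discourses)
    return sorted(move for move, found in occurrences.items() if found >= everything)

def coverage(occurrences, discourses, move):
    """Return the share of discourses in which `move` occurs."""
    return len(occurrences[move] & set(discourses)) / len(discourses)

print("obligatory:", obligatory_moves(MOVE_OCCURRENCES, DISCOURSES))
print("move 7 coverage:", round(coverage(MOVE_OCCURRENCES, DISCOURSES, 7), 2))
```

With a complete move-by-discourse table, the same two functions would separate obligatory moves from frequent but optional ones such as move 7.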

Discourse intonation

Brazil’s (1985, 1997) description of discourse intonation identifies four systems contributing to the communicative value of intonation. The sets of options available to the speaker within the tone unit are prominence, tone, key and termination. The HKCSE is prosodically transcribed using Brazil’s (1985, 1997) system for labeling spoken discourse to enable an analysis of intonational features of the discourses. In the extracts below, each of these intonational systems is indicated by various typographical means: prominence is shown by upper case; tone is indicated by the arrow at the start of each tone unit; key and termination are identifiable by whether the text is above, on or below the line (i.e. high, mid and low respectively); and the tonic syllable in each tone unit is underlined. According to Brazil, each one of these four systems may add a different layer of information, and decisions concerning which of them to employ are made by speakers on the basis of their ongoing real-time assessment of the progress of the discourse.


The utterances containing minibar, together with the next utterance in the discourse, are given with all of the intonational features. It is possible to examine the use of intonation by the speakers and to discuss the appropriacy of it in this context. The ways in which the front office staff handle the minibar episode from an intonational perspective are examined, and suggestions as to how it might have been done differently are explored.

B001
17 b: // ↗ did you purCHASE // ↗ anything from the [miniBAR //
18 B: [// ↘ NO // ↘ NOthing // ↘ NO //

B004
3 b: // ↗ HAVE you get the minibar KEY //
4 B: // ↘ I wasn’t GIven one //

B006
4 b: // → did you TOOK // ↗ from the miniBAR //
5 B: // ↗ NO //

B007A
6 b: // → ER // ↗ HAVE you got the minibar KEY sir //
7 A: // ↗ NO // ↘ I didn’t HAVE one //

B007B
3 b: // ↗ do you have the KEY of the miniBAR //
4 B: // ↘ NO // ↘ NO nothing //
5 b: // ↘ Okay //

B008
9 b: // → you GOT a local CALL // → AND er ONE // (intonation inaudible) outside Hong Kong // ↗ WOULD you get any DRINK // → FROM // ↘ the miniBAR //
11 (B shakes head)
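Because the transcription marks tone-unit boundaries with double slashes and places a tone arrow at the start of each unit, a transcribed line can be split into tone units mechanically. The following Python sketch assumes that tones are written with the arrows ↗ (rise), ↘ (fall) and → (level); the symbols given for fall-rise and rise-fall, the overlap bracket handling and the function name `parse_tone_units` are likewise assumptions, not part of the HKCSE conventions as described here.

```python
# Assumed mapping from tone arrows to tone names; the two compound
# symbols are hypothetical and do not occur in the extracts above.
TONES = {
    "↗": "rise",
    "↘": "fall",
    "→": "level",
    "↘↗": "fall-rise",
    "↗↘": "rise-fall",
}

def parse_tone_units(line):
    """Split a transcribed line into (tone, text) pairs, one per tone unit."""
    units = []
    for chunk in line.split("//"):
        chunk = chunk.strip().lstrip("[").strip()  # drop an overlap bracket, if any
        if not chunk:
            continue
        symbol, _, rest = chunk.partition(" ")
        if symbol in TONES:
            units.append((TONES[symbol], rest.strip()))
        else:
            units.append((None, chunk))            # e.g. "(intonation inaudible) ..."
    return units

b006_staff = "// → did you TOOK // ↗ from the miniBAR //"
b004_staff = "// ↗ HAVE you get the minibar KEY //"

for tone, text in parse_tone_units(b006_staff):
    print(f"{tone}: {text}")
print(parse_tone_units(b004_staff))
```

Key and termination, which the transcription shows by vertical position, and the underlined tonic syllable are not represented in a plain string like this and would need additional markup.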


Prominence

With respect to the use of prominence by the speakers, a speaker’s decision as to whether or not to make a syllable or syllables in a word prominent is determined by the speaker’s judgement regarding that which is shared between speaker and hearer (and so does not need to be made prominent) and that which is perceived by the speaker to be new (and so is made prominent). According to Brazil (1997), prominence is not an inherent property of a word; it is only associated with a word as a product of its being a part of the tone unit. The placing of prominence is, therefore, a linguistic choice made by the speaker and is unrelated to word-accent or grammar. Brazil (1985) makes an important distinction between two different sets of possibilities available to a speaker, namely the existential paradigm and the general paradigm. The former is the set of possibilities available to a speaker in a particular context of interaction, and the latter is the set of possibilities which exists within the language system (Brazil 1985:44). The selection of prominence, according to Brazil, is made when a speaker selects from an existential paradigm. The information conveyed by the choice of prominence is best illustrated by examining the data. In B001 the front office staff puts prominence on purchase and minibar (it should be noted that not every syllable in a word has to be made prominent for the word to be considered prominent in an utterance). It is at these points in the front office staff’s utterance that existential paradigms occur, and he selects purchase from other possible “senses” such as take, buy, drink, use and so on. Similarly, minibar, rather than other retail outlets in the hotel for instance, is made prominent in this particular utterance. The front office staff’s yes/no question constrains the guest’s response and the existential paradigm is “yes” or “no”; no, nothing and no are thus made prominent.
This pattern of prominence selection is also seen in B006 and B008, where the departing guests are also asked if they have made use of the minibar facilities during their stay. While the sense selection differs, the points at which prominence occurs are in keeping with Brazil’s (1997:21) description. In the other three extracts, the departing guests are asked if they have or got the key to the minibar. In B004 and B007A, the two different front office staff members ask (in terms of prominence) in the same way. In both cases, prominence is selected for have and key, which for these two speakers is where the existential paradigms occur. Interestingly, by not choosing prominence for minibar, the front office staff is projecting a shared understanding that there is only one key that they are concerned with in this discourse. In other words, the possibility of other keys such as the room key is excluded by choosing to make minibar non-prominent. In response to the staff’s questions with prominence for have, both of the guests in turn make prominent the words given and have, and by doing so the guests have interpreted the front office staff’s questions as requests for the return of the key.

A different choice of prominence by the receptionist in B007B results in a different response. In this extract the front office staff chooses prominence for the words key and minibar and, by not choosing prominence on have, presents an understanding that the guest has the key. The result of this is that the guest interprets the front office staff’s question as if he had been asked whether or not he had used the minibar during his stay at the hotel and responds “no no nothing.” It could be argued that in terms of the selection of prominence, the front office staff’s choices are generally unmarked in terms of the expectations described by Brazil (1985, 1997), but in the case of B007B the front office staff might have been seen to be asking a question devoid of an implicature if he had placed prominence on have.

Tone e next choice examined is the speaker’s choice of tone. Speakers, according to Brazil (1997:82), basically have a choice between referring and proclaiming tones. When a speaker chooses a referring tone (fall-rise and rise tones), he effectively indicates that this part of the discourse will not enlarge the common ground assumed to exist between the participants. Choice of a proclaiming tone (fall and rise-fall tones), on the other hand, shows that the area of speaker-hearer convergence is about to be enlarged. Brazil further explains the rationale behind these choices by describing the role relationships that exist between the participants in a discourse. In discourse types where one speaker is dominant, in the sense of having greater responsibility for the discourse and greater freedom in making linguistic choices, that speaker monopolizes these choices (Brazil 1997:86). Hence, the use of rise instead of fall-rise or rise-fall instead of fall might be interpreted as insistence or forcefulness on the part of the speaker (Brazil 1997:98). It is interesting to note that in every one of the examples, the front office staff chooses a rise tone whether they are asking for the return of the key or

// à did you

TOOK

// ä from the miniBAR //

153

they are asking the guest if s/he has made use of the minibar. is choice of tone, with its assumption of the reassertion of shared knowledge coupled with a sense of insistence or forcefulness, is probably not appropriate in this context and might explain why the guests’ responses are typically quite emphatic in nature and invariably spoken with a fall tone which serves to indicate that what is being uttered is new information for the hearer. (e exception to this is B006 which is discussed below when key and termination are examined.) us the choice of fall-rise tone by the staff would have avoided the assertion of dominance at these points in the discourses.

Key and termination

Finally, the last two factors making up the communicative value of the intonation to be found in an utterance are examined. Brazil (1997:40) states that a speaker also has a choice when it comes to the relative pitch, or key, at the start of each tone unit. Key choices are relative to the preceding tone unit and are chosen from a three-tier system: high, mid and low. The choice of key is made on the first prominent syllable of the tonic segment, and Brazil claims that each key adds meaning to what is said. High key has a contrastive value, mid key has an additive value, and low key carries the sense of self-evident (Brazil 1985:84). There is also a choice at the end which he terms the termination of the tone unit. Termination is the choice of high, mid or low key at the beginning of fall, rise-fall and fall-rise tones or at the end of rise tones. In this choice, a speaker is able to attempt to constrain the next speaker if s/he selects high or mid key thanks to the high degree of pitch concord (Brazil 1997:119) found in spoken discourse. In selecting low key termination, however, no such constraints are imposed and the next speaker is freer, for example, to embark on a new topic or to bring the discourse to a close. In most of the extracts, the speakers choose mid key and mid termination which both expect and receive concurrence from the hearers. However, this pattern is not found in B006 and B007B and these are interesting to look at in more detail. In B006, the front office staff uses high key when saying “did you took”. Unlike the mid key employed in the other receptionists’ questions, which carries the sense that the speaker is seeking confirmation, high key carries the sense that the speaker really does not know whether or not the guest has taken something from the minibar. Thus the guest in B006 responds no with a rise tone invoking shared knowledge which in this context has the sense of as might be expected. In B007B the guest interprets the front office staff’s choice of prominence as implying that he might have used the minibar, and the guest’s marked absence of pitch concord, in selecting low key rather than mid, carries the sense of both self-evidence and the refutation of the assumptions made by the receptionist. It can be seen that the front office staff also employs marked pitch concord with the choice of low key in his utterance on line 5, but this time it is an example of employing low termination to end this particular topic. The same strategy is employed by the guest in B001 where he ends his utterance with a low termination no (line 18). It could be argued that the front office staff might be perceived as more diplomatic if questions relating to the use of the minibar carried the intonation associated with genuine questions rather than employing intonation associated with seeking confirmation. All of the factors that comprise discourse intonation can impact the meaning of what is said, and it has been seen that on occasion the choices within the discourse intonation system might be deemed to be inappropriate and therefore the subject of language training for front office staff. It needs to be emphasised, however, that many discourse and pragmatic features contribute to meaning in context rather than any one particular feature having overriding importance in conveying meaning. In this sense, intonation has to be considered alongside the other discourse and pragmatic features discussed here. Intonation is an important contributor to meaning making but its effects are best understood as cumulative and need to be assessed alongside other important aspects of the talk.

Implications for main practitioners

Practical relevance to ESP providers

In order to compare the findings from real-life data with what is prescribed in learning materials, some ESP books on hotel and tourism were examined. In Bilbow and Sutton (1995:76), learning materials catering for front office staff working at the Cash Desk are presented as being comprised of four stages, each of which is illustrated by one or two language items:

(1) Saying how much it (sic) is to pay
That’s a total of HK$1,540, sir.
That’s HK$2,350 to pay.
(2) Asking about payment
How are you paying, sir?
How will you be paying, madam?
(3) Asking the guest to sign
Could you just sign at the top/up here, please?
If you could just sign here, please, sir.
(4) Handing back the receipt to the guest
Here’s your receipt and your card.
(Adapted from Bilbow and Sutton 1995:76)

The stages in the ESP textbook (Bilbow and Sutton 1995:76) present handling check-outs as focusing on settling the bill and, while the language is marked with redressive action to give face to the addressee (Brown and Levinson 1987:69), such as hedges, honorifics and pleases, check-outs are taught as a fairly simple and straightforward service encounter. Interestingly the four stages correspond to moves 6, 7, 8 and 9 (see Table 1) respectively. However, only stage 2 (Asking about payment) or move 8 (Ask how the guest would like to pay the hotel bill) occurs frequently in the data. All the other three stages can be found in the checking out discourses but are infrequent. The most noteworthy difference is the move related to the minibar, as no reference at all is made by the ESP textbook to this particular move. Consequent to the observations made of the hotel checking out discourse, modifications were made to ESP language training materials. In the latest edition of the ESP textbook, Bilbow (2002:65) has revised the four stages when handling the bill at the cash desk by suggesting that the first stage be asking about the minibar, and a related language item be included in the fourth stage:

(1) Ask clearly whether (the guest) has used the minibar
a. Excuse me, sir, have you used the mini-bar this morning?
b. I’m sorry, have you used the mini-bar this morning?
(2) Saying how much the guest should pay
a. That’s a total of HK$1,540, please.
b. That’s HK$2,350 to pay, please.
(3) Asking about how (the guest) is paying
a. How are you paying, sir?
b. How will you be paying, madam?
(4) Explaining the items on the bill
a. The room rate is marked at the top.
b. These are your international phone calls.
c. Here are your mini-bar expenses.

Practical relevance to hotel professionals

The collaboration between HTM and the English Department was motivated by HTM’s desire to improve the training of its students and provide input into the training needs of those in the industry. The hotel management gave their permission for data collection in their hotel on the understanding that it would assist them in their mission to further develop and enhance the provision of quality service in the hotel. So the question that the researcher was interested in was how these checking out speech events were viewed from the perspective of hotel professionals, and what messages they would wish to impart over and above merely settling the bill in this speech event. From the perspective of hotel professionals, three points were highlighted. First of all, hotels, especially five-star hotels, have a corporate message they are actively seeking to convey in each and every speech event. Second, the reception area is both the first and last point of contact for guests, and it is where lasting views of the hotel are often formed (Bilbow 2002:1). Third, check-outs should last no more than three minutes, but in that time front office staff are expected to make full use of their customer care skills so that guests leave with a positive impression of the hotel (Bilbow 2002:74). In terms of language this means explicit manifestations of politeness forms (in the sense of deference towards the guest), courtesy and sensitivity to the guest’s needs. Lastly, the general view suggests that up to five points should be considered in the checking out discourse:

(1) Staff should greet and bid farewell to the guest.
(2) Staff should ask if the guest has enjoyed the stay at the hotel.
(3) Staff should demonstrate genuine interest in the guest’s onward travel plans and offer assistance where appropriate.
(4) Staff should explain the contents of the bill when needed and handle the payment of the bill with patience and courtesy.
(5) Use of room service, minibar, etc. should only be mentioned if the guest queries these items on the bill.

What is interesting from the above is that the frequent and foregrounded occurrence of minibar is indeed unexpected. It is also interesting to compare what was actually taking place on two days at the Reception Desk of the hotel with the hotel’s overall philosophy in its employment of customer care skills by examining the Mission Statement of the hotel. The following excerpt is taken from the Corporate Information on the hotel’s homepage:

X Hotels International Limited is committed to not only meeting but exceeding the individual needs of each and every guest. Business executives and leisure travelers alike are pampered with the finest hospitality characterized by stylish comforts in elegant surroundings. United by a mission to make every stay a pleasant and memorable one, the (Hotel) staff devote all their efforts towards the delivery of impeccable service reflective of the group’s dedication to excellence.

It can be seen from the Mission Statement that for practitioners in the hotel industry the checking out discourses examined in this chapter are problematic not simply from a language point of view, but more importantly from the overall message that they convey. Only one of the six checking out discourses (B001) clearly communicates the message of customer care and concern for providing “impeccable service” that is so central to the mission of the hotel. The other five, however, are solely concerned with payment, primarily minibar use and the return of the minibar key, and so may send the wrong message to guests and fail to convey the mission of the hotel. The message is one of an overriding concern about the minibar rather than with customer satisfaction. While the minibar is no doubt a source of income for the hotel, there is a risk here that its importance is overstated in these discourses, and the message communicated may be negative.

Practical relevance to front office staff

Another main group of practitioners is the frontline hotel staff. They cooperated in the data collection process on the understanding that the data would be analyzed and the results fed into the production of training materials aimed at their specific English language needs. The outcomes of the collaborative hotel project that involved ESP providers, academics in HTM and the hotel management, coupled with input from corpus linguistics and applied linguistics, should contribute towards the design of training courses and instructional materials for raising the awareness of the front office staff and improving their language skills and communicative competence in their work situations.

Practical relevance to the corpus/applied linguist

The last major practitioner is the corpus linguist and applied linguist. The researcher was not part of the original project team or involved in its design, but was given access to the data collected. Through the initial analysis of the checking out discourses based on corpus linguistics, discourse analysis, pragmatics and discourse intonation, additional insights have been generated.

Conclusions e discourse of checking out in the hotel has been fairly exhaustively discussed and suggestions have been made with regard to areas of ESP language learning and teaching, language awareness training, and wider issues pertaining to the hotel’s overall philosophy and how this is conveyed by its staff. It has been shown that an initial investigation of the data using a word frequency list uncovered an unexpectedly frequent word and that a study of concordance lines further underlined this particular occurrence to be a fertile area of yet further investigation. is led on to looking at move structure, intonation, pragmatics and effective corporate communication. Such a process is by no means unusual in corpus-driven language studies and serves to underscore the findings waiting to be mined by researchers with access to corpora. To conclude, this corpus-driven language study has shown that the evidence from the checking out data in a business corpus, and the associated practitioners, contributes towards making theoretical statements (Tognini-Bonelli 2002:75)

// à did you

TOOK

// ä from the miniBAR // 159

about various linguistic, paralinguistic and pragmatic features of such communicative discourse in the hotel industry. e implications of these findings are also varied, and in this case somewhat complex. From a language training perspective, the study has shown the need to address issues relating to lexico-grammatical accuracy, intonational appropriacy and pragmalinguistic failure. Beyond these, there is also the matter of the foregrounding of topics such as the use by guests of the minibar and even whether the topic should be raised unnecessarily at all from the point of view of conveying an appropriate corporate message. e discussion so far has suggested a few important considerations in future related studies. First, all the parties or practitioners need to be involved in the design of the corpus and collection of data. Second, a full understanding of the functions of speech events is needed through consultation with all practitioners. ird, any analysis of the findings is provisional until all the parties have been consulted. Finally, applications related to the findings, again, need to be considered from the points of view of all concerned. What is clear is that the corpus linguist has much to contribute to the process.

Acknowledgement e work described in this paper was substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. BQ 396). anks are due to Grahame Bilbow and John Sutton for generously donating their data to the HKCSE. anks are also due to Richard Cauldwell who has been a consultant to the HKCSE on the prosodic transcription of the data.

Note 1. Wordsmith is a soware program developed by Mike Scott (1996) and distributed by Oxford University Press on the Internet. WordSmith is a collection of programs with several functions which enable researchers to find out how words are used in the texts. e main functions are concord, wordlist and keywords. e Concord tool creates concordances (lists of words in context) and finds collocates of the word. e Wordlist tool generates word lists in alphabetical and frequency order for the purpose of comparing texts lexically. It also generates such statistics as total number of words, length of words, number of sentences, and

160 Winnie Cheng

so on. e Keywords tool identifies key words in a given text. Keywords are words whose frequency is unusually high in comparison with other texts.

References

Aijmer, K. 1996. Conversational Routines in Spoken Discourse. London: Longman.
Bhatia, V. 1993. Analyzing Genre: Language Use in Professional Settings. London: Longman.
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Bilbow, G. 2002. Speaking for Modern Business. Hong Kong: Pearson Education China Limited.
Bilbow, G. and Sutton, J. 1995. Hospitality Business: Craft and English Language Skills for the Hotel and Catering Business. Hong Kong: Longman.
Brazil, D. 1985. The Communicative Value of Intonation. Birmingham: English Language Research.
Brazil, D. 1997. The Communicative Role of Intonation in English. Cambridge: Cambridge University Press.
Brown, P. and Levinson, S. C. 1987. Politeness: Some Universals in Language Usage. Cambridge: Cambridge University Press.
Cheng, W. and Warren, M. 1999. “Facilitating a description of intercultural conversations: The Hong Kong Corpus of Conversational English.” ICAME Journal 23:5–20.
Cheng, W. and Warren, M. 2000. “The Hong Kong Corpus of Spoken English: Language learning through language description.” In Rethinking Language Pedagogy from a Corpus Perspective, L. Burnard and T. McEnery (eds), 133–144. Frankfurt am Main: Lang.
Francis, G. 1993. “A corpus-driven approach to grammar.” In Text and Technology, M. Baker, G. Francis and E. Tognini-Bonelli (eds), 137–156. Amsterdam: John Benjamins.
Hoey, M. 1991. Patterns of Lexis in Text. Oxford: Oxford University Press.
Hong Kong Tourist Board 2002. “Year ends on record-breaking note for tourism with 16.75m arrivals,” January 24.
Hunston, S. 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
McEnery, T. and Wilson, A. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Scott, M. 1999. WordSmith Tools. Oxford: Oxford University Press.
Sinclair, J. M. 1991. Corpus, Concordance and Collocation. Oxford: Oxford University Press.
Sinclair, J. M. 2001. “Review.” International Journal of Corpus Linguistics 6 (2):339–359.
Sinclair, J., Jones, S. and Daley, R. 1969. English Lexical Studies. Birmingham: University of Birmingham, for the Office of Scientific and Technical Information.
Thomas, J. 1983. “Cross-cultural pragmatic failure.” Applied Linguistics 4 (2):91–112.
Tognini-Bonelli, E. 2002. “Functionally complete units of meaning across English and Italian: Towards a corpus-driven approach.” In Lexis in Contrast: Corpus-based Approaches, B. Altenberg and S. Granger (eds), 73–96. Amsterdam: John Benjamins.

Appendix 1

Six hotel checking out discourses

B001
b: male Hong Kong Chinese
B: male native speaker of English

1. B: um I want to check out eight two two one
2. (pause)
3. B: Mister G_
4. (pause)
5. b: let me see
6. (pause)
7. b: it’s eight one two two
8. B: eight
9. b: eight one two two
10. B: er eight one eight one two two right
11. b: yea
12. B: right sorry
13. b: you are too tired
14. B: yeah eight two two one that’s stupid but I was in the right room
15. B&b: ((laugh))
16. (pause)
17. b: did you purchase anything from the [minibar
18. B: [no nothing no
19. (pause)
20. b: your bill is on the way it’s coming
21. B: okay
22. (pause)
23. b: is everything okay [within the
24. B: [everything okay yea thanks
25. (pause)

26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. 40. 41. 42. 43. 44.

B: b: b: B: b: B:

45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63.

b: B: b: b:

B: b: B: b: B: b: B: b: b: B:

B: b: B: b: B: b: B: b: B: b: B:

I needed some sleep [((laughs)) [((laughs)) and you’re going home now er yes home to China for work or for work yea (pause) the problem is when you arrive in the morning in Hong Kong in Europe it’s midnight yea so you [feel very tired [and with the kids midnight they want to sleep yea so it’s very hard alright okay signature yes please (pause) and by the way are you going to handle the account by your [visa [yea card credit yea yea I do need your credit card once again once a[gain [yea please (pause) it’s a visa yea alright (pause) now you feel better yea much better (pause) may I have your signature once again please okay yea Chiso how long will it take to go to Chi- China er one hour one hour just one hour (pause)

64. B: it’s very quick
65. b: yea
66. (pause)
67. b: alright here you go
68. B: okay thank you very much
69. b: no problem
70. B: and you you’ll make the [other one alright
71. b: [ah
72. (pause)
73. b: well actually [this morning
74. B: [it’s not signed yet
75. b: no [this morning we’re [holding a approval here now I’ve used up
76. B: [oh okay [yea
77. B: okay al- alright
78. b: okay have a nice trip huh
79. B: okay good bye
80. b: bye
81. B: yea

B004
b: male Hong Kong Chinese
B: male native speaker of English

1. b: Mister T_
2. (pause)
3. b: have you get the minibar key
4. B: I wasn’t given one
4. b: sorry
5. B: I didn’t have one
6. b: oh I’m sorry um
7. (pause)
8. b: yes in in your bill they have er local call one hundred number call and one
9.    coffee shops in lobby lounge
10. B: yea
11. b: yes total is two thousand five hundred and ninety eight and you’ll settle by
12.    your er visa card sir
13. B: yea
14. b: can I have your credit card please
15. B: er didn’t you have that
16. b: pardon
17. B: you didn’t have that
18. b: um last night we just imprint the number now I need your credit card again
19.    for the payment

B006
b: male Hong Kong Chinese
B: male native speaker of English
a: female Hong Kong Chinese

1. b: checking out
2. B: out
3. ((research staff approaching B to ask for consent))
4. a: we’re now conducting a research about the English and communication skills
5.    I would want to record your conversation between you and the PolyU would you
6.    mind being tape recorded
7. B: that’s er
8. a: er just just the the transition the the
9. b: just the normal proceed procedure you bill is prepared
10. (pause)
11. b: did you took from the minibar
12. B: no
13. (pause)
14. b: just sign here sir
15. (pause)
16. B: okay
17. (pause)
18. b: thank you very much

B007A
b: male Hong Kong Chinese
A: female native speaker (British)

1. b: good afternoon (.) how are you
2. A: very well thank you
3. (pause)
4. b: just checking out [right
5. A: [yes

6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34.

b: A: b: A: b: A: b: A: b: A: b: A: b: A: b: A: b: A: b: A: b: b: b: b:

er have you got the minibar key sir no I didn’t have one okay you didn’t got it huh you didn’t got the key no I didn’t get the key (pause) you’ve got a room service yeah I had a coke and a bottle of water pardon a diet coke and a bottle of water oh in the in the room [service [uhuh uhuh and er the only this is eighty five point eight yes did you pay by your visa card yes can I have your visa card again please I I I er gave it when I checked in yes but but we just imprint the number when you [check in [oh okay (pause) thank you (pause) do you need your receipt for the room service ((A shakes head.)) no need ((b passes the bill to A.)) thank you

B007B
b: male Hong Kong Chinese
B: male native speaker of English

1. b: how are you sir
2. B: thank you
3. b: do you have the key of the minibar
4. B: no nothing no
5. b: okay
6. (pause)
7. b: I understand that the main change will um be done by Japan Airlines that’s all
8.    no other other changes that’s all
9. B: thank you so much
10. b: no problem

B008 b: male Hong Kong Chinese B: male native speaker of English (American) 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.

b: B: b: B: b: b:

B: B: b: B: b: B: b:

booking it’s for one night is it that you check out now (pause) I didn’t pay what did they reckon well to be honest they book for you is for overnight I wouldn’t pay for it I wouldn’t pay for it (pause) sir I will do for you er check out for you (pause) you got a local call and er one outside Hong Kong would you get any drink from the minibar ((B shakes head.)) no thank you (pause) well that’s a calling card for ten dollars what is that ten dollar er (inaudible) er do you prefer to handle by cash or credit card credit card (pause) here’s your credit card

“Invisible to us”: A preliminary corpus-based study of spoken business English

Michael McCarthy and Michael Handford
University of Nottingham, UK

Introduction: Researching spoken business English

This chapter addresses the question: Can a corpus-based study of spoken business English (SBE) characterize SBE as distinct from other kinds of talk, and, in the process, give quantitative substance to existing, non-corpus-based studies of SBE? The term spoken business English has a broad embrace within the literature, as does the term business English in general (St John 1996). Work in SBE ranges from descriptive studies of business meetings within and across corporations (most notably Bargiela-Chiappini and Harris 1997), including studies of buying and selling negotiations (e.g. Firth 1995; Charles 1996) and what might be termed office talk or workplace talk (e.g. Grimshaw 1989; Koester 2001), to more pedagogical concerns such as the authenticity or otherwise of spoken business English as portrayed in language teaching materials (e.g. Williams 1988), and the language needs of students of business (e.g. Crosling and Ward 2002). There is, too, a robust tradition of research into cross-cultural issues in spoken business communication (e.g. Yamada 1990; Garcez 1993; Halmari 1993; Ulijn and Li 1995; Ulijn and Murray 1995; Connor 1999; Gimenez 2001). The themes and preoccupations of such studies are equally many and varied. Of central concern seem to be the generic organization of different types of spoken events (e.g. phone calls, meetings), the negotiation and building of identities, the manifestations of status and roles (Charles 1996), the creation and maintenance of business cultures, the metaphors and other institutionalized constructs which sustain such cultures (Bargiela-Chiappini and Harris 1997), the nature of corporate and institutional power and organization and its linguistic reflexes. Studies of power and authority have a long pedigree in investigations into organizations and institutions, and notions of power and the relational distance which often accompanies it – along with considerations of cultural identities in terms of individualism and collectivism – may also be deemed of relevance to the questions raised by the data used in this chapter (Hofstede 1991; Chiapello and Fairclough 2002). In the cross-cultural context, as well as looking at the management of talk and turn-taking, cultural differences stemming from, for example, distinct perceptions of time and space have also been considered relevant (Yamada 1990; Ulijn and Li 1995). In terms of analytical approaches, studies have most often been broadly based within genre analysis, discourse analytical and pragmatic frameworks and/or the conversation analysis (CA) paradigm. Bargiela-Chiappini and Harris (1997) are typical of a blending of such approaches in their concern with thematic (topical) development in business meetings, the forms and pragmatic functions of pronominalization, the forms and functions of discourse markers, metaphors, and so on. Firth (1990, 1995), who looks at selling interactions among users of English as a lingua franca, works firmly within a CA framework. At the same time he is critical of CA’s assumptions of a stable speech community vis-à-vis his data, where communication problems are often glossed over and left unresolved, rather than repaired, worked at and successfully achieved in the conventional CA sense.
Larger structural and generic concerns have also been explored: Charles (1996:21) works with an approach to business negotiation that sees the talk as operating on hierarchical levels, ranging from the superstructural (the overarching relationship in which the negotiation is embedded), through the macrostructural (the event itself), to the microstructural (cycles within the event). As with other similar genre-oriented analyses, negotiations are seen as proceeding in stages, from initiation to development and then ending. Not directly connected with SBE but highly relevant to the kinds of data in the present chapter is the literature on professional discourses and workplace discourses (most typically the discourses of those working within the caring services or the legal and educational professions, but also including conflict resolution in industrial settings). The concept of communities of practice (Wenger 1998) has become important in such research. A community of practice (CofP) is akin to the wider notion of speech community but specifically entails regular mutual engagement of its members (e.g. frequent interactions as a basis for relationship building within the community), shared goals and negotiated activities, and a shared repertoire of resources accumulated over time (e.g. shared terminology, lore, ways of approaching problems, etc.). Holmes and Meyerhoff (1999:175) see joining a new workplace as a manifestation of entering membership of a CofP; the community constructs and reinforces itself in “global or specific aspects of language structure, discourse and interactional patterns.” Similarly, Drew and Heritage (1992) speak of institutional interactions as orienting to common goals and tasks, as working within constraints on what are allowable contributions, and proceeding on the basis of inferential frameworks and procedures established in particular institutional contexts. In similar vein, Roberts and Sarangi (1999:2) understand workplace communicative practices as creating an “interaction order,” while the shared habitual practices, beliefs, received knowledge and so on which professional members draw upon create an “institutional order.” Professional discourses often include collegiate meetings similar to business meetings, and the study of professional meetings has much to offer anyone examining business data. Boden (1995:83) shows how “everyday accounts and arrangements provide a fine interactional grid through which work colleagues from different departments and firms filter their own organizational agendas.” Boden’s (1994) description of the stages of business meetings (how they open, get down to business, etc.) parallels the patterns that can be observed in our data. Although some of the studies mentioned have built upon evidence from (at least modest) corpora, corpus linguistic techniques have been less widely used to seek out understandings of the nature of SBE.
Some of the well-known large corpora do include samples of what may loosely be termed business talk in their design: the 100-million word British National Corpus (BNC) includes 1.3 million words of ‘events such as sales demonstrations, trades union meetings, consultations, interviews’ (see Aston and Burnard 1998; see also http://www.hcu.ox.ac.uk/BNC). The International Corpus of English (ICE) project has built into its design the aim that each sub-corpus of English from the different countries which provide the data should contain some 20,000 words of spoken business transactions (see http://www.ucl.ac.uk/english-usage/ice/design.htm). Within what might be termed smaller-scale corpus projects, Bargiela-Chiappini and Harris’s (1997) research is based on approximately 18 hours of business meetings recorded in Great Britain and Italy. Nelson’s Kielikanava (Turku, Finland) Business English Corpus (BEC) consists of one million words of spoken and written data, which includes spoken recordings of meetings, negotiations and telephone calls (see Nelson 2000), and Nelson’s is an example of the kind of study the present chapter seeks to emulate. Using similar tools and corpus-analytical procedures to those of the present chapter, Nelson compared the lexis of a business English corpus with (a) the BNC as his benchmark corpus, and (b) a corpus of published business English teaching materials. He found that it was possible to describe a business English lexicon distinct from general English in that it covered a limited set of semantic fields which reflected the institutionalized relationships, activities and events of the business world.

Corpus description and data collection

Our study of SBE is based on the CANBEC corpus. CANBEC stands for Cambridge and Nottingham Corpus of Business English.¹ The corpus currently stands at just over the original target of 1,000,000 words of spoken business data recorded in a variety of settings: internal meetings within the same company, external meetings involving two or more differing companies, office talk, sales presentations, telephone conversations and office banter. Meetings comprise the majority of the data, and these include (a) technical/professional meetings which tend to discuss some static technological or physical aspect of work, for example computers or buildings, (b) progress review meetings which check the efficacy or success of a change or project, which in comparison are more fluid in terms of involving a discussion of a process, (c) meetings to instigate or inform of a change, and (d) sales meetings (see Figure 1). Recordings are from a range of businesses in terms of size and type, mainly in the United Kingdom; some English language data have been collected in other countries (see Appendix, Table 1). Types of interaction include discourse between peers, managers communicating with subordinates, clients and potential clients negotiating with other companies, and consultants advising businesses. The size of the organizations involved in the research reflects the spectrum within the business world, from self-employed business people to multinational corporations employing over 40,000 staff. Service, manufacturing, IT, financial and travel industries are among those which have provided substantial amounts of data (see Figure 2). Within CANBEC speakers make, change and postpone decisions, collaborate, solve and create problems, as well as argue, inform, agree and entertain.

The target size of CANBEC is one million words, due for completion by the fall of 2003. The present study is based on a cross-sectional representative sub-corpus of 250,000 words, which comprises the fully transcribed and coded data at the time of conducting the present research. It was decided that the range of businesses approached should be as wide as possible, both in terms of size and type, and while the majority of recordings would be made in the UK involving native speakers of English, overseas recordings involving competent non-native speakers would also be of value. This approach would allow for a variety of language in differing contexts, and would increase the possibility of reaching the stated goal of 1 million words, given the likely barriers the project would face.

Confidentiality was assumed, correctly, to be the most probable hurdle, with companies being reluctant to have a person with a microphone sitting in on sensitive discussions. Despite written assurances from the funding body guaranteeing confidentiality through systematic anonymization of all names and references which could be used to identify the company and its employees, most organizations deemed the risk to be too high. The effect recordings might have on employees’ performance and the perceived lack of reciprocal gain for the imposition were also, again correctly, deemed to be potential barriers. CANBEC researchers have also offered feedback and training seminars on effective communication, which has persuaded a few businesses to allow recordings to take place. Initial soundings with companies and organizations with which the researchers had no personal contact were found to be consistently unfruitful, the above concerns being commonly cited.
Company employees with whom the researchers had (or could develop) some personal relationship, whether it be a friend, relation, old school or college contemporary, colleague, former colleague, or friend or relation of a colleague, have been the most productive sources. It should also be stated that the ratio of acceptances among this group of people has been roughly one in twenty. Contacts were also made through organizations such as regional Chambers of Commerce, local development agencies and clubs such as the Rotary Club, but these provided considerably fewer sources than those listed above.

The types of businesses which have been most difficult to record in have tended to be financial arms of multinationals and large accountancy companies. As can be seen in Appendix, Table 1, recordings were made in multinationals, and even in a multinational bank, but access has, up until this point in time, generally been denied to meetings discussing pecuniary aspects of the business. Large accountancy firms have been unwilling to be recorded even with promising personal contacts within such companies, although data has been collected from smaller accountants. Obtaining data from meetings involving clients and potential clients has also been difficult, given companies’ understandable reluctance to impose any kind of perceived inconvenience on their customers. CANBEC researchers have, however, obtained approximately 250,000 words involving such external data from a variety of companies.

Analog recordings are made on cassette using professional microphones, with the CANBEC researcher usually present in the room (see the explanation of the title of this paper, below). It was reasoned that the recorder could thereby ensure accurate cataloguing of order of speakers and swift changeover of tape sides, which would outweigh the possible influence of having a nonparticipant present (the “observer’s paradox,” see Labov 1972). A few companies requested that they conduct the recordings themselves, usually when an external meeting or presentation was to take place.

Analytical framework and methodology

Perhaps the most fruitful types of analyses are those which combine quantitative and qualitative data (e.g. Bargiela-Chiappini and Harris 1997). The present study aims to do this by combining the quantitative data of frequency lists, keyword lists, cluster lists and concordances with the insights of discourse and conversation analysis concerning such areas as pronoun use, modality, extended metaphor and other indices of interpersonal communication which have characterized the SBE literature. We do not aim to illuminate the distribution or nature of technical nomenclature or features of content, but rather to investigate how corpus analysis can contribute to the broader issues, mentioned in the review section above, concerning the creation and sustaining of organizational roles and identities, institutional cultures and the discourses which give them substance.

One broad question we pose is: To what extent is SBE like or unlike everyday informal casual conversation? This question derives from the tradition of identifying spoken genres in terms of their similarities with and differences from casual conversation, where conversation is the benchmark, the primary genre, an approach which is amply exemplified in the study of genres of media talk such as interviews or talk shows (e.g. Greatbatch 1988, and the papers in Scannell 1991) as well as types of professional discourse (Drew and Heritage 1992; Larrue and Trognon 1993; Boden 1994). We therefore compare business talk with everyday casual conversation, but there is also an institutional dimension to business talk: business talk exists and evolves within and across business institutions and constructs and reflects business identities, roles and cultures which have become institutionalized over long periods of time. It is thus fruitful also to investigate SBE as an example of institutional discourse, and, to this end, we compare it with spoken academic data drawn from British university seminars, classes, tutorials, etc. Spoken academic data, we would argue, is comparable with SBE in that one might reasonably expect a similar degree of collective discussion in a generally non-conflictual environment, some degree of hierarchy or authority, a degree of institutionalized formality, clear task- and goal-orientations, and so on, qualities which will not be evident in the everyday casual conversational data and which mark academe out as a community of practice and as possessing those institutional and professional characteristics discussed by those who research professional discourses (see section 1 above).

Our methodology is in the first instance quantitative. Using standardly available software (the WordSmith Tools suite, Version 3.0: see Scott 1999), raw frequency lists, rank-ordered and alphabetical, were generated for the CANBEC data and compared broadly with frequency lists for general conversational data. Keyword lists (i.e. lists of words occurring with statistical significance in one corpus rather than another) were generated for the CANBEC data as compared with a general conversational corpus and as compared with an institutional spoken corpus, in this case, the spoken academic corpus. Frequency lists for two-, three-, four-, five- and six-word clusters (sometimes known as lexical bundles, see below) were also generated. Finally, concordances were generated for a sub-set of the positive keywords (i.e. those shown to be occurring with significant frequency in the business data as compared with the other corpora). The qualitative analyses then turned on adducing the functions in context of the sub-set of keywords, and on analysis and discussion of illustrative extracts from the original conversations.

The hypothesis of our study is that aspects of SBE discourse as already evidenced in qualitative exegesis will find quantitative support and ratification in a sizeable corpus, and that SBE will display similarities both with everyday casual conversation and with institutional varieties such as academic talk, as well as key differences which will characterize it as an independent register.

Quantitative data

Frequency

Table 2 in the appendix shows a comparison of the top 50 word forms in the 250,000-word CANBEC data with a 340,000-word sub-corpus of general, informal social and family conversations (hereafter CONV), and a 340,000-word sub-corpus of spoken academic data taken from the CANCODE corpus of spoken English (hereafter ACAD).² Similarities and differences are immediately apparent. The top 10 words in each corpus are strikingly similar, and only 13 of the top 50 CANBEC word forms do not appear in one or both of the other lists (well, if, got, no, at, think, can, as, are, don’t, them, get, then). This may be partly explained by the rather artificial cut-off point of 50 items (e.g. think is 52 in the spoken academic list, if is number 53 in the social list and well is number 55 in the spoken academic list, just excluding them from Table 2). Overall, then, SBE shares a core, high-frequency set of word forms with CONV and ACAD.
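The comparison of top-ranked word forms described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on WordSmith Tools; the tokenizer, the function names `top_forms` and `not_shared`, and the toy texts are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    # Lowercase, normalise curly apostrophes, and keep contracted
    # forms such as "don't" as single word forms (a simplifying assumption).
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text)


def top_forms(text, n):
    """Return the n most frequent word forms, most frequent first."""
    return [w for w, _ in Counter(tokenize(text)).most_common(n)]


def not_shared(target, others, n):
    """Word forms in the target top-n list that are missing
    from at least one of the other corpora's top-n lists."""
    target_top = top_forms(target, n)
    other_tops = [set(top_forms(o, n)) for o in others]
    return [w for w in target_top if any(w not in top for top in other_tops)]


# Toy stand-ins for the three corpora (hypothetical data).
business = "we need to we need to check the price"
conv = "you know we went to the shop you know"
acad = "the the to to we"

print(not_shared(business, [conv, acad], 3))
```

With real transcripts, `n` would be set to 50 to mirror the cut-off used for Table 2, and the result would correspond to the list of forms that fall outside one or both of the comparison lists.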

Keywords

A more sensitive quantitative measure is to generate keyword lists to ascertain which word forms are occurring in CANBEC with significantly greater frequency than in the other corpora. These forms, if they exist, are likely to be at least partially responsible for what makes business talk different from the other forms with which we are comparing it. At the heart of Nelson’s (2000) study of business English is the contention that keyword analysis best defines the business lexicon since raw frequency counts, especially in the case of high frequency words, show considerable overlap between business English and general English and fail to capture crucial differences, as we indeed argue here. Table 3 in the appendix shows the top 50 words which are key to CANBEC as compared with CONV, and Table 4 shows the top 50 which are key to CANBEC as compared with ACAD. In both lists, industry- and product-specific words (e.g. crane, rack, coal) and numerals have been omitted so as to include as much as possible of a common core across the different companies recorded.

Items which occur in both keyword lists are mostly content-oriented words (company/ies, customer(s), meeting, sales, paperwork, per/cent, etc.) but notable are the forms which include first-person pronoun we (we’ll, we’re) and the form need. We shall return to these below. Items of interest which are key in CANBEC compared with CONV include okay, if, which, problem, forms with pronoun we and the (semi-)modal forms need, will, can and 85 of the 128 occurrences of may (the rest being the month, May). Items of interest which are key in CANBEC compared with ACAD include we, us, our and associated contracted forms (we’ve, we’d), the forms gonna, need, anyway and mean. For reasons of space, only 50 words in each keyword list are reproduced here, but it is worth noting that scans further down each list covering the top 200 keywords reveal positive frequencies in CANBEC of items connected with modality (e.g. could, gonna, needs, wanna, want, won’t), stance adverbs (basically, actually), first person plural pronouns (our, us) and problem(s), some of which have already occurred in the same or similar forms in one or other of the two lists at higher ranks.

The keywords are a kind of snapshot: they certainly tell us that predictable content domains are frequently talked about (prices, customers, meetings, paperwork, etc.), but they also reveal preferences for certain pronominal reference (we, us, our) and a tendency to use particular modal expressions and expressions of stance. Equally, problem(s) seems to be a key word. It is these key words and their contexts which may provide some illuminating insights into the interpersonal stratum of spoken business communication, and which will characterize it as (a) partly akin to everyday informal casual conversation, (b) partly akin to institutional discourses (in this case academic), and (c) akin to neither, but unique and special, a register or macrogenre of speech which can be isolated and adequately described on the basis of participants’ generic activity in the construction of relations and identities, individual, corporate and cultural, in other words, its own “interaction order” (Roberts and Sarangi 1999).
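To make the notion of a positive keyword concrete, the following Python sketch scores word forms with the log-likelihood statistic, a measure commonly used for keyness. The chapter does not state which statistic or significance threshold was applied, so the choice of log-likelihood, the threshold of 6.63 (the approximate chi-square critical value for p < 0.01 with one degree of freedom), and the function names are assumptions for illustration only.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.
    a, b: frequency of the word in the study and reference corpus;
    c, d: total token counts of the study and reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study, reference, threshold=6.63):
    """Positive keywords: forms relatively more frequent in the study
    corpus whose score reaches the (assumed) threshold, highest first."""
    c, d = sum(study.values()), sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts: "we" is over-represented in the study corpus.
study = Counter({"we": 50, "the": 950})
reference = Counter({"we": 10, "the": 990})
print(keywords(study, reference))
```

In this toy case only we is returned as key, since the is, if anything, slightly under-represented in the study corpus. Run against word-form counts for CANBEC and a reference corpus such as CONV or ACAD, the same function would yield a ranked list comparable in kind, though not necessarily in detail, to Tables 3 and 4.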

Clusters e importance of the ready availability to speakers of off-the-peg language in the form of fixed expressions and multi-word sentence frames has been much discussed in the literature as a central component of real-time speech and fluency (see Wray 2002). Biber et al (1999) show how recurrent strings of words

176 Michael McCarthy and Michael Handford

are extremely frequent in texts of all kinds, even though many such clusterings are not ‘idiomatic’ in the sense of being syntactically fixed and semantically opaque fixed expressions. Biber et al call such strings ‘lexical bundles’ (see also Biber and Conrad 1999). Significant recurrence of such bundles or clusters is defined by establishing frequency cut-off points, for example, that a cluster must occur at least 10 times per million words of text (or 20 times in the case of Cortes 2002), and must occur in a number of different texts. is entails that such clusters may be fragmentary in terms of complete syntactic units but (albeit only partially) meaningful; for example to be able to or a lot of the are clusters noted by Cortes (2002), who compared freshman writing with academic texts. More obviously semantically integrated expressions such as as a result and on the other hand also arise from such searches. A repeated and convincing claim of those researching lexical bundles is that such clusters recur because they are key structuring devices which are register- (or genre-) sensitive. Oakey (2002), for example, shows how frequently recurring clusters such as it has been (shown/observed/argued/etc) that, which are used to bring in outside evidence in written texts in the three genres he investigated (social science, medical and technical), are distributed differently across the three types. It is therefore reasonable to hypothesize that clusters in the CANBEC business data may reveal something of the character of SBE as a distinct genre. Space precludes inclusion of all of the two-, three-, four-, five- and six-word clusters in the CANBEC data, but Table 5 in the appendix exemplifies the cluster-lists with the top 20 five-word clusters in CANBEC. Only seven six-word clusters occur four times or more, and beyond six words, clusters are extremely rare. First we present some raw comparative figures for the clusters, then discuss their interpretation. 
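The two cut-off criteria mentioned above (a minimum rate per million words and occurrence in more than one text) can be sketched in Python as follows. The function name `clusters`, its default values, and the toy transcripts are assumptions for illustration; the sketch also lets clusters run across speaker turns, which a fuller treatment of transcribed speech might wish to prevent.

```python
from collections import Counter, defaultdict


def clusters(texts, n, min_per_million=10, min_texts=2):
    """Return n-word clusters meeting both cut-offs.
    texts: a list of token lists, one per transcript."""
    freq = Counter()            # raw frequency of each cluster
    spread = defaultdict(set)   # indices of the texts each cluster occurs in
    total = 0
    for i, tokens in enumerate(texts):
        total += len(tokens)
        for j in range(len(tokens) - n + 1):
            gram = tuple(tokens[j:j + n])
            freq[gram] += 1
            spread[gram].add(i)
    kept = {}
    for gram, count in freq.items():
        rate = count * 1_000_000 / total  # occurrences per million words
        if rate >= min_per_million and len(spread[gram]) >= min_texts:
            kept[" ".join(gram)] = count
    return kept


# Two hypothetical transcripts.
texts = ["we need to go".split(), "so we need to check".split()]
print(clusters(texts, 3))
```

Here only we need to survives, because it is the only three-word cluster found in both texts. On a corpus of realistic size the rate threshold does the main work, and the dispersion threshold guards against clusters that are frequent only because a single meeting repeats them.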
Discourse marking expressions are evident in all the corpora. Not surprisingly, you know, I mean and I think are the three most frequent two-word clusters. As high frequency general discourse markers, these also occur in the top four clusters of CONV, though, interestingly, all three are lower in ACAD (ranks 3, 14 and 12, respectively). Do you know what I mean occurs 30 times per million words in CONV, 16 in CANBEC and 18 in ACAD. At the end of the day is 100 times per million words in CANBEC, 36 in CONV, and only 9 in ACAD. It would seem, then, that SBE shares the basic discourse markers with CONV (more than ACAD does) and that at the end of the day is preferred in SBE and rare in ACAD. At the end of the day has an important summarizing and encapsulating function in English and illustrates the frequency with which participants seek to formulate decisions, consequences, projected and real outcomes and other goal-related functions.

Vague and hedging expressions are evident in the cluster lists. Sort of is rank 4 in ACAD, rank 11 in CONV, and rank 19 in CANBEC. A couple of occurs 135 times per million in CANBEC, 83 in CONV and 82 in ACAD; a bit of is fairly evenly distributed across the three corpora. I don’t know and I don’t think are high in both CANBEC and CONV. I don’t know is rank 2 in ACAD, but I don’t think is much lower, at rank 37. Or something like that is more or less equal in CANBEC and CONV and only slightly lower in ACAD. All the rest of it is rank 15 in CONV, 4 in CANBEC and 21 in ACAD. There is a mixed picture here, but no shortage of vagueness and hedging in SBE, especially in quantifying expressions and personal hedges. Der-der-der-der-der-der is an interesting vague expression which occurs frequently in CANBEC, used to project a high degree of shared knowledge of ritual and formulaic words typically used in particular situations.

We need to occurs 520 times per million in CANBEC but only 84 and 78 in CONV and ACAD, respectively. On the other hand, you need to is 212 in CANBEC, 207 in ACAD, but only 93 in CONV. Do you want me to is 33 in CONV, 28 in CANBEC and 21 in ACAD. These seem to reflect the high degree of collective goal-stating in SBE, even if this is only a projected or feigned collectiveness, and, as we shall see below, need is often used in face-protecting requests and directives. At the moment occurs 592 times per million in CANBEC, and only 168 and 114 in CONV and ACAD, respectively. This probably reflects the temporal and temporary nature of many of the entities and phenomena discussed in SBE, where dynamism and change are central to the business culture.
Overall, clusters can suggestively illustrate the shared communicative resources and ways of approaching problems which characterize Communities of Practice (Wenger 1998), insomuch as the repeated patterns of clusters reflect institutionalized wordings and frames that have become at least to a degree pragmatically specialized within SBE. Clearly though, clusters, like keywords and frequency lists, are still rather blunt quantitative instruments, and it is only in the more qualitative data provided by concordances and actual conversational extracts that we get the clearest picture of SBE’s institutionalized ways of communicating.
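Because the three corpora differ in size, the cluster figures are quoted per million words rather than as raw counts. The short Python sketch below shows the conversion in both directions; the helper names are hypothetical, and the assumption that rates were computed against the 250,000-word CANBEC sub-corpus is ours rather than something the chapter states explicitly.

```python
def per_million(raw_count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


def raw_from_rate(rate_per_million, corpus_size):
    """Recover the (approximate) raw count behind a per-million rate."""
    return rate_per_million * corpus_size / 1_000_000


CANBEC_SIZE = 250_000  # size of the sub-corpus used in the study

# "at the end of the day": 100 per million in CANBEC
print(raw_from_rate(100, CANBEC_SIZE))
# "we need to": 520 per million in CANBEC
print(raw_from_rate(520, CANBEC_SIZE))
```

On that assumption, 100 occurrences per million corresponds to 25 raw occurrences of at the end of the day, and 520 per million to 130 raw occurrences of we need to, which underlines how modest the raw counts behind some of the rates are.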

Qualitative analysis: Concordances

Concordances, where chosen words and phrases can be displayed along with their surrounding co-text in the KWIC (key word in context) format, enable us to home in even closer on how our keywords and clusters actually occur in SBE. Concordances were generated for a range of keywords, selected to capture a cross-section of lexico-grammatical types.
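A KWIC display of the kind described can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the functions `kwic` and `left_collocates`, the context width, and the sample utterance are illustrative and do not reproduce the concordancer used in the study.

```python
from collections import Counter


def kwic(tokens, node, width=4):
    """Return (left co-text, node, right co-text) for every
    occurrence of a one- or multi-word node in a token list."""
    node_tokens = node.lower().split()
    k = len(node_tokens)
    lines = []
    for i in range(len(tokens) - k + 1):
        if [t.lower() for t in tokens[i:i + k]] == node_tokens:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + k:i + k + width])
            lines.append((left, " ".join(tokens[i:i + k]), right))
    return lines


def left_collocates(tokens, node):
    """Count the word form immediately preceding each occurrence of the node."""
    counts = Counter()
    for left, _, _ in kwic(tokens, node, width=1):
        if left:
            counts[left.lower()] += 1
    return counts


# Hypothetical utterance for illustration.
tokens = "I think we need to do some test work and you need to help".split()
for left, node, right in kwic(tokens, "need to", width=2):
    print(f"{left:>12} | {node} | {right}")
print(left_collocates(tokens, "need to"))
```

Sorting the resulting lines by the word to the left or right of the node, as concordancers typically allow, is what makes recurrent frames visible to the analyst.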

Pronouns and modal expressions

We is an extremely frequent word in CANBEC, and its overall frequency is considerably greater than in CONV or in ACAD, not necessarily a surprising finding, given the corporate and collective nature of much of SBE discourse (Bargiela-Chiappini and Harris 1997:121). Figure 3 in the appendix gives a comparison across the three corpora of the pronouns we and I. Tables 3 and 4 in the appendix rank we as number two in both keyword lists (i.e. CANBEC versus CONV and CANBEC versus ACAD), so it is clearly a distinctive item which marks out the CANBEC data. We carries a wide range of references, from very broad corporate reference to immediate group reference and to the individual using it to shelter behind corporate authority or responsibility or to protect interlocutors’ face (Extracts 1–3).

Extract 1
[The recording involves a British hydraulics company and an international coal company. They are discussing their advertising schedule.]
(Broader, corporate we: includes people other than the speakers)
<$1> Do you know what I mean? Erm and there again it it’s a case of getting in front of people when the leads are produced.
<$3> It is yeah. Yeah.
<$1> That’s what it’s all about.
<$2> We di= Yeah Obviously if we get leads erm if the if we need to be wherever it is. We need to be in+
<$1> Mm.
<$2> +China in Korea or wherever+
<$1> Wherever.
<$2> +we need to be there.
<$1> That’s right.

Extract 2
[As for extract 1]
(Immediate group reference we)
<$1> Well, we can talk to Ron Dawson about it tomorrow, can’t we?
<$2> Yes.

Extract 3
[Internal meeting among the sales and marketing managers of a British manufacturing company. The participants are reviewing and planning sales and marketing.]
(Face-protecting request/directive using corporate authority we)
<$3> The spares side of things is another ball game altogether.
<$2> Well.
<$1> Right. = there’s no need for us to concern ourselves with that is there really.
<$5> No.
<$3> No.
<$1> I mean you’re not, you’re bothered.
<$3> No.
<$1> We need to get our heads round and have a think about it as to the best way to go. The bit that you are bothered about is the price increase and whatever for the products first of the year. So we’ve decided, Derek are you happy with this, two and a half per cent as of the first of January. And all quotes from now will go out to let people know+
<$6> Mm.
<$1> +that there will be an increase and we’ll draft a letter and get it out+
<$6> Yeah.
<$1> +as soon as possible.
<$5> Yeah.
<$2> And we’ll look at the spares. I’ve made a note. I’ve put spares.
<$1> Okay. Can we get that letter out next week David?
<$2> Yeah. [laughs]
<$1> Nothing else to do?
<$2> No. [laughter]
<$2> I’m just wondering perhaps they can do it from here.
<$1> Yeah. Could you draft the letter and then just get them to do the mail shot from here?
<$2> Yeah. Yeah.

Extract 3 is particularly interesting since Speaker 1 freely switches from a group we (We need to get our heads round…) to a corporate we (confirming the company’s price rises), to a more authoritative but face-protecting request or directive to one individual (Can we get that letter out …), to a direct request/directive to the same individual using you, but only following a humorous aside (Could you draft the letter …). These shifts in the personal deixis are claimed by Zupnik (1994) to be power-enhancing in the context of political discourse: powerful speakers can shift in and out of various roles and display multiple identities in particular situations. The SBE we certainly seems to operate with the same flexibility and inherent vagueness that Zupnik noted in political talk (see also Bargiela-Chiappini and Harris 1997:122).

It is apposite to combine the discussion of we and need, since in 194 of the concordance lines for we, the immediate post-text is need, and of these, 130 are we need to x, making this a key sentence frame in the CANBEC data. Sometimes, we need to expresses a general group or corporate desired course of action or ambition (Extract 4):

Extract 4
[Internal monthly progress meeting in a British manufacturing company, involving upper management.]
<$1> Can we justify it?
<$7> No. Can’t justify it at the moment. I think you know we need to look at the big picture at the end of this year and and decide then. But I think at the moment <$=> we can’t <\$=> we’re not losing enough sales at the moment <$=> to s= <\$=> to say well we’d have picked those up and a few more if we’d have had somebody else on the road.

We need to also frequently prefaces corporate requests for information and for action issued by individuals with authority. As such it is an indirect form, protecting face and less direct than possibly face-threatening demands or directives (Extracts 5 and 6):

Extract 5
[Meeting between a multinational car manufacturer and a British hydraulics company. They are discussing product development.]
<$3> I mean ultimat= ultimately it’s your decision whether you want a+
<$1> True. But er o= o=
<$3> +a hard blow fuse if you like or a a resettable fuse.
<$1> You’re right. But the thing is I mean we need to know what your rationale is. And if you say “We prefer to have a resettable one because we we know this is a problem” then it will help Nigel to make that decision you see.

Extract 6
[As for extract 5]
<$1> We were just talking about the durability work. Erm we don’t have any plans at the moment to do some tests on the assembly to the drop side body. And I think what we need to do is we need to do some test work. What I’d ask you to do then, it’s good preparation for that test work, is, you’ve told me what you think your durability is from your calculating the er the durabi= the life of the crane.

Need is by far the most frequent modal verb indicating obligation in CANBEC. Other possible exponents of obligation (e.g. must, ought) are very low in frequency. Figure 4 in the appendix compares the occurrence per million words of obligation-uses of the expressions need to, have (got) to/gotta, should, ought and must in CANBEC, CONV and ACAD. It can immediately be seen that need to is extraordinarily high in CANBEC compared with the other two corpora. Have got to, should and ought are more evenly distributed across the three corpora, while must is lowest in CANBEC and highest in ACAD. The high incidence of need and very low incidence of must suggest that SBE prefers more indirect expressions of obligation, and reveal how important the preservation of face is, even in a context where one might expect exigencies, pressure and urgency to be frequent and paramount. We, you and I, in descending order, are amongst the top 6 immediate prior-collocates (or so-called ‘left-hand’ collocates³) of need, with the collective and/or face-preserving we need being three times more frequent than you need.

Perhaps even more importantly, when further concordances are run and one examines the precise distribution of the modal expressions of obligation, shifts in usage may be observed which indicate just how sensitive interlocutors are to the face needs of colleagues, even of their subordinates. In a CANBEC transcript of an internal meeting between three managers there are 37 occurrences of have (got) to/gotta, where the managers discuss necessary actions and goals. There seems to be no face-threat perceived in the use of these rather direct forms among equals (see also Donohue and Diez 1985). However, when these goals and actions are communicated to others in subordinate positions, in two other meeting transcripts from the same company, have (got) to/gotta drops dramatically in frequency (2 and 10 occurrences, respectively), and in the latter of the two transcripts, where the manager is conveying necessary changes to a subordinate, 40 examples of should occur. It seems that in these data at least, more face-protecting and indirect forms for issuing directives are preferred in order to maintain good interpersonal relations and to promote the comity, motivation and stability so necessary in business institutions.

Speculative and hypothetical uses of may and might are very similar in CANBEC and ACAD, but noticeably lower in CONV (see Figure 5 in the appendix). This is to be expected in ACAD, where speculating and hypothesizing are key functions, but it also illustrates the degree of speculation and entering into irrealis worlds that characterize SBE, where, paradoxically, we have argued that goal-orientation and decision-making are important. However, it is clear that speculation and hypothesizing are important parts of the collaborative and convergent enterprise of consensus-making, and are, once again, face-protecting both for those who speculate and those who respond. Extract 7 is a typical irrealis context for may in CANBEC, occurring alongside other hypothetical expressions, vague expressions and hedges (indicated in bold):

Extract 7
[Meeting between the sales staff of an IT company and a potential client. The latter is the Managing Director of an Internet Sales company.]
<$1> I guess you’ll have to speak to Bob and and and to to James and and and kind of look at what you think you may have coming up.
<$2> Yeah.
<$1> And then we can get together again and actually you know finalize something and and and move forward.
<$2> Okay.
<$1> Yeah.
<$2> Yeah.
<$1> So it may well be that it’s it’s it makes financial sense to go with a rack and a half and put all of your existing servers into a rack. It may be better that erm you keep those ones and maybe just a half rack for the future. Or it may be a bit better just to keep buying individual collocations as and when you need them.
<$2> Right okay.
<$1> So I think that’s probably the three+
<$2> Yeah.
<$1> +different ways that it can go.
<$2> Okay.

Discourse marking

We have already noted in the cluster analyses that CANBEC shares common discourse markers projecting states of shared knowledge such as you know and I mean with CONV and, to a lesser extent, with ACAD. We also noted the frequent preference in CANBEC for the encapsulating marker at the end of the day as an index of the importance of formulating decisions, consensus, convergent understandings and so on. Another marker of interest is the utterance-final and freestanding so (see Figure 6 in the appendix), which occurs 50% more frequently in CANBEC than in CONV, and six times more frequently in CANBEC than in ACAD (note also that so is number 20 in the keyword list CANBEC versus CONV). Final and freestanding so seem to function in CANBEC to leave open the possibility of further contributions (i.e. they are listener-engaging) as well as projecting shared assumptions about the conclusions to be reached from the preceding discourse. Two examples follow (extracts 8–9):

Extract 8 [Fortnightly progress review in a private museum.]

<$2> He’s very helpful when you want something explaining actually. I have to say that for him. When I’ve not understood about ordering something or where to put something to do with invoices he’s really good at actually saying “Right you need to do this in this way and you need to do it like this because of that”. Er he’s very good at+
<$M> Mm.
<$2> +giving you a hand with stuff. But then it is just the little things that frustrate you that’s the problem isn’t it. So.
<$3> And er my my impression is that a lot of that is is due to the, that over the last er couple of months basically he’s lost the plot rather.

Extract 9 [As for extract 4]

<$4> Has somebody told them there will be a charge? If it is being towed by a Transit it will cost them extra.
<$7> Well we’ve told them and put it in writing that we don’t consider the Transit to be the right vehicle to be towing it+
<$4> Right.
<$7> +in the first place.
<$1> It won’t be towed by a Transit. We’d have to alter how we made it.
<$4> Yeah.
<$1> So.
<$7> Mm. You know we can’t do more than that can we. And we’ve also put in writing that our standard towing height is seventeen inches.
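The distinction between final and freestanding so can be made operational in a simple way. The following is a minimal Python sketch, not the procedure used for CANBEC: it assumes the transcription conventions visible in the extracts (speaker tags such as <$1>, bracketed annotations such as [laughter]), and the names split_turns and classify_so are hypothetical. A turn consisting only of so is labelled freestanding; a longer turn whose last word is so is labelled final.

```python
import re

SPEAKER_TURN = re.compile(r"<\$(\w+)>([^<]*)")   # <$1> text up to the next tag
ANNOTATION = re.compile(r"\[[^\]]*\]")            # e.g. [laughter], [inhales]
EDIT_TAG = re.compile(r"<\\?\$=>")                # <$=> and <\$=> markers


def split_turns(transcript):
    """Return (speaker, text) pairs, with annotations and edit tags removed."""
    cleaned = EDIT_TAG.sub("", ANNOTATION.sub("", transcript))
    return [(spk, text.strip()) for spk, text in SPEAKER_TURN.findall(cleaned)]


def classify_so(turn_text):
    """Label a turn 'freestanding', 'final', or None (assumed definitions)."""
    words = re.findall(r"[a-z’']+", turn_text.lower())
    if not words or words[-1] != "so":
        return None
    return "freestanding" if len(words) == 1 else "final"


# Closing turns of extract 9, used only as a demonstration.
sample = "<$4> Yeah. <$1> So. <$7> Mm. You know we can’t do more than that can we."
for speaker, text in split_turns(sample):
    print(speaker, classify_so(text))
```

Turns that are interrupted and resumed with + would need to be rejoined before any serious count, and the resulting totals would still need normalising across corpora of different sizes.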

Problem and its institutional construction in CANBEC

The word problem(s) is more than twice as frequent in CANBEC as in ACAD, and more than three times as frequent as in CONV. This perhaps should not surprise us, since business meetings mostly exist to discuss and promote solutions to problems. Problems have to be evaluated and prioritized, and this is reflected in the repeated clusters the main problem, the other problem, a big problem, the biggest problem, the only problem which occur in CANBEC. Statements of perceived problems also reflect participants’ agendas in meetings. Boden (1995) notes the importance of how problems are framed by speakers and how this influences the course of their evaluation and solution. When the environment of problem is more closely examined in CANBEC such framings can be observed, often in the form of recurrent or extended metaphors or idioms. Wenger (1998) highlights the importance of jokes, stories, lore, idioms and metaphors which become the routine ways of approaching problems in institutions and which help to construct Communities of Practice. A CANBEC example shows the problem-framing function in action, with metaphors/idiomatic expressions highlighted (the extract is edited for length, with time-hops indicated):

Extract 10 [As for extract 7] (Discussing computer server problems)

<$1> Erm as you know with application problems you just it it’s+
<$2> Yeah.
<$1> +it’s it’s <\$=>
<$2> It’s a nightmare.
<$1> Yeah. [sighs]
<$2> Sometimes the experts don’t know. [laughter]
<$1> Yeah exactly. But it can be a real+
<$2> Okay.
<$1> +er can of worms. So. [inhales]
… [6 mins]
<$2> +then if there is a problem and it’s irretrievable they lose a day’s transactions.
<$3> Yeah.
<$1> Yeah. Yeah. Which you can’t=
<$2> And that’s a nightmare.
<$1> Yeah.
… [20 mins]
<$2> But we don’t get the hosting+
<$1> Mm.
<$2> +on this particular customer because <$=> they <\$=> we weren’t offering a credible twenty four by seven+
<$1> Yeah. Yeah.
<$2> +erm support.
<$1> Sure.
<$2> And doing anything on their site is a complete nightmare+
<$1> Mm.
… [20 secs]
<$2> Because they’re running something like sixty sites on one machine.
<$1> Yeah.
<$3> Wow.
<$2> But but it it just is a nightmare.

Speaker 2, the client, repeatedly frames the problem as a “nightmare,” while speaker 1 stresses the complexity of it as a “can of worms”. In extract 11, a sales manager in conversation with a subordinate offers some informal evaluations of a problem and the current lack of solutions concerning work procedures:


Extract 11 [Internal meeting between the Technical Manager and a technician in a British Internet Server Provider company.]

<$1> Okay. So we know full well the account manager’s not gonna tell them cos the account manager doesn’t give two hoots. All right. So the next person it comes from is DLM who send the customer a fax and I know DLM haven’t been doing that because they they realize that they’re gonna get it in the neck from the customer. Cos the customer will see a thing which says ‘Right let’s do a concrete example.’ So let’s say a customer says be on site by nine.
<$2> Yeah.
… [1 min]
<$1> <$=> For this and of course the overtime will just be deducted from= Well either the overtime’ll be deducted from the account manager or somehow Forcenet’ll just pay this which I can’t believe will happen.
<$2> Yeah.
<$1> Yeah? So it’ll get deducted from the account managers which means the account managers’ll be up in arms but then tough. Cos the buck’s gotta stop somewhere and I don’t see why it should stop with well I don’t see why it necessarily should stop with BJE.
<$2> Yeah.
<$1> Well it’s been on the agenda. And I mailed you about it. I mailed the whole team about it. Cos in your= Well either way it’s got to be resolved.
<$2> Yeah.
<$1> Cos it’s a it’s a pain in the arse for everybody at the minute.
<$2> I know I know. I know.
<$1> All right?
<$2> Yeah.

Such informal and idiomatic ways of framing and evaluating problems contribute to the habitual practices which ultimately build and reinforce the institutional cultures which characterize businesses and their communicative styles (see Mumby 1988). Linde (1997) sees various levels of evaluation – from incidental, one-off local evaluations to topic level evaluations, where the purpose of the whole discourse is to arrive at an evaluation – as a central element of professional discourse, and in the SBE context it is clear that evaluations play a major part in the problem-solving discourses which are at the heart of the meetings, negotiations and other discussions which constitute the data.


The value of concordances of targeted words, whether grammatical (e.g. pronouns, modal verbs) or lexical (e.g. discourse markers, core lexical items such as problem and so on), is not only that they bring with them their own co-text but also that they enable us to observe recurrent patterns of communicative activity, and that they open up the co-text as a source of evidence for the pragmatic inferences we can make as analysts. As we have seen, such inferences are assisted by looking at co-occurrence with other types of items (e.g. modal verbs alongside hedges and vague expressions, idioms and metaphors alongside the item problem).
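A concordance of the kind referred to here can be approximated in a few lines. The sketch below is illustrative only and is not the software used in this study; the function name kwic, the tokenising pattern and the window size are all assumptions. It returns each occurrence of a node word together with its prior and post co-text.

```python
import re


def kwic(text, node, width=4):
    """Return (prior, node, post) triples for every occurrence of `node`.

    `width` is the number of word forms kept on each side (an assumed default).
    """
    tokens = re.findall(r"[\w’'-]+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            prior = " ".join(tokens[max(0, i - width):i])
            post = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((prior, tok, post))
    return lines


# A few words adapted from extract 10, for demonstration only.
text = "And that’s a nightmare. But it can be a real can of worms. It just is a nightmare."
for prior, node, post in kwic(text, "nightmare", width=2):
    print(f"{prior:>12} | {node} | {post}")
```

Sorting the returned triples by the word immediately prior to the node is one simple way of making recurrent framings such as a nightmare or a complete nightmare stand out.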

Conclusion

It is the ability to observe repeated and parallel events over a wide range of different speakers and contexts which makes a quantified corpus-based investigation of a register such as SBE into a powerful supportive tool for existing or proposed inferential analyses conducted upon individual conversations or relatively small samples of conversations. We conclude that SBE is indeed an institutional form of talk in that it partakes of the values associated with Communities of Practice and with other modellings of professional discourse (e.g. habitual communicative practices, shared goals, repeated interactive engagement among participants, negotiation of roles and hierarchies, and so on). We also concur with Nelson (2000) that business English is not just general English with specialist nomenclature added, and feel confident that St John’s (1996) misgivings as to whether a lexico-grammar of business English can be easily defined may be considerably assuaged by appropriate corpus analysis. However, it is clear that neither the quantitative data of a corpus alone nor the one-off analysis of conversational fragments is sufficient; extra insight can be gained by working from the former to the latter and vice-versa, keeping both in constant dialectical relationship. We also conclude that comparative data are indeed valuable: SBE is in some senses like academic data and shares some of its institutional characteristics (irrealis worlds, goal-driven, chaired discussion, etc.), but it is also fundamentally conversational, sharing a great deal with the banal talk of everyday sociability, underlining its core orientation towards comity, convergence, and satisfactory and non-threatening relationships, even in the face of hierarchically conditioned institutional roles, what Boden (1995: 99) colorfully sums up, in her description of professional meetings, as the “fine tinkering and maneuvering of actors dancing around agendas and arrangements, accommodating each other locally for a variety of personal, political and institutional goals.”

We would also underline the immense value of naturally recorded data in business settings, though it presents challenges both for description and pedagogy. It is understandably very difficult to gain access to meetings and other confidential business events, and a good deal of time has to be invested in building up trust and confidence that the exercise of recording and transcribing is of as much value to the business world as it is to academics and does not constitute a threat. Ideally as researchers we wish to be treated like our project researcher in extract 12 below, which provides us with the quotation in the title of this chapter. A business meeting, recorded by one of our researchers, is just opening, and the chairperson (speaker 1) begins thus:

Extract 12

<$1> Right. So obviously we’ve got four attendees but one’s probably not supposed to be here+ [laughter]
<$1> +or is invisible to us.
<$3> Yeah. And there’s one that’s supposed to be that isn’t.
<$1> Er and Dennis, Dennis is away isn’t he?
<$3> Yeah.

The ‘invisible’ participant is our researcher. Such situations, where participants are at least in principle willing to forget they are being observed, are of great value for description and pedagogy in the SBE context, and we conclude with what we consider the main pedagogical implications of this preliminary study of a limited set of data:

1. A good deal of the linguistic content of SBE is shared with casual conversation, in the sense that interpersonal features of meaning are accorded at least equal status, if not greater, with transactional (content) features. A comprehensive SBE pedagogy would therefore prioritise awareness of areas such as personal deixis, face-protection and indirectness.

2. Nelson (2000) found that published business English materials were more focused on concrete entities than on abstract states and qualities, less varied, and more polite than the evidence of his business corpus. Our corpus evidence seems to support Nelson, and in the case of politeness, the corpus suggests that SBE is highly context-sensitive and not amenable to over-simplifications of politeness and face-protection principles. However, the evidence does suggest that training in mitigating face-threats is vital, and, as we have seen, this implies a focus on core areas such as the boosting of appropriate use of particular modal expressions and the downplaying of others.

3. Skill in hedging and the use of purposive vagueness should not be underestimated in training: a repertoire of words and multi-word expressions could be devised to soften the often too blunt and on-record utterances of second-language learners aiming to operate successfully in business cultures of the kind exemplified in our data.

4. Close observation of the achievement of speech acts such as requests and directives while maintaining comity in SBE contexts is a useful awareness-raising exercise. Williams (1988), in critically examining the relationship between real data and published teaching materials, reminds us that the language of business meetings is far more complex than simply realizing functions with suitable exponents and that on-the-spot linguistic strategies and awareness of interlocutors are crucial factors.

5. Although, as with Firth’s data, many users of SBE will be using it as a lingua franca in non-native business contexts, successful business still rests on good interpersonal relations. Getting things done, either by oneself or getting them done by others, can be considerably facilitated by a greater awareness of what the linguistic resources have to offer, even when divorced from their particular (native-British or other) linguacultures and speech communities. It is for users themselves ultimately to decide whether and how to harness those resources, but not to make them available is to offer an impoverished set of tools to the business practitioner.

6. As more spoken business corpora become available, data-driven learning using concordances and open access to corpus files becomes a real possibility, and corpus researchers and teachers may, we would hope, no longer operate as gatekeepers but as facilitators, enabling business users of English to access resources aligned to their own situations and linguistic goals.

Notes 1. e corpus project was established in the School of English Studies at the

University of Nottingham, UK, and is funded by Cambridge University Press.

190 Michael McCarthy and Michael Handford

All conversational extracts in the present chapter, unless otherwise stated, are from the CANBEC corpus, and all corpus data in this chapter are copyright Cambridge University Press, 2004. 2. e CANCODE corpus comprises five million words of English conversation. e corpus was developed at the University of Nottingham, UK, and funded by Cambridge University Press, with whom sole copyright resides. e corpus project is jointly directed by Ronald Carter and Michael McCarthy. e corpus conversations were recorded in a wide variety of mostly informal settings across the islands of Britain and Ireland, then transcribed and stored in computer-readable form. Details of the corpus and its design may be found in McCarthy (1998). CANCODE forms part of the larger spoken and written Cambridge International Corpus (CIC). 3. We deliberately eschew the conventional use of ‘le-hand’ and ‘right-hand’

when referring to collocations in spoken data since spoken language has no ‘le’ or ‘right’, and le and right are page-driven metaphors borrowed from the study of (western-script) written texts and sentences. ‘Prior’ and ‘post’ would seem better metaphors for speech, but we do accept that concordances on screen are at least for the temporary purposes of analysis, ‘written’ texts.

References

Aston, G. and Burnard, L. 1998. The BNC Handbook. Edinburgh: Edinburgh University Press.
Bargiela-Chiappini, F. and Harris, S. 1997. Managing Language: The Discourse of Corporate Meetings. Amsterdam: John Benjamins.
Biber, D. and Conrad, S. 1999. “Lexical bundles in conversation and academic prose.” In Out of Corpora: Studies in Honor of Stig Johansson, Hilde Hasselgard and Signe Oksefjell (eds), 181–190. Amsterdam: Rodopi.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Boden, D. 1994. The Business of Talk. Organisations in Action. London: Polity Press.
Boden, D. 1995. “Agendas and arrangements: Everyday negotiations in meetings.” In The Discourse of Negotiation: Studies of Language in the Workplace, A. Firth (ed.), 83–89. Oxford: Pergamon.
Charles, M. 1996. “Business negotiations: Interdependence between discourse and the business relationship.” English for Specific Purposes 15 (1):19–36.
Chiapello, E. and Fairclough, N. 2002. “Understanding the new management ideology: a transdisciplinary contribution from critical discourse analysis and new sociology of capitalism.” Discourse and Society 13 (2):185–208.
Connor, U. 1999. “‘How like you our fish?’ Accommodation in international business communication.” In Business English: Research into Practice, M. Hewings and C. Nickerson (eds), 115–128. Harlow: Longman.
Cortes, V. 2002. “Lexical bundles in freshman composition.” In Using Corpora to Explore Linguistic Variation, R. Reppen, S. Fitzmaurice and D. Biber (eds), 131–145. Amsterdam: John Benjamins.
Crosling, G. and Ward, I. 2002. “Oral communication: The workplace needs and uses of business graduate employees.” English for Specific Purposes 21:41–57.
Donohue, W. and Diez, M. 1985. “Directive use in negotiation interaction.” Communications Monographs 52:305–318.
Drew, P. and Heritage, J. 1992. Talk at Work. Interaction in Institutional Settings. Cambridge: Cambridge University Press.
Firth, A. 1990. “Lingua franca negotiations: towards an interactional approach.” World Englishes 9 (3):269–280.
Firth, A. (ed.) 1995. The Discourse of Negotiation: Studies of Language in the Workplace. Oxford: Pergamon.
Garcez, P. 1993. “Point-making styles in cross-cultural business negotiation: a microethnographic study.” English for Specific Purposes 12:103–120.
Gimenez, J. 2001. “Ethnographic observations in cross-cultural business negotiations between non-native speakers of English: An exploratory study.” English for Specific Purposes 20:169–193.
Greatbatch, D. 1988. “A turn-taking system for British news interviews.” Language in Society 17:401–430.
Grimshaw, A. 1989. Collegial Discourse: Professional Conversation among Peers. Norwood, NJ: Ablex.
Halmari, H. 1993. “Intercultural business telephone conversations: A case of Finns vs. Anglo-Americans.” Applied Linguistics 44 (4):408–430.
Hofstede, G. 1991. Cultures and Organizations: Software of the Mind. London: McGraw-Hill.
Holmes, J. and Meyerhoff, M. 1999. “The community of practice: theories and methodologies in language and gender research.” Language in Society 28 (2):173–183.
Koester, A. 2001. Interpersonal markers in workplace genres: pursuing transactional and relational goals in office talk. Unpublished PhD thesis, University of Nottingham, May 2001.
Labov, W. 1972. “Some principles of linguistic methodology.” Language in Society 1:97–120.
Larrue, J. and Trognon, B. 1993. “Organisation of turn-taking and mechanism for turn-taking repairs in a chaired meeting.” Journal of Pragmatics 19 (2):177–196.
Linde, C. 1997. “Evaluation as linguistic structure and social practice.” In The Construction of Professional Discourse, B.-L. Gunnarsson, P. Linell and B. Nordberg (eds), 151–172. London: Longman.
McCarthy, M. 1998. Spoken Language and Applied Linguistics. Cambridge: Cambridge University Press.
Mumby, D. 1988. Communication and Power in Organisations: Discourse, Ideology and Domination. Norwood, NJ: Ablex.
Nelson, M. 2000. A corpus-based study of business English and business English teaching materials. Unpublished PhD thesis. Manchester: University of Manchester.
Oakey, D. 2002. “Formulaic language in English academic writing.” In Using Corpora to Explore Linguistic Variation, R. Reppen, S. Fitzmaurice and D. Biber (eds), 111–129. Amsterdam: John Benjamins.
Roberts, C. and Sarangi, S. 1999. Talk, Work and Institutional Order: Discourse in Medical, Mediation and Management Settings. Berlin: Mouton de Gruyter.
Scannell, P. (ed.) 1991. Broadcast Talk. London: SAGE Publications.
St John, M-J. 1996. “Business is booming: Business English in the 1990s.” English for Specific Purposes 15 (1):3–18.
Scott, M. 1999. Wordsmith Tools (corpus analytical software suite). Oxford: Oxford University Press.
Ulijn, J. and Li, X. 1995. “Is interrupting impolite? Some temporal aspects of turn-taking in Chinese-Western and other intercultural business encounters.” Text 15 (4):589–627.
Ulijn, J. and Murray, D. (eds). 1995. “Special issue on intercultural discourse in business and technology.” Text 15 (4).
Wenger, E. 1998. Communities of Practice: Learning, Meaning and Identity. Cambridge: Cambridge University Press.
Williams, M. 1988. “Language taught for meetings and language used in meetings: Is there anything in common?” Applied Linguistics 9 (1):45–58.
Wray, A. 2002. Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Yamada, H. 1990. “Topic management and turn distribution in business meetings: American versus Japanese strategies.” Text 10 (3):271–295.
Zupnik, Y.-J. 1994. “A pragmatic analysis of the use of person deixis in political discourse.” Journal of Pragmatics 21:339–383.


Appendix

Table 1. Sites where Corpus Data Recorded

Company type                  Size*   Location**   No. of words
Consultancy company           3       UK           40,000
Pharmaceutical company        5       UK           110,000
ISP/IT                        2       UK           154,000
Bicycle manufacturer          3       UK           12,000
Private Museum                1       UK           33,000
Hydraulics manufacturer       2       UK           186,000
Car manufacturer              4       UK           65,000
Coal company                  3       UK           12,000
Accountancy firm              1       UK           25,000
Tour operator                 2       UK           42,000
Travel agent                  3       Spain        55,000
Bank                          5       Japan        55,000
Web designer                  1       Eire         12,000
Financial adviser             1       UK           20,000
Business Adviser              1       UK           60,000
Pub chain                     4       UK           54,000
Brewer                        4       UK           10,000
Pharmaceutical company        1       Germany      60,000
Telecommunications company    4       UK           30,000
Business schools              N/A     UK           60,000

* Size refers to number of employees: “1” = 1–20 employees; “2” = 21–100 employees; “3” = 100–1000 employees; “4” = 1000–10,000 employees; “5” = 10,000+ employees.
** Location refers to the place where the recording was made.


Table 2. Comparison of top 40 word forms, 250,000-word CANBEC data with 340,000-word CONV, and 340,000-word ACAD. Normalised to occurrences per million words.

      CANBEC            CONV              ACAD
Rank  Word      Freq.   Word      Freq.   Word      Freq.
1     THE       37952   THE       31461   THE       51672
2     AND       23480   I         29058   AND       28248
3     TO        22420   YOU       26640   OF        27672
4     I         20324   AND       26634   YOU       23823
5     YOU       18712   YEAH      22983   A         23742
6     A         18688   IT        22518   TO        23040
7     IT        18632   A         21405   THAT      18870
8     YEAH      17972   TO        18879   IN        17268
9     THAT      17344   THAT      14235   IS        17022
10    OF        14068   OF        13761   IT        15501
11    WE        13464   MM        12807   I         14400
12    ER        10124   IN        12279   ER        9885
13    IS        10016   WAS       11475   SO        9660
14    IN        9864    IT’S      10566   IT’S      8565
15    SO        9176    OH        9786    THIS      8487
16    IT’S      8916    KNOW      9096    WHAT      7560
17    ERM       8904    ER        8850    YEAH      7539
18    BUT       8452    THEY      8580    ERM       7242
19    ON        7896    BUT       8448    ARE       7161
20    FOR       7620    WE        8295    BUT       7020
21    KNOW      7088    NO        7974    ON        6531
22    THEY      6756    IS        7935    HAVE      6216
23    BE        6744    LIKE      7689    BE        5880
24    WELL      6656    WELL      7560    WE        5706
25    HAVE      6600    HE        7485    RIGHT     5694
26    MM        6380    ON        7113    KNOW      5667
27    IF        6288    HAVE      6873    AS        5409
28    DO        6160    SO        6816    THEY      5337
29    THAT’S    5632    ERM       6543    IF        5283
30    WHAT      5576    RIGHT     6372    OR        5241
31    JUST      5376    ALL       6366    DO        5232
32    WITH      5340    JUST      6117    NOT       5064
33    ALL       5208    THERE     5844    WITH      5061
34    RIGHT     5156    GOT       5727    ALL       5025
35    THIS      5080    DO        5619    FOR       5004
36    GOT       4920    THIS      5619    WHICH     4902
37    NO        4892    THAT’S    5607    AT        4743
38    AT        4856    WHAT      5256    ONE       4731
39    WAS       4832    BE        5001    THERE     4701
40    ONE       4648    DON’T     4812    CAN       4665
41    NOT       4644    FOR       4800    ABOUT     4626
42    THERE     4520    ONE       4800    THAT’S    4542
43    THINK     4444    YES       4797    LIKE      4332
44    CAN       4264    SHE       4713    WAS       4203
45    AS        4112    NOT       4566    MM        4035
46    ARE       3960    WITH      4227    JUST      3903
47    DON’T     3956    THINK     4023    VERY      3792
48    THEM      3840    ABOUT     4017    HE        3693
49    GET       3768    GET       4014    OKAY      3687
50    THEN      3756    REALLY    4002    BECAUSE   3540

Table 3. Top 50 keywords CANBEC compared with CONV

Rank  WORD          FREQ. CANBEC  FREQ. CONV  KEYNESS
1     OKAY          938           359         514.1
2     WE            3,366         2,765       467.2
3     NEED          535           171         348.1
4     CUSTOMER      201           1           344.5
5     THE           9,488         10,487      329.4
6     WE’RE         676           339         265.3
7     WE’VE         684           357         253.0
8     FOR           1,905         1,600       245.0
9     IF            1,572         1,278       224.1
10    PER           209           36          204.0
11    WHICH         669           403         194.5
12    PRODUCT       109           0           193.1
13    PRICE         157           17          184.8
14    CUSTOMERS     112           3           173.8
15    SALES         107           2           171.7
16    TO            5,605         6,293       170.1
17    THAT          4,336         4,745       158.7
18    MEETING       139           17          156.9
19    ERM           2,226         2,181       154.6
20    SO            2,294         2,272       151.1
21    WILL          526           323         147.4
22    MONTH         145           24          144.3
23    MAIL          97            4           142.4
24    INFORMATION   141           24          138.4
25    CENT          146           28          134.9
26    ISSUE         82            4           117.1
27    ORDER         120           21          116.2
28    IS            2,504         2,645       115.2
29    USE           249           111         114.4
30    PROBLEM       222           91          112.7
31    BE            1,686         1,667       111.8
32    WE’LL         328           181         111.0
33    ACCOUNT       86            7           110.1
34    DATABASE      51            0           90.3
35    PAPERWORK     50            0           88.6
36    DATE          79            10          88.0
37    PAGE          82            13          83.2
38    SCHEDULE      51            1           81.5
39    LIST          119           36          81.1
40    CAN           1,066         1,025       80.6
41    LOAD          75            11          78.8
42    COMPANIES     74            11          77.3
43    COST          112           34          76.1
44    FOUR          266           162         75.7
45    STOCK         51            2           75.4
46    MAY           128           46          74.7
47    DONE          380           277         73.2
48    COPY          95            25          72.0
49    ORDERS        54            4           70.8
50    MOMENT        156           71          69.8
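The frequencies in Table 2 are normalised, whereas the keyword tables report raw counts, so the two can be related by a simple rescaling. The sketch below assumes the corpus sizes given in the caption of Table 2 (250,000 words for CANBEC, 340,000 for CONV); the function name per_million is hypothetical. With these sizes the raw CANBEC count for THE in Table 3 (9,488) rescales to the Table 2 figure (37,952). The CONV and ACAD figures in Table 2 correspond to raw counts multiplied by three, which suggests that the stated 340,000 is a rounded size, so ratios computed from it are approximate.

```python
def per_million(raw_count, corpus_size):
    """Rescale a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size


CANBEC_SIZE = 250_000   # from the caption of Table 2
CONV_SIZE = 340_000     # stated size; treated here as approximate

the_canbec = per_million(9_488, CANBEC_SIZE)     # THE, raw count from Table 3
problem_canbec = per_million(222, CANBEC_SIZE)   # PROBLEM, raw count from Table 3
problem_conv = per_million(91, CONV_SIZE)        # PROBLEM in CONV, Table 3
ratio = problem_canbec / problem_conv

print(the_canbec)        # 37952.0
print(round(ratio, 2))   # 3.32
```

On these assumptions PROBLEM is a little over three times as frequent in CANBEC as in CONV, in line with the comparison made in the discussion of problem above.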

Table 4. Top 50 keywords CANBEC compared with ACAD

Rank  WORD          FREQ. CANBEC  FREQ. ACAD  KEYNESS
1     YEAH          4,493         2,513       1,496.1
2     WE            3,366         1,902       1,098.3
3     WELL          1,664         1,013       473.0
4     I             5,081         4,800       417.1
5     COS           788           369         338.6
6     WE’VE         684           294         327.5
7     CUSTOMER      201           7           301.7
8     WE’RE         676           321         285.0
9     GONNA         598           282         254.4
10    FOR           1,905         1,668       210.0
11    MM            1,595         1,345       200.4
12    THEM          960           678         198.7
13    GOT           1,230         957         197.9
14    MONTH         145           9           197.5
15    NEED          535           286         189.7
16    CENT          146           12          186.1
17    MEETING       139           12          174.8
18    MEAN          900           653         174.4
19    SALES         107           3           164.9
20    PER           209           52          164.4
21    CUSTOMERS     112           6           157.0
22    NO            1,223         1,028       155.1
23    PRICE         157           27          153.0
24    IT            4,658         5,167       151.7
25    OH            648           452         138.0
26    THEY’VE       259           105         132.6
27    UP            864           694         126.5
28    I’VE          528           351         125.2
29    DONE          380           222         116.1
30    US            346           196         111.5
31    OUR           339           192         109.3
32    MANAGER       70            2           107.6
33    WE’LL         328           185         106.5
34    YEAR          251           120         104.7
35    HAVEN’T       273           139         104.2
36    DESIGN        71            3           103.6
37    SEND          91            11          102.9
38    COST          112           23          99.3
39    PHONE         79            7           98.6
40    WORK          346           210         98.5
41    ANYWAY        193           79          97.7
42    JUST          1,344         1,301       97.4
43    GOTTA         133           37          96.5
44    ORDERS        54            0           95.5
45    WE’D          96            17          92.2
46    COMPANIES     74            7           90.7
47    PAPERWORK     50            0           88.4
48    OUT           806           710         86.6
49    POUNDS        105           25          85.0
50    ERM           2,226         2,414       84.7
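Keyness scores of the kind shown in Tables 3 and 4 compare a word’s frequency in one corpus with its frequency in a reference corpus. This section does not state which statistic was used; the sketch below assumes the log-likelihood measure, a common choice for keyword analysis, in its two-term form, and assumes the corpus sizes given in the caption of Table 2. The function name log_likelihood is hypothetical. Because the sizes are rounded and the settings are assumed, the output approximates rather than reproduces the published values.

```python
import math


def _term(observed, expected):
    """One observed*ln(observed/expected) term; zero counts contribute nothing."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-term log-likelihood for a word in corpus A against reference corpus B."""
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    return 2 * (_term(freq_a, expected_a) + _term(freq_b, expected_b))


# CUSTOMER: 201 occurrences in CANBEC, 1 in CONV (raw counts from Table 3).
customer = log_likelihood(201, 250_000, 1, 340_000)
print(round(customer, 1))   # roughly 334 here, against 344.5 in Table 3
```

A word with the same relative frequency in both corpora scores zero, and the score grows as the two relative frequencies diverge, which is why low-frequency but business-specific items such as CUSTOMER rank alongside very frequent items such as WE.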


Table 5. Top 20 five-word clusters CANBEC. Normalised to occurrences per million words.

Rank  Cluster                     Freq.
1     AT THE END OF THE           148
2     THE END OF THE DAY          104
3     TO HAVE A LOOK AT           52
4     I THINK WE NEED TO          48
5     YOU KNOW WHAT I MEAN        40
6     WE NEED TO LOOK AT          36
7     WHAT WE’RE TRYING TO DO     32
8     DO YOU WANT ME TO           28
9     IF YOU LOOK AT THE          28
10    IT’S JUST A CASE OF         28
11    ONE OF THE THINGS THAT      28
12    TO BE HONEST WITH YOU       28
13    A COUPLE OF WEEKS AGO       24
14    ALL THAT SORT OF STUFF      24
15    AND HAVE A LOOK AT          24
16    DER DER DER DER DER         24
17    HAVE A LOOK AT IT           24
18    I DON’T KNOW IF YOU         24
19    I DON’T KNOW WHAT THE       24
20    I MEAN I DON’T KNOW         24
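Cluster lists such as Table 5 rest on counting every run of n consecutive word forms. The sketch below is a minimal illustration, not the tool used for CANBEC: the function name clusters and the tokenising pattern are assumptions, and contracted forms such as WE’RE are kept as single words, as they appear to be in the table.

```python
import re
from collections import Counter


def clusters(text, n=5):
    """Count every run of n consecutive word forms in `text`."""
    tokens = re.findall(r"[a-z’']+", text.lower())
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample, for demonstration only.
sample = "at the end of the day we need to look at it at the end of the day"
for cluster, freq in clusters(sample).most_common(2):
    print(freq, cluster.upper())
```

A fuller version would stop clusters from running across speaker turns and would rescale the counts to occurrences per million words before comparing corpora.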

Figure 1. Speech events and word totals

Figure 2. Company types in CANBEC

Figure 3. Pronouns I and we: CANBEC vs. CONV vs. ACAD

Figure 4. Modals of obligation: CANBEC vs. CONV vs. ACAD

Figure 5. Hypothetical may and might: CANBEC vs. CONV vs. ACAD

Figure 6. Final and freestanding so: CANBEC vs. CONV vs. ACAD

Legal discourse: Opportunities and threats for corpus linguistics

Vijay K. Bhatia, Nicola M. Langton and Jane Lung
City University of Hong Kong

Introduction

Corpus linguistics has been widely claimed to be a powerful instrument for the study of linguistic frequency in and across a variety of discourses. The use of computerized corpora has further made it possible for linguists to undertake automatic analyses of lexico-grammatical and, to some extent, discoursal features of texts. In the last few years these corpus-based studies have become so popular that one rarely finds a textual study without the use of computerized corpora. This growing dependence on corpus linguistics makes one wonder to what extent it is really motivated by the needs of the research questions and to what extent it is merely a matter of convenience, offering somewhat ready-made solutions to interesting and intriguing questions in discourse studies. We would like to take up some of these issues that can be legitimately raised in the context of legal discourse. In particular, we would like to focus on the use of corpus linguistics in several areas of legal discourse, which may include automatic analysis of statistically significant features of lexicogrammar, analysis of intertextuality and interdiscursivity in legal discourses, and the use of corpora in language teaching and learning in legal contexts. This focus will be illustrated in particular by reference to two studies which made use of computerised legal corpora.


Review of the literature

Before discussing the issues that the use of corpus linguistics raises, we would like to highlight the function of legal discourse in the academic context on the one hand and in professional practice on the other, and also to distinguish the nature of legal genres from a number of other professional genres. Beginning with legal discourse, we find a broad range of individually identified legal genres – such as legislation, judgments, legal textbooks, and law cases – which are exclusively used within typical legal contexts, both academic and professional. However, within academic contexts, one finds that these various genres are intertextually and interdiscursively realized through legal research, legal argumentation, analysis of legal issues and legal thinking to give shape to other academic genres such as the problem-question genre, or the critical essay genre. In a similar manner, these broad-based genres are also used in various interesting and intriguing ways to give shape to some of the most common professional genres associated with legal practice, which include legal memoranda and legal pleadings. The ultimate outcomes of these contextualizations are some of the more narrowly identified professional legal genres, which become the instruments for successfully achieving the legal tasks of property conveyance, drafting affidavits, preparing contracts, or client consultation, which often involve a number of specific stages (see Candlin and Bhatia 1998 for details). The way these different genres are interdiscursively linked is displayed in Figure 1. To understand Figure 1, it is necessary to understand how the terms intertextuality and interdiscursivity are being used.
For our purposes, intertextuality refers to the property of one text being used in another, either directly or by pragmatic implication, as in the case of a particular section of a legislative act explicitly referring to another section which is in real or potential conflict with the one being constructed (Bhatia 1983). It is also possible that a text may not refer to any specific or preceding legislation, but may still carry an indirect implication because both refer to similar case descriptions. Intertextuality may also be seen across genres when judgements use legislative provisions as authority as part of the argument. Interdiscursivity, on the other hand, is rather subtle, and is seen where conventions associated with one genre are cleverly exploited in another genre. Narrative conventions, for instance, are often exploited within the early sections of judgements to describe facts of cases. Thus the two processes are different: one is more surface-level, and the other is more subtle and rather obscure. Candlin and Maley (1997:203) usefully define these terms as follows:

Discourses are made internally viable by the incorporation of such intertextual and interdiscursive elements. Such evolving discourses are thus intertextual in that they manifest a plurality of text sources. However, in so far as any characteristic text evokes a particular discoursal value, in that it is associated with some institutional and social meaning, such evolving discourses are at the same time interdiscursive.

We shall return to these concepts in the later sections of the chapter, but as can be seen in Figure 1, academic practice within the academy is interactively linked to professional legal practice through what we regard as a combination of academic legal tasks and narrowly defined range of professional genres, such as legal memoranda, pleading documents, problem-solving essays, etc. ese are generally taken up during this interface between academic and professional practice through the development of skills in legal argumentation, legal research, and legal thinking. As the learners move towards professional practice, they tend to focus more specifically on legal tasks and activities, oen using a combination of narrowly defined genres already mastered in the PCLL context. e most important aspect of this view of legal education and practice is the integration of skills, genres and activities through textualization and interdiscursivity, rather than individual texts and genres. is emphasis on intertextuality and interdiscursivity does not in any way undermine the need to master the nature of legal discourse. Most specialized discourses can be placed on a continuum, at one end of which we can place the most conservative forms of discourse, such as legislation, and at the other end we can see the literary discourses. Legislative genres – such as rules and regulations, contracts and agreements – are all extremely conservative and are oen called frozen genres, where form-function correlations are rather fixed and every attempt is made to restrict the number of interpretations that a particular legislative statement can attract. 
As Bhatia (1983) points out in his detailed analysis of British legislation, although statutory genres are written precisely, clearly, unambiguously, and all-inclusively, parliamentary counsel makes every attempt to box the judge firmly into a corner from which the judge cannot escape, which implies that multiplicity of interpretations is definitely not considered a virtue in this kind of writing. However, if we go to the other end of the continuum and look at some of the literary genres, we find that they are extremely liberal in interpretation in the sense that any potential for

multiple interpretations is considered a high virtue in such genres. A strong implication of this characteristic of literary discourses is that they are also rich in creativity and hence in their form-function correlations as well. One of the major consequences of this kind of creativity is that it is often impossible to identify the range and extent of form-function correlations in such liberal genres, and hence corpus-based linguistics based on large corpora can be of great help in doing this efficiently. In legal discourse, on the other hand, and in particular in legislative genres, the form-function correlations are almost formulaic, and it is often not necessary to base findings on large corpora. In Bhatia (1983), for instance, the main findings were based on a single act of British Parliament, i.e. the Housing Act of 1980. The analysis of various qualifications, including their linguistic realizations and syntactic positioning, was done exhaustively, effectively and conveniently without the use of any tools of corpus linguistics that we have available now. Even though the analytical findings were based on a single text, they have been found comprehensive and valid even to this day. Several more recent replications of this study in some of the European, Hong Kong and Chinese legislative contexts have largely confirmed these findings. A similar experience is documented in Trosborg (1997), where she analyzed statutes and contracts using a computerized corpus of more than a million words, yet found a remarkable degree of convergence, implying that even a smaller corpus would have given equally effective results. Based on the argument here, we would like to claim that legal discourse is so conservative in its construction, interpretation and use that it often does not require a large corpus to determine its linguistic frequencies; a manual analysis can be equally efficient and reliable.
The studies on legislative texts also illustrate that the relationship between form and function is highly constrained, with each single lexicogrammatical element realizing a specific discoursal value to serve a particular legal function. This is also true of some of the intertextual relations. To give some substance to these claims, let us take up the rhetorical function of signalling textual authority, which is predominantly realized in the form of typical complex prepositional phrases, which may appear to be almost formulaic to a large extent. Swales and Bhatia (1983:102–103) provide an excellent example; they claim that although two of the typical instances of complex prepositions – namely in pursuance of and in accordance with – may be very similar in syntactic form, they refer to two very different kinds of textual authority. We can illustrate this with an example from Section 8 of The Extradition (Suppression of Terrorism) Order 1978 (emphasis added).

The Secretary-General of the Council of Europe shall notify the member States of the Council of:
(a) any signature;
(b) any deposit of an instrument of ratification, acceptance or approval;
(c) any date of entry into force of this Convention in accordance with Article 11 thereof;
(d) any declaration or notification received in pursuance of the provisions of Article 12;

Both instances of complex prepositions in this section refer to legal authority indicated in Article 11 and Article 12. Although both the references are to legal authority, they are signalled through two different lexico-syntactic forms. One may legitimately question what kind of form-function relationship is realized through these different expressions. In order to find the answer, we have to look carefully at the two authorities referred to, namely Articles 11 and 12.

Article 11
1. This Convention shall be open to signature by the member States of the Council of Europe. It shall be subject to ratification, acceptance or approval. Instruments of ratification, acceptance or approval shall be deposited with the Secretary-General of the Council of Europe.
2. The Convention shall enter into force three months after the date of the third instrument of ratification, acceptance or approval.
3. In respect of a signatory State ratifying, accepting or approving subsequently, the Convention shall come into force three months after the date of the deposit of its instrument of ratification, acceptance or approval.

Article 12
1. Any State may, at the time of signature or when depositing its instrument of ratification, acceptance or approval, specify the territory or territories to which this Convention shall apply.
2. Any State may, when depositing its instrument of ratification, acceptance or approval or at any later date, by declaration addressed to the Secretary-General of the Council of Europe, extend this Convention to any other territory or territories specified in the declaration and for whose international relations it is responsible or on whose behalf it is authorised to give undertakings.
3. Any declaration made in pursuance of the preceding paragraph may, in respect of any territory mentioned in such declaration, be withdrawn by means of a notification addressed to the Secretary-General of the Council of Europe. Such withdrawal shall take effect immediately or at such later date as may be specified in the notification.
[The Extradition (Suppression of Terrorism) Order 1978]

The primary function of referring to textual authority (Bhatia 1983) is realized by complex prepositional phrases such as in pursuance of and in accordance with. However, the two are rarely used interchangeably, as these expressions carry additional legal values and implications. In accordance with in section 8 above, for instance, also specifies the procedure for “any date of entry into force” of the Convention in the context of the statutory instrument. The procedural obligation seems to be stringent, in that it specifies steps that need to be followed exactly the way they are indicated by the consistent use of shall in all the sub-sections of Article 11. In pursuance of, on the other hand, refers to textual authority that indicates a right to voluntarily adopt a particular course of legal action. This is specifically realized by a consistent use of may in Article 12. So the use of these two different complex prepositions signals two different legal functions. This, as Bhatia (1983) points out, was confirmed by a Parliamentary Counsel as follows:

… Article 11 refers to the method by which the Convention is brought into force…it will be ratified State by State and perhaps only when a certain number of States have signed it will come into force. These are, in other words, procedural steps to be complied with under Article 11. So the coming into force ‘in accordance with’ Article 11 means that all the requirements of Article 11 have been met…it is not just that Article 11 tells that the treaty will come into force…it tells you when and how precisely … and the Convention does not come into force unless events have accorded with those requirements… but a declaration or notification received ‘in pursuance of’ does not tell you anything more than a declaration or notification has been received because section so and so obliged you to declare or notify…strictly speaking the difference here is that one Article requires the meeting of certain procedural requirements whereas the other is just giving authority under which something is done… (Caldwell, quoted in Bhatia 1983)

Let us take up another typical example of this kind of signalling as illustrated in the following provision from the British Housing Act 1980:

110.5 The applicable local average rate is whichever of the two rates for the time being declared by the local authority in accordance with subsection (6a) below is applicable.
110.6 A local authority shall for such period not exceeding six months and beginning at the commencement of subsection (1) above as it may determine and for every subsequent period of six months declare, on the date falling within the month immediately preceding that period, a rate applicable to the advances and transfers… and a rate applicable to the sums left outstanding…; and
(a) the rate applicable …shall be a rate exceeding by ¼ per cent that it has to charge…
(b) the rate applicable to the sums left outstanding shall be a rate exceeding by ¼ per cent the average…of the rates…

In section 110.5, the use of the complex prepositional phrase in accordance with subsection (6a) below signals not only a purely textual link with the following subsection but also the nature of the legal relationship which one must expect there. In accordance with raises an expectation of obligation in whatever one may find in the subsection being referred to, which is met by the consistent use of the legally binding shall in subsection (6). On the other hand, in section 13.4 below, the use of in pursuance of section 4(2) raises an expectation of right, depending on the individual’s choice, which is further confirmed in section 4(2) in the use of may, which is often used to express rights, rather than obligation:

13.4 The preceding provisions of this section do not confer any right on a person required in pursuance of section 4 (2) to share the right to buy…
4.2 A secure tenant may…require that not more than three members of his family…should have the right to buy with him…
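The pairing described above (in accordance with pointing to provisions dominated by shall, in pursuance of pointing to provisions dominated by may) can be checked mechanically on the quoted extracts. The following is only a minimal sketch under stated assumptions: the function names, the regular expression and the idea of comparing raw counts of shall and may are illustrative choices made here, not the procedure used in Bhatia (1983), and the provision texts are short excerpts of the passages quoted in this chapter.

```python
import re

# Modal the chapter associates with the provision each preposition points to.
EXPECTED_MODAL = {"in accordance with": "shall", "in pursuance of": "may"}

# Preposition, then the nearest following reference such as "Article 11".
MARKER = re.compile(
    r"\b(in accordance with|in pursuance of)\b[^;.]*?"
    r"\b(Article|section|subsection)\s+(\(?\w+\)?)",
    re.IGNORECASE,
)

def find_authority_markers(text):
    """Return (preposition, referenced provision) pairs found in text."""
    return [(m.group(1).lower(), f"{m.group(2)} {m.group(3)}")
            for m in MARKER.finditer(text)]

def dominant_modal(provision):
    """Return the more frequent of 'shall'/'may' plus both counts (ties give 'shall')."""
    words = re.findall(r"[a-z]+", provision.lower())
    counts = {"shall": words.count("shall"), "may": words.count("may")}
    return max(counts, key=counts.get), counts

# Excerpts of the extracts quoted in the chapter (not the full provisions).
SECTION_8 = (
    "(c) any date of entry into force of this Convention in accordance with "
    "Article 11 thereof; (d) any declaration or notification received in "
    "pursuance of the provisions of Article 12;"
)
ARTICLE_11 = (
    "This Convention shall be open to signature by the member States of the "
    "Council of Europe. It shall be subject to ratification, acceptance or approval."
)
# Paragraph 1 and the first sentence of paragraph 3 of Article 12.
ARTICLE_12 = (
    "Any State may, at the time of signature or when depositing its instrument "
    "of ratification, acceptance or approval, specify the territory or "
    "territories to which this Convention shall apply. Any declaration made in "
    "pursuance of the preceding paragraph may, in respect of any territory "
    "mentioned in such declaration, be withdrawn by means of a notification "
    "addressed to the Secretary-General of the Council of Europe."
)
PROVISIONS = {"Article 11": ARTICLE_11, "Article 12": ARTICLE_12}

for prep, ref in find_authority_markers(SECTION_8):
    modal, counts = dominant_modal(PROVISIONS[ref])
    print(f"{prep} -> {ref}: {counts}, expected {EXPECTED_MODAL[prep]}, found {modal}")
```

A count of this kind only flags candidate pairings; whether a given shall or may actually expresses obligation or right still has to be read off the provision itself, as the Parliamentary Counsel's comment above makes clear.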

The use of under or by virtue of to refer to textual authority, on the other hand, is more neutral; however, many legal writers tend not to use these expressions so explicitly, which often causes difficulties in interpretation, especially for those who have only an outside or peripheral interest in the language of legislation. Thus, signalling of textual authority can be summarized in the form of the following formulaic configuration, with very little variation whatsoever (see Figure 2).

reject
(252/252)    the Court of Appeal rejected the claim of the
(256/257)    In Herschtal v. Stewart & Ardern, Ltd. (6) TUCKER, J., also rejected a claim based on

dismiss
(1008/1009)  I, accordingly, hold that the judge was right … of the case and that the appeal should be dismissed.
(1211/1211)  Appeal dismissed with costs.
(259/259)    I will therefore dismiss the application.
(349/349)    therefore, I agree that the appeal should be dismissed.
(316/316)    He accordingly dismissed the plaintiffs’ claim.

grant
(47/47)      I granted leave for Ms. Davies of the Legal Aid Department to
(752/753)    I would allow the appeal and grant a declaration and injunction as asked in the notice of appeal.
(245/246)    Accordingly, I grant the Order sought which is an Order of Prohibition.”
(752/753)    I would allow the appeal and grant a declaration and injunction as asked in the notice of appeal.

submit
(807/808)    As to the words …, I am of opinion that the defendant’s counsel was right in submitting that…
(828/830)    To confine … as the plaintiffs submitted would be to… and I cannot believe that this was intended.

Figure 2. Expressions in decision-making move

The point we would like to bring out here is that form-function correlations in legislative discourse are often one-to-one, or close together, whereas in other professional discourses, especially those that are near the liberal end of the continuum, the form-function correlations are often one-to-many. A somewhat similar picture arises with regard to other forms of lexico-grammar in legal writing, some of which include where-clauses, and if-clauses to express case descriptions, and “subject to section…,” “notwithstanding section…” or “without prejudice to section…” phrases to resolve in typical and specific ways the possibilities of conflicts across different sections or legislative statements. We would like to argue that most of these formal as well as rhetorical features of legal writing are unique to this form of discourse, and often contribute to the generic integrity of legislative genres. These features are not only unique but are also predictable in terms of their functional value and syntactic positioning (see Bhatia 1983 for a comprehensive discussion of this aspect of legal discourse). If it is possible to sustain this argument, which we think it is, then there is very little need for comprehensive or automatic linguistic frequency measures, as they are easily identifiable manually. It follows then that there is very little that corpus-based linguistic analysis of legal discourse will bring to light in this sense.

Legal discourse analysis and corpus linguistics

Expert texts: Legal cases

There is of course a case for using corpora when the investigation into intertextuality within and across a particular genre involves larger stretches of discourse. For example, in identifying and exploring the use of specifically favoured expressions or what might be close to verb-noun collocations in other legal genres, we found corpus linguistics helpful in determining the extent and range of variation in such discourses. In an on-going investigation focusing on the generic integrity of cases in Business and Law (Lung, in progress) using quantitative as well as qualitative procedures, we discovered interesting uses of noun-verb collocations based on a corpus of law cases, which were completely absent in business corpora from management, marketing, accountancy, and economics. The corpus consisted of 105 law cases (comprising 569,445 words) recommended by law professors for use in various tertiary institutions in Hong Kong. The corpus contained both criminal and civil cases from common law countries, mainly from the U.K. or Australia, which are relevant for Hong Kong. The corpus represented a special type of reading material – a genre in its own right – performing specific functions in legal practice, often requiring specific reading skills and strategies on the part of the members and would-be members of the legal profession. Using winMax, and WordPilot as a supplementary tool, we discovered the potential of corpus linguistics to offer a comprehensive account of the uses of such collocations for the development of what could be regarded as a grammar of a specific genre, in this case legal cases, which if done manually could be tedious, inaccurate and incomplete. To give some indication of this, we would like to take up four specific instances of verb and noun combinations from the corpora and look at the extent and range of such

expressions common in law cases. These are submit, grant, reject and dismiss. The analysis revealed that submit has a frequency of 346, grant 229, dismiss 111, and reject 74. It is, of course, interesting to see in what parts of law cases these verbs appear and how often, but more interesting is to look at the collocations of these verbs with the nouns in the corpus. Let us look at submit first (see Appendix 1, which gives a selection of these combinations). The most common form that follows the verb submit is a noun clause, such as “It is submitted that there was no valid adoption of…”, which accounts for 256 occurrences, as against 90 with other nouns, such as submit plans, submit a written submission, submit lists, submit evidence, submit a report, etc. In the case of the verb grant (see Appendix 2), all the combinations consist of grant plus nouns, such as grant consent, grant permission, grant application, grant relief, grant leave, grant bail and grant adjournment. In the case of the verb dismiss (see Appendix 3), the most preferred combination is dismiss the appeal, followed by dismiss application, dismiss claim, dismiss charges and dismiss argument. Semantically close to dismiss is the case of the verb reject (see Appendix 4), which like dismiss is typically followed by a noun or a noun phrase, as in reject evidence, reject…submission, reject…argument, reject…contention, reject…application, or reject…testimony. Based on the evidence from corpus linguistics, the two verbs dismiss and reject appear to be close to each other, almost synonymous; but if one were to make a close pragmatic distinction, one would have to look for evidence from institutional practices, which are difficult to study within the available tools in the field.
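The kind of count reported above (how often a verb such as submit is followed by a noun clause rather than a noun) can be approximated with a simple keyword-in-context search. The sketch below is illustrative only: it does not reproduce winMax or WordPilot, the function names and the pattern table are assumptions made here, and the sample text is a handful of sentences assembled from expressions quoted in this chapter rather than the 105-case corpus.

```python
import re
from collections import Counter

# Inflected forms of the four verbs discussed above (hypothetical pattern table).
VERB_FORMS = {
    "submit": r"submit(?:s|ted|ting)?",
    "grant": r"grant(?:s|ed|ing)?",
    "dismiss": r"dismiss(?:es|ed|ing)?",
    "reject": r"reject(?:s|ed|ing)?",
}

def kwic(text, lemma, width=30):
    """Return (left context, hit, right context) for every form of lemma."""
    pattern = re.compile(r"\b" + VERB_FORMS[lemma] + r"\b", re.IGNORECASE)
    return [
        (text[max(0, m.start() - width):m.start()],
         m.group(0),
         text[m.end():m.end() + width])
        for m in pattern.finditer(text)
    ]

def complement_types(text, lemma):
    """Count hits followed directly by 'that' (noun clause) versus anything else."""
    counts = Counter()
    for _, _, right in kwic(text, lemma):
        first = re.match(r"\s*([A-Za-z]+)", right)
        is_clause = bool(first) and first.group(1).lower() == "that"
        counts["that-clause" if is_clause else "other"] += 1
    return counts

# Illustrative sample only; the sentences are built from expressions quoted in the chapter.
SAMPLE = (
    "It is submitted that there was no valid adoption. "
    "Counsel for the plaintiff submitted that the claim should succeed. "
    "The applicant must submit plans to the authority. "
    "I will therefore dismiss the application. "
    "He accordingly dismissed the plaintiffs' claim. "
    "Appeal dismissed with costs."
)

print(complement_types(SAMPLE, "submit"))
for left, hit, right in kwic(SAMPLE, "dismiss"):
    print(f"{left:>30} [{hit}] {right}")
```

The concordance lines printed for dismiss would still need to be read manually to decide which noun is the real collocate, which is the step the appendices summarize.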
However, it is possible to explore further to see how these instances of noun-verb collocations are distributed within the rhetorical structure of the legal case, as illustrated in Table 1, which indicates the positioning of these collocations. Table 1 indicates that although these four verbs are common in most parts of the law case, they clearly are preferred in different moves. Submit is common in case descriptions and presentations of the argument by counsel, whereas dismiss and grant are more common in case descriptions. Dismiss is also common in pronouncing judgments or decision-making. The word reject is commonly found in deriving ratio decidendi. It was further discovered through close analysis that submit in the move arguing the case or presenting argument often appeared as “counsel for the plaintiff/defendant submitted that …”, or “it was submitted that…” It was rare to have “plaintiff/defendant submitted that…” or “the witness submitted that…”. Also, the word submit was usually employed to describe the act or procedure of handing in or presenting

Table 1. Position of noun-verb collocations in law cases

Genre move                                     Frequency
                                               Submit  Dismiss  Reject  Grant
1  Facts / Stating history of the case             75       47      12     82
2  Presenting argument                            263        6       9     51
3  Deriving ratio decidendi                         5       16      44     80
4  Pronouncing judgment                             3       42       9     16
Total                                             346      111      74    229

information/document/evidence to an authority (e.g. a government department or the court). When a counsel presented arguments, either written or oral, the word submit usually appeared as “the counsel submits this argument”, or “the counsel submitted that” (followed by an argument). Other words which often appeared in the move arguing the case are contend and argue, used to describe either the counsel’s or the plaintiff/defendant’s act of presenting arguments, such as “the plaintiff contended…”, “the defendant put forward a contention that…”, or “Counsel for the accused argue that…” Reject and dismiss also appeared in various moves of the case. They were often used to describe the lower court’s decision on a particular claim or issue (if that was an appeal case), to describe another court’s or judge’s decision when citing another case, in arguments which sought to persuade the court to reject or dismiss a claim, or to announce the decision on the case, as in “The lower court rejected the claim that…”, “In (a case name), (the judge’s name in that case) rejected the argument that…”, “the county court dismissed the claim of…”, “Accordingly, the appeal is dismissed”, or “it was argued that the present claim should be dismissed on the ground that…” Features such as these are varied within a restricted semantic field, and are typical of this genre. Decision-making is also signalled by these four verbs (see Figure 2) as well as by other expressions like I hold/grant, accordingly…, …is/are entitled to…, I see no reason… and I have no doubt/hesitation…(see Figure 3), which show a fair degree of variation in syntactic forms as well as in pragmatic implications. If we now compare the analyses we have offered in the case of legislation and legal cases we find that legislation is fairly standardized in terms of its use of

I hold/grant, accordingly…
(214/214)    I hold, accordingly, that…
(742/742)    I hold that the defendant…
(248/248)    The defendant is, accordingly, also liable for…
(1008/1009)  I, accordingly, hold that the judge was right in both branches of the case and that the appeal should be dismissed.
(541/541)    Accordingly we will sustain the pursuer’s appeal…
(245/246)    Accordingly, I grant the Order sought which is an Order of Prohibition.”
(238/239)    Accordingly, I sentenced the defendant to 20 months imprisonment for each offence, to run concurrently, but suspended for three years.
(491/492)    There will be judgment for the plaintiffs accordingly….

…is/are entitled to…
(215/215)    the plaintiff is entitled to judgment for...
(272/272)    Judgment for the plaintiff….
(748/748)    I think the plaintiffs are entitled to…
(749/749)    I think they are entitled to…
(41/42)      the judge was entitled to find that…
(1208/1208)  In my judgment, therefore, the plaintiffs fail on this part of the case also.

Figure 3. Other expressions indicating decisions

lexico-grammatical and discoursal resources with little variation. However, if we look at legal cases, it appears that although there is a fair degree of specificity and typicality in the use of lexico-grammatical resources, there are variations in the use of lexico-grammatical and discoursal expressions. Given the degree of variability in this particular genre, one may find some of the currently available tools of corpus linguistics helpful if one were interested in constructing a grammar of legal cases, especially with the purpose of establishing the extent and range of form-function correlations typical of this genre. Indeed it will be almost impossible to develop a comprehensive lexico-grammar without the help of corpus linguistics. But one may need to remember that such grammars can only be useful for restricted genres and hence the use of genre-based small corpora will be much more useful than large corpora covering a complete register of law (Henry and Roseberry 2001; Nation 2001; Tribble 2001).
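The move-by-move counts in Table 1 can be converted into proportions to make each verb's preferred move easier to compare across verbs with very different overall frequencies. The sketch below takes its figures from Table 1; the function names and the use of simple percentages are assumptions made here for illustration, not part of the original analysis.

```python
# Counts copied from Table 1 (rows = moves 1 to 4, in order).
MOVES = [
    "Facts / Stating history of the case",
    "Presenting argument",
    "Deriving ratio decidendi",
    "Pronouncing judgment",
]
TABLE_1 = {
    "submit": [75, 263, 5, 3],
    "dismiss": [47, 6, 16, 42],
    "reject": [12, 9, 44, 9],
    "grant": [82, 51, 80, 16],
}
REPORTED_TOTALS = {"submit": 346, "dismiss": 111, "reject": 74, "grant": 229}

def move_profile(counts):
    """Express each move's count as a percentage of the verb's total."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

def preferred_move(verb):
    """Return the move in which the verb is most frequent."""
    counts = TABLE_1[verb]
    return MOVES[counts.index(max(counts))]

for verb, counts in TABLE_1.items():
    # Column sums should reproduce the totals printed in the table.
    consistent = sum(counts) == REPORTED_TOTALS[verb]
    print(verb, move_profile(counts), "| preferred:", preferred_move(verb),
          "| totals consistent:", consistent)
```

Percentages of this kind describe distribution only; they say nothing about the pragmatic difference between, say, dismiss and reject, which the chapter argues has to be sought in institutional practice.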

Student texts: Legal problem questions

It is also possible to extend the use of small corpora to another closely related genre, that of the problem-question genre written by students within academic settings. The study, which is described below, illustrates not only how small corpora can be used in the teaching and learning of legal writing, but also how a corpus can be used to extend the investigation to pragmatic features of this genre based on larger stretches of discourse. It also illustrates how the intertextual and interdiscursive nature of the genre in question needs to be identified and factored into the analysis. It is a common perception that corpus linguistics research in discourse and pragmatics has been relatively modest (McEnery and Wilson 1996:98), especially in those areas in which analysis has to rely on context. This may be partly because it is not easy to include context as part of corpora, even though it may be crucial for the analysis and interpretation of textual data. This may also be partly due to the fact that much of corpus linguistics has been involved in smaller textual units as the available programmes are designed to focus on larger corpora and smaller lexico-grammatical units. In recent years, however, we have seen some significant exceptions where attempts have been made to focus on larger units of discourse, on the one hand, and on features of context and pragmatics, on the other.
Hyland (1998:157), for instance, extended his observed descriptions of hedges to cover pragmatic interpretations, pointing out that “…analysis of text features cannot be isolated from the institutional and discoursal practices in which they are embedded.” He adds, “…not only is it impossible to relate particular forms to specific functions in any context-invariant way, but meanings are frequently expressed simultaneously, which introduces the problem of indeterminacy in specifying cases…In other words, hedges seem to require a ‘more-or-less’ rather than ‘all-or-nothing’ account.” Dealing with this kind of indeterminacy seems to be a major challenge facing the corpus linguist today. In this section, we would like to introduce another study we have been involved in to extend analyses of small corpora to study pragmatic and interdiscursive features of legal discourse. The study we refer to concerns, once again, the use of small corpora, this time student writing, to examine the range and appropriacy of expressions of certainty and doubt used by Hong Kong law students in their written answers to legal problem questions and to identify the hedging strategies and devices which form part of the genre’s requirements (see Langton 2001). These expressions, strategies and devices are essential in presenting legal arguments and possible outcomes and form a key part of the legal problem question genre. The genre itself is an academic tool for developing legal thinking, analysis of legal issues, and legal argumentation that shape a range of other related and specific academic and professional legal genres. As Enright (1986:347) points out, the legal problem is:

…a question or exercise where a student is asked to discuss the legal consequences of a set of facts. Normally these consequences are expressed in terms of the availability of some remedy. Further, it is common practice to construct a problem so that the legal consequences of the facts are not immediately clear… the areas where the legal consequences of the facts are not clear constitute “the issues”, and are the very essence of the problem question.

However, if the students simply identify the issues and set out relevant legal arguments without applying them to the hypothetical facts and drawing conclusions (see Jensen 2000; Langton 2001), the answer would be inadequate since an essential part of demonstrating legal reasoning is to: … recognise the relative strengths and weaknesses of opposing arguments and suggest a likely outcome of a conflict. You are not evaluated on the basis of a ‘right’ answer, but rather, whatever your conclusion, you will receive marks for showing you know how to make decisions and suggest resolutions of a dispute through a reasoned evaluation of the merits of the arguments you discuss. (Krever 1989:52)

Knowing how to answer legal problem questions is therefore a crucial skill for demonstrating knowledge application or legal thinking and involves effective and appropriate language use and genre knowledge. Beasley (1993, 1994) recommends three key areas of attention crucial for successfully answering a legal problem:

– Organizing answers to include correctly identifying and citing relevant cases;
– Identifying the legal issues and when more information is needed to fully answer the question; and
– Recognizing and using discourse features of legal language (grammar and lexis).

Legal opinion about the purpose of problem questions in the academy seems to be divided (see Langton 2001). The traditional view is that the question should test the students’ knowledge of the “black letter of the law,” and their ability to

organize an argument which presents and analyzes possible outcomes around issues or parties. The minority view seems to be that the problem question should be based on a belief that students be required to look carefully and in depth at the gray areas of law and discuss the issues presented by the facts in a current social, economic and political context rather than being restricted to a rule-bound, acontextual framework. Therefore, argumentative essay questions in examinations test students’ ability to reason and present arguments and conclusions better than the traditional problem questions since issues still need to be identified and discussed, and legal reasoning demonstrated, but within a broader and more realistic context. Regardless of these different views on the merits of the legal problem question, there seems to be general agreement on what law tutors look for when marking problem question answers, namely an ability to:

– Identify a range of issues and relevant law from the facts;
– Create and develop arguments to draw conclusions on likely outcomes; and
– Demonstrate deductive reasoning.

It is also agreed that the analysis and application of the relevant law to the facts is the most important part of the answer, since it requires evidence of deductive reasoning and careful consideration of the problem facts. If the question is well written by ensuring some facts are uncertain or raising uncertain issues of law, then the conclusions should never be certain but rather presented as likely outcomes (Langton 2001). The common perception is the more issues identified, the better the grade (Swales 1982; Jensen 2000). Other than if-clauses, how the whole answer is organized and argued is also considered as evidence of deductive reasoning. One of the major areas of lexico-grammar crucial for deductive reasoning and legal argumentation is the use of lexical surface and non-lexical pragmatic hedges. Examining hedging devices and strategies in the textual and pragmatic context will therefore reveal the value of appropriate expressions of certainty and doubt (Hyland 1998; Langton 2001). In order to investigate the range of surface-level pragmatic and epistemic hedges that learners were familiar with, a small corpus of student-written problem questions was analyzed (Langton 2001). The corpus consisted of 10 answers to a criminal law problem question written by students of law, containing about 22,077 words. These scripts formed part of a much larger legal corpus collected over the course of three years in a related project on

improving Legal English (see CELECR 2002 for details). Without going into a detailed discussion of findings, we would like to focus on just one of the aspects of the study, that is, the expression of epistemic certainty and doubt in the mini-corpus. A list of potentially high frequency items used to express epistemic certainty and doubt was drawn up (see Appendix 5), which included epistemic modal verbs, lexical verbs, adjectives, adverbs and nouns drawn in part from the relevant literature (see for example Holmes 1988; Hyland 1998; Hyland and Milton 1997). Then, using the Wordlist function of Wordsmith Tools, a frequency count of the target items in the corpus was obtained to reveal the deductive nature of the genre (see Table 2) and role of content-specific vocabulary (where potential epistemic words such as belief, reasonable, intent, liability, negligent and mistake carry technical/legal meanings and so are not lexical epistemic hedges). A textual analysis was conducted to verify which items in Table 2 were epistemic hedges and to confirm the accuracy of the computer frequency count. Initially, this was done at sentence level using the text retrieval function in Wordsmith Tools. To determine whether the modal verbs were used in their epistemic or root/non-epistemic meaning, reference was primarily made to Coates’ (1983) work on modal semantics. It has been observed in legal literature on the legal problem genre that root modality appears principally in the rule section and epistemic modality in the application to facts and conclusion sections (Beasley 1994; Howe 1990). It was therefore decided to focus on selected sentences in context, where items were considered at sentence level using Coates’s (1983) epistemic inferential-non-inferential/confident-doubt scales of meaning and their position within the text noted by referring to Jensen’s (2000) breakdown of the texts into their rhetorical IRAC components (i.e. issue, rule, application and conclusion).
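The first step described above, a raw frequency count of candidate items expressed as a percentage of all tokens, can be sketched as follows. This is not the Wordlist function of Wordsmith Tools: the tokenizer, the function names, the short candidate list and the sample answer are all hypothetical and serve only to show the shape of the count.

```python
import re
from collections import Counter

# A few candidate items of the kind listed in Table 2 (hypothetical selection).
CANDIDATES = ["may", "might", "would", "if", "likely", "possible"]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def candidate_frequencies(text, candidates):
    """Return (item, raw count, percent of all tokens), most frequent first."""
    tokens = tokenize(text)
    wanted = set(candidates)
    counts = Counter(t for t in tokens if t in wanted)
    total = len(tokens)
    ranked = sorted(candidates, key=lambda item: -counts[item])
    return [(item, counts[item], round(100 * counts[item] / total, 2))
            for item in ranked]

# Invented 20-word answer fragment, for illustration only.
SAMPLE = (
    "If the defendant intended harm he may be liable. "
    "The court would likely convict. He may appeal if he can."
)

for item, raw, percent in candidate_frequencies(SAMPLE, CANDIDATES):
    print(f"{item:10} {raw:3} {percent:6.2f}")
```

As the chapter goes on to stress, such a list only identifies potentially epistemic items; deciding whether a given may or would is epistemic or root still requires reading it in its sentence and section.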
It was also necessary to look at the whole context of the paragraph to classify indeterminate meanings and check interpretation. Expressions and lexical items expressing epistemic modality were also verified in the same way and co-occurrence of items noted. It was found, after using the collocation and expand text extract functions in Wordsmith Tools, that many of the potentially epistemic items appeared in the rule section and collocated with non-epistemic/root modals, meaning they were not epistemic hedges at all. In other words, although the frequency count identified potentially epistemic items, where such words and phrases appeared, as well as how and why they collocated with other words and expressions, needed broader contextual verification. All lexical and non-lexical epistemic items had

220 Vijay K. Bhatia, Nicola M. Langton and Jane Lung

Table 2. Top 30 potentially epistemic items

Rank  Item       Raw no.  Percent
1     not        243      1
2     as         200      0.82
3     if         170      0.7
4     will       110      0.45
5     may        106      0.44
6     would      82       0.34
7     but        76       0.31
8     can        71       0.29
9     no         66       0.27
10    however    62       0.25
11    whether    61       0.25
12    must       57       0.23
13    could      49       0.2
14    only       47       0.19
15    so         45       0.19
16    should     43       0.18
17    therefore  42       0.17
18    more       37       0.15
19    possible   36       0.15
20    then       36       0.15
21    prove      30       0.12
22    thus       28       0.11
23    evidence   27       0.11
24    held       27       0.11
25    know       27       0.11
26    cannot     25       0.1
27    might      23       0.09
28    still      23       0.09
29    likely     21       0.09
30    proved     21       0.09

to be carefully re-examined for their potential hedging function within the text – including tense, if-clauses, questions, contrast markers and reference to authority – to ultimately account for the actual lexical and non-lexical epistemic choices made, and their position in the text. The results of these surface and textual analyses were then verified with specialist informants from the field of law. In particular, views on the purpose of the genre, use of non-lexical and pragmatic hedges and appropriacy of the language used were crucial in interpreting the exact nature of various expressions of certainty and doubt.

The result of this study ultimately showed that non-lexical and pragmatic discoursal hedges play a significant interdiscursive role in demonstrating legal reasoning (very similar to the reasoning and language used in law cases), and that many of the potentially epistemic lexical items present in the corpus were generally not hedges at all but indicators of inter-textuality and interdiscursivity. In other words, it was the presence of the non-lexical, pragmatic and discoursal hedges and strategies that influenced the appropriacy of the epistemic choices used in the possible outcomes ultimately suggested. This interconnection between the intertextual and interdiscursive nature of the legal problem genre is illustrated in Figure 4. The extract in Figure 4 is a good example of how to answer a legal problem and satisfy the requirement for good deductive reasoning because the answer

– is structured around IRAC;
– sets out & cites relevant rules (moves 8b – 12);
– uses legal syllogism (move 13);
– carries reasoning/rule language over to A & C sections;
– creates alternative arguments by:
  (a) referring to known/unknown facts (move 7)
  (b) using if-clauses (move 13)
  (c) drawing distinctions with established case law (contrast markers).

Issue

(3) Liability to Wong – Murder – Causation – Transferred Malice

Rule

(4) To charge Wong murder, it is necessary to examine the actus reus and mens rea. (5) An actus reus represents the act of the accused.

(8b) (Case cite & brief discussion of case.) (9) However it is impossible to convict a person to murder without examining his mens rea. (10) In (Case Cite) the Court of Appeal held that it should be left to jury to decide whether the appellant has the mens rea while either he cut into the deceased’s neck or cut her body in several pieces. (11) Also since it is a direct attack with a weapon, there is no need to find the appellant’s intention as in (Case cite and brief discussion of case, and related cases).

Application

(6) Mr. Wong put the poison in a cup of tea is an actus rea. (7) Although it is unknown to us that what kind of the poison is, and whether the poison could cause a death instantly, it is not ridiculous to assume that a poison could cause, at least, grievous bodily harm. (8) Therefore Wong is possible to convict manslaughter.

(12) To follow the direction in (Case Cite), a judge should direct the jury whether Wong has the intention, either kills or causes grievous bodily harm, while he put poison in a cup of tea.

Conclusion

(13) If the jury believes, beyond reasonable doubt, that Wong had the intention, then they should convict Wong for murder.

Figure 4. Deductive reasoning in legal problem answers (Langton 2002)

This study illustrates the need to confirm and further interpret the frequency and position of lexico-grammatical as well as discoursal elements in the light of pragmatic, social and institutional knowledge of the disciplinary culture which a particular genre is associated with. This is essential in the case of most of the specialized legal genres because most of the lexico-grammatical and discoursal elements draw their specialized discoursal values from their institutional and disciplinary context, which is almost impossible to include in any development of the corpus. Unlike many other discourses of common currency, legal discourse is extremely rich in intertextuality and interdiscursivity. Intertextuality, as discussed in earlier sections, can still be handled within corpus linguistic models, but interdiscursivity poses a significant challenge.

Pedagogic applications and conclusions

The findings based on the two studies of small corpora point to a number of important pedagogic applications. The first one relates to the use of small genre-based corpora for the development of restricted grammars of specialized genres, highlighting special features of lexico-grammar, including collocations. The second study based on student corpora revealed four main types of hedge (see Langton 2001), namely:

– non-lexical hedges and strategies which promote deductive reasoning by emphasizing the evidential reasoning between the grounds of the argument (rules) and claim (outcome) and which help meet the content requirements of the genre;
– non-lexical hedges and strategies which enable writers to reduce their presence and adopt the role of advisor (and maintain formality and distance from expert members of the discourse community as novice members);
– lexical hedges in the rule section which engage the reader and appeal to shared knowledge but function as discourse/pragmatic strategies; and
– lexical hedges in the application and conclusion sections which qualify the writer’s confidence in or commitment to the truth or actuality of the event or outcome proposed and which are dependent on deductive reasoning, non-lexical epistemic and pragmatic hedges for their degree of certainty or doubt.

Law students need to be familiar with the form and function these hedges take in different parts of the legal problem answer. It is therefore possible to design a three-step activity, as indicated below, for identifying the form non-lexical epistemic and pragmatic/discoursal hedges take and the role they play in deductive reasoning and the choice of lexical epistemic (opinion) expressions used in the final outcome statements.

Step 1: Awareness

If judgments and legal problem answers can be used with a concordancer, worksheets on hedging forms in legal problem questions and judgments can be used for students to

– Identify & classify hedges by function, form or grammar
– Complete gapped concordance print outs on use of tenses, conditional clauses etc.
– Rank hedges by degree of certainty or doubt
– Examine similar texts for variation in hedges used
– Identify relationships between rule & analysis sections etc.

Similar worksheets can be used with hard copies of the texts if a concordancer is not available.

Step 2: Contextualizing

Using samples from the rhetorical moves of answers to the same legal problem or of opposing appellate judgments, students could analyse and compare these for

– Typical lexico-grammatical structures & patterns
– Variation in degrees of certainty & doubt
– Variation in rule statements & case citation methods
– Appropriacy of conclusions based on deductive reasoning evidenced in rule & application sections etc.

Step 3: Application

– Introduce students to various forms of lexical & non-lexical hedges
– Work through high frequency modals, conditionals, legal syllogism structures, case referencing methods etc.
– Students complete worksheets on:
  (a) Gapped texts
  (b) Writing alternative arguments and outcomes
  (c) Correcting poorly reasoned answers
  (d) Writing answers to past exam questions
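The gapped concordance print-outs mentioned in Step 1 could be produced along the following lines where no concordancer is available. This is a minimal sketch under stated assumptions rather than the procedure used in the study: the function name `gapped_concordance`, the context width and the gap marker are all hypothetical choices.

```python
import re


def gapped_concordance(text, targets, width=30, gap="______"):
    """Return (line, answer) pairs: key-word-in-context lines with the node word blanked.

    Hypothetical helper for worksheet preparation; `width` is the number of
    characters of context kept on each side of the node word.
    """
    if not targets:
        return []
    # Longer targets first so that e.g. "may not" is preferred over "may".
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the gaps line up in a print-out.
        lines.append((f"{left:>{width}} {gap} {right}", match.group(0)))
    return lines


if __name__ == "__main__":
    sample = "If the jury believes that Wong had the intention, then they should convict."
    for line, answer in gapped_concordance(sample, ["if", "should"], width=25):
        print(line, "| answer:", answer)
```

The answer is returned alongside each line so that a teacher's key can be printed separately from the student worksheet.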

One can even go beyond these typical lexico-grammatical features to rhetorical structures of genres as well, which can be useful in the teaching and learning of discourse patterns for pragmatic success in professional discourses.

However, we would like to point out that some of these opportunities for the use of corpus linguistics in legal discourse also present potential challenges. As noted in the beginning of the chapter, legal discourse is so complex in terms of intertextuality and interdiscursivity that a number of academic and professional genres in law appear to be dynamically embedded one within the other. The legal-problem genre, for instance, makes use of legislation, cases, and judgments in the sections where the writer presents legal argument displaying his or her skills as a lawyer, bringing in legal authorities, precedents, and analytical understanding of legal issues. This is the same display we find in legal pleadings, legal memoranda and some oral legal argument. What makes these genres unique or special is the level and depth of embeddings from other genres in the form of authorities and precedents. In order to analyze these genres adequately and comprehensively, we need to focus not only on the conventions that make up these embedded genres individually, but also on the conventions that make such embeddings possible. In other words, to account for the use of these conventions, we are invariably required to go beyond the textual surface and intertextuality to interdiscursive, institutional constraints and concerns that often become crucial in correctly interpreting these academic and professional genres. It is only through these qualitative measures which focus on the sociocognitive, ethnographical, and socio-critical that one can fully appreciate the real nature and function of much of legal discourse.

One needs to appreciate that qualitative analysis begins where corpus linguistics ends, at least in the context of present-day understanding of corpus linguistics. So the important issue is not to take corpus linguistics as an end in itself, but as a tool that can be integrated with other qualitative measures to provide an informed answer to the question, ‘why do legal professionals use the language the way they do?’

References

Beasley, C.J. 1993. “Language and content: The case of law.” Paper presented at 8th International Institute of Language in Education Conference, Hong Kong. December 15–18, 1992.
Beasley, C.J. 1994. Picking up the problems: An applied linguistic analysis of the legal problem genre. Unpublished MA thesis, Faculty of Arts, Edith Cowan University.
Bhatia, V.K. 1983. Applied Discourse Analysis of English Legislative Writing, A Language Studies Unit Research Report. Birmingham: University of Aston in Birmingham.
Candlin, C.N. and Bhatia, V.K. 1998. “The project report on strategies and competencies.” In Legal Communication: A Study to Investigate the Communicative Needs of Legal Professionals. Hong Kong: The Law Society of Hong Kong.
Candlin, C.N. and Maley, Y. 1997. “Intertextuality and interdiscursivity in the discourse of alternative dispute resolution.” In The Construction of Professional Discourse, B-L. Gunnarsson, P. Linnel, and B. Nordberg (eds), 201–222. London: Longman.
CELECR 2002. Improving Legal English: Quality Measures for Programme Development & Evaluation. Department of English & Communication: City University of Hong Kong [Project No:303.20007001/ TDG0009].
Coates, J. 1983. The Semantics of the Modal Auxiliaries. Kent: Croom Helm.
Enright, C. 1986. Studying the Law. 2nd ed. Sydney: Branxton Press.
Henry, A. and Roseberry, R.L. 2001. “Using a small corpus to obtain data for teaching a genre.” In Small Corpus Studies and ELT: Theory and Practice, M. Ghadessy, A. Henry, and R.L. Roseberry (eds), 93–132. Amsterdam: John Benjamins.
Holmes, J. 1988. “Doubt and certainty in ESL textbooks.” Applied Linguistics 9 (1):20–44.
Howe, P.M. 1990. “The problem of the problem question in English for academic legal purposes.” English for Specific Purposes 9:215–236.
Hyland, K. 1998. Hedging in Scientific Research Articles. Amsterdam: John Benjamins.
Hyland, K. and Milton, J. 1997. “Hedging in L1 and L2 student writing.” Journal of Second Language Writing 6 (2):183–206.
Jensen, C.H. 2000. Legal problem questions: Analyzing rhetorical structures and strategies using ‘IRAC’. Unpublished MA thesis, Department of English, City University of Hong Kong.
Krever, R. 1989. Mastering Law Studies and Law Exam Techniques. London: Butterworths.
Langton, N.M. 2001. To hedge or not to hedge: Certainty and doubt in answers to legal problem questions. MAESP dissertation submitted to the Department of English and Communication, City University of Hong Kong.
Langton, N.M. 2002. “Hedging legal discourse.” Paper presented at World Congress of the Association Internationale de Linguistique Appliquee (AILA), Singapore. December 16–21, 2002.
McEnery, T. and Wilson, A. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Nation, P. 2001. “Using small corpora to investigate learner needs: Two vocabulary research tools.” In Small Corpus Studies and ELT: Theory and Practice, M. Ghadessy, A. Henry, and R.L. Roseberry (eds), 31–146. Amsterdam: John Benjamins.
Swales, J.M. 1982. “The case of cases in academic legal purposes.” IRAL 20:139–48.
Swales, J.M. and Bhatia, V.K. 1983. “An approach to the linguistic study of legal documents.” Fachsprache 5 (3):98–108.
Tribble, C. 2001. “Small corpora and teaching writing.” In Small Corpus Studies and ELT: Theory and Practice, M. Ghadessy, A. Henry, and R.L. Roseberry (eds), 381–408. Amsterdam: John Benjamins.
Trosborg, A. 1997. Text Typology and Translation. Amsterdam: John Benjamins.

Appendix 1: Submit (Total frequency = 346)

Cat  Item                       Freq  Example
1    submit…that (noun clause)  256   (1) ...the legislature of the HKSAR. It is [[ submitted ]] that there was no valid adoption of...
2    submit…plan                18    (176) ...the goods, the Plaintiff delayed in [[ submitting ]] final plans to the Defendant for ap...
3    submit…dispute             6     (154) .... Not all parties to legal disputes [[ submit ]] their disputes for resolution by th...
4    submit…notices             6     (310) ... continued “You are requested to [[ submit ]] revised public notices accordingly ...
5    submit to…(some acts)      6     (77) ...t any time, (h) the offender must [[ submit ]] to searches of places or things und...
6    submit…details             4     (175) ...tiff by its fax dated 31 March 1998 [[ submitted ]] “final details” to the Third Party,...
7    submit…document            4     (135) ...it D, a written submission document [[ submitted ]] by the Crown. At about 10.30pm o...
8    submit…list                4     (309) ... the Tso properties. The lists were [[ submitted ]] in connection with the removal of t...
9    submit…statements          4     (121) ...directed that each defendant should [[ submit ]] a defence statement to the court an...
10   submit…argument            3     (136) ...rs in a written outline of argument [[ submitted ]] by the Crown. It was not disputed b...
11   submit…drawings            2     (173) ... 1997 regarding the filled drawings [[ submitted ]] by Berkin and would like to inform ...
12   submit…engrossment         2     (182) ...hat when the plaintiff’s solicitors [[ submitted ]] the engrossment of the Assignment, ...
13   submit…evidence            2     (311) ... follows:- “You are requested to [[ submit ]] documentary evidence confirming you...
14   submit…letter of consent   2     (315) ...ber, 1999 other than the request to [[ submit ]] a letter of consent “from the owne...
15   submit to an order         2     (160) ...ain. As is conventional, Sundberg J [[ submitted ]] to the orders of this Court. But Dr...
16   submit…account             2     (270) ... Note of Objections. “The account [[ submitted ]] to the Scottish Legal Aid Board (“t...
17   submit…application         1     (179) ...n application for approval had been [[ submitted ]] by an authorised person on behalf o...
18   submit…certification       1     (243) ... level. Such certification shall be [[ submitted ]] prior to any further building works...
19   submit…contract            1     (296) ...r the main contract “which has been [[ submitted ]] to arbitration”. Leaving aside the...
20   submit…figure              1     (188) ... month as opposed to what had been [[ submitted ]] as a pre-accident figure of $7,500 ...
21   submit…invoices            1     (52) ...March 2000 inclusive, the defendant [[ submitted ]] a total of 23 HYTC invoices (subjec...
22   submit…letting for flat    1     (197) ...en 26% and 40%. At the hearing, he [[ submitted ]] another comparable letting for Flat...
23   submit…map                 1     (312) ...in red on the site application map [[ submitted ]] with the current planning applicati...
24   submit…names               1     (307) ...Tso consisting of eleven names was [[ submitted ]] together with a new family tree sho...
25   submit…notices             1     (361) ...whom the revised notices were to be [[ submitted ]] , namely whether they were to be p...
26   submit…offer               1     (172) ...aptioned project and are pleased to [[ submit ]] our best offer as follows:- ITEM ...
27   submit…payment receipt     1     (189) ...orts nor payment receipts have been [[ submitted ]] to the Court. I am of the view that...
28   submit…provision           1     (288) ...re was no provision for any rate as [[ submitted ]] by the Plaintiff, therefore no mon...
29   submit…quotation           1     (171) ...l payment terms. Thus the Plaintiff [[ submitted ]] another (almost the same) quotation...
30   submit…report              1     (196) ...with the firm, Chesterton Petty. He [[ submitted ]] a valuation report which determined...
31   submit…payment receipt     1     (189) ...orts nor payment receipts have been [[ submitted ]] to the Court. I am of the view that...
32   submit…provision           1     (288) ...re was no provision for any rate as [[ submitted ]] by the Plaintiff, therefore no mon...
33   submit…quotation           1     (171) ...l payment terms. Thus the Plaintiff [[ submitted ]] another (almost the same) quotation...
34   submit…report              1     (196) ...with the firm, Chesterton Petty. He [[ submitted ]] a valuation report which determined...
35   submit…specifications      1     (365) ... Manufacturers specifications being [[ submitted ]] in respect of the proposed spiral s...
36   submit…sections            1     (247) ...ing plans and long sections must be [[ submitted ]] to council for approval detailing t...
37   submit to…liability        1     (224) ...against the payer, an obligation to [[ submit ]] to a liability imposed upon him or ...
38   submit…validity            1     (159) ...validity of Acts of Parliament to be [[ submitted ]] to our decision as abstract questio...
39   submit…works               1     (241) ...for the following works are to be [[ submitted ]] for approval with the building appl...

Appendix 2: Grant (Total Frequency = 229)

Cat  Item                            Freq  Example
1    grant…consent                   24    (222) ...hey believed that should consent be [[ granted ]] in this case then this would open u...
2    grant…permission                23    (19) ...the case is a compelling reason for [[ grant ]] permission to appeal. I disagree. A...
3    grant…certiorari                16    (54) ... favouring discretionary refusal to [[ grant ]] certiorari in respect of non-jurisd...
4    grant…application               15    (51) ...In the event that the Supreme Court [[ granted ]] Glen’s application for the redeterm...
5    grant…fiat                      15    (104) ...2001, the Federal Attorney-General [[ granted ]] a fiat to one of the prosecutors, n...
6    grant…relief                    14    (130) ... and with a statutory injunction to [[ grant ]] “complete relief”[202], there is no...
7    grant…order                     13    (3) ... be interfered with. Accordingly, I [[ grant ]] the Order sought which is an Order...
8    grant…stay                      13    (5) ...hat the courts have jurisdiction to [[ grant ]] a stay where a fair trial of the ac...
9    grant…leave                     12    (28) ...on the day of the murder. I would [[ grant ]] leave to appeal but dismiss the app...
10   grant…bail                      10    (12) ...ender may be remanded in custody or [[ grant ]] bail in accordance with the provisi...
11   grant…extension                 10    (190) ...aid, there is the built-in power to [[ grant ]] extension of time in that particula...
12   grant…approval                  8     (212) .... 3) In October 1989, the Council [[ granted ]] approval for the construction of a...
13   grant…remedy                    8     (75) ...red by s 32 of the Judiciary Act to [[ grant ]] a remedy in the nature of certiorar...
14   grant…exemption                 5     (175) ...ing Authroity a restricted power to [[ grant ]] exemptions; whereas owners of land ...
15   grant…power                     5     (208) ... construed as going no further than [[ granted ]] administrative power.” 125. The cl...
16   grant…lease                     4     (226) ...he unlawfulness of the mining lease [[ granted ]] under the Mining Act 1973. The pr...
17   grant…trial                     4     (44) ...st him. Of course, if Mr Bikic were [[ granted ]] a separate trial, the evidence of h...
18   grant…land                      3     (259) ... of any land and additional land is [[ granted ]] to that person with the intent that...
19   grant…loan                      3     (235) ...he year to 17 January 1995 the firm [[ granted ]] the pursuer a loan and further loan...
20   grant…prohibition, mandamus or injunction  3     (57) ...is ancillary to the jurisdiction to [[ grant ]] prohibition, mandamus, or an injunc...
21   grant…certificate               2     (182) ...h the Legal Aid Regulations. I also [[ grant ]] a certificate for counsel. Unless a...
22   grant…right                     2     (22) ...he fundamental constitutional right [[ granted ]] to him.” Counsel says that there...
23   grant…adjournment               1     (276) ...examination given his condition. I [[ grant ]] the adjournment. In the end, after ...
24   grant…anonymity                 1     (23) ...e to the effect that the Gardai be [[ granted ]] anonymity at the hearing. The appli...
25   grant…appeal                    1     (310) ...efused by the first respondent, but [[ grant ]] on appeal in class 1 proceedings in...
26   grant…custody                   1     (196) ...the custody of the two children was [[ granted ]] to the Respondent with reasonable a...
27   grant…damages                   1     (188) ...n this matter, I am not prepared to [[ grant ]] any exemplary damages. Dated this...
28   grant…decree                    1     (195) .... On 13.10.1997, the Petitioner was [[ granted ]] Decree Nisi while the custody of th...
29   grant…estate or interest        1     (322) ...d to the encroaching owner, or the [[ grant ]] to him of any estate or interest t...
30   grant…immunity                  1     (9) ...rmant agreements under which he was [[ granted ]] immunity from prosecution for crimi...
31   grant…judgment                  1     (174) ...avour of the Plaintiff. I therefore [[ grant ]] judgment for the Plaintiff in the s...
32   grant…legal aid                 1     (194) ... Background The Petitioner was [[ granted ]] legal aid on 22.5.1996 to institute...
33   grant…letter of administration  1     (178) ...r the Letters of Administration was [[ granted ]] on 2nd August 1996. Initially, Fong...
34   grant…loss                      1     (184) ...s. The Employees Compensation Board [[ grant ]] him 3% loss of earning capacity. Mr...
35   grant…order                     1     (60) ...drum. 36. The order by Sundberg J [[ granted ]] the Bishops and the Episcopal Confe...
36   grant…pay                       1     (273) ...ng Officer considered them before [[ grant ]] the Respondent her long service pay...
37   grant…privilege                 1     (24) ... only by a code letter, the Coroner [[ granted ]] the said witnesses a privilege whi...
38   grant…prosecutors               1     (111) ...er of the proceedings before it, to [[ grant ]] to the prosecutors and/or to the At...
39   grant…writ                      1     (134) ...at this Court would be powerless to [[ grant ]] the writ of habeas corpus[203] or a...

Appendix 3: Dismiss (Total frequency = 111)

Cat  Item                        Freq  Example
1    dismiss…appeal              53    (7) ...answered ‘yes’ to that question and [[ dismissed ]] the appeal. But the prosecution r...
2    dismiss…application         14    (4) ...n any way. 28. I will therefore [[ dismiss ]] the application. 29. Dated th...
3    dismiss…claim               12    (26) ... that I should simply make an order [[ dismissing ]] SPUC’s claim. I shall do so. ...
4    dismiss…plaintiff           7     (73) ...lly took no part in the decision to [[ dismiss ]] the plaintiff, and she did not come...
5    dismiss…complaint           5     (1) ...ake in all the circumstances was to [[ dismiss ]] the complaint. 9. However, the ...
6    dismiss…summons             4     (60) ... default judgment. That summons was [[ dismissed ]] by Master Chu (as she then was) on ...
7    dismiss…action              3     (98) ...ns? Answer The action should be [[ dismissed ]] with costs, including the costs of ...
8    dismiss…him                 2     (65) ...risk of loss. They had the power to [[ dismiss ]] or to suspend him. But against that...
9    dismiss…submission          2     (96) ...nt. I would not, with respect, have [[ dismissed ]] such a submission lightly. I would ...
10   dismiss…Appeal Tribunal     1     (36) ...al to the Appeal Tribunal was [[ dismissed ]] on 23 February 1999. 3.Secti...
11   dismiss…argument            1     (92) ...ty as Hanave’s solicitor”, Lehane J [[ dismissed ]] that argument. He did so on the bas...
12   dismiss…Building Authority  1     (59) ...he Building Authority cannot now be [[ dismissed ]] as merely cycnical remarks because ...
13   dismiss…charges             1     (10) ...rown offered no evidence on them, I [[ dismissed ]] the remaining three charges relatin...
14   dismiss…employee            1     (117) ...t would be lost if the employee was [[ dismissed ]] in accordance with section 9 of the...
15   dismiss…notice of motion    1     (130) ...llows: 1. The notice of motion is [[ dismissed ]] . 2. The exhibits may be returned...
16   dismiss…objection           1     (106) ...rs of the Court are: 1. Objection [[ dismissed ]] . 2. Development consent confirme...
17   dismiss…possibility         1     (64) ...ents Ltd supra, the Court of Appeal [[ dismissed ]] the “utterly remote” possibility of...
18   dismiss…(proper noun)       1     (93) ...occurred, had been in a position to [[ dismiss ]] Mr Burke and his firm and had elect...

Appendix 4: Reject (Total frequency = 74)

Cat  Item                            Freq  Example
1    reject…evidence                 16    (12) ...es were more the aggressor in it. I [[ reject ]] evidence to the effect he chased th...
2    reject…submission               10    (25) ... a material misdirection. 10. In [[ rejecting ]] this submission, the Court of Appea...
3    reject…argument                 5     (44) ...jurisdiction of this Court. I would [[ reject ]] the argument that it was not compet...
4    reject…claim                    5     (9) ... consequence I could admit some and [[ reject ]] other evidence. I accept that there..
5    reject…proposition              5     (13) ...nce. The jury, properly in my view, [[ rejected ]] the latter proposition and I reject...
6    reject…it                       5     (7) ...cords it, there can be no basis for [[ rejecting ]] it. .... It has been volunteered fr...
7    reject…contention               4     (49) ...Article 10 of the Bill of Rights. I [[ reject ]] Mr. Pirie’s contention that the app..
8    reject…appeal                   3     (22) ... that the decision of this Court to [[ reject ]] the appellant’s appeal cannot valid...
9    reject…application              2     (4) ...ght ahead position. The trial judge [[ rejected ]] the application to stay the proceed...
10   reject…the conferring of judicial power  2     (40) ... Navigation Acts this Court did not [[ reject ]] the conferring of non-judicial powe...
11   reject…a gloss                  1     (73) ...istinguished scholars[172]. I would [[ reject ]] such a gloss because it frustrates ...
12   reject…an analysis              1     (93) ... the compensation at $180,000. I [[ reject ]] the respondent’s analysis of the va...
13   reject…application              1     (92) ...eing the basis on which the Council [[ rejected ]] the development application. That c..
14   reject…approach                 1     (62) ...this Court has already specifically [[ rejected ]] such an approach to the meaning of ...
15   reject…effect                   1     (58) ...ever, he reached that conclusion by [[ rejecting ]] any adverse affect of the CRA and b...
16   reject…location adjustment      1     (59) ...ion and +4% for inferior layout. We [[ reject ]] any location adjustment for the sim...
17   reject…one version              1     (23) ...he trial magistrate was entitled to [[ reject ]] one version and believe the other. ...
18   reject…someone’s conclusions    1     (35) ...Sharah has said. Certainly, I would [[ reject ]] Mr Taylor’s conclusions as to Mr Sh...
19   reject…statement                1     (3) ... judgment in the McLaughlin case, I [[ reject ]] the statements contained therein. I...
20   reject…suggestion               1     (66) ...ct”. The primary judge specifically [[ rejected ]] .the suggestion, made by Mr Burke du..
21   rejected…the accused’s account  1     (17) ... not necessarily mean that the jury [[ rejected ]] .the accused’s account. They could h..
22   reject…the conduct              1     (55) ...e person would understand her to be [[ rejecting ]] the conduct of which she was compla...
23   reject…the idea                 1     (11) ...control and in crisis when his wife [[ rejected ]] this idea. He said he took tablets ...
24   reject…the justification        1     (43) ...rary to principle for this Court to [[ reject ]] or curtail the jurisdiction so conf...
25   reject…the notion               1     (71) ...mportantly, Australian law has long [[ rejected ]] the notion that the respective obli...
26   reject…the offer                1     (15) ...f the indictment, but the offer was [[ rejected ]] by the Crown. Such a plea was not e...
27   reject…testimony                1     (85) ...s unsatisfactory and that was why I [[ reject ]] his testimony: (1) Mr Yu mention...

Appendix 5: Potential list of items to express certainty and doubt in problem questions

MODAL VERBS
Certainty: must; cannot; could not; have/has to; will (prediction); would; need
Probability: ought to; should / should not; will (not) (prediction); would (not)
Possibility: could; may (not); might (not)
going to

Lexical verbs: is accepted; allege; appear; argue / is arguable; assume; believe; claim; conclude; confirm; consider; decide; determine; depend; (not) doubt; no doubt; establish; expect; find; hold; indicate; infer; (not) know; (it is (not)) known; (it is (not)) necessary; (it is) obvious; predict; presume(d); prove; (it is) questionable; (it is (not)) reasonable; require; satisfy; seem; show; suggest; state; submit; (not) think

Nouns: allegation; argument; belief; (un)certainty; chance; claim; conclusion; decision; evidence; fact (that); knowledge; likelihood; opinion; necessity; (im)possibility; (im)probability; proof; question; requirement

Adjectives: apparent; bad; (un)(not) certain; (un)(not) clear; (in)correct; definite; (not) enough; essential; good; high; least; less; (un)(not) likely; mere; mistaken; more; most; much; (un)necessary; obvious; (im)possible; (im)probable; (un)reasonable; sufficient; virtual; weak; wrong

Qualifiers/Contrast Markers: (al)though; as/since; hardly; however; if (unless); never; occasionally; provided that; rare(ly); so; thus; therefore; until

Adverbials: actually; apparently; beyond (reasonable) doubt; certainly; clearly; definitely; easily; essentially; generally; highly; (in)correctly; in fact; in x opinion; indeed; just; merely; (not) necessarily; obviously; possibly; probably; really; simply; still; surely; usually; very likely; well; without doubt; virtually; wrongly

A corpus linguistic analysis 233

Section IV

234 Ulla Connor and Thomas A. Upton

The genre of grant proposals: A corpus linguistic analysis

Ulla Connor and Thomas A. Upton
Indiana University Purdue University Indianapolis

Introduction

Grant proposals have been explored as a distinct genre of academic writing. Myers (1990), in fact, has described grant writing as “the most basic form of scientific writing: the researcher must get money in the first place if they are to publish articles…” Connor and Mauranen (1999:47) have also recognized that grant proposals are a significant part of the professional writing of most academics, noting that they represent “a genre that all academics will have to come to terms with at some point of their career, usually the sooner the better.”

In his study, Myers looked at the drafts and final products of two biologists’ research proposals and interviewed both authors about their writing processes. Myers concluded that grant proposals are from start to finish designed to persuade “without seeming to persuade” (1990:42). He found two primary elements that together account for much of the persuasive appeal of academic grant proposals. The first was how researchers situated themselves in relation to the academic community – that is, their status – as represented by the home institution, publications, previous funding, claims about the relevance of previous work, etc. The second was how well the researcher situated the planned work within the previous research in the field, balancing originality with adherence to expected conventions.

Drawing on the work of Swales (1990) and Bhatia (1993), Connor and Mauranen (1999) sought to extend Myers’ exploration of the persuasive nature of academic grant proposals through genre analysis, the identification of “a linguistic/rhetorical system of genre-specific ‘moves’ to describe and evaluate” a genre (Connor 2000:2). Connor and Mauranen (1999) looked at 34 proposals from European Union research grant applications, written in English primarily by Finnish scientists, and identified ten recurrent moves that not only overlapped with those of academic research papers (cf. Swales 1990) and promotional genres (cf. Bhatia 1993) but also included moves specific to the genre of grant proposals. Connor (2000) further explored the genre moves defined by Connor and Mauranen by using them to analyze fourteen research grant proposals written by five humanities and science researchers for US government and private funders. She found the developed system of moves “clear and meaningful” to the researchers, and with minor modification it proved reliable for describing the moves of these proposals.

Grant proposals, however, are not a genre exclusive to academia. The writing of grant proposals is of primary importance to almost all non-profit organizations, whether they are working in education, arts, human services, or elsewhere. Yet there has been very little research looking at their distinct linguistic characteristics. As the success of grant proposals is essential to non-profits, a more detailed understanding of what, linguistically, makes a proposal persuasive is valuable.

The first goal of this study was to examine the linguistic and rhetorical features of promotion and persuasion in grant proposals written by non-profit organizations. Following the lead of Connor and Mauranen (1999) and Connor (2000) in their studies of academic grant proposals, we applied a Swalesian genre analysis (Swales 1981) to identify rhetorical “moves” common to non-profit grant proposals.

A second goal, however, was to experiment with applying corpus linguistic techniques to textlinguistic investigations, providing a potentially more powerful evaluation of text features (Flowerdew 1998; Upton and Connor 2001). The previous studies of academic research proposals described above were limited to the hand analysis of relatively few proposals. With the development of corpus linguistic techniques that allow for the automatic processing of large numbers of texts, new avenues are now available for exploring this genre to determine what linguistic features might contribute to its persuasive appeal. Consequently, we used a multidimensional analysis technique (Biber 1988) to develop a linguistic profile of the genre. We then applied the multidimensional analysis to the individual rhetorical moves in order to evaluate common linguistic features of specific moves.

The overall purpose of this study was to highlight key areas in which the genre of non-profit grant proposals proves to be distinctive. Once genre-specific moves are identified, their contribution to the persuasive nature of grant proposals and the range of variation exhibited within the genre can be explored. Further, by combining a corpus-based linguistic dimensional analysis with a Swalesian genre analysis, we hoped to be able to present a more accurate and detailed description of the genre of non-profit grant proposals.

Description of the corpus e grant proposals used in this study are drawn from the Indiana Center for Intercultural Communication (ICIC) fundraising corpus. e ICIC fundraising corpus data collection began aer two international conferences were organized at IUPUI (Indiana University Purdue University Indianapolis) by ICIC and the Indiana University Center on Philanthropy (October 1997, August 1998). An important aspect of the conferences was the collaboration between linguistic scholars and practicing fundraisers in discussing issues of fundraising. is collaboration resulted in the collection of a nearly, two-million word, computerized databank of fundraising texts consisting of the most important fundraising genres – direct letters, case statements, grant proposals and annual reports. e corpus is gathered from six separate fields in the nonprofit sector (Education, Health/Human Services, Arts/Culture, Environment, Community Development, and Other Organizations). Among the organizations representing the field of Education were such non-profits as the Indiana University Foundation, IU School of Dentistry, and the IUPUI School of Liberal Arts. e field of Health and Human services was represented by organizations such as the American Cancer Society, Salvation Army, and Goodwill Industries. Agencies like the Indianapolis Museum of Art, Indianapolis Civic eater, and Eiteljorg Museum, etc. fall into the category of Arts and Culture. e Environment section was represented by such non-profits as Pelican Harbor Seabird Station, Environmental League of Massachusetts, and the Humane Society for Greater Nashua. Representing the field of Community Development were organizations such as the Southeast Neighborhood Development and Community Development Corporation. e category Other Organizations included organizations that do not fall into any of the five above-mentioned categories (e.g. religious organizations). 
is section contained organizations such as the League of Greek Orthodox Stewards, Elijah House, and

238 Ulla Connor and omas A. Upton

Table 1. ICIC fundraising corpus organization types Type Health and Human Services Environment Community Development Education Arts and Culture Other Total

n 82 24 21 44 45 20 236

Table 2. ICIC fundraising corpus document types Types Direct-mail Letters Invitations, Newsletters Case Statements Grant Proposals Annual Reports Totals

Org. n

Item n

Word n

75 172 12 27 51

45 445 13 68 84 856

94,235 922,212 121,780 154,021 523,770 1,816,018

Note: Org. n = the number of organizations represented in this type in the corpus Item n = the number of items of this type in the corpus Word n = the number of words in the documents of this type in the corpus.

the Institute for Christian and Jewish Studies. Tables 1 and 2 describe the data gathered for this corpus. e data used for the research drawn from the above corpus included 68 grant proposals from 27 organizations. Grant proposals from Health and Human Services accounted for the largest part of the corpus (60 out of 68). Table 3 provides the breakdown of grant proposals by organization type. Each grant proposal was scanned into a computer and double-checked for accuracy. Each proposal was then coded to indicate non-profit field, organization, and organization size (based on income). is information was obtained through questionnaires and interviews conducted with the agencies represented in the corpus.


Table 3. Number of grant proposals in the corpus by non-profit type
Type                         Number
Health and Human Services        60
Environment                       3
Community Development             2
Education                         0
Arts and Culture                  3
Other                             0
Total                            68

Identifying rhetorical moves of grant proposals

In order to identify rhetorical moves in the grant proposals, the system of moves developed by Connor and Mauranen (1999) was adopted. They identified ten obligatory moves in European Union research grant proposals: territory, gap, goal, means, reporting previous research, achievements, benefits, competence claim, importance claim, and compliance claim. The Connor and Mauranen set of moves and definitions is given in Figure 1.

Territory: establishes the situation in which the research is placed or physically located. There are two types of territory: (1) that of the “real world,” the world outside the research field; and (2) that of the field of research in which the proposal itself takes place.
Gap: indicates that there is a gap in knowledge or a problem in the territory, whether in the “real world” (for example, environmental, commercial, financial), or in the research field (for example, pointing out that something is not shown or certain). This move serves to explain the motivation of the study.
Goal: is the statement of aim, or general objective of the study. In other words, it explains what it is the researcher wants to get done.
Means: includes the methods, procedures, plans of action, and tasks that the proposal specifies as leading to the goal.
Reporting Previous Research: refers to text that reports on or refers to earlier research in the field, either by the proposing researcher or by others.
Achievements: describes the anticipated results, findings, or outcomes of the study.
Benefits: explains the intended or projected outcomes of the study which could be considered useful to the “real world” outside the study itself, or even outside the research field.
Competence Claim: contains statements to the effect that the research group proposing the work is well qualified, experienced, and generally capable of carrying out the tasks set out.
Importance Claim: presents the proposal, its objectives, anticipated outcomes, or the territory as particularly important or topical, much needed, or urgent with respect to either the “real world” or to the research itself.
Compliance Claim: makes explicit the relevance of the proposal to EU objectives, usually with highly specific reference to directives and/or the set goals of the program in question.
Figure 1. Definition of moves in the European Union grant proposals

Five out of 68 proposals in the ICIC fundraising corpus were chosen randomly and then analyzed in order to draft the initial set of moves that appeared to be used by writers of non-profit grant proposals. After an initial calibration session with the five proposals, a brief description of disagreements and changes was made, and move definitions were finalized. With this set of definitions, another ten proposals were analyzed and the moves marked. In an effort to gain high inter-rater reliability, our four-member research team, consisting of Connor, Upton, and two research assistants, (1) reviewed the marked moves on the ten proposals; (2) discussed the differences between the moves of research grant proposals and those of non-profit grant proposals; and, finally, (3) revised the definitions of the moves as a research team. With the revised set of move definitions, the ten sample proposals were reanalyzed and a consensus was obtained by the research team.

The final set of moves for the genre included the following seven moves: territory, gap, goal, means, competence claim, importance claim, and benefits. Territory explains the present situation in which the field is placed and the general needs in this area. Gap points out that there is a gap in the present situation by indicating the problems or specific needs which the organization faces. General objectives of the grant are represented in goal. That is, goal expresses what the organization will do with this grant. In means, the requested amount, detailed procedures such as timetable and budget, and evaluation of the proposed project are included. In order to gain the confidence of grant-givers, general information about the organization, its ability to conduct the proposed project, and its ability to obtain other funding sources are explained in competence claim. Importance claim discusses the proposed project’s importance. Benefits states possible advantages which the target population can gain from the proposed project. In contrast with Connor and Mauranen’s (1999) system developed for academic grant proposals, the three moves of reporting previous research, achievements, and compliance claim did not appear in the new system for non-profit proposals (see Figure 2).

Two researchers then applied the final system of moves to all the proposals. A third researcher analyzed ten randomly selected proposals, and high agreement was found between the analyses. Figure 3 includes a sample proposal with moves tagged.

The results of the moves analysis on the non-profit grant proposals are shown in Table 4. Word counts and percentages for the moves indicate that the competence claim accounted for 44 percent of all the text of the proposals. The second largest part was means, accounting for 30 percent. Three moves, namely competence claim, means, and gap, accounted for more than 80 percent of the text in the non-profit grant proposals. Territory (4%), goal (4%), importance claim (1%), and benefits (5%) accounted for the remainder.
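The tagging scheme of Figure 3 and the word shares of Table 4 lend themselves to a small worked illustration. The following Python sketch is a reconstruction under stated assumptions, not the procedure the research team used: the regular expression, the function names `words_per_move` and `shares`, and the whitespace-based word count are all hypothetical choices made here. It extracts the text between matching (Begin ...) and (End ...) tags, counts words per move, and then converts the word counts reported in Table 4 into percentages.

```python
import re
from collections import Counter

# Tag format follows Figure 3: "(Begin <move>)" ... "(End <move>)".
# The pattern and helper names are illustrative assumptions, not the authors' tooling.
MOVE_SPAN = re.compile(
    r"\(Begin (?P<move>[^)]+)\)(?P<text>.*?)\(End (?P=move)\)", re.DOTALL
)


def words_per_move(tagged_text: str) -> Counter:
    """Sum whitespace-delimited word counts for each tagged move."""
    counts = Counter()
    for match in MOVE_SPAN.finditer(tagged_text):
        move = match.group("move").strip().lower()
        counts[move] += len(match.group("text").split())
    return counts


def shares(counts: dict) -> dict:
    """Convert raw word counts to percentages of the total."""
    total = sum(counts.values())
    return {move: 100 * n / total for move, n in counts.items()}


# Short excerpt adapted from the sample proposal in Figure 3.
sample = (
    "(Begin Goal)This grant request is for recreation equipment for our "
    "residential group on Cooper Road.(End Goal) "
    "(Begin Gap)Currently, our Cooper Road home has only a basketball court.(End Gap)"
)
sample_counts = words_per_move(sample)

# Word counts per move as reported in Table 4.
table4 = {
    "territory": 3376, "gap": 11443, "goal": 3988, "means": 26694,
    "competence claim": 39354, "importance claim": 1115, "benefits": 4093,
}
table4_shares = shares(table4)
top_three = sum(table4_shares[m] for m in ("competence claim", "means", "gap"))
```

Applied to the Table 4 counts, this calculation puts the three largest moves (competence claim, means, and gap) at roughly 86 percent of the text, in line with the "more than 80 percent" stated above.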

Table 4. Word counts and percentages for moves
Move                   Word count   Percentage
Territory                   3,376           4%
Gap                        11,443          13%
Goal                        3,988           4%
Means                      26,694          30%
Competence Claim           39,354          44%
Importance Claim            1,115           1%
Benefits                    4,093           4%
All Grant Proposals        90,063         100%

Move 1: Territory
Suggests the present situation or general needs in this area
Move 2: Gap
Indicates the problems or specific needs which the organization in question faces
Move 3: Goal
States general objectives of the grant, what the organization will do with this grant
Move 4: Means
a. Requested amount
b. Procedures
   – specific projects to be done with the grant
   – how the money will be used
   – time table and budget
c. Evaluation of the proposed project
Move 5: Competence claim
a. General information about the organization (e.g., mission, history, current board)
b. The ability of the organization to conduct the proposed project
c. Other funding sources to which the organization has submitted a request
Move 6: Importance claim
States the importance of the proposed project
Move 7: Benefits
a. Directly related to the proposed project
b. Related to the target population
Figure 2. Final moves and definitions for non-profit proposals

GRANT REQUEST TO KIWANIS FOUNDATION OF INDIANAPOLIS, INC.
BY PLEASANT RUN CHILDREN’S HOMES, INC.

(Begin Competence claim)Pleasant Run Children’s Homes was established in 1867 to care for children orphaned by the Civil War.(End Competence claim) (Begin Territory)The founders believed the need would be a temporary one, and in time, every child would have a safe, nurturing and happy home to live in. Although more than a hundred and twenty years have passed and the needs that Pleasant Run addresses have changed, that hope remains alive ... that in time, all children will have the loving, supportive homes they deserve. (End Territory)

(Begin Competence claim)Today, Pleasant Run is a not-for-profit agency that serves seriously troubled children who have been abused and neglected. Our clients are referred by county welfare caseworkers or probation officers, by state special-education officials or by parents in cooperation with health-insurance providers. Regardless of how these children come to us, our objective is to help them become healthy, happy, contributing members of society. One of Pleasant Run Children’s Homes programs is treating children ages 7 to 18 in one of our five group homes. It is in this setting that children are provided a homelike environment while providing structure and supervision to help stabilize their behavior and lives. (End Competence claim)

(Begin Goal)This grant request is for recreation equipment for our residential group on Cooper Road.(End Goal) (Begin Benefits)This home is in need of age appropriate recreation equipment for boys ranging from age 10 to 15.(End Benefits)(Begin Gap) Currently, our Cooper Road home has only a basketball court.(End Gap) (Begin Importance claim)Recreation equipment is important in the lives of these young boys. Not only do they have lots of energy to spend, this type of activity helps build self-esteem for children who have had very little positive reinforcement in their lives. It is also a way to teach team-building skills.(End Importance claim)(Begin Competence claim)The children Pleasant Run Children’s Homes serve often have suffered abuse and neglect, some have even been abandoned.(End Competence claim)

(Begin Means)The equipment under consideration for purchase is manufactured by Creative Playgrounds Ltd. It is a 9 station course made out of treated pine. The cost is $4,784.00.(End Means)(Begin Importance claim) We appreciate your consideration of this request. If you do decide to provide the funding for this equipment, please know that you will be providing hours and years of enjoyment to children who have very little fun or play in their lives.(End Importance claim)
Figure 3. Sample grant proposal tagged for rhetorical moves

Multidimensional analysis of the non-profit fundraising corpus

This analysis is based on the multidimensional approach to register variation developed by Biber (1995), which by design requires a corpus-based approach to investigating language use. Biber’s approach seeks to systematically describe the linguistic characteristics of different types of texts in English and to explain variation in text types using a notion that he calls textual dimensions. According to Biber (1988:55), “dimensions are bundles of linguistic features that co-occur in texts because they work together to mark some common underlying function.” Biber uses a multivariate analysis to identify which of 67 linguistic features typically co-occur, with each group of co-occurring features defining a dimension. Based on the assumption that features do not randomly co-occur in texts but in fact together serve a functional purpose, Biber has defined five dimensions, or continua, along which registers in English vary as reflected by their linguistic features. Figure 4 outlines four of the major dimensions and the features that define them.

1) INVOLVED VS. INFORMATIONAL
Positive features: private verbs; THAT deletion; contractions; present tense verbs; second person pronouns; DO as pro-verb; analytic negation; demonstrative pronouns; general emphatics; first person pronouns; pronoun IT; BE as main verb; causative subordination; discourse particles; indefinite pronouns; general hedges; amplifiers; sentence relatives; WH questions; possibility modals; non-phrasal coordination; WH clauses; final prepositions
Negative features: nouns; word length; prepositions; type/token ratio; attributive adjectives

2) NARRATIVE VS. NON-NARRATIVE CONCERNS
Positive features: past tense verbs; third person pronouns; perfect aspect verbs; public verbs; synthetic negation; present participle clauses
Negative features: present tense verbs; attributive adjectives; past participle WHIZ deletions; word length

3) EXPLICIT VS. SITUATION-DEPENDENT REFERENCE
Positive features: WH relative clauses on object position; ‘pied piping’ constructions; WH relative clauses on subject position; phrasal coordination; nominalizations
Negative features: time adverbials; place adverbials; adverbs

4) OVERT EXPRESSION OF ARGUMENTATION
Positive features: infinitives; prediction modals; suasive verbs; conditional subordination; necessity modals; split auxiliaries

Figure 4. Linguistic features of four dimensions of variation. From Biber (1988; 1995) based on a factor analysis of 67 linguistic features in 481 texts from 23 spoken and written genres. Features are explained in Biber (1988:221–245).

In the first dimension, “Involved vs. Informational Production,” we see that private verbs (verbs like think and believe that reflect unseen cognitive activity), that deletion, contractions, etc. tend to co-occur in texts. On the other hand, nouns, long words, prepositions, etc. also tend to co-occur. However, when the positive features occur, the negative features tend not to occur, and vice versa; that is, they are in complementary distribution. These positive and negative features define the two ends of the continuum that makes up this dimension.¹ Among the registers that display these positive features, face-to-face conversation is a genre with a high positive score. These features reflect face-to-face conversation’s focus on interaction and affective concerns – that is, interpersonal involvement. Official documents and academic prose, however, have strongly negative scores on this dimension. They tend to have many nouns, long words, prepositions, etc. These features reflect these texts’ emphasis on the precise presentation of content rather than on interactive or affective matters.

As with Dimension 1, the other three dimensions discussed in this paper have titles that reflect the qualitative interpretation of the functions of their linguistic features as well as the genres that are strongly marked – positively or negatively – on the dimension. The other three dimensions used in this study – “Narrative versus Non-narrative Concerns,” “Explicit versus Situation-Dependent Reference,” and “Overt Expression of Argumentation” – are also shown in Figure 4. The last dimension, “Overt Expression of Argumentation,” is unique in that it only has positive features. That is, there is no set of linguistic features that tend to co-occur when the positive features are absent, as is the case for the other three dimensions.

Calculating mean scores for various types of texts on each of these dimensions allows us to provide complex linguistic descriptions of each type and also allows for complex comparisons among the categories. As noted by Conrad (1996:308), “certain types of texts may, for example, be very different from each other with respect to their narrative focus but very similar with respect to


informational production and impersonal style.” Consequently, this multidimensional analysis permits a more thorough evaluation of a genre than would the examination of only one or two linguistic features, which is typical of many corpus studies.

Each of the 68 grant proposals was analyzed by a computer program designed to identify all occurrences of the linguistic features listed in Figure 4. Counts of each feature in each text were normed to 1,000 words and then standardized to the corpus used in Biber’s (1988) study to allow for comparisons with the findings in that study. These standardized counts were then averaged across all proposals to determine a mean score for each of the four dimensions for this genre.² As is typical of studies that use multidimensional analysis, a qualitative analysis of how the linguistic features in each dimension functioned in the texts was then conducted. This was done by examining samples of texts to look for specific examples of how the dimensions were reflected within the genre.

In this study, the application of Biber’s multidimensional analysis was designed to answer the following questions: (1) Is the grant proposal genre a distinguishable genre? (2) Are linguistic features, as represented by the linguistic dimensions, consistent across the genre, or do they vary significantly across respective moves, with the linguistic dimensions being but an “average” of the combined moves? The hypothesis for the latter question was that linguistic features of specific moves would vary from the “average” linguistic features for the whole genre as a result of the specific “goals” for each move. We expected there to be a “linguistic realization” of the rhetorical features.

Mean scores of the multidimensional analysis of the entire group of grant proposals are shown in Table 5. The scores range from –20.48 on the involved versus informational production dimension to +8.37 on the explicit versus situation-dependent dimension. A more meaningful analysis for the comparison of genres is given in Figures 5–8, which situate the genre of grant proposals in relation to other genres previously studied by Biber (1988) and Connor and Upton (2003).

Table 5. Textual dimensions scores
Dimension                                     Dimension score
Involved vs. Informational Production                  –20.48
Narrative vs. Non-Narrative Concerns                    –3.75
Explicit vs. Situation-Dependent Reference               8.37
Overt Expression of Persuasion                          –0.75

Figure 5. Mean scores of Dimension 1 for Non-profit grant proposal (in bold), Direct-mail letters (in italics) and six other English registers – Involved Versus Informational (Adapted from Biber 1995). Scale runs from INVOLVED (35) to INFORMATIONAL (–20); registers from top to bottom: Face-to-face conversations; Personal letters; Public conversations; Prepared speeches; Professional letters; Direct-mail letters; Academic prose; Non-profit grant proposals (–20.48).

Figure 6. Mean scores of Dimension 2 for Non-profit grant proposal (in bold), Direct-mail letters (in italics) and six other English registers – Narrative versus Non-Narrative Concerns (Adapted from Biber 1995). Scale runs from NARRATIVE (7) to NON-NARRATIVE (–3); registers from top to bottom: Romance fiction; Biographies; Prepared speeches; Face-to-face conversations; Professional letters; Academic prose; Direct-mail letters; Non-profit grant proposals (–3.75).

Figure 7. Mean scores of Dimension 3 for Non-profit grant proposal (in bold), Direct-mail letters (in italics) and six other English registers – Explicit versus Situation-Dependent Reference (Adapted from Biber 1988). Scale runs from EXPLICIT (8) to SITUATION-DEPENDENT (–5); registers from top to bottom: Non-profit grant proposals (8.37); Official documents; Professional letters; Direct-mail letters; Prepared speeches; General fiction; Face-to-face conversations; Telephone conversations.

Figure 8. Mean scores of Dimension 4 for Non-profit grant proposal (in bold), two grant proposal “moves” (in bold italics), Direct-mail letters (in italics) and six other English registers – Overt Expression of Argumentation (Adapted from Biber 1995). Scale runs from OVERTLY ARGUMENTATIVE (3) to NOT OVERTLY ARGUMENTATIVE (–4); registers from top to bottom: Professional letters; Editorials; “Benefits” move (2.09); Face-to-face conversations; Academic prose; Non-profit grant proposals (–0.75); Direct-mail letters; Press reviews; “Territory” move (–3.13); Broadcasts.

In Dimension 1 (see Figure 5 above), non-profit grant proposals have the lowest scores of any genre. Genres with low scores in Dimension 1 are highly informational; they acknowledge interpersonal relations in a secondary manner. They are written with considerable care, sometimes even being revised and rewritten, and thus they can show considerable lexical variety and informational density.

In Dimension 2 (see Figure 6 above), non-profit grant proposals again have the lowest score of any genre. Genres with low scores in this dimension are similar to one another in that they do not have narrative concerns. These non-narrative purposes include (1) the presentation of expository information, which has few verbs and few animate referents; (2) the presentation of procedural information, which uses many imperatives and infinitival verb forms to give a step-by-step description of what to do; and (3) the description of actions usually in progress, that is, actions in the present tense, a straightforward and concise packaging of information.

Dimension 3 (see Figure 7 above) places non-profit grant proposals higher than any other genre. Genres high in this score are informational texts that mark referents in an elaborated and explicit manner (as opposed to situated texts that depend on direct reference to, or extensive knowledge of, the physical and temporal situation of discourse production for understanding).

Finally, in Dimension 4 (see Figure 8 above) – overt expression of argumentation – non-profit grant proposals place slightly above direct-mail letters. Negative scores show a text is not argumentative (i.e. the text does not consider several possibilities and then seek to convince the reader of


the advisability or likelihood of one of them). Instead, these texts tend to be more of a reportage of events and thus do not involve opinion or argumentation at all. However, the score of these grant proposals on this dimension is not strongly negative, which indicates that the genre is “relatively undistinguished along this dimension…there is no general characterization as persuasive or not; rather certain texts within these genres are persuasive, while others are not” (Biber 1988:151).

It is important to make the distinction here between argumentation and persuasion, as Biber (1988) tends to use these terms somewhat interchangeably when describing this dimension. He originally called Dimension 4 “Overt Expression of Persuasion,” noting that it shows “the degree to which persuasion is marked overtly, whether overt marking of the speaker’s own point of view, or an assessment of the advisability or likelihood of an event presented to persuade the addressee” (Biber 1988:111). In his 1995 text, Biber renames this dimension as “Overt Expression of Argumentation,” but still describes it as “reflecting overt argumentation or persuasion” (Biber 1995:161).

As we pointed out in our study on direct-mail letters (Connor and Upton 2003), Dimension 4 really does not describe persuasion as much as argumentation. Considering the linguistic features included in this dimension, such as prediction modals (e.g. will/would/shall), necessity modals (ought, should, must) and especially “suasive verbs” (e.g. command, insist, demand, beg), it makes sense that the grant proposals in our corpus would not score high in this dimension. Unlike editorials, which do score high in this dimension, grant proposals – like direct-mail letters – do not necessarily argue a point of view or try to change the reader’s position. Instead of using verbs calling for action and arguing a point of view on logical grounds, as the suasive verbs in Biber’s list generally do, grant proposals (and direct-mail letters) likely reflect the distinction that rhetoricians (Kinneavy 1971) are now making between “persuasion” and “argumentation.” Kinneavy (1971), most notably, has argued that persuasion involves not only traditional rational appeals (which is the realm of argumentation) but also ethical and emotional appeals. In other words, an argumentative text is classified as persuasive, but not all persuasive texts are argumentative. We would argue that the linguistic features in this dimension simply describe the degree of argumentation used in a genre, saying little about its overall level of persuasiveness. Indeed, what are grant proposals, not to mention direct-mail letters, if not documents intended to persuade donors to contribute money to a worthy project or cause?

In conclusion, grant proposals, like direct-mail letters in our previous research, are distinguishable from other genres that linguists have studied. Grant proposals are highly informative; they are expository in nature even though they may include stories; they are non-situated as texts; and, finally, they are persuasive, if not argumentative, in nature.
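The scoring steps described in this section (norming feature counts to 1,000 words, standardizing them against a reference corpus, and averaging across texts) can be sketched in Python. This is a minimal illustration under stated assumptions, not the program used in the study: the reference means and standard deviations below are invented placeholders rather than values from Biber (1988), only four of the Dimension 1 features from Figure 4 are included, and the step of summing standardized positive features and subtracting standardized negative ones follows the usual multidimensional procedure rather than anything spelled out in the text above.

```python
from statistics import mean

# Hypothetical reference statistics: (mean, standard deviation) per 1,000 words.
# Real values would come from the reference corpus; these are made up.
REFERENCE = {
    "private_verbs": (18.0, 10.0),
    "contractions": (13.5, 18.0),
    "nouns": (180.0, 35.0),
    "prepositions": (110.0, 25.0),
}
POSITIVE = ("private_verbs", "contractions")  # involved end of Dimension 1
NEGATIVE = ("nouns", "prepositions")          # informational end of Dimension 1


def normed_rate(count: int, text_length: int) -> float:
    """Norm a raw feature count to a rate per 1,000 words."""
    return 1000 * count / text_length


def dimension_score(counts: dict, text_length: int) -> float:
    """Standardize each normed rate, then sum positives and subtract negatives."""
    z = {}
    for feature, (ref_mean, ref_sd) in REFERENCE.items():
        rate = normed_rate(counts.get(feature, 0), text_length)
        z[feature] = (rate - ref_mean) / ref_sd
    return sum(z[f] for f in POSITIVE) - sum(z[f] for f in NEGATIVE)


# Two invented "proposals": raw feature counts and total word counts.
texts = [
    ({"private_verbs": 4, "contractions": 0, "nouns": 500, "prepositions": 270}, 2000),
    ({"private_verbs": 2, "contractions": 1, "nouns": 230, "prepositions": 120}, 1000),
]
scores = [dimension_score(counts, length) for counts, length in texts]
genre_mean = mean(scores)  # mean dimension score for the (toy) genre
```

With noun-heavy, contraction-free toy texts like these, the resulting mean is negative, which is the direction the grant proposals take on Dimension 1 in Table 5; the magnitude here has no empirical meaning because the reference values are placeholders.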

Multidimensional analysis of the individual rhetorical moves

In order to examine the linguistic features of individual moves, the multidimensional linguistic analysis was performed on the seven moves that occurred in these non-profit grant proposals. The expectation was that since each rhetorical move has a distinct and identifiable semantic function, these distinct functions would be realized through the use of distinct and consistent linguistic features, resulting in different dimensional scores for each move. In fact, contrary to our expectation, the dimensional analysis did not distinguish one move from another, except in one case.

As shown in Table 6, the variation for each of the moves within the dimension scores was, on the whole, not particularly noteworthy. In Dimension 1, Involved vs. Informational Production, all of the scores were strongly negative, with scores ranging from –15.64 to –22.47 (overall Dimension 1 Score = –20.48), indicating that all of the moves were fairly equally informational in their structure. In Dimension 2, Narrative vs. Non-Narrative Concerns, the scores for each move were also closely grouped together, representing fairly common non-narrative structures, with scores ranging from –2.75 to –4.22 (overall Dimension 2 Score = –3.75). Again, in Dimension 3, Explicit vs. Situation-Dependent Reference, the dimension scores were fairly consistent across the moves, ranging from 6.61 to 10.49 (overall Dimension 3 Score = 8.37), indicating that all the moves used linguistic structures that provided fairly explicit reference. Only with Dimension 4, Overt Expression of Argumentation, did the different moves show interesting variation in their scores. While the overall score for Dimension 4 was –0.75, two moves – competence claim (–2.46) and territory (–3.13) – used very distinctly “non-argumentative” linguistic structures, while one move – benefits (2.09) – proved to be quite argumentative in nature. The relationship of these two moves to the overall Dimension score is more clearly seen in Figure 8.

Table 6. Linguistic dimensions by rhetorical move

Move                Dimension 1:            Dimension 2:            Dimension 3:            Dimension 4:
                    Involved vs.            Narrative vs.           Explicit vs.            Overt Expression
                    Informational           Non-Narrative           Situation-Dependent     of Argumentation
                    Production              Concerns                Reference
benefits            –21.89                  –3.56                   10.49                   2.09
competence claim    –22.47                  –3.32                   8.67                    –2.46
gap                 –16.00                  –2.50                   6.61                    –1.21
goal                –22.61                  –4.22                   8.74                    –0.19
importance claim    –17.99                  –3.33                   8.94                    1.69
means               –20.02                  –4.04                   7.85                    1.50
territory           –15.64                  –2.75                   7.62                    –3.13
Dimension Score     –20.48                  –3.75                   8.37                    –0.75

Note: Only “interesting” dimension is Dimension 4, Overt Expression of Argumentation. Relatively strong in the “benefits” move. Relatively strongly lacking in “competence claim” and “territory” moves. See Table 4 above.

Upon further reflection about the lack of variation in the dimension scores among the moves for Dimensions 1–3, these results do make sense. Grant proposals are very carefully edited, tightly packed documents. This feature leads to the fairly strong “informational” score (Dimension 1) for the genre, and it is not surprising that each of the moves reflected this informational aspect as well, since a lot of work goes into succinctly, but clearly, making the appeal for funding. Similarly, it is obvious that grant proposals as a whole are not narrative in nature, hence the negative score in Dimension 2, and clearly none of the individual rhetorical moves that comprise the genre have a narrative nature. The lack of significant variation in the scores in this dimension, then, is also not surprising. The same can also be said for Dimension 3; because of the need to be succinct and clear, grant writers are very explicit about the points they are making, including the exact need being addressed and the exact purpose for which funding will be used. It should not be unexpected that this linguistic feature would carry through each of the respective moves.

The variation shown between some of the moves in Dimension 4, on the other hand, also intuitively makes sense. This dimension reflects whether or not the language that is used is argumentative in nature; that is, whether it seeks to convince the reader of the advisability or likelihood of an option.


While the overall genre scored as moderately lacking in overt expression of argumentation, one move in particular – benefits – actually scored fairly positive in this dimension (2.09). Again, intuitively this makes sense, as it is in this move that the grant writers try to “sell” the value of the work they seek to accomplish with the grant; they are, in fact, trying to convince the reader of the need (advisability) for the proposal. On the other hand, the territory move reflected a strong lack of “overt expression of argumentation” (–3.13), which again makes sense since the function of this move is merely to outline the context in which the grant activity will take place. The nature of this move is to be descriptive, not argumentative, as is clear from the example territory move given in Figure 3; more than half of this example seeks only to provide a brief history of the organization.

Even though we anticipated greater variation between moves within each of the four Dimensions, the fact that the only notable variation that is apparent is within Dimension 4, in retrospect, seems logical. Even though each of the moves has distinct semantic goals, with regard to the multidimensional analysis, each achieves those different semantic goals using similar linguistic structures.
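One simple way to read Table 6 is to ask, for each dimension, whether every move falls on the same side of zero and which moves sit at the extremes. The sketch below does this with the scores transcribed from the table. It is a reading aid under stated assumptions, not the procedure used in the study: the names `SCORES`, `same_pole`, and `summarize` are invented here, and raw spreads are deliberately not compared across dimensions because the dimensions are on different scales.

```python
# Scores transcribed from Table 6; one list per dimension, in move order.
MOVES = ["benefits", "competence claim", "gap", "goal",
         "importance claim", "means", "territory"]

SCORES = {
    "D1": [-21.89, -22.47, -16.00, -22.61, -17.99, -20.02, -15.64],
    "D2": [-3.56, -3.32, -2.50, -4.22, -3.33, -4.04, -2.75],
    "D3": [10.49, 8.67, 6.61, 8.74, 8.94, 7.85, 7.62],
    "D4": [2.09, -2.46, -1.21, -0.19, 1.69, 1.50, -3.13],
}

def same_pole(values):
    """True if every move falls on the same side of zero."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)

def summarize(scores):
    """Report the lowest and highest move and pole agreement per dimension."""
    summary = {}
    for dim, values in scores.items():
        by_move = dict(zip(MOVES, values))
        summary[dim] = {
            "lowest": min(by_move, key=by_move.get),
            "highest": max(by_move, key=by_move.get),
            "same_pole": same_pole(values),
        }
    return summary

summary = summarize(SCORES)
for dim, info in summary.items():
    print(dim, info)
```

On these figures, Dimension 4 is the only dimension in which the moves fall on both sides of zero, with benefits at the positive extreme and territory at the negative extreme.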

Discussion and implications for teaching

Comparing the results of this study with our earlier study (Connor and Upton 2003), the linguistic dimensions of grant proposals very closely parallel those of direct mail letters. This suggests that philanthropic discourse may use common linguistic features across genres to pursue a common goal (i.e. to persuade recipients to donate/award funds). These common features reflect strongly informational, closely edited texts that use an explicit, precise and non-narrative structure to appeal to potential donors. Further, the linguistic dimensions of the rhetorical moves do not seem to vary from the overall linguistic dimensions of the genre, with the notable exception of two moves in Dimension 4. Clearly there must be something other than linguistic features that accounts for the obvious variation in purpose reflected by the different moves.

These results underscore the need to study more closely the semantic dimensions of moves. This is consistent with genre theory in general. As Swales (1981) argues in his original work on rhetorical moves in research articles, individual moves have specific functions as they develop together the overall purpose of a piece of text or a genre. Swales goes on to show how specific words and their collocations characterize individual moves. For example, the move gap is associated with expressions such as “there is a problem,” “not enough is known yet about so and so,” or “more needs to be done.” We believe that moves in a grant proposal should also be studied for such frequently occurring words and expressions. That knowledge will assist in the operationalization of moves.

As noted at the start of this chapter, the overall purpose of this study was to highlight key areas in which the genre of non-profit grant proposals proves to be distinctive. A primary motivation for understanding not only non-profit grant proposals but also other non-profit genres like direct mail letters (Upton 2002) is to be able to train novice writers in the genre, including non-native speakers of English, to effectively use the common rhetorical moves and linguistic structures that work together to create a persuasive document. Just as Swales’ (1981) study of the rhetorical moves of research studies has provided a concrete tool for teaching novice writers what to look for when reading and what to include when writing research articles, we believe the rhetorical move structure model presented in this chapter provides an equally useful tool for those who read and write non-profit grant proposals. Similarly, instructing novice writers of the genre on the linguistic structures (i.e. “dimensions”) that are typical of non-profit grant proposals should also help them in both evaluating and writing effective proposals.

In short, the purpose for seeking to understand the language of the genre is ultimately to improve the training of novice writers and to encourage the development of better and more effective grant proposals. This study provides an important contribution to those ends by more clearly outlining some rhetorical and linguistic features common to the genre.

Notes 1. e positive and negative designations are used only to distinguish between the two opposing ends of the continuum for a dimension; the values do not signify a greater – or lesser – than relationship between the two poles. 2. We are grateful to Doug Biber for assisting us with the computer analysis for this stage of the study.


References

Bhatia, V. K. 1993. “Simplification vs. easification: The case of legal texts.” Applied Linguistics 4(1):42–54.
Biber, D. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D. 1995. Dimensions of Register Variation: A Cross-linguistic Comparison. Cambridge, Mass: Cambridge University Press.
Connor, U. 2000. “Variation in rhetorical moves in grant proposals of US humanists and scientists.” Text 20(1):1–28.
Connor, U., & Upton, T. A. 2003. “Linguistic dimensions of direct mail letters.” In C. Meyers & P. Leistyna (eds), Corpus Analysis: Language Structure and Language Use, 71–86. Amsterdam: Rodopi.
Connor, U., & Mauranen, A. 1999. “Linguistic analysis of grant proposals: European Union research grants.” English for Specific Purposes 18(1):47–62.
Conrad, S. 1996. “Investigating academic texts with corpus-based techniques: An example from Biology.” Linguistics and Education 8:299–326.
Flowerdew, L. 1998. “Corpus linguistic techniques applied to textlinguistics.” System 26:541–552.
Kinneavy, J. 1971. Theory of Discourse. Englewood Cliffs, NJ: Prentice-Hall.
Myers, G. 1990. Writing Biology: Texts in the Social Construction of Scientific Knowledge. Madison, WI: University of Wisconsin Press.
Swales, J. 1981. Aspects of Article Introduction. Birmingham, UK: The University of Aston, Language Studies Unit.
Swales, J. 1990. Genre Analysis: English in Academic and Research Settings. Cambridge: Cambridge University Press.
Upton, T. A. 2002. “Understanding direct mail letters as a genre.” International Journal of Corpus Linguistics 7(1):65–85.
Upton, T. A., & Connor, U. 2001. “Using computerized corpus analysis to investigate the textlinguistic discourse moves of a genre.” English for Specific Purposes: An International Journal 20:313–329.

256 Ulla Connor and omas A. Upton

Rhetorical appeals in fundraising direct mail letters 257

Rhetorical appeals in fundraising direct mail letters

Ulla Connor and Kostya Gladkov
Indiana University Purdue University at Indianapolis

Background

As the previous chapter by Connor and Upton showed, philanthropic discourse is persuasive in nature. Fundraising letters represent the most common form of philanthropic discourse. They sell charitable objectives to donors. The letters need to arouse the readers’ interest and convince them of the worthiness of the cause for which they are donating money. People want to be wise in making investments of their money. Direct mail letters in non-profit fundraising well represent the genre of promotional texts. Their purpose is to sell a product: in the case of a direct mail fundraising letter, a good cause.

Despite their importance, they are not well understood. A whole industry has developed around direct mail letters in non-profits, as fundraising experts offer fundraisers advice in books and newsletters on how to make the letters more effective. However, a great deal of emphasis is put on the physical appearance of the letter, and little is said about how to structure the text itself. Language use, with the exception of reference to the “you” emphasis popular in American business letter writing, does not appear to be an important consideration in most “how to” resources. Even though donor segmentation is recommended, no specific advice is given about how to appeal to specific audiences.

Linguists’ interest in the direct mail letters is relatively new. The first published research by linguists on the direct mail letter for fundraising purposes was done by Mann and Thompson (1992), whose edited volume provided analyses of a single direct mail letter by a number of well-known linguists.


Mann and ompson’s volume showcased the merits of particular linguistic/rhetorical analyses (such as the Rhetorical Structure eory and the Topical Structure Analysis). However, the purpose of the volume was not necessarily advancing knowledge about the fundraising letter as a text type. Abelen, Redeker, and ompson’s (1993) treatise comparing the rhetorical structure of U.S. and Dutch fundraising letters, on the other hand, was really the first to provide a great deal of valuable information about the direct mail fundraising letter as an object of linguistic/rhetorical study. In order to understand the nature of philanthropic discourse, the Indiana Center of Intercultural Communication (ICIC) Corpus of Philanthropic Fundraising Discourse was collected, serving as a resource for several recent publications. Among them, Upton (2002) studied the rhetorical moves structures in 245 fundraising letters of the corpus and suggested a prototypical successful letter. Connor and Upton (2003) analyzed the same letters using Biber’s multidimensional analysis. eir results showed that fundraising letters resemble other planned written discourse such as academic papers rather than informal and interactive discourse. ese previous ICIC linguistic studies have contributed a great deal to our understanding of fundraising letters as a unique genre. However, persuasion in those letters has not been the focus of study in previous ICIC research. As is known from scholarship on persuasion, recipients of letters can be persuaded through three kinds of evidence: rational (logos), credibility (ethos), or affective/emotional (pathos). Persuasion, as argued by Burke (1969), Young, Becker, Pike (1970), and Lauer et al. (1985), integrates these three appeals in the effort to effect cooperation and identification with the audience. 
e goals of the present study were to develop an operational system of the three persuasive appeals (logos, ethos, and pathos) that would help researchers and practitioners examine fundraising discourse from a rhetorical and linguistic standpoint. e developed system was applied to a corpus to 245 fundraising direct mail letters from the ICIC fundraising corpus described in more detail by Connor and Upton (see Connor and Upton this volume). is paper will first describe methods used to set up a rhetorical appeals system. Second, it will provide the results of the application of the rhetorical appeals system to 245 direct mail letters, followed by the discussion of the study. Finally, results from some preliminary analyses of word frequencies and key words in the appeals will be presented.


Development of the rhetorical appeals system

In order to evaluate the degree of persuasion in the direct mail letters, a working system of appeals was needed. Such a system was found in Connor and Lauer’s (1985) work on persuasive writing. This system of persuasive strategies was designed and successfully used for teaching and evaluating college-level students’ argumentative essays. It includes 23 persuasive appeals with 14 rational appeals (logos), 4 credibility appeals (ethos), and 5 affective appeals (pathos). The direct sources of the appeals were the works of Aristotle and the “new rhetorician” Chaim Perelman (1982). The Connor and Lauer (1985) system is shown in Figure 1.

Rational Appeals
1. Descriptive Example
2. Narrative Example
3. Classification, including definition
4. Comparison, including analogy
5. Contrast
6. Degree
7. Authority
8. Cause/Effect
9. Model
10. Stage in Process
11. Means/End
12. Consequences
13. Ideal or Principle
14. Information

Credibility Appeals
15. First Hand Experience
16. Showing Writer’s Respect for Audience’s Interests and Point of View
17. Showing Writer-Audience Shared Interests and Points of View
18. Showing Writer’s Good Character and/or Judgment

Affective Appeals
19. Emotion in Audience’s Situation
20. Empathy with Audience
21. Audience’s Values
22. Vivid Picture
23. Charged Language

Figure 1. Connor and Lauer’s (1985) system of persuasive appeals


The system was applied to the corpus of 245 direct mail letters. A team of three researchers trained in the Connor and Lauer system identified appeals in the letters. The raters worked separately. After the individual identification of the appeals was finished, the researchers together discussed the applicability of the system. Since Connor and Lauer’s system of persuasive strategies was developed for persuasive essay writing and not for the purpose of analyzing fundraising discourse, adjustments were needed.

It was decided to combine three of Connor and Lauer’s rational appeals – cause/effect (R8), means/end (R11), and consequences (R12) – into one. In fundraising letters, these three appeals often merge and are similar to each other. Therefore, in the current system it was deemed necessary to combine them into an appeal titled: Cause/Effect, Means/End, Consequences (R8). Furthermore, affective appeals A19 (Emotions in Audience’s Situation) and A20 (Empathy with the Audience) in Connor and Lauer’s system do not appear in our system because there was not enough evidence in the direct mail letters to support them as self-standing appeals. Rather, Emotions and Empathy were found to be embedded in other appeals, such as Vivid Picture, Charged Language and Showing Writer’s Respect for Audience’s Interests and Points of View. The negotiation of the differences, together with consideration of the overall distribution of appeals, resulted in the modification of Connor and Lauer’s (1985) system of 23 appeals into a system of 19 appeals, as shown in Figure 2.

After the finalization of the appeals to be used in the analysis, two researchers analyzed a sample of 50 letters from the 245 direct mail letters in order to ascertain a coefficient of interrater reliability and to test the applicability of the modified system. The total number of appeals identified was 463. The number of appeals with disagreement was only 38, which resulted in a high reliability coefficient of r = .92. Some of the initial definitions of appeals were further refined in order to better describe fundraising discourse. Definitions and examples of the appeals in the developed system can be found in Appendix A. The following section describes the final system and the theoretical basis for it.
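The counts in this section can be checked with a few lines of arithmetic. The sketch below tallies the move from 23 to 19 appeals and recomputes the agreement figure. One assumption is labeled in the code: the text reports the coefficient as r = .92 without stating how it was calculated, and a simple proportion of agreement happens to reproduce that value, but that may not be the statistic the raters actually used.

```python
# Category counts reported for Connor and Lauer's (1985) system (Figure 1).
original = {"rational": 14, "credibility": 4, "affective": 5}

# Adjustments described in the text.
modified = dict(original)
modified["rational"] -= 2   # three rational appeals merged into one (net loss of two)
modified["affective"] -= 2  # two affective appeals dropped as self-standing appeals

total_original = sum(original.values())
total_modified = sum(modified.values())

# Reliability sample: 463 identified appeals, 38 with disagreement.
identified = 463
disagreements = 38

# Assumption: the coefficient is the simple proportion of agreement.
agreement = (identified - disagreements) / identified

print(total_original, total_modified, round(agreement, 2))
```

Under that assumption, 425 of the 463 appeals were coded identically, which rounds to .92.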

Rational Appeals
R1. Descriptive Example
R2. Narrative Example
R3. Classification
R4. Comparison
R5. Contrast
R6. Degree
R7. Authority
R8. Cause/Effect – Means/End – Consequences
R9. Model
R10. Stage in Process
R11. Ideal or Principle
R12. Information

Credibility
C13. First-Hand Experience
C14. Showing Writer’s Respect for Audience’s Interests and Points of View
C15. Showing Writer-Audience Shared Interests and Points of View
C16. Showing Writer’s Good Character and/or Judgment

Affective
A17. Appealing to the Audience’s Views
A18. Vivid Picture
A19. Charged Language

Figure 2. List of rhetorical appeals for analyzing fundraising letters

Rational appeals (logos)

“Persuasion is effected by the arguments, when we demonstrate the truth, real or apparent” (Aristotle 1932:xlii). Rational arguments are designed to appeal to the sensible and rational aspect of the reader’s mind. The first type of argument in Aristotelian persuasion is Arguing by Example. “This means of persuasion corresponds to the process of induction and induction is the basis of all reasoning” (Aristotle 1932:147). By induction, the author means deriving a general assumption from a particular case. For instance, in a direct mail letter, the writer uses a compelling narrative example, which contains a beginning, middle, and end of a story:

Ted is a single father with three children under 10. He’s never been on welfare and he’s always had a job doing manual labor… There was a time when he felt like he had no choice but to tolerate his wife’s constant abuse and neglect of their children. Then Ted decided the children deserved a chance to start over in another town, no matter how difficult it might prove to be.

The author of this description is trying to make the reader see the dreadfulness of the situation. Moreover, reading this example describing one family, the reader, according to the logical rule of induction, infers a general conclusion that such an example is true of a number of families. Such a conclusion makes the reader willing to react to this appeal. As Aristotle (1932:14) mentioned, the argument by example concerns the relation of like to like; consequently, when two things fall under the same type, one can be an example, and the other is exemplified. Depicting a family of a certain type, the author exemplifies all families of this particular type, thus intensifying the effect of the appeal by implying that the number of unhappy families is actually bigger than just one. As one can see, Aristotelian Argument by Example corresponds to the appeal of Descriptive Example (R1) as well as Narrative Example (R2) in our system.

Another type of argument found in Aristotle’s theory is the argument of Classification (R3). This kind of argument places a person or a thing into a certain class and then offers defining features. As Aristotle explained, a man defines himself as being akin to a certain class of noble people so that his deeds would seem more noble (Aristotle 1932:163). Our example of Classification is as follows: “In joining SCS, you join the ranks of those who believe that bringing art and art education to the city makes life better, richer, and more rewarding for the entire community.” By making this rational appeal, the writer classifies and then defines the reader as a member of a noble group by making him akin to a limited circle of noble and distinguished individuals.

The next two appeals, Comparison (R4) and Contrast (R5), build a logical argument on the relationship of like to like. In a fundraising letter, the appeal of Comparison sounds as follows: “Our faculty-student ratio is 1:27. For law schools in the United States, the range of faculty-student ratios is from 1:13 to 1:35, but well over half of the law schools in the country have better ratios than we do.” Comparison supports a conclusion about a subject from a description of a related subject; as in our example, the conclusion about one law school can be made from descriptions of other law schools. Unlike Comparison, the appeal of Contrast supports a conclusion about a subject by describing its counterpart. For instance:

Unfortunately, our view of the importance of philanthropy is not shared by all Americans. Many see philanthropy as no more than the grand gestures of the rich. They do not understand, as you do, that the museums, parks, hospitals and community organizations supported by philanthropy are the cornerstones of our very quality of life.

In this example, the writer’s opinion of the donor is raised by denigrating his/her counterparts – people who do not donate.

The rational appeal of Degree (R6) in Aristotle’s original theory is called an argument of More or Less. The rational principle of this argument, according to Aristotle (1932:161), is expressed by the following example: if the less frequent thing occurs, then the more frequent thing would occur. In fundraising discourse, one comes across the appeal of Degree in the form of asking for an increase in donations. For instance: “Please consider an increase in your contribution to the Girls Scout Annual Campaign”. By employing this appeal, the writer implies that if the donor has already given X amount of money, the next logical step would be to increase the donation.

One type of argument based on Perelman’s category of person is the appeal of Authority (R7) (Perelman 1982). The argument of Authority relies on the consistency between a person and his/her activities. In the argument from authority, prestige is the quality that leads others to imitate acts of authoritative people. The Authority appeal in fundraising discourse employs a distinguished name to make the reader act under the influence of someone who is authoritative. For example: “Pat LaCrosse asked me to send this information inviting you to join the Georgia O’Keefe Circle of the Indianapolis Museum of Art’s Second Century Society (SCS).” The author used the name of Pat LaCrosse without explaining who this person is. The author assumes that the reader will be acquainted with Pat LaCrosse and will consider his actions authoritative. “The example of the great is a rhetorician of such power that it can persuade people to commit the most infamous acts” (Perelman 1982:217). An important name brings the flavor of authoritativeness to the discourse and makes it even more persuasive by presenting a model to be imitated by the reader.

The appeal of Cause/Effect – Means/End – Consequences (R8) stems from both Aristotle’s and Perelman’s (Perelman 1982:83) theories. According to Aristotle, “Since it commonly happens that a given thing has consequences both good and bad, you may argue from these [to their antecedents] in urging or dissuading, in prosecuting or defending, in praising or blaming” (Aristotle 1932:166). This appeal helps the writer to urge action on the reader’s part by forecasting effects, consequences, or ends. Perelman adds, “Consequences can be observed or foreseen, ascertained or presumed. It is the truth of an idea that can only be judged by its effects” (Perelman 1982:83). Thus, the writers of direct mail letters often employ the Cause/Effect – Means/End – Consequences appeal to let the reader evaluate an event through its described outcomes. For instance: “As one of only a few zoos in the country that receives no local, state, or federal tax support, IZS must depend on donations for general operating funds from corporations like yours…” Here, the reader is urged to contribute in order to supply necessary funds to the organization that “receives no local, state, or federal tax support,” and as a consequence – “must depend on donations…”

The appeal of Model (R9), as discussed by Perelman, provides the reader with a description of the way a proposed end can be achieved. A working model reflects and supports the current case by a precedent. For instance:

A group of your colleagues recently volunteered to help set the priorities for this campaign. They surveyed members of the staff and faculty councils, administrators and others and learned that we at IUPUI have a number of vital concerns.

Here, the author gives the reader a precedent – “A group of your colleagues recently volunteered…” – to make him/her follow this model and take the same actions.

Stage in Process (R10) is also an important argument in the theory of persuasion. According to Perelman, this appeal is used when a gap exists between the concept accepted by the audience and the proposal the writer is defending. The gap is closed by showing how the proposed action can be a stage in a process. “Instead of going from A to D, one offers to lead the interlocutor first to B then to C and finally to D” (Perelman 1982:87). In other words, when the audience might think that the distance between the initial and final stage or goal of the process is impossible to cover, the writer creates one or more middle stages or transitional goals, which, in the audience’s opinion, would be easier to reach. For instance:

Three years ago, the Heritage Trust set aside land for the restoration of the Limberlost Swamp, near Geneva in eastern Indiana. Now, wildlife is returning to the area. Egrets, ducks and geese now gather at waterfowl resting ponds in large numbers; and native prairie grass has been planted to return natural diversity and other wildlife to the area.

Before the author indicates the final step, which in this case would be “to return natural diversity…to the area”, he reviews what steps have been taken – “set aside land for restoration…, native prairie grass has been planted,” – in the long process of achieving the final goal – “to return natural diversity…to the area.”

The rational appeal of Ideal or Principle (R11) also helps persuade readers. “A convincing discourse is one whose premises are universalizable, that is acceptable in principle to all members of the universal audience” (Perelman 1982:18). While persuading the audience, the writer should show that his/her argument is based on a universal principle that is accepted by all members of the audience. As Perelman (1982:20) noted further, universal values play a key role in argumentation because they allow us to present specific values as more determined and, thus, logically easier to establish. In the fundraising letters, an example of this appeal occurs as follows:

The mission of the Indianapolis Zoological Society is to provide recreational learning experiences for the citizens of Indiana through the exhibition and presentation of natural environment in a way to foster a sense of discovery, stewardship, and the need to preserve the Earth’s plants and animals. In short, the Society is about connecting animals, plants and people.

In this example, the writer establishes a specific value: “…providing recreational learning experience…through the exhibition of natural environment” under a universal value – “connecting animals, plants and people.” If all members of the audience agree on the fact that bringing animals, plants and people together is valuable, then they would more quickly agree on a more determined value of learning the environment through the exhibition.

The last rational appeal, Information (R12), also contributes to successful persuasion. “The speaker must, first of all, be provided with a special selection of premises (facts)… The more facts he has at his command, the more easily he will make the point” (Aristotle 1932:157). The appeal of Information presents facts and statistics and gives definiteness to the writer’s argument. The writers of fundraising letters must persuade the audience not by vague generalities, but by providing the reader with accurate and meaningful numbers. For example:

Through the efforts of about 300 volunteers, nearly $89,900 was raised through the IUPUI Campus Campaign. Almost 900 of us made new gifts in support of the things we care about. Together with those who were already donors, there are over 1,350 staff and faculty supporting the work of IUPUI with their gifts.

The numbers in this paragraph show the reader the definiteness of the writer’s point on the one hand, and on the other they demonstrate that the writer is knowledgeable on the subject.

To conclude this section on rational appeals, it can be stated that rational appeals are used to target the logical and rational side of the audience’s mind (logos). These twelve arguments are employed by the writers to demonstrate the truth to the reader in a persuasive way. As Perelman (1982:13) noted, one of the aims of persuasive discourse, and, consequently, of fundraising discourse, is to make the reader admit the truth and to provoke him/her to take an immediate or eventual action. However, one should remember that, apart from logos, persuasion is also effected through ethos, the character of the writer.

Credibility appeals (ethos)

According to Aristotle, the discourse must not only convince through the argument, it must create a trustworthy image of the speaker.

The character of the speaker is a cause of persuasion when the speech is so uttered as to make him worthy of belief; for as a rule we trust men of probity more, and more quickly about things in general, while on points outside the realm of exact knowledge, where opinion is divided we trust them absolutely. (Aristotle 1932:8)

In fundraising discourse, the writer plays an important role because the goal of direct mail letters is to elicit a response from the audience in the form of giving money to a particular non-profit organization. It is almost always the case in direct mail letters that the organization is represented by the writer. Since the trustworthiness and reliability of the organization can be a crucial factor in the donor’s decision whether to give money or not, it is the writer’s responsibility to create such an image of him/herself and the institution in the letter that he/she would be thought of as a reliable and unfailing person. According to Aristotle, the speaker should be a person of intelligence, virtue, and good will (Aristotle 1932:92). In fundraising discourse, there are four appeals that are used to create in the audience a positive attitude toward the writer as a person of intelligence, virtue and good will.

The first credibility appeal is the appeal of First Hand Experience (C13). In fundraising discourse, it is used as a technique for providing information directly from the writer’s experiences, thus establishing the writer’s credibility; it gives the impression that the writer is knowledgeable and well versed in the subject he/she is writing about. An example follows:

Rhetorical appeals in fundraising direct mail letters 267

Purdue has been a part of my life for as long as I can remember. I was raised in West Lafayette. As I grew older, I realized more and more that Purdue isn’t just a state institution; it is a public university. Moreover, it is a world-class university.

This example indicates that the writer is a knowledgeable person, who knows and cares about Purdue University. Thus, the author of the letter tries to create an impression of him/herself as an individual of intelligence and virtue through the display of deep respect and gratitude for the place where he/she was educated: “Purdue isn’t just a state institution; it is a public university…it is a world class university.”

The next appeal centers on Showing Writer’s Respect for Audience’s Interests and Points of View (C14) and is employed to create the necessary impression of a good willed writer in the audience’s mind. This appeal often takes the form of the writer’s appreciation for what the donors have done for the organization. For example:

In looking back at the last decade, we at the Indianapolis Zoological Society (IZS) wish to express our sincere thanks to all companies who have helped us to achieve so many successes at the Indianapolis Zoo and White River Gardens.

Since he/she is so appreciative of the noble and virtuous deeds of others, the audience would consider the writer as a man of good will.

When a writer acknowledges values and ideas that are shared with the audience, this reflects the credibility appeal of Showing Writer-Audience Shared Interests and Points of View (C15). Using this appeal, the author builds up solidarity with the audience by making himself a part of it. For example:

Because if you and I truly want to preserve philanthropy as a way of life, we must make certain that Americans everywhere take philanthropy seriously, that they talk about it, debate it, challenge it, and ultimately keep it alive as a cherished tradition.

The last of the credibility appeals is based on Showing Writer’s Good Character and/or Judgment (C16). It implies the same Aristotelian ideas of intelligence, virtue, and good will, but is focused on the creation of the image of the writer. In the case of this appeal, the author may take a subjective stance to make a judgment. For example: “Who helps Randy break a cycle of violence and become a better dad? Who helps Michael, who has spina bifida, learn to talk, dress himself, and get around independently? Without you, no one.” Such a


judgment should work towards contributing to the positive image of the writer in the reader’s eye. By making positive comments about the reader, a positive helping character, the writer urges the reader to view him/her as a person of good intentions, because it takes good will to notice and appreciate the good deeds of others. To conclude the discussion about credibility appeals (ethos), it can be stated that persuasion cannot be effective without taking into consideration the role of the writer’s image. According to Aristotle, the writer should make the audience see him/her as an individual of three basic merits: intelligence, virtue, and good will. So far, we have talked about the role of rationality and credibility in the theory of persuasion; however, Aristotle defined a third essential aspect of persuasion theory, namely, emotional or affective appeals (pathos).

Affective appeals (pathos)

“Persuasion is effected through the audience when they are brought by the speech into a state of emotion; for we give very different decisions under the sway of pain or joy, liking or hatred” (Aristotle 1932:9). Emotions can serve as an impulse to take a certain action, and very often the audience will look at the presented case through the prism of their emotions. As Aristotle mentioned, to the audience that is eager and hopeful, the proposed object will seem a valuable and worthy thing, while to the audience that is pessimistic and distrustful, the same object will seem the opposite (1932:91). The following discussion presents the three appeals that are used in fundraising discourse to target the emotional aspect of the audience’s mind.

Appealing to the Audience’s Views (A17) arouses emotions in the reader by addressing his/her attitudinal and moral values. In fundraising discourse, this appeal can take the form of a direct request to donate, for this or that reason. For instance: “Please make a tax-deductible gift to Community Centers of Indianapolis in 1999, and know that is playing an important part in meeting the needs of its community.” In this example, the author makes an emotional appeal to donate followed by a reason for the donation. The word tax-deductible also appeals to the audience’s values, suggesting that the donor also may profit by way of cutting his/her tax.

The next affective appeal, Vivid Picture (A18), is very important to persuasion theory in the sense that it creates the effect of the presence of a reader in a situation depicted by the writer. “It is when suffering seems near to them


that men pity; as for disasters that are ten thousand years off in the past or the future, men cannot remember or anticipate them and either feel no pity at all for them” (Aristotle 1932:122). According to Aristotle, the more temporally and spatially close the event is to the audience, the more emotionally involved the audience will be. Consequently, the writer, trying to persuade the audience, needs to bring an object as close to the audience as possible. For example:

Do you remember how wonderful and how proud you felt in 1980 when the young United States Hockey Team beat the powerful Soviet Team 4–3, and then went on to beat Finland 4–2 for the gold… or in 1984 when 16 year old Mary Lou Retton, needing a 9.95 in her final event to tie for first place in the all around Gymnastics competition, vaulted her way to the gold by scoring a perfect 10?

The effect of presence acts upon the reader’s sensibility. Putting the statement into the form of a question involves the reader and makes him/her look for the answer and thus makes him/her present at the event that took place long ago. Another aspect of this appeal in this letter is the focus on details: “beat…the team 4–3 …16 year old…needing 9.95…scoring 10.” Perelman (1982:37) noted that “it is useful to insist upon certain elements; in prolonging the attention given them, their presence in the consciousness of the audience is increased.” Dwelling on the details creates desired emotions in the reader. Thus, creating a Vivid Picture is an essential appeal to arouse desired emotions in the reader.

The last appeal in the system, Charged Language (A19), is the appeal that usually arouses emotions of anger and indignation. The language that is used by the writer to evince those emotions has a negative connotation. As Aristotle (1932:122) said, the writer should “heighten the effect of his description with fitting attitudes, tones, and dress.” The emotions should be appropriate to the subject, and if the writer wants the audience to experience anger, he needs to be angry in his language. For instance: “When it comes to the misuse and destruction of our natural areas, reality is not only harsh, it is deadly. Once they are developed or altered, and their fragile ecosystems are disrupted, we lose them forever.” Such words as “misuse,” “destruction,” “harsh,” “deadly,” “lose…forever” are charged with negative emotions. While employing such an “angry” description, the writer attempts to make the audience experience relevant emotion. Consequently, being in a relevant emotional condition, the readers might take a relevant action.


Application of the system to the ICIC fundraising letters data

The ICIC Fundraising Corpus was described in the previous chapter. Each of the 245 letters in the corpus was scanned into a computer and double-checked for accuracy. Each letter was then coded to indicate non-profit field, organization, and organization size (based on income). This information was obtained through questionnaires and interviews conducted with the agencies represented in the corpus.

The system of 19 rhetorical appeals was applied to the 245 fundraising letters. First, a sample of 12 letters was evaluated to ensure a high coefficient of interrater reliability. Three trained researchers worked separately on the identification of appeals in their copies of the sample. After the negotiation of differences in the analysis and finalization of the system, another sample of 50 letters was analyzed to test the level of agreement among the raters. After all the discrepancies were negotiated, the other 183 direct mail letters were put to analysis. Each occurrence of a particular appeal in the letters was identified, coded, and then manually counted. The results of the analysis are presented in the following section.
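The chapter does not say which reliability coefficient was computed for the three raters. As one possible illustration only, the sketch below computes mean pairwise percent agreement, a simple measure that does not correct for chance agreement. The rater names and the codings are invented for the example and are not taken from the ICIC data.

```python
from itertools import combinations


def pairwise_agreement(codings):
    """Mean share of units on which each pair of raters assigned the same code.

    `codings` maps a rater name to a list of appeal codes, one per text unit.
    """
    raters = list(codings)
    n_units = len(codings[raters[0]])
    if any(len(codes) != n_units for codes in codings.values()):
        raise ValueError("every rater must code the same number of units")
    scores = []
    for a, b in combinations(raters, 2):
        same = sum(x == y for x, y in zip(codings[a], codings[b]))
        scores.append(same / n_units)
    return sum(scores) / len(scores)


# Hypothetical codings of five text units by three raters (not real data).
example = {
    "rater_1": ["R12", "C14", "A17", "R8", "A17"],
    "rater_2": ["R12", "C14", "A17", "R8", "C14"],
    "rater_3": ["R12", "C15", "A17", "R8", "A17"],
}
agreement = pairwise_agreement(example)
print(f"mean pairwise agreement: {agreement:.2f}")
```

A chance-corrected coefficient would be a stricter choice; percent agreement is used here only because it makes the negotiation step easy to picture: units on which a pair of raters disagree are the ones that would be discussed.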

Results and discussion e results are shown in Tables 1 and 2. e overall number of appeals in the 245 sample letters was 1,829. Table 1 shows the breakdown of numbers and percentages of appeals by appeal type (rational, credibility, and affective) and by non-profit field. e overall percentage of rational appeals in all letters was 48% percent; the corresponding percentages for credibility and affective appeals were 25% and 28% percent, respectively. Table 1 indicates that the use of rational appeal was quite consistent across all six fields; however, the high amounts of use in the Health and Human Services and Environment fields (55% and 47%) were unexpected, since one might expect more emotional appeals in these fields. Concerning the use of credibility appeals, Table 1 shows that Education had the highest percentage of credibility appeals (30%), Health and Human Services the lowest percentage (18%), with Environment (23%), Community Development (24%), Arts and Culture (27%), and Other Organizations (25%) falling in between. e high percentage of credibility appeals in Education


Table 1. Appeals counts totals and percentages in 245 letters by non-profit field

                                          Rational     Credibility  Affective
                                          appeals      appeals      appeals      TOTAL
All Letters Total Appeals (245 letters)   870  48%     453  25%     506  28%     1829
Health and Human Services (74 letters)    320  55%     104  18%     153  27%      577
Environment (10 letters)                   43  47%      21  23%      27  30%       91
Community Development (10 letters)         34  49%      17  24%      19  27%       70
Education (108 letters)                   316  44%     214  30%     193  27%      723
Arts and Culture (37 letters)             138  44%      83  27%      91  29%      312
Other (6 letters)                          19  34%      14  25%      23  41%       56

reflects the relationship between the writer representing educational agencies and the target audience. Most of the letters in the corpus come from Indiana University schools, such as the School of Dentistry, the School of Law, and the School of Liberal Arts. These letters were addressed to former students of IU and were, for the most part, authored by faculty personally acquainted with the addressees. In the letters, the writers stress the interpersonal connection with students so that the students would find the information in the letter credible and, thus, more appealing.

As indicated in Table 1, 41% of the appeals in the letters representing Other Organizations were affective appeals, which is significantly higher than in any other field. The letters in this category represent mainly religious organizations such as Saint Meinrad, the Church Federation of Greater Indianapolis, Elijah

Table 2. Individual appeals counts and percentages in 245 letters by non-profit field

Appeal   Health and     Environment    Community      Education      Arts and       Other          Total
         Human Services                Development                   Culture
R1        23   3.99%      1   1.10%      2   2.86%      7   0.97%      2   0.64%      0   0.00%      35   1.91%
R2        15   2.60%      2   2.20%      3   4.29%      2   0.28%      1   0.32%      0   0.00%      23   1.26%
R3         9   1.56%      3   3.30%      1   1.43%     16   2.21%      7   2.24%      0   0.00%      36   1.97%
R4         2   0.35%      0   0.00%      0   0.00%      5   0.69%      5   1.60%      0   0.00%      12   0.66%
R5        11   1.91%      1   1.10%      0   0.00%      4   0.55%      2   0.64%      2   3.57%      20   1.09%
R6        26   4.51%      3   3.30%      3   4.29%     19   2.63%     10   3.21%      0   0.00%      61   3.34%
R7         6   1.04%      3   3.30%      5   7.14%     20   2.77%      9   2.88%      4   7.14%      47   2.57%
R8        74  12.82%      7   7.69%      4   5.71%     67   9.27%     34  10.90%      2   3.57%     188  10.28%
R9         1   0.17%      0   0.00%      0   0.00%      4   0.55%      0   0.00%      0   0.00%       5   0.27%
R10       20   3.47%      2   2.20%      3   4.29%     46   6.36%      7   2.24%      2   3.57%      80   4.37%
R11       18   3.12%      0   0.00%      3   4.29%     16   2.21%      2   0.64%      0   0.00%      39   2.13%
R12      115  19.93%     21  23.08%     10  14.29%    110  15.21%     59  18.91%      9  16.07%     324  17.71%
R Total  320  55.46%     43  47.25%     34  48.57%    316  43.71%    138  44.23%     19  33.93%     870  47.57%
C13       22   3.81%      1   1.10%      5   7.14%     56   7.75%     16   5.13%      2   3.57%     102   5.58%
C14       63  10.92%     15  16.48%      8  11.43%    124  17.15%     61  19.55%     10  17.86%     281  15.36%
C15       10   1.73%      2   2.20%      0   0.00%     23   3.18%      4   1.28%      1   1.79%      40   2.19%
C16        9   1.56%      3   3.30%      4   5.71%     11   1.52%      2   0.64%      1   1.79%      30   1.64%
C Total  104  18.02%     21  23.08%     17  24.29%    214  29.60%     83  26.60%     14  25.00%     453  24.77%
A17      107  18.54%     18  19.78%     13  18.57%    161  22.27%     73  23.40%     15  26.79%     387  21.16%
A18       35   6.07%      7   7.69%      4   5.71%     30   4.15%     18   5.77%      6  10.71%     100   5.47%
A19       11   1.91%      2   2.20%      2   2.86%      2   0.28%      0   0.00%      2   3.57%      19   1.04%
A Total  153  26.52%     27  29.67%     19  27.14%    193  26.69%     91  29.17%     23  41.07%     506  27.67%
Total    577             91             70            723            312             56            1829


House, etc. As the appeals analysis demonstrates, writers for these mostly religious organizations address the audience through the extensive use of affective appeals.

Concerning the use of individual appeals in the letters, Table 2 shows separate appeal counts and percentages. As Table 2 indicates, among rational appeals, appeals R8 (Cause/Effect – Means/End – Consequences) and R12 (Information) had the highest percentages: 10.28% and 17.71%, respectively. The credibility appeal that occurred most often was appeal C14 (Showing Writer’s Respect for Audience’s Interests and Points of View); the percentage was 15.36%, as compared to 5.58% in the case of C13 (First-Hand Experience), 2.19% of C15 (Showing Writer-Audience Shared Interests and Points of View), and 1.64% of C16 (Showing Writer’s Good Character and/or Judgment). Among affective appeals, the highest percentage (21.16%) occurred with appeal A17 (Appealing to the Audience’s Views). The other two types of affective appeals, A18 (Vivid Picture) and A19 (Charged Language), appeared in the letters much less frequently: 5.47% and 1.04%, respectively. These numbers were quite consistent across fields.

In summary, the results of the application of the rhetorical appeals system to 245 fundraising letters show that all three major types of appeals are used: rational, credibility, and affective. However, the extent of the use of these appeals in the letters is not equal. As the study shows, the writers of fundraising letters choose, for the most part, to persuade the audience through the use of the rational appeal. As a matter of fact, in some of the non-profit fields, it is used almost twice as often as either credibility or affective appeals.
As far as the use of individual appeals is concerned, we can conclude that the most extensively used rational appeals are the ones that provide the audience with the beneficial results or consequences of a particular philanthropic program (R8), and those appeals that provide the reader with information about the organization (R12). The credibility of the organization is most often achieved when the writers demonstrate appreciation of the donor’s past actions (C14) and less often by stressing organization-donor shared interests and goals (C15). Among the affective appeals, the emotional appeal to the donor’s views and attitudes, which in fundraising texts takes the form of a direct request for donation, stands out as the most frequently used individual appeal.
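The percentages discussed in this section can be recomputed from the raw counts. The short sketch below takes the per-field counts reported in Table 1, sums them to obtain the corpus totals, and derives the rounded shares of rational, credibility, and affective appeals. The variable and function names are ours and serve only this illustration.

```python
# Appeal counts per field from Table 1: (rational, credibility, affective).
counts = {
    "Health and Human Services": (320, 104, 153),
    "Environment": (43, 21, 27),
    "Community Development": (34, 17, 19),
    "Education": (316, 214, 193),
    "Arts and Culture": (138, 83, 91),
    "Other": (19, 14, 23),
}


def shares(row):
    """Percentage of each appeal type within one row, rounded to whole numbers."""
    total = sum(row)
    return tuple(round(100 * n / total) for n in row)


# Column-wise sums give the totals for all 245 letters.
totals = tuple(sum(column) for column in zip(*counts.values()))

for field, row in counts.items():
    print(field, sum(row), shares(row))
print("All letters", sum(totals), shares(totals))
```

Summing the columns in this way is also a convenient consistency check on the reported totals.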


Computerized analysis of letters

To begin to answer questions about whether fundraising letters are semantically different from other texts and whether individual rhetorical appeals can be characterized by the words used in them, we ran a computerized analysis with WordSmith Tools 3.0 software. The sample of 245 letters was sorted into two different electronic versions. In one version the beginning and the end of the rhetorical appeals were indicated with the word begin (name of the appeal) and end (name of the appeal), respectively. Thus, the rhetorical appeals were indicated by tags in each of the fundraising letters. Another version included the rhetorical appeals sorted by type. For this version, the rhetorical appeals of the same type were taken from all original letters and put into a separate electronic text file. For example, part of a text that starts with begin…(name of the appeal) and ends with end…(name of the same appeal) was copied from every letter where such an appeal was found and pasted into a text document. As a result, we had 19 text files, named after each individual rhetorical appeal.

The first version was necessary for running the word frequency counts on all of the letters and for comparing WordSmith-generated frequency lists of six non-profit sectors to the frequency wordlist of the Bank of General English, as cited by Hunston (2002:5). Frequency wordlist comparisons show the differences in the frequency of the words used in different corpora. In this research, we compared the frequency wordlists of a non-profit corpus with the Bank of General English. The second version of separated rhetorical appeals gave us the opportunity to run frequency wordlist analysis on separate appeals and to examine frequency differences in the use of lexical items between the Bank of General English and each individual appeal. Also, this version allows for the keywords analysis of the individual rhetorical appeals.
Keywords analysis displays how one corpus is different from the reference corpus. In our case, keywords were generated by comparing wordlists of each individual appeal to the general wordlist of all fundraising letters in the sample.
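The step of copying every tagged span into one collection per appeal can be pictured with a short script. This is a minimal sketch, not the procedure actually used: the exact tag syntax is not given in the chapter, so the pattern below assumes tags of the form "begin C14 … end C14", and the two sample letters are invented.

```python
import re
from collections import defaultdict

# Assumed tag syntax: "begin R12 ... end R12". The chapter only says that the
# words "begin" and "end" plus the appeal name marked each span.
APPEAL_SPAN = re.compile(
    r"\bbegin (?P<code>[RCA]\d{1,2})\b(?P<body>.*?)\bend (?P=code)\b",
    re.DOTALL,
)


def collect_appeals(letters):
    """Group the text of every tagged span by appeal code."""
    by_code = defaultdict(list)
    for letter in letters:
        for match in APPEAL_SPAN.finditer(letter):
            by_code[match.group("code")].append(match.group("body").strip())
    return dict(by_code)


# Two invented letters, for illustration only.
letters = [
    "begin C14 Thank you for your gift. end C14 "
    "begin A17 Please consider a donation today. end A17",
    "begin A17 Please return the enclosed card. end A17",
]
grouped = collect_appeals(letters)
for code, spans in sorted(grouped.items()):
    print(code, spans)
```

Each list in `grouped` corresponds to one of the per-appeal text files described above. The sketch does not handle spans nested inside one another.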


Table 3. Frequency lists of British National Corpus and the fundraising letters by nonprofit field

British National Corpus    Fundraising Corpus    Health/Human Services    Community Development    Environment    Arts/Culture

THE OF TO AND IN THAT S IS IT FOR I WAS ON HE WITH AS YOU BE AT BY BUT HAVE ARE HIS FROM THEY THIS NOT HAD HAS AN WE N’T OR

THE TO AND OF A IN FOR YOU YOUR OUR THAT IS WE ARE WITH WILL AS I THIS HAVE BE AT SCHOOL ON BY SUPPORT FROM AN CAN OR HELP YEAR IT MORE

THE OF TO AND IN A FOR YOU OUR YOUR SCHOOL IS WE THAT ARE WILL AS THIS BE I WITH HAVE STUDENTS AT LAW SUPPORT BY ON AN FROM OR YEAR ALUMNI INDIANA

THE AND TO OF A IN FOR YOU YOUR OUR THAT WE AREA IS HELP WITH WILL I HAVE AS CAN PEOPLE AT THIS BE SUPPORT FROM THEIR AN BY ON CHILDREN COMMUNITY GIFT

THE AND TO OF A IN FOR WITH I YOU THAT AS IS WE OUR DISABLED BUSINESS AT ON MIYARES HAVE THEIR WAS FROM BE MORE THEY ARE BY HIS THIS YOUR CHILDREN OR

THE TO OF AND A IN YOU WE YOUR OUR FOR IS THAT WITH BY ARE IT THIS WILL AUDUBON BE OR AN AS AT CAN MORE ON TAX HAVE BEAUTIFUL US INDIANAPOLIS IF


Table 4. Results of keyword analysis of individual appeals

C13    C14    C15    A17    R1    R2

I MY AM WAS GIFT YOUR
DISABLED
WANDA’S
YOU WE YOUR PLEASE GIFT THANK ENCLOSED CONTRIBUTION TAX HOPE CONSIDER SUPPORT CARD ENVELOPE US
YOUR PLEASE YOU GIFT TAX CONTRIBUTION CONSIDER HELP TODAY ENCLOSED ENVELOPE CARD WILL DONATION
SHE JOB HER I WAS MY YMCA HIS WILL YOUR YOU
HE HIS WAS MIYARES JOB HIM HER SHE WANDA BLIND HAD BABY BUT
TODAY DONATION IF
RETURN MAKE JOIN
WERE MOTHER
HELP MAKE RETURN ANY I PLEDGE CHECK OR JOIN CALL WILL JOB THEY MR HOME PRESIDENT
CHECK HOPE SEND DEDUCTIBLE PLEDGE US RECEIVE TO MEMBERSHIP
GIFT WILL SCHOOL YOU ARE

Note.
C13 – Credibility Appeal “First-Hand Experience”
C14 – Credibility Appeal “Showing Writer’s Respect for Audience’s Interests and Points of View”
C15 – Credibility Appeal “Showing Writer-Audience Shared Interests and Points of View”
A17 – Affective Appeal “Appealing to the Audience’s Views”
R1 – Rational Appeal “Descriptive Example”
R2 – Rational Appeal “Narrative Example”


Results of the comparison of frequency wordlists

As indicated in Table 3, the WordSmith frequency wordlists showed that word frequencies differ between the Bank of General English and the corpus of fundraising letters. According to Table 3, the wordlist of the fundraising letters displays a high frequency of content words such as school, support, gift, etc. These words characterize the essence of fundraising letters as texts designed to induce financial or other kinds of assistance on the part of donors. Also, as columns for specific non-profit fields in Table 3 show, words such as people, children, community, disabled, students, etc., are employed much more frequently in the fundraising letters. A high frequency of some words often reflects the nature of the non-profit fields the letters represent. For example, a high frequency of the words community, children and disabled can be found in such philanthropic sectors as health/human services and community development, while the field of education has an unusually high frequency of the words students and school.

Another interesting point to note can be found in the high frequency of the personal pronouns you, your, we, our (Table 3). In the wordlist of the Bank of General English, the personal pronoun I, which is the 12th most frequent word, is the most frequently used personal pronoun. In the current corpus of direct mail letters, however, you is the most frequent pronoun. In other words, fundraising letters have more instances of the second person pronoun you, whereas the Bank of General English has a much higher frequency of the first person singular pronoun I. Such a difference in the use of you can be accounted for by the characteristic of a letter as being a special form of communication with the donor and as addressed personally to a certain individual.
In other words, the writers of solicitation texts tend to be rather personal in their communication with the donor, and the high use of you underscores this characteristic. Interestingly, the first person plural pronoun we and its possessive derivative our are used much more frequently in direct mail letters than in the Bank of General English. In fact, we is the 13th most frequently used word in the corpus of fundraising texts, whereas, according to Table 3, it is the 33rd in the Bank of General English. The explanation for the extensive use of we and our can be found in the fact that the writers are trying to show the donor mutual interests in certain non-profit programs and activities. The personal pronouns we and our make the reader feel included in a certain activity and, thus, more prone to respond positively to the solicitation.
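The rank comparisons above rest on ordinary frequency wordlists. As a rough illustration of how such a list and a word's rank in it can be obtained, the following sketch counts lower-cased word forms in a text. It is not a reimplementation of WordSmith's WordList tool, whose tokenization settings are not described in the chapter; the sample text is invented.

```python
import re
from collections import Counter


def frequency_list(text):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def rank_of(word, freq_list):
    """1-based rank of `word` in a frequency list, or None if absent."""
    for rank, (candidate, _count) in enumerate(freq_list, start=1):
        if candidate == word:
            return rank
    return None


# Invented sample text, for illustration only.
sample = (
    "We hope you will support our school. Your gift helps our students, "
    "and we thank you for your support."
)
freqs = frequency_list(sample)
print(freqs[:5])
print("rank of 'you':", rank_of("you", freqs))
```

Running the same two functions over a field-specific subset of letters and over a reference wordlist gives the kind of rank comparison reported for you, we, and I.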


Results of the keywords analysis

The keywords analysis was performed on the second version of the letters, where the individual appeals were copied from the original letters and sorted into different text documents according to the type of appeal. The WordSmith keywords analysis searches for words that are used significantly more frequently in one corpus than in the other. In this study, the keywords analysis searched for the words that were used more frequently in individual appeals than in the entire corpus of fundraising letters (Table 4). If the keywords tool does not find any words that appear more frequently in the appeals than in the entire corpus, it means that semantically a given appeal does not differ from the reference corpus.

As the results of the analysis show in Table 4, some of the individual appeals indeed differ lexically from the corpus they were compared against. For example, credibility appeal C13 (First-Hand Experience) has a high frequency of the first person singular pronoun I and its possessive derivative my. As is indicated in Table 4, credibility appeal C14 (Showing Writer’s Respect …) is different from the reference corpus in the frequent use of the second person pronouns you and your. Also, this appeal very often uses words which express the writer’s gratitude for the donor’s past contributions: gift, thank, contribution, etc., as well as the writer’s hope for the donor’s continuation of the support and interest in a certain philanthropic program: hope, consider, support, etc. According to Table 4, credibility appeal C15 (Showing Writer-Audience Shared Interests and Points of View) has a very high frequency of the first person plural pronoun we, which it would be natural for the writers of solicitation letters to use to express mutual goals and interests.
As the WordSmith keywords analysis has shown, credibility appeals are indeed lexically different from the general corpus of fundraising letters in their use of such words that contribute to the credibility of the writer and to the overall effectiveness of the letter.

As far as the affective appeals are concerned, the results of the analysis show that affective appeal A17 (Appealing to the Audience’s Views) also has a distinctive word usage that distinguishes it from the general corpus. In the letters, this appeal is often seen in the form of a direct request for a donation. Therefore, it is not surprising that A17 has a high frequency of the use of the word please. In fact, 85% of all occurrences of this word in the corpus of direct mail letters appear within this very appeal. Also, as Table 4 indicates, while directly asking for a donation, the authors of direct mail letters often use such words as


contribution, support, donation, help, etc. Sometimes, the writers are trying to appeal to the audience’s financial motives by pointing out the tax benefits the donor will eventually have. This fact explains the high level of appearance of the word tax within the A17 appeal. Keywords analysis on the other emotional appeals, A18 (Vivid Picture) and A19 (Charged Language), did not show any lexical differences from the general corpus of fundraising texts because of their low frequency in the corpus and the consequent inability of the WordSmith program to find any words that appear within the given appeals more frequently than in the general corpus.

As far as the results of rational appeals are concerned, keywords analysis showed that there were significant differences only in appeals R1 (Descriptive Example), R2 (Narrative Example), and R12 (Information). According to Table 4, rational appeals R1 and R2 differ from the reference corpus in the frequent use of the third person singular pronouns he, she and their possessive derivatives his, her as well as the personal names Wanda, Miyares. Such extensive use of third person singular pronouns and personal names may reflect the fact that the writers, while rationally appealing to the audience, tell stories about people whom the non-profit organizations helped in the past. According to the results of the keyword analysis, other rational appeals did not show significant difference from the reference corpus in the semantic aspects because of their lexical resemblance to the text of fundraising letters.

In summary, the preliminary results of the WordSmith frequency analysis of the corpus and the keywords analysis of individual appeals have shown that fundraising texts indeed are semantically different from other English texts in their high frequency of the use of such words that define fundraising letters as solicitation texts: gift, please, help, community, children, etc.
Also, since fundraising texts can be characterized as a form of communication between the non-profit and the donor, there is a high incidence of the use of second person pronouns. The keywords analysis has shown that some of the appeals that were developed in the system can be characterized as using certain vocabulary that defines their rhetorical function in the fundraising letter. Credibility appeals display extensive appearance of personal pronouns; affective appeals contain a considerable amount of such words as please, hope, help, consider, etc. We hope that further computerized analyses of the semantic aspects of the corpus can build on these preliminary findings.
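The chapter does not state which statistic was selected in the WordSmith keywords tool. As a sketch under that caveat, the code below uses the log-likelihood (G2) statistic in a commonly used two-term form to flag words that are relatively more frequent in one appeal than in a reference wordlist. The 3.84 cut-off is the chi-square critical value for p < 0.05 with one degree of freedom. All counts are invented, and the reference list is treated as a separate corpus, which simplifies the comparison against the whole fundraising corpus described above.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word, study corpus vs. reference corpus."""
    total = size_study + size_ref
    combined = freq_study + freq_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def keywords(study_counts, ref_counts, threshold=3.84):
    """Words relatively more frequent in the study corpus, with G2 above `threshold`."""
    size_study = sum(study_counts.values())
    size_ref = sum(ref_counts.values())
    found = []
    for word, freq in study_counts.items():
        ref_freq = ref_counts.get(word, 0)
        overused = freq / size_study > ref_freq / size_ref
        score = log_likelihood(freq, size_study, ref_freq, size_ref)
        if overused and score > threshold:
            found.append((word, round(score, 2)))
    return sorted(found, key=lambda item: item[1], reverse=True)


# Invented counts, for illustration only (not from the ICIC corpus).
appeal_counts = {"please": 40, "gift": 30, "the": 60, "school": 5}
reference_counts = {"please": 60, "gift": 120, "the": 2500, "school": 300}
key_list = keywords(appeal_counts, reference_counts)
print(key_list)
```

With very small appeal files few words can pass the threshold, which is consistent with the observation that the low-frequency appeals A18 and A19 yielded no keywords.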


Conclusion and application

In this paper, we have described the development of a persuasive appeals system that can be applied to describe and evaluate direct mail letters. The system is based on solid rhetorical theories of persuasion stemming from the works of Aristotle and Perelman, and the empirical work of Lauer et al. (1985) and Connor and Lauer (1985). The latter two provided an operationalized system of persuasive appeals used for evaluating student persuasive essays. We took their work and adjusted their systems to fit the text of fundraising letters. We believe that the system we came up with for fundraising letters is reliable in measuring the intended persuasive effect. We applied it successfully with a high interrater reliability to a corpus of 245 letters from five different types of non-profit organizations.

As was expected, some appeals, such as The Use of Statistics and Information, Showing Writer’s Respect for Audience’s Interests and Point of View, and Appealing to Audience’s Views, were more frequent than some other appeals. Some of the findings were unexpected. For example, we did not expect the rational appeals to rank over the other appeals in frequency across non-profit disciplines, since we typically associate fundraising discourse with appeals to emotions.

We believe that the system is a valid measure of persuasion in the letters. However, we can only make reference to its use with the present data. Other data sets should be subjected to the analysis for validation. The strength of the rhetorical appeals analysis will be increased after additional computerized analyses of the semantic and syntactic features of the appeals. It is important that we identify words, collocations, and modalities, for example, that characterize individual appeals. Findings of such research will be valuable for writers when coupled with the results of the appeals frequency counts in Tables 1 and 2.
Writers will not only learn which appeals are frequent in successful letters but also how to develop such appeals through appropriate language.

References

AAFRC Press Release 2001. American Association of Fundraising Council. Retrieved February 18, 2002, from http://www.aafrc.org/press3.html

Rhetorical appeals in fundraising direct mail letters 281

Abelen, E., Redeker, G., and Thompson, S.A. 1993. “The rhetorical structure of US-American and Dutch fund-raising letters.” Text 13 (3):323–350.
Aristotle 1932. The rhetoric of Aristotle (L. D. Cooper, Trans.). New York, NY: Appleton and Company.
Burke, K. 1969. A rhetoric of motives. Berkeley, CA: University of California Press.
Connor, U., and Lauer, J. 1985. “Understanding persuasive essay writing: Linguistic/rhetorical approach.” Text 5 (4):309–326.
Connor, U., and Upton, T. 2003. Linguistic dimensions of direct mail letters. In C. Meyer & P. Leistyna (eds), Corpus analysis. Language structure and language use (No 46). Amsterdam: Rodopi Publishers.
Hunston, S. 2002. Corpora in applied linguistics. Cambridge: Cambridge University Press.
Lauer, J., Montague, G., Lunsford, A., and Emig, J. 1985. Four worlds of writing. 2nd ed. New York: Harper and Row.
Mann, W., and Thompson, S. 1992. Discourse description: Diverse linguistic analyses of a fundraising text. Amsterdam/Philadelphia: John Benjamins.
Perelman, Ch. 1982. The realm of rhetoric (W. Kluback, Trans.). Notre Dame, IN: University of Notre Dame Press.
Upton, T. 2002. Understanding direct mail letters as a genre. International Journal of Corpus Linguistics 7(1):65–85.
Young, R., Becker, A., and Pike, K. 1970. Rhetoric: Discovery and change. New York: Harcourt Brace.

Appendix A

Definition and examples of the persuasive appeals used for analyzing fundraising letters

Rational

R1 Descriptive Example
Using a compelling descriptive example from one’s own or someone else’s experience

• Letter A
Families are being torn apart, and too often, children are the victims. Kids like Tommie J., made a ward of the court because of repeated beatings by an alcoholic father; Alice, sent to a group home to get help because of severe behavior disorders; and John H., a recovering alcoholic, rebuilding a relationship with his family so they can live together again.


R2 Narrative Example
Using a compelling narrative story that contains a beginning, middle, and end

• Letter B
Ted is a single father with three children under 10. He’s never been on welfare and he’s always had a job doing manual labor… There was a time when he felt like he had no choice but to tolerate his wife’s constant abuse and neglect of their children. Then Ted decided the children deserved a chance to start over in another town, no matter how difficult it might prove to be.

R3 Classification
Placing the reader in a class or unit and describing what that means

• Letter C
In joining SCS, you join the ranks of those who believe that bringing art and art education to the city makes life better, richer, and more rewarding for the entire community.

R4 Comparison
Supporting a conclusion about a subject from a description of related subjects

• Letter D
Our faculty-student ratio is 1:27. For law schools in the United States, the range of faculty-student ratios is from 1:13 to 1:35, but well over half of the law schools in the country have better ratios than we do.

R5 Contrast
Supporting a conclusion about a subject from a description of its counterpart

• Letter E
Unfortunately, our view of the importance of philanthropy is not shared by all Americans. Many see philanthropy as no more than the grand gestures of the rich. They do not understand, as you do, that the museums, parks, hospitals and community organizations supported by philanthropy are the cornerstones of our very quality of life.


R6 Degree
Using an argument of more or less to imply next steps

• Letter F
Please consider an increase in your contribution to the Girls Scout Annual Campaign.

R7 Authority
Using a distinguished name to make the reader act under the influence of someone who is authoritative

• Letter G
Pat LaCrosse asked me to send this information inviting you to join the Georgia O’Keeffe Circle of the Indianapolis Museum of Art’s Second Century Society (SCS).

R8 Cause/Effect – Means/End – Consequences
Urging action on the reader’s part by forecasting effects, consequences, or ends

• Letter H
As one of only a few zoos in the country that receives no local, state, or federal tax support IZS must depend on donations for general operating funds from corporations like yours…

R9 Model
Providing the reader with a description of the way a proposed end can be achieved

• Letter I
A group of your colleagues recently volunteered to help set the priorities for this campaign. They surveyed members of the staff and faculty councils, administrators and others and learned that we at IUPUI have a number of vital concerns.

R10 Stage in Process
Reviewing previous steps and looking forward to what steps need to be taken

• Letter J
Three years ago, the Heritage Trust set aside land for the restoration of the Limberlost Swamp, near Geneva in eastern Indiana. Now, wildlife is returning to the area. Egrets, ducks and geese now gather at waterfowl resting ponds in large numbers; and native prairie grass has been planted to return natural diversity and other wildlife to the area.

R11 Ideal or Principle
Persuading the audience by showing that the writer’s argument is based on an ideal or principle that is accepted by all members of the audience

• Letter K
The mission of the Indianapolis Zoological Society is to provide recreational learning experiences for the citizens of Indiana through the exhibition and presentation of natural environment in a way to foster a sense of discovery, stewardship, and the need to preserve the Earth’s plants and animals. In short, the Society is about connecting animals, plants and people.

R12 Information
Using facts and statistics to give definiteness to the writer’s argument

• Letter L
Through the efforts of about 300 volunteers, nearly $89,000 was raised through the IUPUI Campus Campaign. Almost 900 of us made new gifts in support of the things we care about. Together with those who were already donors, there are over 1,350 staff and faculty supporting the work of IUPUI with their gifts.

Credibility

C13 First Hand Experience
Providing information directly from the writer’s experience to establish the writer’s credibility

• Letter M
Purdue has been a part of my life for as long as I can remember. I was raised in West Lafayette. As I grew older, I realized more and more that Purdue isn’t just a state institution; it is a public university. Moreover, it is a world-class university.


C14 Showing Writer’s Respect for Audience’s Interests and Point of View
Creating the impression of a good willed writer by showing the writer’s appreciation for the audience’s past actions

• Letter N
In looking back at the last decade, we at the Indianapolis Zoological Society (IZS) wish to express our sincere thanks to all companies who have helped us to achieve so many successes at the Indianapolis Zoo and White River Gardens.

C15 Showing Writer-Audience Shared Interests and Points of View
Building solidarity with the audience by acknowledging shared values and ideas

• Letter O
Because if you and I truly want to preserve philanthropy as a way of life, we must make certain that Americans everywhere take philanthropy seriously, that they talk about it, debate it, challenge it, and ultimately keep it alive as a cherished tradition.

C16 Showing Writer’s Good Character and/or Judgment
Taking a subjective stance to make a positive judgment of the writer in the reader’s eye

• Letter P
Who helps Randy break a cycle of violence and become a better dad? Who helps Michael, who has spina bifida, learn to talk, dress himself, and get around independently? Without you, no one.

Affective

A17 Appealing to the Audience’s Views
Arousing emotion in the reader by addressing his/her attitudinal and moral values

• Letter Q
Please, make a tax-deductible gift to Community Centers of Indianapolis in 1999, and know that is playing an important part in meeting the needs of its community.


A18 Vivid Picture
Involving the audience emotionally by temporally and spatially linking them to an event

• Letter R
Do you remember how wonderful and how proud you felt in 1980 when the young United States Hockey Team beat the powerful Soviet Team 4-3, and then went on to beat Finland 4-2 for the gold… or in 1984 when 16 year old Mary Lou Retton, needing a 9.95 in her final event to tie for first place in the all around Gymnastics competition, vaulted her way to the gold by scoring a perfect 10?

A19 Charged Language
Using strong language to arouse emotions

• Letter S
When it comes to the misuse and destruction of our natural areas, reality is not only harsh, it is deadly. Once they are developed or altered, and their fragile ecosystems

Communicating relationships through metaphor in fundraising texts 287

Framing matters: Communicating relationships through metaphor in fundraising texts

Elizabeth M. Goering
Indiana University Purdue University Indianapolis

Introduction

One of the primary activities of nonprofit organizations is fundraising. While fundraising discourse has a clear persuasive function and bringing in dollars is its obvious goal, fundraising and development are not primarily about raising money. Rather, they are about establishing and maintaining relationships. Even relatively impersonal direct mail letters are more concerned with building relationships than with raising funds, as evidenced by the fact that nonprofit organizations typically spend more than they raise on direct mail campaigns. Yet, little is known about the linguistic features of relationship formation through fundraising discourse. This chapter uses computer analysis of a fundraising corpus to identify the ways in which relationships are metaphorically constructed within direct mail letters.

Review of relevant literature and development of research questions

Fundraising as relationship building

As noted in the introduction, fundraising discourse is not always centered on acquiring funds. Fundraising practitioners and theorists alike are quick to agree that, at its core, fundraising is really about relationship building, about establishing a partnership between a community and an organization. Keegan (1990:13) observes, “Going out into the community, asking people for money,

288 Elizabeth M. Goering

is in fact inviting them to be our partner in making something of value happen.” Grace (1997: viii) concurs, asserting that development is the series of deliberate activities by which we “involve and retain funders in a donor-investor relationship with our organizations.”

Because of the strong connection between establishing relationships and fundraising, it is, perhaps, not surprising that person-to-person solicitation is considered to be the most effective way to raise funds. Edles (1993: 13) argues, for example, that “people give money to people not to causes… . Writing a letter is the least forcible way to solicit because it’s the most impersonal.” Consequently, it may seem odd to think of the impersonal (or at best, pseudo-personal) letters we all receive in abundance – many of which end up in the trash – as tools for building relationships. And yet, that is what they are.

Direct mail letters certainly do not appear to be primarily about raising money. The returns on direct acquisition mailings (mailings designed to attract new donors) are notoriously low, with response rates ranging from 0.5% to 2.5% considered normal (Warwick 2000). In addition, direct acquisition mailings typically return only 50–75% of their costs (Warwick 2000). In other words, these direct mail campaigns generally cost organizations considerably more than they bring in. Yet, direct mail letters are an undeniably integral part of a successful fundraising plan, functioning as the frontline in many nonprofits’ fundraising efforts. Fundraising consultants Mal Warwick & Associates, Inc. (2000:166, emphasis mine) identify direct mail as “the single biggest means used by nonprofits to recruit new donors” and conclude that “research repeatedly confirms that the majority of first time gifts to charity are made by mail.” Grace (1997: 121, emphasis mine) suggests that ultimately the purpose of the direct mail letter “is the acquisition of a new donor who is then brought into a relationship with the organization.” Clearly, because of the strong link between direct mail letters and the cultivation of donor relationships, understanding the discursive construction of relationships in fundraising letters should be of value to fundraising theorists, practitioners and educators alike.
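To make the arithmetic behind these figures concrete, the following minimal sketch works through a hypothetical acquisition mailing. Only the response-rate and cost-recovery ranges come from the figures cited above; the mailing size, the total cost, and the function name `acquisition_outcome` are invented for illustration.

```python
from fractions import Fraction

# Ranges cited in the text (Warwick 2000); exact fractions avoid rounding noise.
RESPONSE_RATES = (Fraction(5, 1000), Fraction(25, 1000))  # 0.5% to 2.5%
COST_RECOVERY = (Fraction(50, 100), Fraction(75, 100))    # 50% to 75% of costs


def acquisition_outcome(letters_mailed, total_cost):
    """Return (new-donor range, net-result range) for a hypothetical mailing."""
    donors = tuple(int(letters_mailed * rate) for rate in RESPONSE_RATES)
    # Net result = revenue - cost, where revenue is a share of the cost.
    net = tuple(int(total_cost * (share - 1)) for share in COST_RECOVERY)
    return donors, net


# Hypothetical mailing: 10,000 letters costing 5,000 currency units in total.
donors, net = acquisition_outcome(10_000, 5_000)
print(donors)  # (50, 250)
print(net)     # (-2500, -1250)
```

Under these assumed inputs the mailing recruits between 50 and 250 new donors while losing between a quarter and a half of its cost, which is the pattern the paragraph describes.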

Relational communication theory and relationships in fundraising

Relational communication theory provides a logical and useful framework for beginning to explore how relationships are built linguistically in fundraising discourse. Relational communication theorists identify dominance and affiliation as the two fundamental relational constructs, the “basic substance of all relational judgments” (Dillard, Solomon and Samp, 1996:704). In general terms, dominance is the degree of control one participant in a relationship has over the behavior or beliefs of another, while affiliation is the degree of affective connectedness one feels for the other. While the affiliation construct has its roots in the love/hate axis of Leary’s (1957) model of interpersonal behavior, Dillard, Solomon, and Samp (1996) suggest that affiliation goes beyond loving or liking and is more accurately represented as solidarity.

According to relational communication theory, these two constructs, which can be combined to create a two-dimensional model of relationships (see Figure 1), form the foundation of any relationship. This model provides a useful framework for analyzing the relationships direct mail letters seek to establish. Using the Dominance/Affiliation grid, four possible types of relationships can be plotted: (1) Dominant/Affiliative, in which the emotional bond is high, and power is unequal; (2) Equal/Affiliative, in which the emotional bond is high, and power is equal; (3) Dominant/Nonaffiliative, in which the emotional bond is low, and power is unequal; and (4) Equal/Nonaffiliative, in which the emotional bond is low, and power is equal.

Figure 1. Dominance/affiliation model of relationships (Adapted from Dillard, Solomon and Samp 1996)

Both Dominance and Affiliation are important constructs in theorizing relationships in fundraising. Grace (1997) suggests that concern and connection are prerequisites for giving. Grace maintains that capacity alone – the ability to give – will not guarantee that a potential donor will make a contribution, because the donor must also have connection, an emotional linkage with the organization, and concern, an intellectual or thoughtful link to the organization. This would imply that the affiliative relational dimension is particularly important in fundraising. On the other hand, a relationship in which potential donors believe they have the power to really make a difference also likely motivates giving. And yet, little is known about the specific discursive manifestations of relationships in direct mail letters. This study seeks to address this gap in scholarship by analyzing the ways in which the four types of relationships are metaphorically created in fundraising discourse.
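The two-dimensional grid can be read as a simple lookup from two yes/no judgments to one of the four relationship types. The sketch below is only an illustration of that reading; the function name `relationship_type` and its parameter names are assumptions, not part of the model as published.

```python
def relationship_type(unequal_power: bool, high_affiliation: bool) -> str:
    """Map the two relational dimensions onto the four types named in the text."""
    power = "Dominant" if unequal_power else "Equal"
    bond = "Affiliative" if high_affiliation else "Nonaffiliative"
    return f"{power}/{bond}"


# Enumerate the four quadrants of the Dominance/Affiliation grid.
for unequal_power in (True, False):
    for high_affiliation in (True, False):
        print(relationship_type(unequal_power, high_affiliation))
```

Treating each dimension as binary is a simplification made for the sketch; the model itself describes both dimensions as matters of degree.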

Metaphors and relationship in fundraising letters

In fundraising letters, the type of relationship a potential donor is being invited to enter is often communicated metaphorically (e.g. friend, investor, partner). Metaphors have long been recognized as the primary way in which we come to understand the unknown. Back in the 1960s, Nisbet (1969:4) observed:

Metaphor is a way of knowing – one of the oldest, most deeply embedded, even indispensable ways of knowing in the history of human consciousness. It is, at its simplest, a way of proceeding from the known to the unknown. It is a way of cognition in which the identifying qualities of one thing are transferred in an instantaneous, almost unconscious flash of insight to some other thing that is, by remoteness or complexity, unknown to us.

This view of the function of metaphor is still common. For example, Lakoff and Turner (1989:214) suggest that “Metaphor is central to our understanding of our selves, our culture, and the world at large,” and Siegelman (1990: 3) explains:

Metaphor is primary both in language and in thought. It is through metaphor that we come to understand the world …. As the quintessential ‘bridging operation,’ metaphor links domains by connecting insight and feeling, and what is known with what is only guessed at.

In part, the power of metaphor lies in its ability to redescribe reality. The metaphoric process allows us to develop new ideas because of its ability to link the unknown with the familiar. In fact, Siegelman (1990:4) concludes, “Indeed it seems that we can only see the new at first in terms of the old.” Within the context of this fundraising research, the reality that is being redescribed is relational, with the organization using metaphor to draw potential donors into a new relationship with an unknown organization. By using metaphor to frame the new relationship in familiar terms, the organization creates a bridge that connects the new with the old and makes the letter’s receiver feel as if he/she is on familiar territory. Admittedly, the relationship that is created in direct mail letters may be more illusory than real, a pseudo-relationship that may exist only in the metaphorically constructed space of fundraising discourse, and yet it is, nonetheless, a relationship that impacts human behavior.

The power of metaphor to redescribe reality is particularly apparent in the association between metaphor and affect, a link that may be especially important in understanding the affiliation dimension of relationships. Siegelman (1990:7) is a strong proponent of the connection between affect and metaphor. She writes that a metaphor “gives flesh and blood to the abstract and theoretical. Metaphor, especially when used deliberately and unconventionally says ‘Look at me. Look at the world, not through it.’” The affective impact of metaphor can be clearly seen if one compares saying that a used car salesman is sneaky and untrustworthy with saying that a used car salesman is a snake. Siegelman (1990:7) explains:

The word snake carries an image with surplus meaning and feelings and associations: All the associations that we have collectively and individually to snakes surge in with the image we may be seeing in our mind’s eye. . . . A whole tangle of conscious and unconscious associations goes with the image, and metaphor delivers them in an economical and vivid package.

Metaphor, then, by boosting affect and ultimately establishing positive affiliation, is one way in which reality is redescribed and relationships are formed within the impersonal context of direct mail letters.

Examining metaphor in fundraising discourse is not new. Turner (1991), for example, explored the functions of metaphor in shaping reality within fundraising texts. Turner illustrates how pervasive metaphors are in fundraising discourse and highlights the interaction between metaphor and the reality constructed through those metaphors. In addition, McCagg (1998) conducted a Lakovian analysis of conceptual metaphors in the promotional materials from two organizations.1 Through his analysis of metaphor, McCagg was able to identify the general, underlying conceptualizations of the audience held by each of the organizations included in the study. Barton’s 2001 study of the founding of the United Way examines how both


the potential donor and individuals with disabilities are discursively and metaphorically constructed in promotional texts. Barton concludes that by representing individuals with disabilities as children/child-like or as supercrips, these fundraising texts “eras[e] the complex experience of individuals, particularly adults, with disabilities” (Barton 2001:172). Furthermore, the United Way’s promotional rhetoric represents the United Way as a responsible business and potential donors as making sensible business decisions, a practice that erases the experience of people with disabilities completely (Barton, 2001:188).

Goering (2001) expanded the analysis of the metaphoric language used to describe the type of relationships potential donors are invited to enter into through direct mail fundraising letters. In a pilot study of a limited number of direct mail letters, Goering identified three metaphor frames that were used to characterize the relationship between potential donors and the nonprofit organization: friend/family, business partner, and assistant.

These studies establish a precedent for analyzing metaphor in fundraising discourse, and they provide evidence that the assumptions nonprofit organizations make about their relationships with donors can be inferred from the metaphors embedded in fundraising texts. However, each of them is somewhat limited by small sample size. Hence, the current study utilizes computer analysis to examine a relatively large corpus of fundraising texts, as it investigates the metaphoric conceptualizations of relationships in direct mail fundraising letters. Specifically, this study seeks to answer the following research questions:

(1) How are the relationships between potential donors and requesting organizations metaphorically described in direct mail letters?

One might expect organizations of different types to define such relationships differently. For example, a social service agency might seek to establish a different kind of relationship with a potential donor than an environmental organization would. In addition, a local organization might be likely to attempt to construct a different kind of relationship than a national organization or a local affiliate of a national organization. Thus, this study proffers two additional research questions:

(2) What differences, if any, exist in the ways in which organizations of different types metaphorically describe the relationship they are seeking with potential donors?

(3) What differences, if any, exist in the ways in which national, local and local affiliates of national organizations metaphorically describe the relationship they are seeking with potential donors?


Method

Fundraising corpus

The corpus used for this research is the Fundraising Corpus collected by and housed in the Indiana Center for Intercultural Communication (ICIC) at Indiana University Purdue University Indianapolis. This corpus is a two million word, computerized databank of fundraising texts consisting of the most important fundraising genres – direct mail letters, case statements, grant proposals and annual reports. The fundraising texts in the corpus are drawn from 108 organizations representing a variety of different fields within the nonprofit sector. This study analyzed 245 of the direct mail letters included in the corpus. The letters were collected from 73 organizations and represent five different types of nonprofit organizations: Social Services, Environmental, Community Development, Education, and Cultural/Arts.

Developing theory-based metaphor categories

The first step in this research was to determine if the relational metaphors identified in an earlier study (Goering 2001) could reliably be placed in the four relational quadrants of the Dominance/Affiliation Model presented in the previous section (see Figure 1). A group of ten students enrolled in an upper-level Communication Studies class at a large, urban, Midwestern university were given a questionnaire with twelve representative statements drawn from the direct mail fundraising letters coded in the pilot study (e.g. “Join us in an investment opportunity,” “We are seeking to build partnerships,” “We still need assistance from friends like you”). The students were asked to describe the relationship implied by each statement on two dimensions: Dominance (whether the donor is given power and responsibility, or whether power and responsibility are shared equally by donor and organization) and Affiliation (whether the donor has a high or low degree of emotional connectedness with the organization). Reliability, which was computed as the percentage of agreement on how the statements were rated among the ten coders, was 83%. This high percentage of agreement indicates that the metaphoric frames used in fundraising can reliably be situated within the Dominance/Affiliation Model of Relationships. Figure 2 locates the metaphors on the model and offers


examples of each metaphor frame. With 83% agreement, the coders concluded that the friend or family metaphor connotes a relationship characterized by a high degree of affiliation and relatively equal power. The savior metaphor connotes considerable affiliation, but it also implies that the donor – the one doing the saving – is dominant in the relationship. The coders concluded that the investor metaphor connotes high power but relatively low affiliation, and the partner metaphor implies relatively low affiliation and equal power.
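The chapter reports reliability as a percentage of agreement but does not spell out the formula. One common reading, sketched below as an assumption, takes for each statement the share of coders who chose the most frequent rating and then averages those shares across statements. The ratings in `toy_ratings` are invented and are not the study's data.

```python
from collections import Counter


def percent_agreement(ratings_by_statement):
    """Mean share of coders choosing the most common rating, as a percentage."""
    shares = []
    for ratings in ratings_by_statement:
        modal_count = Counter(ratings).most_common(1)[0][1]
        shares.append(modal_count / len(ratings))
    return 100 * sum(shares) / len(shares)


# Invented ratings from ten coders for three statements (quadrant labels).
toy_ratings = [
    ["equal/high"] * 10,
    ["equal/high"] * 9 + ["dominant/high"] * 1,
    ["dominant/low"] * 8 + ["equal/low"] * 2,
]
print(round(percent_agreement(toy_ratings)))  # 90
```

Other agreement measures, such as pairwise agreement or chance-corrected coefficients, would give different values for the same ratings, so the choice of formula matters when comparing reliability figures across studies.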

Figure 2. Relationship metaphors situated in dominance/affiliation relational model with examples

Developing metaphor word lists

Once the metaphors used in fundraising discourse had been reliably placed within the theoretical framework provided by the Dominance/Affiliation relational communication model, the researcher sought to establish word lists that could reliably locate the four metaphor clusters in fundraising texts. This was accomplished by returning to the 58 direct mail letters that had been hand-coded in the pilot study and identifying key words associated with each metaphor category. For example, words associated with the Investor metaphor included investment, shareholder, and investor, while words such as friend, family, brother, and sister were associated with the Friend/Family metaphor.

Using the word lists that had been constructed for each metaphor cluster, the word-search function of QSR Nud*ist,2 a software package for qualitative analysis of text, was utilized to search for the metaphors in a sample of documents that had been hand-coded for the pilot study. The results of the computer search were then compared with the results of the hand coding, and the word lists were modified until the list facilitated finding the metaphors in the fundraising documents reliably and parsimoniously. Reliability was operationalized as constructing a word list for each metaphor cluster that successfully located at least 90% of the occurrences of that metaphor that had been located by hand in the sample of letters. Parsimony was operationalized as constructing a word list for each metaphor cluster that included only words that increased the reliability of the search. The words brother and sister, for example, were removed from the Friend/Family metaphor list because they did not increase the reliability of finding relevant metaphors in the fundraising letters.

The final word lists for each metaphor cluster included the following words: Friend Metaphor = friend, family; Investor Metaphor = investor, shareholder; Partner Metaphor = supporter, help us, partner, join us; Savior Metaphor = save, free, magic, dream. With these two stages – establishing theory-based metaphor clusters and identifying word lists for each metaphor cluster – completed, the actual analysis of the texts in the fundraising corpus could begin.
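The two criteria described above can be sketched in code. In this illustration, reliability is the share of hand-coded metaphor passages that a word list finds, and parsimony is enforced by dropping any word whose removal does not lower that share. The passages are invented, plain substring matching stands in for the QSR Nud*ist word search, and the greedy pruning loop is an assumption; the chapter describes the lists as being revised by the researcher, not by an algorithm.

```python
def recall(word_list, hand_coded_passages):
    """Share of hand-coded passages that contain at least one word from the list."""
    found = sum(
        any(word in passage.lower() for word in word_list)
        for passage in hand_coded_passages
    )
    return found / len(hand_coded_passages)


def prune(word_list, hand_coded_passages):
    """Drop each word whose removal does not lower recall (parsimony criterion)."""
    kept = list(word_list)
    for word in list(word_list):
        trial = [w for w in kept if w != word]
        if trial and recall(trial, hand_coded_passages) >= recall(kept, hand_coded_passages):
            kept = trial
    return kept


# Invented passages standing in for hand-coded Friend/Family metaphors.
passages = [
    "We still need assistance from friends like you",
    "Become part of our family",
    "As a friend and brother to this cause",
    "A friend of the zoo",
]
candidate_words = ["friend", "family", "brother", "sister"]
final_words = prune(candidate_words, passages)
print(final_words)                    # ['friend', 'family']
print(recall(final_words, passages))  # 1.0
```

In this toy data, brother and sister drop out because every passage they match is already matched by friend or family. The result of a greedy pass like this can depend on the order in which words are tried.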

Analyzing metaphor use in fundraising corpus

The analysis of metaphor use in the fundraising corpus involved a two-step process: First, once reliable and parsimonious word lists had been compiled, the word lists were used to search 245 direct mail letters in the fundraising corpus. The word searches were completed using the Pattern Search function of QSR Nud*ist. For each word, the root of the word was entered as the search term, so that any form of the word would be located. For example, investor was entered as invest, so that any word with invest in it (e.g., investor, invest, investment) would be found. The QSR Nud*ist text-search program provides the researcher with the opportunity to query each instantiation to allow elimination of false finds before finds are analyzed and saved. This function was used


to determine whether or not the usage of the word was, indeed, metaphorically describing the relationship the donor was being invited to enter. Only those instances that met this requirement were included in the final calculations. After the word-search was completed, the second step in the data analysis process was to use the software program SPSS to statistically analyze metaphor use in the fundraising letters.
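The root-based search described above can be approximated with a regular expression that matches any word containing the root and returns it with some surrounding context, so that a human reader can accept or reject each hit. This is a sketch using Python's standard `re` module, not a reproduction of the QSR Nud*ist Pattern Search; the function name `find_candidates`, the `window` parameter, and the sample sentence are all assumptions.

```python
import re


def find_candidates(text, roots, window=30):
    """Return (root, matched word, context) for every word containing a root."""
    hits = []
    for root in roots:
        # \w* on both sides lets the root match inside longer word forms.
        pattern = re.compile(r"\b\w*" + re.escape(root) + r"\w*\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits.append((root, match.group(0), text[start:end]))
    return hits


# Invented sentence; each hit would still need manual review for metaphoric use.
letter = "Your investment matters. Invest in our future as an investor."
hits = find_candidates(letter, ["invest"])
print([word for _, word, _ in hits])  # ['investment', 'Invest', 'investor']
```

The context string is what makes the manual screening step possible: a hit such as a literal financial investment could be discarded on review, while a hit that frames the donor as an investor in the organization would be kept.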

Results

Metaphor use in fundraising corpus

Table 1 summarizes the frequency with which each relationship metaphor occurs in the fundraising corpus. A total of 385 phrases in the corpus letters metaphorically described the relationship between organization and donor. The largest share (n=148, 38%) of these occurrences metaphorically framed that relationship as a Partnership, in which the donor helps or joins with the organization in meeting its goals. The Friend relationship metaphor was utilized second most frequently, with 141 (37%) of the occurrences defining the donor as part of the family or a friend of the organization. In 66 instances (17%), the donor was metaphorically described as an Investor; while 8% (n=30) described the donor as Savior, the last chance or last hope for the nonprofit and the cause it is promoting. In all, 75% of the metaphoric descriptions of the relationship between donor and organization describe the relationship as one in which both parties have relatively equal power (the dominance dimension of relationship is low). Whether the letter uses a metaphor that elicits strong affiliation or not is fairly evenly divided, with 45% falling in the high affiliation categories.

The frequencies presented to this point represent composite counts of all the words on the word list for a particular metaphor cluster. By examining the frequencies of each word on the word list separately, it is possible to identify the most commonly used metaphors within each metaphor type (see Table 1). Friend (n=127) is a much more commonly used metaphor in fundraising letters than family (n=14). Partner (n=61) is the word most often used within the Partner metaphor category, while words with the root invest (n=64) account for nearly all of the occurrences of the Investor metaphor. The most common words used to describe the Savior metaphor are words that stem from the root sav (n=15).
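The percentages above can be recomputed from the four cluster counts. The sketch below does so using only the counts reported in the text; the variable names are assumptions. It also shows that the 45% high-affiliation figure matches the sum of the rounded Friend and Savior percentages (37% + 8%), whereas the unrounded share is closer to 44%.

```python
# Occurrence counts per metaphor cluster, as reported in the text.
counts = {"Partner": 148, "Friend": 141, "Investor": 66, "Savior": 30}
total = sum(counts.values())
print(total)  # 385

# Share of all metaphoric phrases per cluster, rounded to whole percentages.
shares = {name: round(100 * n / total) for name, n in counts.items()}
print(shares)  # {'Partner': 38, 'Friend': 37, 'Investor': 17, 'Savior': 8}

# Equal-power clusters (Partner, Friend) and high-affiliation clusters (Friend, Savior).
equal_power = counts["Partner"] + counts["Friend"]
high_affiliation = counts["Friend"] + counts["Savior"]
print(round(100 * equal_power / total, 1))       # 75.1
print(round(100 * high_affiliation / total, 1))  # 44.4
```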


Table 1. Frequency of metaphor use in fundraising corpus direct mail letters

Metaphor cluster                  Number of      Number of letters
  Metaphor search term            occurrences    with metaphor
----------------------------------------------------------------
Friend/Family                        141              98*
  Friend                             127              92
  Family                              14              12
Savior                                30              22*
  Save                                15               8
  Free (as in "to liberate")           2               2
  Magic                                3               3
  Dream                               10              10
Investor                              66              38
  Invest                              64              36
  Shareholder                          2               2
Partner                              148              81*
  Partner                             61              32
  Join us                             21              18
  Help us                             50              41
  Support                             16              13

* This value does not equal the sum of letters with each individual search term for this metaphor cluster because some letters used multiple search terms.
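The footnote to Table 1 follows from how the two columns are counted: occurrences are summed over search terms, whereas the number of letters for a cluster is the number of distinct letters containing at least one of its terms. A minimal Python sketch of that distinction is given below. It is not the procedure used in the study; the prefix-matching rule, the function name `count_cluster`, and the toy letters are all assumptions made for illustration, and a real analysis would also have to screen out non-metaphorical uses of each word.

```python
import re

# Search terms per cluster, taken from Table 1 (two clusters shown).
CLUSTERS = {
    "Friend": ["friend", "family"],
    "Partner": ["partner", "join us", "help us", "support"],
}


def count_cluster(letters, terms):
    """Return (total occurrences, number of letters containing any term)."""
    occurrences = 0
    letters_with_term = set()
    for index, text in enumerate(letters):
        for term in terms:
            # Assumed rule: match the term at a word boundary, allowing suffixes.
            hits = len(re.findall(r"\b" + re.escape(term), text.lower()))
            if hits:
                occurrences += hits
                letters_with_term.add(index)  # a letter is counted only once
    return occurrences, len(letters_with_term)


# Invented toy letters, not drawn from the corpus.
toy_letters = [
    "Dear friend, please join us and help us reach our goal.",
    "Your support makes you a partner in our work.",
    "Thank you for your time.",
]

partner_counts = count_cluster(toy_letters, CLUSTERS["Partner"])
friend_counts = count_cluster(toy_letters, CLUSTERS["Friend"])
print(partner_counts, friend_counts)
```

In the toy data each of the four Partner terms appears in exactly one letter, yet only two distinct letters contain a Partner term, which is the same effect the asterisked values in Table 1 describe.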

So far this analysis has looked only at total occurrences of each metaphor; it has not looked at how many different metaphors occur in each letter or how many of the 245 letters metaphorically describe the relationship the organization is seeking with the donor. Relationship metaphors were located in over two-thirds of the letters (n=169, 69%). The majority of these letters (n=108) use words from a single metaphor category; however, words from two of the categories are found in 53 letters and words from three categories are found in eight letters. None of the letters use words from all four metaphor categories.

All in all, these results confirm that organizations do define donor relationships metaphorically. Furthermore, they provide evidence that the answer to the first research question is that, to varying degrees, nonprofits use metaphors that represent each of the four types of relationships presented in the Dominance/Affiliation model.


Metaphor use by type of organization

The second research question this study seeks to answer relates to whether or not metaphor use varies by type of organization. In other words, do social service nonprofits frame donor-organization relationships differently than environmental organizations or educational nonprofits? Table 2 presents means and standard deviations for each of the four metaphor categories by organization type.

Comparing the average metaphor usage by different types of organizations yields some interesting findings. Social service organizations employ all four metaphors, with Partner and Investor being first and second in average frequency of use. In fact, social service nonprofits use the Investor metaphor more than any of the other types of organizations. Environmental organizations use the Savior metaphor most often and considerably more often than any other type of organization. Interestingly, environmental associations do not employ the Investor metaphor at all. Community development organizations overwhelmingly rely on the Partner metaphor, with the Savior and Investor metaphors not in evidence at all. Community development nonprofits use the Partner metaphor more than any other type of organization. Educational nonprofits, like social service organizations, use all four metaphors, with Friend occurring most frequently. In fact, educational organizations use the Friend metaphor more than any other type of nonprofit. Finally, cultural/arts associations also employ all four metaphors, with Partner being used most frequently.

Table 2. Means and standard deviations for metaphor use by type of organization

                                     Relationship metaphor cluster
Type of organization                 Friend   Savior   Partner   Investor
-------------------------------------------------------------------------
Social Services (n=78)
  Mean                                .41      .12      .68       .51
  Standard deviation                  .69      .32     1.06      1.71
Environmental (n=10)
  Mean                                .50     1.00      .80       .00
  Standard deviation                  .53     1.89     1.48       .00
Community Development (n=10)
  Mean                                .10      .00     1.30       .00
  Standard deviation                  .32      .00     1.49       .00
Education (n=108)
  Mean                                .81      .06      .42       .20
  Standard deviation                 1.32      .29     1.02       .54
Cultural/Arts (n=37)
  Mean                                .43      .14      .78       .11
  Standard deviation                  .60      .38     1.16       .32

An analysis of variance was used to determine whether these mean differences in metaphor use are statistically significant. The results of this analysis indicate that there are no statistically significant differences in the use of the Partner or Investor metaphors. In other words, no type of organization is more likely than any other to refer to potential donors as partners or investors. There are statistically significant differences, however, in the use of the Friend (F=2.3, df=5, p=.045) and Savior (F=7.7, df=5, p<.001) metaphors. While the analysis of variance indicates that there are significant effects in the use of these two metaphors by organization type, it does not show where those significant effects lie. Tukey's post hoc comparison of means indicates that educational nonprofits are more likely than any other type of organization to use Friend language, and environmental organizations are more likely than any other to use the Savior metaphor.
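For readers unfamiliar with the test, the F statistic in a one-way analysis of variance is the ratio of between-group to within-group mean squares. The Python sketch below shows that computation only; it does not reproduce the study's analysis. The function name `one_way_anova_f` and the toy counts are invented for illustration, and the sketch covers neither the p-value nor the Tukey post hoc comparison.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Within-group sum of squares: squared deviation from each group's mean.
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented per-letter metaphor counts for three hypothetical organization types.
toy_groups = [[0, 1, 0, 2], [1, 2, 3, 2], [0, 0, 1, 0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(round(f_stat, 2), df_b, df_w)
```

In practice a statistics package would normally be used instead; SciPy's `scipy.stats.f_oneway`, for example, returns both the F statistic and its p-value for the same kind of input.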

Metaphor use by locus of organization

The third research question this investigation attempts to answer is concerned with metaphor use by national versus local organizations. Table 3 presents means and standard deviations for each of the four metaphor categories by organizational locus.

Comparing the mean metaphor usage of national organizations, local organizations, and local or regional affiliates of national organizations reveals some noteworthy, if puzzling, findings. Interestingly, local or regional affiliates of national organizations, which use the Investor metaphor more than any other metaphor, use all of the metaphor clusters more frequently on average than either their local or national counterparts. Local nonprofits rely most heavily on the Friend metaphor, while national organizations most often describe donors metaphorically as Partners.

Table 3. Means and standard deviations for metaphor use by organizational locus

                                     Relationship metaphor cluster
Organizational locus                 Friend   Savior   Partner   Investor
-------------------------------------------------------------------------
Local Nonprofit (n=181)
  Mean                                .64      .08      .61       .14
  Standard deviation                 1.10      .30     1.12       .45
Local/Regional Affiliate of
National Nonprofit (n=17)
  Mean                                .65      .35     1.00      1.47
  Standard deviation                  .86     1.22     1.50      3.31
National Nonprofit (n=47)
  Mean                                .58      .12      .60       .27
  Standard deviation                 1.02      .50     1.11      1.05

Once again an analysis of variance was used to determine whether these differences are statistically significant. The results of this analysis indicate that national, local, and regional/local affiliates of national organizations do not use the Friend or Partner metaphors in significantly different ways. There are statistically significant differences, however, in the use of the Investor (F=13.83, df=2, p<.001) and Savior (F=5.01, df=2, p=.05) metaphors. While the analysis of variance indicates that there are significant effects in these two areas, it does not show where those significant effects lie. Tukey's post hoc comparison of means indicates that local or regional affiliates of national organizations are more likely than national or local organizations to describe the donor both as Savior and Investor.

Discussion

While the results section identifies numerous interesting, and even some puzzling, findings, this discussion section will focus on three main issues, namely: (1) the relationship metaphors used in fundraising, (2) patterns in the use of relationship metaphors in direct mail letters, and (3) practical, pedagogical, and methodological implications of this research.


Relationship metaphors in direct mail letters

The results of this research confirm the usefulness of the Dominance/Affiliation model for understanding the ways in which relationships are metaphorically described in direct mail fundraising letters. As noted in an earlier section of this paper, the power of metaphor, according to Siegelman (1990), is to redescribe reality. Indeed, each of the four metaphor categories (friend, investor, partner, savior) creates a decidedly different reality for the desired relationship between the organization and the would-be donor. Each elicits quite different understandings of the nature of the relationship between the donor and the requesting nonprofit, and of its expectations and responsibilities.

The findings of this study clearly reveal that the majority of fundraising letters attempt to establish a partnership or friendship relation with potential donors. Interestingly, both of these metaphor clusters are situated in the low-dominance half of the Dominance/Affiliation relational model, suggesting that dominance may not be as important as affiliation in cultivating desired donor relationships. The preference for affiliation makes intuitive, practical and theoretical sense, which can be seen through closer exploration of the implied connotative meanings of Partner and Friend. Partner, for example, invokes images of someone in a supportive role, encouraging the donor to participate in community improvements without necessarily placing them in a leadership position and endowing them with the responsibilities that would be associated with the leadership role. The Friend metaphor elicits greater affiliation (and possibly more baggage) than a partnership, but maintains the relational equality of the Partner metaphor. In doing so, the Friend metaphor invites the potential donor into a relationship that is as comfortable, non-threatening, and valuable as a friendship.
Not only does this finding that dominance is less important than affiliation in fundraising relationships make intuitive sense, it is also consistent with Grace's (1997) claim that "connection" and "concern," emotional and thoughtful linkages between donor and organization, are key components of building relationships in fundraising.

Although affiliation seems to be the salient dimension in creating donor relationships, the high-dominance metaphor clusters, Investor and Savior, were used in some of the fundraising letters. Perhaps the appeal and metaphoric impact of Investor and Savior is that they cast the donor in a position of power, in which the donor is given the ability and responsibility to make a difference in his/her community. One might expect that these metaphors would have to be used more cautiously, because there is more potential for a donor to reject the position of responsibility implied by the label. Yet, for certain organizations in certain circumstances, imbuing the donor with that kind of power is very motivating, suggesting that metaphor use may follow distinct patterns.

Patterns in the use of relationship metaphors in direct mail letters

The results of this corpus analysis also give rise to the conclusion that there are patterned differences in the ways in which relationships are metaphorically described in direct mail letters. The data suggest that nonprofits tend to use differing metaphoric relational frames in their fundraising efforts, depending on what type of organization they are. Most notable, perhaps, in this patterned usage of metaphor is the widespread use of the Savior metaphor by environmental organizations. Given the relative infrequency with which the Savior metaphor is used overall, it might seem surprising that this one type of organization would rely so heavily on Savior language in describing donor relationships. And yet, if one considers the connotative meaning of the Savior metaphor, one of urgency and the need for desperate measures in extreme situations, this pattern of metaphor use seems particularly appropriate to solicit support for environmental causes. After all, the prevailing discourse about the environment in the culture at large casts the environment as an entity facing enormous threats and warns of dire, long-term consequences for lack of action. Furthermore, the environment is typically portrayed as an entity that cannot help or speak for itself, necessitating a savior with the ability and power to protect it.

A second noteworthy pattern in metaphor use revealed in this study is the frequent invoking of the Friend metaphor by educational nonprofits. In their direct mail letters, as Table 2 illustrates, educational organizations use the Friend metaphor more than any other metaphor cluster, and they use it more often than any other type of organization. Why would organizations that tend to focus predominantly on logic and the intellect utilize the metaphor linked most closely with emotion and affect?
One answer to this question might be that educational organizations typically target fundraising campaigns to alumni, to individuals who really do have an affiliative connection with their institution. Another possible explanation is that many educational institutions, especially smaller colleges, enact a college-as-extended-family model, where students can expect the kind of care and consideration usually experienced in families. A final possible explanation is the tendency of American culture to de-emphasize intellectualism and privilege family values. Within this cultural context, the use of the Friend/Family metaphor by educational organizations becomes a logical and persuasive choice.

In addition to patterns of metaphor use related to type of organization, this study's results also illustrate that national, affiliate and local organizations use metaphors in patterned ways. Local nonprofits, according to these findings, tend to rely most heavily on the Friend metaphor, which seems logical, given the familiarity and close-to-home reality that language connotes. Also, that national organizations would employ the Partner metaphor most frequently seems reasonable, because it maintains the equality of friendship but acknowledges the geographical distance, and, hence, the lessened affiliative connection, between donor and organization. The results for local and regional affiliates of national organizations, however, which suggest that these organizations rely most heavily on the Investor metaphor, are a bit more puzzling and seemingly counter-intuitive. Perhaps the results are skewed because the sample size for this organizational locus is relatively small. Before definitive conclusions can be drawn, additional research would be needed.

A final pattern of metaphor use by organizational locus that is worth mentioning is the relative similarity in how local and national organizations use relationship metaphors in fundraising. As evidenced in Table 3, the mean usage of each metaphor cluster is nearly identical for national and local nonprofits. It seems likely that smaller, local organizations would model their fundraising efforts after the successful strategies of larger, national organizations.
These results are particularly interesting within the context of institutional theory, which argues that in order to survive, organizations must establish legitimacy by adhering to institutional standards for how things are to be done. Institutional theorists observe that one way in which newly forming, localized organizations establish legitimacy is by fashioning themselves after visible, well-established organizations (e.g., Huber and Daft 1987). This research offers intriguing linguistic corroboration of this process.

Practical, pedagogical, and methodological implications of this research

Both the findings related to the use of relationship metaphors in direct mail letters and the discernible patterns of their use have implications for fundraising practitioners as well as teachers of fundraising and English for Specific Purposes. The results of this research suggest that metaphoric framing does, indeed, matter and could be used strategically and intentionally to foster particular relational realities. By theorizing the types of relationships that are evoked in direct mail letters and by identifying the metaphoric frames through which those relationships are communicated, this research enables fundraising practitioners to tailor their fundraising messages even more systematically and deliberately to cultivate the desired donor-organization relationship.

Fundraising practitioners and teachers of fundraising might make particular note of the fact that all four metaphor clusters extend a form of power to the potential donor. The high-dominance metaphors (Investor and Savior) offer power by implying that the donor has the ability and, indeed, the responsibility to bring about change. On the other hand, the affiliative metaphors bestow upon the reader the power that comes from emotional connectedness and belonging. The power implicit in the metaphor clusters can, of course, be both positive and negative. For example, while some donors may respond favorably to the Savior metaphor because it gives them power, others may feel disenfranchised by the label of Savior because they may perceive the role as a burden. Hence, fundraising practitioners and teachers might be well advised to make the transference of power through metaphor a conscious and carefully deliberate choice. In some ways, through careful metaphor choice, the fundraiser can linguistically harness their power to bestow power within the organization-donor relationship.

A second useful consideration for teachers of English for Specific Purposes and fundraising practitioners is to recognize the cultural situatedness of relational metaphors.
Even though metaphors such as Friend or Investor may seem universal and transcultural, the connotative meanings of metaphors are always mediated by cultural context. For example, friend has a substantially different meaning in American culture than in other cultures, such as that of Germany. In America, the word friend tends to be used more loosely to describe relationships with people with whom one is friendly. However, within the German cultural context, the word friend is only one word available for describing the many gradations of intimacy and implies a much closer, personal bond than it does in American culture. Consequently, to use the word friend within a German cultural context in an impersonal, direct mail letter, when the relationship has not progressed through the typical stages of interpersonal connection, could be a donor turn-off indeed. In light of culturally specific meanings invoked by metaphors, it stands to reason that any fundraising effort should choose linguistic strategies that are in keeping with careful consideration of the donor's culture.

Finally, this research has implications for future study of fundraising discourse, especially for using corpus research methods in that endeavor. The study shows that computer software can be reliably used to search for metaphors in fundraising texts, making it possible to examine much larger samples than would be possible with hand coding. In addition, this research illustrates the value of synthesizing linguistic corpus research methods with theories drawn from outside of the linguistics discipline, in this case from communication theories. Such research provides multiple new opportunities for fruitful interdisciplinary collaboration.

Notes

1. According to Lakoff, metaphors are more than simply superficial or decorative devices; rather, metaphors play an important cognitive role in human conception. Metaphors facilitate thought by providing an experiential framework that can be used to make sense of new and abstract thoughts.

2. QSR Nud*ist, which stands for "Non-numerical Unstructured Data Indexing, Searching and Theorizing," is a software package designed to aid users in the qualitative analysis of non-numerical data. The software package can be used to analyze textual data, such as transcripts, letters, or literary documents, as well as non-textual data, such as musical scores, photographs, or maps. QSR Nud*ist can be used to search and index texts; however, it also can assist a researcher in theorizing about indexed data.

References

Barton, E. L. 2001. “Textual practices of erasure: Representations of disability and the founding of the United Way.” In Embodied Rhetorics: Disability in Language and Culture, J. C. Wilson and C. Lewiecki-Wilson (eds), 169–199. Carbondale, IL: Southern Illinois Press.
Dillard, J. P., Solomon, D. H., and Samp, J. A. 1996. “Framing social reality: The relevance of relational judgments.” Communication Research 23:703–723.
Edles, L. P. 1993. Fundraising: Hands-On Tactics for Nonprofit Groups. New York: McGraw Hill.
Goering, E. M. 2001. “From stranger to friend? business partner? assistant?: An analysis of fundraising discourse as relational communication.” Paper presented to the Organizational and Professional Communication Interest Group at the 2001 convention of the Central States Communication Association, Cincinnati, OH. April 2002.
Grace, K. S. 1997. Beyond Fund Raising: New Strategies for Nonprofit Innovation and Investment. New York: John Wiley & Sons.
Huber, G. P. and Daft, R. L. 1987. “The information environment of organizations.” In Handbook of Organizational Communication: An Interdisciplinary Perspective, F. M. Jablin, L. L. Putnam, K. H. Roberts, and L. W. Porter (eds), 130–164. Newbury Park, CA: Sage.
Keegan, P. B. 1990. Fundraising for Non-Profits. New York: Harper Perennial.
Lakoff, G. and Turner, M. 1989. More than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.
Leary, T. 1957. Interpersonal Diagnosis of Personality. New York: Ronald Press.
McCagg, P. 1998. “Conceptual metaphor and the discourse of philanthropy.” New Directions for Philanthropic Fundraising 22:37–47.
Nisbet, R. 1969. Social Change and History: Aspects of the Western Theory of Development. London: Oxford University Press.
Siegelman, E. Y. 1990. Metaphor and Meaning in Psychotherapy. New York: Guilford Press.
Turner, R. C. 1991. “Metaphors fund raisers live by: Language and reality in fund raising.” In Taking Fund Raising Seriously: Advancing the Profession and Practice of Raising Money, D. Burlingame and L. Hulse (eds), 37–50. San Francisco: Jossey-Bass.
Warwick, M. 2000. The Five Strategies for Fundraising Success: A Mission-Based Guide to Achieving your Goals. San Francisco: Jossey-Bass.


Pronouns and metadiscourse as interpersonal rhetorical devices in fundraising letters: A corpus linguistic analysis

Avon Crismore
Indiana University-Purdue University Fort Wayne (IPFW)

Introduction

Nonprofit organizations depend on fundraising letters for their operating expenses or for funding to accomplish their capital goals. Yet these fundraising texts are unexplored territory; they are very poorly understood (Connor and Upton 2003). To begin addressing this problem the IU Center on Philanthropy has been giving some attention to the study of fundraising language and the rhetoric of fundraising relevant to the writers of philanthropy discourse and to the donors, the audience. The latest evidence of the IU Center on Philanthropy's mission to connect practice and philanthropy research is the research now being done by the Indiana Center for Intercultural Communication (ICIC) with its focus on the systematic analysis of fundraising language using a corpus linguistics approach that results in practical advice about persuasive writing. In addition, a new semi-annual magazine, Philanthropy Matters, is published by the IU Center on Philanthropy to provide nonprofit professionals with practical information they can use based on the latest research, such as the research projects reported in the preceding chapters (Tempel 2002). This cooperation between academic researchers and practitioners is vitally important for infusing new ideas and techniques into the fields of fundraising, applied linguistics, and rhetoric.

The general and specific research questions that guided my research are as follows: (1) Are interpersonal pronouns used the same or differently in nonprofit fields such as Health and Human Services (HHS) and educational institutions? (2) Are metadiscourse types used the same or differently in the HHS and educational fields? (3) Are interpersonal pronouns and metadiscourse characteristics related to the various language footings/stances and dimensions identified by pragmatic linguists such as Dillon (1986) and Biber (1988; 1995)?

The non-profit fundraising corpus at ICIC

The ongoing project of building a non-profit Fundraising Corpus at ICIC was started in 1999, with funding from the Indiana Center on Philanthropy. The purpose of the corpus of fundraising texts is to study how language is used for rhetorical purposes in various fundraising written genres: direct mail letters, grant proposals, annual reports, newsletters, invitations, and case statements. Building and using a corpus like this one is important for corpus linguistics for several reasons. According to Clotfelter and Ehrlich (2001:6), “The non-profit sector in America consists of some 1.5 million tax-exempt organizations, ranging in size from storefront human services agencies and one-room churches to giant universities and medical centers. This sector accounts for about 7 percent of national income and employs some 10 million workers and uses the services of some 90 million volunteers. The private sector contributes heavily to non-profit organizations such as universities.” In a recent study compiled by the Council for Aid to Education, Indiana University ranked among the top 2 percent of colleges and universities in contributions from the private sector (Williams 2000). Thus, universities benefit from the study of effective fundraising texts written by writers from educational organizations. In addition, the Fundraising Corpus can be used by researchers for international studies of fundraising texts.

The ICIC corpus is a comparatively small one, consisting of about 900 fundraising documents from 236 organizations, totaling about 2 million words. The direct mail fundraising letter component is important because direct mail letters have been essential to fundraising and still are. Thus, in recent years, fundraising letters have been studied by linguists, rhetoricians, and communication researchers investigating a variety of topics (e.g. Abelen, Redeker and Thompson 1993; Mann and Thompson 1992; Van Nus 1996).
At the time of this study, the direct mail letter component of the fundraising corpus consisted of 245 letters from 75 organizations and five separate fields in the nonprofit sector: health/human services, environmental, community development, education, and art and culture. For my study, I used a portion of the total direct mail letters from the health/human services and education fields for a comparative analysis of first and second person use, as well as metadiscourse use by the writers of these letters.

Personnel from all agencies from the five nonprofit fields were given questionnaires or interviewed by graduate assistants at ICIC who were trained to gather direct mail letters and information about the agencies and about the demographics of their donor groups, as well as background information on the use and effectiveness of the letters. Next, each letter was scanned into a computer and double-checked for accuracy. Then each letter was coded to indicate the nonprofit field, the organization, and organization size based on income. The letters used in my study were the ones that had been received at the time I started my study in 2000. Of the five fields represented in the fundraising corpus, health/human services and education had contributed the most letters to ICIC.

The Wordsmith concordance program was used to analyze these letters. Each letter was tagged for metadiscourse types and for first and second person pronouns. The quantitative analysis by the Wordsmith program resulted in data for frequency counts, patterns of words and word clusters, and the immediate context for both metadiscourse and pronouns. Raw frequency counts were adjusted by the process of normalization based on 1,000 words. I decided to use a corpus linguistic analysis because it provides empirical, quantitative data for a comparative study of the metadiscourse and first/second pronouns found in fundraising letters from two different nonprofit fields.
The desired words and recurring patterns in a corpus can easily, quickly, and accurately be identified for the purposes of a comparative study. Virtanen (1998) has pointed out that corpus analyses are useful for pilot studies, and I consider this study a pilot study. She also noted that corpus analysis methods can be used in combination with other methods of analyses for a more complete understanding and a more complex interpretation of the data. My future plans are to do a qualitative contextual study of a subset of these letters, looking at the location of the metadiscourse and first/second pronoun use as well as their use by males and females in each of the two nonprofit fields. I used a rhetorical, linguistic approach for my investigation. My purpose was to investigate how both the rhetorical and the linguistic aspects of fundraising letters contribute to the persuasive appeal of the letters’ content.
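Normalizing raw counts to a rate per 1,000 words, as described in this section, makes frequencies comparable across letters of different lengths. The Python sketch below illustrates the calculation; the function name `per_thousand` and the example figures are hypothetical and are not taken from the corpus.

```python
def per_thousand(raw_count, total_words):
    """Normalize a raw frequency count to a rate per 1,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 1000 * raw_count / total_words


# Hypothetical figures: a short letter and a longer letter.
short_letter_rate = per_thousand(12, 480)   # 12 tagged items in 480 words
long_letter_rate = per_thousand(15, 1000)   # 15 tagged items in 1,000 words

print(short_letter_rate, long_letter_rate)
```

Although the longer letter contains more tagged items in absolute terms, the shorter letter has the higher normalized rate, which is the kind of comparison normalization is meant to support.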


According to the agencies, the various purposes for the letters were to give information about the agency, to find new donors, to increase the amount of previous donations, to introduce a new program, or to extend an invitation to an event. Some letters sought to raise money for an annual campaign; others sought money for a one-time-only special program. My rhetorical/linguistic approach involved studying the rhetorical strategies used by letter writers to persuade readers, focusing on the words, sentences, and patterns used interpersonally with metadiscourse and first/second pronouns. Rhetorical analysis and linguistic analysis can work together to help build a model that explains how writers of fundraising letters persuade (or fail to persuade) readers. Both kinds of analysis help us to better understand how these letters work and how we could teach fundraising letter writers and other writers to be more persuasive. As Virtanen (1998) has argued, a corpus of philanthropic texts used for the study of rhetorical and linguistic features in written texts can help us understand reader/writer visibility, reader/writer positions, the various roles and relationships found in fundraising texts and the genre conventions for content organization and lines of argument, as well as for types of appeals and rhetorical devices used. Virtanen believes that with a dynamic rather than a static view of texts, and a clear understanding of the connections between discourse, pragmatic, and rhetorical aspects of corpus data, we can give practical advice to writers of fundraising letters and to other writers of persuasive texts.

Definitions for interpersonal pronouns and metadiscourse

Interpersonal pronouns

A simple definition for interpersonal pronouns is those pronouns we refer to as first and second person pronouns, both singular and plural: I, me, my, myself, mine; we, us, our, ourselves; you, your, yours, yourself. These first and second person pronouns mark a high degree of interpersonal interaction and personal involvement (Biber 1988; Dillon 1986; Smith 1986) and are considered devices that stress solidarity with readers and also politeness strategies (Myers 1989). A first person plural pronoun can be a rhetorical device used to stand for the author as a representative of a research group or other group, or it can be a rhetorical device used by an author to express personality and subjectivity (Bernhardt 1985; Gläser 1975; Hyland 2001a), i.e. Booth’s (1961) dramatized author. Use of first person pronouns, according to rhetoricians and researchers, results in a direct address approach that is reader-centered, setting up a more interactive writer-reader relationship.

Use of second person pronouns is defined by Cook (1992:157) as a “high-involvement strategy which attempts to win us over by very direct address.” The “you” can refer to a person in the text, as well as to a particular reader or to a general group of readers of the text. But, even though using “you” is often praised as a way to engage or involve the readers in a text, Dillon (1986) points out that the second person pronoun “you” may also be a device that the reader perceives as controlling or manipulative. The writer’s choice of first or second person pronouns in a text clearly indicates the presence of that writer in the text. This choice greatly influences the relationship between the writer and readers and the rhetorical effects on the readers, whether positive or negative. As Strapulos (2002) points out, the development of a personal relationship is particularly vital in successful written fundraising efforts. The reason is that in fundraising letters the more the reader is made to feel a part of the nonprofit agency’s cause and the closer the relationship is between writer and reader, the more effective each request for funds will be.
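The pronoun list given in this definition lends itself to a simple automated count. The Python sketch below is one possible illustration and is not the Wordsmith-based procedure used in the study. The tokenization rule, the function name `count_interpersonal_pronouns`, and the sample sentence are assumptions; the pronoun sets are copied from the list above, and the sketch makes no attempt to handle contractions such as "I'm" or non-pronoun uses such as "mine" as a noun.

```python
import re
from collections import Counter

# Pronoun sets copied from the definition in the text.
FIRST_PERSON = {"i", "me", "my", "myself", "mine", "we", "us", "our", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


def count_interpersonal_pronouns(text):
    """Count first and second person pronouns and the total number of tokens."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in FIRST_PERSON:
            counts["first"] += 1
        elif token in SECOND_PERSON:
            counts["second"] += 1
    counts["tokens"] = len(tokens)
    return counts


# Invented sample sentence, not drawn from the corpus.
sample = "We need your help. With you beside us, our work can continue."
result = count_interpersonal_pronouns(sample)
print(result["first"], result["second"], result["tokens"])
```

Dividing the pronoun counts by the token total and scaling to 1,000 words would give rates comparable across letters of different lengths.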

Metadiscourse

Metadiscourse, like first and second person pronouns, is an interpersonal device that fulfills the interpersonal function of language (Dillon 1986; Halliday 1973). Vande Kopple (1985:83) defines metadiscourse as words, phrases, or clauses that “do not add propositional material but help our readers organize, classify, interpret, evaluate, and react to such material. Metadiscourse, therefore, is discourse about discourse.” Metadiscourse signals for the reader a way to understand both the writer and the text. Researchers have proposed and debated a variety of taxonomies of metadiscourse (Crismore, Markkanen, and Steffensen 1993; Luuka 1996; Mauranen 1993; Vande Kopple 1985; Williams 2003). Researchers also have noted that metadiscourse expressions serve both textual and interpersonal functions, and so have categorized metadiscourse into two major types, textual and interpersonal. They agree that the textual subcategories can also have an interpersonal function; in other words, subcategories of textual metadiscourse can be multifunctional (Barton 1995; Crismore, Markkanen and Steffensen 1993).

According to Vande Kopple’s (1985) classification system for metadiscourse, the textual subcategories are (1) text connectives (e.g. first, next, however, but); (2) code glosses (e.g. x means y, in other words, namely); and (3) action markers (e.g. to sum up, to give an example). In Vande Kopple’s system, the interpersonal subcategories are (1) truth markers, i.e. hedges (e.g. might, perhaps, if) and emphatics (e.g. clearly, obviously, it is time that); (2) attributors (e.g. according to x, Einstein found that); (3) attitude markers (e.g. surprisingly, it is fortunate that, it is important to note that); and (4) commentaries (e.g. you may not agree that, dear reader, we urge you to read about x). Crismore, Markkanen, and Steffensen (1993) added visual metadiscourse to these verbal metadiscourse expressions, that is, visual devices such as punctuation, white space, icons, and document design features (e.g. italicizing, bolding, capitalization, underlining, bullets and font size), since they also signal the reader how to understand the writer and text.

Thus, in a text there is a distinction between conveying content and interacting with the reader by using first and second person pronouns and/or by using metadiscourse. Halliday and Hasan (1976) note that when the writer seeks to engage the reader as a human agent and an interlocutor, this move toward engagement of the reader brings about the intrusion of the writer into the text and is the mark of the interpersonal function of language. They explain that first and second person pronouns and both textual and interpersonal metadiscourse categories illustrate the interpersonal function of language:

The interpersonal component is concerned with the social, expressive and conative function of language, with expressing the speaker’s/writer’s “angle”: his attitudes and judgements, his encoding of the role relationships into the situation, and his motive in saying anything at all.
We can summarize these by saying the ideational component represents the speaker/writer in his role as observer, while the interpersonal component represents the “speaker/writer” as intruder. (Halliday and Hasan 1976:26)

Halliday and Hasan emphasize the speaker’s activity in establishing an interpersonal dimension, but as Dillon argues (1986:4), “the hearer/reader is implicated as well in the case of questions, imperatives, and sometimes even direct address with the pronoun you. And, there is little persuasive writing without the [interpersonal component] and action, for the relation of writing and reading may be dramatized as a communicative event. It is human subjects who persuade and are persuaded.” Dillon (1986:13) continues by stating that verbal interaction, especially in written texts, “is fundamentally and irreducibly rhetorical.” When we study writing and reading as interaction, the task is not just a linguistic analysis of the use of words, for culture and social reality define and are defined by writing. Of course, as Dillon (1986:11) states, “...it certainly is linguistic, since it deals with the meanings and workings of questions, adverbs, pronouns, and a host of other grammatical and rhetorical features.” But studying interactions in written and spoken texts is also about rhetoric. Both print and visual metadiscourse and pronouns have interpersonal functions that have persuasive, rhetorical effects on readers. Rhetoric is as important as linguistics here, and it is achieved through interpersonal functions evidenced by linguistic features such as metadiscourse and interpersonal pronouns. As Dillon (1986:4) notes, “With persuasion the interpersonal function comes to life.” To study the interpersonal function of writing, specifically interpersonal pronouns as well as textual, interpersonal, and visual metadiscourse, it makes sense to begin with texts that are both persuasive and personal, such as the fundraising letters from health and human service agencies and educational institutions.

Relationship of interpersonal pronouns and metadiscourse to linguistic dimensions and interactional footings

Linguists Biber et al. (1998) and Dillon (1986) have identified linguistic/textual dimensions that are useful when analyzing the relation of the writer and reader. Biber et al. (1998:55) define dimensions as “bundles of linguistic features that co-occur in texts because they work together to mark some common underlying function.” Biber’s (1988) five dimensions are similar in some ways to Dillon’s (1986) footings/stances, but are also different. Dimensions of English involve co-occurrence patterns that underlie Biber’s five dimensions: (1) involved versus informational production; (2) narrative versus non-narrative discourse; (3) elaborated versus situation-dependent references; (4) overt expression of argumentation; and (5) impersonal versus non-impersonal style. Using both linguistic and rhetorical approaches to writer-reader interactions, Dillon explains that there seem to be at least five distinct scales of interactional footings/stances. He states, “Within the framework of current rhetorical theory, a footing would probably be treated as a part of tone, where tone is defined as the writer’s attitude toward the reader (as distinguished from the reader’s stance toward the material) or the writer’s attitude toward the content material or the writer’s self” (Dillon 1986:18). Dillon also refers to footings as textual strategies and notes that they involve allusions to cultural codes. His five distinct scales of stances or footings toward the reader are as follows:

1. Impersonal/personal
2. Distant/solidary (meets positive face wants)
3. Superior, authoritative/equal, limited (respects negative face wants)
4. Direct, confrontative/oblique (respects positive face wants)
5. Formal/informal

These footing/stance dimensions are not wholly independent variables because complex combinations are possible. Both Biber’s and Dillon’s linguistic, textual, and rhetorical dimensions are relevant to the study of interpersonal pronoun and metadiscourse use in fundraising letters. Investigating interpersonal pronouns and metadiscourse is important not only for infusing new ideas and practical applications in the field of fundraising but also for expanding the research that has already been done on the use of pronouns and on the textual and interpersonal metadiscourse categories and subcategories. A review of the interpersonal pronoun and metadiscourse literature indicates that the research done so far has focused on academic, scientific and medical texts, technical communication and business texts, textbooks for student non-native speakers of English, and comparisons of texts written for professional or popular audiences. As far as can be determined, no studies have investigated interpersonal pronouns and metadiscourse in philanthropic texts. The research that has been done, though, is quite informative and useful for the current investigation of fundraising letters.

Valle (1996) investigated first person plural pronoun use in popular and specialized scientific texts written by Stephen Jay Gould, and found that we referred to different groups of people, such as general readers, natural historians, paleobiologists, and his co-author David Woodruff. The we that Gould used ranged from more distant references to different discourse communities, as seen in his scientific articles, to more intimate relationships such as with his collaborator Woodruff, as seen in his popular essays and books. My colleague Rodney Farnsworth and I studied Gould’s use of metadiscourse in popular and scientific texts and found that he used both we and I to present his abundant metadiscourse in both types of texts (Crismore and Farnsworth 1990).
Textual metadiscourse, specifically contrastive and noncontrastive connectives in academic argumentation, was studied by Barton (1995), who found evidence that textual metadiscourse can be multifunctional. Connective expressions simultaneously served both textual and interpersonal functions. Mayer (1987) studied the instructional variables that influence readers’ cognitive processes, such as comprehension and short- and long-term retention of content during the reading of science texts, and that influence the transfer of the cognitive processes to similar (near transfer) and not-so-similar (far transfer) reading situations. Textual metadiscourse (called signaling in his study) was one of the variables he studied. He found that signaling the previews, reviews, logical connectors, code markers and action markers helped readers see text organization better and helped them build coherence. With signaling, readers recalled more content and did well at far transfer of the cognitive strategies and signaling (textual metadiscourse) techniques. Textual metadiscourse thus helped increase comprehension and meaningful learning for the readers in his study.

Although there are some studies of textual metadiscourse, such as Mayer’s (1987) and Mauranen’s (1993), most of the relevant published studies involve interpersonal pronouns and metadiscourse and involve scientific, academic, or professional writing. Many of the discourse analysis studies focused on first and second person pronouns and one or more subcategories of interpersonal metadiscourse such as hedges (often modal verbs), emphatics, or attitude markers. A few studies investigated visual metadiscourse as a part of document or page design (e.g. Kumpf 2000). Examples of studies of interpersonal linguistic expressions and rhetorical devices include Bernhardt (1985); Crismore (1989); Crismore and Farnsworth (1989); Crismore, Markkanen and Steffensen (1993); Crismore and Vande Kopple (1990; 1997); Flowerdew (1997); Gläser (1975); Hagge and Kostelnick (1989); Hyland (2001a; 2001b); Longo (1994); and Myers (1989).
Most of these studies were designed to investigate the relationship of interpersonal aspects of language, such as first and second person pronouns and metadiscourse, to politeness strategies; writer and reader interaction; writers’ subjectivity, attitudes, tone, and stance; reader-friendly, considerate texts; cultural and gender differences in the use of these interpersonal aspects; and rhetorical functions and effects. Writers’ purposes, audience characteristics, personal style, native language, culture and disciplinary genre conventions all play a part in the type, amount, and location of the pronouns and metadiscourse. Using a linguistic corpora approach, Biber and his colleagues (1988; 1989; 1995) also analyzed a variety of texts focusing on the interpersonal aspects of language, pronouns and metadiscourse in order to identify and raise awareness of the various dimensions of text. The goals of this study are to use a corpus linguistic approach to continue and extend the previous studies of interpersonal pronouns as well as metadiscourse, especially those used in nonacademic texts such as the direct mail fundraising letters in the ICIC corpus.

Methods e two types of nonprofit agencies chosen for this comparative text analysis of interpersonal pronouns and metadiscourse use were Education and Health and Human Services (HHS). At the time of analysis in 2000, more total letters were available from these two fields than from the other three fields of Arts and Culture, Environment, and Community Development. Education is represented by 30 organizations (e.g. Indiana University Purdue University Indianapolis, Cathedral High School, and Malone College), and HHS by 49 (e.g. Arthritis Foundation, American Cancer Society, YWCA/YMCA). e complete list for both agencies shows that about two-thirds were national or international agencies with local offices in Indianapolis. Table 1 shows that 108 letters were received from Education and 24 letters from HHS. For Education, the letters had a total of 38,688 words averaging about 350 words per letter, about 20 sentences per letter, and about 18 words per sentence. HHS letters had a total of 26,402 words, with about 350 words per sentence, about 20 sentences per letter, and 17 words per sentence. e averages for the words per letter, sentences per letter, and words per sentence are almost the same for Education and for HHS. Table 1. Letter raw frequency counts used for normalization (based on 1,000 Words) frequency Items

Education

HHS

Direct mail letters received at time of comparative study Total words in all letters Average number of words per letter Total sentences in all letters Average number of sentences per letter Average number of words per sentence

109 38,688 355 2,186 20 18

75 26,402 352 1,512 20 18

When researchers use a corpus to examine the frequency of features across texts, it is important that the counts are comparable. So, the raw frequency counts from letters of different lengths or different word totals were adjusted by the process of normalization. The Education and HHS letters were normed to a basis of 1,000 words (Biber et al. 1995). The quantitative part of analyzing these letters was accomplished using the WordSmith software to obtain the frequencies of interpersonal pronoun and metadiscourse items. The comparison consisted of normed frequency counts of first and second person singular and plural pronouns and of the textual metadiscourse subcategories of logical connectors, action markers and miscellaneous items. The interpersonal metadiscourse subcategories had normed frequency counts for modal verbs, non-modal hedges, emphatics, attitudinal markers, and punctuation. According to Biber (1988), in corpus studies, researchers use quantitative techniques to identify the groups of features that co-occur in texts and afterwards interpret these features in functional terms. My corpus analyses will attempt to answer the research questions stated earlier, thus complementing Biber’s textual dimension findings and Dillon’s footing/stances, as well as noting the relationship of interpersonal pronouns and metadiscourse to effective persuasive writing.
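The norming step described above amounts to dividing a raw count by the total number of words and multiplying by the chosen basis. The sketch below illustrates this under stated assumptions: the function name `normed_frequency` is hypothetical, and the raw counts are invented for illustration; only the two corpus sizes are taken from Table 1.

```python
def normed_frequency(raw_count: int, total_words: int, basis: int = 1000) -> float:
    """Return the number of occurrences per `basis` words of running text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return raw_count / total_words * basis


# Corpus sizes as reported in Table 1.
CORPUS_WORDS = {"Education": 38_688, "HHS": 26_402}

# Hypothetical raw counts of one item, invented for illustration only.
raw_counts = {"Education": 250, "HHS": 180}

for field, words in CORPUS_WORDS.items():
    rate = normed_frequency(raw_counts[field], words)
    print(f"{field}: {rate:.2f} per 1,000 words")
```

Norming in this way lets a count from the larger Education collection be compared directly with a count from the smaller HHS collection, which raw totals alone would not allow.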

Results for normed frequency counts

The comparative results in the following sections are based on a descriptive analysis of frequency counts rather than on an inferential statistical analysis such as t-tests. The findings, although not tested for statistical significance, are nonetheless useful and important for pedagogical and practical reasons.

Results for interpersonal pronouns

One research question guiding the analysis of this comparative study was “Are there differences and similarities in the use of first and second person pronouns in Education and HHS letters?” Table 2 below shows that the writers of Education and HHS letters used very similar amounts of first and second person pronouns, although there were some minor differences. HHS writers used somewhat more first person singular pronouns (I, me, my/mine) than the Education writers, while first person plural pronouns (we, us, our, ours) were used at nearly identical rates. Education writers used more second person pronouns (you, your/yours) than HHS writers. Both Education and HHS writers used about the same amount of first person plural pronouns per 1,000 words (Educ = 22.95, HHS = 22.84) as they did second person pronouns (Educ = 24.43, HHS = 22.99). Both groups of writers used fewer first person singular pronouns (Educ = 8.77, HHS = 9.54) than they used either first person plural or second person pronouns. When frequency counts for all interpersonal pronouns are combined, it is interesting that Education and HHS have very similar normed frequency counts (Educ = 56.15, HHS = 55.37). The ordering of the eight most frequently used pronouns (you, your, our, we, I, us, my, and me) was exactly the same for both groups of writers, confirming the similarity for writers in both Education and HHS in this regard. Apparently writers in both fields realized the need to use first person singular at times for credibility, personal style, and interest but also realized the need to use first person plural and second person pronouns more frequently for reader inclusiveness, engagement, and interaction. The results show that frequency counts for first person singular, first person plural, and second person pronouns fit Dillon’s scales for the personal and solidary stances/footings and Biber’s dimensions for involved and non-impersonal styles. The results indicate that both Education and HHS writers are very aware of personal and interpersonal pronouns as effective rhetorical devices for emotional appeals and for credibility appeals for persuasion.

Table 2. Pronoun frequency counts for Education and Health and Human Services (frequency per 1,000 words)

Pronoun types                         Education letters       HHS letters
                                      (38,688 total words)    (24,402 total words)
I. 1st person singular
   I                                  6.44                    6.78
   me                                 .93                     1.17
   my/mine                            1.40                    1.59
   Total                              8.77                    9.54
   Average                            2.92                    3.18
II. 1st person plural
   we                                 9.10                    8.90
   our/ours                           10.98                   9.36
   us                                 2.87                    4.58
   Total                              22.95                   22.84
   Average                            7.65                    7.61
III. 2nd person
   you                                13.03                   12.84
   your/yours                         11.40                   10.15
   Total                              24.43                   22.99
   Average                            12.22                   11.50
IV. Combined
   Total 1st and 2nd person pronouns  56.15                   55.37
   Combined average                   18.72                   18.46

Table 3. Textual metadiscourse frequency counts for Education and HHS (frequency per 1,000 words)

Textual metadiscourse subcategories   Education               HHS
                                      (38,688 total words)    (24,402 total words)
I. Logical connectors
Additives
   also                               1.68                    .76
   as well as                         .49                     .42
   another                            .44                     .30
   in addition                        .36                     .27
   too                                .21                     .38
   moreover                           .10                     .00
   Total                              3.28                    2.13
Contrastives
   but                                1.42                    1.82
   however                            .62                     .23
   still                              .52                     .49
   yet                                .23                     .34
   in contrast                        .03                     .00
   Total                              2.82                    2.88
Comparatives
   like                               1.65                    3.11
   similar                            .05                     .08
   Total                              1.70                    3.19
Consequences
   so                                 1.27                    1.63
   therefore                          .13                     .15
   thus                               .10                     .15
   consequently                       .08                     .11
   Total                              1.58                    2.04
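The subtotals and combined totals reported for the pronoun counts follow from simple addition of the per-pronoun rates. The sketch below re-derives them; the per-pronoun values are copied from Table 2, while the dictionary layout and the function name `summarize` are assumptions made only for illustration.

```python
# Normed frequencies (per 1,000 words) copied from Table 2.
TABLE_2 = {
    "Education": {
        "first_singular": [6.44, 0.93, 1.40],   # I, me, my/mine
        "first_plural": [9.10, 10.98, 2.87],    # we, our/ours, us
        "second": [13.03, 11.40],               # you, your/yours
    },
    "HHS": {
        "first_singular": [6.78, 1.17, 1.59],
        "first_plural": [8.90, 9.36, 4.58],
        "second": [12.84, 10.15],
    },
}


def summarize(groups):
    """Return per-category totals and the combined total, rounded to 2 places."""
    totals = {name: round(sum(values), 2) for name, values in groups.items()}
    combined = round(sum(totals.values()), 2)
    return totals, combined


for field, groups in TABLE_2.items():
    totals, combined = summarize(groups)
    print(field, totals, "combined:", combined)
```

Recomputing totals in this way is a quick consistency check on a hand-assembled frequency table before the figures are interpreted.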

Results for textual metadiscourse

Again, as we can see in Table 3, if all subcategories of textual metadiscourse are combined, the normed frequency counts per 1,000 words of this type of metadiscourse are almost identical for Education (12.41) and for HHS (12.12). The five most frequently used textual metadiscourse items for both Education and HHS were also the same (and, also, like, but, and so). If we look at the specific logical connector subcategories, however, there are some differences. Education writers used more additive logical connectors than HHS writers (3.28 vs. 2.13), more sequencers (2.10 vs. 1.25), and also more action markers (.70 vs. .42). But HHS writers used more comparative connectors than Education writers (3.19 vs. 1.70) and also more consequence connectors (2.04 vs. 1.58). Both Education and HHS writers used about the same amount of contrastive connectors (Education = 2.82 and HHS = 2.88) as well as miscellaneous textual metadiscourse items such as in fact, in order to, in short (Education = .23 and HHS = .26). But overall, both used about the same amount of textual metadiscourse. Education writers and HHS writers appear to see the need to signal the relationships between and among ideas, a signaling which is necessary for logical persuasive appeals.

Although Biber’s discussion of linguistic dimensions (1988) does not include features related to code glosses and action markers, he does discuss text connectives. Two of the features used in his analysis of linguistic dimensions of texts are the coordinating conjunction and, as well as conjuncts (e.g. consequently, furthermore, however), which he lists under the category of lexical classes. In his summary of the factorial structure used for his factor analysis method, he lists conjuncts under Factor 5. Conjuncts had the highest positive loading for this factor.
He points out that this factor is found in professional letters, a genre that is both interactive and abstract, requiring highly explicit, text-internal reference. He also states that professional letters are characterized by the dimension of overt expression of persuasion. According to Biber, professional letters are midway on the dimension of involved versus informational production. Thus the results for textual metadiscourse in fundraising letters are related to the conjunct feature and the professional letter genre used in Biber’s linguistic and functional dimensions for his study of text variation.

Table 3 (continued). Textual metadiscourse frequency counts for Education and HHS (frequency per 1,000 words)

Textual metadiscourse subcategories   Education               HHS
                                      (38,688 total words)    (24,402 total words)
I. Logical connectors (cont.)
Sequencers
   first                              1.14                    .72
   next                               .41                     .30
   second                             .36                     .15
   third                              .16                     .04
   finally                            .03                     .04
   Total                              2.10                    1.25
Misc. textual items
   in fact                            .05                     .04
   in order to                        .18                     .11
   in short                           .00                     .11
   Total                              .23                     .26
II. Action markers
   such as                            .39                     .42
   for example                        .31                     .00
   Total                              .70                     .42
Total textual metadiscourse           12.41                   12.12

Results for interpersonal metadiscourse

Overall, the frequency counts for interpersonal metadiscourse (using combined subcategories: modal verb hedges, non-modal hedges, emphatics, attitudinals, attributors, and punctuation) were higher for HHS (36.30) than for Education (27.55). As expected, the subcategory frequency counts varied for the two fields. HHS had higher frequency counts than Education for modal verb hedges (18.32 vs. 14.68), for non-modal hedges (4.63 vs. 3.98), and for attitudinal markers (2.65 vs. 1.87). The normed frequency counts for modal verb hedges shown in Table 4 indicate that, for modal verbs, the biggest differences were for the modals can / can’t and will / won’t. HHS writers used more can / can’t modals (7.66) than Education writers (3.33), while Education writers used more will / won’t modals (7.34) than the HHS writers (5.83). The HHS writers averaged 17.88 modals per 1,000 words while the Education writers averaged 14.66, indicating a high degree of tentativeness and writer stance on the part of the HHS writers, using words such as may, might, could, and can. There is the same descending order of use by both groups of writers for the four most frequently used modal verbs (will, can, would, and may), although Education writers use each more frequently. The Education writers also seem to come across with more certainty and assertiveness to readers by using words like must, will, and shall more frequently.

For the non-modal hedges, the biggest difference in frequency counts was for the if conditional type of hedge. As is demonstrated in Table 4, the HHS writers used the most if conditionals (2.50 vs. 1.84 for Education), used more of the possible / possibly items (.86 vs. .65), and also used more non-modal hedges overall than the Education writers (4.63 vs. 3.95). These findings for nonmodal hedges confirm the modal verb findings for tentativeness, which is a powerful rhetorical device when used appropriately. The two most frequently used nonmodal hedges in descending order (possible and perhaps) were the same for both groups of writers. Results for the emphatic subcategory of interpersonal metadiscourse confirm the findings for the hedges.
Of the combined emphatic items used in the fundraising letters, Education writers used 4.10 emphatics per 1,000 words while HHS used only 2.75. Education writers more frequently than HHS writers used the words believe, clearly, certainly, and obviously while HHS writers more frequently used the emphatics sure/ly, true(ly), assuredly, and I know. However, overall, the Education writers tended to use more certainty markers while the HHS writers tended to use more uncertainty markers. Which strategy, using more or fewer emphatics, is more effective remains to be researched. Audience characteristics may make a difference for the resulting degree of persuasiveness found for emphatics as well as for hedges in fundraising letters.

Table 4. Hedges frequency counts for Education and HHS (frequency per 1,000 words)

Hedge subcategories                   Education               HHS
                                      (38,688 total words)    (24,402 total words)
I. Modals
   will/won’t                         7.34                    5.83
   can/can’t                          3.33                    7.66
   must                               .54                     .29
   shall/should/shouldn’t             .49                     .29
   would/wouldn’t                     1.34                    1.23
   may/might                          1.34                    1.68
   could/couldn’t                     .28                     .90
   Total modals                       14.66                   17.88
II. Nonmodal hedges
   if                                 1.84                    2.50
   possible/possibly                  .65                     .86
   maybe                              .39                     .37
   perhaps                            .26                     .29
   think                              .21                     .00
   indicated                          .16                     .16
   unlikely                           .10                     .00
   seem/seems                         .05                     .16
   hopefully                          .08                     .08
   suggest(s)                         .03                     .04
   appears(ed)                        .05                     .00
   indicates                          .16                     .16
   Total nonmodals                    3.95                    4.63

Looking at attitudinal markers, HHS writers again used more attitudinal comments about the content than the Education writers (2.65 vs. 1.87). The attitudinal marker importantly was used more often than any other in both Education and HHS letters, but was found more often in HHS letters than in Education letters (1.80 vs. 1.19). In Education letters, the attitude markers unfortunately, surprise(ingly), frankly, and personally occurred more often than in the HHS letters, but the attitude markers sadly and happy(ly) occurred more often in the HHS letters. The HHS writers tended to use this subtype of interpersonal metadiscourse as a rhetorical strategy to involve readers more than the Education writers did.

As would probably be expected, Education writers, who were mostly academic writers, used more attributors/source markers than the HHS writers, with the higher frequency count due mainly to the instances of noted. Both groups of writers used the same amount of the terms according to and stated. HHS writers used the term reported while Education writers did not. Compared to the other interpersonal subcategories, attributors were used rather sparingly by writers in both fields, indicating a generally more nonacademic, informal style of writing appropriate for a general audience.

As for punctuation metadiscourse, Table 5 shows that higher frequency counts were found for HHS writers (7.79) than for Education writers (2.69), more than twice as many items. Most of the difference was due to the greater use of dashes (an informal punctuation marker) in the HHS letters. More quotation marks and ellipsis marks were also found in HHS letters than in the Education letters. But more parentheses and colons (more formal punctuation marks) were used in the Education letters than in the HHS letters. Thus it seems clear that HHS writers prefer informal punctuation marks while Education writers prefer the more formal punctuation marks.

Table 5. Punctuation metadiscourse frequency counts for Education and HHS (frequency per 1,000 words)

Punctuation markers                   Education               HHS
                                      (38,688 total words)    (24,402 total words)
   dashes                             1.16                    5.37
   quote marks                        .54                     1.27
   parentheses                        .41                     .00
   colons                             .31                     .12
   semi-colon                         .05                     .16
   exclamation                        .13                     .04
   ellipsis                           .08                     .82
   Total                              2.69                    7.79

In the list of items that Biber includes under his dimension feature Lexical Classes, he includes not only the textual metadiscourse item conjuncts (the text connective subtype) but also items that are subtypes of interpersonal metadiscourse: downtoners (e.g. barely, nearly, slightly), hedges (e.g. something like, almost, at about), amplifiers (e.g. absolutely, extremely), emphatics (e.g. a lot, for sure, really), and discourse particles (e.g. sentence initial well, now, anyway). These items fit the interpersonal categories of truth markers (hedges and emphatics) and perhaps commentaries. He also uses modals as one of his feature categories, including possibility modals (e.g. can, may, might, could), necessity modals (e.g. ought, should, must), and predictive modals (e.g. will, would, shall). Another of his feature categories is Specialized Verb Classes. Here Biber lists public verbs (e.g. assert, declare, mention, say); private verbs (e.g. assume, believe, doubt, know); suasive verbs (e.g. command, insist, propose); and the verbs seem and appear. All of these items are used in the interpersonal metadiscourse subcategories of hedges, emphatics, and attributors, and some could also be used as action markers, a textual metadiscourse subcategory. Biber notes that along his detachment/involvement dimension, involvement is marked by emphatic particles and hedges as well as first person pronouns. These features related to interpersonal metadiscourse are also important for the dimension of overt expression of persuasion. Professional letters are ranked at the top of this dimension. The features Biber groups in this dimension include predictive modals, necessity modals, possibility modals, conditional clauses, and suasive verbs. All of these features were important in understanding interpersonal metadiscourse use in fundraising letters and in the use of rhetorical appeals for persuasion, and again show the relationship between these interpersonal metadiscourse characteristics and Biber’s statistically derived linguistic dimensions.

Discussion and instructional application

The data showed that, in general, amounts and types of interpersonal pronouns used were similar for writers in both fields, as were the amounts of metadiscourse, while the subtypes of metadiscourse were used somewhat differently. As was discussed earlier, both interpersonal pronouns and metadiscourse can be considered to be closely related to the footings and stances identified by Dillon and the language dimensions identified by Biber based on his extensive use of large English language corpora for his analyses. The findings for this study are significant for future research in persuasive writings and for current and future fundraising letter writers.

There were limitations to this study, of course. Only two fields of the non-profit sector were studied; we need to study other fields, too. It would have helped to know more about the letter writers’ background, such as gender, age, education level, years of experience in the field, and positions. No doubt some of the writers were more experienced and more highly educated than others. Some writers were probably employees of the agency/organization while others may have been volunteers. Some may have composed the whole letter while others may have used template letters for all or parts of their letters. Some letters were written by two or more authors collaboratively. Future research should focus more on writer characteristics as well as on information about the actual effects on the intended audience’s beliefs, attitudes, and actions, especially on their reactions to the interpersonal pronouns and to the metadiscourse used. Future researchers could also combine methods for analyzing the data, using a statistical analysis approach as well as a case study approach, by qualitatively analyzing small numbers of representative or particular types of letters and contexts for interpersonal pronouns and metadiscourse use. In addition, observations, interviews and surveys could be used as well as discourse analysis methods.

The connections among interpersonal pronouns, metadiscourse, and the rhetorical appeals needed for persuasive writing are many. As writers construct and craft logical, rational appeals, interpersonal pronouns and metadiscourse will help them keep in mind the various writer/reader positions. Metadiscourse will convey to readers a caring writer who signals for them these textual situations: an example, a story or narrative, a definition, a cause or effect/consequence, a part of a process or stage, a connection between persons and actions, a classification, a comparison or contrast, an analogy, and appeals to shared authority. All of this signaling can make use of interpersonal pronouns to present the metadiscourse to readers. Credibility appeals can make use of interpersonal pronouns and metadiscourse to establish a relationship between writers and readers through the medium of the letter.
Writers can elaborate for readers shared values, experiences, and backgrounds by using interpersonal pronouns and metadiscourse. Crafting appeals to readers’ emotions is very much dependent on how writers present both their credibility and rational appeals. Writers make use of rhetorical strategies such as the use of interpersonal pronouns and metadiscourse to help themselves as well as their readers understand the many assumptions, attitudes, ideas, and feelings they use when they make decisions about issues related to fundraising. All three types of rhetorical appeals are at the heart of effective fundraising letters. Included in rhetorical appeals is the signaling of relationships, whether between writer and readers or between and among ideas and issues. Readers need to see clearly both types of relationships. When

these relationships between writer and reader and between ideas are clearly signaled with interpersonal pronouns and metadiscourse, then the letters will be coherent and persuasive: they will be rhetorically effective.

Corpus analysis studies, I believe, are a useful way to increase the knowledge and skills of fundraising letter writers, both non-profit employees and volunteers. From such research, writers can actually see how various non-profit fields are similar or different in writers’ use of interpersonal pronouns and metadiscourse, with regard to the amounts and types used. Both experienced and novice writers can compare their own uses of these rhetorical devices. They can self-evaluate their writing, imitate successful uses of interpersonal pronouns and metadiscourse by other writers, and share their knowledge with other fundraising letter writers in their own agency, at conferences, or in written materials. They can become researchers themselves, making use of the current corpus of letters and other texts at ICIC, and can also transfer their knowledge of interpersonal pronouns and metadiscourse use to other types of non-profit texts that they write. In addition to using corpus research studies carried out earlier, and possibly using the corpus to do research for themselves and their field, they can also learn to become better critical readers of non-profit texts as they become aware of and then evaluate the use of rhetorical appeals and strategies in general and the use of interpersonal pronouns and metadiscourse in particular.
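As an illustration of how a writer might compare pronoun use across letters, the following is a minimal sketch in Python. It is not the procedure used in this study: the pronoun lists, the function name `pronoun_rates`, and the choice to normalize per 1,000 words are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical category lists for illustration only; the coding scheme
# used in the study itself is not reproduced here.
PRONOUNS = {
    "first_singular": {"i", "me", "my", "mine", "myself"},
    "first_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself", "yourselves"},
}


def pronoun_rates(text, per=1000):
    """Return the rate of each pronoun category per `per` words of text."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in PRONOUNS.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    # Normalize so that letters of different lengths can be compared.
    return {
        category: (counts[category] * per / total if total else 0.0)
        for category in PRONOUNS
    }


if __name__ == "__main__":
    sample_letter = "We need your help. I thank you."  # invented sample text
    for category, rate in pronoun_rates(sample_letter).items():
        print(f"{category}: {rate:.1f} per 1,000 words")
```

Running the same function over one's own letters and over letters from another field would give rates that can be set side by side. A simple word list of this kind cannot separate the functions of a pronoun in context, so qualitative reading of the letters would still be needed.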
In school settings, teachers and both non-ESL and ESL students can profit from reading corpus research studies of language use in speaking, reading, and writing that have been carried out in a variety of settings: developmental studies of students at all levels of schooling; studies of teaching methods used to teach English language persuasive strategies (speaking, reading, and writing strategies); studies of professional uses of such persuasive/rhetorical devices in the profit and non-profit sectors; studies of international writers (both professional writers and student writers); and, finally, studies of the trainers and training materials used to teach effective writing skills for the non-profit and private sectors. Corpus studies of interpersonal pronouns and metadiscourse use are as important for students who are native speakers of English as for those who are non-native speakers of English. An earlier study (Crismore 2003) has shown that Finnish students use pronouns differently than U.S. students. U.S. students used more first person pronouns than Finnish students. And gender makes a difference, too. Both U.S. and Finnish males use more first person

singular pronouns than females, and females use more first person plural and second person pronouns. Another study (Intaraprawat and Steffensen 1995) found that skilled ESL writers used more metadiscourse, and a wider variety of it, than poor ESL writers. Hinkel (1995) found cultural differences in the use of modal verbs. Clearly, we need more corpus studies of interpersonal pronouns and metadiscourse in order to enhance the teaching and learning of English as a second or foreign language. We also need these corpus studies to learn more about the use of interpersonal devices for teaching and learning English for specific purposes (ESP). For instance, Longo (1994) found differences in the use of metadiscourse by engineers and engineering students: engineers used much more than the students did. No doubt future studies would also find differences between experts and novices in the use of interpersonal pronouns and metadiscourse in many workplaces where English is used for specific purposes. Differences also might be found for other factors such as gender, age, specific cultures, and educational level. We still have much to learn about interpersonal pronouns and metadiscourse, and corpus studies can help us find out what we need to learn.

References

Abelen, E., Redeker, G. and Thompson, S. A. 1993. “The rhetorical structure of U.S.-American and Dutch fund-raising letters.” Text 13:323–350.
Barton, E. L. 1995. “Contrastive and non-contrastive connectors: Metadiscourse functions in argumentation.” Written Communication 12 (2):219–240.
Bernhardt, S. 1985. “The writer, the reader, and the scientific text.” Technical Writing and Communication 15 (2):163–174.
Biber, D. 1988. Variation Across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D. 1995. Dimensions of Register Variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, D. August 1998. “Corpus linguistic text banks.” Paper presented at The Indiana Center for Intercultural Communication and The Center on Philanthropy Roundtable, Indianapolis, Indiana University-Purdue University.
Biber, D., Conrad, S., and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D. and Finegan, E. 1989. “Adverbial stance types in English.” Discourse Processes 11:1–34.
Booth, W. C. 1961. The Rhetoric of Fiction. Chicago: University of Chicago Press.

Clotfelter, C. T. and Ehrlich, T. 2001. Philanthropy and the Nonprofit Sector. Bloomington: Indiana University Press.
Connor, U. and Upton, T. 2003. “Linguistic dimensions of direct mail letters.” In Corpus Analysis: Language Structure and Language Use, P. Leistyna and C. F. Meyer (eds), 71–86. Amsterdam: Rodopi.
Cook, G. 1992. The Discourse of Advertising. London/New York: Routledge.
Crismore, A. 1989. Talking with Readers: Metadiscourse as Rhetorical Act. New York: Peter Lang.
Crismore, A. 2003. “Cross-cultural variation in student writing to convince: Personal and interpersonal pronouns.” Unpublished manuscript. Fort Wayne, Indiana: Indiana University-Purdue University.
Crismore, A. and Farnsworth, R. 1989. “Mr. Darwin and his readers: Exploring interpersonal metadiscourse as a dimension of ethos.” Rhetoric Review 8 (2):91–112.
Crismore, A. and Farnsworth, R. 1990. “Metadiscourse in popular and professional science discourse.” In The Writing Scholar: Studies in the Language and Conventions of Academic Discourse, W. Nash (ed.), 118–136. Newbury Park, CA: Sage.
Crismore, A., Markkanen, R., and Steffensen, M. 1993. “Metadiscourse in persuasive writing.” Written Communication 8:39–71.
Crismore, A. and Vande Kopple, W. J. 1990. “Rhetorical contexts and hedges.” Rhetoric Society Quarterly 20:49–59.
Crismore, A. and Vande Kopple, W. J. 1997. “The effects of hedges and gender on the attitudes of readers in the United States toward material in a science textbook.” In Culture and Style in Academic Discourse, A. Duszak (ed.), 223–253. Berlin: Mouton de Gruyter.
Dillon, G. L. 1986. Rhetoric as Social Imagination: Explorations in the Interpersonal Function of Language. Bloomington, Indiana: Indiana University Press.
Flowerdew, L. June, 1997. “Interpersonal strategies: Investigating interlanguage corpora.” RELC Journal 28 (1):72–88.
Gläser, R. 1975. “Emotive features in technical and scientific English.” In Style and Text, N. Enkvist (ed.), London: Longman.
Hagge, J. and Kostelnick, C. 1989. “Linguistic politeness in professional prose.” Written Communication 6 (3):312–339.
Halliday, M. A. K. 1973. Explorations in the Function of Language. London: Edward Arnold.
Halliday, M. A. K. and Hasan, R. 1976. Cohesion in English. London: Longman.
Hinkel, E. 1995. “The use of modal verbs as a reflection of cultural values.” TESOL Quarterly 29 (2):325–341.
Hyland, K. 2001a. “Bringing in the reader: Addressee features in academic articles.” Written Communication 18 (4):549–574.
Hyland, K. 2001b. “Humble servants of the discipline? Self-mention in research articles.” English for Specific Purposes 20 (1):207–226.
Intaraprawat, P. and Steffensen, M. S. 1995. “The use of metadiscourse in good and poor ESL essays.” Journal of Second Language Writing 4 (3):253–272.

Kumpf, E. P. 2000. “Visual metadiscourse: Designing the considerate text.” Technical Communication Quarterly 9 (4):401–424.
Longo, B. 1994. “How the use of metadiscourse markers is related to organizational culture: Preliminary results.” Technical Communication 18 (1):18–36.
Luukka, M. R. August 1996. “Interpersonality and academic discourse practices.” Paper presented at the 11th World Congress of Applied Linguistics, Jyvaskyla, Finland.
Mann, W. C. and Thompson, S. A. (eds). 1992. Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text. Amsterdam and Philadelphia: Benjamins.
Mauranen, A. 1993. “Contrastive ESP rhetoric: Metatext in Finnish-English economics texts.” English for Specific Purposes 12:3–22.
Mayer, R. E. 1987. “Techniques that foster active reading strategies.” Paper presented at the American Educational Research Association, Washington, D.C.
Myers, G. 1989. “The pragmatics of politeness in scientific articles.” Applied Linguistics 10 (1):1–35.
Smith, E. J. Jr. 1986. “Achieving impact through the interpersonal component.” In Functional Approaches to Writing: Research Perspectives, B. Couture (ed.), 108–119. Norwood, NJ: Ablex.
Strapulos, K. 2002. “The appeal for support in philanthropic fundraising and sales letters.” Unpublished manuscript. Indianapolis: Indiana University-Purdue University.
Tempel, G. July 2002. “A message from Gene Tempel.” IU Center on Philanthropy News, Indianapolis IN.
Valle, E. August 1996. “Whose (inter)text is it anyway? Representation of community and intertext in scientific and popular writing.” Paper presented at the 11th World Congress of Applied Linguistics, Jyvaskyla, Finland.
Vande Kopple, W. J. 1985. “Some exploratory discourse on metadiscourse.” College Composition and Communication 36:82–93.
Van Nus, M. August 1996. “Persuasive strategies in Dutch direct mail.” Paper presented at the 11th World Congress of Applied Linguistics, Jyvaskyla, Finland.
Van Nus, M. 1998. “‘Can we count on your bookings of potatoes to Madeira?’ Corporate and context discourse practices in direct sales letters.” In Writing Business: Genres, Methods, and Language, F. Bargiela and C. Nickerson (eds), 130–141. London: Longman.
Virtanen, T. 1998. “Developing a linguistic corpus for philanthropic fundraising texts.” Paper presented at The Indiana Center for Intercultural Communication and Center on Philanthropy Roundtable, Indianapolis, Indiana University-Purdue University.
Williams, J. M. 2003. Style: Ten Lessons in Clarity and Grace. London: Longman.
Williams, S. May 28, 2000. “Private sector contributions receive high marks for IU.” IU Home Page.
Work in progress: “Parlez-vous fundraising.” Philanthropy Matters 12.


In the series Studies in Corpus Linguistics (SCL) the following titles have been published thus far or are scheduled for publication:

1. PEARSON, Jennifer: Terms in Context. 1998. xii, 246 pp.
2. PARTINGTON, Alan: Patterns and Meanings. Using corpora for English language research and teaching. 1998. x, 158 pp.
3. BOTLEY, Simon and Tony McENERY (eds.): Corpus-based and Computational Approaches to Discourse Anaphora. 2000. vi, 258 pp.
4. HUNSTON, Susan and Gill FRANCIS: Pattern Grammar. A corpus-driven approach to the lexical grammar of English. 2000. xiv, 288 pp.
5. GHADESSY, Mohsen, Alex HENRY and Robert L. ROSEBERRY (eds.): Small Corpus Studies and ELT. Theory and practice. 2001. xxiv, 420 pp.
6. TOGNINI-BONELLI, Elena: Corpus Linguistics at Work. 2001. xii, 224 pp.
7. ALTENBERG, Bengt and Sylviane GRANGER (eds.): Lexis in Contrast. Corpus-based approaches. 2002. x, 339 pp.
8. STENSTRÖM, Anna-Brita, Gisle ANDERSEN and Ingrid Kristine HASUND: Trends in Teenage Talk. Corpus compilation, analysis and findings. 2002. xii, 229 pp.
9. REPPEN, Randi, Susan M. FITZMAURICE and Douglas BIBER (eds.): Using Corpora to Explore Linguistic Variation. 2002. xii, 275 pp.
10. AIJMER, Karin: English Discourse Particles. Evidence from a corpus. 2002. xvi, 299 pp.
11. BARNBROOK, Geoff: Defining Language. A local grammar of definition sentences. 2002. xvi, 281 pp.
12. SINCLAIR, John McH. (ed.): How to Use Corpora in Language Teaching. 2004. viii, 308 pp.
13. LINDQUIST, Hans and Christian MAIR (eds.): Corpus Approaches to Grammaticalization in English. 2004. xiv, 265 pp.
14. NESSELHAUF, Nadja: Collocations in a Learner Corpus. xii, 326 pp. + index. Expected Winter 04-05
15. CRESTI, Emanuela and Massimo MONEGLIA (eds.): C-ORAL-ROM. Integrated Reference Corpora for Spoken Romance Languages. ca. 300 pp. (incl. DVD). Expected Winter 04-05
16. CONNOR, Ulla and Thomas A. UPTON (eds.): Discourse in the Professions. Perspectives from corpus linguistics. 2004. vi, 334 pp.
17. ASTON, Guy, Silvia BERNARDINI and Dominic STEWART (eds.): Corpora and Language Learners. 2004. vi, 311 pp.
