Gesture and the Dynamic Dimension of Language
Gesture Studies (GS)

Gesture Studies aims to publish book-length publications on all aspects of gesture. These include, for instance, the relationship between gesture and speech; the role gesture may play in social interaction; gesture and cognition; the development of gesture in children; the processes by which spontaneously created gestures may become transformed into codified forms; the relationship between gesture and sign; biological studies of gesture, including the place of gesture in language evolution; and gesture in human-machine interaction. Volumes in this peer-reviewed series may be collective volumes, monographs, or reference books, in the English language.
Editors

Adam Kendon
University of Pennsylvania, Philadelphia
Cornelia Müller
European University Viadrina, Frankfurt/Oder
Volume 1
Gesture and the Dynamic Dimension of Language: Essays in honor of David McNeill
Edited by Susan D. Duncan, Justine Cassell and Elena T. Levy
Gesture and the Dynamic Dimension of Language
Essays in honor of David McNeill
Edited by
Susan D. Duncan University of Chicago
Justine Cassell Northwestern University
Elena T. Levy University of Connecticut - Stamford
John Benjamins Publishing Company
Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data

Gesture and the dynamic dimension of language : essays in honor of David McNeill / edited by Susan D. Duncan, Justine Cassell, Elena Levy.
p. cm. -- (Gesture Studies, ISSN 1874-6829 ; v. 1)
Includes bibliographical references and index.
1. Gesture. I. McNeill, David. II. Duncan, Susan D. III. Cassell, Justine, 1960- IV. Levy, Elena Terry, 1952-
P117.G4685 2007
808.5--dc22
2007011245
ISBN 978 90 272 2841 3 (Hb; alk. paper)

© 2007 – John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.

John Benjamins Publishing Co. · P.O. Box 36224 · 1020 ME Amsterdam · The Netherlands
John Benjamins North America · P.O. Box 27519 · Philadelphia PA 19118-0519 · USA
Contents

I. Introduction

1. Introduction: The Dynamic Dimension of Language
Elena Levy, Susan Duncan, and Justine Cassell

2. On the Origins of Modern Gesture Studies
Adam Kendon

II. Language and Cognition

3. Gesture with Speech and Without It
Susan Goldin-Meadow

4. From Gestures to Signs in the Acquisition of Sign Language
Nini Hoiting & Dan Slobin

5. How Does Spoken Language Shape Iconic Gestures?
Sotaro Kita & Asli Özyürek

6. Forgetful or Strategic? The Mystery of the Systematic Avoidance of Reference in the Cartoon Story Narrative
Nobuhiro Furuyama and Kazuki Sekine

7. Metagesture: An Analysis of Theoretical Discourse about Multimodal Language
Fey Parrill

8. Potential Cognitive Universals: Evidence from Head Movements in Turkana
Evelyn McClave

9. Blending in Deception: Tracing Output Back to Its Source
Amy Franklin

10. A Dynamic View of Metaphor, Gesture and Thought
Cornelia Müller

11. Second Language Acquisition from a McNeillian Perspective
Gale Stam

III. Environmental Context and Sociality

12. Face-to-face Dialogue as a Micro-social Context: The Example of Motor Mimicry
Janet Bavelas

13. Master Speakers, Master Gesturers: A String Quartet Master Class
John Haviland

14. Constructing Spatial Conceptualizations from Limited Input: Evidence from Norwegian Sign Language
Scott Liddell & Marit Vogt-Svendsen

15. Environmentally Coupled Gestures
Charles Goodwin

16. Indexing Locations in Gesture: Recalled Stimulus Image and Interspeaker Coordination as Factors Influencing Gesture Form
Irene Kimbara

17. The Role of Iconic Gesture in Semantic Communication and Its Theoretical and Practical Implications
Geoff Beattie & Heather Shovelton

18. Intersubjectivity in Gestures: The Speaker's Perspective toward the Addressee
Mika Ishino

19. An Integrated Approach to the Study of Convention, Conflict, and Compliance in Interaction
Starkey Duncan

IV. Atypical Minds and Bodies

20. Discourse Focus, Gesture, and Disfluent Aphasia
Susan Duncan & Laura Pedelty

21. The Construction of a Temporally Coherent Narrative by an Autistic Adolescent: Co-contributions of Speech, Enactment and Gesture
Elena Levy

22. The Body in Communication: Lessons from the Near-Human
Justine Cassell

Index
Introduction
Introduction: The Dynamic Dimension of Language¹

Elena Levy, University of Connecticut
Susan Duncan, University of Chicago
Justine Cassell, Northwestern University

¹ We gratefully acknowledge the many contributions from (in alphabetical order) Amy Franklin, Mika Ishino, Irene Kimbara, Karl-Erik McCullough, Arika Okrent, Fey Parrill, Laura Pedelty, and Gale Stam, that helped to shape the 2003 Festschrift Conference, this chapter, and the volume as a whole. We also thank Susan Goldin-Meadow for her comments on the Introduction.
The twenty-one chapters on gesture and the dynamic dimension of language that follow this introduction have found their place in the current volume because of the intellectual stimulation and transformation each chapter's author has experienced through a connection with the ideas of David McNeill. Now Professor Emeritus at the University of Chicago, McNeill has for four decades pursued the development of a unique theory of the human capacity for language, one that relies on the unity of distinct semiotic frameworks underlying language production in discourse. McNeill's students see it as logical that his paradigm-shifting work on language finds its home at Chicago, where the social sciences have always been a place of theoretical and methodological innovation. Think of the 'Chicago Schools' of Economics and Sociology, and consider, for example, John Watson's development of behaviorism, at odds with the then-dominant American approaches of structuralism and functionalism, and John Dewey's functionalism, challenging the assumptions of structuralism. The persistence of these and many other Chicago scholars led to new ways of looking at human behavior, and so helped to found entire schools of thought. We regard David McNeill's four decades of research as having similarly contributed to shaping a distinct school of language theory.

In his latest book, Gesture and Thought, McNeill presents a perspective on language that forms part of a broader view of psychology, developed over the course of his career. Citing the influence of Lev Vygotsky (1986), McNeill describes a theory of language that is "antireductionist, holistic, dialectical, and grounded in action and material experience" (McNeill, 2005:4). This is a true psycholinguistics, focused on the dynamic dimension of language and tying it, irreducibly, to intra- and interpersonal contexts. Again acknowledging Vygotsky, McNeill remarks, "[t]o cope with signs is not to cope just with them but also with the whole interconnected pattern of activity in which they are embedded" (McNeill, 2005:100). Each of the chapters in this volume reflects a view of language as a dynamic phenomenon with emergent structure, and in each, gesture is approached as part of language, not an adjunct to it. Together, the chapters support and contribute to McNeill's theory and methodology. For readers unfamiliar with McNeill's approach, we present an overview of its key aspects, incorporating McNeill's own descriptions from Gesture and Thought.²

² Unless otherwise noted, all page numbers refer to McNeill, 2005.

2. David McNeill's Theoretical Framework
A frequent contrast to be found in McNeill's writings is the distinction between dynamic processes of change and the static snapshots that result from the action of those dynamic processes. In Gesture and Thought, McNeill points out that the 'static' perspective on language, motivated by the writings of Ferdinand de Saussure, has dominated most twentieth-century theorizing about language, not only in linguistics but in psycholinguistics as well:

Classically, the traditions are called 'linguistic' and 'psycholinguistic', but much psycholinguistics remains static in its underlying assumptions, and more accurate terms are dynamic and static, with psycholinguistics [to date] mainly in the static camp (p.63).

McNeill remarks that this perspective, "after reigning for the better part of a century, may have reached a limit, its insights into language finally running dry," and so the study of language may be "ripe for a new paradigm" (p.65). The new approach would not overthrow the insights of the static perspective, because that would "relinquish…standards of form" (p.65). Rather, it would wed the static with the dynamic, creating what Saussure later termed the 'essential duality' of language (2002:65). In the overview that follows, we elaborate on aspects of McNeill's new approach.

2.1 Microgenesis of meaning

Most of McNeill's writings concern the microgenesis of utterances, the "moment-by-moment thinking that takes place as one speaks" (p.15). At this level of description, utterances arise from a dialectic between two semiotic modes: imagery, embodied as gestures, and the lexicogrammatical categories of speech. These are "opposite" modes of meaning capture—one global, synthetic, instantaneous, and noncombinatoric, the other sequential, segmented, arbitrary, and conventional—and the dialectic is driven by the semiotic disparity between the two modes. The result is the final form of the utterance that is the object of most traditional studies of language. In this account of microgenesis, meaning is not independent of thinking: "It is not that one thinks first, then finds the language to express the thought." Rather, thinking is the source of meaning (p.125), and thinking is a dialectic between the two semiotic modes—it is "both global and segmented, idiosyncratic and linguistically patterned" (McNeill & Duncan, 2000:148). Central to this account is the view that gestures do not represent what they depict, but rather they embody it; they are action itself. Thus, this approach gives new meaning to Vygotsky's claim, frequently cited by McNeill, that meaning does not exist and develop in isolation from its material carrier. As McNeill puts it, the materialization of imagery in gestures is the "essence of embodiment" (p.103).

2.2 The Growth Point
The initial organizing impulse of an utterance, and the starting point for developing its meaning, is the 'Growth Point' (GP). In McNeill's view, this is a minimal, irreducible psychological unit that, in Vygotsky's sense, is a microcosm of the whole utterance:

The GP is both image and linguistic categorical content: an image, as it were, with a foot in the door of language. Such imagery is important, since it grounds sequential linguistic categories in an instantaneous visuospatial context (p.115).
GPs are unpacked through the imagery-language dialectic, becoming, over microgenetic time, increasingly well-formed as lexicogrammatical constructions. The latter are the stable, conventional linguistic forms studied by Saussurian-based (psycho)linguistics—the static 'snapshots' formed from the dynamic processes. In this view, "gesture and the imagery it embodies are an integral step on the way to a sentence" (p.18).

2.3 Context
GPs not only consist, irreducibly, of imagery and language, but are also irreducibly tied to context. In Vygotsky's terms, they are 'psychological predicates,' embodying what is 'newsworthy' relative to what has come before. We quote Michael Studdert-Kennedy's lively description of the microgenetic process of utterance formation, from his 1994 review of McNeill's (1992) Hand and Mind:

For McNeill the starting node of an utterance…is its 'growth point,' a metaphor from embryology with dynamic implications that its alternatives lack. The growth point is a small deviation, a minor salience, among the disordered fragments of images and linguistic categories, the residue of immediately preceding thoughts, from which utterance and gesture assemble themselves, thus assuring some degree of sequential coherence.

As McNeill & Duncan (2000) point out, the essential connection between growth point and context has implications for models of real-time, coherent text formation.

2.4 Dialogue, monologue, and social context
In McNeill's view, the process of utterance formation is inherently social. McNeill describes the inseparability of the intra- and interpersonal planes with respect to Vygotsky:

While the GP itself is intra[personal], it ties together forces on thought and action that scatter over both the interpsychic and intrapsychic planes. Vygotsky said that everything appears in development twice, first on the social plane, then on the individual. The same logic and direction of influence applies to the GP. Vygotsky also saw the necessity of a unit that encompasses this transformation, invoking the concepts of psychological predicates and inner speech to express this unity in the minds of socially embedded individuals. The growth point concept is meant to be heir to these insights (pp.162-163).
McNeill addresses two aspects of the social dimension of microgenesis. In one respect, microgenesis is social because, as described above, within the imagistic-linguistic unity that constitutes GPs, the categories of language are determined by convention. As GPs are unpacked, the categorical side of the dialectic drives development toward the “frozen,” conventional, lexicogrammatical forms that constitute the final, spoken utterances. Microgenesis is social in a second, more immediate, fluid, and idiosyncratic sense as well. This can be seen most clearly in dialogue, and is perhaps best illustrated by an analysis of a two-party conversation presented in Gesture and Thought (pp.151-159). The analysis concerns the emergence of GPs from the back-and-forth of immediately preceding dialogue. McNeill shows that the context from which each new utterance departs has been constructed from the joint contribution of gesture and speech, including the co-construction of shared gesture space by the interlocutors. The participants use this space to ground a series of indexical points, and, irreducibly, it forms part of the new GP. Thus, while growth points are in one sense intrapersonal, they are interpersonal as well: They are a product of the speaker’s “individual thinking at a particular moment in a specific pragmatic-discourse context, and encompass interpersonal, moral, discourse, and historical-biographical dimensions” (p.159). Although seemingly contradictory, the same holds true for monologue; for McNeill, the intra- and interpersonal planes are inseparable in all acts of speaking. He describes this ‘individual-social duality’ with respect to gestures:
The fact is that every gesture is simultaneously 'for the speaker' and 'for the listener'. I do not mean this in a bleached ecumenical way. I mean that an individual-social duality is inherent to gesture. A gesture is a bridge from one's social interaction to one's individual cognition—it depends on the presence (real or imagined) of a social other and yet is a dynamic element in the individual's cognition…[W]ith a more explicit reference to Vygotsky, every thought (intrapsychic) passes through a social filter (interpsychic) (pp.53-54).
We suggest that studies of the relationship between micro- and ontogenesis will continue to illuminate how GPs in monologue are inherently social—in this second, immediate, fluid, and idiosyncratic sense. If, as Vygotsky claims, ontogenetic developments appear first on a social plane and then on an individual one, the process of constructing GPs from context may occur ontogenetically earlier in dialogue, with the same processes later appropriated, over ontogenetic time, for use in monologue. From this perspective, monologue continues to be, in Vygotsky's terms, a form of 'inner dialogue'.

3. Chapters in the Volume
Of the twenty-one chapters in this volume, nine originated as papers from a conference, held on June 8, 2003, to honor David McNeill's contribution to the study of language. All the conference presenters, senior colleagues of McNeill in the field of research on language, had begun their studies of gesture—and of body movement more broadly—when the fundamental importance of these phenomena for our understanding of human language and interaction still needed to be justified. The remaining chapters, by current and former students of McNeill, have been written since the 2003 conference. They were planned and executed as mostly shorter essays that pay tribute to the ongoing influence of McNeill's work on the authors' own. In these contributions we see clearly how McNeill's work is continuing to influence the next generation of scholars. The introductory chapter, written by Adam Kendon, contextualizes McNeill's perspective within a history of earlier gesture studies. Kendon's own research, from the early 1970s on, has itself been foundational to the development of McNeill's paradigm, and his chapter here makes clear the historical foundations of all the work represented in the volume. The rest of the book is divided into two main sections, "Language and Cognition" and "Environmental Context and Sociality." Although all of the authors in the volume would most likely agree with McNeill's view of an 'individual-social duality' inherent to all of gesture and speech (see section 2.4), we might say that the work detailed in the chapters of the first section emphasizes the 'intrapersonal plane', while that in the second section emphasizes the 'interpersonal plane'. The final section, "Atypical Minds and Bodies," concerns lessons to be learned from studies of aphasic patients, autistic children, and artificial humans.
3.1 Language and cognition

In Susan Goldin-Meadow's comprehensive summary of her work on the cognitive, developmental, and communicative functions of gestures in children and adults, hearing and deaf, she elaborates on the semiotic "versatility" of gestures. Locating McNeill's treatment of gesture in language in one area of the map of diverse gestural phenomena, she discusses how the spontaneous gestural systems of adults, elicited under specific experimental conditions, and the homesigns of deaf children manifest 'resilient' properties of language; further, how gestures can serve as a "mechanism of change" in learning and development. Nini Hoiting and Dan Slobin employ McNeill's distinction between categorical and imagistic modes of meaning creation in their explanation of aspects of sign language use in children, demonstrating that discrete, conventionalized signs share the manual modality with gradiently-patterned, non-conventional ones. The authors propose that "[t]he ways in which [non-conventional signs] are incorporated vary from language to language and can, eventually, be rigorously specified in the terms of an expanded linguistic theory." Sotaro Kita and Asli Özyürek offer their own well-developed theoretical perspective on the imagery-language dialectic. They present cross-linguistic data demonstrating the interdependence of gestural and linguistic representations. Nobuhiro Furuyama and Kazuki Sekine offer an analysis of catchment phenomena in cartoon narrations. The analysis leads to the intriguing conclusion that the structure of catchment-level units of discourse puts predictable constraints on what gesture-speech utterances can emerge over the course of narration. Fey Parrill offers an analysis of a repeating gestural form that is regularly produced by David McNeill himself. She analyzes this form with respect to the notion of Growth Point, and discusses it as an instance of a species of gesture that is stable in form while at the same time idiosyncratic to an individual speaker. Evelyn McClave discusses the stability of another type of recurring body movement—head movements that recur systematically in particular discourse contexts and across diverse and unrelated cultural groups. She concludes that the movements may reflect human cognitive universals. Amy Franklin integrates Growth Point theory with Conceptual Integration theory to describe the structure of representations underlying the gesture-speech 'mismatches' that occur in discourse that is meant to deceive. She argues that these representations are blended unities of their disparate inputs. Cornelia Müller extends metaphor theory on the basis of gestural manifestations of 'metaphoricity' that, she shows, vary depending on whether, for an individual speaker at the moment of speaking, a metaphorical concept is alive, dead, or 'sleeping'. Her treatment of metaphor shares with McNeill's framework the goal of elucidating the dynamically emergent properties of language structure in actual contexts of language use. The final paper in the "Language and cognition" section is by Gale Stam, who lays out the significance of McNeillian theory for research on second language acquisition. She discusses how analysis of language learners' gestures sheds light on their evolving 'thinking for speaking' (Slobin, 1991) in their second language.

3.2 Environmental context and sociality

In the first chapter of this section, Janet Bavelas argues that the fundamental site of language use is face-to-face dialogue. In most studies of language, however, the unit of analysis has not been micro-social interaction, but rather the conceptually isolated individual. Bavelas argues that this perspective is highly resistant to change. She supports her argument with a review of several researchers' reactions to a study of her own that used micro-social interaction as the unit of analysis, demonstrating that most re-interpreted her findings in terms of the individual. John Haviland, in an examination of the very complex, multiparty interactions within a string quartet master class, extends the Growth Point notion as well as the domain of multimodality to include, as part of the context that shapes linguistic and musical expression, not only talk, gesture, and pedagogical interaction, but also the musical instruments and the musical score that is itself the focus of the interaction. The analysis by Scott Liddell and Marit Vogt-Svendsen of a sign language conversation supports McNeill's claim, made with respect to spoken languages, that in sign languages as well, "gestures are an integral part of language as much as are words, phrases, and sentences" (McNeill, 1992:2). In other words, in signed as well as vocally produced languages, spontaneous, imagistic gestures and conventional signs form a single, integrated, conceptual system. In their chapter they look to 'real space blends'—signers' conceptualizations of events, blended with their conceptualizations of what is real in their immediate environment—for evidence of gestural aspects of sign language. Charles Goodwin's analysis of an interaction between a novice and an expert archaeologist is a further demonstration of the need to broaden our concept of the multimodal nature of language use, in order to adequately account for what he refers to as 'environmentally coupled' gestures. His arguments make clear that the environment is multi-faceted, having both physical and social dimensions, within which participants in interactions mutually devise communicative forms, making creative use of what their immediate environment affords. An experimental study by Irene Kimbara attempts to measure the differing influences on individuals' gesture productions of mental imagery retained from viewing a stimulus, versus possibly conflictual imagery present in the gestures of interlocutors co-narrating the events in the stimulus. Her results distinguish effects both of mental imagery and of the social interaction on speakers' gestures. Geoffrey Beattie and Heather Shovelton cite McNeill's statement that gesture and speech are two sides of the same mental process as the root of their exploration of how gestures augment the message contained in speech. They extend McNeill's theory to the communicative effect on the addressee of gesture-speech combinations, and they draw out implications for practical attempts to make communication more effective. Mika Ishino demonstrates how a speech-gesture synchrony analysis of an extended discourse reveals that multiple distinct meanings must be inferred for several instances of simple pointing gestures. She focuses on how these gestures reveal intersubjective conceptualizations, such that one interlocutor can point from the adopted perspective of another. Finally, Starkey Duncan, a long-time colleague of David McNeill's in the Psychology Department at the University of Chicago, demonstrates how careful analysis of the steps involved in a recurring interaction (such as a parent feeding a child) reveals systematic patterns of recurrence of particular behavior sequences. He develops the idea that these are conventionalized and rule-governed patterns, an idea that seems to extend to the domain of temporally extended interaction the ideas that others in the volume argue for in relation to conventionalized gesture forms.

3.3 Atypical minds and bodies

A theory of the human capacity for language, such as McNeill's, may be evaluated in part on the basis of how it increases our understanding of atypical language. The three papers in the last section of this Festschrift examine aspects of language in three different domains. Susan Duncan and Laura Pedelty demonstrate the utility of the McNeill method of multimodal discourse analysis, with comparisons across relevant speaker groups, for uncovering more of the underlying structure of the "telegraphic" speech of Broca's aphasics. Their analyses yield evidence against an 'agrammatism' account of particular manifestations of this language disorder. Elena Levy adds analysis of multimodal dimensions of language use to a previous, speech-only analysis of the narrative discourse of an autistic adolescent, making visible additional dimensions of language performance that had previously remained obscure. Her focus is on the co-contributions of speech, enactment, and gesture to the development of temporally coherent narrative discourse in this atypically-developing child. Finally, Justine Cassell discusses the ways in which theories of the relationship between speech and gesture have influenced the development of the Artificial Intelligence systems called Embodied Conversational Agents and, vice versa, the ways in which Embodied Conversational Agents may contribute to the development of theories of the relationship between speech, gesture, eye gaze, posture, and the other modalities of communication.

4. In Parting
This Festschrift assembles a true community of scholars, as can be seen by the fact that so many of the authors make frequent reference to one another’s findings and theories. We characterize the central unifying concept among all the chapters of this volume by once more quoting McNeill, from his 1985 Psychological Review paper, “So you think gestures are nonverbal?”:
[T]he whole of gesture and speech can be encompassed in a unified conception, with gesture as part of the psychology of speaking, along with, and not fundamentally different from, speech itself.
McNeill's students value a quotation from the introduction to Roger Brown's (1973) A first language: The early stages, for what it reveals of his early contributions to the development of the field of psycholinguistics:

[M]any other young scientists were in contact with our work in its early stages and have gone on to become major contributors to the field. I think particularly of David McNeill…his great talents and of the interest he has brought to developmental psycholinguistics (which he, incidentally, christened as such) by his ability to conceive of bold and fascinating generalizations (Brown, 1973:18-19).
We find it intriguing (and inspiring) that McNeill, who coined the term "developmental psycholinguistics," remains true to the work of crafting a psycholinguistics whose assumptions are radically different from those of traditional linguistics, concerning the irreducible relationship among language, imagery, the body, and context.

References

Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350-371.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D. & Duncan, S. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp. 141-161). Cambridge: Cambridge University Press.
Saussure, F. de. (2002). Écrits de linguistique générale (compiled and edited by S. Bouquet and R. Engler). Paris: Gallimard.
Slobin, D. I. (1991). Learning to think for speaking: Native language, cognition, and rhetorical style. Pragmatics, 1, 7-26.
Studdert-Kennedy, M. (1994). Review of Hand and mind: What gestures reveal about thought, by David McNeill. Language and Speech, 37(2), 203-209.
Vygotsky, L. S. (1986). Thought and language (edited and translated by E. Hanfmann and G. Vakar; revised and edited by A. Kozulin). Cambridge, MA: MIT Press.
On the Origins of Modern Gesture Studies

Adam Kendon
Institute for Research in Cognitive Science, University of Pennsylvania
Interest in gesture in the Western tradition dates from the late Roman era. Scholarly studies first appear at the very end of the sixteenth century. Gesture drew philosophical interest in the eighteenth century when speculations on the natural origins of human language began. Although in the nineteenth century, especially with the emergence of anthropology, interest in gesture remained high, by the beginning of the twentieth century it appears to have declined sharply and there was a veritable dearth of studies until the 1970s. From this time forward, as speculation on language origins again returned to favour, with the growth of interest in sign languages and with a revival of interest in the cognitive foundations of language, the study of gesture resumed. Especially important has been the demonstration by David McNeill, among others, that gesture serves to express aspects of the conceptual content of utterances and is not just affective decoration.
The first discussions of gesture in the Western tradition are to be found in the context of discussions of rhetoric. The most detailed treatment from this point of view is given by Quintilianus in the eleventh book of his Institutio oratoria, which was written in the first century CE (Quintilianus, 1922; Dutsch, 2002). Interest in gesture as an object of scholarly investigation does not emerge before the last half of the sixteenth century, however. One may mention in this connection, for example, the work of Arias Montanus who, in 1571, published commentaries on the Bible which included an extensive study of biblical gestures. According to him, this was the first time that a neglected field was receiving attention (Knox, 1990). By the beginning of the seventeenth century, treatises devoted to gesture began to appear. The first of these appears to be Giovanni Bonifaccio's L'arte de' cenni, published in Vicenza, Italy in 1616. This is a curious work, largely descriptive, motivated by a desire to re-establish the sacred language with which God had endowed us and which was common to us all, before we invented spoken languages. According to Bonifaccio, these only serve to keep people apart and burden them with too much intellectual labour (Benzoni, 1970; Knox, 1990; 1996). After Bonifaccio, in 1644, came the two books of John Bulwer, his Chirologia and Chironomia (see Bulwer, 1974). Like Bonifaccio, Bulwer believed that gesture was the natural language of mankind, a form of language that, in fact, as he put it, "had the happiness to escape the curse at the confusion of Babel."
He supposed, indeed, that whereas spoken languages were artificial inventions, what he referred to as "The Natural Language of the Hand" was something to be discovered by studying the expressions of man directly. As Jeffrey Wollock (2002) has suggested in a recent article, Bulwer's view of language seems akin to some modern views that seek to show that language arises from conceptual schemata that are themselves derived from bodily experiences. Language reflects nature because humans are a part of nature, and the "original and perfect language lies deep within ourselves." It is not something that can be constructed by rational processes.

Gesture comes ever more into focus among scholars in the eighteenth century. Giambattista Vico in Naples (Vico, [1744] 1984) and Condillac in Paris (Condillac, [1756] 1971), almost at the same time, proposed gesture as the first form of language (although for somewhat different reasons), and both saw that the gesturing of the deaf was a mode of truly linguistic expression (see also Danesi, 1993 on Vico and Knowlson, 1965, Siegel, 1969 and Wells, 1987 on Condillac). In this period, in Paris especially, we see the first attempts to analyse such gesturings systematically and to understand their structure. I refer here to the work of the Abbé de l'Épée and his successor Sicard. The latter, especially, was very keen on the idea that gesture could form the basis for a universal language, although it was not long before it was realized that gestural languages, like spoken ones, were subject to historical change and would diverge from one another in much the same way, so that the attempt to develop a universal language in gesture was dropped (see Knowlson, 1965 and Siegel, 1969). Nevertheless, the idea that, somehow, gesture is not only universal but must have been the first form of language has never been given up. In recent years it has been seriously revived, with modern scientific work to support it, by such writers as Gordon Hewes (1973), William Stokoe, David Armstrong and Sherman Wilcox (Armstrong, Stokoe & Wilcox, 1995) and Michael Corballis (2002). Indeed, as we shall see, the twentieth-century revival of the debate about language origins (which began, I think, in the early 1960s) is one of the factors that contributed to the re-emergence of gesture studies in this period.

In the nineteenth century, some of the pioneers of the fields now known as anthropology and psychology took a serious interest in gesture. I think here, especially, of Edward Tylor, who, in his Researches into the early history of mankind, first published in 1865, devoted three long chapters to gesture, including a detailed study of sign language among the deaf. Tylor believed that because of what today would be called the iconic character of gestural expression, we were much better placed, through its study, to understand the processes by which symbolic systems come to be established. Tylor also devotes much space to a comparative study of certain gestures, such as gestures of greeting, and draws from this, as he also does from a consideration of what he called 'picture writing', conclusions that supported his general thesis of the 'psychic unity' of mankind. Wilhelm Wundt, likewise, discussed gesture in the first part of his Völkerpsychologie. He saw it as a link in the process by which the emergence of spoken language could be understood.
Like Tylor, he believed that the analysis of deaf sign languages could bring us insight into the way linguistic signs come about. In his discussion of sign languages, he provided an analysis that in many ways anticipates our modern understanding (see Wundt, 1973).

A third figure from the nineteenth century who is important for us is Garrick Mallery. As a military man involved in campaigns against the Indians in North America, he became extremely interested in their modes of communication. He was eventually assigned to the Smithsonian Institution so that he could devote himself to the study of the sign languages and picture-writing systems of the Plains Indians. His report of 1881, Sign language among North American Indians compared with that among other peoples and deaf mutes, remains to this day one of the broadest treatments of gesture ever written (Mallery, 1972).

A little later, as anthropological observation expanded in Australia and its neighbouring archipelagos, gesture was also an important focus of interest. The sign languages and gesture systems employed in Australia received much attention from several pioneers of Australian anthropology, including Howitt, Roth, Strehlow, and Spencer and Gillen (see Kendon, 1988, Ch. 2). A.C. Haddon, who, together with W.H.R. Rivers, established anthropology at Cambridge and led an anthropological expedition to the Torres Straits Islands in 1898, included the study of gesture as part of the range of phenomena examined. At that time it was considered to be a part of the study of language, broadly conceived (Haddon, 1907).

Another figure from the nineteenth century whom it is important to mention is the Neapolitan cleric and archaeologist Andrea de Jorio. His interest in gesture was not guided by concerns about its universality or the light it might throw on symbolic processes, however. He wrote the treatise that he published in 1832, La mimica degli antichi investigata nel gestire napoletano—or, in English, Gestural expression of the ancients in the light of Neapolitan gesturing—in the first place as a sort of handbook for those who would wish to understand the imagery in the ancient frescoes, mosaics and vase paintings that were becoming known from the excavations of Herculaneum, Pompeii and elsewhere in Southern Italy. De Jorio believed (not wrongly) that there was a high degree of cultural continuity between the common people living in Naples and surrounding communities in his day and the ancient Greco-Roman founders of these cities. Thus he thought that if one understood the gestures of the Neapolitans one could better interpret these ancient images. Whether this is a valid idea or not need not concern us here. For students of gesture, what is of importance is that de Jorio produced, in effect, a remarkable ethnography of Neapolitan communicative practice, especially regarding the use of gesture. As Alain Schnapp (2000:161) has noted, with his work de Jorio "brought into being a new discipline: the anthropology of everyday life." However, this was a discipline for which the nineteenth century was not yet ready, and indeed it is only lately that this aspect of de Jorio's work has been fully recognized (see Kendon, 1995 and Kendon's Introduction in de Jorio, 2000).
At the end of the nineteenth century, then, there was a considerable appreciation of the interest and importance of gesture. However, de Jorio apart, it is notable that the interest of those who today seem the most important—I refer here, again, to Edward Tylor, Garrick Mallery and Wilhelm Wundt—was in the idea of a 'gesture language'. That is, their interest was in gesture as it could function as an autonomous linguistic system. It is for this reason that they took such an interest in the sign languages of the deaf. This interest was motivated, as I have already said, by the idea that insights into 'gesture languages' would give insight into the origins of symbolic processes in general, and so throw light upon the problem of language origins. It is notable that none of the writers I have mentioned—again, leaving de Jorio to one side—were much concerned with the relationship between speech and gesture. This is notwithstanding the fact that much interest in how gesture should be employed in relation to speech in acting and oratory also persisted in the nineteenth century. This interest belonged to a practical tradition, however, and treatises on the 'art of gesture'—one of the most notable being Gilbert Austin's Chironomia of 1802 (see Austin, 1966)—did not take up the philosophical issues that I have mentioned. Earlier, in the eighteenth century, the distinction between works concerning gesture from a philosophical point of view and those that were more practical was less often maintained. Condillac, for example, paid much attention to what was said about gesture in the rhetorical tradition, and the work by the famous German Enlightenment figure, dramatist, novelist, philosopher and theatre director Johann Jakob Engel—his Ideen zu einer Mimik (1785-86)—combined both artistic and philosophical concerns (Fortuna, 2003).

Notwithstanding this philosophical interest in the nature of gesture language and the sophistication in its understanding that had been achieved, there arose a powerful move against its use as a medium for the education of the deaf. The Congress of Educators of the Deaf held in Milan in 1880 passed a resolution banning the use of sign language as a medium of instruction for the deaf. This led to a widespread abandonment of the use of signing in deaf educational contexts (Facchini, 1983). This, without doubt, contributed to the delegitimization of gesture as an object of study that then ensued.

It is certainly striking that, as the twentieth century got underway, after the first decade or so, gesture as an object of academic interest went into eclipse. We no longer find it being given its due place in the works of those who adhered to or followed the principal currents of concern in linguistics, anthropology and psychology. Works on gesture continued to appear, but these do not seem to have fed into the developing concerns of linguists, as they tried to found linguistics as an autonomous discipline; or of anthropologists, who became preoccupied with structural-functional issues in the organization of societies; or, indeed, of psychologists, who, at least in America, under the sway of behaviorism, were mainly interested in studying phenomena that could be accommodated by such apparently elementary processes as the conditioned reflex, the law of effect, and reinforcement. Complex processes like thinking and linguistic expression, and certainly gestural expression, were, on the whole, not then central in psychological research.
An illustration of how gesture studies were regarded during this period in the context of anthropology can be found in a paper by Mervyn Meggitt, a well-known Australian anthropologist. In 1954, as part of his work on the Warlpiri of central Australia, he published in the journal Oceania a short paper on Warlpiri sign language. This was the first paper to appear on any Australian aboriginal sign language for forty years, and another twenty-four years were to pass before the next paper on this topic was to appear. Meggitt appeared to understand very well that the study of a gesture system such as the sign language in use among the Warlpiri lay outside what might be regarded as proper concerns for an anthropologist. He appears to have felt himself compelled, in his paper, to defend his decision to write on the topic against charges that it might be only of "dilettante rather than of scientific interest" (Meggitt, 1954:2).

So what led to the current excitement about gesture? What led to world congresses, and to the willingness of John Benjamins of Amsterdam to venture forth with a scholarly journal on the topic? To understand how this came about, many different factors must be considered. These include the development of audio-visual recording technologies (which make possible the examination of specimens of human communicative action), the development of new understandings about the nature of human communication, developments in the structural analysis of language, and the broadening of the phenomena to which these methods of analysis were to be applied. To this we might also add the development of computers, although their importance for gesture studies is not very direct. One consequence of the development of computers was to suggest ways in which complex mental processes could be modelled, and this contributed, in an important way, to the emergence of an interest in cognitive processes. This, as we shall see, created an environment in which the study of gesture could be seen as having a theoretical importance. Once computers were seen to be feasible, there emerged the idea that there could be machines that could translate from one language to another. To pursue this idea, of course, required a kind of understanding of the structures of languages that might make this possible. The interest in finding such an understanding created a context in which the development of generative approaches to the analysis of language grammars could find support. This, in turn, led to the idea that we could, after all, construct models of complex mental processes, for, under the influence of the generative approach to the analysis of language, emphasis came to be placed on the study of linguistic competence, rather than performance, and this came to be thought of as a component of the human mind (see, for example, Chomsky, 1968). This was later to have important consequences for gesture studies, for it was in relation to the possible light gestures could throw on cognitive processes that one of the more prominent theoretical justifications for the study of gesture came into being.
The revival of serious discussion about language origins was also important for gesture studies. This revival appears to have begun around 1960. One significant date is 1964, because this was the year of publication in Current Anthropology of a paper by Charles Hockett and Robert Ascher on the 'Human Revolution' which included a discussion of language origins. It was a paper that attracted a good deal of attention. In it, Hockett presented his 'design feature' analysis of communication systems (Hockett & Ascher, 1964; an earlier formulation of this idea can be found in Hockett's textbook, A course in modern linguistics, published in 1958). This was an attempt to suggest in a precise way what features a communication system such as a human language has, so that systematic comparisons with other communication systems, such as those found among other animals, would be possible. Hockett proposed that this approach would clarify attempts to understand the evolutionary steps necessary for the emergence of human language. This paper made much use of new findings from ethology. These were showing that animals of all sorts had complex and well-developed systems of communication, and that they made use of vocabularies of significant actions, often highly specialized for communicative purposes. So far as primates were concerned, Hockett and Ascher had only Carpenter's studies of the gibbon to draw upon, and they discussed only gibbon vocalizations. Soon after their paper appeared, however, the results of new field studies of the great apes began to become available. From these it was clear that great apes had a flexibility and a sophistication in communication using gestures and facial and bodily expressions seemingly far greater than their vocal communication capacities. This contributed to the revival of attempts to teach human language to the chimpanzee, but this time using a gestural modality (previously some had tried to teach chimpanzees to speak). The apparent success of these attempts catalysed the linguistic study of sign languages, and this also contributed to the growing revival of the view that gesture is a modality of importance and theoretical interest.

I will now elaborate a little on some of these points. Photography was invented in 1839, and cinematography followed soon after. By 1895 the Lumière brothers, in Paris, had made it possible to see moving pictures of a train arriving in a station, workers leaving a factory, a man squirting water over another with a hose, and so forth. Anthropologists were not slow to employ these new media, and A.C. Haddon, one of the first to do so, included cinematography as one of his instruments in his expedition of 1898 to the Torres Straits Islands. The field use of photography and of cinematography by anthropologists did not really become widespread, however, until after the invention of 35 millimetre film cameras and 16 millimetre cinematography. One of the more important pioneers of the use of both media in anthropology was Gregory Bateson who, in his collaboration in Bali with Margaret Mead, showed the value of visual recording for the analysis of patterns of behaviour in human interaction. In their famous book, Balinese Character (1942), they showed how useful photography could be for documenting the details of how mothers and children interacted, the nature of behaviour in trance, the methods by which the complex motor skills of Balinese dancing were taught, and many other things.
and many other things. Bateson, as I shall explain in a moment, went on to exploit the use of film much further and he was involved in the first attempt at a microanalysis of human interaction incorporating both bodily action and speech in the widely cited but never published work known as The natural history of an interview (McQuown, 1971; for details and historical discussion see LeedsHurwitz, 1987 and Kendon, 1990, Chapter 2). However, before continuing with Bateson, I must mention another figure of great importance for the later development of gesture studies, who also used film in his work, and that is David Efron (1941; 1972). Efron, as is very well known, under the direction of Franz Boas at Columbia University, carried out what must still remain one of the most remarkable studies of gesture ever undertaken. He made comparative analyses of the use of gesture in two different cultural groups— East European Yiddish speaking Jews and Southern Italians, both immigrant groups in Manhattan. He showed that there are striking differences between these groups in how gesture is employed in everyday interaction. He then went on to show that these differences disappeared when the gesturing of assimilated or Americanized descendants of these two groups were compared. Among the techniques that Efron used for gathering material was that of making 16 millimetre film recordings in the streets and cafes in Manhattan. His films, unfortunately, have never been found. There is little doubt, however, that the highly sophisticated analyses he undertook showing cultural differences in gestural styles and, especially, his important analyses of the different ways in which gesture can be employed with speech owe much to the fact that he was able to employ film. With film it is possible to to analyse the gesturing of people while they are talking. Even without sound—almost certainly, Efron’s field films were not sound films—the importance of gesture as a part of the whole activity of producing utterances must surely have become readily apparent to Efron. We may imagine that this aroused his interest in analyzing the way in which gesture and speech are related. In his book we find accounts of the different ways in which gesture may function in relation to spoken discourse, something we do not find in the work of Tylor, Mallery or Wundt, in contrast. In this respect, Efron, can be regarded as the first truly modern student of gesture. Efron was explicitly interested in how gesture and speech are used together. Efron's study remains important to this day. It is amazing in its thoroughness, historical depth and richness of insight and observation. It is a true turning point in the history of gesture studies. Now I return to Gregory Bateson. Bateson had already, in his work in New Guinea (Bateson, 1936), come to see that the coming of age rituals that he had studied were usefully interpreted in terms of communication processes. He saw that it is through the continual mutual modification of actions in interaction that rituals are brought off and it is through this that their functions are realized. So when, at the end of the Second World War, he participated in conferences on the mathematical theory of communication and cybernetics, he saw at once the relevance of these ideas for an understanding of social communication (see Lipset,
In about 1947 he began work as an ethnologist at the Veterans Administration Hospital in Palo Alto and collaborated with psychiatrists such as Jurgen Ruesch, Jay Haley and John Weakland, who were studying the processes of communication in psychotherapy. Bateson became interested in the communication dynamics of families in which one member showed signs of schizophrenia. Together with his colleagues, he developed the 'double bind' theory of schizophrenia (Bateson, Jackson, Haley & Weakland, 1956). In this work he made use of film as an instrument which would allow him to look in detail at just how the family members interacted with one another. The results of these studies made it apparent that, in the interactions that transpired in psychotherapy as well as in the interactions among family members that Bateson's films made it possible to observe, far more than the words uttered by the participants was involved in communication. Tones of voice, modes of hesitation, styles of talking, patterns of intonation, vocal quality, bodily posture, bodily movements of all sorts, glances, and facial expressions were all playing a role in how the interaction proceeded and in how the participants came to understand and to react to one another. When these phenomena were considered from the perspective suggested by concepts derived from information theory and cybernetics, it seemed that a distinction could be drawn in human communication systems between 'digital' and 'analog' communication. This soon led to the notion of 'nonverbal communication,' a term that became popular around 1956. The book by Jurgen Ruesch and the photographer and poet Weldon Kees, published in that year under that title, played an important role in promoting this idea (Ruesch & Kees, 1956). It is to Bateson that we can credit perhaps the most succinct formulation of this notion (Bateson, 1968:614-615). He remarks that "human kinesic communication, facial expression and vocal intonation far exceed anything that any animal is known to produce." He then continues by suggesting that "our iconic (i.e., analogical) communication serves functions totally different from those of language and, indeed, perform functions which verbal language is unsuited to perform." He goes on:
It will be seen that, formulated in this way, it becomes hard to know where, exactly, ‘gesture’ is to be placed. It turns out that, in the period when the idea of ‘nonverbal communication’ first became current, as the quotation from Bateson suggests, the interest was, precisely, in all those aspects of behaviour in social situations which seemed ‘unconscious’ and which seemed somehow ‘revelatory’ in a new way of a participant’s feelings and attitudes. Gesture was largely left out of the research picture. Paradoxical though it may seem, although from about 1950 onwards work expanded greatly on interpersonal communication in which
increasing attention was paid to aspects of behaviour that went well beyond what was expressed in words, this was not the period in which we see the beginning of the revival of gesture studies. For the modern revival of gesture studies we must turn not to research on so-called 'nonverbal communication' but to developments elsewhere, all of which were directly concerned with language. I refer here to the 'cognitive turn' in linguistics, to the revival of serious discussion about language origins, to the teaching of forms of human language to apes, and to the re-discovery of the linguistic character of sign language. Before discussing these matters, however, I must refer to the work of Ray Birdwhistell, who fathered the idea of 'kinesics'. Birdwhistell, trained as an anthropologist at Chicago, became associated with the linguists George Trager and Henry Smith, and with the anthropologist Edward Hall, who were exploring the idea (I think largely under the inspiration of Edward Sapir) that the principles of structural analysis that had been developed for understanding the structure of spoken languages could also be applied to other kinds of cultural codes. Thus, George Trager, with his concept of 'paralanguage', attempted to show the structural units by which patterns of intonation, voicing, non-verbal vocal expressions, and the like could be analyzed (Trager, 1958), and Edward Hall suggested that the way humans employed space in interaction could likewise be analyzed in structural terms, proposing the science of 'proxemics' (Hall, 1966). Birdwhistell took up the idea that human body motion, when looked at from a communicative perspective, could likewise be analysed in terms of structural units analogous to those used in linguistics, and he proposed the idea of a 'kinesics'. Birdwhistell's project for 'kinesics' remained notional and was never fulfilled, yet his fundamental insight has never, as far as I know, been rendered invalid: that visible behaviour or 'body motion' is socially regulated and modulated by social tradition, so that shared kinesic codes ought, in principle, to be demonstrable. Indeed, although an organization of kinesics as Birdwhistell originally conceived of it perhaps cannot be found, it remains the case that there are aspects of the communicative functioning of body motion that do appear to have something like that organization (Birdwhistell, 1970; Kendon & Sigman, 1996). Birdwhistell's importance lies mainly in how he managed to diffuse this insight, and he had a direct influence on a number of workers who were later to become important in modern gesture studies. Thus, Birdwhistell collaborated with Charles Hockett and Norman McQuown, as well as with Gregory Bateson, in the famous Natural history of an interview, which I have already mentioned. His input into this project was especially important, for it was he, above all, who showed the possibility of a systematic examination of the contributions of body motion to the interactions examined in the project. Although this project was never published, it had many consequences. One of these was that, as a result of the insights that Birdwhistell derived from it, several workers were inspired to undertake micro-analyses of the organization of
body-motion in relation to speech. Especially important here is the work of William Condon and William Ogston, who, through a series of truly microscopic analyses using sound-synchronized films, demonstrated the integration of body motion with speech. They showed how, with every change in phonation in the speech stream, corresponding changes could be observed in the bodily movement of the speaker's head, arms, and so forth. They furthermore suggested that there was a complex hierarchical organization to this patterned flow of speech-coordinated movement, so that higher-level units in the speech stream corresponded to higher-level units in the kinesic stream (Condon & Ogston, 1966; 1967). The observations of Birdwhistell, together with the work of Condon and Ogston, were what inspired me to undertake my own detailed analysis of the organization of body-motion and speech. The result of this was the publication, in 1972, of my paper "Some relationships between body-motion and speech", in which I examined hand and arm movements and head movements in a quite gesticulatory Englishman, filmed in a discussion, which included Birdwhistell, that took place in a London pub (Kendon, 1972). From these analyses, and some others that I undertook, I came to the conclusion, expressed in the title of a paper published in 1980, that speech and gesticulation are "two aspects of the process of utterance" (Kendon, 1980). These demonstrations of the integrated nature of speech and body-motion organization were not widely appreciated at the time. Furthermore, neither in linguistics nor in psychology was there a well-articulated theoretical framework to which they could be related. For the theoretical importance of these observations to be appreciated fully, a number of other developments had to take place. This is where the 'cognitive turn' in linguistics and psychology comes in. This, of course, has its own complex history, which I shall not recount here. However, one may recall how, in 1933, Bloomfield, in his famous textbook (Bloomfield, 1933), had argued that linguistics must become an autonomous science. In particular, he sought to free it from any connections with psychology. At the same time, in psychology, behaviorism was in full cry. As mentioned above, this meant that the phenomena of 'mental processes', including language, were largely ignored. However, by the early 1950s it was widely appreciated that the intellectual resources of behaviorism were exhausted. This must be one of the reasons why Noam Chomsky's call for an approach to the study of language that sought to account for it in terms of the mental apparatus that made it possible had such wide appeal. One of Chomsky's more provocative proposals was that a child is endowed from birth with a 'Language Acquisition Device' which allows it to discover the grammar of whichever language it happens to be exposed to. A consequence of this proposal, certainly highly controversial, was a rush of new research in which the utterances of very young children were examined to see what kinds of grammatical structures they might display. As soon as video recordings were used in this research, it became obvious that it would be necessary to understand how
children acquired the ability to engage in actions of semantic significance of any sort, not just words. Gestures thus came to be a focus of interest. Many investigators concluded that action becomes significant as socially communicative action because of the way it is embedded within the structures of the interactional exchanges the mother creates with the infant, and the ability to engage in gesture was seen as an essential part of this process (see Bullowa (Ed.), 1979). Work by Elizabeth Bates and colleagues (1979), as well as that of others, provided support for the view that gesture and speech develop together and that they both develop in relation to the same combination of cognitive capacities. This reinforced the position that gesture and spoken utterance are differentiated manifestations of a more general process. It was at about this time that David McNeill first entered the field. He had already become known for his contributions to the study of the cognitive processes involved in language acquisition, but sometime in the mid-1970s he was much struck by his observation that speakers employed gestures in a way that suggested that the two modes of expression were integrated from the beginning. His early observations, and some theoretical explorations that flowed from them, appear in his Conceptual basis of language of 1979. He elaborated his observations and ideas about gesture in relation to speech in his article in the Psychological Review of 1985, entitled "So you think gestures are nonverbal?" There were further developments in his textbook on psycholinguistics of 1987 (McNeill, 1987), where he made clear the value of studying gesture for the light it might throw on various psycholinguistic problems. The culmination of all this was the publication, in 1992, of Hand and Mind. In all this work it is made clear that speakers use gestures for conceptual expression. The images suggested by gestures refer to the speaker's concepts, so the speaker is thinking in imagery as well as in words. The central theoretical problem has been to account for this. McNeill appears to have made his initial observations on the integration of gesture and speech quite independently of the work, mentioned above, of those few others who had demonstrated it through film analysis. He immediately saw its importance, and already in 1979, and especially in 1985, he was able to offer a theoretical framework for the study of gesture that linked it very clearly to central preoccupations in psycholinguistic theory. This framework received its fullest elaboration in Hand and Mind, and the book immediately received serious attention in the worlds of linguistics, psycholinguistics, and cognitive studies. Work on gesture had never before, in this manner, entered the arena where the leading problems in these fields were being discussed. This is undoubtedly due to the book's own merits as a remarkably deep study of certain very widespread phenomena of gesture use. As with everything else, however, in order to account fully for the success with which its appearance was greeted, we have to take into account a wider context. I have already discussed the 'cognitive turn' in linguistics and psychology, without which, I think, Hand and Mind would not have found a wide audience.
However, for other reasons 'gesture' as a topic of study had already begun to look interesting and therefore respectable. As I have noted, the question of language origins was again under serious discussion. This was given further impetus when Gordon Hewes published his contribution in 1973, in which he made a powerful case for supposing that gesture would have been the first form of language. He showed how wide a range of material was available that could bear on the problem of language origins. By emphasizing the role of gesture, Hewes brought it once again to the attention of a wide range of people as a phenomenon that could have central theoretical importance. Very important for Hewes was the work of Beatrice and Allen Gardner (Gardner & Gardner, 1969). They had reported what to many seemed surprising success in teaching a version of American Sign Language to Washoe, a young female chimpanzee. This work caused a great stir, for it threatened to bridge what is perhaps the most important of the Rubicons that are supposed to separate humans from animals. It therefore became a matter of urgency to evaluate the nature of the sign language that the Gardners had allegedly taught to Washoe. This challenge was taken up by Ursula Bellugi. She began by trying to compare Washoe's acquisition of sign language with language acquisition in humans. She realized, however, that for this to be valid, the comparison ought to be with deaf children acquiring sign language. However, almost nothing was known about this; indeed, almost nothing seemed to be known about sign language itself. And so it was that a major project on the study of sign language was initiated, which was soon to trigger the extraordinary expansion in sign language studies that then followed (Bronowski & Bellugi, 1970; Bellugi, 1981). It is remarkable that this only took place from 1970 onwards, fully ten years after William Stokoe had shown that sign language was a linguistic system in its own right with its own properties (Stokoe, 1960). At the time of Stokoe's work, the wider theoretical implications had not been appreciated. It seems that the question of the linguistic status of sign language had to be linked to fundamental questions about the biological roots of language if its serious study was to gain wide support and attention. All of these different factors, then, have worked to create the climate that we have today, in which the study of gesture is once again widely seen as being of great relevance for an understanding of language—and thus the study of gesture is now placed close to the core of one of the central preoccupations of our age. It is striking that the link between language and gesture, re-affirmed in the recent period most forcefully by McNeill, was perceived as something new. In the seventeenth and eighteenth centuries there seems never to have been any question but that 'gesture' was a part of language. The separation of gesture from language began in the late nineteenth century. It was in this period that the conception of language as exclusively verbal finally triumphed. Indeed, by implication, the view seems to have been that the only truly legitimate form of language was language in its written form. As Roy Harris (1980), among others, has shown, written language became the model for language. This greatly assisted in promoting the
idea of language as an autonomous system, complete in itself, whose structures could not only be laid bare, but which could serve as a complete vehicle of thought. I believe that the mechanization of printing technology, perfected in the nineteenth century, which allowed the mass production of books and newspapers, had much to do with this development. The concept of 'nonverbal communication' was surely one of the consequences of this ideology of language. This unfortunate concept, which divides communication into words and everything else, decrees that anything that is 'not verbal' must have some other kind of function, and this, almost by definition, could not be the same as the functions of words. As we saw, this view left gesture in the lurch and it was, as a result, neglected as a topic of study. If it was considered at all, it was rather awkwardly thought of as somehow part of 'nonverbal communication'. It is against this background that we must understand McNeill's (1985) title: "So you think gestures are nonverbal?" It is not that gestures are verbal. It is, rather, that gestures, like verbal expressions, may be vehicles for the expression of thoughts and so participate in the tasks of language. One further reflection. It seems to me that it was this division between the 'verbal' and the 'non-verbal' that made gestures mysterious. This is strange, since in ordinary life gestures are rarely mysterious. For some it seemed impossible to imagine what role they could have. The only explanation seemed to be that they somehow helped in the organization of the speaker's verbal formulations. Close observation of how speakers use gesture makes it difficult to see how this can be their main raison d'être. They appear, rather, to be partners with speech, creating, with words, the speaker's final expressions (see Kendon, 2004). To articulate the respective contributions that gesture and speech make to these final expressions and, as a consequence, to clarify just in what sense gestures are indeed not 'nonverbal' is, I think, one of the main tasks that gesture studies confronts today. And it is David McNeill's contribution that has been in large part responsible for setting this firmly at the centre of the agenda.
References
Armstrong, D. F., Stokoe, W. C., & Wilcox, S. E. (1995). Gesture and the nature of language. Cambridge: Cambridge University Press.
Austin, G. (1966 [1802]). Chironomia or, a treatise on rhetorical delivery. Edited with a Critical Introduction by Mary Margaret Robb and Lester Thonssen. Carbondale and Edwardsville: Southern Illinois University Press.
Bates, E. (1979). The emergence of symbols. New York: Academic Press.
Bateson, G. (1936). Naven: A survey of the problems suggested by a composite picture of the culture of a New Guinea tribe drawn from three points of view. Cambridge: Cambridge University Press.
Bateson, G. (1968). Redundancy and coding. In T. A. Sebeok (Ed.), Animal communication: Techniques of study and results of research (pp.614-626). Bloomington: Indiana University Press.
Bateson, G., & Mead, M. (1942). Balinese character: A photographic analysis. In W. G. Valentine (Ed.), Special Publications of the New York Academy of Sciences, Vol. II. New York: New York Academy of Sciences.
Bateson, G., Jackson, D., Haley, J., & Weakland, J. H. (1956). Toward a theory of schizophrenia. Behavioral Science, 1, 251-264.
Bellugi, U. (1981). The acquisition of a spatial language. In F. S. Kessel (Ed.), The development of language and language researchers: Essays in honor of Roger Brown (pp.153–185). Hillsdale, New Jersey: Lawrence Erlbaum.
Benzoni, G. (1970). Bonifacio, Giovanni. In Dizionario biografico degli Italiani. Rome: Enciclopedia Italiana Treccani.
Birdwhistell, R. L. (1970). Kinesics and context: Essays in body motion communication. Barton Jones (Ed.). Philadelphia: University of Pennsylvania Press.
Bloomfield, L. (1933). Language. New York: Henry Holt.
Bronowski, J., & Bellugi, U. (1970). Language, name, and concept. Science, 168, 669–673.
Bullowa, M. (Ed.) (1979). Before speech. Cambridge: Cambridge University Press.
Bulwer, J. (1974 [1644]). Chirologia or the natural language of the hand, etc. [and] Chironomia or the art of manual rhetoric, etc. Edited with an Introduction by James W. Cleary. Carbondale and Edwardsville, Illinois: Southern Illinois University Press.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton & Co.
Chomsky, N. (1968). Language and mind. New York: Harcourt Brace Jovanovich.
Condillac, Étienne Bonnot de (1971). An essay on the origin of human knowledge (1756). Facsimile reproduction of the translation of Thomas Nugent, with an introduction by Robert G. Weyant. Delmar, New York: Scholars' Facsimiles and Reprints.
Condon, W. S., & Ogston, W. D. (1966). Sound film analysis of normal and pathological behavior patterns. Journal of Nervous and Mental Disease, 143, 338-347.
Condon, W. S., & Ogston, W. D. (1967). A segmentation of behavior. Journal of Psychiatric Research, 5, 221-235.
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Danesi, M. (1993). Vico, metaphor and the origin of language. Bloomington: Indiana University Press.
De Jorio, A. (2000). Gesture in Naples and gesture in classical antiquity. A translation of La mimica degli antichi investigata nel gestire napoletano [1832], with an Introduction and Notes, by Adam Kendon. Bloomington: Indiana University Press.
Dutsch, D. (2002). Towards a grammar of gesture: An analysis of Quintilian's Institutio Oratoria, 11, 85-124. Gesture, 2, 265-287.
Efron, D. (1972). Gesture, race and culture. Preface by Paul Ekman. [Re-issue of Gesture and environment, originally published 1941.] The Hague: Mouton and Co.
Ekman, P., & Friesen, W. (1969). The repertoire of nonverbal behavior: Categories, origins, usage and coding. Semiotica, 1, 49-98.
Facchini, M. (1983). An historical reconstruction of the events leading to the Congress of Milan in 1880. In W. C. Stokoe & V. Volterra (Eds.), SLR '83: Sign language research (pp.356-362). Roma: Istituto di Psicologia del Consiglio Nazionale delle Ricerche; Silver Spring, Maryland: Linstok Press.
Fortuna, S. (2003). Gestural expression, perception and language: A discussion of the ideas of Johan Jakob Engel. Gesture, 3, 95-124.
Gardner, R. A., & Gardner, B. T. (1969). Teaching sign language to a chimpanzee. Science, 165, 664–672.
Haddon, A. C. (Ed.) (1907). Reports of the Cambridge anthropological expedition to the Torres Straits. Volume III, Linguistics, S. H. Ray (Ed.). Cambridge: Cambridge University Press.
Hall, E. T. (1966). The hidden dimension. Garden City, New York: Doubleday.
Harris, R. (1980). The language makers. Ithaca, NY: Cornell University Press.
Harris, R. (1987). The language machine. Ithaca, NY: Cornell University Press.
Hewes, G. W. (1973). Primate communication and the gestural origin of language. Current Anthropology, 14, 5-24.
Hockett, C. F., & Ascher, R. (1964). The human revolution. Current Anthropology, 5, 135–168.
Kendon, A. (1972). Some relationships between body motion and speech. An analysis of an example. In A. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp.177-210). Elmsford, New York: Pergamon Press.
Kendon, A. (1980). Gesticulation and speech: two aspects of the process of utterance. In M. R. Key (Ed.), The relationship of verbal and nonverbal communication (pp.207-227). The Hague: Mouton and Co.
Kendon, A. (1988). Sign languages of aboriginal Australia: Cultural, semiotic and communicative perspectives. Cambridge: Cambridge University Press.
Kendon, A. (1990). Conducting interaction: Patterns of behavior in focused encounters. Cambridge: Cambridge University Press.
Kendon, A. (1995). Andrea de Jorio - the first ethnographer of gesture? Visual Anthropology, 7, 375-394.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.
Kendon, A., & Sigman, S. J. (1996). Ray L. Birdwhistell (1918-1994). Semiotica, 112, 231–261.
Knowlson, J. R. (1965). The idea of gesture as a universal language in the 17th and 18th centuries. Journal of the History of Ideas, 26, 495–508.
Knox, D. (1990). Late medieval and renaissance ideas on gesture. In V. Kapp (Ed.), Die Sprache der Zeichen und Bilder. Rhetorik und nonverbale Kommunikation in der frühen Neuzeit (pp.11-39). Marburg, Germany: Hitzeroth.
Knox, D. (1996). Giovanni Bonifacio's L'arte de' cenni and Renaissance ideas of gesture. In M. Tavoni & others (Eds.), Italia ed Europa nella linguistica del rinascimento. Confronti e relazioni, atti del convegno internazionale, Ferrara, 20-24 marzo 1991, Vol. 2 (pp.379-400). Ferrara: Franco Cosimo Panini.
Leeds-Hurwitz, W. (1987). The social history of "A Natural History of an Interview": A multidisciplinary investigation of social communication. Research on Language and Social Interaction, 20, 1-51.
Lipset, D. (1980). Gregory Bateson: The legacy of a great scientist. Englewood Cliffs, NJ: Prentice Hall.
Mallery, G. (1972 [1881]). Sign language among North American Indians compared with that among other peoples and deaf-mutes. Photomechanic reprint of the 1881 Smithsonian Report. The Hague: Mouton.
McNeill, D. (1979). The conceptual basis of language. Hillsdale, NJ: Erlbaum.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350–371.
McNeill, D. (1987). Psycholinguistics: A new approach. New York: Harper and Row.
McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.
McQuown, N. A. (Ed.) (1971). The natural history of an interview. Microfilm collection of manuscripts on cultural anthropology, 15th Series. University of Chicago, Joseph Regenstein Library, Department of Photoduplication, Chicago, Illinois.
Meggitt, M. (1954). Sign language among the Walbiri of Central Australia. Oceania, 25, 2–16.
Quintilianus, M. F. (1922). The institutio oratoria of Quintilian with an English translation by H. E. Butler. The Loeb Classical Library. New York: G. P. Putnam and Sons. [Gesture is discussed in Volume IV, Book XI, part III.]
Ruesch, J., & Kees, W. (1956). Nonverbal communication: Notes on the visual perception of human relations. Berkeley, California: University of California Press.
Schnapp, A. (2000). Antiquarian studies in Naples at the end of the eighteenth century: From comparative archaeology to comparative religion. In G. Imbruglia (Ed.), Naples in the eighteenth century: The birth and death of a nation state (pp.154-166). Cambridge: Cambridge University Press.
Siegel, J. P. (1969). The Enlightenment and the evolution of a language of signs in France and England. Journal of the History of Ideas, 30, 96–115.
Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the American Deaf. Studies in Linguistics, Occasional Papers No. 8. Buffalo, New York: Department of Anthropology and Linguistics, University of Buffalo.
Trager, G. L. (1958). Paralanguage: A first approximation. Studies in Linguistics, 13, 1-12.
Vico, G. (1984). The new science of Giambattista Vico. Unabridged translation of the third edition (1744) with the addition of the "Practice of the new science." T. G. Bergin & Max H. Fisch, translators. Ithaca, New York: Cornell University Press.
Wells, G. A. (1987). The origin of language: Aspects of the discussion from Condillac to Wundt. La Salle, Illinois: Open Court.
Wollock, J. (2002). John Bulwer (1606-1656) and the significance of gesture in 17th-century theories of language and cognition. Gesture, 2, 233-264.
Wundt, W. (1973). The language of gestures. Translated by J. S. Thayer, C. M. Greenleaf & M. D. Silberman from Völkerpsychologie, etc., First Volume, Fourth Edition, First Part, Chapter 2 (Stuttgart: Alfred Kröner Verlag, 1921). The Hague: Mouton.
Language and Cognition
Gesture with Speech and Without It*
Susan Goldin-Meadow
University of Chicago
Gesture is versatile in form and function. When gesture assumes the full burden of communication, acting on its own without speech, it takes on a language-like form. Even children who have never been exposed to a model for language produce gestures that display linguistic structure when their hands are their sole means of communication. But when gesture shares the burden of communication with speech, it loses its language-like structure, assuming instead a global and synthetic form. Although not language-like in structure when it accompanies speech, gesture still constitutes an important part of language. It conveys information imagistically and, as such, gives speakers a means to convey thoughts that they cannot express in words, and a mechanism for changing those thoughts. Gesture can be part of language or can itself be language, and thus provides insight not only into the content of communication but also its form.
1. Introduction
I came to the University of Chicago as a young assistant professor armed with a phenomenon. I was studying deaf children who were unable to learn spoken language and had not been exposed to sign language. You might think children in this circumstance would be unable to communicate, but, in fact, they communicate quite well. Take, for example, a profoundly deaf child who knows neither American Sign Language nor English. The child is shown a picture of a shovel stuck in sand and uses his hands to comment not on this particular shovel but on shovels used in the snow: He gestures, "dig," points at a picture of a shovel, gestures, "pull-on-boots," points outside, points downstairs, points at the shovel picture, gestures, "dig," and gestures, "pull-on-boots." The child has managed to convey several propositions about the snow shovel––how it's used (to dig), when it's used (when boots are worn), where it's used (outside), and where it's kept (downstairs).
* A version of this paper appeared in the Proceedings of the Thirty-Second Annual Meeting of the Berkeley Linguistics Society, E. Anti, M. Babel, C. Chang, J. Hong, M. Houser, F.-C. Liu, M. Toosarvandani & Y. Yao (Eds.), Berkeley, CA: Berkeley Linguistics Society. This research was supported by grants from the National Science Foundation (BNS 8810879), the National Institute on Deafness and Other Communication Disorders (R01 DC00491), the National Institute of Child Health and Human Development (R01 HD47450 and P01 HD40605), and the Spencer Foundation.
What is most striking about this communicative act is that it is communication in which the hands play an essential role. Surprisingly, the fact that this was a gesture system wasn't very important to me when I arrived at Chicago. But it became important, in part because David McNeill was my colleague. Over the course of the 30 years that we have been together at the University of Chicago, David has, I think unknowingly (because David is never pushy), helped shape the course of my research. He has done it by being thoughtful about language and gesture, and always being generous about sharing those thoughts; by serving on the dissertation committees of every single student that I have ever advised at the University of Chicago, and by co-teaching a yearly seminar on gesture for all of our students; and by doing a joint research project that bridged our work and, for me, contextualized the phenomenon of gesture-creation in deaf children. I begin this chapter, written in David's honor, with the phenomenon that I brought to Chicago––gesture without speech. What I think is striking about this phenomenon is that the deaf children are using their hands to take over the burden of communication and that their hands then take on the form of language. This phenomenon stands in contrast to the gestures that David has studied––gestures produced along with speech. These gestures share the burden of communication with speech and, as David has repeatedly shown, are not language-like in form. Thus, I begin by describing gesture when it is not accompanied by speech, and end with the work that my students (all of whom have been advised by David) and I have done on gesture that does accompany speech. In between, I describe the bridging study that David and I did together that, for me, links these two parts of my research life.
2. Gesture without Speech Takes over the Forms and Functions of Language
2.1 Background on deafness and language-learning
When deaf children are exposed to sign language from birth, they learn that language as naturally as hearing children learn spoken language (Newport & Meier, 1985). However, 90% of deaf children are not born to deaf parents who could provide early access to sign language. Rather, they are born to hearing parents who, quite naturally, expose their children to speech. Unfortunately, it is extremely uncommon for deaf children with severe to profound hearing losses to acquire spoken language without intensive and specialized instruction. Even with instruction, their acquisition of speech is markedly delayed (Conrad, 1979; Mayberry, 1992). The ten children I studied were severely to profoundly deaf (Goldin-Meadow, 2003a). Their hearing parents had decided to educate them in oral schools where sign systems are neither taught nor encouraged. At the time of my observations, the children ranged in age from 1;2 to 4;10 (years;months) and had
made little progress in oral language, occasionally producing single words but never combining those words into sentences. In addition, they had not been exposed to a conventional sign system of any sort (e.g., American Sign Language or a manual code of English). The children thus knew neither sign nor speech. Under such inopportune circumstances, these deaf children might be expected to fail to communicate, or perhaps to communicate only in non-symbolic ways. The impetus for symbolic communication might be thought to require a language model, which all of these children lacked. However, this turns out not to be the case. Many studies have shown that deaf children will spontaneously use gestures––called 'homesigns'––to communicate if they are not exposed to a conventional sign language (Fant, 1972; Lenneberg, 1964; Moores, 1974; Tervoort, 1961). The child described earlier is an excellent example. Children who use gesture in this way are clearly communicating. The focus of my work has been to isolate the particular constructions that the children introduced into their gesture systems. These properties of language that a child can fashion even without benefit of linguistic input are what I call the 'resilient' properties of language (Goldin-Meadow, 2003a).
2.2 The resilient properties of language
Table 1 lists the resilient properties of language that we have found thus far in the ten deaf children's gesture systems (Goldin-Meadow, 2003a). There may, of course, be many others––just because we haven't found a particular property in a deaf child's homesign gesture system doesn't mean it's not there. The table lists properties at the word- and sentence-levels, as well as properties of language use, and details how each property is instantiated in the deaf children's gesture systems.
2.2.1 Words
The deaf children's gesture words have five properties that are found in all natural languages. The gestures are stable in form, although they needn't be. It would be easy for the children to make up a new gesture to fit every new situation (and, indeed, that appears to be what hearing speakers do when they gesture along with their speech, cf. McNeill, 1992). But that's not what the deaf children do. They develop a stable store of forms that they use in a range of situations––they develop a lexicon, an essential component of all languages (Goldin-Meadow, Butcher, Mylander & Dodge, 1994). Moreover, the gestures the children develop are composed of parts that form paradigms, or systems of contrasts. When the children invent a gesture form, they do so with two goals in mind––the form must not only capture the meaning they intend (a gesture-world relation), but it must also contrast in a systematic way with other forms in their repertoire (a gesture-gesture relation). In addition, the parts that form these paradigms are categorical. For example, one child used a 'Fist' handshape to represent grasping a balloon string, a drumstick, and
handlebars––grasping actions requiring considerable variety in diameter in the real world. The child did not distinguish objects of varying diameters within the 'Fist' category, but did use his handshapes to distinguish objects with small diameters as a set from objects with large diameters (e.g., a cup, a guitar neck, the length of a straw), which were represented by a 'C-large' hand. The manual modality can easily support a system of analog representation, with hands and motions reflecting precisely the positions and trajectories used to act on objects in the real world. But the children don't choose this route. They develop categories of meanings that, although essentially iconic, have hints of arbitrariness about them (the children don't, for example, all have the same form-meaning pairings for handshapes; Goldin-Meadow, Mylander & Butcher, 1995).
Table 1. The resilient properties of language
Words
Stability: Gesture forms are stable and do not change capriciously with changing situations.
Paradigms: Gestures consist of smaller parts that can be combined to produce new gestures with different meanings.
Categories: The parts of gestures are composed of a limited set of forms, each associated with a particular meaning.
Arbitrariness: Pairings between gesture forms and meanings can have arbitrary aspects, albeit within an iconic framework.
Grammatical Functions: Gestures are differentiated by the noun, verb, and adjective grammatical functions they serve.

Sentences
Underlying Frames: Predicate frames underlie gesture sentences.
Deletion: Consistent production and deletion of gestures within a sentence mark particular thematic roles.
Word Order: Consistent orderings of gestures within a sentence mark particular thematic roles.
Inflections: Consistent inflections on gestures mark particular thematic roles.
Recursion: Complex gesture sentences are created by recursion.
Redundancy Reduction: Redundancy is systematically reduced in the surface of complex gesture sentences.

Language Use
Here-and-Now Talk: Gesturing is used to make requests, comments, and queries about the present.
Displaced Talk: Gesturing is used to communicate about the past, future, and hypothetical.
Self-Talk: Gesturing is used to communicate with oneself.
Meta-language: Gesturing is used to refer to one's own and others' gestures.
Generics: Gesturing is used to make generic statements, particularly about animals.
Narrative: Gesturing is used to tell stories about self and others.
Finally, the gestures the children develop are differentiated by grammatical function. Some serve as nouns, some as verbs, some as adjectives. As in natural languages, when the same gesture is used for more than one grammatical function, that gesture is marked (morphologically and syntactically) according to the function it plays in the particular sentence (Goldin-Meadow et al., 1994). For example, if a child were to use a twisting gesture in a verb role, that gesture would likely be produced near the jar to be twisted open (and is, in this sense, marked or inflected), it would not be abbreviated (produced with several twists), and it would be produced after a pointing gesture at the jar. In contrast, if the child were to use the twisting gesture in a noun role, the gesture would likely be produced in neutral position near the chest (in an unmarked or uninflected space), it would be abbreviated (produced with one twist rather than several), and it would occur before the pointing gesture at the jar.
2.2.2 Sentences
The deaf children's gesture sentences have six properties found in all natural languages. Underlying each sentence is a 'predicate frame' that determines how many arguments can appear along with the verb in the surface structure of that sentence (Goldin-Meadow, 1985). For example, four slots underlie a gesture sentence about transferring an object, one for the verb and three for the arguments (actor, patient, recipient). In contrast, three slots underlie a gesture sentence about eating an object, one for the verb and two for the arguments (actor, patient). Moreover, the arguments of each sentence are marked according to the thematic role they play. There are three types of markings that are resilient (Goldin-Meadow & Mylander, 1984; Goldin-Meadow et al., 1994). (1) Deletion––The children consistently produce and delete gestures for arguments as a function of thematic role; for example, they are more likely to delete a gesture for the object or person playing the role of transitive actor (soldier in "soldier beats drum") than they are to delete a gesture for an object or person playing the role of intransitive actor (soldier in "soldier marches to wall") or patient (drum in "soldier beats drum"). (2) Word order––The children consistently order gestures for arguments as a function of thematic role; for example, they place gestures for intransitive actors and patients in the first position of their two-gesture sentences (soldier-march; drum-beat). (3) Inflection––The children mark with inflections gestures for arguments as a function of thematic role; for example, they displace a verb gesture in a sentence toward the object that is playing the patient role in that sentence (the "beat" gesture would be articulated near, but not on, a drum). In addition, recursion, which gives natural languages their generative capacity, is a resilient property of language. The children form complex gesture sentences out of simple ones (Goldin-Meadow, 1982). For example, one child pointed at me, produced a "wave" gesture, pointed again at me, and then produced a "close" gesture to comment on the fact that I had waved before closing the
door––a complex sentence containing two propositions: "Susan waves" (proposition 1) and "Susan closes door" (proposition 2). The children systematically combine the predicate frames underlying each simple sentence, following principles of sentential and phrasal conjunction. When there are semantic elements that appear in both propositions of a complex sentence, the children have a systematic way of reducing redundancy, as do all natural languages (Goldin-Meadow, 1982; 1987).
2.2.3 Language use
The deaf children use their gestures for five central functions that all natural languages serve. They use gesture to make requests, comments, and queries about things and events that are happening in the situation––that is, to communicate about the here-and-now. Importantly, however, they also use their gestures to communicate about the non-present––displaced objects and events that take place in the past, the future, or in a hypothetical world (Butcher, Mylander & Goldin-Meadow, 1991; Morford & Goldin-Meadow, 1997). In addition to these rather obvious functions that language serves, the children use their gestures to communicate with themselves––to self-talk (Goldin-Meadow, 2003a)––and to refer to their own or to others' gestures––for metalinguistic purposes (Singleton, Morford & Goldin-Meadow, 1993). They also use their gestures to make generic statements (Goldin-Meadow, Gelman & Mylander, 2005). And finally, the children use their gestures to tell stories about themselves and others––to narrate (Phillips, Goldin-Meadow & Miller, 2001). They tell stories about events they or others have experienced in the past, events they hope will occur in the future, and events that are flights of imagination. For example, in response to a picture of a car, one child produced a "break" gesture, an "away" gesture, a pointing gesture at his father, and a "car-goes-onto-truck" gesture. He paused and produced a "crash" gesture and repeated the "away" gesture. The child was telling us that his father's car had crashed, broken, and gone onto a tow truck. Note that, in addition to producing gestures to describe the event itself, the child produced what we have called a narrative marker––the "away" gesture, which marks a piece of gestural discourse as a narrative in the same way that "once upon a time" is often used to signal a story in spoken discourse.
2.3 Using the spontaneous gestures of speakers as input
The deaf children I study are not exposed to a conventional sign language and thus cannot be fashioning their gestures after such a system. They are, however, exposed to the gestures that their hearing parents use when they speak. These gestures are likely to serve as relevant input to the gesture systems that the deaf children construct. The question is what this input looks like and how the children use it.
We first ask whether the gestures that the hearing parents use with their deaf children exhibit the same structure as their children's gestures. If so, these gestures could serve as a model for the deaf children's system. If not, we have an opportunity to observe how the children transform the input they do receive into a system of communication that has many of the properties of language.
2.3.1 The hearing parents' gestures are not structured like their deaf children's
Hearing parents gesture when they talk to young children (Bekken, 1989; Shatz, 1982; Iverson, Capirci, Longobardi & Caselli, 1999), and the hearing parents of our deaf children are no exception. The deaf children's parents were committed to teaching them to talk and therefore talked to their children as often as they could. And when they talked, they gestured. We looked at the gestures that the hearing mothers produced when talking to their deaf children. However, we looked at them not as they were meant to be looked at, but as a deaf child might look at them. We turned off the sound and analyzed the gestures using the same analytic tools that we used to describe the deaf children's gestures (Goldin-Meadow & Mylander, 1983; 1984). We found that the hearing mothers' gestures do not have structure when looked at from a deaf child's point of view. Going down the list of resilient properties displayed in Table 1, we find no evidence of structure at any point in the mothers' gestures. With respect to gestural words, the mothers did not have a stable lexicon of gestures (Goldin-Meadow et al., 1994); nor were their gestures composed of categorical parts that either formed paradigms (Goldin-Meadow et al., 1995) or varied with grammatical function (Goldin-Meadow et al., 1994). With respect to gestural sentences, the mothers rarely concatenated their gestures into strings and thus provided little data from which we (or their deaf children, for that matter) could abstract predicate frames or any of the three marking patterns (deletion, word order, or inflection; Goldin-Meadow & Mylander, 1984). Whereas all of the children produce complex sentences displaying recursion, only some of the mothers produced complex gesture sentences, and they first produced sentences of this type after their children (Goldin-Meadow, 1982). With respect to gestural use, the mothers did not make displaced reference with their gestures (Butcher et al., 1991), nor did we find evidence of any of the other uses to which the children put their gestures, including story-telling (e.g., Phillips et al., 2001). Of course, it may be necessary for the deaf children to see hearing people gesturing in communicative situations in order to get the idea that gesture can be appropriated for the purposes of communication. However, in terms of how the children structure their gestured communications, there is no evidence that this structure comes from the children's hearing mothers. Thus, although the deaf children may be using hearing people's gestures as a starting point, they go well beyond that point––transforming the gestures they see into a system that looks very much like language.
2.3.2 How to study the deaf child's transformation of gesture into home sign: A cross-cultural approach
How can we learn more about this process of transformation? The fact that hearing speakers across the globe gesture differently when they speak affords us an excellent opportunity to explore whether––and how––deaf children make use of the gestural input that their hearing parents provide. For example, the gestures that accompany Spanish and Turkish look very different from those that accompany English and Mandarin. As described by Talmy (1985), Spanish and Turkish are verb-framed languages, whereas English and Mandarin are satellite-framed languages. This distinction depends primarily on the way in which the path of a motion is packaged. In a satellite-framed language, path is encoded outside of the verb (e.g., "down" in the sentence, "he flew down") and manner is encoded in the verb itself ("flew"). In contrast, in a verb-framed language, path is bundled into the verb (e.g., "sale" in the Spanish sentence, "sale volando" = exits flying) and manner is outside of the verb ("volando"). One effect of this typological difference is that manner is often omitted from Spanish sentences (Slobin, 1996). However, David (McNeill, 1998) has observed an interesting compensation: although manner is omitted from Spanish-speakers' talk, it frequently crops up in their gestures. Moreover, and likely because Spanish-speakers' manner gestures do not co-occur with a particular manner word, their gestures tend to spread through multiple clauses (McNeill, 1998). As a result, Spanish-speakers' manner gestures are longer and may be more salient to a deaf child than the manner gestures of English- or Mandarin-speakers. Turkish-speakers also produce gestures for manner relatively frequently. In fact, Turkish-speakers commonly produce gestures that convey only manner (e.g., fingers wiggling in place = manner alone vs. fingers wiggling as the hand moves forward = manner + path; Ozyurek & Kita, 1999; Kita, 2000). Manner-only gestures are rare in English- and Mandarin-speakers. These four cultures––Spanish, Turkish, American, and Chinese––thus offer an excellent opportunity to examine the effects of hearing speakers' gestures on the gesture systems developed by deaf children. Our plan in future work is to take advantage of this opportunity. If deaf children in all four cultures develop gesture systems with the same structure despite wide differences in the gestures they see, we will have strong evidence of the biases children themselves must bring to a communication situation. If, however, the children differ in the gesture systems they construct, we will be able to explore how a child's construction of a language-like gesture system can be influenced by the gestures he or she sees. We have already found that American deaf children exposed only to the gestures of their hearing English-speaking parents create gesture systems that are very similar in structure to the gesture systems constructed by Chinese deaf children exposed to the gestures of their hearing Mandarin-speaking parents (Goldin-Meadow et al., 2005; Goldin-Meadow & Mylander, 1998; Goldin-Meadow, Mylander & Franklin, in press; Phillips et al., 2001; Zheng & Goldin-Meadow, 2002). The
question now is whether these children's gesture systems are different from those of Spanish and Turkish deaf children of hearing parents.
3. An Experimental Manipulation of Gesture with and without Speech
The hearing mothers of each of the deaf children in our studies are committed to teaching their children to speak. As a result, they never gesture without talking. And, like all speakers' gestures, the gestures that the hearing mothers produce form an integrated system with the speech they accompany (McNeill, 1992). The mothers' gestures are thus constrained by speech and are not "free" to take on the resilient properties of language found in their children's gestures. The obvious question is what would happen if we forced the mothers to keep their mouths shut. David and I, in collaboration with Jenny Singleton, did just that––although the participants in our study were undergraduates at the University of Chicago, not the deaf children's hearing mothers (Goldin-Meadow, McNeill & Singleton, 1996). We asked English-speakers who had no previous experience with sign language to describe a series of videotaped scenes using their hands and not their mouths. We then compared the resulting gestures to the gestures these same adults produced when asked to describe the scenes using speech. We found that, when using gesture on its own, the adults frequently combined their gestures into strings and those strings were reliably ordered, with gestures for certain semantic elements occurring in particular positions in the string; that is, there was structure across the gestures at the sentence level. In addition, the verb-like action gestures that the adults produced when using gesture on its own could be divided into handshape and motion parts, with the handshape of the action frequently conveying information about the objects in its semantic frame; that is, there was structure within the gesture at the word level. Neither of these properties appeared in the gestures that these same adults produced along with speech. Thus, only when asked to use gesture on its own did the adults produce gestures characterized by segmentation and combination. Moreover, they constructed these gesture combinations with essentially no time for reflection on what might be fundamental to language-like communication. The adults might have gotten the inspiration to order their gestures from their own English language. However, the particular order that they used in their gestures did not follow canonical English word order. For example, adults were asked to describe a doughnut-shaped object that arcs out of an ashtray. When using gesture without speech, the adults produced a gesture for the ashtray first, followed by a gesture for the doughnut, and finally a gesture for the arcing-out action (Goldin-Meadow et al., 1996; Gershkoff-Stowe & Goldin-Meadow, 2002). Note that a typical description of this scene in English would follow a different order: "The doughnut arcs out of the ashtray." The adults not only displayed a non-English ordering pattern but they also displayed a non-English deletion pattern when using gesture on its own. Moreover, the deletion pattern resembled
the pattern found in the deaf children's gestures (Goldin-Meadow, Yalabik & Gershkoff-Stowe, 2000). Although the adults incorporated many linguistic properties into the gestures they produced when using gesture on its own, they did not develop all of the properties found in natural language, or even all of the properties found in the gesture systems of the deaf children. In particular, they failed to develop a system of internal contrasts in their gestures. When incorporating handshape information into their action gestures, they rarely used a consistent handshape to represent a given object, unlike the deaf child whose handshapes for the same objects were consistent in form and in meaning (Singleton, Morford & Goldin-Meadow, 1993). Thus, a system of contrasts in which the form of a symbol is constrained by its relation to other symbols in the system (as well as by its relation to its intended referent) is not an immediate consequence of symbolically communicating information to another. The continued experience that the deaf children had with a stable set of gestures (cf. Goldin-Meadow et al., 1994) may be required for a system of contrasts to emerge in those gestures. In sum, when gesture is called upon to fulfill the communicative functions of speech, it immediately takes on the properties of segmentation and combination that are characteristic of speech. The appearance of these properties in the adults' gestures is particularly striking given that these properties were not found in the gestures that these same adults produced when asked to describe the scenes in speech. When the adults produced gestures along with speech, they rarely combined those gestures into strings and rarely used the shape of the hand to convey any object information at all (Goldin-Meadow et al., 1996). In other words, they did not use their gestures as building blocks for larger units, either sentence or word units. Rather, they used their gestures to holistically and mimetically depict the scenes in the videotapes, as speakers typically do when they spontaneously gesture along with their talk––the topic to which we now turn.
4. Gesture with Speech Reflects Thoughts that do not Fit into Speech
4.1 The relation between gesture and speech predicts readiness to learn
Gesture and speech encode meaning differently (Goldin-Meadow, 2003b; Kendon, 1980; McNeill, 1992). Gesture conveys meaning globally, relying on visual and mimetic imagery. Speech conveys meaning discretely, relying on codified words and grammatical devices. Because gesture and speech employ such different forms of representation, it is difficult for the two modalities to contribute identical information to a message. Nonetheless, the information conveyed in gesture and in speech can overlap a great deal. For example, consider a child asked first whether the amount of water in two identical glasses is the same, and then whether the amount of water in one of the glasses changes after it is poured into a low, wide dish. The child says that the amounts of water in the two glasses are the same at the beginning but different
after the pouring transformation. When asked to explain this answer, the child focuses on the height of the water in the containers in both speech and gesture––he says, "it's different because this one's low and that one's tall," while gesturing the height of the water first on the dish and then on the glass. The child is thus conveying a justification in gesture that overlaps a great deal with the justification in speech––a gesture-speech match (Church & Goldin-Meadow, 1986). However, there are instances when gesture conveys information that overlaps very little with the information conveyed in the accompanying speech. Consider, for example, a child who gives the same explanation as the first child in speech but conveys different information in gesture––she produces a wide "C" hand representing the width of the water in the dish, followed by a narrow "C" representing the width of the water in the glass. This child is focusing on the height of the water in speech but on its width in gesture. She has produced a gesture-speech mismatch (Church & Goldin-Meadow, 1986). Children who produce mismatches in their explanations of a task have information relevant to solving the task at their fingertips and could, as a result, be on the cusp of learning the task. If so, they ought to be particularly receptive to instruction on the task––and indeed they are. Children who produce gesture-speech mismatches prior to instruction on conservation problems of this sort are more likely to profit from that instruction than children who produce matches (Church & Goldin-Meadow, 1986). This phenomenon is robust, found in learners of all ages on a wide variety of tasks taught by an experimenter: 5- to 9-year-olds learning a balance task (Pine, Lufkin & Messer, 2004); 9- to 10-year-olds learning a math task (Perry, Church & Goldin-Meadow, 1988; Alibali & Goldin-Meadow, 1993); and adults learning a gears task (Perry & Elder, 1997). The phenomenon is also found in naturalistic learning situations: toddlers learning their first word combinations (Goldin-Meadow & Butcher, 2003; Iverson & Goldin-Meadow, 2005; Ozcaliskan & Goldin-Meadow, 2005) and school-aged children learning a mathematical concept from a teacher (Goldin-Meadow & Singer, 2003).
4.1.1 Why do gesture-speech mismatches predict openness to instruction?
A speaker who produces a mismatch is expressing two ideas––one in speech and another in gesture. The fact that the speaker is entertaining two ideas on a single problem may lead to cognitive instability, which, in turn, can lead to change. If so, a task known to encourage the activation of two ideas ought to evoke mismatches. The Tower of Hanoi is a well-studied puzzle that is most efficiently solved by activating subroutines at theoretically defined choice points. There is a great deal of evidence that adults and children do indeed activate two ideas (the subroutine and an alternative path) at particular choice points on the Tower of Hanoi problem (Anzai & Simon, 1979; Bidell & Fischer, 1995; Klahr & Robinson, 1981). We might therefore expect mismatches to occur at just these moments––and they do. When asked to explain how they solved the Tower of Hanoi puzzle, both adults and children produce significantly more gesture-speech mismatches––explanations in which speech conveys one path and gesture another––at the theoretically defined choice points than at non-choice points (Garber & Goldin-Meadow, 2002). Mismatches thus tend to occur at points known to activate two strategies.
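For readers unfamiliar with the puzzle, a minimal sketch of the standard recursive solution may help make the notion of a choice point concrete. The sketch below (in Python) is an illustration of the puzzle's structure only, not of the coding schemes used in the studies cited above: solving for n disks decomposes into subroutines (clear the subtower onto the spare peg, move the largest disk, re-stack the subtower), and the moments at which a solver must commit to one such subroutine rather than an alternative move are branch points of the kind the choice-point analyses rely on.

    def hanoi(n, source, target, spare, moves):
        # Move n disks from source to target, using spare as a staging peg.
        # Each recursive call is a subroutine of the overall solution; the
        # commitment to clear the subtower onto the spare peg, rather than
        # acting on the largest disk directly, is the kind of branch point
        # the text describes as a 'choice point'.
        if n == 0:
            return
        hanoi(n - 1, source, spare, target, moves)  # subroutine: clear the subtower
        moves.append((source, target))              # move the largest remaining disk
        hanoi(n - 1, spare, target, source, moves)  # subroutine: re-stack the subtower

    moves = []
    hanoi(3, 'A', 'C', 'B', moves)
    print(len(moves))  # 7 moves for three disks: the optimal count is 2**n - 1

On this standard three-peg formulation, the minimal solution for n disks takes 2**n - 1 moves, and every level of the recursion presents the solver with competing continuations of the sort just described.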
We can also test this idea from the opposite direction––we can select a situation known to elicit gesture-speech mismatches and explore whether two ideas are activated simultaneously in this situation. Consider a group of children selected because they produced either gesture-speech mismatches or matches when explaining a math problem. These children were then asked to remember a list of words while at the same time solving the math problem. All of the children solved the problem incorrectly, but children known to be mismatchers worked harder to arrive at their incorrect answers than children known to be matchers––they remembered fewer words when solving the problems, suggesting that they were indeed activating more than one strategy (Goldin-Meadow, Nusbaum, Garber & Church, 1993). Producing mismatches appears to reflect the activation of two ideas at the same time (see also Thurnham & Pine, 2006).

4.1.2 Is the information found in gesture in a mismatch unique to gesture?

When speakers produce a mismatch, the information conveyed in gesture in that mismatch is, by definition, not found in the accompanying speech. For example, the child in the conservation task described earlier conveyed width information in gesture but not in her accompanying speech. However, it is possible that, on the very next problem, this child might describe the widths of the containers in speech. Alternatively, the information found in the gesture component of a mismatch might be accessible only to gesture; if so, the child would not be able to talk about the widths of the containers on any of the problems. The second alternative turns out to be the case, at least for children in the process of learning mathematical equivalence with respect to addition––children who convey a particular strategy in the gesture half of a mismatch on a math problem do not convey that strategy in speech on any of the math problems in the set (Goldin-Meadow, Alibali & Church, 1993). What this means is that children who produce mismatches have information in their repertoires that they know implicitly but cannot articulate. It also means that, as listeners, if we want to know that a child has this information in her repertoire, we need to watch the child as well as listen to her.

We have seen that the gestures people produce as they explain a task reflect what they know about the task. But gesture may do more than reflect knowledge––it may play a role in changing that knowledge. Gesture has the potential to change knowledge in two non-mutually exclusive ways, explored in the next two sections.
4.2 Gesture as a mechanism of change through its communicative effects

Gesture has the potential to function as a mechanism of change through its communicative effects. If gestures reflect the state of the speaker's knowledge, they could serve as a signal to others that the speaker is at a transitional point. If listeners are then sensitive to this signal, they may, as a consequence, change the way they interact with the speaker. In this way, speakers can play a role in shaping their learning environments just by moving their hands. The hypothesis here is simple––(1) speakers reveal information about their cognitive status through their gestures; (2) listeners pay attention to those gestures and alter their input to the speaker accordingly; (3) the speaker then profits from this altered input. We have just reviewed evidence for point (1). The next question is whether listeners pay attention to the gestures speakers produce and modify their instruction in response. We explore this question in a one-on-one tutorial involving a teacher and an individual child.

4.2.1 Do teachers alter their instruction in response to their students' gestures?

In order for gesture to play an important role in learning, listeners must not only pay attention to gesture but must be able to do so in naturalistic teaching situations. We therefore observed teachers spontaneously interacting with their students. Teachers were asked to observe children explaining to an experimenter how they solved a series of math problems. The teachers then gave the children individual instruction in mathematical equivalence. Each of the teachers, at times, picked up on information that their students produced uniquely in gesture, often translating that information into their own words (Goldin-Meadow, Kim & Singer, 1999). Teachers do pay attention to their students' gestures.

But do they alter their instruction in response to those gestures? Interestingly, the teachers gave different types of instruction to children who produced mismatches than to children who produced only matches. They used more different types of spoken strategies and more of their own gesture-speech mismatches when teaching children who produced mismatches (Goldin-Meadow & Singer, 2003). And the children who produced mismatches learned. But why? The children may have learned because their teachers gave them just the right instruction. Alternatively, they may have learned because they were ready to learn.

4.2.2 Are the adjustments teachers make in response to children's gestures good for learning?

We know that including gesture in instruction is, in general, good for learning (Church, Ayman-Nolley & Mahootian, 2004; Perry, Berch & Singleton, 1995; Valenzeno, Alibali & Klatzky, 2003). But to find out whether the particular adjustments that the teachers made in their math tutorials actually promote learning, we need to experimentally manipulate the numbers and types of strategies children are taught in speech and in gesture.
Following a script, an experimenter taught children one or two strategies in speech and, at the same time, varied the relation between her speech and gestures––some children received no gesture at all, some received gestures that conveyed the same strategy as speech (matching gesture), and some received gestures that conveyed different strategies from speech (mismatching gesture). Children who were taught one spoken strategy were far more successful after instruction than children taught two––the teachers' spontaneous adjustments in the tutorials were wrong on this count. But the teachers were right about mismatches––children who were taught with mismatching gestures were far more successful after instruction than children taught with matching gestures or no gestures (Singer & Goldin-Meadow, 2005). Getting two strategies in instruction was effective, but only when those two strategies were produced across modalities, one in speech and the other in gesture.

A conversation thus appears to take place in gesture alongside the conversation taking place in speech––speakers use their hands to reveal their cognitive state to their listeners who, in turn, use their hands to provide instruction that promotes learning.

4.3 Gesture as a mechanism of change through its cognitive effects
Gesture also has the potential to function as a mechanism of change through its cognitive effects. When faced with a difficult problem to solve, we often find it helpful to use a cognitive prop. For example, writing a problem down can reduce cognitive effort, thereby freeing up resources that can then be used to solve the problem. In other words, externalizing our thoughts can save cognitive effort that can then be put to more effective use. Gesture can externalize ideas and thus has the potential to affect learning by influencing learners directly.

Indeed, including gesture in instruction might be effective because it encourages learners to produce gestures of their own. Adults mimic nonverbal behaviors that their conversational partners produce (Chartrand & Bargh, 1999), and even very young infants imitate nonverbal behaviors modeled by an experimenter (Meltzoff & Moore, 1977). It would therefore not be at all surprising if school-aged children were to imitate the gestures that their teachers produce. And indeed they do. More relevant to the point here is the fact that children who produce these gestures are more likely to succeed after instruction than children who do not (Cook & Goldin-Meadow, 2006). We also find that gesturing leads to learning when we manipulate it more directly, either by telling children to move their hands as they explain their answers to a problem (Broaders, Cook, Mitchell & Goldin-Meadow, 2006), or by asking them to mimic hand movements produced by the experimenter that instantiate a problem-solving strategy (Cook, Mitchell & Goldin-Meadow, 2006). Gesturing during instruction encourages children to produce gestures of their own which, in turn, leads to learning. Children may be able to use their hands to change their minds.

But why? One reason may be that gesturing lightens our cognitive load.
Adults and children were asked to explain how they solved a math problem while at the same time remembering a list of words or letters. Both groups were found to remember more items when they gestured during their math explanations than when they did not gesture (Goldin-Meadow, Nusbaum, Kelly & Wagner, 2001). Gesturing appears to save speakers cognitive resources on the explanation task, permitting the speakers to allocate more resources elsewhere, in this case, to the memory task.

But gesture might not be lightening the speaker's load. It might merely be shifting the load away from a verbal store, perhaps to a visuo-spatial store. The idea here is that gesturing allows speakers to convey in gesture information that might otherwise have gone into a verbal store. Lightening the burden on the verbal store should make it easier to perform a simultaneous verbal task. If, however, the burden has really been shifted to a visuo-spatial store, it should be harder to perform a spatial task (recalling the location of dots on a grid) when simultaneously gesturing than when not gesturing. But gesturing continues to lighten the speaker's load even if the second task is a spatial one (Wagner, Nusbaum & Goldin-Meadow, 2004).

Perhaps gesturing lightens a speaker's load because it is a motor activity that energizes the system (Butterworth & Hadar, 1989). If so, the type of gesture produced shouldn't matter––it should matter only that a speaker gestures, not what the speaker gestures. But speakers do not remember more items when they are told to produce meaningless movements with their hands, only when they are told to gesture (Cook, 2006). Moreover, the number of items that speakers remember depends on the meaning conveyed by gesture––speakers remember more items when their gestures convey the same information as their speech (one message) than when their gestures convey different information (two messages) (Wagner et al., 2004). Gesture's content thus determines demands on working memory, suggesting that gesture confers its benefits, at least in part, through its representational properties.

5. Conclusion
Gesture is chameleon-like in its form, and that form is tied to the function the gesture is serving. When gesture assumes the full burden of communication, acting on its own without speech, it takes on a language-like form. But when gesture shares the burden of communication with speech, it loses its language-like structure, assuming instead a global and synthetic form. Although not language-like in structure when it accompanies speech, gesture still forms an important part of language. It conveys information imagistically and, as such, has access to different information than does the verbal system. Gesture thus allows speakers to convey thoughts that may not easily fit into the categorical system that their conventional language offers (Goldin-Meadow & McNeill, 1999). Moreover, gesture has the potential to go beyond reflecting thought and to play a role in shaping it.
The discoveries that my students and I have made about gesture were made possible in large part because of the open and exciting intellectual climate that David has fostered at the University of Chicago. I am grateful to him for all that he has taught me about being a good colleague and, of course, for all that he has taught me about hand and mind.

References

Alibali, M. W., & Goldin-Meadow, S. (1993). Gesture-speech mismatch and mechanisms of learning: What the hands reveal about a child's state of mind. Cognitive Psychology, 25, 468-523.
Anzai, Y., & Simon, H. A. (1979). The theory of learning by doing. Psychological Review, 86(3), 124-140.
Bekken, K. (1989). Is there "Motherese" in gesture? Unpublished doctoral dissertation, University of Chicago.
Bidell, T. R., & Fischer, K. W. (1995). Developmental transitions in children's early on-line planning. In M. M. Haith, J. B. Benson, R. J. Roberts, Jr., & B. F. Pennington (Eds.), The development of future-oriented processes. Chicago: University of Chicago Press.
Broaders, S., Cook, S. W., Mitchell, Z., & Goldin-Meadow, S. (2006). Making children gesture reveals implicit knowledge and leads to learning. Under review.
Butcher, C., Mylander, C., & Goldin-Meadow, S. (1991). Displaced communication in a self-styled gesture system: Pointing at the non-present. Cognitive Development, 6, 315-342.
Butterworth, B., & Hadar, U. (1989). Gesture, speech, and computational stages: A reply to McNeill. Psychological Review, 96, 168-174.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality & Social Psychology, 76, 893-910.
Church, R. B., Ayman-Nolley, S., & Mahootian, S. (2004). The role of gesture in bilingual education: Does gesture enhance learning? International Journal of Bilingual Education and Bilingualism, 7, 303-319.
Church, R. B., & Goldin-Meadow, S. (1986). The mismatch between gesture and speech as an index of transitional knowledge. Cognition, 23, 43-71.
Conrad, R. (1979). The deaf child. London: Harper & Row.
Cook, S. W. (2006). Gesture, movement and working memory: A functional account. Unpublished doctoral dissertation, University of Chicago.
Cook, S. W., & Goldin-Meadow, S. (2006). The role of gesture in learning: Do children use their hands to change their minds? Journal of Cognition and Development, 7, 211-232.
Cook, S. W., Mitchell, Z., & Goldin-Meadow, S. (2006). Gesturing makes learning last. Under review.
Fant, L. J. (1972). Ameslan: An introduction to American Sign Language. Silver Spring, Md.: National Association of the Deaf.
Garber, P., & Goldin-Meadow, S. (2002). Gesture offers insight into problem-solving in children and adults. Cognitive Science, 26, 817-831.
Gershkoff-Stowe, L., & Goldin-Meadow, S. (2002). Is there a natural order for expressing semantic relations? Cognitive Psychology, 45(3), 375-412.
Goldin-Meadow, S. (1982). The resilience of recursion: A study of a communication system developed without a conventional language model. In E. Wanner & L. R. Gleitman (Eds.), Language acquisition: The state of the art. N.Y.: Cambridge University Press.
Goldin-Meadow, S. (1985). Language development under atypical learning conditions: Replication and implications of a study of deaf children of hearing parents. In K. Nelson (Ed.), Children's language, Vol. 5 (pp.197-245). Hillsdale, NJ: Erlbaum.
Goldin-Meadow, S. (1987). Underlying redundancy and its reduction in a language developed without a language model: The importance of conventional linguistic input. In B. Lust (Ed.), Studies in the acquisition of anaphora: Applying the constraints, Vol. II (pp.105-133). Boston, Mass.: D. Reidel Publishing Company.
Goldin-Meadow, S. (2003a). The resilience of language: What gesture creation in deaf children can tell us about language-learning in general. N.Y.: Psychology Press.
Goldin-Meadow, S. (2003b). Hearing gesture: How our hands help us think. Cambridge, MA: Harvard University Press.
Goldin-Meadow, S., Alibali, M. W., & Church, R. B. (1993). Transitions in concept acquisition: Using the hand to read the mind. Psychological Review, 100, 279-297.
Goldin-Meadow, S., & Butcher, C. (2003). Pointing toward two-word speech in young children. In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet. NJ: Erlbaum.
Goldin-Meadow, S., Butcher, C., Mylander, C., & Dodge, M. (1994). Nouns and verbs in a self-styled gesture system: What's in a name? Cognitive Psychology, 27, 259-319.
Goldin-Meadow, S., Gelman, S., & Mylander, C. (2005). Expressing generic concepts with and without a language model. Cognition, 96, 109-126.
Goldin-Meadow, S., Kim, S., & Singer, M. (1999). What the teacher's hands tell the student's mind about math. Journal of Educational Psychology, 91, 720-730.
Goldin-Meadow, S., & McNeill, D. (1999). The role of gesture and mimetic representation in making language the province of speech. In M. C. Corballis & S. Lea (Eds.), The descent of mind (pp.155-172). Oxford: Oxford University Press.
Goldin-Meadow, S., McNeill, D., & Singleton, J. (1996). Silence is liberating: Removing the handcuffs on grammatical expression in the manual modality. Psychological Review, 103, 34-55.
Goldin-Meadow, S., & Mylander, C. (1983). Gestural communication in deaf children: The noneffects of parental input on language development. Science, 221, 372-374.
Goldin-Meadow, S., & Mylander, C. (1984). Gestural communication in deaf children: The effects and non-effects of parental input on early language development. Monographs of the Society for Research in Child Development, 49, 1-121.
Goldin-Meadow, S., & Mylander, C. (1998). Spontaneous sign systems created by deaf children in two cultures. Nature, 391, 279-281.
Goldin-Meadow, S., Mylander, C., & Butcher, C. (1995). The resilience of combinatorial structure at the word level: Morphology in self-styled gesture systems. Cognition, 56, 195-262.
Goldin-Meadow, S., Mylander, C., & Franklin, A. (in press). How children make language out of gesture: Morphological structure in gesture systems developed by American and Chinese deaf children. Cognitive Psychology.
Goldin-Meadow, S., Nusbaum, H., Garber, P., & Church, R. B. (1993). Transitions in learning: Evidence for simultaneously activated strategies. Journal of Experimental Psychology: Human Perception and Performance, 19, 92-107.
Goldin-Meadow, S., Nusbaum, H., Kelly, S. D., & Wagner, S. (2001). Explaining math: Gesturing lightens the load. Psychological Science, 12, 516-522.
Goldin-Meadow, S., & Singer, M. A. (2003). From children's hands to adults' ears: Gesture's role in teaching and learning. Developmental Psychology, 39, 509-520.
Goldin-Meadow, S., Yalabik, E., & Gershkoff-Stowe, L. (2000). The resilience of ergative structure in language created by children and by adults. Proceedings of the Boston University Conference on Language Development, 24, 343-353.
Iverson, J. M., & Goldin-Meadow, S. (2005). Gesture paves the way for language development. Psychological Science, 16, 368-371.
Iverson, J. M., Capirci, O., Longobardi, E., & Caselli, M. C. (1999). Gesturing in mother-child interaction. Cognitive Development, 14, 57-75.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), Relationship of verbal and nonverbal communication (pp.207-228). The Hague: Mouton.
Kita, S. (2000). How representational gestures help speaking. In D. McNeill (Ed.), Language and gesture (pp.162-185). Cambridge: Cambridge University Press.
Klahr, D., & Robinson, M. (1981). Formal assessment of problem solving and planning processes in preschool children. Cognitive Psychology, 13, 113-148.
Lenneberg, E. H. (1964). Capacity for language acquisition. In J. A. Fodor & J. J. Katz (Eds.), The structure of language: Readings in the philosophy of language. NJ: Prentice-Hall.
Mayberry, R. I. (1992). The cognitive development of deaf children: Recent insights. In S. Segalowitz & I. Rapin (Eds.), Child neuropsychology, Volume 7, Handbook of neuropsychology (pp.51-68), F. Boller & J. Graffman (Series Eds.). Amsterdam: Elsevier.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: The University of Chicago Press.
McNeill, D. (1998). Speech and gesture integration. In J. M. Iverson & S. Goldin-Meadow (Eds.), The nature and functions of gesture in children's communication (pp.11-28). New Directions for Child Development, No. 79. San Francisco: Jossey-Bass.
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198, 75-78.
Moores, D. F. (1974). Nonvocal systems of verbal behavior. In R. L. Schiefelbusch & L. L. Lloyd (Eds.), Language perspectives: Acquisition, retardation, and intervention (pp.372-417). Baltimore: University Park Press.
Morford, J. P., & Goldin-Meadow, S. (1997). From here to there and now to then: The development of displaced reference in homesign and English. Child Development, 68, 420-435.
Newport, E. L., & Meier, R. P. (1985). The acquisition of American Sign Language. In D. I. Slobin (Ed.), The cross-linguistic study of language acquisition, Vol. 1. Hillsdale, NJ: Erlbaum.
Ozcaliskan, S., & Goldin-Meadow, S. (2005). Gesture is at the cutting edge of early language development. Cognition, 96, 101-113.
Ozyurek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. Proceedings of the Cognitive Science Society, 21, 507-512.
Perry, M., & Elder, A. D. (1997). Knowledge in transition: Adults' developing understanding of a principle of physical causality. Cognitive Development, 12, 131-157.
Perry, M., Berch, D. B., & Singleton, J. L. (1995). Constructing shared understanding: The role of nonverbal input in learning contexts. Journal of Contemporary Legal Issues, Spring, 213-236.
Perry, M., Church, R. B., & Goldin-Meadow, S. (1988). Transitional knowledge in the acquisition of concepts. Cognitive Development, 3, 359-400.
Phillips, S. B., Goldin-Meadow, S., & Miller, P. J. (2001). Enacting stories, seeing worlds: Similarities and differences in the cross-cultural narrative development of linguistically isolated deaf children. Human Development, 44, 311-336.
Pine, K. J., Lufkin, N., & Messer, D. (2004). More gestures than answers: Children learning about balance. Developmental Psychology, 40, 1059-1067.
Shatz, M. (1982). On mechanisms of language acquisition: Can features of the communicative environment account for development? In E. Wanner & L. R. Gleitman (Eds.), Language acquisition: The state of the art (pp.102-127). New York: Cambridge University Press.
Singer, M. A., & Goldin-Meadow, S. (2005). Children learn when their teacher's gestures and speech differ. Psychological Science, 16, 85-89.
Singleton, J. L., Morford, J. P., & Goldin-Meadow, S. (1993). Once is not enough: Standards of well-formedness in manual communication created over three different timespans. Language, 69, 683-715.
Slobin, D. I. (1996). From "thought and language" to "thinking for speaking." In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp.97-114). Cambridge: Cambridge University Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. III: Grammatical categories and the lexicon (pp.57-149). Cambridge: Cambridge University Press.
Tervoort, B. T. (1961). Esoteric symbolism in the communication behavior of young deaf children. American Annals of the Deaf, 106, 436-480.
Thurnham, A., & Pine, K. (2006). The effects of single and dual representations on children's gesture production. Cognitive Development, 21, 46-59.
Valenzeno, L., Alibali, M. W., & Klatzky, R. (2003). Teachers' gestures facilitate students' learning: A lesson in symmetry. Contemporary Educational Psychology, 28, 187-204.
Wagner, S. M., Nusbaum, H., & Goldin-Meadow, S. (2004). Probing the mental representation of gesture: Is handwaving spatial? Journal of Memory & Language, 50, 395-407.
Zheng, M., & Goldin-Meadow, S. (2002). Thought before language: How deaf and hearing children express motion events across cultures. Cognition, 85, 145-175.
From Gestures to Signs in the Acquisition of Sign Language¹

Nini Hoiting
Royal Effatha-Guyot Group, Haren, Netherlands
Dan I. Slobin
University of California, Berkeley

¹ Research reported in this chapter was supported by the Linguistics Program of the National Science Foundation under grant SBR-97-27050, "Can a Deaf Child Learn to Sign from Hearing Parents?" to Dan I. Slobin, PI, and Nini Hoiting, co-PI. Additional support was provided by the Institute of Human Development and the Institute of Brain and Cognitive Sciences, University of California, Berkeley; by the Royal Institute for the Deaf "H. D. Guyot", Haren, Netherlands (now the Effatha-Guyot Group); and by the Max Planck Institute for Psycholinguistic Research, Nijmegen, Netherlands.
In signed languages, both linguistic signs and gestures are executed in the same modality. As a consequence, children acquiring a sign language may produce iconic gestures that are close to appropriate conventional signs in referential contexts. Nevertheless, the child must master the conventional forms of expression of the language, including both points and signs. Particular problems are posed by the management of gaze and by communicative requirements to distance signs and gestures from referents. Early sign language acquisition can be seen as a gradual movement from gestural indices and icons to linguistic forms.
We begin with a fundamental observation made by David McNeill in his landmark monograph, Hand and Mind, setting the stage for the role of gesture in language acquisition:

As children acquire their language they are also constructing a speech-gesture system. Gesture and speech grow up together. We should speak not of language acquisition, but of language-gesture acquisition. (McNeill, 1992:295)
1. Gesture and Modality
McNeill and his followers (McNeill, 2000) have demonstrated how gesture permeates the use of spoken language. Their work has primarily concerned itself with gestures that are performed outside of the vocal modality of speech—that is, movements of hands, arms, body, and face.
In the case of signed languages, however, the gestural and the lexical-grammatical components are manifested, to a great extent, in the same modality. Sign language linguists are therefore faced with a more difficult task in attempting to differentiate gestural and linguistic components of signed messages. The problem of delineating a strictly 'linguistic' component of the languages of the deaf remains unsolved and is the topic of extended current discussion and debate (see, for example, papers in Emmorey, 2003; Liddell, 2003; Taub, 2001). Our task here is not to solve that problem, but to highlight ways in which gestures can become schematized and conventionalized.

There is a continuum from gesture to sign in the development of signed languages, in both historical and ontogenetic time frames. On the historical level, this continuum represents a course of grammaticalization in signed languages that parallels the continuum from lexicon to grammar in spoken languages (Frishberg, 1975). On the level of the individual deaf child, however, processes of language acquisition are at work in ways that seem to be strikingly different from the acquisition of spoken languages.

Before turning to language development in deaf children, consider the possibilities of gesture in the vocal modality. The basic issue in both modalities of expression—vocal and body—is the simultaneous use of communicative devices that are both discrete and graded, both conventional and iconic. Even in the acoustic modality of spoken language, gradient and iconic expressions permeate spoken messages. Consider, for example, the utterance of a preschooler who is attempting to depict the sound and force of a rocket launch:
(1) 3 – 2 – 1 blast off khkhkh!
The onomatopoeic khkhkh is a vocal gesture that depicts aspects of the event. It is fully integrated into the acoustic flow of the utterance, and it fills a conventional structural slot, though it is not an English lexical item and can hardly be classified as one of the traditional parts of speech. Such items can be freely invented; at the same time, each language provides a set of fixed vocal 'emblems' such as English crash, boom, pow, and the like. Voice quality is also regularly used to enact role shifts in speech, as in the following example, in which a preschooler is assigning a role in a play situation by mimicking a teacher's manner of speaking:
(2) You be the teacher and go: "Now be quiet, children!"

Voice quality can also be used to evaluate other people, as in:
(3) She goes: "You can't play with MY toys!" [singsong intonation]
The voice can thus be used to integrate 'gestures' into utterances. Indeed, it is no more than our ancient tradition of separating the written and spoken components of language that leads to the assumption that only written language is 'linguistic'.
As shown in the above examples of preschool speech, depictions of sounds and the enactments of the vocal productions of speakers can be systematically incorporated into linguistic constructions. The ways in which they are incorporated vary from language to language and can, eventually, be rigorously specified in the terms of an expanded linguistic theory. Clearly, depicting and enacting are universal characteristics of human communicative systems.

When we turn to signed languages, we find sign and gesture in one modality—that is, the visual modality, broadly conceived, including the hands, face, and body. Scott Liddell has pinpointed the role of gradient signals in his important recent book, Grammar, gesture, and meaning in American Sign Language (2003). The title makes it clear that grammar and gesture both play key roles in the expression of meaning in ASL (and all signed languages). Liddell provides us with a clear statement with regard to the nature of an integrated linguistic system:

The ASL language signal … includes gradient aspects of the signal, and gestures of various types. All of these coordinated and integrated activities constitute the language signal and contribute to expressing the conceptual structure underlying the utterance. (Liddell, 2003:362)
We have gathered data on the acquisition of a particular sign language, Sign Language of the Netherlands (SLN), by young deaf children between the ages of 15 and 36 months (Hoiting, 1997, 2006, in press; Hoiting & Slobin, 2002a, 2002b; Slobin, Hoiting et al., 2003). The data show that these children are actively gesturing while acquiring the conventional forms of the language. Because both of these aspects of communication are in what can be called 'the body modality' (as opposed to the vocal modality), analysis of sign language development cannot avoid the gesture-to-sign continuum––a continuum that presents itself as separate tracks of analysis in discussions of hearing children's acquisition of the segmental (phonological, lexical, grammatical) and suprasegmental (pitch, stress, intonation) components of spoken languages.

2. Index, Icon, and Symbol
Central to analyses of gestural and gradient aspects of communication are issues of indexicality and iconicity. In Peirce’s familiar terms, an ‘index’ bears an intrinsic connection to the object that is indicated; an ‘icon’ bears a perceptual resemblance to its object; and a ‘symbol’ represents an object by arbitrary rule. Applying these distinctions to signed languages, indices point, icons characterize, and symbols designate.
2.1 Indices

An index isolates a referent by pointing to it with the hand and/or gaze direction. In signed languages, manual indices become conventionalized, taking on particular handshapes, prosody, and directionality—both as means of indicating referents and as grammatical elements serving as proforms. (Eye-pointing, which we do not consider in this discussion, also becomes conventionalized.) Note that even here, when we deal with 'simple' pointing gestures—either as co-speech gestures or in a signed language—there are constraints of conventionality, such as the index-finger point used in some languages, various oriented hand configurations in others, and so forth (e.g., Wilkins, 2003). These conventions must be learned and used appropriately.

2.2 Icons

An icon characterizes a referent or referent situation in two basic ways—depiction or enactment. Some aspects of a situation can be schematically modeled, as in the use of handshapes to indicate size and shape, and some aspects can be schematically enacted, as in demonstrating how an object is manipulated or how a person behaves. (Note, again, that because depicting and enacting are, of necessity, schematic, language-specific conventions must be learned and used.)

2.3 Symbols

A symbol designates its referent by use of a conventional sign. As already indicated, both indices and icons, when integrated into a language, bear features of conventionality. And probably most signs have their origin in indexical and iconic referencing. But the conventional signs and structures of a signed language have to be learned, as do the lexical items and grammatical constructions of spoken languages. Note, too, that symbols—conventional signs—can have indexical and iconic properties. In adult signing, designating is permeated with pointing and characterizing. In signed languages, all three types of Peircean signs—indices, icons, and symbols—can be expressed by handshape, movement, location, face, eyes, and posture. That is, all of the resources of the body modality are available for the three types of expression.

3. Acquiring a Signed Language
A hearing child who is acquiring a spoken language has little opportunity to invent lexical and grammatical forms. For example, if a child decides that a pair of scissors should be called a "cutter," he will soon have to accept that this device is called scissors in English.
And if he then tries to turn this word into a verb, announcing that he is "scissoring," he will have to accept that in English one cuts with scissors. By contrast, a deaf child acquiring SLN can use the same enacting gesture—moving the index and middle finger together and apart—to refer both to the instrument and the action. He will be close to correct, in this instance, having only to learn the formational constraints that distinguish the symbol for an entity ('noun')—a single and precise closing of the fingers—from the symbol for an activity ('verb')—repeated and rapid closure.

Young children naturally use many means of pointing and characterizing. As a consequence, many signed utterances used by deaf toddlers have a strong gestural component. When does an iconic gesture become a symbol? We examine this question with regard to several accomplishments in the early acquisition of SLN, looking at the development of pointing, characterizing, and designating.² Our developmental research question is: "How do conventional signs (designations) emerge from a matrix of pointing and characterizing (depicting, enacting)?" In broad terms, the acquisition of SLN, like the acquisition of any language, demonstrates increasing levels of generalization and abstraction (as described, for example, by Tomasello, 2003). However, specific to signed languages, these processes are grounded in early uses of gesture.

4. The Development of Pointing in SLN
The children in our sample use the index-finger point in various ways, ranging from less to more sign-like. Figure 1 shows a scene in which a girl of age 2;4 is interacting in doll play with her hearing mother who has learned SLN. In the figure we see a fully outstretched arm and a clear point at the doll, with gaze at the doll.
Figure 1: Direct point and gaze directed at object (age 2;4).
² Longitudinal data were gathered from 30 deaf children in the age range 15–36 months (4 children with deaf parents and 26 with hearing parents). All of the children were exposed to signed communication, but half of the children with hearing parents (13/26) were exposed to a system of simultaneous speech and signing (Sign-Supported Dutch); the remainder received only SLN input. Data were gathered in the northern part of The Netherlands by Nini Hoiting.
This is the culmination of an interaction in which the pointing gesture becomes precise. The child, code-named ELS, has had difficulty fitting her doll into a toy washtub. The mother places the doll in the bath and signs DOLL. ELS, while looking at the doll, first points at the doll with a lax hand and looks back at her mother. Then, to secure her mother's attention, she performs the point shown in Figure 1, executed crisply with a clear direction and gaze at the doll. The two points share indexical properties; the difference lies in execution. The first, lax point could refer to the whole event or to a part of it. Her mother does not react; then ELS provides a punctual indexical gesture near the doll's face, looking at the doll and then back to her mother, who affirms YES DOLL. The second way of pointing is appropriate according to lexical, phonological, and pragmatic conventions of SLN: (a) neatly directed toward an object, with accompanying gaze, affirming that object as the conversational topic; (b) executed in accordance with sign language prosody: a quick, short, forward movement of the index finger; (c) followed by a shift in gaze from the object to the interlocutor. Such sign-like points are typical of 2-year-old communication, playing a conversational role in turn-taking and the establishment and maintenance of direction. Even such an apparently simple index as extending the hand toward an object can take on the quality of sign language.

Deaf 2-year-olds are in the process of mastering the integration of gaze, gaze shift, and direction and angle of pointing. Figures 2a and 2b show segments of a long and complex interaction between JES, a girl of 2;8, and her Deaf mother, who is off camera to the left. JES is upset with her 9-month-old baby brother, sitting in a highchair off to her right. In these scenes she is accusing him of having torn a paper crown from her head. She addresses a firm complaint to her mother.
Figure 2a: Direct point at object with gaze at addressee (age 2;8).
In Figure 2a we see JES with fixed gaze at her mother while pointing straight toward her brother. Her point is an index referring to her brother as the actor in this little
drama, but her accusation is addressed to her mother. JES skillfully manipulates two directions of regard: a point at her brother and a gaze at her mother. She begins the point with a brief glance at the brother, immediately shifting her gaze to the mother while continuing to execute a series of sharp points at the brother. Such a rapid shift of gaze while maintaining the direction of pointing is an appropriate linguistic means of indicating reference while holding the floor for subsequent comment. It is also important to note that the point shown in Figure 2a was embedded in a full utterance: BOY GRAB POINT SHAME. That is, JES is reporting that the boy over there grabbed her hat and he should be ashamed. By age 2;8 she is skillfully integrating point, gaze, and lexical items in a prosodically fluent construction.
Figure 2b: Angled point at object with gaze at addressee (age 2;8).
In the next segment of the event, shown in Figure 2b, JES is holding the torn paper crown in her left hand and reiterating her accusation. Her gaze is fixed on her mother and she has added stress to her point by curving her wrist and performing very short, rhythmically repeated points directly at the baby. The indirect angle of the point and gaze at the addressee would probably be strange if performed by a hearing 2-year-old. However, this sort of angled point with diverted eye gaze is appropriate SLN usage. Both sorts of points demonstrated in Figure 2 represent a language-specific shaping of a basic indexical gesture.

5. Distancing and the Incorporation of Sign and Gesture into Utterances
In Hand and Mind, McNeill summarizes the important early work of Werner and Kaplan (1963) on the processes of symbol formation in child development. On the basis of study of early language development in several spoken languages, Werner and Kaplan proposed, in McNeill’s summary: “At an early stage of
development the signifier and signified have little distance between them" (McNeill, 1992:298). McNeill discusses distancing with regard to the co-speech gestures of hearing children, where there is a distance between action and symbol. He concludes that children's early gestures are "burdened with characteristics that arise from their being actions but that are not needed for them to be symbols" (ibid.). For the deaf child, by contrast, gestures have characteristics that can be used as symbols. One such characteristic is the literal physical distancing of a gesture/sign from its referent object or event.

As a first example, consider the situation depicted in Figure 3. We have already encountered this child, JES, with regard to pointing gestures. In Figure 3 (still age 2;8) we see a pointing gesture again, but in this instance it is part of an interesting distancing scenario. JES has been drawing on a magic slate with her index finger and tells her mother (seated opposite) about what she has done.
Figure 3: Distanced point at trace of event (age 2;8).
She traces a circle above the slate and then points at the slate—but without contact—and then, without a prosodic break, signs EASY (a 5-hand brushed down the chin) while looking at her mother. The gesture of drawing has thus coalesced into a point, which is a kind of distanced symbol. JES is pointing now at an event—the completed action of drawing a circle—which then becomes the topic of a comment, that is, a sort of predication in which EASY is predicated of the action on the slate, which is pointed to. JES's point in Figure 3 is smoothly incorporated into an utterance (the equivalent of something like 'That was easy'). It is, at one and the same time, a gesture, an index, and a budding linguistic symbol. In a signed language, elements of the physical space can be integrated into the linguistic space.
Figures 4 and 5 present two segments of a complex scene, again with JES at 2;8. As described above, she had been wearing a paper crown that she had made and was proud of. Her 9-month-old baby brother reached for it when JES came close to his highchair; when JES pulled away, the crown tore. Her rage knew no bounds. She repeatedly pointed at the baby (as we have seen in Figures 2a and 2b), telling her mother that he was NAUGHTY.
Figure 4a: Verb performed while grasping referent object (age 2;8).
Figure 4b: Verb performed while grasping referent object (age 2;8).
In Figures 4a and 4b we see JES, with torn paper crown in hand, looking at her mother. She signs BREAK appropriately (rotating two fists outward) while holding onto the crown in her left hand. This is a conventional SLN sign, produced with no distancing from the referent object, which, in fact, forms an object-incorporated component of the sign.
Signing upon an object or with an object in hand is possible in adult SLN as well—a kind of compact indexing of a topic while commenting on it. JES can do more, however: in Figures 5a and 5b we see a later segment of the same dramatic event. Now the torn crown lies on the floor and JES, still angry, is glaring at her mother and repeating that her bad baby brother has broken her crown. Now she has both hands free, and first signs BOY and then BREAK:
Figure 5a: Verb performed with distance from referent (age 2;8).
Figure 5b: Verb performed with distance from referent (age 2;8).
This is a nice topic/comment or subject/predicate construction. The same BREAK sign is now produced at a distance from its object, showing that JES can produce the sign independently. Note, too, that this is fully a symbol, probably making past-time reference to the completed event. The two rotating fists of BREAK are a conventional lexical item in SLN. The sign in no way corresponds to the actual way in which the crown was broken. In fact, it tore off of JES's head
when the baby grasped at it. At the same time, the production of BREAK first with the object in hand, and later above the object, constitutes both an index and a symbol. JES is on the way toward producing complex linguistic constructions, smoothly incorporating reference to real-world objects and events in the immediate environment. This is a necessary preparatory phase for later expanded distancing in the reporting of non-present events.

6. Iconic Enactment or SLN Sign?
In attempting to transcribe and analyze child signing, one is often faced with the dilemma of whether a production is an invented gesture or an attempt at a conventional sign. This problem does not arise in transcribing children's speech. When an English-speaking child, for example, describes an event by saying, "push," we can be certain that this is a form of an English word and not a form invented by the child. And when an English-speaking or SLN-signing child acts out a pushing situation with full motor mimicry of the act, we can be sure that this is a sort of iconic enactment. But what about iconic enactments by deaf toddlers that include what look like conventional signs?

Consider the situation depicted in Figure 6. A girl of age 2;6 is looking at a book of family photographs with her (hearing) mother. They are discussing a picture in which the girl is seated in a baby buggy and is being pushed by her father.
Figure 6: PUSH: Enactment and/or sign? (age 2;6).
The child signs FATHER ME PUSH. There is no question that FATHER is a conventional sign: a laterally placed index finger moves in an arc from the girl's forehead to her chin. And ME is clearly an index—but an index that takes on conventional form: the child points to the center of her chest with her index finger. It is the verb, or action depiction, that sharply raises the central question of this chapter: Is it a gesture or an SLN sign or something in between? It is produced by
thrusting two laterally placed fists in a pushing motion away from the body. This is, of course, depictive of an act of pushing. It is also depictive of a particular sort of pushing act, namely one in which the two hands grasp a horizontal bar that is parallel to the front of the body. It is thus both a gesture that acts out the pushing of a baby buggy and an SLN sign for pushing a horizontal bar-shaped entity forward. Has the girl happened upon an SLN sign by accident, or has she learned the sign from others? Here, indeed, we face a situation that is unique to the acquisition of signed languages. English-speaking children don't stumble upon phonological forms that happen to correspond to the correct words in context. Deaf children thus seem to have a way into their language that deserves more careful psycholinguistic attention. We have made this point earlier with regard to deaf children's acquisition of classifier constructions—that is, the incorporation of depictive and manipulative handshapes into polycomponential verbs such as PUSH (which really means, in this instance, something like 'move a horizontal bar forward by grasping it with two hands'):

Transient innovations may be close enough to a conventional 'classifier' to pass unnoticed, or to be easily shaped into the conventional form. The capacity to represent objects and their movements/locations by means of arm and hand is given from the start. The deaf child, therefore, follows a special developmental path, in that normal gesture can be 'seamlessly' incorporated into the flow of signing. As a consequence, the deaf child—in contrast to the hearing child—has the task of 'paring down' the gesture system to those elements that are conventionalized in the exposure language. (Slobin, Hoiting et al., 2003:277)

Gesture/signs like the form shown in Figure 6 can be characterized as SLN lexical items when they begin to fit into systematic sets of contrasts. For example, two flat hands with palms extended away from the body might be used to refer to pushing a box; or the two-fist configuration of Figure 6 might also move toward the body to designate an act of pulling a baby buggy, and so forth. As a system emerges, it becomes possible to describe the child's signing in a set of paradigmatic contrasts, including number of hands, handshapes, orientation, direction and type of movement, and so forth (see the sketch at the end of this section). The heart of SLN—like all signed languages that have been described in detail—lies in the construction and use of such polycomponential verbs (see, e.g., Engberg-Pedersen, 1993; Slobin, Hoiting et al., 2003). Deaf toddlers, we propose, are already at work in beginning to produce such constructions, drawing upon their own motor productions and imaginations, along with the linguistic forms of the sign language that they are exposed to.
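To make concrete what it means to describe signing in a set of paradigmatic contrasts, the following Python sketch codes the PUSH form of Figure 6 and a hypothetical pull counterpart along the dimensions just listed. This is purely illustrative: the field names and values are our own expository devices, not a transcription scheme used in the study.

```python
# Illustrative only: a toy feature structure for the paradigmatic contrasts
# named in the text (number of hands, handshape, orientation, direction, and
# type of movement). Field names and values are hypothetical, not drawn from
# any actual SLN transcription system.
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SignForm:
    num_hands: int        # 1 or 2
    handshape: str        # e.g., "fist", "flat"
    orientation: str      # e.g., "palms-facing", "palms-away"
    direction: str        # e.g., "away-from-body", "toward-body"
    movement: str         # e.g., "thrust", "single-close", "repeated-close"

# The PUSH form of Figure 6: two lateral fists thrust away from the body,
# as if grasping a horizontal bar (e.g., a buggy handle).
push_buggy = SignForm(2, "fist", "palms-facing", "away-from-body", "thrust")

# The hypothetical contrast mentioned in the text: the same two-fist
# configuration moving toward the body would designate pulling the buggy.
pull_buggy = SignForm(2, "fist", "palms-facing", "toward-body", "thrust")

# The two forms enter a paradigmatic contrast: they differ in one feature.
diff = [f.name for f in fields(SignForm)
        if getattr(push_buggy, f.name) != getattr(pull_buggy, f.name)]
print(diff)  # ['direction']
```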
Gesture/signs like the form shown in Figure 6 can be characterized as SLN lexical items when they begin to fit into systematic sets of contrasts. For example, two flat hands with palms extended away from the body might be used to refer to pushing a box; or the two-fist configuration of Figure 6 might also move toward the body to designate an act of pulling a baby buggy, and so forth. As a system emerges it becomes possible to describe the child’s signing in a set of paradigmatic contrasts, including number of hands, handshapes, orientation, direction and type of movement, and so forth. The heart of SLN—like all signed languages that have been described in detail—lies in the construction and use of such polycomponential verbs (see, e.g., Engberg-Pedersen, 1993; Slobin, Hoiting et al., 2003). Deaf toddlers, we propose, are already at work in beginning to produce such constructions, drawing upon their own motor productions and imaginations, along with the linguistic forms of the sign language that they are exposed to. 7.
The Developmental Path into Sign Language
In this brief presentation we have made several developmental claims, emphasizing both a gestural way into the acquisition of a sign language and the
continuing task of adjusting gesture to language-specific conventions. The claims can be summarized in five points:

• From early on in the development of SLN, pointing is performed "in a sign language way."
• Linguistic signs only gradually become distanced from their referents.
• Pointing and characterizing acts are smoothly incorporated into syntactic constructions.
• What appear to be conventional signs of SLN often begin as characterizations—both enactments and depictions.
• With further development, points and gestures will be modified to adhere to SLN conventions of form, eventually fitting into systematic sets of linguistic contrasts.
In sum, the transition from gesture to sign, from iconic enactment to conventional symbols, is gradual. There is no clear line at which one can say: now, and only now, has the child begun to use an established sign language. Both deaf and hearing infants exploit their bodily resources in early communicative gestures, while acquiring conventional language forms. However, sign-acquiring children seem to explore the components and possibilities of gesture over a longer developmental period than do hearing children acquiring a spoken language. As a consequence, there may well be different developmental timetables and patterns for the acquisition of morphosyntax, depending on the modality of the language. These considerations require a renewed focus on definitions of 'linguistic', 'extralinguistic', and 'nonlinguistic' components of communication.

8. The Gesture-to-Sign Continuum and the Boundaries of Language
In conclusion, attention to the gesture-to-sign continuum in child signing raises questions about the limits of linguistic analysis of both signed and spoken languages. Increasingly, current work on signed languages using the tools of cognitive grammar and mental space theory makes it clear that gesture and sign interpenetrate each other (e.g., Dudis, 2004a; 2004b; Liddell, 2003; Taub, 2001). Scott Liddell (2003:362) has recently taken issue with "the predominant views of what constitutes language." He concludes that although signed languages might be organized in a different fashion from spoken languages:

It is much more likely that spoken and signed languages both make use of multiple types of semiotic elements in the language signal, but that our understanding of what constitutes language has been much too narrow.
An examination of signing toddlers illuminates the emergence of conventional language from gestural indices and icons. As more detailed attention
is paid to the use of vocal indices and icons, we expect that the traditional borders between 'linguistic' and 'extralinguistic' will also prove to be permeable. David McNeill has pointed the way for a long time, and it is fitting to end this chapter by citing his vision:

[A] true psychology of language requires us to broaden our concept of language to include what seems, in the traditional linguistic view, the opposite of language—the imagistic, instantaneous, nonsegmented, and holistic. Images and speech are equal and simultaneously present processes in the mind. (McNeill, 1992:2)
References

Dudis, P. (2004a). Body partitioning and real-space blends. Cognitive Linguistics, 15, 223-238.
Dudis, P. (2004b). Depiction of events in ASL: Conceptual integration of temporal components. Unpublished doctoral dissertation, University of California, Berkeley.
Emmorey, K. (Ed.) (2003). Perspectives on classifier constructions in sign languages. Mahwah, NJ: Lawrence Erlbaum Associates.
Engberg-Pedersen, E. (1993). Space in Danish Sign Language: The semantics and morphosyntax of the use of space in a visual language. Hamburg, Germany: Signum Press.
Frishberg, N. (1975). Arbitrariness and iconicity: Historical change in American Sign Language. Language, 51, 676-710.
Hoiting, N. (1997). Early bilingual development in Sign Language of the Netherlands and Dutch: A case study of a deaf child. In E. V. Clark (Ed.), The proceedings of the 29th annual Child Language Research Forum (pp.73-80). Stanford, CA: Center for Language and Information.
Hoiting, N. (2006). Deaf children are verb attenders: Early sign language acquisition in Dutch toddlers. In B. Schick, M. Marschark, & P. E. Spencer (Eds.), Advances in the sign language development of deaf children (pp.161-188). Oxford/New York: Oxford University Press.
Hoiting, N. (in press). Language domains of deaf children: Early acquisition of SLN. Groningen: Rijksuniversiteit Groningen.
Hoiting, N., & Slobin, D. I. (2002a). Transcription as a tool for understanding: The Berkeley Transcription System for sign language research (BTS). In G. Morgan & B. Woll (Eds.), Directions in sign language acquisition (pp.55-75). Amsterdam/Philadelphia: John Benjamins.
Hoiting, N., & Slobin, D. I. (2002b). What a deaf child needs to see: Advantages of a natural sign language over a sign system. In R. Schulmeister & H. Reinitzer (Eds.), Progress in sign language research: In honor of Siegmund Prillwitz / Fortschritte in der Gebärdensprachforschung. Festschrift für Siegmund Prillwitz (pp.267-278). Hamburg: Signum.
Liddell, S. (2003). Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (Ed.) (2000). Language and gesture. Cambridge: Cambridge University Press.
Slobin, D. I., Hoiting, N., Kuntze, K., Lindert, R., Weinberg, A., Pyers, J., Anthony, M., Biederman, Y., & Thumann, H. (2003). A cognitive/functional perspective on the acquisition of "classifiers." In K. Emmorey (Ed.), Perspectives on classifier constructions in sign languages (pp.271-296). Mahwah, NJ: Lawrence Erlbaum Associates.
Taub, S. (2001). Language in the body: Iconicity and metaphor in American Sign Language. Cambridge: Cambridge University Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Werner, H., & Kaplan, B. (1963). Symbol formation: An organismic-developmental approach to language and the expression of thought. New York: Wiley.
Wilkins, D. (2003). Why pointing with the index finger is not a universal (in socio-cultural and semiotic terms). In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet (pp.171-215). Mahwah, NJ: Lawrence Erlbaum Associates.
How Does Spoken Language Shape Iconic Gestures?

Sotaro Kita
University of Birmingham
Aslı Özyürek
Center for Language Studies, Radboud University, and Max Planck Institute for Psycholinguistics
In this paper we review a series of studies showing empirically that the lexical and syntactic structure a speaker uses to express information influences the way iconic gestures are shaped. The findings come from cross-linguistic comparisons and second language acquisition, as well as from experimental studies conducted within a single language. We argue that the findings provide direct evidence for McNeill's claim that gestures are a necessary component of thinking and speaking, and that gestures reflect an online dynamic process of a 'dialectic between imagery and language'.
1. Introduction
Our interest in gesture studies, and in particular in how gestures are related to the speaking process, originated while we were both students in David McNeill’s lab at the University of Chicago, and has continued in our collaborations at the Max Planck Institute (the ‘Gesture Project’) and thereafter. In our graduate and postdoctoral work, we have been fascinated by and have pursued McNeill’s idea that gestures are a necessary component of thinking and speaking, and that gestures reflect an online dynamic process of the ‘dialectic between imagery and language’ (McNeill, 1985; 1992; 2005), an idea now formulated as Growth Point Theory (McNeill & Duncan, 2000; McNeill, 2005). In our ongoing work we attempt to provide empirical evidence for this idea, in particular for one side of the dialectic: the influence of language structure on how imagery is shaped for speaking and for iconic gestures. As students coming from Japan and Turkey, speaking native languages that differ typologically from English, we were both interested in testing the idea McNeill put forward in Hand and Mind (1992) that gestures, as reflections of imagery, should be similar across different languages. Counter to our initial intuitions, however, we have found this not to be the case (Özyürek & Kita, 1999; Kita & Özyürek, 2003). Since then we have been developing a theoretical framework, the Interface Hypothesis, to account for the fact that iconic gestures depict the
same event in different ways, depending on the specific lexical item or syntactic structure used to describe the event in the concurrent speech.
In this paper we first outline competing accounts of how iconic gestures are generated during speaking and their relation to imagery. Then we describe the Interface Hypothesis of online relations between speaking and gesturing, as well as empirical evidence that supports this hypothesis. We end by discussing the links between Growth Point Theory and the Interface Hypothesis.
2. Three Hypotheses of How Iconic Gestures Are Generated
According to the first hypothesis of how iconic gestures are generated in relation to speech, the Free Imagery Hypothesis, gestures are generated from imagery that is formed ‘prelinguistically’, that is, independently of linguistic formulation. Krauss and his colleagues (1996; 2000), for example, suggested that gestures are generated from spatial imagery in working memory, activated at the moment of speaking. More specifically, the ‘spatial-dynamic feature selector’ picks up spatial features in spatial working memory that are part of the idea to be conveyed, and these features define the content of a gesture. A subset of these features may contribute to lexical retrieval in speech by cross-modal priming. Other than that, however, the ‘spatial-dynamic feature selector’ has no access to the computation involved in the speech-production process, and thus gestures cannot be shaped by the linguistic formulation. Unlike Krauss and his colleagues, de Ruiter (2000) proposed that representational gestures are generated by a part of the speech production process proper, the ‘conceptualizer’ (in the sense of Levelt, 1989), which produces a pre-verbal message to be fed into the ‘formulator’ (i.e., the linguistic formulation module). Despite this difference, in both Krauss’ and de Ruiter’s models, gestures are generated before and without access to processes of linguistic formulation. Consequently, both models embody the Free Imagery Hypothesis, which predicts that the encoding of a concept in gesture is not influenced by the details of how that same concept is linguistically encoded in speech.
The second view is the Lexical Semantics Hypothesis, according to which gestures are generated from the semantics of lexical items in the accompanying speech. Butterworth and Hadar (1989) proposed that a lexical item generates iconic gestures from one or more of its semantic features that can be interpreted spatially. Unlike the Free Imagery Hypothesis, the Lexical Semantics Hypothesis predicts that representational gestures do not encode anything that is not encoded in the concurrent speech. The Lexical Semantics Hypothesis further states that the source of gestures lies strictly at the lexical level rather than at the levels of syntax and discourse.
The third view is the Interface Hypothesis (Kita & Özyürek, 2003), according to which gestures originate from an interface representation that is spatio-motoric and organized for the purpose of speaking. In other words, gestures are produced from a type of ‘thinking for speaking’ in the sense of Slobin (1987;
1996) (Kita, 1993; 2000; Kita & Özyürek, 2003; McNeill, 2000; McNeill & Duncan, 2000). In this view, speaking imposes constraints on how information should be organized. The organization has to conform to the existing lexical and constructional resources of the language (Slobin, 1996) and the linear nature of speech (Levelt, 1989). Furthermore, the limited capacity of the speech production system imposes constraints as well. Rich and complicated information is organized into smaller packages so that each package has the appropriate informational complexity for verbalization within a processing unit. This may involve feedback from speech ‘formulation’ processes to ‘conceptualization’ processes (Kita, 1993; Vigliocco & Kita, 2006). In sum, the optimal informational organization for speech production is determined by an interaction between the representational resources of the language and the processing requirements of the speech production system. In line with this view of speaking, the Interface Hypothesis proposes that gestures are generated during the conceptual process that organizes spatio-motoric imagery into a suitable form for speaking. Thus, it predicts that the spatio-motoric imagery underlying a gesture is shaped simultaneously by 1) how information is organized in a readily accessible linguistic expression that is concise enough to fit within a processing unit for speech production, and 2) the spatio-motoric properties of the referent (which may or may not be verbally expressed).
3. Evidence for the Interface Hypothesis
The initial evidence for the Interface Hypothesis was based on cross-linguistic differences in how motion events are linguistically expressed, and on how these differences are reflected in gestures. The cross-linguistic variation in gestural representation was first demonstrated in a comparison of gestures produced by Japanese, Turkish, and English speakers in narratives elicited by an animated cartoon (Canary Row, with Sylvester and Tweety, often used by David McNeill, his students, and colleagues). There were two events in the cartoon for which the linguistic packaging of information differed between English on the one hand and Japanese and Turkish on the other.
In the first event, a protagonist swung on a rope, like Tarzan, from one building to another. It was found that English speakers all used the verb “swing,” which encoded the arc shape of the trajectory, whereas Japanese and Turkish speakers used verbs such as “go,” which did not encode the arc trajectory. This is because Japanese and Turkish do not have an agentive intransitive verb equivalent to English ‘swing’, nor a straightforward paraphrase. Presumably, in the conceptual planning phase of the utterance describing this event, Japanese and Turkish speakers get feedback from speech formulation processes and create a mental representation of the event that does not include the trajectory shape. If gestures reflect this planning process, the gestural contents should differ cross-linguistically in a way analogous to the difference in speech. It was indeed found that Japanese and Turkish speakers were more likely to produce a straight gesture,
which does not encode the trajectory shape, and most English speakers produced only gestures with an arc trajectory (Kita, 1993; 2000; Kita & Özyürek, 2003).
In the second event, in which one of the protagonists rolled down a hill, the description differed cross-linguistically along the lines discussed by Talmy (1985). English speakers used a verb and a particle or preposition to express the ‘manner’ (rolling) and ‘path’ (descending) of the event within a single clause (e.g., “he rolled down the hill”). In contrast, Japanese and Turkish speakers separated manner and path expressions into two clauses (e.g., “he descended as he rolled”). This difference in the clausal packaging of information should have processing consequences, because a clause approximates a unit of processing in speech production (Garrett, 1982; Levelt, 1989). It is plausible that manner and path are processed within a single processing unit for English speakers, but in two separate units for Japanese and Turkish speakers. Thus, as compared to English speakers, Japanese and Turkish speakers should separate the images of manner and path more often so as to process them in turn. The gesture data are consistent with this prediction (Özyürek & Kita, 1999; Kita & Özyürek, 2003). Japanese and Turkish speakers were more likely than English speakers to produce separate gestures for manner and path.
Furthermore, in the description of both the swing and the roll events, the same gestures that showed the linguistic effects described above also simultaneously expressed spatial information never encoded in speech. More specifically, these gestures systematically expressed the left-right direction of the observed motion. For example, if the cat swung from the right side of the TV screen to the left, the gestures always consistently represented the motion this way, even though this information was never expressed in speech. This phenomenon was first reported for English by McCullough (1993) and was found to be the same in all languages in our study. Further evidence for gestures encoding information not encoded by speech has also been reported in children’s explanations of scientific reasoning (e.g., Goldin-Meadow, 2003).
Thus, the above studies showed, first, that the gestural packaging of information parallels the linguistic packaging of the same information in speech. This finding contradicts the prediction made by the Free Imagery Hypothesis, according to which the representations that underlie gestures are determined at a level of the speech production process that has no access to the details of linguistic formulation. Second, the above studies showed that gestures encoded spatial details that were never verbalized. This poses a critical problem for the Lexical Semantics Hypothesis, as does the finding of relationships between the syntactic and gestural packaging of manner and path information. According to this hypothesis, the content of gestures should be determined only at the lexical level. Japanese, Turkish, and English speakers all used one word referring to manner and a second word referring to path, but nevertheless the gestures differed in accordance with differences at the syntactic level. On the basis of these results, it was concluded that the content of gestures is shaped simultaneously by how
speech organizes information about an event and by spatial details of the event, which may or may not have been expressed in speech.
One problem with the studies described so far was that the differences in iconic gestures used by English speakers on the one hand and Turkish and Japanese speakers on the other might have causes other than the syntactic packaging of information, such as broader cultural differences. In order to eliminate this possibility, we looked at a new set of manner and path descriptions to see how gestures are shaped when Turkish and English speakers use similar versus different syntactic constructions (Özyürek, Kita, Allen, Brown & Furman, 2005). We found that the cross-linguistic difference in gestures emerged only when speakers of Turkish and English used different syntactic means (i.e., one- versus two-clause constructions), but not when the speakers of the two languages used comparable syntactic packaging of information. That is, when only the manner or only the path of a given event was expressed in speech, similar content was expressed in gesture regardless of the language (e.g., English speakers produced manner-only gestures when they expressed only manner in speech, just as Turkish speakers did). Thus, it is not the case that English speakers always preferred gestural representations in which manner and path were expressed simultaneously in one gesture. Rather, the cross-linguistic difference in gesture was observed only when the syntactic packaging of manner and path differed. This provides further support for the view that gestural representation is shaped in the process of packaging information into readily verbalizable units for speaking, as proposed by the Interface Hypothesis.
Further evidence came from a study of how native speakers of Turkish gestured when they described the manner and path of motion events in their second language, English. Özyürek (2002a) compared Turkish speakers at different proficiency levels of English (beginner, intermediate, and advanced). The most proficient group typically used one-clause expressions of manner and path in speech, and produced gestures that expressed manner and path simultaneously, similar to native speakers of English (Kita & Özyürek, 2003; Özyürek et al., 2005). In contrast, the groups with lower proficiency typically used two-clause expressions (i.e., they transferred their preferred L1 structure into their L2), and produced separate gestures for manner and path, just as they would do when speaking in Turkish. This result indicates that how one syntactically packages information in speech shapes how one gesturally packages information in both the L1 and the L2.
Finally, a more direct demonstration of how spoken language shapes iconic gestures in an online and dynamic way comes from a study of English speakers (Kita, Özyürek, Allen, Brown, Furman, & Ishizuka, submitted). This study manipulated stimulus events in such a way that English speakers produced both one-clause (e.g., “he rolled down”) and two-clause (e.g., “he went down as he rolled”) descriptions of manner and path. The experimental manipulation used Goldberg's (1997) insight that when the causal link between manner and path is stronger, the preference for a one-clause construction is stronger. It was found
that gestures expressing manner and path simultaneously were more common in one-clause than in two-clause descriptions. In contrast, manner-only and path-only gestures were more common in two-clause than in one-clause descriptions. In other words, the gestural representation of manner and path changed depending on what type of syntactic packaging of manner and path the speaker decided to use for a given utterance. This further substantiates the contention of the Interface Hypothesis that iconic gestures are products of online, utterance-by-utterance conceptualization for speaking. Even the typological preferences of a given language, and the way they shape iconic gestures, can change when the speaker chooses a construction that departs from the typologically preferred pattern.
In summary, even when speakers describe the same event, gestural depiction of the event varies, depending on how the concurrent speech packages the information about the event at the lexical (Kita, 1993; Kita & Özyürek, 2003) and the syntactic level. The syntactic effect on iconic gestures has been demonstrated in both cross-linguistic (Kita & Özyürek, 2003; Özyürek & Kita, 1999; Özyürek et al., 2005) and single-language studies (for English: Kita et al., submitted; for English as a second language: Özyürek, 2002a). The linguistic effect on iconic gestures provides evidence against the Free Imagery Hypothesis, which posits that iconic gestures are generated without input from linguistic formulation processes. The content of iconic gestures, however, is not completely determined by the content of concurrent speech (Goldin-Meadow, 2003; McCullough, 1993; Kita & Özyürek, 2003). That is, gestures systematically encode spatio-motoric information that is not encoded in concurrent speech, which contradicts the Lexical Semantics Hypothesis. Taken together, the evidence supports the Interface Hypothesis. That is, iconic gestures are generated at the interface between spatio-motoric thinking and language production processes, where spatio-motoric imagery is organized into readily verbalizable units.
4. Conclusion
The evidence we have provided so far for the Interface Hypothesis is also direct evidence for David McNeill’s claim that gestures are the product of an online ‘dialectic between imagery and language’. More specifically, cross-linguistic investigations show convincingly that the linguistic packaging of information shapes iconic gestures online. This does not mean, however, that this is the only mechanism that underlies the ‘dialectic between imagery and language’ during speaking, or that it governs how iconic gestures are shaped in general. In the line of work presented here we have not discussed the roles of the communicative-social context (Özyürek, 2002b; Bavelas, this volume), the discourse context (i.e., the “field of oppositions” in discourse (McNeill, 1992)), or the tight temporal relations between speech and gesture that play a role in this dialectic. These are elaborated in the most recent version of the Growth Point Theory (McNeill, 2005). We believe that more cross-linguistic work on these aspects of the relations
between speech and gesture is needed in the future. For example, there are initial findings that speech-gesture synchrony differs across different languages (Özyürek, 2005; also discussed in McNeill, 2005). Future research in these domains will shed light on the dynamic intertwining of speech and gesture.

References

Butterworth, B., & Hadar, U. (1989). Gesture, speech, and computational stages: A reply to McNeill. Psychological Review, 96, 168-174.
de Ruiter, J. P. (2000). The production of gesture and speech. In D. McNeill (Ed.), Language and gesture (pp. 284-311). Cambridge: Cambridge University Press.
Garrett, M. F. (1982). Production of speech: Observations from normal and pathological language use. In A. W. Ellis (Ed.), Normality and pathology in cognitive functions (pp. 19-76). London: Academic Press.
Goldberg, A. E. (1997). The relationships between verbs and constructions. In M. Verspoor, K. D. Lee, & E. Sweetser (Eds.), Lexical and syntactical constructions and the construction of meaning (pp. 383-398). Amsterdam: John Benjamins.
Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Cambridge, MA: Harvard University Press.
Kita, S. (1993). Language and thought interface: A study of spontaneous gestures and Japanese mimetics. Unpublished doctoral dissertation, University of Chicago.
Kita, S. (2000). How representational gestures help speaking. In D. McNeill (Ed.), Language and gesture (pp. 162-185). Cambridge: Cambridge University Press.
Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory and Language, 48, 16-32.
Kita, S., Özyürek, A., Allen, S., Brown, A., Furman, R., & Ishizuka, T. (submitted). Relations between syntactic encoding and co-speech gestures: Implications for a model of speech and gesture production. Manuscript submitted to Language and Cognitive Processes.
Krauss, R. M., Chen, Y., & Chawla, P. (1996). Nonverbal behavior and nonverbal communication: What do conversational hand gestures tell us? In M. Zanna (Ed.), Advances in experimental social psychology, 28 (pp. 389-450). Tampa: Academic Press.
Krauss, R. M., Chen, Y., & Gottesman, R. F. (2000). Lexical gestures and lexical access: A process model. In D. McNeill (Ed.), Language and gesture (pp. 261-283). Cambridge: Cambridge University Press.
Levelt, W. J. M. (1989). Speaking. Cambridge, MA: MIT Press.
McCullough, K. E. (1993). Spatial information and cohesion in the gesticulation of English and Chinese speakers. Paper presented at the annual meeting of the American Psychological Society, Chicago.
McNeill, D. (1985). So you think gestures are non-verbal? Psychological Review, 92, 350-371.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2000). Analogic/analytic representations and cross-linguistic differences in thinking for speaking. Cognitive Linguistics, 11, 43-60.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D., & Duncan, S. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp. 141-161). Cambridge: Cambridge University Press.
Özyürek, A. (2002a). Speech-gesture synchrony in typologically different languages and second language acquisition. In B. Skarabela, S. Fish, & A. H. J. Do (Eds.), Proceedings of the 26th annual Boston University Conference on Language Development (pp. 500-509). Somerville, MA: Cascadilla Press.
Özyürek, A. (2002b). Do speakers design their co-speech gestures for their addressees? The effects of addressee location on representational gestures. Journal of Memory and Language, 46, 688-704.
Özyürek, A. (2005). What do speech-gesture mismatches reveal about speech and gesture integration?: A comparison of Turkish and English. In C. Chang, M. Houser, Y. Kim, D. Mortensen, M. Park-Doob, & M. Toosarvandani (Eds.), Proceedings of the 27th meeting of the Berkeley Linguistics Society (pp. 449-456). Berkeley, CA: Berkeley Linguistics Society.
Özyürek, A., & Kita, S. (1999). Expressing manner and path in English and Turkish: Differences in speech, gesture, and conceptualization. In M. Hahn & S. C. Stoness (Eds.), Proceedings of the twenty-first annual conference of the Cognitive Science Society (pp. 507-512). Mahwah, NJ: Lawrence Erlbaum.
Özyürek, A., Kita, S., Allen, S., Furman, R., & Brown, A. (2005). How does linguistic framing of events influence co-speech gestures? Insights from cross-linguistic variations and similarities. Gesture, 5(1), 215-237.
Slobin, D. I. (1987). Thinking for speaking. In J. Aske, N. Beery, L. Michaelis, & H. Filip (Eds.), Proceedings of the 13th annual meeting of the Berkeley Linguistics Society (pp. 435-445). Berkeley, CA: Berkeley Linguistics Society.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70-96). Cambridge: Cambridge University Press.
Talmy, L. (1985). Semantics and syntax of motion. In T. Shopen (Ed.), Language typology and syntactic description, Vol. 3: Grammatical categories and the lexicon (pp. 57-149). Cambridge: Cambridge University Press.
Vigliocco, G., & Kita, S. (2006). Language-specific properties of the lexicon: Implications for learning and processing. Language and Cognitive Processes, 21(7-8), 790-816.
Forgetful or Strategic? The Mystery of the Systematic Avoidance of Reference in the Cartoon Story Narrative
Nobuhiro Furuyama
National Institute of Informatics, Japan
Kazuki Sekine
Shirayuri College, Japan
When asked to retell the cartoon story that is the focus of the analysis presented here, most narrators avoid mentioning a particular piece of information. We propose an explanation for this phenomenon in terms of the speakers’ unwillingness to disrupt gestural ‘catchments’: the recurrence of gesture features spread out over a stretch of discourse. Because their recurring features suggest a common discourse theme (McNeill, 1992), catchments are useful for reference maintenance. Our analysis points to the possibility that catchments not only serve to achieve cohesiveness in discourse (that is, reference maintenance), but may also constrain the selection of information that is talked about.
1. The Phenomenon: Systematic Avoidance of Reference
When asked to retell a cartoon story, most speakers, curiously, avoid mentioning a particular piece of information. The cartoon we refer to is Canary Row, one of the Warner Brothers, Inc. cartoon series featuring Sylvester the Cat and Tweety Bird. This is the cartoon often used in studies by McNeill and his students. The story is less than seven minutes long and consists of eight episodes in which Sylvester repeatedly tries to catch and eat Tweety Bird, who is staying with his owner, Granny, in a hotel across the street. Each of Sylvester’s attempts is thwarted by Tweety, by Granny, or both.
For the study described in this chapter, twenty-eight speakers were asked, individually, to watch the cartoon and, immediately after, to retell the story to a second person, in detail and in its original order. All sessions were videotaped.
In the fifth episode of the cartoon, Sylvester tries to disguise himself by dressing up as a bellhop (see below). When recounting the events of this episode, most narrators omit one piece of information: the direction in which Sylvester escapes from Granny during the episode’s ‘punch line’ scene. Although the fifth episode is rather complicated and longer in duration than the other episodes in the story, we will argue that these omissions are not due to random
errors of memory; rather, they are systematic reflections of certain discourse-organizing processes that the analysis of gesture together with speech can reveal. The following is an abridged description of the episode, based on the fuller description in McNeill (1992: 369-371). The events that are the targets of our analysis are italicized.
Granny calls the desk clerk to ask that a bellhop bring her bird to the lobby, because she wants to check out. Eavesdropping on the conversation, Sylvester disguises himself as a bellhop and knocks on Granny’s door. Granny tells him that she’ll meet him in the lobby, and disappears. Sylvester enters the room, picks up the bird cage, completely covered with a cloth, and leaves. The scene shifts to the back alley, where Sylvester, rubbing his hands and grinning evilly while looking at the covered bird cage, looks around and starts to pull off the cover. Granny appears in the bird cage, holding an umbrella. Sylvester screams as Granny says “aha!” and bops him on the head with the umbrella. Sylvester falls, gets up, Granny bops him again, and Sylvester gets up and runs off the screen to the left. Granny, still in the bird cage, follows him, and he appears on the screen to the right, running to the left. Granny appears again, still in the cage, with her feet sticking out, and runs to the left of the screen, hitting Sylvester with her umbrella.
Table 1 shows details of the speakers’ descriptions of the Bellhop episode. Twenty-three of the twenty-eight speakers described the Bellhop episode in its entirety. Among them, six speakers referred to Granny chasing Sylvester in either speech, gesture, or both. Only two speakers mentioned the direction of the chase, and each expressed this information only in gesture. The question is, why do most speakers fail to mention this directional information when describing the punch line event of the Bellhop episode?

Table 1: Number of speakers who referred to the Bellhop episode, relative to their position with respect to the listener.

                                                          Position of speaker relative to listener
Referred to:                                              To the left of      To the right of      Total
                                                          listener (n=17)     listener (n=11)
Bellhop episode in speech                                        13                  10              23
Chase in either speech or gesture                                 4                   2               6
Direction of chase (to the left) in either                        1                   1               2
speech or gesture
2. The Reason for the Biased Avoidance of Mention
2.1 Lack of memory capacity?
One of the first explanations that comes to mind is that many speakers simply do not remember the information in question. Although we do not subscribe to this point of view, there are several reasons to think it could be so. Considering that the Bellhop episode is the fifth of eight, it is not surprising that five of the twenty-eight speakers failed to describe the scene at all.
If ‘primacy’ and ‘recency’ effects are in play in the recall of a multiple-episode story, an episode in the middle should be remembered less well. Given the likelihood of such effects, it is, if anything, surprising that twenty-three speakers described the Bellhop episode at all. Moreover, speakers who did describe the episode did so with great vividness and precision, including minute details of other aspects of the episode. Behind this, needless to say, lies the craft of the cartoon’s producers, who did their best not to bore the audience. In any event, if lack of memory capacity is the reason for the systematic avoidance of reference to the directional information, we need to explain convincingly why this particular piece of information, and no other, systematically slipped out of the speakers' minds. Is there anything that biased the selection of what to talk about and what to leave out? In particular, is there anything that suppresses the description of information concerning the direction of the chase?
2.2 Anything more newsworthy?
Here is another possible explanation. The episodes preceding the Bellhop episode end with the following alternating pattern: a) Sylvester is not hit on the head (but simply thrown out of the building), b) Sylvester is hit (on the head or some other body part¹), c) Sylvester is not hit, d) Sylvester is hit on the head. Although this alternating pattern makes us expect that the Bellhop episode would end with Sylvester not being hit, this was not the case. Linguistically, anything that deviates from an established pattern can normally be considered newsworthy and is likely to be marked as new information. In the present case, the deviant pattern might be treated this way. This may have made the information concerning direction relatively less worth mentioning in the narrative. In fact, many speakers explicitly emphasized the recurrence of the event by saying, for example, “Sylvester was hit again”—despite the expectation of the opposite—while failing to mention the direction of motion.
This explanation may sound plausible. However, the following argues against it. The punch line of the Bellhop episode deviates from a second consistent pattern: Sylvester, almost always, shows up on the right side of the screen, moves to the left to catch Tweety Bird, and is thrown out back to the right (the vertical dimension is ignored here). Although this leads us to expect the same pattern for the Bellhop episode, our expectation is betrayed at the end, when Sylvester escapes from Granny to the left of the screen. This is potentially newsworthy and thus, as for the other deviant pattern, should be marked as new information. In other words, the deviation from the consistent pattern of direction should compete with the deviation from the alternating pattern of hitting.
¹ The audience merely hears the sound of hitting behind the wall of the hotel in this episode, and which body part Sylvester was hit on can only be imagined. But given the sound of hitting, we can surely infer that he was hit before he was thrown out of the building.
Importantly, however, that was the very information that systematically fell into oblivion in the narratives of most speakers. The explanation above, invoking the notion of new information, thus fails to explain why the priority for most speakers was to mention and even emphasize the deviation from the alternating pattern of hitting rather than the deviation from the consistent pattern of direction. It is worth mentioning that most, if not all, speakers showed no sign of hesitation in choosing to focus on the deviation from the hitting pattern alone. In other words, we see no evidence of competition between these two details from the scene in the minds of the speakers as they proceed through their narrations.
2.3 Is the problem intrinsic to the spatial content?
Is it possibly because it is difficult, or perhaps tedious, to explain the deviant spatial pattern, compared with other information? Perhaps not: a full twenty-five out of twenty-eight speakers do refer to the deviant spatial pattern in a later episode. This episode features a trolley car and Sylvester running away from it along an electric cable to escape from Tweety and Granny. In this later episode, Sylvester runs from right to left (Table 2).

Table 2: Trolley car episode.

                                                          Position of speaker relative to listener
Referred to:                                              To the left of      To the right of      Total
                                                          listener (n=17)     listener (n=11)
Trolley car episode in speech                                    17                  11              28
Chase in either speech or gesture                                17                  11              28
Direction of chase (to the left) in either                       14                  11              25
speech or gesture
Is it because the direction of the chase is trivial or irrelevant to the punch line? Again, perhaps not. Rather, we can consider the deviant spatial pattern in the Bellhop episode to anticipate the one in the trolley car scene. We need to remember that this type of animated cartoon story is targeted to an audience of small children. It thus conveys information about the plot largely through spatio-temporal layouts, such as areas of the screen (for example, right versus left) and directions of movement (right-to-left versus left-to-right), each associated with particular cartoon characters, and through changes in those layouts, rather than through words alone. It is not that verbal expression is completely absent from this episode. It may not even be the case that the audience is explicitly aware of information conveyed by the changes in layout. However, possible subliminal effects of the spatio-temporal layouts and their changes on the audience’s understanding of the plot cannot be ruled out. In this sense, the shift in the spatial organization of the characters’ movements is not trivial; rather, it is crucial to understanding the plot of the story. We need to explain why most speakers nonetheless avoid mentioning the direction of movement in the Bellhop episode.
2.4 Another face of catchments: An explanation from the viewpoint of reference maintenance
The explanations we have examined so far consider only the speaker’s cognitive limits or the constraints imposed by the cartoon story itself. We would like to explore yet another possibility: an explanation in terms of constraints arising from the coordination of speech and gesture as a single communicative system, drawing on insights from McNeill’s work on catchments (for example, McNeill, 1992; 2005).
When discourse unfolds, there are necessarily changes and constancies. If there are no changes, discourse cannot be said to unfold. If there are no constancies, discourse cannot be understood as a single unity. In the Growth Point Theory that McNeill has developed over the years, growth points (GPs) have to do with the changes, and catchments with the constancies. The GP is a minimal unit of thinking-for-speaking that contains elements opposing one another, while the GP itself is also in opposition to its contextual background. The oppositions at different levels of analysis fuel a dialectic between opposing elements, until a full-fledged idea is developed and expressed in words and gesture. A catchment, on the other hand, is a recurrence of one or more gesture features in at least two (and often more) gestures over a stretch of discourse (McNeill, 2005). The gestures that compose a catchment are not necessarily consecutive, but can be spread throughout the discourse. The recurrent features then suggest a common discourse theme that enhances the cohesiveness of the discourse and, in particular, enables a speaker to use catchments, among other linguistic devices, for reference maintenance (Furuyama, 2001).
Furuyama (2001) demonstrated that in many Japanese narrations of Canary Row, catchments materialize, for example, as a lower-right gesture space reserved for Sylvester’s appearances and an upper-left gesture space for Tweety and Granny. This reflects what the characters do in the cartoon story: Sylvester typically shows up on the right side of the screen, moves to the left to catch Tweety, and is thrown out back to the right. Spatial layout and movement are expressed consistently in the narrators’ gestures, and thus constitute a catchment. In Japanese narratives, subject noun phrases are often elided. Topic shifts usually take place with explicit formal means (for example, with a full noun phrase), but they sometimes occur without explicit formal means, even when there is more than one possible referent. Reference is often unambiguous, however, because of the recurring gesture features in accompanying catchments. Catchments are thus useful, and indeed used effectively, to maintain references (ibid.).²
² Fourteen narrations in the analysis here are taken from Furuyama (2001).
If a narrator were to change a catchment to describe precisely the deviant spatial pattern in the punch line of the Bellhop episode, then s/he would not be
able to use the catchment for reference maintenance later. The cost of changing the catchment could be very high. For one thing, it takes time to develop a catchment again once it is changed, for it involves a stretch of discourse. More importantly, it could confuse the listener. In the Bellhop scene, two speakers correctly expressed the direction of movement in gesture (see Table 1). But their tendency to achieve high precision often made their narratives ambiguous and even hard to understand, such that their listeners frequently asked for clarifications. This suggests that, for many speakers, a preferred means of reference maintenance was to keep the catchment intact and preserve the main point of the punch line of the Bellhop episode, while giving up descriptive precision in terms of direction of movement. This also explains why, in contrast, many speakers mentioned the deviant spatial pattern in the trolley car scene (Table 2). That is, because this was the final scene, speakers did not need to preserve the device for later reference maintenance.
Over the years, discussions of catchments have characterized them as having to do with the cohesiveness of discourse. I once emailed David McNeill (perhaps right after I wrote my dissertation under his supervision) to ask what image he had in mind when he used the term ‘catchment’. I referred him to the following definitions from the Random House Dictionary: 1) the act of catching water; 2) something for catching water, as a reservoir or basin; 3) the water that is caught in such a catchment. McNeill’s reply was this: “These defs [sic] are all related and it's actually all three. Act of catching water—in the metaphor, ‘water’ is content from the context and the ‘act’ is the incorporation of context into the GP. Something catching the water—the zone or discourse context itself. The water—the context” (McNeill, personal communication).
The present discussion unveils another face of catchments, as something that constrains what to talk about and what to leave out. The linguistic determinism hypothesis says that language may influence thinking. Comparing how motion events are expressed verbally and gesturally by native speakers of English, Spanish, and Chinese, McNeill and Duncan (2000) demonstrated that typological differences among these languages, in terms of Talmy's (1985) categorization of ‘satellite-framed’ versus ‘verb-framed’ languages, affect how speech and gesture share the expression of information. Up to the present, however, there has not been any discussion of whether and how speech and gesture—as a single communicative system—may influence what information to mention. The present study would thus be an addition to the speech-gesture research that McNeill pioneered more than a quarter of a century ago and that he has been at the forefront of since then.
3. Conclusion
In this paper we reported the phenomenon of the systematic avoidance of reference to certain information in cartoon narrations. We proposed an explanation in terms of narrators’ unwillingness to disrupt a catchment used to maintain reference, because its recurrent features suggest a common discourse theme (Furuyama, 2001). This explanation suggests that catchments not only serve to achieve cohesiveness in discourse (for example, through reference maintenance), but may also influence the selection of what information to talk about—and what to omit.

References

Furuyama, N. (2001). De-syntacticizing the theories of reference maintenance from the viewpoint of poetic function of language and gesture: A case of Japanese discourse. Unpublished doctoral dissertation, University of Chicago Psychology Department.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago and London: University of Chicago Press.
McNeill, D. (Ed.) (2000). Language and gesture. Cambridge: Cambridge University Press.
McNeill, D. (2005). Gesture and thought. Chicago and London: University of Chicago Press.
McNeill, D., & Duncan, S. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp. 141-161). Cambridge: Cambridge University Press.
Shopen, T. (Ed.) (1985). Language typology and syntactic description, Vol. III: Grammatical categories and the lexicon. Cambridge: Cambridge University Press.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description, Vol. III: Grammatical categories and the lexicon (pp. 57-149). Cambridge: Cambridge University Press.
Metagesture: An Analysis of Theoretical Discourse about Multimodal Language*
Fey Parrill
Case Western Reserve University
* I thank Susan Duncan for supplying the data used in this analysis.
This chapter explores a particular gesture characteristically produced by David McNeill, a major figure in the world of multimodal language research. While the primary purpose of the analysis is to pay affectionate tribute to this great thinker, the phenomenon discussed gives rise to questions which are of general interest. I claim that this gesture is an example of a speaker-specific gesture. I argue that such gestures can be distinguished from representational gestures and emblems because they are neither conventional nor wholly spontaneous. The psychological processes by which they are generated are thus worthy of attention.
1. Introduction
When I tell people that I study gesture, they often express concern that I will plumb the depths of their souls by observing their hands. In reality—and I think I am not unusual among gesture researchers in this respect—I am typically not any more attentive to my interlocutor’s gestures than someone who is not a student of multimodal language. I must confess, however, that in the case of my former advisor, David McNeill, I am guilty of covert observation. Having been fortunate enough to spend a considerable amount of time listening to David talk, I became aware of a gesture form that tended to recur in his discourse. I will refer to this form as the ‘growth point gesture’—a label to be justified shortly. In this chapter, I provide a semiotic analysis of the gesture and discuss the implications of such forms for our understanding of conventionality in gesture production. The growth point gesture involves two hands with fingers cupped, one positioned slightly further from the body than the other, as shown in Figure 1. The top hand (the right hand in the figure) is oriented with the palm facing away from the body, while the palm of the bottom hand faces towards the body. The gesture is sometimes held while superimposed beats are performed and may be followed by a gesture in which the top hand moves towards and away from the bottom hand. This gesture is sometimes followed by a ‘presenting’ gesture, in which both hands move towards the interlocutor with the palms up as though offering an object for inspection. This paper addresses two questions. I first ask whether the growth point gesture occurs with the same meaning in different contexts. I next
take up the question of the status of this form. It is an idiosyncratic gesture, but appears regularly in this speaker’s talk. What does the existence of such forms tell us about conventionality in multimodal language production?
Figure 1: The growth point gesture.
2. Data
I used two samples of David’s discourse for this analysis. (These were samples of convenience, but I believe them to be representative.) The first is a conference presentation given in 1995, in which David discusses the gestures hearing adults produce when asked not to speak, and compares them to the gestures of home-signing children (later published as Goldin-Meadow, McNeill, & Singleton, 1996). This sample provides fourteen minutes of data. The second is a segment from a film made by the BBC about a man who has lost his sense of proprioception (Crichton-Miller, 1998). David discusses the implications of this condition for gesture-speech integration. This sample provides three minutes of data.
I found five instances of the growth point gesture in the seventeen minutes of video provided by these two samples. To be counted as an instance of the growth point gesture, a gesture had to meet the following criteria: two hands, both in a spread C shape; one hand closer to the body than the other, with its palm oriented away from the body; the other hand further from the body, with its palm oriented towards the body. I first present the five examples, then discuss their significance. In the transcriptions below, the data source is identified by year, “1995” being the
presentation, “1998” being the documentary.¹ Because the form of the gesture is essentially the same for each example, I will not include video stills.
¹ Timecode has been included for the 1995 examples so the reader can tell where they occur relative to each other. Gestures occur within the bracketed speech. Peak prosodic emphasis is marked with bold text; “*” is a self-interruption; “…” is an unfilled pause.
(1)
He has this very well honed [synchronization1] [of s* speech and gesture2] so that both speech and gesture are [uh presenting the meaning3] [to the listener at the same time4] 1998. R is top hand (hand closest to body), L is bottom hand. 1 is the growth point gesture. 2 involves motion of the top hand towards and away from the bottom hand. 3 and 4 are two-handed presenting gestures.
(2)
The gestures [and their synchronized1] [speech form2] which might be a word or a a a a a longer stretch of speech uh … the gestures and their synchronized speech cover the same idea units. 1995: 3:10. L is top hand, R is bottom hand. 1 is the growth point gesture. In 2, R holds and L traces a rounded path over R.
(3)
[They present closely related ideas1] [or different aspects of a single idea2] [at the same time3] 1995: 3:29. L is top hand, R is bottom hand. 1 is the growth point gesture. 2 involves motion of the top hand towards and away from the bottom hand, and 3 is a two-handed presenting gesture.
(4)
Gestures that accompany speech are mimetic and imagistic whereas the linguistic component is analytic and provides standardized categories of experience and so by saying that [these kinds of gestures1] [and language2] [form an integrated system3] we’re saying that language is richer* more complex than would appear 1995: 4:35. L is top hand, R is bottom hand. 1 is the growth point gesture. In 2, R moves away from the speaker’s body. In 3 the two hands come together with palms facing towards center, still in spread C shapes, and two small beats are performed timed with the two syllables of “system.”
(5)
But … so … but the point is to compare these* the* kind of the uh [the most basic properties1] of these … [home sign*2] these invented languages by children to the uh* which are gestural … to the uh the gestures of hearing adults 1995: 11:02. L is top hand, R is bottom hand. In 1 the two hands come together with palms facing towards center in spread C shapes. Two beats are performed. 2 is the growth point gesture. Two beats are performed here as well, timed with the two words.
3. The Meaning of the Growth Point Gesture
Is the meaning of the growth point gesture constant in these different examples? The first three cases suggest that the gesture occurs when David is talking about the synchronization and co-expression of the manual and vocal modalities. The two hands delimit a space in which a virtual object is contained. The virtual object is a conceptual unit composed of speech and gesture: in other words, a growth point. This hypothesis is my basis for referring to the form as the ‘growth point gesture’. In examples 1 and 3, the growth point gesture is followed by a gesture in which the top hand moves towards and away from the bottom hand. The co-occurring speech in example 3, “different aspects of a single idea,” strongly suggests that the motion represents an opposition between the two modalities. I am thus hypothesizing that the motion in examples 1 and 3 represents a dialectic between the imagistic and linguistic aspects of the language system. This dialectic is an integral part of the growth point theory (McNeill & Duncan, 2000; McNeill, 2005) and is likely to be part of the speaker’s conceptualization.
In the fourth example, the growth point gesture occurs with “these kinds of gestures.” By “these,” David means gestures that accompany speech and that are mimetic and imagistic. It is unclear whether the meaning of the gesture in this example is the same as in examples 1-3. In example 4, the virtual object may simply be speech-accompanying gestures, rather than an idea unit composed of both speech and gesture. The distinct gesture produced along with “language” could support such an interpretation.
The meaning of the gesture in the fifth example is somewhat unclear, as it occurs during a period of verbal dysfluency. One interpretation is that the virtual object contained within the speaker’s hands is ‘home sign’. This is plausible in the context of the single utterance. It is less plausible when one considers the other contexts in which this form has appeared, in conjunction with the speaker’s hesitations and self-interruptions. A second possibility is that the gesture represents speech-accompanying gestures, but has anticipated the speech with which it might be expected to co-occur (“the gestures of hearing adults”). In other words, in planning the utterance, “the point is to compare [A and B],” the ordering of A and B has become confused, perhaps because a third piece of information needs to be included: namely, both A and B share the property of being gestural. As a result, the gesture occurring with the A element is actually co-expressive with the element that surfaces as B.
Regardless of which of the above interpretations one favors, it is clear that this particular gesture occurs in very similar discourse situations. The growth point gesture does not appear with speech about goats or yams, but with speech about the following things: a conceptual unit composed of speech and gesture (examples 1, 2, and 3) and speech-accompanying gestures (examples 4 and 5). Are these two meanings sufficiently different that the label ‘growth point gesture’ is not really justified? Given that speech-accompanying gestures are co-
expressive—that is, they are the outcome of a dialectic between imagery and language—I would argue that the difference between these meanings is not enormous.
4. The Status of the Growth Point Gesture
The growth point gestures produced in examples 1 and 3 are extremely similar, despite being separated by several years. They are also nested within an identical series of gestures. These two facts suggest that the growth point gesture is relatively fixed for the speaker. Of course, the fact that David produces similar speech and gesture when talking about speech-gesture integration is not surprising. He has described this phenomenon perhaps hundreds of times, so his conceptualization of it is naturally somewhat entrenched. However, it is precisely the existence of such entrenched conceptual units that I believe warrants some attention. What exactly is the status of this recurring gesture? Does it reflect the same kind of visuospatial thinking that gives rise to other gestures?
There are, broadly speaking, two reasons why multiple gestures with the same physical form come to be produced. The first is convention. Certain kinds of gestures (often referred to as ‘emblems’) are akin to lexical items: they have culturally specified forms and meanings (Kendon, 2004; McNeill, 2005). An example is the ‘thumbs up’ gesture in American culture, which has the general meaning of positive evaluation. It is not always easy to determine whether or not a gesture should be regarded as an emblem—this is a complex subject which cannot be adequately addressed here (but see Parrill, in press, and the references therein). The growth point gesture, however, is produced by one speaker only, and thus is clearly not an emblem.
Gestures with the same physical form are also produced because the imagery which gives rise to them is the same. For example, different people describing a cartoon stimulus will tend to produce similar gestures because seeing the same visual input results in the generation of similar mental imagery (McNeill, 1992; McNeill & Duncan, 2000). A single person will also produce multiple gestures with similar forms when certain imagistic content recurs at different points in a discourse. Such occurrences have been described as ‘catchments’ (McNeill, 2000). A catchment is a set of recurring gestural features (hand shape, motion, position). The recurrence of features is assumed to reflect recurring imagery. This imagery can be relatively concrete and linked to an external stimulus, as in the case of cartoon narrations, or can relate to more abstract, internally generated content. Tracking catchments is thought to provide insight into the organization of a speaker’s discourse. Is the recurrence of the growth point gesture an instance of a catchment? This seems like a logical description for the cases that occur within a single discourse (the four examples from the 1995 sample). The sighting of a virtually identical gesture in a different discourse, however, suggests something different. The growth point gesture appears to be an example of a third phenomenon, a speaker-specific gesture.
The two distinguishing features of the speaker-specific gesture are that 1) the gesture is generated on the basis of imagery, not convention, but 2) the gesture is routinized, rather than being wholly spontaneous. Such gestures warrant attention because routinization changes the nature of the language production process. If we repeat an action (such as describing a theoretical concept), we will form schemas and motor programs for that action (it will become routinized). Thus, certain images are very likely to be activated and certain motor programs very likely to be run whenever we have occasion to engage in that explanatory task—we have developed a procedure. The mechanisms on which such processes depend are complex, but a familiar example from speech might help to make the point. All speech production is routinized, of course, but further online routinization is sometimes observable when one speaker produces a sentence structure another speaker has just used. Here the assumption is that the structure has been primed—it is produced because it is more active in memory (Bock, 1986; Pickering & Garrod, 2004). As a result, the process of selecting a structure has become slightly less spontaneous for the speaker.
My evidence for the claim that the growth point gesture is routinized comes from the fact that the form of the gesture is quite similar even when the speech is different. This pattern suggests that the image is the conceptual unit, not the speech-gesture package. In other words, it is not the case that David has a completely memorized spiel about gesture-speech integration. Instead, the image which is at the heart of his theory is not created spontaneously each time he describes it, but is retrieved from memory as a unit.
5. Conclusion: Some Future Explorations
Speaker-specific gestures are of interest because they represent a different point on a continuum of conventionality than the wholly conventional or wholly spontaneous gestures which prior research has focused on. Degree of conventionality is at the heart of language and language change, and is thus a topic of central importance. While elements of varying conventionality exist in speech (slang terms, idioms, constructions, etc.), the gesture considered here is an imagistic counterpart. I wish to end with some brief comments on the potential for empirically identifying speaker-specific gestures and for testing the claims I have made about their production. I expect that these gestures occur in the discourse of most speakers. Anyone who tells the same story over and over—whether it is a narration about a past experience or a description of a theoretical construct—will streamline production in the manner described above. Some information about the changes that occur when a narration is repeatedly retold does exist (Levy & Fowler, 2005). However, this work deals with different phenomena, and the area is, in general, under-studied. Speaker-specific gestures are likely to be particularly prevalent, and easy to identify, in discourse genres where material is prepared and familiar, such as lectures. A good starting point in understanding this
phenomenon, therefore, might be to look for characteristic gestures in the lectures of a number of different people. Such an analysis should establish whether the claims made here can be generalized to other speakers. Empirical tests of the psychological processes underlying the production of these gestures will have to await a better understanding of the phenomenon. In the meantime, I would like to suggest one indirect assessment. If, in speaking about a concept, a particular image is automatically generated and is an integral part of the explanatory process, drawing the speaker's attention to that image might perturb normal production significantly. For example, a chapter analyzing such a behavior might serve to make the speaker self-conscious enough that his or her thinking will grind to a halt. If we see no further productive work from David McNeill, my hypotheses will be supported. While such an outcome might be undesirable for David, he can console himself with the knowledge that we have gained some insight into the nature of multimodal language.

References

Bock, J. K. (1986). Syntactic persistence in language production. Cognitive Psychology, 18(3), 355-387.
Crichton-Miller, E. (Producer). (1998). The man who lost his body. London: BBC Horizon.
Goldin-Meadow, S., McNeill, D., & Singleton, J. (1996). Silence is liberating: Removing the handcuffs on grammatical expression in the manual modality. Psychological Review, 103, 34-55.
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge: Cambridge University Press.
Levy, E. T., & Fowler, C. (2005). How autistic children may use narrative discourse to scaffold coherent interpretations of events: A case study. Imagination, Cognition and Personality, 24(3), 207-244.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2000). Catchments and contexts: Non-modular factors in speech and gesture production. In D. McNeill (Ed.), Language and gesture (pp.321-328). Cambridge: Cambridge University Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D., & Duncan, S. D. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp.141-161). Cambridge: Cambridge University Press.
Parrill, F. (In press). Form, meaning and convention: An experimental examination of metaphoric gestures. In A. Cienki & C. Müller (Eds.), Metaphor and gesture. Amsterdam: John Benjamins.
Pickering, M. J., & Garrod, S. C. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27(2), 169-226.
Potential Cognitive Universals
Evidence from Head Movements in Turkana∗
Evelyn McClave
California State University, Northridge
The Turkana people of northwestern Kenya move their heads in three communicative environments, in ways identical to speakers of Arabic, Bulgarian, Korean, and African-American Vernacular English. When verbally expressing inclusivity, their heads sweep laterally. Individual head movements co-occur with individual items on a list, and they orient their heads to locate absent or abstract entities in the gesture space. The head reorients to the previously identified space with each subsequent mention of the entity. When identical head movements occur in the same communicative environments among these linguistically and culturally unrelated groups, we can hypothesize that they are indicative of cognitive universals. Conceptualizing abstract concepts and absent entities as occupying physical space may thus be a cognitive universal.
1. Introduction
1.1 Turkana
The Turkana district of northwestern Kenya covers approximately 75,000 square kilometers, bordering Sudan and Uganda. In the early 1970s, the anthropologists David and Judith MacDougall lived among the Turkana and documented their customs and activities in three films that form The Turkana Trilogy. At the time, most Turkana were nomadic pastoralists. Their way of life and Kenya's political history jointly contributed to their isolation. Even though the MacDougalls were not interested in gesture, as a result of their work researchers can observe spontaneous gestures made by an isolated community over 30 years ago. Based on the films, David McNeill (1992) discusses manual gestures made by Turkana speakers in Hand and Mind. This is how I first learned of the Turkana, and why I, too, turned to The Turkana Trilogy to investigate how widespread the co-occurrence of certain head movements with certain verbal expressions of meaning might be.
∗ Special recognition and gratitude are given to Dr. Michael Lokuruka, a native Turkana, for his Turkana transcriptions of the data.
1.2 Head movements among European-Americans
Earlier research on movements of the head among European-Americans (McClave, 2000) found that even though the head is incapable of assuming different shapes at will as the hands do, head movements also convey the speaker's meaning. While we are all familiar with the emblematic head gestures for "yes" and "no" in America, European-Americans move their heads in predictable ways when conveying many other concepts. Two spontaneous dyadic conversations of an hour's length each formed the database for this study. The tapes were watched multiple times, and head movements and co-occurring speech were noted. It then became clear that all four subjects moved their heads similarly in the same communicative environments. A summary of the forms of head movement and correlating contexts follows.

Expressions of inclusivity such as 'everyone', 'completely', or 'whole' are often accompanied by lateral movements. At the beginning of direct quotes, European-Americans move their heads to a different position or orientation. The change in head position marks the speaker's change in footing from narrator to that of a character in the narration. European-Americans also mark each item on a list with an individual head movement, often to a contrasting position. Heads are convenient for pointing, of course, when the hands are full, but research shows that speakers also locate absent or abstract entities in a certain location in gesture space through head orientation. The head orients to the same position with each subsequent mention of the entity. When intending to intensify what they are saying, European-Americans use lateral shakes, that is, repeated lateral movements resembling the head movement for negation, although no negation is stated or implied. For example, lateral shakes accompany utterances such as "It was really great." Often the speaker's mental conceptualization of the size of a referent is conveyed by head movements, as we observe speakers raising their heads when describing tall buildings or people. Similarly, in narration speakers lower their heads to report what they said to a child or to someone shorter in height than they are. Verbally, speakers mark uncertainty with expressions such as 'or something' and 'I guess'. In such environments we observe lateral shakes, often with constrained trajectories. It has been known for some time that Americans use head nods to backchannel; that is, they nod to indicate to a speaker that they are actively listening (Yngve, 1970). It was presumed that such nods were internally motivated; that is, the listener nodded when she felt like it. Microanalysis revealed, however, that speakers often requested listener feedback by nodding while speaking, and listeners recognized such nonverbal signals and complied with their own nods or verbal backchannels, often within a second.

In these eight different communicative environments, head movements among European-Americans are conventionalized, even though most speakers are unaware of the connection between these movements and contexts. Table 1 summarizes these results.
Table 1. Communicative environments and European-American head movements.

Environment                Head movement
Inclusivity                Lateral movement
Intensification            Lateral shakes
Uncertainty                Lateral shakes – usually constrained trajectory
Direct Quotes              Change in head position
Mental Imagery             Indicates size of an entity
Deixis                     Specific head orientation for specific referent
Lists or Alternatives      Individual movement for each item on a list
Backchannel Requests       Nods
1.3 Head movements among speakers of Arabic, Bulgarian, Korean, and African-American English

To determine whether any of the head movements observed among European-Americans were cross-cultural, the research was expanded to three unrelated languages: Bulgarian, Egyptian Arabic, and Korean. Six dyads of native speakers of these languages were recruited and filmed in casual conversation for approximately an hour by a native-speaking researcher. In addition, we also recorded a conversation among three African-American women to assess possible cultural variations within English (McClave, Kim, Tamer, & Mileff, under review). The researchers first identified the eight communicative environments from the original study (McClave, 2000), based on speech alone. Then the co-occurring head movement, if any, was noted. Because all of the conversations were spontaneous, some communicative environments did not occur in some of the conversations. For example, only the Bulgarian and African-American conversations contained references to the size of entities. In other cases, some subjects did not move their heads in environments where European-Americans typically do; for example, the Bulgarians did not change their head positions at the beginning of direct quotes. We can surmise, therefore, that some head movements are culturally specific. Identical head movements, however, occurred in three environments across all four cultures: lateral movements co-occurred with expressions of inclusivity, the head changed position for each item on a list, and the head oriented to a specific location selected by the speaker when referring to non-present or abstract entities. In addition, in each culture speakers used head movements to elicit backchannels from their listeners, although the specific form of movement conformed to the culture's head motion for affirmation. For example, the Bulgarians used lateral shakes to elicit backchannels, since they move their heads laterally to signal agreement, unlike Americans who move their heads laterally for negation. These spontaneous head movements are cross-cultural. Could they possibly be universal?
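The comparative logic of this step can be restated procedurally. The following sketch (in Python) is purely illustrative and not part of the study's method: the language labels follow the project, but the environment names and movement codes are hypothetical stand-ins for the actual transcription categories. It returns those environments in which every language was coded with one and the same movement form:

# Head-movement form coded per communicative environment, per language.
# None marks an environment observed without the expected movement.
head_coding = {
    "Arabic":    {"Inclusivity": "lateral sweep", "Lists": "individual movement",
                  "Deixis": "orient to referent locus", "Direct Quotes": "reposition"},
    "Bulgarian": {"Inclusivity": "lateral sweep", "Lists": "individual movement",
                  "Deixis": "orient to referent locus", "Direct Quotes": None},
    "Korean":    {"Inclusivity": "lateral sweep", "Lists": "individual movement",
                  "Deixis": "orient to referent locus", "Direct Quotes": "reposition"},
    "AAVE":      {"Inclusivity": "lateral sweep", "Lists": "individual movement",
                  "Deixis": "orient to referent locus", "Direct Quotes": "reposition"},
}

def shared_environments(coding):
    """Environments coded in every language with one identical movement form."""
    common = set.intersection(*(set(c) for c in coding.values()))
    return sorted(
        env for env in common
        if all(c[env] is not None for c in coding.values())
        and len({c[env] for c in coding.values()}) == 1
    )

print(shared_environments(head_coding))  # ['Deixis', 'Inclusivity', 'Lists']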
2. Head Movements Among the Turkana
2.1 Data and method
In a first attempt to investigate potential universality, we sought members of a geographically isolated, pre-technological culture who had little or no contact with outsiders and who would not have been exposed to the head movements of other groups through television. The historical anthropological tapes of the Turkana in Kenya made by the MacDougalls provided valuable data, since before Kenyan independence in 1963, "outsiders were excluded throughout British rule" from the Turkana District, and "there were still road blocks on entering Turkana as late as 1976" (Dyson-Hudson, 1999:38). The isolation was not absolute, however. The anthropologist Philip Gulliver did fieldwork there from 1948 to 1949, but at that time, there were only nine Europeans in the entire district (Gulliver cited in Dyson-Hudson, 1999:38). The MacDougalls structured their three films around one Turkana man, Lorang, who had been forced to serve in the Kenyan army. He, therefore, had been in other parts of Kenya. The examples cited below, however, come from Turkana speakers who had never left the Turkana District.

Studying the Turkana also enriched the linguistic diversity of the head movement project. The five languages spoken in our data belong to four unrelated language groups: Turkana is a Nilo-Saharan language; Arabic belongs to the Semitic group; Korean is thought to be Altaic; and Bulgarian and African-American Vernacular English are part of the Slavic and Germanic subgroups, respectively, of Indo-European.

Another advantage of this database was that the MacDougalls' tapes were captioned in English for general audiences. It was, therefore, possible to approximate where, if at all, any of the communicative environments identified in the cross-cultural study occurred and to note any accompanying head movement. For the precise alignment of head movements with speech, however, it was imperative to know the exact Turkana utterance and have a precise English translation. At the time I began this research (2003), there were exactly two native speakers of Turkana in the United States.1 One, Dr. Michael Lokuruka, had just finished his Ph.D. at Cornell University in the College of Agriculture and Life Sciences. He graciously agreed to assist me at Cornell. In working together, I played the target portion of the tape. Dr. Lokuruka then described the actual communicative environment. This was necessary since in some cases, the free translation used in the English captions would lead an English-speaker to assume a different communicative environment than that in the original Turkana.

1 The Kenyan Embassy informed me that there was only one Turkana in the United States, Dr. Michael Lokuruka. Dr. Lokuruka, however, knew of another Turkana married to an American and living in Texas.
Dr. Lokuruka then wrote down the exact Turkana utterance and provided a word-for-word English translation. He later helped me align the head movements with specific speech syllables. In The Turkana Trilogy the camera was often focused only on the speaker, so any backchanneling could not be observed. However, the Turkana have no emblematic head movements for 'yes' and 'no' (Michael Lokuruka, personal communication). Thus, unlike the other cultures studied, the Turkana would not use an emblematic head movement for affirmation to request listener backchannels. The Turkana data revealed, however, that the Turkana moved their heads in the other three communicative environments in ways identical to speakers of Egyptian Arabic, Bulgarian, Korean, and African-American English; that is, the Turkana use lateral head sweeps for inclusivity; individual head movements to mark items on a list; and head orientation to locate non-present referents in the gesture space.2
2.2 Inclusivity
It is hypothesized that our bodily experience of visually surveying a collection of objects, most often arranged horizontally because of the earth's gravitational pull, is the origin of the lateral sweep of the head to express inclusivity (McClave, 2000:860). In the film Lorang's Way, the anthropologists have asked Ngimare, Lorang's closest friend, to talk about Lorang's background. In this example, Ngimare is recounting how Lorang was forced into the army.3

(1)
Arikete kolong ngimoe toriko tonangeta ng{alupo nugu daaaaang}
They took him in the past foreigners took him got him lands these alllll

Head sweeps left on '-alupo nugu daang'. The syllable 'daang' is lengthened verbally.

"They took him in the past. Foreigners took him and got him all this land."

The marked verbal elongation of the Turkana word for 'all' ('daang') co-occurs with the lateral head sweep. This example supports the view that Turkana, like Arabic, Bulgarian, Korean, and English, uses lateral head movements to express inclusivity.
2 In the excerpts presented here, analogous to the brackets that are used to mark the duration of strokes in manual gestures, braces in the following examples mark the duration of the head movements. Underlining indicates that the head is held in that position.
3 Readers with access to the MacDougalls' films are reminded that the excerpts below reflect the exact Turkana speech and do not necessarily match the subtitles on the films.
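Footnote 2's brace convention also lends itself to mechanical checking. As a minimal sketch (mine, not part of the original analysis), the stretches of speech that co-occur with head movements can be pulled out of an annotated line with a single regular expression:

import re

def movement_spans(annotated_line):
    """Return the speech material enclosed in braces, i.e. the stretches
    over which a head movement extends in the transcription."""
    return re.findall(r"\{([^}]*)\}", annotated_line)

print(movement_spans("Arikete kolong ngimoe toriko tonangeta ng{alupo nugu daaaaang}"))
# ['alupo nugu daaaaang']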
Figure 1: ngalupo "lands"
Figure 2: daang "all"

2.3 Lists or alternatives
When listing or presenting alternatives, the speaker's head moves with each succeeding item (McClave, 2000:867). This next excerpt occurs shortly after the example quoted above. Ngimare is elaborating on how Lorang came to be prosperous by Turkana standards. A distinct head movement accompanies each object he mentions.

(2)
nakolong kidyasia ngiboro lu
all along he accumulated things these

{aro}piyat, {etab}a, {ewor}u, {pause} {ekot}, ibore, {nyaep},
money       tobacco  cloth            coats   things  axes
   1           2        3        4       5               6

{erokon}, {pause} ibore {daang} kidi{at}
adzes             everything     he accumulated
   7         8           9            10

Head movements 1 through 4 are forward and down. Ngimare then turns his head left toward his interlocutor and the camera for movement 5. Head movements 6 through 10 are down while the head is oriented left.

"Gradually he accumulated wealth: money, tobacco, cloth, coats, axes, adzes—everything."

This example supports the following account: we see variations in the types of movement both within and across cultures, but human perception and conceptualization of the individuality of entities or events is expressed through individualized head movements.
2.4 Deixis and referential use of space
When the hands are occupied, people often use their heads to point. The top of the head may be tilted, the face may turn, or perhaps the chin is jutted out in the appropriate direction. Speakers can be observed making similar head movements for absent or even abstract referents; that is, they locate a non-present or abstract entity in the gesture space by orienting their heads to a particular space. The space then becomes identified with the referent, and with each subsequent mention, they reorient their heads to the previously identified space.

Naingiro is the sister of Lorang's senior wife. In the film The Wedding Camels, she is talking about the Turkana custom of young brides going to live with their husbands' families. Then, referring to the emotions of the bride's mother, she says, "But it's worse if she's your last." As she says this, she moves her left hand left. In this way she sets up the last daughter to marry on her left. She then turns her head left again as she says, "But when she goes, you long to see her." This next excerpt follows. Three additional times her head orients to the left, indicating a consistent spatialization of the non-present daughter to her left.

(3)
n{ainyounio} kingoliakona             tosub    iyanyuwari
when you get up and you look this way seems    you want to see her
     1

1. The head moves to the extreme left near the left shoulder on 'nainyounio'. The head then holds in that position as the entire torso twists left and then moves backward as she says 'kingoliakona tosub iyanyuwari'.

"When you get up, you long to see her."

anica elalakiarosi ngirwa
when time passes by days

"As time goes by, you forget –"

{eyeni lokilekeng}   be     tama kape      atamarah                    taany itiokon
knows her husband    then   would say go   you decide (to forget) uh   see your mother
     2

2. Head moves left on 'eyeni lokilekeng'.

"Her husband would say, 'Go see your mother'."

{ani inan}gakin   iyong ebuni tama kwa ebunenen            ikoku kang
when prompted     you she comes realize she would be coming child mine
     3

3. Head turns left on 'ani inan-'. Head then holds left orientation while torso tilts left as she says '-gakin iy-'.

"When she comes, you realize she will keep coming to see you."

suwa ngide    lupesur akipotor            bon    eyei          ngigelayek
us children   girl-children giving away   only   is expected   belong somewhere else

"Female children don't belong to us. It is expected that we give them away to somebody else."

In this Turkana example and in many similar ones from Arabic, Bulgarian, Korean, and English, speakers locate non-present referents in the gesture space through head orientation.
3. Conclusion
Turkana speakers move their heads in ways identical to speakers of Arabic, Bulgarian, Korean, and English when expressing inclusivity, enumerating items on a list, and when referring to absent or abstract entities. Thus, in five languages from four unrelated language families on four different continents, speakers mark individual items in a list with a series of individual head movements. Verbal expressions of inclusivity co-occur with lateral head movements in all of these cultures, suggesting a conceptualization of things either physical or abstract arrayed on a horizontal plane. Speakers also orient their heads in a particular direction to locate an abstract or absent entity in the gesture space. With each subsequent mention of the entity, they reorient their heads, or direct a manual gesture, to the same space. The data thus support those theories which hold that humans conceptualize the non-physical in terms of the physical (Lakoff & Johnson, 1999). The spatialization of abstract concepts and absent entities, as evidenced in head movements for inclusivity and deixis, may be a cognitive universal.

References

Dyson-Hudson, R. (1999). Turkana in time perspective. In M. Little & P. Leslie (Eds.), Turkana herders of the dry savanna (pp.25-40). Oxford: Oxford University Press.
Lakoff, G. & Johnson, M. (1999). Philosophy in the flesh. New York: Basic Books.
MacDougall, D. & MacDougall, J. (1976). The wedding camels (film). Distributed by Berkeley Media, Berkeley, California.
MacDougall, D. & MacDougall, J. (1977). Lorang's way (film). Distributed by Berkeley Media, Berkeley, California.
McClave, E. (2000). Linguistic functions of head movements in the context of speech. Journal of Pragmatics, 32, 855-878.
McClave, E., Kim, H., Tamer, R., & Mileff, M. (Submitted). Head movements in the context of speech in Arabic, Bulgarian, Korean, and African American Vernacular English. Under review for Gesture.
McNeill, D. (1992). Hand and mind. Chicago: The University of Chicago Press.
Yngve, V. (1970). On getting a word in edgewise. In Papers from the sixth regional meeting of the Chicago Linguistic Society (pp.567-578).
Blending in Deception
Tracing Output Back to Its Source1
Amy Franklin
University of Chicago
In this paper, I explore gesture-speech mismatches produced during utterances that are intended to deceive, using a conceptual blend analysis. Participants in an experiment view a cartoon and are instructed to misreport portions of the cartoon during a re-telling to a friend, who is naïve to both the plot line of the cartoon and the experimental manipulation. Considering the language performance as a conceptual blend of fact and misreport, I examine the expression of the blend in speech and gesture, focusing on the information source for the content in each channel. By analyzing the entire representation of an event, including each input (i.e., factual event, misreported details, and physical space), it is possible to show how gesture and speech even in a mismatch form an integrated conceptual unit.
1. Introduction
Gesture and speech have a complicated relationship. How the two relate can vary significantly across different discourse contexts. For example, gesture can provide details to listeners that coexpress the content of speech (e.g., nodding while saying yes), gesture can complement the information in speech (e.g., "big," accompanied by a demonstration of the actual size of some entity), and gesture can supplement speech by conveying different but related content (e.g., "he went into the room," accompanied by a gesture of direction and rolling manner of motion (Morford & Goldin-Meadow, 1992)). Sometimes, though, speakers express completely different ideas with their hands than with their speech. Known as 'gesture-speech mismatches', these combinations occur when speakers have multiple (possibly conflictual) ideas in mind, and are compelled to express all of them. For example, when individuals are taking into account both their own and another person's visual perspective, they must consider more than one way of viewing a scene (Melinger & Kita, 2004). Similarly, in problem solving tasks such as the 'Tower of Hanoi', two

1 These data were collected in collaboration with Susan Duncan. Susan Duncan, Elena Levy, Irene Kimbara, and Fey Parrill have graciously provided assistance and insight to this project in all its stages. When it comes to tracing output back to its source, I am grateful to David McNeill who has shaped my work and thinking.
different strategies for solving a problem can be simultaneously expressed across channels (Garber & Goldin-Meadow, 2002). Such mismatches between speech and gesture can reveal transitional learning states. Adult and child performance on a variety of tasks, from learning how gears move to Piagetian conservation, has been examined. Individuals who produce gesture-speech mismatches prior to instruction on such tasks are more likely to profit from instruction than individuals who produce matches (Perry & Elder, 1997; Church & Goldin-Meadow, 1986; see also Goldin-Meadow, this volume, for further discussion of mismatches in relation to learning). Analysis of gesture in addition to speech exploits the fact that gestures are unwitting and sensitive manifestations of speaker-internal thought processes. By looking at the relationship between modalities, more is revealed about thinking than what may be expressed by speech alone.

Following McNeill (1992; 2005), it is assumed here that gesture and speech originate from a unified Growth Point (GP) which contains both linguistic and imagistic content. At the heart of the GP is a dialectic between imagery and language. A single representation is articulated across modalities, with, typically, imagery expressed in gesture and categorical linguistic content found in co-expressive speech. The unity of the GP is observed in the synchrony of gesture stroke phases with speech segments and integration across channels. In gesture-speech mismatches, more than one underlying representation is present. The question is then: what is the GP of a mismatch?2

In this paper, I explore the origin of speech and gesture when a speaker misreports a remembered event. Participants in an experimental investigation watch a cartoon and are instructed to misreport parts of what they have seen when telling the story of the cartoon to a friend. Considering the multimodal language performance as a conceptual blend of fact and fiction, I analyze the speech and gesture, taking into account the information source of the content expressed in each channel. By analyzing the entire representation of an event, including each input (that is, factual event, instructed misreport, and physical space), it is possible to show how gesture and speech even in a mismatch form an integrated conceptual unit.
2. Conceptual Blends
Conceptual blending theory is a cognitive model in which meaning is created compositionally from 'mental spaces'. Mental spaces are partially structured mental models fashioned when communicating (Fauconnier, 1994; 1997). They contain representations of scenarios, including the elements involved

2 Cassell, McNeill & McCullough (1999) suggest that complete uptake of information presented in mismatches (that is, listeners perceiving and reporting information that differs across the two channels) reflects the listener's mirroring of the speaker's growth points. Here I investigate integration not in the listener but from the speaker's perspective.
(for example, the participants in an event), the relations between entities, as well as the frames that structure the space (such as scripts or other information that defines the space) (Fauconnier, 1994; Coulson, 2000). In conceptual integration, multiple mental spaces (that is, inputs) are unconsciously merged together to create a new composite blend which can contain emergent structure. For example, when explaining the configuration of your new house to a friend over lunch, you might use the objects on the table as place markers in your description. Suddenly, the salt shaker is the foyer and the salad plate represents the pool. From a mental space perspective, you have created a conceptual blend built from the mental space of your house plan and the space on the tabletop.3 This makes it possible to say things like, "right next to the pool we hope to add a tennis court," while pointing to the space adjoining the plate.

For the purposes of this paper, each source of informational content (i.e., fact or misreported details) including contextual frames is considered a mental space. In telling a lie, details from each of these mental spaces will be integrated into the blend that makes up the participant's language output. While "the dynamic web of links between blend and inputs remains unconscious" (Fauconnier & Turner, 2002:57), analyzing a blend allows investigation of what elements are expressed in the created space. As Parrill & Sweetser (2004:202) point out in a study of gestural metaphor and blending, "in describing the mappings between elements in complex discourse this framework can be seen to be invaluable, as it allows the analyst to build a coherent representation of the unfolding discourse structure." Expanding the study of gesture-speech mismatches through application of conceptual integration theory makes it possible to consider mismatches not as a divergence of channels, but rather as an expression of information from a composite blending of channels.
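The core operations just described can be sketched with ordinary set operations. In the following illustrative fragment (the element labels are my own shorthand, not a notation from Fauconnier and Turner), the generic space is computed as what the inputs share, and a blend selectively projects material from each input:

# Input spaces as sets of coded elements (labels are illustrative only).
input_1 = {"cat", "bird", "street", "cat runs across"}    # viewed scene
input_2 = {"cat", "bird", "rooftop", "cat jumps across"}  # instructed misreport

generic_space = input_1 & input_2  # shared structure: {'cat', 'bird'}

def blend(from_1, from_2):
    """Selectively project elements from each input into the blended space."""
    assert from_1 <= input_1 and from_2 <= input_2
    return generic_space | from_1 | from_2

# One possible composite output: imagery drawn from the viewed scene,
# frame-driven content drawn from the misreport.
print(sorted(blend({"cat runs across"}, {"rooftop"})))
# ['bird', 'cat', 'cat runs across', 'rooftop']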
3. Multiple Representations of an Event
In order for an analysis tracing the components of mismatching gesture and speech back to their origins to be viable, two conditions must be met. First, it must be reasonable to assume that multiple representations of an event are in fact active.4 Second, the content of each event construal must be known. To realize these conditions, participants in this study were experimentally induced to have two mental representations of each of several events. This was achieved by asking them to deceive a naïve listener about the content of a cartoon. We can assume that, though the participants were instructed to misreport selected details, contrasting, factual information about the viewed scenes remained simultaneously

3 Real Space, including the speaker and their immediate surroundings, is a type of mental space (Liddell, 1998; 2003). The use of the table is then a Real Space Blend (RSB). For more on RSB, see Liddell, this volume.
4 Poorer performance on secondary memory tasks during mismatches compared to matches provides evidence of simultaneous activation of two strategies (Goldin-Meadow, Nusbaum, Garber & Church, 1993; Thurnham & Pine, 2006).
active in their minds. Lane, Groisman & Ferreira (2006) find that asking people to suppress information (in non-deception conditions) results in a higher probability that they will share the information. Therefore, it is not surprising that in the experiment reported here, people produce gesture-speech mismatches expressive of information about events they viewed, together with false details about those events they were instructed to include in their descriptions. That is, content from the cartoon—viewed content—'leaks' (Ekman & Friesen, 1969) into the narration of scenes where the speakers are instructed to misreport what they have just seen.5

To better understand what is occurring in these jumbles of fact and fiction, my analysis draws on blending or conceptual integration theory (Fauconnier & Turner, 2002). Being 'in the know' concerning truth, fiction, physical space and generated results, I can trace the information in each modality back to its source. By defining a speaker's knowledge base as a blend of these available mental spaces, I find evidence that gesture and speech are integrated. This type of analysis provides evidence that even mismatches originate from unitary growth points.
3.1 Method
3.1.1 Procedure
Pairs of friends participated in a modification of the often-used cartoon narration elicitation (see McNeill, 1992, for details). One member of each dyad (the 'speaker') viewed a cartoon about a bird and cat and was instructed to tell the story to their partner (the 'listener'), who was naive about the plot. A deceptive component was added to the experiment when, prior to viewing the cartoon, it was disclosed to speakers (but not to listeners) that they were to misreport selected cartoon content, by substituting information provided by the investigator for actual details witnessed in the cartoon scenes. Speakers were instructed to weave misreported content into the fabric of their narrations.6 The misreported details directly conflicted with the cartoon content.7 For example, the content of the first cartoon scene, as viewed, and the instructed misreport appear in (1a) and (1b):

(1a) Viewed target scene (Input 1): The cat runs down from his apartment in his building, runs across the alley and goes into the bird's building through the back door.
5 Beattie (2003:167-172) finds leakage in a similar experimental study.
6 During debriefing, listeners were informed of the speaker's deception and asked if they noticed anything unusual about any of the scene descriptions. Only one instance of unusual description was reported, but, when queried, the listener did not report any suspicion.
7 Instructed misreports for viewed cartoon content were required of the participants in one-half of the cartoon episodes. Viewed and instructed misreport episodes alternated. The ordering of presentation was counterbalanced; i.e., V-I-V-I-V-I-V-I versus I-V-I-V-I-V-I-V.
(1b) Instructed details to misreport about this scene (Input 2): The cat jumps from rooftop to rooftop and goes into the bird's building through a hatch on the roof.

3.1.2 Coding and analysis
The data were transcribed and coded following protocols detailed in McNeill (1992; 2005), with gesture annotations added to speech transcripts revealing gesture-speech co-occurrence to the syllable level of detail. The data were further coded to indicate whether the information presented in speech and gesture reflected what the speaker actually viewed (Input 1) or what the speaker was instructed to misreport (Input 2). Emergent information or confabulation that was expressed but was not present in any input was also indicated. After the meaning source was attributed to each speech and gesture unit, gesture-speech pairings were grouped according to matches and mismatches, considering speech, gesture, and information source. For example, when a participant said, "the cat jumps across from his building to the bird's," while producing an arcing movement that co-occurred with "jumps across," speech and gesture were coded as matching in both content (jump) and information source (Input 2). Similarly, mismatches at a lexical level, such as saying the word ball while gesturing a rolling movement, and at a conceptual level, such as representing more than one strategy or information source across channels, were also present.8
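Stated procedurally, this grouping step compares, for each gesture-speech pairing, the content and the attributed information source in each channel. A minimal sketch follows; the field and category names are hypothetical, not the protocol's actual codebook:

from dataclasses import dataclass

@dataclass
class Unit:
    content: str  # e.g. 'jump', 'climb', 'ball'
    source: str   # 'Input 1' (viewed), 'Input 2' (misreport), or 'emergent'

def classify(speech: Unit, gesture: Unit) -> str:
    """Group one gesture-speech pairing by content and information source."""
    if speech.content != gesture.content:
        return "lexical mismatch"              # e.g. saying 'ball' while gesturing rolling
    if speech.source != gesture.source:
        return "conceptual (source) mismatch"  # same content drawn from different inputs
    return "match"

# 'the cat jumps across ...' with a co-occurring arcing gesture,
# both drawn from the instructed misreport:
print(classify(Unit("jump", "Input 2"), Unit("jump", "Input 2")))  # match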
4. Gesture and Blends
4.1 Blending across spaces
Figure 1: "adjacent"—(a) the left hand indicates the bird's position in the hotel window; (b) the right hand indicates the cat's lower position on the other side of the street.
8 It is possible for matches to occur even when the speaker draws on multiple inputs, if the content from the inputs is present in both spaces (referred to as a "generic mental space"). For example, a gesture with an unmarked handshape depicting an upward trajectory can match both a climb up and a fly up input if no further information is present. Such matches are deemed ambiguous.
In Figure 1, the speaker's gestures reflect the viewed cartoon content. He first indicates the bird's position (high in a building), followed by a gesture contrasting this with the position of the cat (standing on the ground). His gestures reflect the juxtaposition of cartoon characters (viewed input), while in his speech he utters the contradictory word, "adjacent" (misreported content).

In the blend analysis of this example, partial structures from each input are merged together in the output, as shown in Figure 2. Input 1 is the viewed cartoon scene, including the details of the cat and bird as well as the street and the cat's motion. The misreport, found in Input 2, contains the cat, bird, the rooftop, and a jump motion. Some elements, such as the cat and the bird (as well as the cat's goal to enter the building), are identical across inputs. These shared features result in what Fauconnier and Turner (2002) refer to as the generic space. The relationships between the spaces are indicated with lines.9
Figure 2: Blend across modalities.
Here, the speaker integrates the viewed representation of the cat standing on the street looking up at the bird with the false representation of the event, the cat jumping from building to building. The visual image (Input 1) of the cartoon scene is preserved and co-produced with information necessitated by the frame that structures the 'misreport mental space' (Input 2). To enable the cat to jump from rooftop to rooftop, the buildings must be adjacent. While this is not explicitly stated in the instructions for misreporting this cartoon scene, this contextual information is part of the input. This example makes clear how the

9 Dashed lines indicate connections between spaces while the solid lines indicate exact matches (e.g., connections between mental spaces that are shared in the generic space) and cross-space mapping (e.g., analogy).
analyst's knowledge of details of the eliciting cartoon permits inferences concerning components of the conceptual blend from which the multimodal production flows. Only by knowing the inputs is it possible to determine what unifies the gesture of jumping with the word "adjacent." In fact, the listener immediately interrupts, asking, "what do you mean 'adjacent'?" indicating his awareness of the divergence between the speaker's gesture and speech.

This example not only contains a mismatch of speech and gesture at the lexical level, but also represents input from two sources. Taking the imagistic input from the viewed scene and the linguistic components from the misreport input, the final output is a blend across modalities. This example illustrates how the tensions between imagery and language, inherent in the GP, are expressed in speech and gesture, flowing from multiple representations of the event.

Figure 2 shows a blend across modalities. From a GP perspective, there is no mismatch in this example. Rather, the dialectic between imagery and language has selected from the entire knowledge state of the speaker, rather than a single mental space. Both fact and fiction in the cartoon scene are part of the thinking process, and as such feed into the growth point from which the language performance emerges. While blending diagrams do not reflect the dynamic processes in language production, the inclusion of conceptual integration theory parcels out the mental spaces within the GP and from there one can explore the origination of gesture and speech.

4.2 Integration across sources and modalities

The following example is of an integrated blend drawn from both input sources with content produced in each modality. While this is not a mismatch at the level of particular words and gesture (as was, e.g., saying "adjacent" while gesturing opposition), this example demonstrates a mismatch at a strategy level (e.g., expressing both viewed and misreport details). Inputs to this blend are:

(2a) Target scene: The cat climbs up the outside of a drainpipe to the bird's window.
(2b) Details to misreport about this scene: The cat rode up to the bird's window on a telephone company crane.
Figure 3: “climbs up.”
The speaker produces a blend of fact and fiction. In her explanation of the scene, she inserts a climbing gesture into her misreport (3) about the cat and the telephone crane.
(3)
So uh he uh he kinda climbs up the telephone crane and kinda just sits on the window.10
Figure 4: Integrated blend of “he kinda climbs up the telephone crane.”
Input 1 is again the viewed cartoon scene, including details of the drainpipe and the cat's particular motion of climbing. The misreported details in Input 2 contain a telephone crane as well as the motion of riding the crane. Rather than reporting only the information found in Input 2 (as instructed by the experimenter), the speaker produced a blend across inputs. Unlike the first example, the blend cannot be split into an imagistic representation of one input along with linguistic output derived from the other. When we dissect this blend, we find that the visual image from the viewed scene is not completely preserved. Instead, the shared elements (generic space) of the cat and its upward trajectory are combined with the climbing motion from the viewed scene and the telephone

10 Prior to the utterance in example (3), the speaker states, "So he he looks at the side of the building and there's like work being done on the telephone pole." This is taken as evidence for an active deceptive description. The word "climbs" leaks through this larger confabulation.
crane from the misreported details. It is important to note that the speaker and her physical space are also included. The climbing gesture is produced from a character viewpoint. When the speaker takes the perspective of the cat, her hands become the cat's hands as she moves them upward in an alternating motion. Narrators in the description of this scene frequently embody the cat. Given the speaker's false knowledge of the crane and the cat's ride upward, the pressure to maintain the character viewpoint may have led to the production of the climbing gesture. To produce a gesture is to give form to thought, and as McNeill (2002:8) states, "gestures are more or less elaborated depending on the importance of material realization to the existence of the thought." Had this scene not been so readily embodied, the integration across inputs might have been different.

As with the previous example, this example demonstrates as well how the balance between imagery and language encompasses the full space of all inputs, including the pressure to embody the actor through character viewpoint. The gesture and speech emerging from the all-inclusive GP reflect an underlying integrated unit.
5. Conclusions
In this paper, I interpreted speech and gesture produced by speakers who have more than one underlying representation, in accord with Conceptual Integration Theory. By instructing participants to misreport details of a cartoon scene, I engineered the simultaneous activation of factual and fictive representations in their minds. The blends they generated contained details from each information source. Mismatches, which seem to contain divergent information in speech and gesture, are revealed by a blends analysis to be integrated units reflecting a composite mental space. The addition of a conceptual integration analysis to the GP model demonstrates that the dialectic of imagery and language derives from the entire representation of an event. There is no mismatch across modalities; rather, gesture and speech reflect our entire mental state, and this may include multiple strategies or representations that are simultaneously expressed.

References

Beattie, G. (2003). Visible thought: The new psychology of body language. Hove: Routledge.
Cassell, J., McNeill, D., & McCullough, K. (1999). Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information. Pragmatics and Cognition, 7, 1-33.
Church, R. & Goldin-Meadow, S. (1986). The mismatch between gesture and speech as an index of transitional knowledge. Cognition, 23, 43-71.
Coulson, S. (2000). Semantic leaps: Frame-shifting and conceptual blends in meaning construction. Cambridge: Cambridge University Press.
Ekman, P. & Friesen, W. (1969). The repertoire of nonverbal behavioral categories. Semiotica, 1, 49-98.
Fauconnier, G. (1994). Mental spaces. New York: Cambridge University Press.
Fauconnier, G. (1997). Mappings in thought and language. Cambridge: Cambridge University Press.
Fauconnier, G. & Turner, M. (2002). The way we think: Conceptual blending and the mind's hidden complexities. New York: Basic Books.
Garber, P. & Goldin-Meadow, S. (2002). Gesture offers insight into problem-solving in adults and children. Child Development, 69, 75-84.
Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Cambridge, MA: Belknap Press of Harvard University Press.
Goldin-Meadow, S., Nusbaum, H., Garber, P., & Church, R. B. (1993). Transitions in learning: Evidence for simultaneously activated strategies. Journal of Experimental Psychology: Human Perception and Performance, 19, 92-107.
Lane, L., Groisman, M. & Ferreira, V. (2006). Don't talk about pink elephants. Psychological Science, 17(4), 273-277.
Liddell, S. (2003). Grammar, gesture and meaning in American Sign Language. Cambridge: Cambridge University Press.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350-371.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2000). Language and gesture. Cambridge: Cambridge University Press.
McNeill, D. (2002). Gesture and language dialectic. Acta Linguistica Hafniensia, 34, 7-37.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Melinger, A. & Kita, S. (2004). When input and output diverge: Mismatches in speech, gesture and image. Proceedings of the Cognitive Science Society.
Morford, J. & Goldin-Meadow, S. (1992). Comprehension and production of gesture in combination with speech in one-word speakers. Journal of Child Language, 19, 559-580.
Parrill, F. & Sweetser, E. (2004). What we mean by meaning: Conceptual integration in gesture analysis and transcription. Gesture, 4, 197-219.
Perry, M., & Elder, A. D. (1997). Knowledge in transition: Adults' developing understanding of a principle of physical causality. Cognitive Development, 12, 131-157.
Perry, M., Church, R., & Goldin-Meadow, S. (1988). Transitional knowledge in the acquisition of concepts. Cognitive Development, 3, 359-400.
Thurnham, A., & Pine, K. (2006). The effects of single and dual representations on children's gesture production. Cognitive Development, 21, 46-59.
A Dynamic View of Metaphor, Gesture and Thought
Cornelia Müller
European University Viadrina, Frankfurt (Oder)
The dynamic view of metaphor, gesture and thought grounds metaphor theory in language use. It is dynamic in that it focuses on online processes of metaphor production, the succession of metaphors over a discourse, and the sequential structure of interaction. It suggests that metaphoricity is a property of metaphors with a potential for cognitive activation. Metaphoricity is shown to be dynamically gradable (following the flow of focal attention), the intrapersonal degree of activation becoming visible through foregrounding techniques, such as gazing at a metaphoric gesture or otherwise making it visible. The dynamic view of metaphor argues that language is at its core multimodal, with a dynamic interaction between use and system, and that its primary locus is the dynamic process of meaning construction in language use.
1. Introduction
The dynamic view of metaphor, gesture and thought that will be sketched in this chapter was formulated and developed when David McNeill was finalizing his work on Gesture and Thought (McNeill, 2005).1 In his 1992 book Hand and Mind, McNeill began to formulate a theory that at its core holds language to be a dynamic system. He suggested that what we see when people speak and gesture is the outcome of a dynamic and dialectic process between two modes of thought, one linguistic and categorical, and the other imagistic. This means that we think in terms of the expressive modalities we have at hand, each with its own properties. This is thinking for speaking and gesturing (to extend Slobin's 1991 reformulation of Whorf's Linguistic Relativity Hypothesis as dynamic 'thinking for speaking'). Also in his 1992 book, McNeill pointed out a second dynamic aspect of language, namely one that reveals itself quite naturally when extending one's

1 For a detailed account of the dynamic view of metaphor, see Müller (in press, a, b). The dynamic view gained tremendously from the manifold inspiring discussions we had at the time, and I am more than grateful for the insightful, kind, and generous support he offered at every stage of my work. David McNeill was the one who encouraged me to hold on to the idea of a dynamic view of metaphor and to work out its implications for a theory of metaphor. Most important was the discovery that, although our work had different foci—my focus was reduced to one facet of language only, namely metaphors—we shared the excitement and radical belief that language cannot be reduced to a static system or to a level of de-contextualized linguistic forms.
focus from single gesture-speech units to the unfolding of discourse. Here McNeill introduced Firbas' (1971) concept of 'communicative dynamism' to the analysis of gestures, and he discovered that gestures in discourse carry newsworthy information; that is, information that carries high communicative dynamism and thus "moves the discourse forward." In his recent book, Gesture and Thought, this dynamic view of language, gesture and thought became the guiding topic:

In 1992 the emphasis was on how gestures reveal thought; now it is how gestures fuel thought and speech. The new step is to emphasize the 'dynamic dimension' of language–how linguistic forms and gestures participate in a real-time dialectic during discourse, and thus propel and shape speech and thought as they occur moment to moment (McNeill, 2005:3).
The dynamic view was a consequence of a fundamentally different approach to language. While traditional linguistic approaches objectified language as a product, the study of gesture compelled us to observe language as it is used in real time situations of talk in interaction. The gestures speakers create and integrate with speech simply could not be understood outside the context of their performance. When studying these forms of gesture, it became clear that traditional linguistics could not explain what was found in the data. To integrate into theory what was observed when people speak and gesture, a different take on language became necessary. This meant no less than turning the traditional approach to language upside down.

Perhaps because reflections on language could only emerge with the advent of writing, and the power of this cultural technique is precisely its de-contextualized nature, traditional linguistics dissociated language from its natural home—face-to-face situations—from the very beginning of its scholarly considerations. The consequences of this de-contextualization are dramatic and long-ranging, shaping theories of language to the present. In Chomskyan linguistics, for example, the primary and therefore natural locus of language has been treated as a secondary and dismissible application or 'performance' of a universal linguistic 'competence'. Similarly in Saussurian linguistics, natural language use has been treated as the implementation of a linguistic system—'langue'—in the form of a secondary non-systematic 'parole'. McNeill's concept of the dynamic dimension of language relates to the 'New Saussure' and it will gain further support in this chapter.

Taking language back to its natural home has fundamental consequences for a theory of language. Based on an analysis of metaphors, it will be suggested that language is at its core multimodal. By this I mean that language is an integration of speech and gesture at the level of the system and of use, and a dynamic product of modality-specific forms of thought; further, that it is shaped by cognitive processes, such as the flow of attention and the foregrounding of information, as well as by interactive constraints, such as the co-constructing activities of co-participants, for instance ratifications or requests for clarification (for detailed accounts of this argument, see Müller, in press, a; b).
2. Cognitive and Interactive Processes—How Metaphors in Discourse are Dynamically Foregrounded
Traditional as well as contemporary theories of metaphor share a decontextualized and static view of the phenomenon (Black, 1993; Lakoff & Turner, 1989; see Müller, in press, a, for an overview). Metaphors tend to be considered isolable entities with fixed properties: They are categorized as 'dead' or 'alive' (that is, 'vital'), or as 'conventional' (either 'historical' or 'entrenched') or 'novel'. Interestingly, a core classificatory criterion of this static view is consciousness of 'metaphoricity', yet this criterion applies to a dynamic property of metaphors, namely to how speakers and listeners process metaphors in actual situations of language use. In order for a metaphor to be processed as a metaphor, metaphoricity needs to be activated cognitively. Because metaphoricity might or might not be activated when people use language, Müller (in press, a) puts forward the argument that, rather than characterizing metaphors only in a static manner as either 'dead' or 'alive', it is more appropriate to conceive of them as 'sleeping' or 'waking', depending on the degree of cognitive activation of metaphoricity in a given speaker at a given moment in time.

Given that the criterion of 'consciousness of metaphor' has underlain metaphor definitions since Classical Times, it appears quite striking that the use of metaphors has only lately begun to attract the systematic interest of metaphor scholars (Cameron & Low, 1999). Moreover, when considering how people employ metaphors spontaneously, we find that metaphors are not restricted to language; they show up in gestures as well, and when drawing upon the same source domain as a co-occurring verbal metaphoric expression they indicate activated metaphoricity for a given speaker at a given moment in time. How is this possible? The fact that a spontaneously created metaphoric gesture draws upon the same source domain as a verbal metaphoric expression indicates that the metaphoricity of the verbal expression is cognitively accessible and activated for this speaker at this precise moment in time. Consider for instance somebody speaking about the highs and lows of a relationship while producing gestures which outline these highs and lows as in a sine curve. Traditional metaphor theories would account for these verbal expressions as 'dead' metaphors because they are supposedly not processed as metaphors; that is, there is no 'consciousness of metaphoricity', and no 'cognitive activation of metaphoricity' is presumed. Conceptual Metaphor Theory (e.g., Lakoff & Turner, 1989), in contrast, assumes that the metaphors of ordinary language are always 'alive', presuming constant 'consciousness of metaphoricity' or 'cognitive activation of metaphoricity'. The dynamic view presented here is that at one moment in time a metaphor might be processed as metaphor (that is, metaphoricity is cognitively activated, creating a 'waking metaphor') and at other times it might not be processed as metaphor (metaphoricity then not being cognitively activated, producing a 'sleeping metaphor').
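The gradable notion at stake here can be made concrete with a toy model. In the sketch below, the cue inventory and the equal weights are my own illustrative choices, not measured values; an expression counts as 'waking' at a given moment whenever at least one foregrounding cue is observed:

# Illustrative foregrounding cues; each observed cue raises activation one step.
FOREGROUNDING_CUES = {
    "gesture from same source domain",
    "gaze directed at gesture",
    "gesture syntactically integrated",
    "gesture enlarged or prolonged",
}

def activation(observed_cues):
    """Degree of activated metaphoricity for one expression at one moment."""
    return sum(1 for cue in observed_cues if cue in FOREGROUNDING_CUES)

def state(observed_cues):
    return "waking" if activation(observed_cues) > 0 else "sleeping"

print(state([]))                                   # sleeping
print(state(["gesture from same source domain",
             "gaze directed at gesture"]))         # waking (activation = 2)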
Furthermore, when accounting for how gesture, speech, gaze and gesture space are structured in precise relation to one another and to their interactive organization, we find that metaphors may structure entire segments of discourse, as in the allegories of literary texts. We also find that metaphoricity may be dynamically foregrounded as talk in interaction moves along. Metaphoricity is backgrounded in sleeping metaphors and foregrounded in waking metaphors. The activation of metaphoricity is hence related to the structuring of information in discourse, regarded from a Gestalt Psychology perspective. Foregrounded information is interactively significant information.

The following example illustrates these two points: the dynamic activation and foregrounding of metaphoricity over a stretch of discourse. A detailed account of the example is provided in Figure (1), and in Müller (in press, b). In Figure (1) the speaker, sitting to the left, characterizes her first love relation as lasting for several years, with highs and lows that eventually led to its termination. In the four segments, she describes the course of the relationship as a path that went up and down many times, with an overall tendency to move downward.2
Gesture (1)
JA es war eben ein relatives auf und [ab|
‘Yes it was basically a kind of [up and down’
[gaze to hand -->
Verbal-gestural metaphor established:
- love relation as “a kind of up and down”
- head moves down, slightly & quickly, & gaze to hand
→ onset of activated metaphoricity

Gesture (2)
mit der= |mit der ständigen tendenz bergAB\ (.hh)
‘with the=| with the permanent tendency downhill\’ (.hh)
--> gaze to hand]
Verbal-gestural metaphor reformulated & specified:
- relation has “permanent tendency downhill”
- head moves down, larger & slower
- right hand with glass moves downward too, & gaze to hand
→ increasing activation of metaphoricity

Gestures (3) and (4)
[aber es ging= | ne es startete SO (---) und flachte dann so:: (.) weiter ab]
‘but it went= | well it began like THIS (---) and then flattened like this (.) out’
[gaze to hand --> gaze follows hand]
Gesture (3): speaker breaks off before the verbal metaphor is uttered; gesture reformulates the metaphor “up and down”; speaker changes gesturing hand; left hand outlines huge amplitude; gaze follows gesture → gesture performed in focal attentional space → high activation of metaphoricity
Gesture (4): verbal metaphor “flattened ... out” reformulated & specified; gaze continues to follow gesture → gesture still in focal attentional space → high activation of metaphoricity

Figure 1: The course of a love relation as “a kind of up and down”: successive and increasing activation and foregrounding of metaphoricity.

2 Speech transcription: text in capitals indicates emphasis; = cutting off an utterance and beginning of an interruption; / rising intonation; \ falling intonation; .hh audible inhalation; (.) micro-pause; (--) longer pause (each dash for about 0.25 second of pause); a:: lengthening of vowels. Gesture transcription: [text] square brackets in the verbal line indicate beginning and end of the gesture phrase; boldface indicates the gesture stroke; || vertical lines indicate change of configuration within one gesture phrase.
A close description of the deployment of metaphors in gesture and speech reveals that metaphoricity is dynamically activated over the entire segment of discourse. In (1) we may infer that metaphoricity is activated because the spoken metaphor (“down”) is accompanied by a gesture (the head moving down), both speech and gesture drawing on the same source domain. In (2) metaphoricity is expressed in three modalities at once—head, hand and speech—and it is foregrounded (that is, turned into an interactively significant object) through gaze. Thus metaphoricity appears to be more highly activated in (2) than in (1). In (3) we infer that metaphoricity is more highly activated because a metaphoric gesture is syntactically integrated, it replaces the verbal metaphoric expression, it is larger and longer than the preceding gestures, and it receives the speaker’s and recipient’s gaze (that is, it is in the focus of attention of both participants). Finally, in (4) we infer that metaphoricity continues to be highly activated because the metaphor is now again expressed in speech and gesture, is also syntactically integrated, continues to be large and long, and continues to receive the gaze of both participants. In short, this is an example of metaphor use in which the speaker gradually increases the activation of metaphoricity expressed in both speech and gesture, and successively foregrounds it. The dynamic foregrounding of metaphors, as described above, is intertwined with interactive constraints. Looking at the coordination, integration, and foregrounding of gestures over stretches of discourse, it becomes clear that the metaphoricity of words, gestures, and gesture-word combinations depends critically on the organization of the attentional focus of the speaker, which depends in turn on both intrapersonal intent and interactive constraints such as requesting specification and ratifying information. The varying degrees of metaphoricity demonstrate that there is a constant flow of attention (part of what Chafe, 1994, terms the ‘flow of consciousness’), which may successively build up peaks that themselves may trigger subsequent peaks in this dynamic movement of attention. Analyzing how metaphoric meaning is organized and distributed, moment by moment, over multiple modalities uncovers this attentional structure within the flow of consciousness. What is in the focus of attention of the speaker is foregrounded in a verbal-gestural utterance. It follows that, for a co-participant, what is foregrounded by the speaker constitutes the salient aspects of the communicative ‘message’ for the listener. Regarded in this way, attention becomes empirically observable through foregrounding techniques and their resulting salience effects. This is because the organization of the speaker’s attention is interactively relevant; it is presumably what makes a speaker tell a story the way she or he wants a listener to hear it, and this is what we as analysts may exploit systematically. The foregrounding of information within a discourse may be best characterized as a complex dynamic activity which brings together intrapersonal and interpersonal processes.
3. Multimodality of Metaphors: Dynamic Interaction of Use and System
In the example discussed above we have seen that metaphors may be expressed in different modalities at the same time. Metaphoricity was first rather weakly activated in an entrenched metaphoric expression (“it went up and down”) accompanied by a tiny downward movement of the head, and was then successively more strongly activated through verbal and gestural reformulations and foregrounding techniques, such as gaze direction and the spatial characteristics of the gestural movement (cf. Cienki & Müller, in press; in preparation). Notably, all multimodal metaphors draw upon the same source domain: LOVE AS A JOURNEY. What are the theoretical implications of these observations? Do they imply that all observed metaphors are equally ‘alive’ or ‘vital’? It seems evident that they are not ‘dead’; at the least, these creative gestural and verbal elaborations strongly indicate that they are indeed processed as metaphors as they are produced. On the other hand, it also seems obvious that there are different degrees of activated metaphoricity of entrenched metaphors. A solution to this dilemma is to make a systematic distinction between categories that apply to the level of a linguistic system and those that apply to language use. At the level of a linguistic system, all spoken metaphors we have examined are entrenched metaphors. With regard to actual use, they are waking metaphors which show different degrees of activation. At the level of a linguistic system, Müller (in press, a) distinguishes historical, entrenched, and novel metaphors, based on the criteria of ‘transparency’ and
‘conventionalization’. A verbal metaphor remains transparent as long as there are non-metaphoric forms that make use of the same source domain. Historical metaphors are highly conventionalized and no longer transparent; entrenched metaphors are conventionalized but still transparent; and novel metaphors are not conventionalized but are transparent. These categories are in principle static, yet they have fuzzy boundaries because of a permanent tendency toward conventionalization and loss of transparency; that is, metaphors tend to move from novel to entrenched to historical. How exactly these movements are driven is a topic of language change that deserves more attention within metaphor theory than it currently receives. For entrenched and novel metaphors, the linguistic system interacts with the level of use in that it offers language forms with a metaphoric potential (cf. Cameron, 1999:108). Entrenched and novel metaphors may be used in different ways; that is, their metaphoric potential may be activated to different degrees. At the level of use they are termed ‘sleeping’ or ‘waking’, depending on the degree of activation of metaphoricity they show, and they form the poles of one dynamic category. In the example above, all metaphors can be considered ‘waking’ because they are subject to foregrounding. Depending on the extent to which they are foregrounded, they are waking to different degrees. The multimodal nature of metaphors offers insights into the dynamic character of metaphoricity at the level of use, and it indicates that the system offers a static repertoire of potential metaphors that is creatively exploited as people converse with each other. Use is not reducible to the application of a static system, but follows its own dynamic rules. The system is both a sedimentation of and a toolbox for language use. (For a detailed account of this argument see Müller, in press, a; b.)
4. Summary
We have seen that metaphors are multimodal, and that metaphoricity is subject to a dynamic management of information flow: Activation of metaphoricity is a dynamic process which unfolds over time and is part of an online, ad hoc process of constructing and structuring meaning between co-participants. This is why Müller (in press, a) introduces, at the level of use, a dynamic category of metaphor. In other words, a particular metaphoric expression may be waking in one context and sleeping in another. Metaphors in actual intervals of speaking and gesturing are to be considered inherently dynamic, and the degree of activated metaphoricity is a consequence of cognitive and interactive processes such as the foregrounding and focusing of attention.
5. Conclusion
The dynamic view sketched in this chapter is consistent with other recent approaches to cognition and language use. Within an applied linguistics framework, Cameron and Low’s (1999) proposal takes metaphor use as a point of departure for a theory of metaphor, integrating distinctions between the ‘products’
and ‘processes’ of metaphors and accounting for different kinds of metaphors at the levels of both system and use (Steen & Gibbs, 1999). The observations presented also show how empirical observations may inform theoretical conceptualizations (Cienki & Müller, in press). The dynamic view is in harmony with David McNeill’s dynamic theory of language, gesture and thought. The multimodal nature of metaphors implies a dynamic interaction between two modes of thinking: there is thinking for speaking and gesturing. But it also directs our attention to a dynamic interaction between the linguistic system and the level of language use. The dialectic between the two forms of thought in the Growth Point is triggered by dynamically changing foci of attention (McNeill, 1992; McNeill & Duncan, 2000). Gestures are primes for attentional foci, but may be supplemented by different kinds of foregrounding techniques. This is what communicative dynamism in McNeill’s terms is about: the flow of communication as a dynamic product of thinking for speaking and gesturing in interaction: “The difference is that now I present gestures as active participants in speaking and thinking. They are conceived of as ingredients in an imagery-language dialectic that fuels speech and thought” (McNeill, 2005:3).

References

Black, M. (1993). More about metaphor. In A. Ortony (Ed.), Metaphor and thought (2nd ed.) (pp.19-41). Cambridge: Cambridge University Press.
Cameron, L. & Low, G. (Eds.) (1999). Researching and applying metaphor. Cambridge: Cambridge University Press.
Chafe, W. (1994). Discourse, consciousness, and time: The flow and displacement of conscious experience in speaking and writing. Chicago / London: University of Chicago Press.
Cienki, A. & Müller, C. (in press). Metaphor, gesture and thought. In R.W. Gibbs (Ed.), Cambridge handbook of metaphor and thought. Cambridge: Cambridge University Press.
Cienki, A. & Müller, C. (in preparation). When speech and gesture come together: Forms of multimodal metaphor in the use of spoken language. In C. Forceville & E. Urios-Aparisi (Eds.), Multimodal metaphor. Berlin / New York: Mouton de Gruyter.
Firbas, J. (1971). On the concept of communicative dynamism in the theory of functional sentence perspective. Philologia Pragensia, 8, 135-144.
Lakoff, G. & Turner, M. (1989). More than cool reason: A field guide to poetic metaphor. Chicago: University of Chicago Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D. & Duncan, S. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp.141-161). Cambridge: Cambridge University Press.
Müller, C. (in press, a). Metaphors: Dead and alive, sleeping and waking. A cognitive approach to metaphors in language use. Chicago: University of Chicago Press.
Müller, C. (in press, b). What gestures reveal about the nature of metaphor. In A. Cienki & C. Müller (Eds.), Metaphor and gesture. Amsterdam / Philadelphia: John Benjamins.
Slobin, D. I. (1991). Learning to think for speaking: Native language, cognition, and rhetorical style. Pragmatics, 1, 7-25.
Steen, G.J. & Gibbs, R.W., Jr. (1999). Introduction. In R.W. Gibbs, Jr. & G.J. Steen (Eds.), Metaphor in cognitive linguistics (pp.1-8). Amsterdam / Philadelphia: John Benjamins.
Second Language Acquisition from a McNeillian Perspective

Gale A. Stam
National-Louis University
Most second language acquisition research has concentrated on learners’ speech. This paper argues that it is necessary to look at both learners’ speech and gesture in order to better understand second language acquisition. It provides a summary of the second language acquisition process and the types of studies that have been conducted in the field, and it discusses how gesture can be used to investigate learners’ thinking for speaking.
1. Introduction: McNeill’s Theory
Traditionally, language has been viewed as encompassing only speech. Bodily movements, including gestures, have been viewed as paralinguistic accessories to language, not part of it. McNeill’s theory (1992; 2005) of language is revolutionary in this regard. He argues that speech and gesture arise from the same underlying mental process and are a single, integrated system. According to his theory, both speech and gesture develop from a ‘growth point’ that has both imagistic and verbal aspects. McNeill (2005:25) proposes a model for verbal thought—“a ‘language-imagery’ or language-gesture dialectic”—in which the static and dynamic dimensions of language are combined.
1.1 McNeill’s Methodology
To test this theory and study the relationship between language and thought, McNeill (1992) developed a methodology for analyzing natural discourse that includes the observation of both speech and gesture. According to Vygotsky (1986), the relationship between thought and language is an internal process, with a continual movement back and forth from thought to language and vice versa. Vygotsky pointed out that the only way to study internal processes is to externalize them experimentally. The methodology that McNeill developed does just that. By focusing attention on both speech and gesture, it gives analysts an enhanced window onto the mind through which they can observe mental representations and processes (McNeill, 1992). The methodology has been used by McNeill and other researchers to examine aspects of speech and gesture within various populations, such as
children and adult native speakers of different languages, and individuals with disorders of language or spatial cognition due to hemispheric brain damage. It has been used to explore whether there are changes in speech and gestures when the narrator is talking to one person or to two other people, and to strangers or friends. In addition, it has been used to test Slobin’s (1991) ‘thinking for speaking’ hypothesis among native language speakers (McNeill, 1997; McNeill & Duncan, 2000) and applied to second language acquisition to investigate second language learners’ thinking for speaking patterns (see citations in Stam, 2006b).
2. Second language acquisition
2.1 The second language acquisition process
Learning a language involves not only learning linguistic forms, but also learning how to use these forms appropriately in different contexts. Being proficient in a language includes knowing what needs to be marked and expressed in the language versus what can be inferred by listeners (Berman & Slobin, 1994). Slobin (1991) has proposed that speakers learn a particular way of thinking for speaking in first language (L1) acquisition, and Stam (1998) has proposed that second language learners may have to learn another way of thinking for speaking in order to be proficient in their second language (L2). The notion that second language acquisition involves learning different patterns of thinking for speaking is an important one. Cross-linguistic research on the expression of motion events has established that speakers of typologically different languages have different patterns of thinking for speaking about motion and spatial relations (see Stam, 2006a, for representative studies). Therefore, in order to express motion and spatial relations in their L2 as native speakers would, learners whose first languages are typologically different (Talmy, 1985) from their second languages need to learn other patterns of thinking for speaking. Second language acquisition is similar to first language acquisition in that learners pass through a number of developmental stages, just as children do in acquiring their first language (Dulay & Burt, 1974; Bailey, Madden & Krashen, 1974). Despite this similarity, the two processes differ. In second language acquisition, learners have already mastered the grammatical structures and semantic distinctions of one language. Also, depending on the L2 learners’ age, the second language acquisition process may not play the same role in social and cognitive development as the first language acquisition process does (Klein, 1986). In addition, learners’ first languages frequently have an influence on their acquisition of a second language. There may be both positive and negative transfer in morphology, phonology, syntax, semantics, and the lexicon. Furthermore, learners may have patterns of thinking for speaking about temporality, space, and direction derived from their first language (Slobin, 1996;
Berman & Slobin, 1994; McNeill & Duncan, 2000) that can affect their acquisition of a second language. Slobin has claimed that many language patterns acquired in childhood are “resistant to restructuring in adult second language acquisition” (Slobin, 1996:89), and Kellerman (1995) has proposed in his ‘transfer to nowhere principle’ that adult second language learners may not even be aware of how languages vary and may learn L2 linguistic forms, but apply them from an L1 perspective.
2.2 Learners’ interlanguage systems
In the process of acquiring a second language, learners develop their own language systems, often termed ‘interlanguage systems’ (Lightbown & Spada, 1999; Gass & Selinker, 1992; Klein & Perdue, 1997). These systems include aspects of the learners’ previously learned languages, aspects of the target language, and aspects that tend to occur in all interlanguage systems, such as the simplification and omission of function words (Lightbown & Spada, 1999). The systems are influenced by the typological differences in grammatical categories, form and meaning, and ‘conceptual organization’ between the previously learned languages and the new language (Ramat, 2003:14). Interlanguage systems are dynamic. They change as learners become more proficient in their L2, although the degree to which they change varies. Some learners may fossilize in their grammatical development while continuing to add vocabulary; others may continue to develop grammatically. Because it is difficult to view the rules and structures learners have internalized, production errors have been used to assess learners’ language systems (Ellis, 1986). Although this method has merit, it does not provide a full picture of learners’ language systems because learners may produce grammatically correct utterances, but do so from an L1 perspective (Klein, 1986). To have a complete picture of learners’ progress in acquiring their L2, it is necessary to look at both their speech and gestures (Stam, 2006a; 2006b). Alone, speech tells us whether learners can produce utterances, but not how they are thinking. Gestures provide this additional information. By looking at what gestures learners produce and where these gestures co-occur with speech, we can determine what learners are thinking and whether they are thinking in their L1 or in their L2.
2.3 Gesture and nonverbal communication in L2 acquisition research
Second language acquisition research has been concerned with the second language acquisition process, the learner, and factors that affect the acquisition process. As a field of study, it grew out of classroom language teaching following World War II and the Contrastive Analysis Hypothesis, a behaviorist theory which viewed all errors in the L2 as the result of interference from the learner’s L1 (Newmeyer & Weinberger, 1988).
Since the inception of second language acquisition as a field, research has concentrated on contrastive analysis, error analysis, performance analysis, discourse analysis, language transfer, input, and learner variation (see Larsen-Freeman, 1991, for a review of the first twenty-five years of second language acquisition research). Among the issues1 that have been explored are child and adult second language acquisition, differences between acquisition and learning, social and psychological factors affecting second language acquisition, age and the critical period hypothesis, formal (classroom) and informal language acquisition, interlanguage and transfer, and communication strategies (see Stam, 2006a, for representative studies). The majority of this research has focused on learners’ spoken or written language, not their speech and gestures. However, in the last thirty years there has been a growing number of papers and empirical studies2 that have considered nonverbal communication and gesture and their place in second language and foreign language teaching and research. Some (Sainsbury & Wood, 1977; Marcos, 1979; Nobe, 1993) looked at how language fluency affects the frequency of gesturing of subordinate bilinguals and foreign language learners and found that speakers produce more gestures in their nondominant language than in their dominant one. Some (Neu, 1990; Kellerman, 1992; Jungheim, 1995) argued that communicative competence in a foreign or second language involved more than just linguistic competence, while others (von Raffler-Engel, 1980; Wylie, 1985; Pennycook, 1985) advocated the teaching of kinesics, emblems, and proxemics in the foreign and second language classroom. In addition, several have empirically investigated the relationship between speech and gesture in L2 acquisition. Gullberg (1998) examined foreign language learners’ use of gestures as communication strategies and found that learners used gestures to elicit words; clarify problems of co-reference; and signal lexical searches, approximate expressions, and moving on without resolution. Sherman & Nicoladis (2004) looked at whether advanced L2 learners used more symbolic gestures in their L1 and more deictic gestures in their L2 and found that the learners used more deictic gestures per word in their L2, but did not use more symbolic gestures in their L1. Within a Vygotskian framework, McCafferty explored the role of gesture in L2 acquisition in several different contexts. He (McCafferty, 1998) examined the relationship between L2 gesture and private speech and found that almost all forms of object-regulated and other-regulated private speech had accompanying gestures, while only one form of self-regulated private speech did. With Ahmed (McCafferty & Ahmed, 2000), he investigated whether Japanese learners of English would acquire gestures of the abstract under exposure to English in naturalistic and instruction-only conditions and found that the naturalistic learners acquired the American one-handed container gesture of the abstract. In addition, McCafferty (2002) examined the interactions of a Taiwanese learner of English and a native English speaker to see how gestures were used in the co-construction of meaning in creating zones of proximal development, and how the same learner used gestures as a mechanism to help him think and organize his discourse (McCafferty, 2004). While these speech and gesture studies have argued that both speech and gesture must be considered in studying second language acquisition, they have not used gestures as a means to investigate learners’ thinking patterns as McNeill has done (McNeill, 1992; 1997; 2005; McNeill & Duncan, 2000). This aspect of the McNeillian perspective has been applied to second language acquisition research by studies of ‘thinking for speaking’ and gesture in second language acquisition.

1 The scope of this paper does not permit discussion of all the studies on second language acquisition; therefore, I have provided a sample of the types of issues that have been researched.
2 See Gullberg (2006) for additional examples of studies.

2.3.1 Thinking for speaking in second language acquisition

Based on Talmy’s (1985) classification of languages as verb-framed (e.g., Spanish) or satellite-framed (e.g., English), Berman and Slobin (1994) conducted a cross-linguistic study of L1 narrative development to test Slobin’s (1991) thinking for speaking hypothesis. They found that Spanish speakers tend to describe states and elaborate descriptions of settings, while English speakers tend to describe processes and accumulate path components, adverb particles and prepositions. McNeill & Duncan (2000) further investigated these patterns of thinking for speaking among native speakers of Spanish and English by looking at both their speech and gesture. They found speech-gesture synchrony in the expression of motion events: Spanish speakers’ path gestures tend to fall on the verb and English speakers’ path gestures tend to fall on the satellite. This speech-gesture synchrony for native speakers is important, as it provides a means by which to investigate second language acquisition. Stam (1998; 2006a; 2006b), Kellerman & van Hoof (2003), and Negueruela, Lantolf, Rehn Jordan & Gelabert (2004) used speech-gesture synchrony to explore whether learners’ thinking for speaking patterns about motion change when they acquire a second language. All looked at native speakers of Spanish and English and Spanish learners of English, and all replicated previous findings regarding native speakers’ thinking for speaking patterns in both speech and gesture (McNeill & Duncan, 2000). Specifically, Spanish speakers express path linguistically with a verb and their path gestures tend to fall on the verb, while English speakers express path linguistically with a satellite (an adverb or preposition) and their gestures tend to fall on the satellite. However, as a consequence of differences in study design, the results of these studies varied regarding L2 learners. Kellerman & van Hoof (2003) and Negueruela et al. (2004) looked only at the frequency of gestures co-occurring with verbs and satellites. Kellerman and van Hoof found that the same percentage of path gestures (65%) of the Spanish learners of English fell on the verb in both their L1 and their L2 narrations, while Negueruela et al. found that 23% to 33% of
the path gestures of the Spanish learners of English3 fell on the verb. Both concluded that the L2 learners were still thinking for speaking in their L1. Stam (2006a; 2006b), on the other hand, looked at the expressions used linguistically to express path, the frequency of gestures co-occurring with motion event speech elements, and the interaction of speech and gesture. She found that the L2 English learners’ thinking for speaking patterns had both linguistic and gestural aspects of their L1 and L2 thinking for speaking patterns. Linguistically, L2 learners sometimes expressed path with a satellite in English, but they did not accumulate path components within a single clause in speech, with the exception of one learner. Gesturally, there was a decrease in the percentage of path gestures co-occurring with verbs and an increase in the percentage of path gestures co-occurring with satellites in the learners’ L2 narrations compared to their L1 narrations, but the percentages alone were misleading because they did not take into account whether speech elements were present or missing. She also found that there were developmental aspects to the learners’ speech and gesture production regarding what aspects of motion events were focused on, compared to L1 English speakers (e.g., interiority of ascent versus setting). She concluded that the learners’ L2 thinking for speaking patterns, both linguistically and gesturally, reflected the interlanguage systems that the learners had constructed.

3 Negueruela et al. (2004) did not compare the speech and gesture of the Spanish learners of English in both their L1 and L2.
3. Conclusion: Future of second language acquisition research
The application of the McNeillian perspective that speech and gesture are a single, integrated system and that examining gesture as well as speech provides researchers with an enhanced window onto the mind (McNeill, 1992; 2000) is still in an emergent stage within the field of second language acquisition. However, as the L2 speech and gesture studies mentioned in this paper illustrate, looking at both learners’ speech and gesture holds promise for understanding the second language acquisition process, learners’ interlanguage systems, and their thinking for speaking patterns.

References

Bailey, N., Madden, C., & Krashen, S. D. (1974). Is there a “natural sequence” in adult second language learning? Language Learning, 24, 235-243.
Berman, R. A. & Slobin, D. I. (1994). Relating events in narrative: A crosslinguistic developmental study. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Dulay, H. C. & Burt, M. K. (1974). Natural sequences in child second language acquisition. Language Learning, 24(1), 37-53.
Ellis, R. (1986). Understanding second language acquisition. New York: Oxford University Press.
Gass, S. M. & Selinker, L. (Eds.) (1992). Language transfer in language learning. Revised edition. Amsterdam: John Benjamins Publishing Company.
Gullberg, M. (1998). Gesture as a communication strategy in second language discourse. Lund, Sweden: Lund University Press.
Gullberg, M. (2006). Some reasons for studying gesture and second language acquisition (Hommage à Adam Kendon). IRAL, 44, 103-124.
Jungheim, N. O. (1995). Assessing the unsaid: The development of tests of nonverbal ability. In J. D. Brown & S. O. Yamashita (Eds.), Language testing in Japan (pp.149-165). Tokyo: The Japan Association for Language Teaching.
Kellerman, E. (1995). Crosslinguistic influence: Transfer to nowhere? Annual Review of Applied Linguistics, 15, 125-150.
Kellerman, E. & van Hoof, A.-M. (2003). Manual accents. IRAL, 41, 251-269.
Kellerman, S. (1992). ‘I see what you mean’: The role of kinesic behaviour in listening and implications for foreign and second language learning. Applied Linguistics, 13(3), 239-258.
Klein, W. (1986). Second language acquisition. Cambridge: Cambridge University Press.
Klein, W. & Perdue, C. (1997). The basic variety (or: couldn’t natural languages be much simpler?). Second Language Research, 13(4), 301-347.
Larsen-Freeman, D. (1991). Second language acquisition research: Staking out the territory. TESOL Quarterly, 25(2), 315-350.
Lightbown, P. & Spada, N. (1999). How languages are learned. Revised edition. Oxford, England: Oxford University Press.
Marcos, L. R. (1979). Hand movements and nondominant fluency in bilinguals. Perceptual and Motor Skills, 48, 207-214.
McCafferty, S. G. (1998). Nonverbal expression and L2 private speech. Applied Linguistics, 19(1), 73-96.
McCafferty, S. G. (2002). Gesture and creating zones of proximal development for second language learning. The Modern Language Journal, 86(2), 192-203.
McCafferty, S. G. (2004). Space for cognition: Gesture and second language learning. International Journal of Applied Linguistics, 14(1), 148-165.
McCafferty, S. G. & Ahmed, M. K. (2000). The appropriation of gestures of the abstract by L2 learners. In J. P. Lantolf (Ed.), Sociocultural theory and second language learning (pp.199-218). Oxford, England: Oxford University Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: The University of Chicago Press.
McNeill, D. (1997). Growth points cross-linguistically. In J. Nuyts & E. Pederson (Eds.), Language and conceptualization (pp.190-212). Cambridge: Cambridge University Press.
McNeill, D. (2005). Gesture and thought. Chicago: The University of Chicago Press.
McNeill, D. & Duncan, S. (2000). Growth points in thinking-for-speaking. In D. McNeill (Ed.), Language and gesture (pp.141-161). Cambridge: Cambridge University Press.
Negueruela, E., Lantolf, J. P., Rehn Jordan, S., & Gelabert, J. (2004). The “private function” of gesture in second language speaking activity: A study of motion verbs and gesturing in English and Spanish. International Journal of Applied Linguistics, 14(1), 113-147.
Neu, J. (1990). Assessing the role of nonverbal communication in the acquisition of communicative competence in L2. In R. Scarcella, E. S. Andersen, & S. D. Krashen (Eds.), Developing communicative competence in a second language (pp.121-138). New York: Newbury House Publishers.
Newmeyer, F. J. & Weinberger, S. H. (1988). The ontogenesis of the field of second language learning research. In S. Flynn & W. O’Neill (Eds.), Linguistic theory in second language acquisition (pp.34-45). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Nobe, S. (1993). Cognitive process of speaking and gesturing: A comparison between first language speakers and foreign language speakers. MS thesis, Department of Psychology, Committee on Cognition and Communication, The University of Chicago.
Pennycook, A. (1985). Actions speak louder than words: Paralanguage, communication and education. TESOL Quarterly, 19(2), 259-282.
Ramat, A. G. (Ed.) (2003). Typology and second language acquisition. Berlin: Mouton de Gruyter.
Sainsbury, P. & Wood, E. (1977). Measuring gesture: Its cultural and clinical correlates. Psychological Medicine, 7, 63-72.
Sherman, J. & Nicoladis, E. (2004). Gestures by advanced Spanish-English second-language learners. Gesture, 4(2), 143-156.
Slobin, D. I. (1991). Learning to think for speaking: Native language, cognition, and rhetorical style. Pragmatics, 1, 7-25.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp.70-96). Cambridge: Cambridge University Press.
Stam, G. (1998). Changes in patterns of thinking about motion with L2 acquisition. In S. Santi, I. Guaïtella, C. Cavé, & G. Konopczynski (Eds.), Oralité et gestualité: Communication multimodale, interaction (pp.615-619). Paris: L’Harmattan.
Stam, G. (2006a). Changes in patterns of thinking with second language acquisition. Ph.D. dissertation, Department of Psychology, Committee on Cognition and Communication, The University of Chicago.
Stam, G. (2006b). Thinking for speaking about motion: L1 and L2 speech and gesture. IRAL, 44, 145-171.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language typology and syntactic description: Vol. 3. Grammatical categories and the lexicon (pp.57-149). Cambridge: Cambridge University Press.
von Raffler-Engel, W. (1980). Kinesics and paralinguistics: A neglected factor in second language research. Canadian Modern Language Review, 36(2), 225-237.
Vygotsky, L. (1986). Thought and language. Cambridge, MA: The MIT Press.
Wylie, L. (1985). Language learning and communication. The French Review, 58(6), 777-785.
Environmental Context and Sociality
Face-to-face Dialogue as a Micro-social Context
The Example of Motor Mimicry∗

Janet Bavelas
University of Victoria

∗ I am pleased to acknowledge both the many collaborators cited herein and research grants from the Social Sciences and Humanities Research Council of Canada.
Face-to-face dialogue proceeds moment by moment, as the participants constantly and precisely respond to each other. Their reciprocal actions are the micro-social context of language use and social interaction. The old puzzle of motor mimicry (e.g., wincing at someone else’s injury) illustrates the benefit of moving outside the boundary of the individual and examining actions in their micro-social context. Motor mimicry is not simply imitative or emotional; the evidence demonstrates that it is a micro-social communicative act with a significant role in face-to-face dialogue. Unfortunately, experimental evidence demonstrating this role has usually been reinterpreted as evidence for traditional individual theories, ignoring the micro-social level.
1. Introduction
What does it mean to say that language is social? Often, ‘social’ means either society writ large or social stimuli for the mental processes of individuals. Although the societal and the individual approaches contribute to a complete perspective, both are abstracted from direct observation. They lie on either side of—and omit entirely—an immediate, observable focus on language use as social. Language happens in the moment-by-moment micro-social context consisting of the observable acts of interlocutors as they actually use language in face-to-face dialogue. That is, what individuals say and do in face-to-face dialogue is intimately affected by what the other person is saying and doing at that moment and by the immediate effect that their own actions will have on the other person. This chapter will offer a case for the importance of the micro-social context and also some ideas about why it is so consistently neglected.
1.1 The importance of face-to-face dialogue
A diverse group of scholars has proposed that face-to-face dialogue is the basic or fundamental site of language use (e.g., Bavelas, 1990; Bavelas,
Hutchinson, Kenwood, & Matheson, 1997; Clark, 1996; Fillmore, 1981; Garrod & Pickering, 2004; Goodwin, 1981; Levinson, 1983; Linell, 1982, 2005). Face-to-face dialogue must have been the first language of the earliest humans; it is the infant’s first language developmentally; and it remains the language of everyday interactions. A corollary assumption, which I share with most of the above authors, is that, unlike written forms of language use, face-to-face dialogue includes both audible acts (words and their prosody) and visible ones (such as co-occurring hand and other bodily gestures, facial displays, and gaze) that are complementary to or even momentarily replace words (Bavelas & Chovil, 2006). One important feature of face-to-face dialogue is its affordance of micro-social interaction, that is, a high level of reciprocity and mutual influence. It represents one end of a continuum of the probability and speed of a response from the other person. In published text, for example, there is a low probability and high latency of response; if the readers respond to the writer at all, it is long after the act of writing. Letters or especially email are faster, and both are more likely (but not certain) to receive a reply. In a telephone dialogue, exchanges can occur rapidly, and even a momentary failure to respond is noticeable. In face-to-face dialogue, responses are highly probable and extremely fast; frame-by-frame microanalysis reveals that addressees often provide simultaneous feedback to the speaker (e.g., Bavelas, Coates, & Johnson, 2000). They nod or say “Mhm” and constantly convey information to the speaker by their ever-changing facial displays of attentiveness, confusion, alarm, or amusement, among many others. These responses often overlap the speaker’s turn (Goodwin, 1986), but they are not considered interruptions. Indeed, such reciprocal responses are demonstrably essential to the speaker, whose narrative falters when they are absent (Bavelas et al., 2000). Thus, because of its reciprocity and precision, language use in face-to-face dialogue is not simply abstractly social. A participant’s contribution does not originate autonomously in his or her mind (or from “language” as an abstraction) and does not evaporate into a social vacuum. Rather, each contribution is part of a social interaction at the micro-level; fortunately, with video technology, these essential details are also directly observable at that level. There is accumulating experimental evidence for micro-social effects on both verbal and nonverbal behaviours. For example, Clark and Wilkes-Gibbs (1986) showed that speakers often used verbal references that the addressee had helped to shape. Schober and Clark (1989) then demonstrated that these terms were not as clear to overhearers who did not participate in their moment-by-moment creation. Other experiments have shown that when speakers became aware that they did or did not have common ground with their addressee, they immediately adjusted their verbal descriptions (Isaacs & Clark, 1987) or hand gestures (Gerwing & Bavelas, 2004). Evidence from McNeill’s lab has revealed other micro-social effects on hand gestures, which were formed in relation to the other person’s spatial perspective (Özyürek, 2000; 2002) or previous gesture (Furuyama, 2000). In Clark and Krych’s (2004) experiment, addressees often
began to gesture relevant actions during the speaker’s verbal instructions; speakers interrupted or changed their instructions, even in mid-sentence, in order to confirm (or correct) the addressee’s proposed action. Some gestures have interactive rather than referential functions (Bavelas, Chovil, Lawrie, & Wade, 1992) and are specialized for the coordination of face-to-face dialogue, where they have predictable micro-effects on the addressee’s next act (Bavelas, Chovil, Coates, & Roe, 1995). Facial displays can also have micro-social functions. For example, Brunner’s (1979) microanalysis confirmed that addressees often precisely timed their smiles to speakers’ utterances so that these smiles served as back-channel responses, just as words and nods do. In this chapter, I will illustrate the utility and importance of a micro-social perspective with a program of research on the historical phenomenon of ‘motor mimicry’, which is Allport’s (1968) term for a reaction by an observer that is appropriate to the situation of the person being observed but not appropriate to the observer’s own immediate situation (e.g., wincing at someone else’s injury). Allport had pointed out that “the little understood tendency to elementary motor mimicry” (p.29) was still a “riddle in social psychology” (p.30). Previous theories had cast motor mimicry as “basically a perceptual motor reaction” (p.32), that is, within an individual framework. It turned out that a solution to Allport’s riddle was to place motor mimicry in its micro-social context. Before describing this research, though, I will introduce the secondary theme of this chapter, which is the apparent difficulty that many researchers have in noticing the micro-social context of language use or in remembering it even after it has been demonstrated as a solution. The next section illustrates this sub-theme by analogy.
1.2 Self-imposed Limits on Observation
The micro-social perspective seems elusive in the study of language use, even and perhaps especially in social psychology (Clark, 1985; Bavelas, 2005). I propose that its fate is analogous to the classic ‘nine-dot problem’ in perceptual psychology, which starts with a 3-by-3 matrix of points:

•   •   •
•   •   •
•   •   •
The problem is to draw a line through the middle of each of the dots only once, using four straight lines that are all connected. In other words, use four lines to connect all of the dots without lifting the pen or pencil from the paper and without retracing through any of the dots. (The solution can be found on the web or in a review article such as Kershaw & Ohlsson, 2004.)
To connect the nine dots as required, one must think and act outside the apparent square or box that their configuration suggests. The outer dots are not a border or limit, but most of us initially impose one ourselves and try to solve the puzzle within a self-limited space, rather than using the space outside it. Similarly, even when ostensibly studying language use, many researchers still operate within the borders of the individual, with a self-imposed limit that excludes the micro-social context of which the individual is part at any given moment. The rest of this chapter will be devoted to the possibility of looking outside the conceptual limit of the isolated individual. There is an additional reason to use the nine-dot problem as an analogy here, namely, its remarkable resistance to insightful solution. Chronicle, Ormerod, and MacGregor (2001) have shown that verbal and visual hints raise the success rate only slightly. Moreover, even when individuals have once seen or drawn the solution themselves, they may not be able to solve the problem later. (The reader who had previously known the solution may have had the same experience here.) I propose that the micro-social perspective on human behaviour faces a similar recalcitrance. Even when evidence reveals it as a solution, it seems to be quickly forgotten, and the focus on the individual takes over again. In brief, the nine-dot problem is really two problems: At first, it’s hard to see the solution. Then, even once you’ve seen the solution, it doesn’t stick. We will see how this analogy works for the micro-social aspects of an illustrative case.
2. Motor Mimicry as a Micro-social Act
2.1 Background and definition
Allport (1968) credited Adam Smith, in The Theory of Moral Sentiments (1759/1966), with the first recorded description of motor (i.e., overt, behavioural) mimicry, in which the observer visibly reacts as the observed other person might react, even though this reaction is not appropriate to the observer’s own situation. In 18th-century England, it was apparently common to see another person beaten in public: “When we see a stroke aimed, and just ready to fall upon the leg or arm of another person, we naturally shrink and draw back our own leg or our own arm” (Smith, 1759/1966:4). As reviewed in Bavelas, Black, Lemery, and Mullett (1987), scholars in the 19th and 20th centuries, including Darwin (1872/1998), continued to notice motor mimicry. Allport (1968:29-32) summarized the few existing explanations for motor mimicry and concluded that, even after 200 years, the phenomenon was “at present not fully understood.” All of the proposed explanations focused entirely on individual processes (e.g., vicarious emotion or an imitative reflex). When our research group approached Allport’s riddle in the early 1980s, we had the considerable advantages of experimental methods and video technology (Bavelas, Black, Lemery, MacInnis, & Mullett, 1986a), so that we were not limited to descriptions or still photographs of motor mimicry. Because of the capacity to
observe closely and repeatedly, we began to recognize that the individual’s reaction depended on the micro-social context, in that it was more likely to occur when another person would see it. We then hypothesized that motor mimicry was a communicative act, skilfully and efficiently conveying understanding of the other’s situation.
2.2 First experiments
Our first full experiment (Bavelas, Black, Lemery, & Mullett, 1986b) tested the communicative hypothesis by varying whether or not another person would see the participants’ motor mimicry, that is, whether there was a receiver for any mimetic display the participant might make. The experiment was explicitly designed to challenge the existing individual explanations: If motor mimicry is communicative, then … the probability of its being seen by a receiver should affect the sender’s display of facial mimicry… . If, on the other hand, such mimesis reflects essentially a private experience that happens to result in overt nonverbal behavior, then … the presence or absence of a receiver should have no effect. (1986b:323).
In a highly controlled 4-second interval (described in detail below), the experimenter (E1) apparently injured his finger and then either did or did not make eye contact with the observing participant. This independent variable required a carefully rehearsed sequence that began with the participant waiting while E1 and his assistant (E2) carried a heavy TV monitor into the room to a table near the participant, where it slipped and dropped on E1’s finger. Over the next few seconds, the two experimental conditions began to diverge: either the probability of eye contact between E1 and the participant increased until it finally occurred or the probability decreased and it did not occur at all. We took great care that the only difference between conditions in this brief period was eye contact (or not). Because the experimenter’s face would ultimately be visible only in the eye-contact condition, it was essential that his expression was not the cue to which the participant was responding (which would be a fatal confound in the design; p.325). Therefore, E1’s face initially began to show visible pain in both conditions, and he also indicated his injury by a sharp intake of breath and body-acting. Then, at the point in the sequence when the participant in the no-eye-contact condition would not see E1’s face, his expression was blank in the eye-contact condition. At the maximum point of eye contact, E1 displayed no pain facially. Thus, the only difference between conditions was whether E1 and the participant made eye contact, that is, whether motor mimicry by the participant would be seen by a receiver (E1). Here is the procedural description from the published article (1986b:324):
Eye contact condition:
- Injury and intake of breath; face begins to show pain.
- Brings head up and glances at observer with defocused eyes as head rolls back.
- Two seconds after the start of the injury: E2 lifts TV off E1’s hand. E1 pivots fully toward observer, in semi-crouch, holding his hand. Looks at hand, then directly at observer for 1 second with “blank” face.

No eye contact condition:
- Injury and intake of breath; face begins to show pain.
- Hunches down over TV, with face visible to observer in profile.
- Two seconds after the start of the injury: E2 lifts TV off E1’s hand. E1 pivots fully toward E2, in semi-crouch, holding his hand. Looks at hand, then directly at E2 for 1 second.
Our dependent measure was any expression of pain by the observer (the experimental participant); it did not need to be literally what E1 had done, as long as it was appropriate to his situation. Microanalysis of the videotapes revealed a clear result: If the participants had eye contact with E1, they typically winced and also made a kaleidoscopic sequence of other pain displays in response to his injury (e.g., knitted eyebrows, sharp intake of breath, vocalization such as “ouch”). At the point of maximum eye contact, when E1 looked fully at them, they made a very distinct display. The participants in the no-eye-contact condition, who saw the same injury (and the same initial facial display of pain), typically either displayed no motor mimicry at all or started to respond and quickly ceased. At their minimally social point, when E1 turned to interact with E2, few displayed any motor mimicry. The statistically significant difference between experimental conditions in participants’ motor mimicry was a micro-social effect of eye contact. These results eliminated other, individualistic explanations for motor mimicry such as imitation and vicarious emotion, because neither of these can account for the difference between conditions, that is, for when motor mimicry did not occur. It is important to emphasize that we did not deny that emotion may have occurred, only that it did not produce the motor mimicry. In our parallel-process theory (Bavelas et al., 1986b:328), the internal and communicative processes are independent, and it is the micro-social, communicative process that determines the overt behaviour. That is, observers may or may not have experienced vicarious pain in reaction to the injury they saw, but this hypothesized internal state did not automatically result in overt mimesis, because observers who witnessed the same injury without eye contact were virtually impassive. The evidence was clear: The display of motor mimicry depended on the presence of a recipient; it was meant to be seen. Only when we included the micro-social dimension—in this case, the possibility of immediate communication to another person—did the motor mimicry become intelligible. In the second part of the experiment, another group of participants rated the videotaped reactions of the previous participants. These raters received a
description of E1’s injury, but his image on the split-screen video was covered so that they would be unaware of the eye-contact variable. Their instructions were to rate each person they saw on video for “the extent to which the face expressed that the person knew how the experimenter felt . . . [and] cared about what had happened to the experimenter” (Bavelas et al., 1986b:327; emphasis original). The faces of participants in the no-eye-contact condition were rated as significantly less aware of and less caring about E1’s injury than those in the eye-contact condition. (Note that these were outsiders’ ratings of the facial expressions; we have no information on how the participants actually felt.) Taken together, these two studies suggested that motor mimicry was both encoded and decoded as a brief but important interpersonal message. The next set of experiments focused specifically on encoding and decoding, in order to demonstrate that micro-social factors determined not only the occurrence but the very form of motor mimicry. Bavelas, Black, Chovil, Lemery, and Mullett (1988) closely examined the motor mimicry that was most often described by the classical authors cited above, namely, an observer watching someone else who is leaning or ducking to one side. If the observer also leans or ducks, this would be considered motor mimicry regardless of whether it is to the right or the left. However, our micro-social analysis proposed that the direction in which the observer leans is theoretically informative. To explain, I’m going to use arbitrary pronouns for the two people: The first person, who initiates the sequence by leaning or ducking to one side (e.g., to avoid or reach for something), will be female, while the person who mimics her action will be male. If the first person leaned to her left and the observer, who is facing her, leaned to his left as well, he would be doing exactly what she had done, which fits individualistic theories of motor mimicry such as literal imitation, taking the role of the other, or vicarious action. From a social or dyadic perspective, however, this looks odd because its effect is that the two of them are moving in opposite compass directions.
To appear to be moving with the first person, the one facing her must lean or duck in the opposite direction: if she leaned left, then he would lean right, which is in the same compass direction.
We proposed that it is easier to see this direction as mimetic because it looks more “together”—a perceptual relationship that Heider (1958:200-201) called a “unit relation.” In our first experiment, each participant was facing the experimenter, who told a story in which she illustrated ducking away to avoid someone’s elbow. As predicted, virtually all of the participants who ducked mimetically did so in the same compass direction, rather than in the opposite compass direction. The participants encoded their motor mimicry in the most intelligible form by ducking ‘with’ the storyteller. Next, in order to test the effects of the two forms of mimetic leaning on viewers, we created several descriptions or photographs of situations in which one person was leaning or ducking (e.g., to avoid a squash racquet in the face or to reach a coffee mug on a side table) and the person facing her was also leaning. There were always two versions of the same scene: the second person was leaning in either the same or the opposite compass direction. Large samples of participants were significantly more likely to choose the version in which the observer was leaning in the same compass direction as the one in which the observer appeared to be more “involved” and to be acting “together with” the other person. Altogether, the results in this article (Bavelas et al., 1988) led to the conclusion that the form of this kind of motor mimicry was shaped by its micro-social context; that is, the mimicry took the form that was most readily intelligible to the other person. Even the transitory actions of ducking or leaning were sensitive to their appearance in relation to the other person. This was another step outside the boundary of the individual, but our experiments still had little to do with face-to-face dialogue.
2.3 Motor mimicry in dialogue
As shown in our first experiment (Bavelas et al., 1986b), many expressions of motor mimicry are facial; for example, the observer may wince or look startled when the other person is suddenly injured. Kraut and Johnston (1979) had pointed out that, in over 100 years of studies of facial expression since Darwin (1872/1998), almost none had examined its social functions. Indeed, most experiments studied individuals alone or with a non-reactive interviewer, in order that social factors would not obscure their ‘true’ emotional expressions. As with the nine-dot problem, there had been a self-imposed limit—studying only the faces of isolated individuals—yet without social data, an alternative theory could hardly arise. Kraut and Johnston also introduced a useful theoretical distinction between studying a facial configuration (e.g., smiling) as an emotional ‘expression’ of some presumed internal state (which may be independent of social context) and studying it as a social ‘display’ directed at others (which may be independent of internal state). In our lab, Chovil (1989; 1991; 1991-92) began to gather systematic social data on facial displays in dialogue.
One of Chovil’s studies (1989; 1991) tested whether the addressee’s motor mimicry in face-to-face dialogue is communicative (versus a purely individual reaction) by manipulating whether it would be seen. There were four experimental conditions in which an addressee listened to a narrator’s real close-call story: (1) the narrator and addressee were face-to-face; (2) the narrator and addressee were interacting through a partition; (3) narrator and addressee were on the telephone; or (4) the addressees were alone, listening to a dramatic close-call story on an answering machine. In the face-to-face condition, there was a great deal of facial motor mimicry (e.g., displays of fear, pain, or alarm), and such displays were significantly less frequent in the other three conditions. These results replicated in dialogue the finding of Bavelas et al. (1986b) that motor mimicry required a visually available receiver. (There is still very little experimental research on non-emotional functions of facial displays in dialogue; cf. Bavelas & Chovil, 2006; Bavelas & Gerwing, in press.) Later, we borrowed Chovil’s (1989; 1991) close-call task in order to learn more about what ‘mere listeners’ do in face-to-face dialogue (Bavelas et al., 2000). They made two different kinds of responses: As one would expect, addressees made the familiar generic responses, such as nodding, “yeah,” and “mhm.” But they also made responses that were specific to the speaker’s topic at that moment: supplying words or phrases and displaying precisely fitted facial or bodily gestures such as looking alarmed, recoiling, or crouching slightly—in other words, motor mimicry. Listeners made these specific responses at surprisingly high average rates (2 to 4 per minute), skewed toward the dramatic ending of the story. We therefore proposed that the true home of motor mimicry is in face-to-face dialogue, where it constitutes important feedback to the narrator. Two experiments tested this hypothesis by using an unrelated cognitive task to distract addressees in one condition from the narrative (Bavelas et al., 2000). The results showed that distracting the addressees from the content of the story virtually eliminated specific responses such as motor mimicry, which strongly suggests that these responses require cognitive processing and are not automatic or reflexive. Moreover, the distracted addressees had a strong effect on the narrator’s storytelling, particularly at the dramatic ending, where specific responses were more likely to occur in the normal listening condition. When the addressee was distracted and made only generic responses or none at all, the narrator’s story ending fell flat and was significantly more poorly told than when the narrator was talking to a responsive addressee. A good narrator requires a good addressee, and a good addressee is not diffusely socially present but specifically active at the micro-social level. The final evidence (so far) for the micro-social nature of motor mimicry was an analysis of precisely how and when addressees time their responses to the narrator (Bavelas, Coates, & Johnson, 2002). We examined interactions in which narrators told their close-call stories to undistracted addressees (the control group in Bavelas et al., 2000, exp. 2) and found a consistent and highly collaborative
pattern between narrator and addressee: The narrator would occasionally glance at the addressee, initiating eye contact. This was when addressees made their generic or specific responses—during mutual gaze. Then the speaker looked away again, ending the eye contact. In short, the speaker started the sequence by making eye contact, and the addressee’s response terminated the speaker’s gaze. This pattern was statistically different from chance not only for the sample as a whole but for each dyad. Their reciprocal coordination was the same for both generic and specific responses, that is, for motor mimicry too. Speakers and addressees in face-to-face dialogue produced these responses together, as part of a micro-social sequence. Note that this result replicates the original Bavelas et al. (1986b) eye-contact effect, this time with a dyad in spontaneous dialogue. The five articles reviewed here contained 11 studies, all strongly suggesting that one answer to Allport’s (1968:30) “riddle” of motor mimicry is to see it as a micro-social process and not as an individual one. Motor mimicry is not an individual phenomenon. It depends on, is shaped by, and in turn influences a particular moment in dialogue, illustrating the fine-tuned reciprocity that is the essence of the micro-social context. However, as we will see next, this is not the direction in which the literature has gone.
2.4 Re-imposing the Limits
Recall that the second difficulty of the nine-dot problem is that giving hints or directions for a solution does not help much; the self-imposed limits seem to return on their own. The same has been true for the above research on the micro-social nature of motor mimicry. From the first experiment (Bavelas et al., 1986b) to the latest dialogue studies (Bavelas et al., 2000; 2002), the results have clearly demonstrated that motor mimicry is part of a communicative social interaction at the micro-level rather than an imitative or emotional response by an individual. Yet, as I will illustrate with a sample of citations to the first experiment, one finds quite a different version in the subsequent literature. Because of the surprising outcome of this survey and because a literature review cannot, by definition, be anonymous, it is important to emphasize its very narrow scope. The sole purpose here was to ascertain how the findings of one experiment were represented in subsequent research or theory, that is, to look for an overall pattern in the literature. For this reason, the focus was solely on each citing article’s description of the Bavelas et al. (1986b) eye-contact experiment (described in detail in section 2.2), which has been available long enough to have accumulated a number of citations. In each article reviewed, this experiment was typically only one citation among numerous others and was usually described in a few lines; it was never the main point of an article. Therefore, my comments cannot and do not apply to other parts of any article, much less to an entire article or its authors. Finally, before judging those who mis-cited the experiment, each of us should ask ourselves whether we have ever read something through our own
preconceptions. It is this all-too-familiar tendency—staying within one’s self-imposed framework—that is my point here. The sample came from PsycInfo and Google Scholar, which in late 2006 listed 37 and 50 citations, respectively, of Bavelas et al. (1986b). I selected the 39 unique citations that were in refereed journals or book chapters available in English and that cited this experiment in the context of a discussion of mimicry (excluding abstracts, conference proceedings, articles and full books on unrelated topics, and, obviously, self-citations). There were two clear patterns in these 39 citations. First, there were no criticisms of our actual experimental design. Although a handful questioned the generalizability of our conclusions, most of the citations were positive references to the experiment in support of the citing article’s theoretical position. Second, only five of the 39 citations were wholly accurate in their descriptions of the results (and the procedure, if they included it): Bertenthal, Longo, & Kosobud (2006:223), Fridlund (1991:230 & 237), Fridlund, Kenworthy, & Jaffey (1992:191-192), Gibbs & Van Orden (2003:5), and Parkinson (2005:282 & 297-298). Twelve other citations reported accurately that we had proposed a communicative or signalling function of motor mimicry (Bush, Barr, McHugo, & Lanzetta, 1989:49; Chartrand & Bargh, 1999:901; Chartrand, Maddux, & Lakin, 2005:338 & 350; DePaulo, 1992:221, 229, & 230; Dijksterhuis & Bargh, 2001:13; Hess, 2001:398; Hess & Blairy, 2001:129 & 139; Hess, Philippot, & Blairy, 1998:509-510; Krauss & Fussell, 1996:664 & 690; Lakin & Chartrand, 2005:283 & 285; Manusov & Rodriguez, 1989:16; Wilson & Knoblich, 2005:463) but included inaccurate or inconsistent descriptions of the variables or procedure; see details below. The remaining 22 citations were inaccurate descriptions of the variables, results, and/or procedure of the cited experiment. Moreover, the pattern was not random: The vast majority described the experiment as supporting one of three individual theories of motor mimicry: automatic or nonconscious imitation, emotional contagion, or expression of an internal, mental state. In effect, these theories recast the experimenter as a mere stimulus for imitation rather than a conscious being who was capable of perceiving the participant’s response in their micro-social interaction. In the following, I will first explicate how these theories cannot explain our results and then describe in more detail how this incongruence came about. A theory of motor mimicry as automatically activated imitation of the other person’s facial display does not explain the significantly lower frequency of mimicry in the no-eye-contact condition. First, the procedural description reproduced in Section 2.2 shows that, in the initial two seconds, the experimenter’s facial display of pain was identical and equally visible in both conditions, so both conditions should have automatically triggered a response. Second, at the peak point of motor mimicry (during full eye contact), his facial expression was gone. There was none to imitate, yet this was the point of maximum difference between the two conditions. Third, behaviours that we
included as motor mimicry were much more varied than a literally imitative facial display. In keeping with the historical definition of motor mimicry, we counted any behaviours “indicative of pain” (1986b:324), including vocalizations and certain head movements. Fourth, the mean response time was 1.27 seconds, during which the participants took into account both the injury and the probability of eye contact; their reaction was fast but not simple automaticity. Nor can the results be attributed to emotional contagion of the experimenter’s pain, again because of the no-eye-contact condition. Because the apparent pain of the injury was the same in both conditions, emotional contagion cannot explain why the participants who witnessed the same degree of pain without eye contact were unlikely to display mimicry. Also, as reported in the article (1986b:325), participants in the eye-contact condition were significantly more likely to include a smile mixed in with their pain displays, which is inconsistent with feeling vicarious pain. We interpreted those smiles as also communicative, that is, as reassuring or face-saving smiles to the experimenter. For similar reasons, the interpretation of our participants’ actions as expressions of an internal state such as rapport, liking, empathy, or an emotion (presumably pain) also has no basis in the evidence. There was no self-report or other measure of the participants’ feelings. To infer the participants’ internal state from their motor mimicry and then explain the mimicry by this inferred internal state would be circular. The only ratings obtained (Bavelas et al., 1986b, part 2) were from third parties on very specific scales, not global ratings of rapport, liking, empathy, or any emotional state. To interpret motor mimicry as the expression of an internal state would also require us to assume that the less reactive participants in the no-eye-contact condition did not feel anything when they witnessed an equally painful injury. We had pointed out that the data excluded explanations based on intrapsychic states such as empathy (1986b:327) and offered instead our parallel-process model, which separates feelings from overt displays (1986b:328). In short, the evidence in the 1986b experiment is incompatible with each of the three most common re-interpretations, primarily because of the significant effect of the independent variable, which showed when motor mimicry would and would not occur. However, this incompatibility did not arise in most of the citations, mainly because the relevant elements were absent or transformed. This occurred in several different ways: A large proportion of the inaccurate citations simply did not mention an independent variable (Aarts, Gollwitzer, & Hassin, 2004:34; Bush et al., 1989:49; Fischer, Rotteveel, Evers, & Manstead, 2004:225; Garrod & Pickering, 2004:10; Hess, 2001:398 & 400; Hess & Blairy, 2001:129 & 139; Hess et al., 1998:509-510; Krauss & Fussell, 1996:664 & 690; Lundqvist, 2006:263; Marsh, Richardson, Baron, & Schmidt, 2006:13; Pasupathi, Carstensen, & Levenson, 1999:175-176; Richardson, Marsh, & Schmidt, 2005:62; Russell, 2003:155; Sebanz, Knoblich, & Prinz, 2005:1235; Sonnby-Borgström, 2002:433; Sonnby-Borgström & Jönsson, 2004:103; Sonnby-Borgström, Jönsson, & Svensson,
2003:4 & 16; Van Swol, 2003:462; Vreeke & van der Mark, 2003:183; Williams, 2002:449; Wilson & Knoblich, 2005:460 & 463). Most of these citations were in support of demonstrating that mimicry could be experimentally elicited but, lacking an independent variable, they could not describe the main purpose of the experiment, which was to show when it would not occur. Several articles did describe an independent variable. In one case, only the timing of the eye contact was incorrect (DePaulo, 1992:221). More often, a different independent variable was described (Jakobs, Manstead, & Fischer, 2001:52; Lakin & Chartrand, 2003:334; Tiedens & Fragale, 2003:559). In three cases, a small change transformed the manipulation from one that made the participant’s expression visible to the experimenter into one that made the experimenter’s facial expression more clearly visible as a stimulus to the participant (Chartrand et al., 2005:338; Dijksterhuis & Bargh, 2001:10; Niedenthal, Barsalou, Winkielman, Krauth-Gruber, & Ric, 2005:192). Recall that the experimenter’s facial expression was equally visible or not in both conditions. There were also changes in the dependent variables analysed (Bush et al., 1989:49; Chartrand et al., 2005:350; Fischer et al., 2004:225; DePaulo, 1992:221 & 229; Hess, 2001:400; Hess & Blairy, 2001:129 & 139; Lakin & Chartrand, 2005:283 & 285; Marsh et al., 2006:13; Pasupathi et al., 1999:176; Russell, 2003:155; Van Swol, 2003:462; Williams, 2002:449; Wilson & Knoblich, 2005:463). The most common error was describing variables that went well beyond what we had measured, usually internal states such as rapport, empathy, affiliation, or emotional contagion, but there were also a few instances of different mimicry (e.g., yawning or sadness). In other cases, it was not possible to tell whether the error was in the independent or the dependent variable (Lundqvist, 2006:263; Sonnby-Borgström, 2002:433; Sonnby-Borgström & Jönsson, 2003:143; Sonnby-Borgström & Jönsson, 2004:103; Sonnby-Borgström et al., 2003:4 & 16), or both the independent and the dependent variables were changed (Bargh & Ferguson, 2000:930; Chartrand & Bargh, 1999:896; Dijksterhuis, 2005:211; Manusov & Rodriguez, 1989:16). Finally, there were descriptions of the experimental context or procedure that did not match ours, such as a naturally occurring interaction rather than an experiment (Bargh & Ferguson, 2000:930; Chartrand & Bargh, 1999:896) or as involving a speaker and listener, co-acting partners, or infants or children instead of an (adult) experimenter and observer (Hess, 2001:398; Hess et al., 1998:509; Jakobs et al., 2001:52; Krauss & Fussell, 1996:664 & 690; Manusov & Rodriguez, 1989:16; Sebanz et al., 2005:1235; Vreeke & van der Mark, 2003:183). In summary, the vast majority of this sample of 39 articles citing the experiment did not deal with the hypothesis and variables we were testing, namely, that motor mimicry is elicited by social, communicative processes rather than by individual ones. The results clearly supported a micro-social explanation, yet the consistent pattern of the majority of citations was to reconstruct the experiment into one that supported an individual interpretation. (Almost none of
the articles included the subsequent experiments that demonstrated motor mimicry in dialogue.) It is important to emphasize again that this was a narrow search that does not impugn anyone’s scholarship, much less suggest that anyone deliberately distorted our experimental results. I believe that the authors simply read our experiment as fitting a familiar or expected pattern. In the metaphor of this chapter, it seems that the micro-social context in the experiment was as irrelevant as the white space outside the nine dots and was similarly ignored.
3. Why Don’t We See the Micro-social?
I propose that it is primarily the tendency to see the individual as a natural unit of analysis that prevents our seeing the micro-social context that surrounds and profoundly influences each individual. Focusing on the individual creates an implicit border, as in the nine-dot problem, which seems to limit our perceptual or conceptual field to mental rather than social processes. Even when researchers briefly step outside and notice what is happening around and in interaction with the individual, the focus soon retreats to the isolated individual. Both language and social interaction have been predominantly attributed to mental processes, not just in the broader domains of linguistics and psychology, but even in those that one might expect to be especially interested in face-to-face dialogue, such as psycholinguistics and social psychology. Because this choice of the individual as the unit of analysis seems to be a relatively unexamined one, it is worth considering briefly its causes and effects, focusing on the discipline I know best, psychology. In an insightful analysis, Danziger (1990) has shown how the person historically called “the subject” has been socially constructed in psychology. He described a “Robinson Crusoe myth [which] made it seem eminently reasonable to ignore the settings that had produced the human behavior … and to reattribute it as a property of individuals in isolation” (1990:186).
Danziger went on to point out that psychologists did not invent the Robinson Crusoe myth (although they continue to contribute to it); western culture is highly individualistic, placing a high value on individual actions and attributes. I would add that the circumscribed individual is appealing because he or she is an obvious biological unit (even though not a viable one if truly isolated). This physical boundary also defines each of us as a separate entity in the law and in society at large. According to Danziger (1990), the experimental methods to which psychology aspired even in its formative years served to reinforce the Robinson Crusoe myth in ways that systematically removed what I am calling the micro-social context:
Experimental methods isolated individuals from the social context of their existence and sought to establish timeless laws of individual behavior by analogy with the laws of natural science. Shared social meanings and relations were automatically broken up into the properties of separate individuals [versus] features of an environment that was external to each of them. … Anything social became a matter of external influence that did not affect the identity of the individual under study. (1990:186-87)
Notice that drawing a conceptual circle that includes only the individual inevitably means that what the other person in an interaction is doing at any moment becomes external or environmental and thus, at most, a stimulus for the actions of the individual. Because face-to-face dialogue is intimately and constantly reciprocal, treating the other person as external to the individual makes it impossible to see or study micro-social phenomena. (The use of confederates in social psychology is an attempt to control or eliminate the influence of the other person, although it is much more likely to produce an artificial interaction with significant effects on behaviour; e.g., Beattie & Aboudan, 1994). As discussed elsewhere (Bavelas, 2005), another reinforcer of the notion of the individual as a natural unit of study has been a mistaken application of the principle of reductionism, borrowed from natural science. It is true that reductionism advocates reducing complex phenomena to separate simpler parts (Reber, 1985:623), but it does not mean always studying the smallest or most molecular part. As Luria (1987:675) emphasized, it is essential to preserve all of the “basic features” of the phenomenon of interest. Luria pointed out that chemists cannot study water by separate research into hydrogen and oxygen, because those elements are below the level of the phenomenon of interest; they do not preserve its basic features. Studying isolated individuals is equally unlikely to predict the properties of their observable micro-social interactions. The implicit assumption of reductionism is that studying individuals who have been conceptually or even physically isolated from each other will naturally lead to an understanding of their micro-social interactions, albeit at some unspecified point in the future—an outcome that has not yet, to my knowledge, been demonstrated. What many of us have discovered is that permitting dyadic interaction does not, as feared, preclude true experiments, nor does conducting experiments necessarily destroy social interaction. As shown in the examples at the end of Section 1.1 as well as in Section 2.3, it is possible to set a task that will create the desired variable and then let the participants interact freely within it. Ancient maps are said to have warned that, in the regions outside known lands, “Here be tygers,” but we have not found such methodological dangers. Instead of the expected chaos, the data become even more systematic and interpretable when two real participants interact with each other. Perhaps because face-to-face dialogue is such a central part of our social life and because interlocutors have to make sense to each other, they also make sense to us as observers. Moreover, with a careful choice of tasks and a precise focus on the outcome of interest, the analysis can be highly reliable, if time-consuming. We attribute this high inter-analyst reliability to the fact that the analysts themselves participate
constantly in face-to-face dialogue in their everyday lives, so they are natural experts in understanding its nuances. In the experiments described above and many others, a focus on the micro-social dimension has been not just possible but richly rewarding.
4. Epilogue
The ideas proposed here may seem far from the work of David McNeill, but there is a direct and continuing debt. His 1985 article on gesture led to an ongoing program of research on hand gestures that has helped shape the micro-social perspective. Equally important, it enabled me to see several kinds of so-called nonverbal communication in more linguistic ways; to do meaning-based analysis instead of physical description; and to do experimental research with real dyads in real conversations. In addition, his intellectual generosity and his tolerance of ideas that are often quite different from his own are an example to us all.

References

Aarts, H., Gollwitzer, P. M., & Hassin, R. R. (2004). Goal contagion: Perceiving is for pursuing. Journal of Personality and Social Psychology, 87, 23-37.
Allport, G. W. (1968). The historical background of modern social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (3rd ed.). Menlo Park, CA: Addison-Wesley.
Bargh, J. A., & Ferguson, M. J. (2000). Beyond behaviorism: On the automaticity of higher mental processes. Psychological Bulletin, 126, 925-945.
Bavelas, J. B. (1990). Nonverbal and social aspects of discourse in face-to-face interaction. Text, 10, 5-8.
Bavelas, J. B. (2005). The two solitudes: Reconciling Social Psychology and Language and Social Interaction. In K. Fitch & R. Sanders (Eds.), Handbook of language and social interaction (pp. 179-200). Mahwah, NJ: Erlbaum.
Bavelas, J. B., Black, A., Chovil, N., Lemery, C. R., & Mullett, J. (1988). Form and function in motor mimicry: Topographic evidence that the primary function is communicative. Human Communication Research, 14, 275-299.
Bavelas, J. B., Black, A., Lemery, C. R., MacInnis, S., & Mullett, J. (1986a). Experimental methods for studying "elementary motor mimicry." Journal of Nonverbal Behavior, 10, 102-119.
Bavelas, J. B., Black, A., Lemery, C. R., & Mullett, J. (1986b). "I show how you feel": Motor mimicry as a communicative act. Journal of Personality and Social Psychology, 50, 322-329.
Bavelas, J. B., Black, A., Lemery, C. R., & Mullett, J. (1987). Motor mimicry as primitive empathy. In N. Eisenberg & J. Strayer (Eds.), Empathy and its development (pp. 317-338). Cambridge: Cambridge University Press.
Bavelas, J. B., & Chovil, N. (1997). Faces in dialogue. In J. A. Russell & J.-M. Fernandez-Dols (Eds.), The psychology of facial expression (pp. 334-346). Cambridge, UK: Cambridge University Press.
Bavelas, J. B., & Chovil, N. (2006). Hand gestures and facial displays as part of language use in face-to-face dialogue. In V. Manusov & M. Patterson (Eds.), Handbook of nonverbal communication (pp. 97-115). Thousand Oaks, CA: Sage.
Bavelas, J. B., Chovil, N., Coates, L., & Roe, L. (1995). Gestures specialized for dialogue. Personality and Social Psychology Bulletin, 21, 394-405.
Bavelas, J. B., Chovil, N., Lawrie, D. A., & Wade, A. (1992). Interactive gestures. Discourse Processes, 15, 469-489.
Bavelas, J. B., Coates, L., & Johnson, T. (2000). Listeners as co-narrators. Journal of Personality and Social Psychology, 79, 941-952.
Bavelas, J. B., Coates, L., & Johnson, T. (2002). Listener responses as a collaborative process: The role of gaze. Journal of Communication, 52, 566-580.
Bavelas, J. B., & Gerwing, J. (in press). Conversational hand gestures and facial displays in face-to-face dialogue. In K. Fiedler (Ed.), Social communication (pp. 283-308). New York: Psychology Press.
Bavelas, J. B., Hutchinson, S., Kenwood, C., & Matheson, D. H. (1997). Using face-to-face dialogue as a standard for other communication systems. Canadian Journal of Communication, 22, 5-24.
Beattie, G., & Aboudan, R. (1994). Gestures, pauses and speech: An experimental investigation of the effects of changing social context on their precise temporal relationships. Semiotica, 99, 239-272.
Bertenthal, B. I., Longo, M. R., & Kosobud, A. (2006). Imitative response tendencies following observation of intransitive actions. Journal of Experimental Psychology: Human Perception and Performance, 32, 210-225.
Brunner, L. J. (1979). Smiles can be back channels. Journal of Personality and Social Psychology, 37, 728-734.
Bush, L. K., Barr, C. L., McHugo, G. J., & Lanzetta, J. T. (1989). The effects of facial control and facial mimicry on subjective reactions to comedy routines. Motivation and Emotion, 13, 31-52.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: The perception-behavior link and social interaction. Journal of Personality and Social Psychology, 76, 893-910.
Chartrand, T. L., Maddux, W. W., & Lakin, J. L. (2005). Beyond the perception-behavior link: The ubiquitous utility and motivational moderators of nonconscious mimicry. In R. R. Hassin, J. S. Uleman, & J. A. Bargh (Eds.), The new unconscious (pp. 334-361). New York: Oxford University Press.
Chovil, N. (1989). Communicative functions of facial displays in conversation. Unpublished doctoral dissertation, Department of Psychology, University of Victoria, Victoria, B.C., Canada.
Chovil, N. (1991). Social determinants of facial displays. Journal of Nonverbal Behavior, 15, 141-154.
Chovil, N. (1991-92). Discourse-oriented facial displays in conversation. Research on Language and Social Interaction, 25, 163-194.
Chronicle, E. P., Ormerod, T. C., & MacGregor, J. N. (2001). When insight just won't come: The failure of visual cues in the nine-dot problem. Quarterly Journal of Experimental Psychology, 54, 903-919.
Clark, H. H. (1985). Language use and language users. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (3rd ed., Vol. 2, pp. 179-231). New York: Harper and Row.
Clark, H. H. (1996). Using language. Cambridge, UK: Cambridge University Press.
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for understanding. Journal of Memory and Language, 50, 62-81.
Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.
Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge, UK: Cambridge University Press.
Darwin, C. (1998). The expression of the emotions in man and animals (3rd ed.). London: HarperCollins. (Original work published 1872).
DePaulo, B. M. (1992). Nonverbal behavior and self-presentation. Psychological Bulletin, 111, 203-243.
Dijksterhuis, A. (2005). Why we are social animals: The high road to imitation as social glue. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From neuroscience to social science: Vol. 2. Imitation, human development, and culture (pp. 207-220). Cambridge, MA: MIT Press.
Dijksterhuis, A., & Bargh, J. A. (2001). The perception-behavior expressway: Automatic effects of social perception on social behavior. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 33, pp. 1-40).
Fillmore, C. J. (1981). Pragmatics and the description of discourse. In P. Cole (Ed.), Radical pragmatics (pp. 143-166). New York: Academic Press.
Fischer, A. H., Rotteveel, M., Evers, C., & Manstead, A. S. R. (2004). Emotional assimilation: How we are influenced by others’ emotions. Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 22, 223-245.
Fridlund, A. J. (1991). Sociality of solitary smiling: Potentiation by an implicit audience. Journal of Personality and Social Psychology, 60, 229-240.
Fridlund, A. J., Kenworthy, K. G., & Jaffey, A. K. (1992). Audience effects in affective imagery: Replication and extension to dysphoric imagery. Journal of Nonverbal Behavior, 16, 191-212.
Furuyama, N. (2000). Gestural interaction between the instructor and the learner in origami instruction. In D. McNeill (Ed.), Language and gesture (pp. 99-117). Cambridge: Cambridge University Press.
Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8, 8-11.
Gerwing, J., & Bavelas, J. B. (2004). Linguistic influences on gesture’s form. Gesture, 4, 157-195.
Gibbs, R. W., Jr., & Van Orden, G. C. (2003). Are emotional expressions intentional? A self-organizational approach. Consciousness & Emotion, 4, 1-16.
Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. New York: Academic Press.
Goodwin, C. (1986). Between and within: Alternative sequential treatments of continuers and assessments. Human Studies, 9, 205-217.
Heider, F. (1958). The psychology of interpersonal relations. New York: John Wiley.
Hess, U. (2001). The communication of emotion. In A. Kaszniak (Ed.), Emotions, qualia, and consciousness (pp. 397-409). Singapore: World Scientific Publishing.
Hess, U., & Blairy, S. (2001). Facial mimicry and emotional contagion to dynamic emotional facial expressions and their influence on decoding accuracy. International Journal of Psychophysiology, 40, 129-141.
Hess, U., Philippot, P., & Blairy, S. (1998). Facial reactions to emotional facial expressions: Affect or cognition? Cognition and Emotion, 12, 509-531.
Isaacs, E. A., & Clark, H. H. (1987). References in conversations between experts and novices. Journal of Experimental Psychology: General, 116, 26-37.
Jakobs, E., Manstead, A. S. R., & Fischer, A. H. (2001). Social context effects on facial activity in a negative emotional setting. Emotion, 1, 51-69.
Kershaw, T. C., & Ohlsson, S. (2004). Multiple causes of difficulty in insight: The case of the nine-dot problem. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 3-13.
Krauss, R. M., & Fussell, S. R. (1996). Social psychological models of interpersonal communication. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 655-701). New York: Guilford.
Kraut, R. E., & Johnston, R. E. (1979). Social and emotional messages of smiling: An ethological approach. Journal of Personality and Social Psychology, 37, 1539-1553.
Lakin, J. L., & Chartrand, T. L. (2003). Using nonconscious behavioral mimicry to create affiliation and rapport. Psychological Science, 14, 334-339.
Lakin, J. L., & Chartrand, T. L. (2005). Exclusion and nonconscious behavioral mimicry. In K. D. Williams, J. P. Forgas, & W. von Hippel (Eds.), The social outcast: Ostracism, social exclusion, rejection, and bullying (pp. 279-295). New York: Psychology Press.
Levinson, S. C. (1983). Pragmatics. Cambridge, UK: Cambridge University Press.
Linell, P. (1982). The written language bias in linguistics. Department of Communication Studies, University of Linköping, Sweden.
Linell, P. (2005). The written language bias in linguistics: Its nature, origins and transformations. London: Routledge.
Lundqvist, L.-O. (2006). A Swedish adaptation of the Emotional Contagion Scale: Factor structure and psychometric properties. Scandinavian Journal of Psychology, 47, 263-272.
Luria, A. R. (1987). Reductionism in psychology. In R. L. Gregory (Ed.), The Oxford companion to the mind (pp. 675-676). Oxford, UK: Oxford University Press.
Manusov, V., & Rodriguez, J. S. (1989). Intentionality behind nonverbal messages: A perceiver’s perspective. Journal of Nonverbal Behavior, 13, 15-24.
Marsh, K. L., Richardson, M. J., Baron, R. M., & Schmidt, R. C. (2006). Contrasting approaches to perceiving and acting with others. Ecological Psychology, 18, 1-38.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Bulletin, 92, 350-371.
Niedenthal, P. M., Barsalou, L. W., Winkielman, P., Krauth-Gruber, S., & Ric, F. (2005). Embodiment in attitudes, social perception, and emotion. Personality and Social Psychology Review, 9, 184-211.
Özyürek, A. (2000). The influence of addressee location on spatial language and representational gestures of direction. In D. McNeill (Ed.), Language and gesture (pp. 64-83). Cambridge, UK: Cambridge University Press.
Özyürek, A. (2002). Do speakers design their co-speech gestures for their addressees? The effects of addressee location on representational gestures. Journal of Memory and Language, 46, 688-704.
Parkinson, B. (2005). Do facial movements express emotions or communicate motives? Personality and Social Psychology Review, 9, 278-311.
Pasupathi, M., Carstensen, L. L., Levenson, R. W., & Gottman, J. M. (1999). Responsive listening in long-married couples: A psycholinguistic perspective. Journal of Nonverbal Behavior, 23, 173-193.
Reber, A. S. (1985). The Penguin dictionary of psychology. London, UK: Penguin.
Richardson, M. J., Marsh, K. L., & Schmidt, R. C. (2005). Effects of visual and verbal interaction on unintentional interpersonal coordination. Journal of Experimental Psychology: Human Perception and Performance, 31, 62-79.
Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110, 145-172.
Schober, M. F., & Clark, H. H. (1989). Understanding by addressees and overhearers. Cognitive Psychology, 21, 211-232.
Sebanz, N., Knoblich, G., & Prinz, W. (2005). How two share a task: Corepresenting stimulus-response mappings. Journal of Experimental Psychology: Human Perception and Performance, 31, 1234-1246.
Smith, A. (1966). The theory of moral sentiments. London, UK: A. Miller. (Original work published 1759.)
Sonnby-Borgström, M. (2002). Automatic mimicry reactions as related to differences in emotional empathy. Scandinavian Journal of Psychology, 43, 433-443.
Sonnby-Borgström, M., & Jönsson, P. (2003). Models-of-self and models-of-others as related to facial muscle reactions at different levels of cognitive control. Scandinavian Journal of Psychology, 44, 141-151.
Sonnby-Borgström, M., & Jönsson, P. (2004). Dismissing-avoidant pattern of attachment and mimicry reactions at different levels of information processing. Scandinavian Journal of Psychology, 45, 103-113.
Sonnby-Borgström, M., Jönsson, P., & Svensson, O. (2003). Emotional empathy as related to mimicry reactions at different levels of information processing. Journal of Nonverbal Behavior, 27, 3-23.
Tiedens, L. Z., & Fragale, A. R. (2003). Power moves: Complementarity in dominant and submissive nonverbal behavior. Journal of Personality and Social Psychology, 84, 558-568.
Van Swol, L. M. (2003). The effects of nonverbal mirroring on perceived persuasiveness, agreement with an imitator, and reciprocity in a group discussion. Communication Research, 30, 461-480.
Vreeke, G. J., & van der Mark, I. L. (2003). Empathy, an integrative model. New Ideas in Psychology, 21, 177-207.
Williams, A. C. de C. (2002). Facial expression of pain: An evolutionary account. Behavioral and Brain Sciences, 25, 439-488.
Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131, 460-473.
Master Speakers, Master Gesturers
A String Quartet Master Class

John B. Haviland
University of California, San Diego
Pushing McNeill’s metaphor of the ‘growth point’, this chapter examines a string quartet ‘Master Class’, in which a group of professional musicians gives musical explanations and demonstrations to a student quartet, to improve both their performance and their musicality. It examines in detail the complex interplay of different expressive modalities, including talk, song, humming, playing, mime, and gesture. It further considers the intricate interaction between different participants—not only the students and professional musicians, but also the audience, the instruments, and the score itself, taken as a representational sediment of both composer and musical tradition. The chapter concludes with a plea for reexamining certain theoretical dichotomies, often appealed to in studies of interaction, in light of the emergent and deeply multimodal nature of this sort of masterful speech.
1. Master Speakers
Despite the orthodox position that the object of linguistic theorizing is a shared core of linguistic competence—abstract knowledge of language that characterizes ‘ideal speaker-hearers’—in ordinary life, differential skill in using language is the norm. My first field research was to study Zinacantec musicians in highland Chiapas, Mexico, and along the way to learn about the variety of Tzotzil they spoke. That experience brought the matter strikingly home. As I was sent from one teacher to another, it was quickly obvious that certain musicians, and certain talkers, were simply better than others: everyone knew it, everyone commented on it, and even I—fledgling tenderfoot—could perceive it. By the end of my first summer in Zinacantán, I had acquired several remarkable teachers, one (shown on the left in Figure 1) a master musician, the other (in the middle) a master talker.1 Trying to keep (or catch) up with what these masters—neither of whom, lamentably, is still ta sba balamil ‘on the face of the earth’—tried to teach me has occupied me ever since. It is for them, along with another master
speaker/teacher, David McNeill, whose ideas inform virtually all current work on gesture, that I offer this brief essay about masterful talk and gesture.

1. The man with a red turban shown on the right turned out to be another kind of master talker, a powerful shaman who cures through prayer. The younger man was also a linguistic expert, an ixkirvano [< Span. escribano] or scribe who kept written records for the moletik ‘elders’ or senior religious officials.
Figure 1: Zinacantec elders in 1966.
2. Master Musicians and Master Classes
As it turns out, Zinacantec musicians are themselves master speakers. Not only must they know the appropriate cycle of songs for the many kinds of ritual event where they play, but a central part of their job is to talk: giving expert ritual advice, joking, and generally entertaining ritual participants through fiestas that sometimes last for four days and nights. Being a musician in Zinacantán is a matter of specialized expertise, and although it might be possible to discern a generalized ‘least common denominator’ for a musician’s expertise, no musician is sought out for knowing only that. Zinacantec musicians do not teach their skills. In fact, Zinacantec theory counts knowing how to play music as something one cannot learn. It is a gift, bestowed by ancestral gods in a dream (Haviland 1967). One goes to sleep ‘ignorant’ one day, dreams, and wakes up ready to perform the next. There is accordingly no tradition of teaching music in Zinacantán, and almost no vocabulary for musical criticism, either of performance or technique. At the kind invitation of Leila Falk, of the Reed College Music Department, in February 2003 I filmed a different though related sort of musical expertise in action: a string quartet ‘Master Class’. The Euclid String Quartet, a young professional group, had agreed to lead a class with the Lysistrata String Quartet, composed of Reed students. A long-standing interest in interaction in its various embodied forms (along with my rusty fiddler’s envious curiosity) inspired me to haul multiple cameras2 across the campus and to set them up in the practice room where the master class was to happen. A musical ‘Master Class’ is an occasion when expertise and mastery are explicitly on display. Those master musicians who are also master teachers need
to be experts in both demonstrating and ‘talking about’ what they know. In classical music, ‘master teaching’ goes beyond musical fundamentals or instrumental techniques to issues of artistry, musicianship, musical tradition, and history. Multiple modes of expression are involved: minimally talk, but usually also embodied interaction between musicians, with each other and with their instruments. Looking closely at a musical master class allows us to see the multiple signaling modalities master teachers have at their disposal, how they complement each other expressively, and how they are coordinated.

2. David McNeill’s invitation to participate in a multidisciplinary project from the National Science Foundation KDI program, Grant No. BCS-9980054, “Cross-Modal Analysis of Signal and Sense: Multimedia Corpora and Tools for Gesture, Speech, and Gaze Research,” headed by Francis Quek, gave me the multiple video cameras in the first place.
3. Expressive Complementarity and the Growth Point
One of McNeill’s central observations is that utterances have multiple, typically complementary expressive aspects. Still, processing demands for producing a stream of phonetic segments seem to be different, for example, from those for producing the four-dimensional images characteristic of gesture. The strict co-temporality of speech and gesture, therefore, suggests cognitive connections between such different sorts of processing, captured in McNeill’s metaphor of the growth point. The view licenses a search for a semiotic ‘division of labor’ between different co-expressive modalities, supposing that gesture and speech (among other signaling devices) might have different, characteristic, expressive virtues, though perhaps relative to differences among both languages and “gesture cultures” or traditions. McNeill’s notion of a ‘catchment’ has further intriguing consequences, hinting that somehow the gestural modality captures and preserves semiotic configurations or perspectives over time, giving the analyst a further window onto ongoing cognition in utterance, different from but co-synchronous with that afforded by the speech stream. Both ideas suggest empirical enquiries congenial to a field anthropologist like me. Utterances in the wild may be expected to display different semiotic balances in the expressive loads of speech and gesture, and the unfolding over a stretch of turns at talk of different utterance modalities lays bare complementary aspects of conceptualization and thought. McNeill’s ideas are thus an inspiration for ethnographic enquiry and observation. In a context like the master class, what IS the co-expressive relationship between different signaling modalities? Does the semiotic division of labor remain constant over time, different pedagogic moments, or even different utterances which can in some sense be seen to have similar ‘meanings’ or functional loads? And, to return to my starting point, is there anything distinctive about ‘masterful’ performances with respect to this multimodality? Of course, the ethnographic moment is considerably messier than the controlled environment of a psychology experiment. Still, mess can be instructive. McNeill’s growth point model will also ultimately have to deal with dilemmas of the following sort. First, since utterances are normally conversational and interactive, emerging in turn sequences, they seem to reflect not an individual but rather an intersubjective and distributed kind of cognition. The trademark experience of the
anthropologist is encountering people at home, doing what they do, and usually doing it together with others. This is especially true when people talk because by and large they talk together. This is why some psycholinguists view language as an ‘emergent’ phenomenon arising in joint activity between interlocutors, rather than, for example, as the excrescence of individual cognitions (Clark, 1996). Interaction is a compelling model for talk, even apparently monologic talk. Interactive ‘emergence’ is, however, completely undeniable in the case of chamber music. The string quartet is a paradigm example of a whole bigger than the sum of its parts. As any weekend musician knows, there are individual parts, but they don’t amount to much by themselves. Rather, whatever the technical or musical challenge of a single instrumental part, it remains nothing but notes without the other three parts. Interestingly, musical master classes take pains to bring this point home. ‘Mastery’ in string quartet playing is partly grasping the big picture while playing in one’s own little corner. If this is part of the musicianship a master class is concerned with, such a class is a good place to observe how one can talk (or otherwise communicate) about joint action, coordination, and emergence. The appropriate metalanguage—for talking precisely about interaction and emergence—is unavoidably marshaled to the occasion, even if it must be invented on the spot. One of my interests in these classes is how the interactants create appropriate representational metalanguages, in this specific context, for the ‘emergence’ of something organic that goes beyond individual action. Second, the interactive and emergent nature of the string quartet master class produces other complications to the monadic growth point model. An important factor is the independent role of the body, which acts not only as semiotic signaling vehicle but as a primary instrument of action (and invention) in the context of a string quartet. Musicians’ bodies and their instruments interact directly to produce the music, the primary stuff of performance and the essential target of criticism. Similarly, as part of teaching, master musicians talk, but they also play—demonstrating their expertise with full performance, or with variously reduced surrogates of performance, from mime to song. Corporeal expression is thus not limited to the imagistic expressions of the putative semantic or cognitive kernel of utterance; the body has a direct generative role in what is to be communicated. Insofar as music is produced through interaction with others and with objects, and involves non-speech sound, the raw materials of the ‘lived environment’ in which a musical master class takes place are especially rich and significant for understanding the communicative process. A third complication in the material I shall present is the presence in the master class of at least one extra virtual participant: the composer, embodied in this context by the written musical score. The notes on the written page are one representation of the ‘music’, taken as an expression of the composer’s intentions. Musical tradition, history, and lore surrounding a composer’s opus also emerge during the master class, and represent a further interactive axis against which performance is evaluated and around which utterances are constructed.
In the McNeill model as I understand it, the ‘growth point’ is taken as the dynamic cognitive kernel or wellspring which energizes different partial representations in various semiotic channels: the words have certain communicative virtues, the gestures others. The representations that emerge in the string quartet master class are especially complex: interactively produced, serendipitously constructed from a rich range of raw materials that include words, gestures, performance, and varying combinations of all these, together with the instruments, their sounds, the score itself (both as physical artifact and as virtual notation), and so on, all in interaction with the immediate environment, both social and physical. The master class is thus a useful test bed to examine the semiotic division of labor predicted by the McNeill model, over a complex sequence of communicative acts, collaboratively enunciated by different actors, and with a rich palette of expressive media. In particular, I think that examples like those in this essay provide strong, and perhaps unexpected, confirmation for one of—for me—McNeill’s leading ideas: that gesture provides a rich window onto the mind. For insofar as the master teachers on display here are extemporizing their lessons, working out interactively and in the moment what they want to convey and how to do it, the evanescent marshalling of one communicative device on top of another gives clear if indirect insight into how their minds work. I start with utterances from the master class whose form seems congenial to the growth point model—a clear and complementary division of labor between spoken and gestured communicative tracks—and move through others where the semiotic division of expressive labor seems more varied, less temporally coherent, and linked in more complex ways to the physical and interactive surround.
4. The Concept of ‘Octave Balance’
Consider how the professional cellist introduces a concept she calls ‘octave balance.’ She is commenting on the student performance of the first movement of Mozart’s string quartet #23 in F, K. 590. (See example 1.) The issue is the relative volume of the different instruments. Since the teacher is herself a cellist, not surprisingly she emphasizes the special responsibility of the cello to provide a strong foundation in the lowest octave when several instruments are playing the same notes in different octaves. (There is a historical moral here, as well, since this particular quartet—known in the tradition as the third of the ‘King of Prussia’ or ‘cello’ quartets—was commissioned by Friedrich Wilhelm II, himself a “better than average” cellist. So, as the professional cellist comments to her student counterpart, “You are the King of Prussia.”) The concept of ‘octave balance’ is expressed concisely in the speaker’s words: “if you’re all playing the same melody, but in different octaves . . . the heart of it is the lowest octave.”
(1) Octave balance3

     a
7    c; if you’re all playing the same melody=
     a. both hands come together, clasped down in front of body (see Figure 2a)
     b c d e.....
9    c; = but in different octaves
     b. Hands start to separate
     c. L coming up, R point down
     d. L rises still higher (see Figure 2b)
     e. B retreat to rest
     f
10   the most .
     f. both hands rise clasped
     g h i j
11   y’know the hEArt of it . is the lOwest octave .
     g. clasped hands beat downwards
     h, i, j. further clasped hand beats
Figure 2a/b: Octave balance.
The speaker makes graphic use of gestures, as well, and they seem to illustrate a particular conception of the musical relationships involved: her hands are clasped (Figure 2a) as she talks about different instruments playing the same melody, and then held apart with a vertical interval between them (Figure 2b) as she talks about playing in “different octaves.” The spatial representation of an octave, perhaps modeled on the graphic representation of standard musical notation in the score, is clearer still as she repeats her point about octave balance (in fragment 2): “if you’re playing in octaves, the lowest line will lead it.” As she says “octave” in the phrase “the lowest part of an octave,” her hands seem to depict four quick steps (Figure 3), again suggesting how the eight notes of an octave are separated by four lines on the stave. The gestures, that is, seem to give evidence about a mental representation of a musical relationship, the ‘octave’ named in words and in gesture modeled apparently on a traditional graphic or
visual representation (see Figure 4, which shows the opening bars of this Mozart quartet, the violins and viola in unison, and the cello in a lower octave). With no explicit evidence (I did not actually debrief the musicians after the filming), one might speculate that this highly trained musical expert understands the concept of ‘octave balance’ in ways similarly complementary to the different ways she expresses it: as a propositional statement of relationships between named concepts, as a visual image, and presumably also as a musical relationship expressed in sound, if not as well in her embodied experience as a performer. Gestural and verbal channels capture complementary, though interlocked, aspects of such a multimodal gestalt.

3. Because of the complexity of the illustrative materials, I have transcribed the video with the following conventions. Each line of text (and sometimes music played) is shown in Courier type in numbered lines. Above these lines, synchronized with the accompanying words, are small letters indicating some phase of bodily action, which is then described in words, in sans serif type, in lines keyed to the letters that follow the transcribed speech. I have also illustrated some actions in figures. Video clips of the relevant segments will be available online.
(2) “Octaves”

     a....
17   the lowest part .
     b.....c..d..e...
18   o-of . an octave
     .............f...........g
19   if you’re playing in octaves
     ......h..........i.................j..
20   or unison . the lOwest line will lead it
     a. both hands up with palms up, splayed cupped fingers
     b. LH under facing up, RH down over
     c-d-e. RH moves out in 3 short steps (‘scale, octaves’, see Figure 3)
     f. hands held horizontally to show octave interval
     g. slight shake
     h. RH, above, moves slightly forward (‘unison’)
     i. one beat, and RH draws back to grasp LH (‘lowest’)
     j. slight shake on ‘lead’
Figure 3: “an octave”
Figure 4: Octave balance.
In the ensuing talk, still about the relative balance between the instruments, the tight semantic coordination between word and gesture changes to what seems another characteristic pattern. Speakers frequently encode in gesture aspects of their ‘messages’ that find little or only partial expression in speech. The first violinist, in this case, takes up the question of how different instruments must assume responsibilities as the balance between parts changes. Just before he says to the student violist that she must “play more,” he demonstrates, in gesture, how she must play “more”: by playing more strongly, more loudly.
(3) Fist

        .......a.......b c....d
41  but you need to play . much much more
    a. fingers retract to a fist, shaken out once
    b. and twice, then held
    c. then shaken out again
    d. and again lower, and held
Figure 5: “Play much much more”
His fist, formed exactly when he says, “play,” (Figure 5) seems to fill out his words with unspoken gestural imagery. (And it is a corporeal image, a kind of proto-emblem, which he uses again—see Figure 27, below.) More complex gestural semiosis is evident as he continues his exhortation to the violist, whose playing he apparently has found overly timid. Gesture has indexical immediacy largely denied to words (one reason spoken deictics often receive gestural ‘supplementation’—“give me that!” with an accompanying pointing gesture). Therefore, it is unsurprising that as he speaks to the violist he also gestures toward her, first with an open hand and extended fingers at (b) in example (4), line 30. (See Figure 6.) He indexes his co-present interlocutor to identify her with the hypothetical violist in his spoken scenario. As he repairs his utterance in line 31, he again points to the violist, gesturally projecting the abstract ‘viola’ he mentions in words onto the student viola player he indexes in the interactive environment. His gestural deixis is what we might call ‘semitransposed’ as it coordinates two quite distinct referential planes: the viola part in the abstract or Platonic quartet and the physically co-present performer.
(4) Indirect deixis

        a........b......
30  then the violist. . .
        ..c...............d.......e
31  the viola becomes the bass line=
    a. LH rises
    b. points with open palm out to viola player, smile
    c. 2nd point to viola, smiling
    d. hand sweeps down to left and low
    e. retracts to adjust glasses
Figure 6: “The violist…”
5. Mime, Song, and Score: “Little Accents”
Fundamentally different semiotic modes are evident in another part of the class, when the second violinist begins to talk about rhythm and accent in the students’ Mozart performance, citing a passage of which part appears in Figure 7. Here the 2nd violin and viola play little off-beat eighth notes against the cello’s bass foundation on the downbeat. The master teachers want to inject a bit of life here, since the students have tended to play their parts as mere accompaniment to the 1st violin’s melody. Each of the instrumentalists suggests ways his or her student counterpart can achieve the desired effect, and the professional 2nd violinist combines mime, gesture, vocalization, and musical notation.
Figure 7: Score for “little accents”
First he mimes the kind of playing that he does NOT want, by ‘playing’ the 2nd violin’s eighth notes in the air—no instrument, just hand positions and arm movements—while at the same time pretending to look around in a bored and distracted way, as if paying little attention to the music. (See Figure 8a.)[4]
[4] The afternoon sun was shining in the window, which accounts for the white blob across his face in the picture, for which I as cameraman apologize.
Figure 8a/b/c: Mimed distracted playing, little accents.
Instead, he suggests, the ‘accompaniment’ part is very important. He provides further images to show how it ought to be played. (See transcript 5.) First he combines the verbal expression “little accents” with a gesture (at lines 7 & 8—see Figure 8b) that appears to capture the standard graphic representation for accents in musical notation: little dots written above each note on the score.
(5) “Little accents”

        a....  .b....c..
7   imagine that there are .
    a. RH hand up, fingers bunched
    b. pushes out once to front
    c. and quickly pushes a second time
        a........b..........c..d..e……
8   little . accents on every one of those notes
    a. RH with bunched fingers darts out once
    b. and again
    c,d,e. and quickly out in 3 stages as shown
        a…. . . . . . . . . . . . .
9   p p p p p p p . p p p p p p p
    a. little peck with the fingers on each vocalization in each group of 7, constantly moving farther forward
He goes on (in line 9) to demonstrate how the result would sound, producing a tiny vocalization for each of the notes, i.e., half singing a couple of sample measures, while at the same time illustrating the accents with a further thrust of his bunched fingers. (See Figure 8c.) He thus combines several radically different but complementary modes of signification: the words (“accents on each note”), an embodied mimed performance, the graphical musical notation, both indexed and symbolized in gesture, and a spoken simulacrum of the resulting musical performance. Such ‘multimodal’ representation turns out to be a central device in the virtuoso teaching repertoire of these master musicians.
6. The Body and the Instrument
Since the social setting of the class involves a range of different kinds of participants, utterances are presumably designed in some sense for all of them—from the active musicians to the observing teachers and students (and perhaps for me, the filming ethnographer). The composer is, as I have commented, virtually co-present as well, embodied in the score and its associated lore. The musical instruments themselves are also participants, with their own expressive resources and interactive virtues. Because the instruments are operated by the playing bodies of the musicians, the body and its techniques are prominent in the master class. Indeed, the body itself provides a repertoire of expressive resources which are variously incorporated into utterances. The instruments, too, have parts and associated techniques which can be emancipated or ritualized in an ethological sense—dissociated from actual playing and turned into signs. Finally, since music is sound, sonic surrogates can also be incorporated into the master musician’s expressive arsenal. I turn now to an extended examination of a part of the class that artfully combines these multiple semiotic affordances. We can see them in action in the ‘demonstration’ that accompanies what we might call a ‘metapragmatic presentational’ (Lucy, 1993) by the master viola player. Unable to restrain himself after the student quartet’s performance, he jumps up, instrument in hand, and begins a remarkable teaching sequence that combines spoken explanation, demonstration playing, co-playing, mime, song, gesture of various kinds, and even physical manipulation of the score, the students’ bodies, and their instruments. He starts by contrasting how the students are playing the opening bars of the Mozart quartet with how he thinks they should go. (Refer again to Figure 4.) He uses two variants of the standard American speech verb ‘to go’ or ‘to be all’ to contrast what the students are, “doing” (see example 6, lines 3-4)—which he vocalizes, with a few illustrative beats of his hands, and whose rhythm he characterizes in words (lines 5-6)—showing how instead, “it should be all…,” with an accompanying demonstration (lines 7-8) that involves exaggerated singing.
(6) “It should be all …”

        a. . . . .
3   you guys are doing
    a. RH held palm inward, fingers vibrate, rotate (Figure 9a)
Figure 9a/b/c: “You guys are doing….”
        b c d ....
4   da:h . ba ba bum pa ba pa pa
    b. RH slightly away from body, still
    c, d etc. RH and LH with fiddle beat down sharply (Figure 9b)
The violist uses his body much the way an orchestral conductor might (see Braem & Braem 1998; and below) to suggest both dynamics and rhythm in the performance he is representing vocally.
        a..........b.....c
5   it sounds like four- mm-
    a. RH starts up, index finger extends
    b. to highest position at R shoulder
    c. slashes down and in
        d.....e................f
6   even louder . than the downbeat
    d. RH points to violist
    e. beats downward once (Figure 9c)
    f. hand dropped out to right, with palm up
He explains that the students seem to have emphasized the downward arpeggio at the end of the measure more than the strong initial note at its beginning, illustrating ‘downbeat’ with a downward pointing gesture (and perhaps also affiliating himself with the student viola player by pointing at her).

        a.......b........c
7   and it should be all
    a. RH curls in to body,
    b. palm facing in
    c. and vibrates slightly as head shakes side to side
        a....b.... ............c
8   ta:m ba pa pa pa pa pa bum
    a. RH out, fingers curled in
    b. RH snakes out beat by beat, as he leans forward, singing
    c. RH sweeps up at end of phrase
Figure 10: “It should be all …”
Finally, in his demonstration of how the passage should go, he illustrates in dramatic singing the emphasis and dynamics he has in mind. He further inflects this vocal performance with a reaching gesture—a kind of diagrammatic sweep of the arm (see Figure 10). The hand and arm together index each note in a progression of steps outward from his body, and also symbolically echo the expressive attitudes of an operatic singer.[5] (Note the little flourish of his hand in the last frame of Figure 10.) He thus concatenates a partial verbal characterization of the music onto a virtual performance of the passage, transposed, as it were, from one musical idiom (string quartet playing) to another (song).

[5] At another point in this striking sequence he explicitly likens the way the passage should be played to how Pavarotti might sing it.
7. The Syntax(es) of Multimodality
The Euclid viola player was a true virtuoso of the multimodal sign. One of the most striking features of his utterances as he teaches is the nearly seamless flow between one modality (or combination of modalities) and another. His performance also blurs the boundaries between some of the standard categories in gestural typology—an issue to which I return at the end of this essay. For example, the possibility of integrating real musical performance into utterance conjures a phenomenon akin to the Geertzian wink: what distinguishes ‘real’ playing from, say, exaggerated playing, or practice playing—rehearsing or ‘trying out’—and then again from mimed playing (which shares some aspects—more or less exact body movements, for example—with the real thing) or gestures which in more or less stylized ways mirror playing? This master teacher combines all of these and more. Consider the following complex sequence, which involves diverse interactions between the musician’s body, his instrument, and the musical score. The problem at hand is exactly how to organize the use of the bow—always an issue in string technique—in the initial Mozart passage. All four instruments are playing in unison, here, and so the teacher is trying to work out a common bowing solution. To ‘work out’ involves actually experimenting with the instrument, so he begins with the exact notes to be played, read off the score. He then ‘exhibits
thinking’ (with eyes turned upward—see Figure 12) as he simultaneously seems to imagine physically what the bow motion he proposes would feel like. (Note that he holds the real bow in his right hand and moves it against the outstretched index finger of his left, which represents a virtual viola string.) He imagines first a downbow motion (from the ‘frog’ or bottom end of the bow where he holds it and moving it downward toward the tip) at lines 12 and 13 (where he mimics the same downbow motion in gesture at a), and then, when he lifts the instrument to his chin to play, he imagines instead the opposite upbow movement (as he says at line 14, having already placed the tip of his bow on the viola at c). This is how he starts when he plays at line 15: a long upbow for the first piano measure, and then a strong downbow for the first forte note of the 2nd measure. (Figure 11 shows just the first violin part.)
Figure 11: Mozart opening (1st violin part only).
(7) Explaining through trying

        a.
11  I would suggest .
    a. turns head to left, looks down to score
12  try down-

Figure 12: “Try…”

        a...
13  try- . .
        ....b .....................c
14  try starting out on . upbow
    a. RH with bow starts out on downbow motion
    b. lifts instrument to chin
    c. moves bow to tip for upbow (Figure 13a)
15  ((plays from music))
Figure 13a/b/c: “Upbow…”
Apparently satisfied with the result, he now repeats the motion of the upbow, further qualifying it in words at line 16 (“very light”) and producing a light inbreath through pursed lips, simulating both the, “light,” sound and perhaps also the anticipatory tension of the note via the inbreath.[6]

        a.........b.... c
16  very light . on the upbow
    a. drops instrument from chin
    b. begins upbow motion with bowhand, looking down at hand (Figure 13b)
    c. whistling mouth

[6] At another point he suggests actually using an inbreath on a silent downbeat to help the energy of playing a subsequent offbeat note.
He now repeats the performance, first miming the bowing he wants (at 17 a-b), and then playing it while first humming (17 d) and then saying, “here,” (18 a) at the transition to the strong downbow in the second measure (18 b).

        a.......b.. c......... d…
17  almost like . seamless on the mm..
    a. bows up
    b. bows down (without playing)
    c. lifts instrument to chin and
    d. plays upbow ((Plays))
        a........b
18  he:re ((playing))
    a. plays upbow (Figure 13c)
    b. starts strong downbow playing phrase
Once more he plays the phrase, with the desired bowing and dynamics (line 19), and then he switches modalities: he passes the bow swiftly to his left hand, and uses the empty right hand first to show a bunched fist (“strong”? at line 20 a—see Figure 14), then to mimic the downbow motion—but without actually holding the bow, thus a kind of stylized mime—at 20b, and finally anticipating the following
series of short up and down bows for the sixteenth notes in measure two with a small movement of his hand (20c).

        a.....b
19  and then ((plays))
    a. rapid upbow to get in position
    b. strong downbow, playing
        a.........b.............c
20  very- very- .... strong the downbow
    a. frees RH from bow (now held in LH), shakes hand with upward cupped fingers (Figure 14)
    b. drops RH to low position
    c. mimics wrist movement of short upbow
Figure 14: “Very strong…”
He now turns his full attention to the second measure, singing the notes again and miming the bowing motion he has almost experimentally proposed: a long hard downbow for the first long note (21e), and then a single smooth upbow for the sixteenth notes he sings (at 21f).

        e......f........... g
21  taaah: di da da da and- .
    e. long down bow motion w. RH (Figure 15a.)
    f. smooth upbow motion w. RH
    g. RH stops movement, lifts palm out, fingers out
Figure 15a/b/c: “Taa di da…”
Here he encounters another problem, namely the transition between the long and loud initial note of the 2nd measure, and the quick run of sixteenth notes that follows. He wants this transition to be smoother than what he has heard in the student performance, which he goes on to mimic in his ‘conductor’ whole body style (line 23), showing the unwanted slight break between long note and short notes (see Figure 15c, and line 23b).

        a......b......c......d....e
22  very . sh:ort not so much break
    a-b. two downward strokes (Figure 15b.)
    c-e. small downward beats with RH
        a....b....c......d
23  not ta:m hhh. di ta ta ta ta ta
    a. small downbow
    b. whole body shifts, both hands up (Figure 15c.)
    c. RH beats downward with notes
    d. suddenly turns head to L to consult score
Here he seeks experiential confirmation, again turning to the score to play more of the passage even as he continues to talk (24b).

        a b...
24  actually, hold on, let me se(ee:)
    a. lifts instrument to chin, looking at score (Figure 16.)
    b. starts to sing as he enunciates ‘see’
25  ((Plays))
Figure 16: “Actually, hold on…”
It is in the sequence that follows (example 8) that the line not just between real and mimed playing, but also between mime and gesture begins to blur. The viola player has played the passage for himself and decided that the first measure should be played with a light upbow, followed by a strong downbow for the beginning of the second measure; but instead of drawing the bow all the way to the tip he wants the students to save enough bow to be able to play the run of short 16th notes right in the middle of the bow, where they have greater control and strength. After playing the two long notes again (line 28) he turns the bow into a diagram of itself: he points to where he wants the students to move on the bow— “all the way to the frog,” on the upbow (29b and again 30a-b, see Figure 17).
(8) Mime versus gesture

28  ((Plays first measure and a half on upbow and downbow))
        a....................b
29  try to go all the way-
    a. swiftly drops instrument
    b. with index finger of LH (holding viola) points to frog of bow in RH
        a..........b...............
30  if you could . try to go all the way to the frog
    a. touches bow low with LH index finger
    b. draws finger from mid bow down the bow toward frog
Figure 17: “If you could try to go all the way to the frog.”
Now he does something semiotically more complex. He mimes the downbow motion against an outstretched finger—again a virtual viola string—and enjoins the students to, “save it,” i.e., not use the whole bow length on the strong downbow note (Figure 18). This is, “so you can…” (line 31c)—but what they ‘can’ do is neither played nor stated, but demonstrated with a sung line (32) and a simultaneous gestured demonstration (Figure 19) that involves the bow as a symbol of itself, moving against a gestured virtual string (line 32a-c).

        a........b.................c
31  and .. save it so you can .
    a. LH index finger marks spot like fiddle
    b. RH slowly draws bow down against L index finger
    c. quick movement with bow hand, quickly back to middle of bow against index finger
Figure 18: “And save it…”
        a...b..c....
32  da ta ta ta ta ta tay
    a-etc. mimes quick up and down bowing
Figure 19: (Mimes bowing).
The bowing solution is now conceptually complete, but it remains for this master teacher to try to implement it with his students. He wants them to try it out, and in the process he wants both to refine the solution and to justify it. When the students’ first attempt fails (because they still end up too high on the bow for the 16th notes), he steps in (example 9) to offer a slight modification: start the upbow not at the tip but only midway up the bow. (See Figure 20a.)

(9) The treachery of the bow

        a.........b
51  well . maybe from here
    a. moving bow up to position
    b. holds it at mid bow (Figure 20a.)
52  ((plays))
53  so you have-
        a.....
54  you know
        b......c
55  less possibility to be:
    a. turns gaze rapidly to 1st violinist
    b. looks at bow moving to viola
    c. positions bow at extreme tip (Figure 20b.)

Figure 20a/b/c: Awkwardness at the tip of the bow.

56  ((plays badly at the tip))
        a..
57  (to get)- in trouble at the tip ok?
    a. rapidly drops fiddle from chin
He now has recourse to two further ‘multimodal’ resources. One is a different kind of demonstration: how not to play. Thus at line 55c (Figure 20c) he positions his own bow on the viola at the extreme tip and demonstrates awkward playing of the 16th notes from that position (line 56). He goes on to demonstrate, in ever more stylized ways, the correct bowing again: first the upbow (at 59a, with his bow in his hand but not playing) and then the downbow (now without the bow, just moving his hand, at line 59c).
58  so- ..
        a..... b...c.....
59  make sure you- you travel and uh- .
    a. moves RH out and up as if bowing
    b. puts bow in LH
    c. moves RH like smooth downbow
Figure 21: “I would start here.”
Finally, he actually picks up the 2nd violinist’s bow, at 61a, even as she holds it (see Figure 21), and moves it to exactly where he thinks she should start.

        a...
60  maybe-
        a..........b......c
61  I would start . even . around here .
    a. reaches out and takes 2nd violinist’s bow
    b. moves it up to middle position
    c. and drops it on her string there
The violist offers one final explanation for why he has spent such a long time on bowing. My interest here is the intercalation of word and various kinds of movement, a complex choreography of spoken deictics and gesturally elided grammar. (See example 10.) He returns to the bowing solution: “the reason to do that,” he says (lines 79-80) is, “so that you can…” (line 81), where the complement clause to ‘can’ is supplied by a musical demonstration (line 82).
(10) More complex deixis

        a...b..
79  you do this-
        c..........d.......e
80  the reason to do that . is .
    a. RH holding bow, index finger pointing
    b. second beat down
    c. swift point to self? (Figure 22a.)
    d. body bends down and hand down
    e. RH lifted, index finger up (Figure 22b.)
Figure 22a/b: “The reason to do that”

        a b.............c
81  is . so that you can um- .
    a. points out with RH and bow
    b. retracts, starts to place instrument
    c. under chin (Figure 23a.)
82  ((plays))
Figure 23a/b/c: “So that you can …”
However, he cuts short the performance (at line 83), utters another deictic, “this” (line 84, Figure 23b), then continues to play the phrase. He annotates the played phrase with words (“a comfortable place here,” line 86b, Figure 23c) before finishing it the way he wants to demonstrate in line 87.

83  ((short downbow cut off))
84  this-
        a . . . . . .
85  to be: ((plays down))
    a. starts playing as he finishes saying, “be”
        a. . . .b......................
86  in a comfortable place here
    a. still playing as he talks (Figure 23c.)
    b. holds bow still where he stopped playing
87  ((plays down scale))
Once again he offers a contrast—how NOT to play the phrase—and once again the spoken deictics index a musical demonstration. If you have ended up at the tip of the bow (Figure 24), the sequence of fast notes will be impossible (“it won’t work,” line 91).

        a.....
88  if you’re here
    a. holds bow in place at 3/4 length
89  ((plays))
Figure 24: “If you’re here…”

        a
90  ((plays short notes))
    a. shakes head
        a b
91  it won’t work . .
    a-b. shakes head side to side
8. Conducting
The body, the instrument, the voice, and the words of these musicians all combine to do the complex semiotic work required in a musical master class. There is little doubt that these somewhat stylized communicative skills are the product of years of musical training that involves both an intimate bodily connection with one’s instrument and an immersion in techniques of listening to and producing sound, of talking and hearing about music, and of playing and otherwise experiencing it. Some of these techniques are shared in a musical tradition (a ‘culture’)—for example, many are shared with orchestral conductors—and others are individual and idiosyncratic. When the students end the class with a second run through the first movement of the Borodin String Quartet #2, each master teacher displays seemingly characteristic styles of ‘leading’ or gesturally commenting on the performance. The 2nd violinist has already characterized this movement as,
“almost literally musical fireworks,” using hand gestures (fireworks exploding, see Figure 25a) to illustrate his metaphor, and never touching his instrument. As the students launch into the Borodin, he continues to use the same hand gestures to try to breathe some fire into their performance (Figure 25b/c).
Figure 25a/b/c: “Musical fireworks”: The second violinist.
The cellist tended in her comments to work from the score, singing along and conducting with her bow, sometimes demonstrating on her instrument—and this is precisely what she does when the students play (Figure 26).
Figure 26: The cellist and the score.
The first violinist concentrates on force and rhythm, using ‘strong’ gestures, pounding fists, and clapping hands (Figure 27).
Figure 27: The first violinist.
Finally, the violist is, as we have seen, a highly ‘embodied’ teacher. He mimes along with the student musicians, emulating and conducting their bowing (Figure 28).
Figure 28: Bowing.
When the Borodin starts, he is again unable to restrain himself, pumping his hand to pull the students into a stronger rhythm, then jumping from his chair, viola in hand, to play along (Figure 29).
Figure 29: The violist jumps up.
9. Conclusion: Dubious Dichotomies
I began with master musicians and master speakers, and a general curiosity about what makes masters masterful. Thinking that part of mastery might be located in skillful marshalling of complementary expressive modes and resources, and inspired by the ‘growth point’ metaphor, explicitly extended to coordinated multi-person collaborative interaction, I presented fragments of a string quartet master class. These gradually introduce ever more intricate combinations of expressive resources, from talk to singing, from playing to gesturing. They demonstrate the performative equivalent of ‘intertextuality’, except that here the interrelated ‘texts’ range from verbalizations to full musical performances, from musical technique and tradition to musical scores. The master musicians also give new meaning to the oft-used notion of ‘embodiment’ since musicians’ bodies (and their instruments) become at once vehicles of performance and meta-performance,
means for making music and for communicating about music, again in a characteristic and theoretically instructive way. Let me summarize what I take to be the main lessons by taking a few potshots at some frequently used but dubious theoretical distinctions in the analysis of discourse, especially by those who pay scant attention to embodied interaction. If we look carefully at events like the Master Class, many of the facile dichotomies that are often employed in analyzing (talk in) interaction begin to lose their appeal. Here are some of my favorite targets, and I hope the material I have presented will illustrate at least some of the reasons why.

Gesture versus speech

There are simply too many ‘modes’ of signaling available to the participants in a master class for such a simple opposition to have much purchase. Talk easily fades into singing, and singing into humming. Playing moves to aped playing, or mimed playing, or movement that suggests playing, or a stylized movement that recalls (thus symbolizes) a movement introduced to suggest playing. Normal typologies of gesture lose their discrete categories, and the supposed hierarchical orderings between them become muddled. What is ‘tied to verbal utterance’ or ‘language-like’ or ‘conventionalized’ or indexical of speech content vs. speech rhythm becomes increasingly hard to decide. Similarly, the criterion of interdependence between speech and gesture becomes confused: in material we have seen, a speaker can substitute a played passage, or a gestured performance, for whole clauses; yet such movement sequences can hardly be counted as emblems or “quotable.”

‘Literal’ versus some other sort(s) of meaning

In linguistic semantics one often assumes, as a kind of methodological scimitar, that lexical items come with ‘literal’ or ‘basic’ or ‘nuclear’ meanings, which may be pushed out of shape, extended, distorted, even reversed on ‘occasions of use’. Useful as such a principle of parsimony may be, it is hard to enforce ‘in the field’, where eliminating the contextual pushings and shovings on any single expression (to find the underlying commonality of literal meaning) may be very hard to do. This is one of the problems of situated observation, and it is one of the reasons psychologists despair of ever learning anything scientific from ethnographers. The expressive vocabulary of these master teachers seems to rely less on a prefabricated lexicon of ‘literal meanings’ and more on malleable techniques for pulling concepts (think of ‘octave balance’) from their natural musical homes into expressive domains that are of a different, non-musical order.

Monads versus interactants

Who are these people we are trying to understand, anyway? Who are the participants, even in the Master Class? There are eight string players immediately involved, along with teachers, other students, and observers. But none of these comes in discrete units, even though their bodies may look that way. They interact
in different conglomerates; they have identities that shift and realign themselves; and there are invisible participants (themselves also not monadic)—the composers, the patrons who commissioned the works, among others. Though I have made rather little of it here, there is a parallel between the teaching of this quartet of master musicians and a ‘co-narration’—because when there are co-narrators, who is ‘the narrator’? The teachers here, trying to produce a quartet from 4 instrumentalists, mimic their music in their teaching, each playing his or her own part and trying to produce something that goes beyond any single part. I suspect that all interaction is a little like that.

Cognition versus embodiment

Are representations in the mind or in the body? These musicians seem to produce their ideas as much with their bodies as with any other cognitive organs. In such a case separating the ideational from the embodied begins to seem not only more than usually problematic methodologically, but also analytically unattractive if not untenable.

Mental image versus emergent unfolding of expression

Finally, observing the Master Class raises a slightly deeper problem: not just whether one has ideas in the mind or, as it were, in the body, but whether one has ideas at all, or whether they somehow ‘emerge’ in the course of semi-planned or extemporized socially contextualized interactive utterance. My late colleague Derek Freeman once remarked that the film he most wanted to see was, “of someone changing his mind.” Although the master violist may not have actually ‘changed his mind’ about anything in the course of the fragments I have presented, he has, at least, appeared to change expressive modal horses in midstream. In the master class we seem to see ‘emergent cognitive unfolding’ all the time, as the teachers search around for ways to give what are evidently sometimes inchoate impressions of the students’ performances, or newly generated ideas and suggestions about how to improve them, an expressive form in the very moment of expression. The resources for doing so are inherently ‘multimodal’, here, and I suspect, in general, as different expressive resources suggest or present themselves on the fly. The virtue of looking at master musicians, as well as at master speakers, is that we cannot easily idealize this multimodality away.

References

Braem, P. B., & T. Braem (1998). Expressive gestures used by classical orchestra conductors. In C. Müller & R. Posner (Eds.), The semantics and pragmatics of everyday gestures (pp. 127-144). Berlin: Weidler Buchverlag.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Haviland, J. B. (1967). Vob: traditional instrumental music in Zinacantan. Unpublished manuscript, Harvard Chiapas Project, Harvard University, Department of Social Relations.
Lucy, J. A. (1993). Metapragmatic presentationals: reporting speech with quotatives in Yucatec Maya. In J. A. Lucy (Ed.), Reflexive Language: reported speech and metapragmatics (pp. 91-125). Cambridge: Cambridge University Press.
Constructing Spatial Conceptualizations from Limited Input: Evidence from Norwegian Sign Language*

Scott K. Liddell
Gallaudet University
Marit Vogt-Svendsen
University of Oslo
Natural sign languages contain sets of signs that must be directed toward things as a normal and expected part of their production. The Norwegian Sign Language (NSL) possessive pronoun POSS→x, for example, meaning ‘his’, ‘her’, or ‘its’, must be directed toward the possessor when the possessor is physically present. If not physically present, the possessor will be conceptualized as present. As a result of this conceptualization, POSS→x can meet its grammatical requirement to be directed toward the possessor by being directed toward the conceptualized-as-present possessor. In order to understand the signer’s message the addressee must form spatial conceptualizations like those of the signer. However, the information signers provide addressees to guide them in the construction of spatial conceptualizations is sometimes quite limited. We explore how minimal clues can lead to the construction of elaborate spatial conceptualizations and how smoothly and easily the signer moves from one spatial representation to the next.
1. Introduction
William Stokoe was the first to argue that American Sign Language was a real language, equivalent structurally and expressively to vocally produced languages. Prior to Stokoe (1960) sign languages were regarded as ‘merely gestures’, not equivalent to, and more ‘primitive’ than, vocally produced languages. In the 1970s linguists continued arguing for the status of sign languages as real languages. The examination of numerous sign languages revealed that their grammars show great structural and conceptual similarities to the grammars of vocally produced languages. For example, both spoken and signed languages have extensive lexical inventories. Both types of languages have grammatical mechanisms for creating morphologically complex words. Both have grammatical means of combining words/signs into larger syntactic constructions. The
* We give our thanks to Hege Lønning, Lise Marie Nyberg, Tommy Riise, Odd-Inge Schröder, and Line Stenseth for helpful discussions of the video narrative we examine in this paper and to Kari-Anne Selvik for her comments on an earlier draft of the chapter.
acquisition of both types of languages follows nearly identical timetables. It appears that, in general, Deaf children acquiring a signed language produce their first signs earlier than hearing children produce their first words. This may be due to the different types of muscular control needed to control the hands and arms as opposed to the muscles in the vocal tract. Except for this slightly earlier start for signing children, the acquisition of spoken and signed languages is remarkably parallel. Another part of the argument involved showing that signs were words structured by means of an abstract ‘cheremic’ representation equivalent in abstractness and structure to phonological representations in vocally produced languages.[7] That is, in both types of languages lexical units are built from a limited inventory of meaningless (phonological) parts. This argument for the language status of signed languages had the effect of removing the concept of gesture as well as visual imagery from consideration in the analysis of signed languages. In vocally produced languages words are produced in the vocal tract while gestures are produced by other parts of the body.[8] This made it relatively easy to conceptualize vocal language and gesture as two distinct and separate activities.

[7] Stokoe (1960) introduced the term ‘chereme’ as the signed equivalent of the phoneme in vocally produced languages.
[8] Obviously there are vocal gestures as well as gestures of the arms, torso, face, etc. These vocal gestures have also been separated from the analysis of language.

In his ground-breaking work on gesture and vocally produced language David McNeill argues that “gestures are an integral part of language as much as are words, phrases, and sentences – gesture and language are one system” (1992:2, emphasis in the original). McNeill’s claim is that as different as gesture and language may appear to be, careful analysis reveals how strongly they are united into a single conceptual system. At that time, however, he saw no room for gesture in the case of signed languages. Following Kendon (1988), he argues that the sign language signal does not include the types of spontaneous imagistic gestures found in vocally produced languages. The basis of the argument is that the conventionalization of signs eliminates the possibility of spontaneous, imagistic gestures as part of the sign language signal (McNeill, 1992:40). Sign languages present an appearance quite opposite to that of vocally produced languages. Since all the activity of language production is accomplished by the very same articulators, including the face, the hands, and the body, it was easy to see sign languages as composed merely of gestures, as they were prior to Stokoe (1960), or as composed exclusively of signs and morphemes (but no gestures), as they were following the work of a large number of linguists in the 1970s.

Liddell (1994) argues that some categories of lexical signs (e.g., pronouns and indicating verbs) are directed gesturally toward physically present referents. For example, directing the verb meaning ‘give’ toward a specific person identifies that person as the recipient of the giving. Directing the verb meaning
‘tell’ toward a person identifies that person as the recipient of the information communicated by the telling. In both cases, if the person being referred to is physically present, the physically present person provides a target toward which the sign can be directed. While most of the aspects of the production of these signs are grammatically fixed, the direction in which they move is not. The directionality of these signs depends on the locations of those things being talked about. If the referent of a pronoun is to the right of the signer and at a lower level, for example, the pronoun itself will be directed rightward and down toward the physically present referent. Signers also use the space around them in ways that conceptualize non-physically-present referents as if they were present (Liddell 2003). Pronouns and indicating verbs can then be directed toward these conceptualized-as-present entities. These analyses reintroduce both visual imagery and gesture into the analysis of signed languages. Liddell (2003:362) concludes that spontaneous gestures and visual imagery are as important in signed languages as they are in vocally produced languages.

    … the ASL language signal consists of more than conventional linguistic forms. It also includes gradient aspects of the signal (typically directional aspects or placement), and gestures of various types. All of these coordinated and integrated activities constitute the language signal and contribute to expressing the conceptual structure underlying the utterance.
Thus, while McNeill and Liddell are both dealing with the issue of gesture and language, McNeill was arguing for the unification of these two obviously physically different aspects of vocally produced language while Liddell was arguing in the case of signed languages for the need to make distinctions between grammar and gesture where the physical distinctions in the signal were not obvious. Crucial evidence for these gestural aspects of the sign language signal comes from the behavior of signs with respect to physically present entities and spatial conceptualizations in which entities are conceptualized as present. In this chapter we explore how signers can provide addressees with minimal information that nevertheless allows them to create the appropriate spatial conceptualizations necessary for understanding signed discourse.
2. Real Space Blends
Fauconnier (1985) introduces the concept of a mental space into the analysis of meaning in language. Mental spaces are conceptual structures containing representations of the things we talk about as we speak. A speaker or signer produces grammatical forms whose meanings are intended to be associated with mental spaces and mental space elements in the process of meaning construction (Fauconnier, 1997). One important mental space in the analysis of sign languages is real space, a person’s current conceptualization of the immediate environment (Liddell 1995; 2003).
It is typical of sign language discourse for signers to direct signs at real people, places, or things. If there were a man standing to the right of an NSL signer and the signer wanted to make reference to that man’s sister, this could easily be done with the signs in (1).
(1) POSS→man on right SISTER
    ‘his (the man on the right) sister’
In producing the possessive sign POSS→x (translatable as ‘his’, ‘her’, or ‘its’ depending on the nature of the possessor), a B handshape (all fingers fully extended and not spread apart) with the fingertips oriented up has the palm oriented toward some entity x and also moves outward toward x. The entity x is the possessor of the thing being talked about. In (1) the possessor is the man to the signer’s right. Thus, to correctly produce POSS→x when the possessor is the man to the right, the sign will be directed toward him. The notation POSS→man on right indicates that the sign is directed toward the man on the right. If the entity being talked about is not present, signers will conceptualize the space around them as if the entity were present. Consider, for example, the actions of the ASL signer in Figure 1 who is demonstrating how to slice open a fish by passing her right hand in a B shape between the thumb and fingers of her left hand. There is no knife and no fish present, so the signer conceptualizes the space around her as if a knife and fish were present.
Figure 1: An ASL demonstration of how to slice open a fish (from Liddell, 2003).
In this context it is easy to conceptualize the left hand as if it were holding a large fish and to conceptualize the right hand as the blade of a knife. The signer is describing an event—a person using a knife to cut open a fish. We can label that mental space as an event space. Her conceptualization of what is real in her current immediate environment is another mental space—real space. Elements from the event space and elements from real space blend together in the demonstration of cutting open a fish. The hand from real space blends with the knife in the event space to produce the knife in the blended space.[9] We use the convention of enclosing references to blended entities within vertical brackets. Thus, we describe the hand-as-knife in Figure 1 as the |knife|. This is a shorthand
way of talking about a conceptual element of a real space blend. It is much simpler to talk about the |knife| than to talk about the “hand-as-knife conceptualized as existing in the space ahead of the signer.” In addition to the |knife|, the blend in Figure 1 contains other easily identifiable elements including a |cook| holding the |fish| with her |hand|. The presence of the |knife|, the |hand|, and the |fish| is not difficult to understand. Evidence for the |cook| can be found in the first picture, where the |cook| is looking at the |fish| as the |knife| begins its movement.[10]

Many of the spatial conceptualizations ahead of the signer that we describe in this paper contain invisible elements. An addressee’s ability to properly conceptualize these invisible elements is crucial in order to understand what the signer is expressing. Because spatial conceptualizations are so important, signers must provide addressees with information helpful in constructing them so that addressees will know how the signer has conceptualized the space around her. Often signers are very explicit in providing information to the addressee with respect to how space has been conceptualized. Sometimes, however, the information that signers provide about spatial conceptualizations is quite limited. Regardless of whether the information is highly explicit or very limited, an addressee must properly conceptualize space in order to understand the signed message. The problem that addressees face in general, and we as analysts face in particular, is that spatial conceptualizations such as the |fish| in Figure 1 are completely invisible. Since we cannot see them, we are dependent on clues provided by the signer. In the narrative we examine in this chapter, the signer provides evidence for the spatial conceptualizations she uses by the way her signs are directed and placed, as well as how she directs her eye gaze and controls her posture. We use that evidence in an attempt to create spatial conceptualizations compatible with the signer’s own spatial conceptualizations. Even though the six spatial conceptualizations we discuss below are consistent with the directionality and placement of the signs in the narrative, it is likely that the conceptualizations any individual addressee will create will depend on knowledge of the subject being discussed (i.e., computers), the addressee’s personal experience, and so on. For example, an addressee used to working with large screen computers might be more likely to conceptualize a large computer screen in the space ahead of her. An addressee used to working with a laptop computer might be more likely to conceptualize the screen of a laptop computer. The spaces we describe below are based on our experience and background with computers.
[9] Fauconnier and Turner (1996) introduced the concept of mental space blending. See Liddell (2003) for an extended discussion of real space and real space blending.
[10] Readers not familiar with mental space theory or mental space blending can get much more detailed descriptions of these conceptual processes in Fauconnier (1997), Fauconnier & Turner (1996), and Liddell (2003). For more detailed descriptions of real space and real space blends used in ASL discourse see Liddell (2003).
3. Constructing Real Space Blends in the NSL narrative

3.1 Real space blend 1
For approximately five minutes two female Deaf signers of Norwegian Sign Language have been discussing the rapid development of technology and the use of computers in employment and education. The signer’s interlocutor has just mentioned that the computer makes it possible to edit a letter so that it can be written in grammatically perfect Norwegian. The signer responds with the statement in (2).[11]
(2) PLUS ALSO OCCASIONALLY COME SO SQUIGGLY-LINE-EXTEND-TO↓L1-L2
    Plus, sometimes a squiggly line also appears.

[11] Throughout this paper we tentatively use the same notational symbols found in Liddell 2003. That is, we use the downward arrow as in SQUIGGLY-LINE-EXTEND-TO↓L1-L2 for signs that appear to be placed in a depicting space and a rightward arrow as in POSS→man on right SISTER (‘his sister’) for signs that appear to be directed toward things. We have to be tentative in this way since a thorough analysis of these NSL signs has not been undertaken.
Of the six signs in (2), SQUIGGLY-LINE-EXTEND-TO↓L1-L2 is the only one that is meaningfully placed in the space ahead of the signer. Figure 2 shows the beginning of the sign with the signer’s fingertip ahead of her nose at location L1. Her hand moves to her right from L1 to L2 along the path shown by the black line. As her hand moves along the rightward path her index finger repeatedly flexes at the tip. In addition, her face and gaze are directed forward and her eyes are narrowed as if looking at something.
Figure 2: The path of the finger from L1 to L2 in producing SQUIGGLY-LINE-EXTEND-TO↓L1-L2
At the conclusion of this sign her interlocutor responds, and both briefly sign at the same time. Their overlapping signing is shown in (3).
(3) Signer:       MEANS WRONG SPELL OR
                  It means it is spelled wrong or
    Interlocutor: YES GIVE-MESSAGE SQUIGGLY-LINE-EXTEND-TO↓L1′-L2′
                  Yes, it gives the message of a squiggly line.
The signer then stops while the interlocutor continues, explaining how annoying it is to type something, have the computer mark it as wrong, and not be able to find the reason it is marked wrong. Then the interlocutor adopts a posture and head position as if she were gazing at something ahead of her and makes two short horizontal movements with her index finger, one below the other, as if running her finger over the text on a computer monitor ahead of her. Immediately afterward the signer does something very similar by producing the three horizontal lines illustrated in Figure 3. The three horizontal movements of the signer’s index finger in Figure 3 are all lower than the initial location of the hand when she produced SQUIGGLY-LINE-EXTEND-TO↓L1-L2.
Figure 3: Three horizontal movements of the index finger at three different heights.
The addressee ends her turn with a reference to looking and looking for the source of the error. The signer then continues where she left off in (2) with the statement in (4).
(4) MEANS THERE→|program| PROGRAM THERE→|program| HAVE^NOT THERE→|word| [WORD]→|word| IN POSS→|computer| WORD^LIST[12]
    … it means that the program doesn’t have the word in its word list.

[12] Placing square brackets around non-directional signs used in a directional way is another convention tentatively adopted in this chapter. That is, the notation [WORD]→|word| implies that the sign WORD is non-directional, but in this case is nevertheless being directed toward the conceptual |word| in the real space blend.
In (4) the signer produces five directed signs. We interpret this directionality as evidence that the signer has conceptualized entities in the space ahead of her and is directing these signs toward them. Following Liddell (2003), we use superscript arrows at the end of the sign gloss to identify spatially directed or placed signs. Each arrow is followed by a label for the entity or location the sign is directed toward. Thus, the notation THERE→|program| identifies an instance of the sign THERE→L (where L is a variable location), directed in this case toward the |program|, conceptualized as being in the space ahead of the signer. The two instances of THERE→|program| in the phrase THERE→|program| PROGRAM THERE→|program| are illustrated in Figure 4.[13]
a. THERE→|program| b. THERE→|program| Figure 4: The signer directs two pointing signs at the computer |program|.
A buoy is a weak hand sign held in a stationary configuration as the strong hand produces other signs.[14] The weak hand in Figure 4 is producing a buoy in the form of the index finger directed upward both before and after (but not during) the production of PROGRAM. It is obscured by the right hand in Figure 4a, but can be clearly seen in Figure 4b. It is virtually identical in form, and appears to carry out the same role as the THEME buoy in ASL. In ASL the THEME buoy gives prominence to an important discourse theme (Liddell 2003). The noun phrase THERE→|program| PROGRAM THERE→|program| is referential and we assume that both instances of THERE→|program| are directed toward the computer |program| conceptualized as being in the space ahead of her, thereby identifying the |program| as the referent associated with the noun phrase. Her signing is based on an invisible spatial conceptualization. She produces the first instance of THERE→|program| (Figure 4a) deliberately and clearly over approximately two-tenths of a second (six video frames). The second instance of THERE→|program| follows PROGRAM (Figure 4b) and is much more abbreviated, being produced in approximately one-tenth of a second (three video frames). We interpret these two instances of THERE→|program| as evidence that she has conceptualized the computer |program| further away from her than the |squiggly line|. When referring to the misspelled |word| she directs her signs differently. Figure 5a illustrates that in producing the phrase THERE→|word| [WORD]→|word|, she directs both signs to the same location in space where she previously directed SQUIGGLY-LINE-EXTEND-TO↓L1-L2 (Figure 2).
[13] Similar signs in ASL have been treated as determiners (Hoffmeister 1977; Wilbur 1979; Zimmer & Patschke 1990; MacLaughlin 1997). Such an analysis has not been undertaken for NSL so we are tentatively treating the signs preceding and following PROGRAM as instances of the locative sign THERE→L. The distinction between identifying the signs as THERE→L or DETERMINER→x does not affect the analysis of spatial conceptualizations described here since both categories of signs are directed toward spatial entities. For a fuller discussion of directing signs toward locations and things see Liddell (2003), chapter 6.
[14] See Liddell (2003) for an extensive description of buoys and their properties and Liddell, Vogt-Svendsen & Bergman (in press) for buoys in ASL, Norwegian Sign Language, and Swedish Sign Language.
a. THERE→|word| [WORD]→|word| b. WORD (citation form) Figure 5: The spatially directed signs THERE→|word| [WORD]→|word| and the citation form WORD.
The second picture in Figure 5a shows the signer producing a directed form of WORD. For purposes of comparison Figure 5b illustrates the citation form of WORD. When talking about the misspelled word, she directs [WORD]→|word| toward the same location as SQUIGGLY-LINE-EXTEND-TO↓L1-L2 as shown in Figure 5a. When producing WORD^LIST, she produces the citation form of WORD illustrated in Figure 5b.[15] The two forms of WORD both have the same hand configuration, with the thumb and index fingers extended from the fist in opposition to one another and with the tip of the index finger bent over. While there is also a difference in the palm orientation of the two forms of WORD, the significant difference between the two forms is their placement. The citation form is produced with the thumb initially in contact with the chin while [WORD]→|word| is produced toward the same spatial location as SQUIGGLY-LINE-EXTEND-TO↓L1-L2. There is an additional difference between her two productions of WORD in Figure 5. While producing [WORD]→|word| (Figure 5a) she directs her face and gaze toward the location in space where the sign is produced. While producing the citation form WORD (Figure 5b) she directs her gaze toward the addressee.

[15] We are assuming that WORD is normally produced near the chin as illustrated in Figure 5b. That is, we have added square brackets and a superscript arrow to indicate that this is a displaced form of WORD in that it is produced well ahead of the shoulder rather than near the chin. Since we have not performed an analysis to determine how common the displaced form is, it is also possible that there is a conventional directional noun that could be glossed WORD→x.
3.2 Constructing real space blend 1 (RSB1)
The directionality and placements of the signs in Figures 2-5 are consistent with the presence of a computer |monitor| conceptualized in the space ahead of the signer. The pictures in Figure 6 have the outline of a three-dimensional computer monitor superimposed over the images of the signer.
a. SQUIGGLY-LINE-EXTEND-TO↓L1-L2 b. [WORD]→|word| c. THERE→|program| Figure 6: The outline of the computer |monitor| with a |word| on its |screen| overlaid on three directional signs.
Although the signing does not provide a way of determining the dimensions of the computer |monitor|, the placement of the signs in Figures 2-5 is consistent with the outlined dimensions in Figure 6. In Figure 6a the signer directs SQUIGGLY-LINE-EXTEND-TO↓L1-L2 along the length of the misspelled |word| conceptualized as being on the |screen| ahead of her. The noun [WORD]→|word| in Figure 6b is directed at virtually the same location on the |screen|. The sign THERE→|program| in Figure 6c is directed through the |screen| into the interior of the |monitor|. She explicitly labels this as the computer |program|. That is, she appears to be treating the interior of the |monitor| as the location of the computer |program|. Based on the directional signs we have illustrated, we propose that RSB1 contains an invisible three-dimensional computer |monitor| conceptualized directly ahead of the signer, some |lines of text| on the |screen|, and a misspelled |word| underlined with a |squiggly line|.[16] The misspelled |word| is higher than the other |lines of text|. In spite of the importance of the |lines of text| on the |screen| of the computer |monitor|, the signer does not mention a computer monitor or a screen in her entire description of spell checking. Since the signer does not mention a computer monitor or screen, how is the addressee supposed to come to share this invisible conceptualization ahead of her? We have identified four sources of information that would allow the addressee to create her own RSB1 containing a |monitor| with |lines of text| on a |screen| and a misspelled |word| underlined with a |squiggly line|. The four types of information are shared knowledge of the world, shared knowledge of the current discourse, shared knowledge of NSL grammar, and the directional signs produced by the signer (see Table 1). Before the signer begins her description of spell checking on the computer, she and her interlocutor share considerable background knowledge. Both have worked with computers and are familiar with the physical characteristics of a computer monitor and the operation of word-processing software. They also both participated in the immediately preceding discourse in which the operation of word-processing software was discussed. In
16
The directional signs could also be consistent with a screen with a hard disk (containing the program) behind the screen. We prefer the conceptualization described in the text since the existence of a hard disk behind the screen does not reflect our experience with computers.
CONSTRUCTING SPATIAL CONCEPTUALIZATION 183
addition, as signers of NSL, they also share the knowledge that certain types of signs are directed toward things—present or conceptualized as present. Table 1: Information from which to construct the conceptual computer monitor. Shared knowledge of the world Computer monitors are large, heavy, and three(at that time) dimensional objects one sits in front of to operate a computer. Words appear on the monitor screen as one types on a keyboard. Misspelled words are identified as misspelled by the computer by means of a squiggly red line underneath the word. Shared knowledge of current discourse Computers allow editing of documents so that all mistakes can be fixed, resulting in perfect Norwegian text. Shared knowledge of NSL grammar Specific categories of signs are directed at things. Directional signs SQUIGGLY-LINE-EXTEND-TOL1-L2 A |squiggly line| occupies a horizontal breadth in the space ahead of and slightly to the right of the signer’s nose. Her head and eye gaze give the impression that she is looking at the |squiggly line|. There is a |word| in the space ahead of and THERE→|word| [WORD]→|word| slightly to the right of the signer’s nose. The signer’s head and eye gaze are directed toward the |word|. This phrase identifies a computer program. The THERE→|program| PROGRAM directionality of the first instance of THERE→|program| THERE→|program| locates the |program| both lower than and past the location of the misspelled |word|. During this sign her gaze is also directed toward that location. The directionality of the second instance of THERE→|program| may be less precise due to its rapid abbreviated form.
The interpretation of the sign SQUIGGLY-LINE-EXTEND-TOL1-L2 depends on a real space blend that the addressee must construct based on the four types of information in Table 1. The signer's initial statement that the appearance of squiggly lines means that a word is misspelled makes it clear that the topic is still word processing. But the addressee also knows that certain signs are directed toward things. Specifically, the sign SQUIGGLY-LINE-EXTEND-TOL1-L2 moves horizontally. This horizontal movement depicts the location of a |squiggly line| and its length. In this case it is a line conceptualized as being several inches ahead of and slightly to the right of the signer's nose. The direction of the movement of the signer's hand shows that the line is horizontal and oriented so that the full length of the line faces the signer. Background knowledge includes the fact that such squiggly lines appear on a computer monitor screen underneath misspelled words. As a result, it is reasonable to conclude that not only is a squiggly line being depicted ahead of the signer, but that the squiggly line is located underneath the misspelled |word|. The presence of the |word| at that location is later confirmed when THERE→|word| [WORD]→|word| is directed at the same location. This provides evidence for the presence of the misspelled |word| with a |squiggly line| under its extent.

Evidence for the |monitor| comes in three forms. First, there is only one place where misspelled words appear with squiggly underlining—a computer screen. Second, by making three horizontal movements underneath the misspelled |word|, the signer provides evidence for a vertical surface—the monitor's |screen|. Third, when she makes reference to the spell-checking program, she directs THERE→|program| lower and further ahead of her than the misspelled |word|. This locates the |program| at that place. This suggests the presence of a three-dimensional entity ahead of the signer with a misspelled underlined |word| on a vertical surface containing |lines of text| near the signer and a |program| further away. Thus the vertical surface is the |screen| of a computer monitor and the three-dimensional entity is the |monitor| itself.
3.3 Real space blend 2
Prior to the completion of (4) the signer has already begun to provide evidence for a new real space blend. She begins (4) referring to the |program| in its location in RSB1 – directly ahead of her as in Figure 7a. But she ends (4) with the phrase IN POSS→|computer| WORD^LIST (in its word list). Within that phrase she directs the possessive determiner POSS→|computer| toward the right as illustrated in Figure 7b. We interpret this as evidence that she has created a new real space blend with the |computer| located ahead of her and to the right.
a. PRO→x directed toward the |program| in RSB1 (ahead and central)   b. POSS→x directed toward the |computer| in RSB2 (ahead and to the right)
Figure 7: The locations of the |computer| or |program| in real space blends 1 and 2.
This placement of the |computer| in RSB2 is apparently done in anticipation of her upcoming description of a |computer| comparing |words| on a |screen| on the right with |words| on a |word list| on the left (RSB3). In her next statement she mentions that the computer compares things. In making this statement she directs both the subject pronoun PRO→|computer| and the verb COMPARE→|computer| to the right as illustrated in Figure 8. Directing these two signs in this way confirms that the |computer| is conceptualized on the right. She makes use of RSB2 for less than 5 seconds. While this space is active she maintains her role as narrator talking about the |computer| located ahead of her and to her right.
COMPARE→|computer|   PRO→|computer| (right hand)
It compares things.
Figure 8: Directing the subject and verb toward the |computer| on the right.
3.4 Real space blend 3
In RSB3 she personifies the computer and its actions.17 She mentions that the computer has a word list. Immediately after mentioning the word list she adopts the pointing and gazing posture shown in Figure 9a as she signs GAZE-AT3→|word list|. The left hand is a FLAT-SURFACE buoy—a weak hand sign depicting a flat surface. The buoy can potentially remain in place for an extended period as the strong hand continues to produce signs. Next she comes out of the pointing and gazing posture to sign CURIOUS (Figure 9b), then adopts the pointing and gazing posture again as she signs GAZE-AT3→|list word 1| (Figure 9c).

17 Personification is a frequently seen metaphorical device common to both spoken and signed languages.
a. GAZE-AT3→|word list|   b. CURIOUS   c. GAZE-AT3→|list word 1|
Figure 9: |computer| examines the |word list|.
Arriving at the conceptualization of a personified computer gazing at its own word list is immediate and straightforward for competent signers. It involves both semantic and gestural clues. The semantic information is provided by the immediately preceding compound sign WORD^LIST, the FLAT-SURFACE buoy, the verb GAZE-AT3→y directed toward the FLAT-SURFACE buoy, and the isolated sign CURIOUS.18 After signing WORD^LIST she produces the FLAT-SURFACE buoy on her left at about chin height, palm facing in and fingertips up. She directs GAZE-AT3→y toward the |word list| in the form of a FLAT-SURFACE buoy while simultaneously gazing at it. Signing WORD^LIST immediately prior to producing the FLAT-SURFACE buoy can be taken as an explicit instruction to blend the concept of a word list with the buoy. Thus, an addressee would immediately see the raised left hand as the computer's |word list|.

There is a subtle difference between the signer's posture in Figure 9a,c and the posture in 9b. In 9a and 9c the signer is gazing at the palm of her left hand. In 9b, however, she gazes at her addressee as she signs CURIOUS. In 9a and 9c the signer is in the posture of someone gazing at something. In between these two postures the signer looks at the addressee and signs CURIOUS. This sign can then be understood as a comment relating to the preceding and following postures. Directing one's attention toward something could be understood as an act of curiosity. Therefore, the curious entity is the one gazing at the word list. But this is not the signer. It is the one doing the gazing in RSB3. She has been talking about a computer and the fact that it has a list of words. Thus, CURIOUS is a comment about the computer—it is curious. This comment from narrator to addressee is about RSB3, which both precedes and follows the comment. When she adopts the posture shown in Figure 9c, RSB3 can now be understood as the image of a curious computer gazing at its own word list.

The subsequent behavior of the |computer| is illustrated in the following signs. The pictures in Figure 10 demonstrate a |computer| (in this case a person-as-computer) looking at |words| on a |list| and |words| on a computer |screen|. The signer had earlier stated that the computer compares things. Given this context it is possible to understand the |computer| looking back and forth between |words| on a |word list| and |words| on a |screen| as an act of comparison. If nothing special happens we assume that the comparison has not revealed any problems. Thus, the first four photos in Figure 10 do not reveal any problems as the |computer| compares |words| on the |word list| with |words| on the |screen|. The next row of pictures, however, shows that the |computer| has encountered a problem. After looking at |word 2| on the |screen| on the right, the |computer| checks it against the |word list|. The |computer| first looks for the corresponding |word| on the |word list|, then does something that real computers never do. It moves closer to the |word list| for a closer look.19 The |computer| then makes a short, sharp, downward movement of its face with a slight lowering of the brows. This is a common means that humans use to signal a problem. It appears that the |computer| has determined that |word 2| is not on its |word list| and is reflecting its problematic nature through the movement of the head and facial expression. In the third line of photos the |computer| then looks at |word 2| on the right and comments that it is different (from what is on the |word list|) and that it is wrong. The |computer| then draws a squiggly line underneath |word 2|.

18 The symbol 3 seen in GAZE-AT3→y is used in Liddell (2003) to indicate that the signer must not only direct the sign toward entity y, but must also direct her face and gaze toward y.

19 Obviously, real computers do not use eyes to look at words on a word list or on the screen as the |computer| does in this example. So there are many things about this example that real computers do not do. Taking a closer look, however, stands out as a particularly non-computer-like activity.
GAZE-AT3→|list word 1|  GAZE-AT3→|screen word 1|  GAZE-AT3→|list word 2|  GAZE-AT3→|screen word 2|
FLAT-SURFACEL ----------------------------------------------------------------

GAZE-AT3→|list word 2| --------------------------------------------------------
closer look                    find a problem
FLAT-SURFACEL --------------------------------------------------------

DIFFERENT→|word 2|  PRO→|word 2|  WRONG→|word 2|  SQUIGGLY-LINE-EXTEND-TOL3-L4
FLAT-SURFACEL --------------------------------------------------------------------------------

Figure 10: The |computer| compares |words| on the |word list| on the left with |word 1| and |word 2| on the |screen| on the right. The |computer| finds an error with |word 2| and underlines it.
Figure 11 represents the three major elements of RSB3: the invisible computer |screen| on her right with |words| on it, the |word list| on the left in the form of the signer's hand with |words| on it, and the |computer| in the form of the signer herself. Of these three elements, the signer is only explicit in identifying the |word list|. She does not tell us that there is a computer screen on the right or that there are words on it. This needs to be worked out by the addressee based on the information that the computer is comparing words on a |word list| with something else. The comparison must be between words on the list and some other words. Those words appear in different places displaced both vertically and horizontally—but not displaced on an in-out dimension. They must be words on a vertical surface such as a computer screen. She also does not tell us that she is going to personify the computer. She tells us that the computer compares things and that it has a list. Then we see her going back and forth, apparently comparing things. From this the addressee is to conclude that she has taken the role of |computer|.
Figure 11: Drawn images of aspects of RSB3 superimposed on the image of the signer.
3.5 Real space blend 4
Next the signer directs her gaze back to the addressee and makes the statement in (5). By directing her gaze back to the addressee she signals that she is once again narrating. She nevertheless makes use of two of the three elements of RSB3. She directs NOT-THERE→|word list| toward the |word list| and she directs SQUIGGLY-LINE-EXTEND-TOLstart of |word 2| - end of |word 2| toward the location of the word the |computer| previously underlined when it found that |word 2| on the |screen| was not on the |word list|.

(5) OR IF NOT-THERE→|word list| SQUIGGLY-LINE-EXTEND-TOLstart of |word 2| - end of |word 2|
Or if it is not on the list, it adds a squiggly line underneath the word.
The difference between RSB4 and RSB3 is that in RSB4 the signer is in the role of narrator rather than |computer|. When the signer blends with the computer, her actions demonstrate the actions of the |computer|. In RSB4, however, the signer's actions are once again her own. This demonstrates how easily real space blends can change. Simply by directing her gaze toward the addressee the signer once again reverts to her role as narrator, thus changing from RSB3 to RSB4.

She continues narrating and making use of RSB4 by explaining that the computer adds the squiggly lines because it doesn't know what the word is (she identifies the word by directing PRO→x toward |word 2| on the |screen| on the right). One must click on the underlined word to get it into the word list. She indicates which word must be clicked on by directing CLICK→x toward the underlined |word 2| on the |screen|. She indicates that this results in the word being in the list by directing INx→y from |word 2| on the |screen| to the |word list|. Directing INx→y initially toward |word 2| on the |screen| identifies this word as the entity that is in something. Ending the movement of INx→y with the small finger in contact with the palm of her left hand identifies the |word list| as the entity containing |word 2|. The signer concludes this portion of her narrative by cautioning her addressee that it is important to make sure the word added to the computer's word list is spelled correctly.
3.6 Real space blend 5

The signer then creates RSB5 while making the observation in (6).

(6) OCCASIONALLY SQUIGGLY-LINE-EXTEND-TOL1-L2 PRO-1 GAZE-AT3→|word|, "CORRECT PRO→|word|".
Sometimes a squiggly underline appears (under a word on the screen). I look at (the word) – "It's right."
Without mentioning a word on a computer screen, she provides evidence for such a word by producing SQUIGGLY-LINE-EXTEND-TOL1-L2, as illustrated in Figure 12a. This places a |squiggly line| on a |screen| ahead and to the right of her.
a. SQUIGGLY-LINE-EXTEND-TOL1-L2   b. GAZE-AT3→|word|
Figure 12: Locating the squiggly line and gazing at the underlined word.
From the previous context we know that such lines appear when the computer identifies a word as misspelled. In addition, she earlier provided evidence for a conceptual computer screen ahead and to the right of her when demonstrating how a computer checks spelling. So producing SQUIGGLY-LINE-EXTEND-TOL1-L2 now supports the existence of a depicted |squiggly line| on a |screen| ahead and to the right of her. Additionally, the addressee's background knowledge of computers would place the line underneath a |word| the computer has identified as misspelled. The clause PRO-1 GAZE-AT3→|word| places her at another place and time in front of the |screen| because she is describing herself gazing at the underlined |word|. Thus, what we see is |herself| (at another place and time) in front of a |screen| with an underlined |word| on it.
Additional evidence for the depicted |word| comes from the signer's description of |herself| looking at the word and saying to |herself| that it is correct. She provides this evidence by turning her face and looking toward the |word| and also directing GAZE-AT3→|word| as shown in Figure 12b. Next she produces (7), which is parallel to (6) but is about a word in a dictionary rather than a word on a screen.

(7) R: WORD^BOOK, GAZE-AT3→|dictionary|; "CORRECT".
L: FLAT-SURFACEL3 ---------------------------------------
The dictionary, I look at it. "(The word) is correct."
Immediately after signing WORD^BOOK (dictionary) she locates a FLAT-SURFACE buoy on her left with her left hand (in a different location from the FLAT-SURFACE buoy earlier used to depict a word list) and directs GAZE-AT3→|dictionary| toward it, as illustrated in Figure 13. Since she mentions WORD^BOOK immediately prior to placing the FLAT-SURFACE buoy, the buoy is understood as a depiction of the dictionary. The signer directs GAZE-AT3→y toward the depicted |dictionary| in the form of a FLAT-SURFACE buoy. This is to be understood as gazing toward the dictionary. She then signs CORRECT. This is her comment that the word marked as incorrect by the computer is correctly spelled.
Figure 13: GAZE-AT3→|dictionary|
3.7 Real space blend 6
Next she directs GAZE-AT3→y toward the |word| on the |screen| ahead of her right shoulder, pauses, makes a slightly disgusted facial expression, then makes the comment in Figure 14, while directing her face and gaze toward the addressee. By reestablishing eye contact with the addressee she is no longer |herself| at another time and place. She has removed |herself| from the blend. In addition, the left hand as the |dictionary| also disappears. One important part of RSB5, however, remains. That is the |screen| on the right. This allows her to comment about the elements of RSB5 as narrator rather than participant. We refer to this new blend without the signer as |herself| and without the |dictionary| as real space blend 6.
MEAN [TYPE^MACHINE]→|computer| NOT-HAVE THERE→|word| [WORD]→|word| IN→|computer|
[THEME]→|word| --------------------------------------------------
It means the computer doesn't have the (misspelled) word inside it.
Figure 14: Directing signs toward |word| and |computer|.
Both the directionality of the signs and the meanings expressed in Figure 14 provide evidence that the signer has conceptualized both a computer screen and a computer on her right. In our analysis [TYPE^MACHINE]→|computer| and IN→|computer| are directed toward the |computer| while THERE→|word|, [WORD]→|word|, and [THEME]→|word| are directed toward the misspelled |word| on the computer |screen|. It appears that she has conceptualized a |computer| on her right with a |screen| as an integral part of it. There is text on the |screen|, including the misspelled |word|. The inside of the computer is conceptualized behind the |screen|. The way she directs IN→|computer| provides evidence for this three-dimensional conceptualization since, as she produces IN→|computer|, she extends her hand further away from her body than for all the other directional signs. She extends her arm past the |screen|, to the inside of the conceptualized computer |monitor|.

4. Comparing the Real Space Blends
We have examined how the directional signs produced by a signer of Norwegian Sign Language provide evidence for six sequentially conceptualized real space blends. The transitions between spatial conceptualizations are carried out so smoothly that this forty-seven second narrative gives the impression of highly fluent signing without interruptions for the purpose of creating new spatial conceptualizations. Contrary to popular assumptions about how space is used in sign languages, the signer is not using space as a convenient means of keeping track of the things she was talking about. She creates six real space blends that assist her in expressing specific types of meanings.

Initially she is simply a narrator talking about computers without having created a real space blend. Then, when she begins talking about underlined words on a computer monitor, she provides evidence for RSB1, containing a three-dimensional computer |monitor| with a |screen| ahead of her with an underlined |word| on it slightly to her right. The presence of the conceptualized computer |monitor| allows her to place a misspelled |word| on the |screen| and describe it as being underlined. In the context of the discussion this indicates a spelling or grammatical problem. Next, apparently in anticipation of the demonstration of computer spell checking that was to follow, she creates RSB2, which consists of a |computer| on her right. She makes three references to the |computer| there by directing the pronoun PRO→x, the possessive determiner POSS→x, and the verb COMPARE→x toward it. Her signing provides no evidence that RSB2 contained a three-dimensional computer |monitor| with a screen; it provides evidence only for the |computer| on her right.

RSB3 is much more elaborate than RSB1 or RSB2. In RSB3 the signer becomes a |computer| carrying out the activity of spell checking, making use of a |word list| on her left and a |screen| on her right. Additionally, the |word list| consists of individual listed |words| and the |screen| has typed |words| on it. The |computer| then compares individual listed |words| with individual typed |words|. When the |computer| finds a misspelled word the |computer| underlines it with a |squiggly line|. This constitutes a demonstration of the process of spell checking which the addressee witnesses.

RSB4 is like RSB3 except that the signer is acting as narrator while still maintaining the |word list| on her left and the |screen| on her right. The difference between RSB3 and RSB4 makes an important point that arises again in the difference between RSB5 and RSB6. What we observe in both cases is that the signer is able to remove herself from the spatial conceptualization simply by reestablishing eye gaze with the addressee. In RSB3 the signer becomes a |computer| with a |screen| with typed |words| on it to the right and a |word list| with |words| on it to the left. By reestablishing eye contact with the addressee she is able to return to the role of narrator rather than |computer| while still retaining the other aspects of the blend. That is, in RSB4 she, as narrator, is still able to direct signs toward the |screen| on the right and the |word list| on the left.

RSB5 also shares similarities with RSB3 except that instead of taking the role of the computer, she adopts the role of |herself| at a different place and time. Instead of a |word list| on her left, there is a |dictionary| on her left and a |screen| on her right. RSB5 allows her to explain the frustration of encountering an underlined word on a screen while struggling to understand why the computer has underlined it. As with the difference between RSB3 and RSB4, she is also able to remove herself from RSB5 by reestablishing eye contact with the addressee. This produces RSB6, which, like RSB5, has a |screen| and a |computer| on the right. However, it differs from RSB5 in that the signer is no longer a part of the blend and the left hand as |dictionary| is no longer present.
5. Summary and Conclusions
Real space blends can be created by signers when the entities being talked about are not physically present. In order for an addressee to understand the signer, the addressee must understand the nature of the real space blends created by the signer. The focus of this paper has been on examining the amount of information the signer provides to the addressee in order for the addressee to correctly construct real space blends.

In providing evidence for RSB1 the signer uses the depicting verb SQUIGGLY-LINE-EXTEND-TOL1-L2. The verb depicts a |squiggly line| in the space ahead of the signer. Beneath that the signer depicts three horizontal lines. She also directs the sign WORD toward the part of the depicted space associated with the squiggly line. This depicts a word at the same location as the squiggly line. At some distance further away from the signer she identifies a program. From these clues and the fact that she is talking about what computers do, we propose that in order to understand the signer the addressee must also construct a three-dimensional computer with a screen ahead of the signer.

Although the clues provided by the signer are minimal, they are sufficient, given the context of the discussion and the background knowledge of the two signers. Background knowledge would tell the addressee that squiggly lines appear on computer screens. It follows, then, that in producing SQUIGGLY-LINE-EXTEND-TOL1-L2 the signer is depicting a computer screen. The three lines below the squiggly line must be lines of text rather than straight graphical lines. The program is further away from the signer because it is inside the three-dimensional computer conceived of as being ahead of the signer.

The examination of the six real space blends reveals that, rather than being exceptional, this is the pattern: the signer provides just enough information that, when combined with background knowledge and the context of the utterance, proves sufficient for constructing the appropriate real space blends. Thus, contrary to the widely accepted (prescriptive) view that signers must identify every spatial element prior to making use of it, we find that this signer provides explicit identification of only some of the elements of her real space blends. The conceptual task of creating the remainder of each real space blend falls on the addressee.

References

Fauconnier, G. (1985). Mental spaces. Cambridge, MA: The MIT Press. [Reprinted 1994, Cambridge: Cambridge University Press.]
Fauconnier, G. (1997). Mappings in thought and language. Cambridge: Cambridge University Press.
Fauconnier, G., & Turner, M. (1996). Blending as a central process of grammar. In A. Goldberg (Ed.), Conceptual structure, discourse and language (pp. 113-130). Stanford, CA: CSLI Publications.
Hoffmeister, R. J. (1977). The influential point. In W. C. Stokoe (Ed.), Proceedings of the First National Symposium on Sign Language Research and Teaching (pp. 177-191). Chicago, IL.
Kendon, A. (1988). How gestures can become like words. In F. Poyatos (Ed.), Cross-cultural perspectives in nonverbal communication (pp. 131-141). Toronto: Hogrefe.
Liddell, S. K. (1994). Tokens and surrogates. In I. Ahlgren, B. Bergman, & M. Brennan (Eds.), Perspectives on sign language structure: Papers from the Fifth International Symposium on Sign Language Research, vol. I (pp. 105-119). Durham, England: The Deaf Studies Research Unit, University of Durham.
Liddell, S. K. (1995). Real, surrogate, and token space: Grammatical consequences in ASL. In K. Emmorey & J. Reilly (Eds.), Language, gesture, and space (pp. 19-41). Hillsdale, NJ: Lawrence Erlbaum Associates.
Liddell, S. K. (2003). Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.
Liddell, S. K., Vogt-Svendsen, M., & Bergman, B. (2007). A crosslinguistic comparison of buoys: Evidence from American, Norwegian, and Swedish Sign Language. In M. Vermeerbergen, L. Leeson, & O. Crasborn (Eds.), Simultaneity in signed languages: Form and function (pp. 187-215). Amsterdam: John Benjamins.
MacLaughlin, D. (1997). The structure of determiner phrases: Evidence from American Sign Language. Unpublished doctoral dissertation, Boston University.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the American deaf. Studies in Linguistics Occasional Papers, no. 8. Buffalo, NY: Department of Anthropology and Linguistics, University of Buffalo.
Wilbur, R. B. (1979). American Sign Language and sign systems: Research and application. Baltimore: University Park Press.
Zimmer, J., & Patchke, C. (1990). A class of determiners in ASL. In C. Lucas (Ed.), Sign language research: Theoretical issues (pp. 201-210). Washington, DC: Gallaudet University Press.
Environmentally Coupled Gestures

Charles Goodwin
University of California, Los Angeles
Using as data videotapes of archaeologists excavating a prehistoric village, this chapter investigates gestures that cannot be defined completely within the skin of the actor(s), but require as well phenomena in the environment, such as archaeological structure in the dirt under a moving hand. What emerges are gestures built through the mutual elaboration of different materials in different media that have a symbiotic organization in which a whole that is greater than, and different from, any single part is created. Environmentally coupled gestures are central to the cognitive organization of a profession such as archaeology and the ongoing constitution of the distinctive professional mind of the archaeologist. Simultaneously they force us to expand our sense of what counts as gesture, and the analytic frameworks required to study it.
The work of David McNeill (1992, and much more) provides exemplary analysis of the intimate relationship between gesture and language. He demonstrates that utterances emerge within a microgenetic process in which language and gesture develop together as integrated but complementary meaning making resources. Here I want to investigate a range of phenomena relevant to the organization of gesture that encompass not only psychological processes within the speaker, but also embodied participation frameworks constructed through the collaborative actions of multiple parties, and structure in the environment.
1. Gestures Tied to the Environment
I will focus on environmentally coupled gestures, gestures that cannot be understood by participants without taking into account structure in the environment to which they are tied. Consider the following. Talk is transcribed using a system developed by Gail Jefferson (Sacks, Schegloff & Jefferson, 1974: 731-733):

(1) Father: So she sold me this. But she didn't sell me this (0.2) or tha:t.
It is impossible to grasp just what the speaker is telling his recipient from the talk alone. Clearly a major reason for this is the use in the talk of deictic terms ('this' and 'that') that instruct the hearer to attend to phenomena beyond the stream of speech. Indeed each of these terms indexes a gesture. Characteristically gesture is analyzed by linking what a hand is doing to the structure of the talk in progress. Here, however, that is inadequate. When the gesturing hands alone are taken into account, what exactly is being talked about is still not visible:
Figure 1: Gesture alone.
To grasp what the speaker is saying and demonstrating, a hearer must take into account an object being held by the speaker and being presented and demonstrated through the gesture (see Figure 2). The object here is a pitcher for an electric blender that the speaker has ordered over the Internet. The speaker is telling his addressees that while the pitcher was shipped, he did not receive either the top for the pitcher or its screw-in base. While this is not made visible through gesture and its accompanying talk alone, it becomes vividly clear when a larger multimodal sign complex that encompasses not only talk and gesture, but also objects in the world is taken into account (Streeck, 1996). As the speaker begins this utterance (more specifically during the word "sold") his hands noticeably grasp the pitcher. He is not grasping the pitcher to hold it (it is already well supported by his other hand) but instead to prominently display the object to his addressees. One might think of this hand movement as a gestural practice for presenting or indicating something, that is, as an action similar to a pointing gesture. However, it is crucial not to restrict analytic focus to the gesturing hand, but also to take into account the object in the world being grasped.
As is demonstrated a moment later this object forms a crucial part of the multimodal signs that display the missing parts of the blender. The gesturing hands alone fail to make visible the absent base and lid (see Figure 1).
Figure 2: Object incorporated into gesture.
The co-occurring talk is equally crucial in that it formulates what is being done as describing something absent that can be inferred from the structure of the object being held. The general importance of the talk that elaborates a gesture is made particularly clear when the party producing the gesture can't speak, as can happen for example in aphasia. Rather than being immediately, transparently clear, a gesture such as this unaccompanied by relevant talk can set off a long sequence devoted to figuring out what a speaker suffering from aphasia is trying to say to his interlocutors through the gesture (Goodwin, 1995; 2002). Gestures coupled to phenomena in the environment are pervasive in many settings (archaeological field excavations, weather forecasts, pointing to overheads in academic talks, etc.—consider how many computer screens are smeared with fingerprints). Gestures linked to the environment would thus seem to constitute a major class of gesture. However, with a few notable exceptions (Goodwin, 2000; 2003; Haviland, 1996; 1998; Heath & Hindmarsh, 2000; Hutchins & Palen, 1997; LeBaron, 1998; LeBaron & Streeck, 2000; Murphy, 2005; Nevile, 2001; Streeck, 1996) multimodal sign complexes that encompass both gesture and phenomena in the world have been largely ignored. This neglect may result from the way in which such gestures slip beyond theoretical frameworks focused on either ties between gesture and psychological processes inside the mind of the individual speaker, or exclusively on the talk and bodies of participants in interaction. An invisible analytic boundary is frequently drawn at the skin of the participants. However, rather than being something that can be studied in isolation as a neat, self-contained system, gesture is an intrinsically parasitic phenomenon, something that gets its meaning and organization from the way in which it is fluidly linked to the other meaning making practices and sign systems that are constituting the events of the moment. Human cognition and action are unique in the way in which they use as resources both the details of language, and physical and cultural environments that have been shaped by human action on an historical time scale.

Environmentally coupled gestures are pervasive in the work of archaeologists who must articulate for each other visible structure in the dirt they are excavating together. In Figure 3, Ann, a senior archaeologist, is guiding the work of Sue, a new graduate student at her first field excavation. Sue is outlining in the dirt the shape of a post mould that will then be transferred to a map. Ann locates relevant structure in the dirt for Sue with a series of environmentally coupled gestures, while formulating with her talk what is to be seen there. As Ann says, "This is just a real nasty part of it," her extended index finger outlines something in the faint color patterning visible in the dirt.
Figure 3: Environmentally coupled gesture.
Most analysis of gesture focuses on the movements of the speaker's body, typically the hand. However, neither Sue, nor anyone else, could see the action that Ann is performing here by attending only to her hand. What Sue must see if she is to understand Ann's action in a relevant fashion is not only a gesture, but also the patterning in the earth she is being instructed to follow. The dirt under Ann's finger is indispensable to the action complex being built here. The finger indicates relevant graphic structure in the dirt, while simultaneously that structure provides organization for the precise location, shape and trajectory of the gesture. Each mutually elaborates the other, and both are further elaborated by the talk that accompanies the gesture (see Figure 4). Ann's gesturing hand is but part of a multimodal complex that includes not only the speaker's talk, but extends beyond the body to encompass material structure in the environment. This was true as well for the first example, shown in Figure 4:
Figure 4: Multimodal organization of action.
In brief what one finds here is a small ecology in which different signs in different media (talk, the gesturing body and objects in the world) dynamically interact with each other. Each individual sign is partial and incomplete. However, as part of a larger complex of meaning making practices they mutually elaborate each other to create a whole, a clear statement, that is not only different from its individual parts, but greater than them in that no sign system in isolation is adequate to construct what is being said.
2. The Communicative Status of Environmentally Coupled Gestures
It has sometimes been argued that gestures are not inherently communicative (Krauss, Morrel-Samuels & Colasante, 1991; Rimé & Schiaratura, 1991). For example, people on the telephone, as well as blind speakers, can be observed to gesture. Indeed, in light of LeBaron and Streeck’s (2000) demonstration that one primordial basis for gesture is the hand’s engagement with a world, one would certainly not want to argue that all gestures are communicative. Many gestures emerge from the actor’s experience of working in the world and can help the speaker conceptualize phenomena that are known through embodied action. However, if my argument is valid that gesture, talk and relevant structure in the environment are all interdependent components of the actions being built with environmentally coupled gestures, then addressees must take into account not only the talk, but also the gesture. How might this be demonstrated? In Figure 5 Ann’s talk in lines 44-46 is grammatically incomplete. The noun phrase projected to occur after the preposition “of” is central to the action in progress in that it will specify what Ann is inquiring about. However it is never produced. Instead Ann points to seeable structure in the dirt where Sue is trying to trace the outline of a feature.
Figure 5: Utterance presupposes gesture.
Despite the absence of this crucial noun phrase, Sue has no difficulty whatsoever in understanding and responding to Ann's request. With the "it" in line 48 she not only displays that she has unproblematically located what she has been asked to see, but incorporates that recognition into the structure of her own subsequent utterance. The way in which Sue is building subsequent action by explicitly taking into account Ann's environmentally coupled gesture is further demonstrated by her own gestural activity. During the end of Ann's utterance ("of uh:," in line 46) Sue brings her hand right next to Ann's, and points with her trowel at the very place that Ann is indicating with her finger (second image in the top row of Figure 5). Once Sue's pointing hand has been linked to both Ann's gesture and the relevant structure in the dirt being scrutinized, Sue uses this position as the point of departure for an environmentally coupled gesture of her own in lines 48-49 that constitutes the answer to Ann's question. Once again crucial grammatical structure, such as the locative complement to the first "around" in line 48 (around where?), is provided not by structure in the talk, but instead by the accompanying gesture.

The environment that a gesture ties to has been discussed so far in terms of the physical surround that is the focus of the participants' attention. However another crucial component of the environment that organizes participants' actions is the prior talk and action that constitutes the contextual point of departure for the production of subsequent action (Heritage, 1984; Sacks et al., 1974). Through the way in which Sue's hand is visibly linked to the prior placement of Ann's hand, and the deictic reference in her talk, her action is explicitly tied to, and indeed emerges from, this sequential environment, as well as the structure in the dirt that her moving hand traces. Her action is coupled to a range of quite different, but mutually relevant environments.

The communicative status of the environmentally coupled gestures that occur here is demonstrated in a number of different ways. First, Sue clearly and explicitly takes Ann's gesture into account in the construction of her reply to Ann, both by including in her subsequent utterance a deictic term that indexes what Ann's gesture has indicated, and by visibly using Ann's gesture and the space it has located as the point of departure for her own subsequent gesture. Second, Ann's utterance is not in any way marked as defective, for example through use of repair initiators (Schegloff, Jefferson & Sacks, 1977). It nonetheless uses linguistic structure that would be grammatically incomplete if all relevant meaning making resources were to be found exclusively within the stream of speech (for example in line 48 "around" without a locative complement). The grammatical choices made by the speaker presuppose that the addressee has attended to the gesture (see Goodwin, 2003b for further examples of this process). The speaker is incorporating into the construction of her utterance the communicative expectation that a relevant gesture will not only be seen, but systematically taken into account for proper understanding of what is being said.
2.1 Embedding gesture within participation frameworks
What practices warrant the assumption that certain gestures will be treated as communicative and be attended to as crucial to the organization of the talk and action in progress? Consider what happens in Figure 6. As Ann begins to explain something with an environmentally coupled gesture her two addressees look away from the dirt being pointed at and briefly talk together (lines 166-168, first image). Ann interrupts her developing utterance without bringing it to completion (line 166) and her addressees return their gaze to the dirt being pointed at (see Goodwin, 1981 for extended analysis of how restarts are used to secure the gaze of nongazing hearers). Once they are gazing toward the environmentally coupled gesture Ann does it again while recycling an earlier section of her talk (“show that”), but only now moves that talk forward to the point of her demonstration, the location of a “stripe” in the dirt, something that her addressees are now visibly positioned to see.
Figure 6: Addressee gaze toward gesture.
The environmentally coupled gesture is thus constructed as a communicative event by being performed right at the place where its addressee is gazing. It is built to be seen. Moreover, such positioning is not accidental, but, as demonstrated by the sequence in Figure 6, something that parties making such gestures not only attend to, but systematically work to achieve (for example by delaying the crucial conjunction of gesture, space and talk until the relevant gaze of the addressee has been obtained).

More generally, the production of the gesture is embedded within a multi-party embodied participation framework (Goodwin, 1981; 2002c; in press; Goodwin & Goodwin, 2004; Kendon, 1990) that creates for the participants a shared focus of visual and cognitive attention toward both each other and relevant phenomena in the environment. In this it has similarities to what Tomasello (1999; 2003) has described as a "joint attentional frame." Note however that the participation framework encompasses more than the mental life of the actors. It is systematically organized through visible embodied practice, and is capable of ongoing negotiation and calibration, as indeed occurs in Figure 6 when Ann sees that her addressees are not attending to her. Moreover, though beyond the scope of the present paper, such participation frameworks encompass not only orientation toward events in the environment (the primary focus of Tomasello's analysis), but also the participants' attention to each other. Indeed the use of participation frameworks to systematically organize mutual orientation between speakers and hearers is central to the organization of talk-in-interaction (Goodwin, 1981; in press). Through the ongoing organization of relevant participation frameworks participants are able to hold each other accountable for detailed and relevant participation in the events of the moment, something that is central to their ability to build ongoing courses of action in concert with each other.

The communicative status of particular gestures is constituted through the way in which they are organized to be seen within relevant participation frameworks. Both gesture and participation frameworks are built through visible embodied displays. Both thus constitute a primordial locus for the organization of human action and cognition through embodiment. It is however important to note that they in fact constitute quite different kinds of semiotic processes that stand in a complementary relationship to each other (see Figure 7). Gestures are intimately linked to the details of what is being said, and, like the words they frequently accompany, are evanescent. Particular gestures rapidly disappear as the talk moves onward. By way of contrast, participation frameworks are not about the substance of what is being said, but instead about the relationship of the participants toward each other, or more precisely, their mutual orientation. They also have a far more extended temporal duration than gestures do. Indeed they typically frame extended strips of talk and gesture. Most crucially, participation frameworks create an embodied, multi-party environment within which structurally different kinds of sign exchange, including talk and gesture, can occur (Goodwin, 2000; 2003b).

Figure 7: Participation framework creates frame for other sign exchange processes.

In his analysis of gesture McNeill draws attention to the importance of gesture space, which he initially identifies as something that can be visualized "as a shallow disk in front of the speaker, the bottom half flattened when the speaker is seated" (1992: 86). Consideration of environmentally coupled gestures enables us to expand the notion of gesture space to encompass, first, structure in the surround that is implicated in the organization of a participant's gestures (for example, the patterning in the dirt that is incorporated into an environmentally coupled gesture and which shapes the movement of the gesturing hand), and second, the bodies of not only the party making the gesture, but also the body of the addressee (see also Goodwin, 1998).
3. Environmentally Coupled Gestures and the Social Calibration of Professional Vision
How might the distinctive properties of environmentally coupled gestures be implicated in the organization of other aspects of human cognition? One phenomenon will be briefly noted here: the social calibration of embodied knowledge and professional vision. Communities, workgroups, and professions categorize phenomena in the environments that are the focus of their concern in distinctive ways. For example, unlike laymen, archaeologists systematically see traces of past human activity in the color patterns visible in the dirt they are excavating. Moreover they use such seeing, as well as an ensemble of other embodied practices (such as the ability to reveal structure in dirt through the precise movement of a trowel) to construct the distinctive textual artifacts, such as maps and coding schemes, that constitute the documentary infrastructure of archaeology as a profession (Goodwin, 2000). Archaeologists trust each other to competently see relevant structure in the complex visual field provided by the emerging soil of an excavation. Indeed a crucial cognitive component of what it means to validly occupy the identity of archaeologist is mastery of such professional vision (Goodwin, 1994).

Both vision and embodiment are frequently analyzed from a perspective that focuses on the experience of an isolated, individual actor (for example the psychology of the actor doing the seeing). However, to function in the social life of a profession the ability to see relevant structure in a complex environment must be organized, not as an idiosyncratic individual ability, but instead as systematic public practice. Environmentally coupled gestures provide important resources for shaping the perceptual activities of individuals into the ways of seeing required to accomplish the distinctive work of a community. To investigate this it is useful to first briefly describe some of the tasks faced by archaeologists excavating a site.
Figure 8: Mapping a feature.
Though visitors to museums typically look at artifacts—physical objects such as pottery and tools—much of the evidence used by archaeologists to study earlier human activity consists simply of color patterns in the soil being excavated. The colored shapes left by a fire or a decayed post provide examples. The very process of excavation systematically destroys such features. The dirt that reveals them is subsequently removed to uncover what is underneath. Making accurate maps of such features to provide an enduring record of structures visible at specific points in the excavation is thus one of the central tasks of excavation. To make such maps relevant features must first be located in the dirt through systematic excavation. This process is complicated by the fact that the soil containing the feature may be visibly disturbed by the actions of burrowing animals or later human activity, such as a plow moving through the dirt. Once a feature has been clearly revealed through careful trowel work, its shape is outlined in the dirt with the point of a trowel, a process that archaeologists call 'defining a feature'. The position of the feature is then precisely measured and transferred to graph paper to make a map (see Goodwin, 1994 for a more detailed description of map making). Figure 8 provides an overview of this process.

Some of the ways in which environmentally coupled gestures provide resources for socially organizing the practices of seeing and acting that are crucial to the work of a community will now be briefly examined. Reliably locating relevant archaeological structure in the dirt that is the focus of an excavation is by no means a transparent task. What patterns of color differences count as a feature of a particular type? How is the patterning that constitutes a feature to be distinguished from an intruding disturbance? As argued by Wittgenstein (1958; see also Baker & Hacker, 1983; Edgeworth, 2003) there is a gap between the diverse, frequently amorphous events in a complex visual environment being scrutinized by working actors, and the categories used by their social group to classify such phenomena (for example, categorizing a pattern visible in a patch of dirt as an archaeological "feature"—a process that may include drawing a line that gives that analytic object a precise shape within the documents, such as maps, that organize the work of the group doing the classification). Environmentally coupled gestures provide resources for bridging this gap through work-relevant practice.

In the two images above each other on the left side of Figure 8, Ann, the senior archaeologist, makes an environmentally coupled gesture, running her hand in an inverted U shape over a long stripe in the dirt while describing it as "that disturbance." The stripe is later identified as a plow scar (see Goodwin, 2003a for more extended discussion of this sequence). Environmentally coupled gestures, such as the one made here, integrate in a single action package both categories (for example, a 'feature', a 'post mould', a 'disturbance') and the phenomena in the setting that are being categorized (actual structure in the dirt), and moreover do this as part of the consequential activities that make up the significant work of a community.

When the patterns of movement that trace a shape leave a physical mark on the surface being described, the activities of the archaeologist's moving hand can move beyond gesture into inscription. Thus one of the most common gestures at an archaeological field site takes the form of tracing with a finger or trowel a shape argued to be present in the dirt just below the moving hand (see A in Figure 9). If the finger or trowel is lowered so that it actually penetrates the soil, a more enduring record of the gesture in the form of a line in the dirt is created (B in Figure 9). Though drawing on the environment might be argued to fall beyond the boundaries of gesture, there is in fact a continuity of action, a family resemblance, between the gestures used to highlight structure in the dirt being scrutinized (A in Figure 9) and the activity of inscription (B in Figure 9), which transduces such a gesture so that it leaves an annotation in the environment itself (Goodwin, 2003a: 228-233).
Figure 9: From gesture to durable marks in the environment.
This line in the dirt that imposes precise shape on the color pattern is a category, the first version of the iconic sign for the feature that will then be transferred to the map. However, unlike maps, which free themselves from the ground from which they emerge and travel from the site in a new medium, sheets of paper (i.e., they have the properties of Latour's 1987 immutable mobiles), the inscription in the dirt has a liminal status. Though it has the clear, humanly drawn, precise shape that will later be found on the map, it is constructed in the same visual field and from the same materials as its signified. It has not yet been removed from the very color patterning in the dirt that it stands as a sign for. This
208 CHARLES GOODWIN has a range of most important consequences for the social organization of embodied practice and professional vision. First, the way in which an actor, such as an apprentice archaeologist, sees a relevant structure in a patch of dirt is no longer a private process of perception. Instead, when an inscription is made, the feature is given precise shape in a public arena. By virtue of the way in which the inscription, and the gestures that accompany it, occur within a participation framework that creates a shared focus of visual and cognitive attention (see Figure 7), the senior archaeologist is systematically positioned to see both the complex visual environment that is the focus of their work (the dirt currently being excavated), and the operations being performed on that environment by a newcomer attempting to master the practices required to properly see and annotate relevant structure in that field. Of great importance to the social calibration of vision and practice is the liminal status of the inscription, the way in which it is positioned simultaneously in both the world of clear, distinct archaeological categories (features, disturbances, plow scars, etc. as iconically displayed through the sharply defined figures drawn by the archaeologist), as well as in the messy, material particulars of the dirt that quite literally constitutes the primordial ground for the objects of knowledge that animate archaeology as a discipline. The senior archaeologist is able to see simultaneously both precisely how her co-participant sees and categorizes the structures they are working with together, and the evidence used for such seeing. She can thus judge not only the correctness of the line (something that becomes impossible later in the lab when the shape has been moved from the dirt to a blank piece of paper), but also the correctness and competence of her student’s action. Second, by virtue of the way in which they are engaged in interaction with each other in an environment with these properties, the relationship between evidence and categorization, the correctness of fit, can itself be topicalized, investigated, and negotiated. Environmentally coupled gestures are central to this process. Figure 9 provides two examples. In A on the left Sue uses her finger to trace just above the dirt how she would have drawn the line there differently. In C on the top right in Figure 9 Ann actually lowers her gesturing finger slightly into the soil as she demonstrates where she would have made Sue’s inscription. These environmentally coupled gestures provide resources for imagining and publicly displaying alternative outcomes (Murphy, 2005) to the current task of seeing and categorization. Note that in both of these cases the gesture is tied to not only the color patterning in the dirt (and indeed constitutes an argument through gesture about what is to be seen there), but also to another act of categorization that has been given public shape through an inscription. The gestures in A and C of Figure 9 are sequentially next gestures to a prior act of classification, an existing line defining the feature. The subsequent gesture parasitically builds upon that earlier hand movement of another party by indexically tying to the trace it left. This mutual commentary, a dialogue of gestures in a complex, contested visual field, provides resources for publicly probing and debating the proper way to see and
delineate relevant structure in the dirt, and thus to move toward the entrainment of individual perception into socially organized professional practice.

4. Conclusion
In his groundbreaking analysis of gesture, McNeill (1992) demonstrated that gesture and the speech that accompanies it have a common origin in the mind of the speaker. Thus gesture and language in the prototypical utterance are parallel, integrated manifestations of a unitary psychological process. While using McNeill's powerful demonstration of the close ties between language structure and gesture as an essential point of departure, the present paper has investigated how the scope of phenomena relevant to the organization of at least some gestures can extend beyond the skin of the actor. Several components of such an expanded gesture space have been briefly examined.

First, a particular class of gestures cannot be understood by taking into account only a gesturing body and its accompanying talk (see Figure 1). Such gestures are tied to different kinds of structure in the environment that are central to the organization both of what they are seen to mean and of the actions being built through them. Environmentally coupled gestures are pervasive in certain settings. Much of the analysis in the present paper focuses on videotapes of archaeologists engaged in the process of excavation. A particular environment, the dirt they are excavating, is the explicit focus of their work, and their gestural activity provides them with resources for locating and highlighting relevant structure in that complex visual field. One clear demonstration of the importance of such gestures is the regular occurrence of hybrid utterances that are grammatically incomplete, but which pose no problems of understanding for participants who are expected to take into account not only the talk in progress, but also the gesture and the structure in the dirt indicated by the gesture.

A second component of this expanded gesture space is the participation framework structured by the mutual orientation of the participants' bodies. Not all gestures are communicative. However, systematically placing a gesture within a relevant participation framework, in other words designing it to be seen and taken into account by an addressee, is one method for publicly establishing the communicative status of a particular class of gestures. From such a perspective the gesture space includes the body of the addressee, as well as that of the speaker making the gesture. A clear demonstration of the importance of the addressee is provided by cases in which a speaker discovers that she does not have the gaze of an addressee, solicits that gaze, and only then produces the gesture (see Figure 6 and Goodwin, 1998). Though both gestures and participation frameworks are made visible through embodied displays, they in fact constitute quite different kinds of semiotic processes. The gesture elaborates what is being said or done at the moment, while the participation framework does not deal with such local content, but instead is about the orientation of the participants toward each other.
It creates an embodied frame, a publicly displayed, shared focus of visual and cognitive attention, within which other kinds of sign processes, such as gesture, can flourish.

An equally important, but quite different kind of framing is provided by the sequential context from which an action containing a gesture emerges. That context can include not only talk, but also the prior gestures of others. Moreover, on some occasions these gestures leave enduring traces that provide organization for subsequent action, and which suggest continuity between gesture and the human capacity to structure a consequential environment by annotating it with meaningful marks. The environmentally coupled gestures investigated here are thus organized through a rich and diverse set of structurally different kinds of spaces and frames. They are built through the mutual interplay of multiple semiotic fields, including the moving hand, the dirt which the hand is articulating, the accompanying talk, the participation framework constituted through the positioning of the participants' bodies, local sequential organization, the larger activity that these particular actions are embedded within, etc. Since these gestures are built through the mutual elaboration of different materials in different media (e.g., the dirt, the hand, the postures of multiple bodies, language structure, etc.), they have a symbiotic organization in which a whole that is greater than, and different from, any single part is created.

A central feature of David McNeill's research is his continuing emphasis on the importance of gesture for the analysis of human cognition. This raises the question of how environmentally coupled gestures might be relevant to cognition. The work of many communities, including professions such as law, archaeology and medicine, requires that members of the community have the ability to see relevant structure in an environment that is the focus of their professional scrutiny, and transform what they see there into the distinctive categories, objects of knowledge, and documents that define the special expertise of their community. Part of being an archaeologist includes the ability to see in the color patterning of dirt being excavated specific traces of earlier human activity, such as the holes of posts that held up the roof of a now vanished building. Environmentally coupled gestures provide crucial resources for organizing such professional vision (Goodwin, 1994) as a form of public practice rather than private experience or idiosyncratic competence, that is, as a precise way of seeing the world and constituting objects within it that can be trusted and relied upon by others. Gesture's interstitial position as something that links the details of language use to structure in the environment provides a key analytic point of entry for investigation of the rich interdigitation of different kinds of semiotic resources that human beings use to build relevant action in the consequential settings that define the lifeworld of a society. Environmentally coupled gestures are central to the cognitive organization of archaeology and the ongoing constitution of the distinctive professional mind of the archaeologist. Simultaneously they force us to
expand our sense of what counts as gesture, and the analytic frameworks required to study it.

References

Baker, G. P., & Hacker, P. M. S. (1983). Wittgenstein: Meaning and understanding. Chicago: The University of Chicago Press.
Edgeworth, M. (2003). Acts of discovery: An ethnography of archaeological practice. Oxford, England: Archaeopress.
Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. New York: Academic Press.
Goodwin, C. (1994). Professional vision. American Anthropologist, 96(3), 606-633.
Goodwin, C. (1995). Co-constructing meaning in conversations with an aphasic man. Research on Language and Social Interaction, 28(3), 233-260.
Goodwin, C. (1998). Gesture, aphasia and interaction. In D. McNeill (Ed.), Language and gesture (pp. 84-98). Cambridge: Cambridge University Press.
Goodwin, C. (2000). Action and embodiment within situated human interaction. Journal of Pragmatics, 32, 1489-1522.
Goodwin, C. (2002a). Conversational frameworks for the accomplishment of meaning in aphasia. In C. Goodwin (Ed.), Situating language impairments within conversation (pp. 90-116). Oxford, New York: Oxford University Press.
Goodwin, C. (2002b). Time in action. Current Anthropology, 43(Supplement, August-October 2002), S19-S35.
Goodwin, C. (2003a). Pointing as situated practice. In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet (pp. 217-241). Hillsdale, NJ: Lawrence Erlbaum Associates.
Goodwin, C. (2003b). The body in action. In J. Coupland & R. Gwyn (Eds.), Discourse, the body and identity (pp. 19-42). Houndmills, Hampshire and New York: Palgrave/Macmillan.
Goodwin, C. (in press). Human sociality as mutual orientation in a rich interactive environment: Multimodal utterances and pointing in aphasia. In N. J. Enfield & S. C. Levinson (Eds.), Roots of human sociality. London: Berg Press.
Goodwin, C., & Goodwin, M. H. (2004). Participation. In A. Duranti (Ed.), A companion to linguistic anthropology (pp. 222-243). Oxford: Basil Blackwell.
Haviland, J. B. (1996). Projections, transpositions, and relativity. In J. J. Gumperz & S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 271-323). Cambridge: Cambridge University Press.
Haviland, J. B. (1998). Early pointing gestures in Zinacantán. Journal of Linguistic Anthropology, 8(2), 162-196.
Heath, C., & Hindmarsh, J. (2000). Configuring action in objects: From mutual space to media space. Mind, Culture and Activity, 7(1&2), 81-104.
Heritage, J. (1984). Garfinkel and ethnomethodology. Cambridge: Polity Press.
Hutchins, E., & Palen, L. (1997). Constructing meaning from space, gesture, and speech. In L. Resnick, R. Säljö, C. Pontecorvo & B. Burge (Eds.), Discourse, tools and reasoning: Essays on situated cognition (pp. 23-40). Berlin, Heidelberg, New York: Springer-Verlag.
Kendon, A. (1990). Behavioral foundations for the process of frame-attunement in face-to-face interaction. In A. Kendon (Ed.), Conducting interaction: Patterns of behavior in focused encounters (pp. 239-262). Cambridge: Cambridge University Press.
Krauss, R. M., Morrel-Samuels, P., & Colasante, C. (1991). Do conversational hand gestures communicate? Journal of Personality and Social Psychology, 61, 743-754.
Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, MA: Harvard University Press.
LeBaron, C. (1998). Building communication: Architectural gestures and the embodiment of ideas. Ph.D. dissertation, Department of Communication, The University of Texas at Austin.
LeBaron, C. D., & Streeck, J. (2000). Gestures, knowledge, and the world. In D. McNeill (Ed.), Language and gesture (pp. 118-138). Cambridge: Cambridge University Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Murphy, K. M. (2005). Collaborative imagining: The interactive use of gestures, talk, and graphic representation in architectural practice. Semiotica, 156(1/3), 113-145.
Nevile, M. (2001). Beyond the black box: Talk-in-interaction in the airline cockpit. Ph.D. dissertation, Department of Linguistics, Australian National University, Canberra.
Rimé, B., & Schiaratura, L. (1991). Gesture and speech. In R. Feldman & B. Rimé (Eds.), Fundamentals of nonverbal behavior (pp. 239-281). Cambridge: Cambridge University Press.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50, 696-735.
Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53, 361-382.
Streeck, J. (1996). How to do things with things. Human Studies, 19, 365-384.
Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Wittgenstein, L. (1958). Philosophical investigations. Edited by G. E. M. Anscombe & R. Rhees, translated by G. E. M. Anscombe, 2nd edition. Oxford: Blackwell.
Indexing Locations in Gesture: Recalled Stimulus Image and Interspeaker Coordination as Factors Influencing Gesture Form1

Irene Kimbara
Kushiro Public University, Japan
Gestures visually depict a speaker’s mental image of a referent. The visuo-spatial aspects of referents are mapped onto the gesture’s form. Especially during cartoon narrations, speakers are likely to map the locational information contained in the recalled stimulus onto their gestures. The present study investigated the strength of such recalled stimulus images by pitting them against another factor that also influences gesture form, interspeaker coordination. Members of dyads watched mirror image clips (the same video clips with reversed left-right orientation) and narrated the content collaboratively. The results revealed effects of both recalled images and coordination on gesture form.
1. Introduction
At the heart of McNeill's Growth Point (GP) theory lies the idea that gestures result from a language-imagery dialectic, a process in which two fundamentally different modes of thinking, the linguistic-categorical and the visuo-spatial, seek a balance (McNeill, 2005:92-94). One of the strengths of GP theory comes from its embrace of various factors that shape gesture output. McNeill argues that gesture derives its form in part from the imagery of objects and events obtained through visual experience (McNeill, 1992:12). A consequence of this is the consistency with which a referent's actual location is mapped onto gesture space (McCullough, 2005). For instance, in narrations elicited using a cartoon stimulus, an object originally located at the right of the screen in the cartoon tends to be gestured in right gesture space. Further, McNeill points out that the growth point contains information derived from social interaction (McNeill, 2005:151-163). Thus, gestures are also influenced by the interlocutors' seating position (Özyürek, 2002), by the way meanings are encoded by other conversational participants (Furuyama, 2000;
1 Amy Franklin, Fey Parrill, Karl-Eric McCullough, Mika Ishino, and Susan Duncan helped me conduct this research at various stages and gave me helpful comments and suggestions. I was also fortunate to have spent my graduate years under the guidance of David, whose everlasting intellectual curiosity and vigor as a researcher I always admire.
Tabensky, 2001; Kimbara, 2006), and by negotiations over meaning with other speakers (McNeill, 2003). My goal in this study is to contribute to our understanding of how these factors—both within and across speakers—affect gesture. I will focus on intraspeaker influences by examining the robustness of the mapping between the perceptual properties of objects and how they are depicted in gesture. I will focus on interspeaker influences by examining how this mapping is affected when other speakers' gestures present discrepant information during collaborative description.

2. Inducing Conflict between a Recalled Stimulus Image and a Second Speaker's Gestures
In the experiment, conflicts between speakers' recalled stimulus images and their co-participants' gestures were induced by collecting narrative data after each member of a dyad watched a different form of the same cartoon. That is, each watched a mirror image clip of what their co-participant saw, produced by reversing the left-right orientation of the original cartoon. With this manipulation, except for the relative location of objects and the orientation of motion events on the screen, the stimulus sets were identical (see Figure 1). In all, ten short stimulus clips, ranging from 4 to 39 seconds, were created from three cartoon episodes: one from a 'Sylvester and Tweety' cartoon (Canary Row, Warner Brothers, Inc.) and two from a 'Tom and Jerry' cartoon (Tom and Jerry, MGM). Participants were not informed about the mirror-image discrepancy between the clips they and their partners viewed. They were simply told that they would watch the same videos.

The assumption behind this experimental manipulation was that it created different mental representations, across the members of the dyads, of the motion trajectories of events. We then reasoned that, if the orientation of gestures derives from a recalled visual image through direct mapping, co-participants should gesture in opposite directions. If, alternatively, they monitor each other's gestures for directionality and coordinate their own gestures with those of their co-participants, the gestures of both members of a dyad should move in the same direction.2

Cartoon descriptions were collected under two conditions, one in which members of dyads described the content of the cartoon as a monologue ('monologue description') and the second in which they described the same content with their partners ('dyad description'). First, monologue descriptions were collected so that the influence of recalled stimulus images on speakers' own gestures could be examined without the influence of other speakers' gestures. Then, dyad descriptions were collected to examine how the mapping in the monologue descriptions was affected by interspeaker influence.
2 It is of special importance to note that orientations are rarely mentioned in speech; speakers rarely say, for example, The cat runs from the right and falls into a river on the left. This means that the probability of participants being made aware of the orientation of events through their co-participants' speech is very low, a factor which might otherwise confound the results of the study.
Figure 1: Original stimulus (on the left) and mirror image stimulus (on the right).
The experimental procedure was as follows. Upon arrival, the individual members of each dyad were led to separate rooms where each watched a single cartoon clip (either the original or mirror image version) on a computer monitor. A monologic description of the clip was then recorded in the same room, after the participant was instructed to be "casual but coherent," so that the description of what the participant had seen could be understood by someone who had not seen the animated clip. The members of the dyad then met in another room, where a dyadic, collaborative description was recorded. When disagreements arose about the content of the clip, participants were encouraged to resolve them through discussion. This entire procedure was repeated for all ten stimulus clips. At the completion of the experiment, participants were asked whether they could guess the goal of the experiment and whether they noticed anything unexpected or unusual during their participation. The experiment was conducted with English and Japanese native speakers, to determine whether speakers of typologically different languages, from different cultural groups, display different patterns. A total of eight dyads of native English speakers and eight dyads of native Japanese speakers participated. With the exception of one dyad, all were previously acquainted with one another.

3. Recalled Stimulus Images: Analysis of Monologic Descriptions
The monologic data were taken from a total of 316 descriptions, produced by all sixteen dyads: eight English-speaking3 and eight Japanese-speaking. The gesture data were coded as follows. The cartoon stimuli contain a total of 22 motion trajectories.4 Gestures representing these trajectories were identified and coded for their orientation: right, left, forward, and backward. When a speaker's hand moved diagonally, the gesture was coded as a combination of two of the primary orientations above, for example, "right-forward." Gestures were also categorized into three different groups according to the axis along which their
3 One dyad could not finish the last two clips because the experiment went overtime.
4 If a trajectory changes its orientation midway (e.g., when a character makes a turn), each path with its unique orientation was considered to be a separate trajectory.
trajectory was aligned relative to the speaker's body: the transverse axis (parallel to the body), the sagittal axis (perpendicular to the body), and the diagonal axis. Gestures were then coded with respect to McNeill's (1992) distinction between 'character viewpoint' (C-VPT) and 'observer viewpoint' (O-VPT) gestures. C-VPT gestures embody a speaker's identification with a character, with the speaker performing the character's action using his or her own body. O-VPT gestures depict the referent from an observer's perspective; for example, a mouse running to the right on the screen might be represented by a hand moving toward the speaker's right.

To determine the accuracy of the direct mapping between a referent's orientation and that of a corresponding gesture, O-VPT gestures were divided into those whose motion trajectory was in the same direction as the stimulus image ('ipsilateral' gestures) and those with motion trajectories in a direction opposite to the stimulus image ('contralateral' gestures). Because C-VPT gestures involve a shift in perspective from that of the speaker as an observer to that of the cartoon character and, thus, do not show direct mapping by definition, only O-VPT gestures were considered in this analysis. It is important to note that restricting the analysis to O-VPT gestures eliminated less than 25% of the data. That is, in both English and Japanese, O-VPT gestures were the most frequent, accounting for more than three quarters of all trajectory gestures (in English, Mean=81%; in Japanese, Mean=76%). C-VPT gestures were the second most frequent (in English, Mean=13%; in Japanese, Mean=15%); and there were only a small number of diagonal gestures (in English, Mean=6%; in Japanese, Mean=8%).
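To make the coding scheme concrete, here is a minimal sketch of the ipsilateral/contralateral classification step. It is an illustration only: the direction labels follow the scheme described above, but the function, its inputs, and the example data are hypothetical; the chapter's actual coding was done by hand from video.

```python
# Hypothetical sketch of the ipsilateral/contralateral coding of O-VPT
# trajectory gestures. Direction labels follow the chapter's scheme;
# the data are invented.

OPPOSITE = {"right": "left", "left": "right",
            "forward": "backward", "backward": "forward"}

def code_ovpt_gesture(stimulus_direction: str, gesture_direction: str) -> str:
    """Classify a gesture against the version of the clip its speaker saw.

    'ipsilateral'   = gesture moves in the same direction as the stimulus event
    'contralateral' = gesture moves in the opposite direction
    'other'         = sagittal or mixed movement
    """
    if gesture_direction == stimulus_direction:
        return "ipsilateral"
    if gesture_direction == OPPOSITE.get(stimulus_direction):
        return "contralateral"
    return "other"

print(code_ovpt_gesture("right", "right"))  # -> ipsilateral
print(code_ovpt_gesture("right", "left"))   # -> contralateral
```

Note that because each member of a dyad saw a different (mirrored) version of the clip, the stimulus direction is defined per speaker, relative to the version that speaker actually watched.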
Figure 2: Mean proportion of ipsilateral and contralateral gestures in monologue descriptions (O-VPT gestures only).
Figure 2 shows the mean proportion of ipsilateral and contralateral gestures in the monologic descriptions. The results strongly indicate that, in both language groups, visuo-spatial representations of stimulus trajectories robustly influence
gesture movement trajectories. The figure shows that on average, 98% (Std.Dev.=10%) of the gestures of English speakers and 94% (Std.Dev.=10%) of the gestures of Japanese speakers were ipsilateral with respect to the recalled stimulus image. Contralateral gestures were used on average in 3% (Std.Dev.=10%) and 6% (Std.Dev.=11%) of gestures in the English and Japanese data respectively. Paired t-tests confirmed the speakers' strong preference for a veridical mapping, both in the Japanese data (t(15) = 9.42, p < .001) and in the English data (t(15) = 8.08, p < .001).
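The comparison reported here is, in effect, a paired test of each speaker's ipsilateral proportion against his or her contralateral proportion (sixteen speakers per language group, hence fifteen degrees of freedom). A minimal sketch of such a test, using invented per-speaker proportions rather than the real data, and ignoring the small residue of 'other' gestures for simplicity:

```python
# Paired t-test of ipsilateral vs. contralateral O-VPT proportions.
# The sixteen per-speaker values are invented placeholders; only the
# structure of the test mirrors the analysis reported in the text.
import numpy as np
from scipy import stats

ipsilateral = np.array([1.0, 1.0, 0.9, 1.0, 1.0, 0.8, 1.0, 1.0,
                        1.0, 0.9, 1.0, 1.0, 0.7, 1.0, 1.0, 0.9])
contralateral = 1.0 - ipsilateral  # assumes no 'other' gestures

t, p = stats.ttest_rel(ipsilateral, contralateral)
print(f"t({len(ipsilateral) - 1}) = {t:.2f}, p = {p:.2g}")
```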
4. Gesture Coordination: Analysis of Dyadic Descriptions
Having seen that in monologic descriptions a speaker's gestures were strongly influenced by recalled images of cartoon episodes, I then examined how this tendency was affected by gestures produced by the other member of the dyad. This involved analysis of 158 dyadic, collaborative descriptions from the eight English-speaking and eight Japanese-speaking dyads. The focus of analysis, gesture coordination across speakers, was the frequency of orientation conflicts across coreferential gestures. Coreferential gestures are those produced by different members of a dyad that depict the same stimulus content.

First, gestures were entered into the database only if the same referent trajectory was depicted by both members of a dyad. In all, the dyadic descriptions included 131 interspeaker coreferential gesture pairs. Each gesture in the database was then categorized according to trajectory orientation, trajectory axis, and viewpoint, as for gestures in the monologic data (see section 3, above). In addition, gesture pairs were coded according to the temporal relationship and relative orientation and viewpoint that held across gestures in each pair. With respect to temporal relationships, a gesture pair was coded as occurring in 'overlap' if any part (that is, preparation, stroke, hold, or retraction phase) co-occurred for any interval of time (a sketch of this interval-based coding follows Table 1 below). The temporal relation between gestures was taken into consideration because of the assumed cognitive advantage it conferred in terms of later recall. In order for coordination to occur across speakers, speakers must not only recognize the meaning of each other's gestures but must also retain that information so that the same meaning can later be encoded. In this experiment, it was necessary for speakers to remember the orientation of their co-participants' trajectory gestures, to avoid contradictory representations of the same motion trajectory (e.g., one speaker's gesture moving to the left while that of the co-participant moved to the right). Memory load is assumed to be smallest when coreferential gestures are produced without a temporal gap. For this reason, the effect of interpersonal gesture coordination in terms of trajectory orientation was expected to manifest itself more clearly when gestures were produced in overlap. In all, 44% of the English speakers' gestures occurred in overlap, and 56% of the Japanese.

With respect to orientation and perspective, gesture pairs were grouped into three types, depending on whether both speakers used O-VPT gestures, and
whether there was an orientation conflict. Table 1 shows the distribution of the 130 coreferential gesture pairs among these three types (one case, where the first speaker produced a gesture contralateral to her stimulus image, was excluded). The large proportion of orientation conflicts (34% in overlapping and 63% in non-overlapping pairs) suggests that, even though their partners' gestures present radically different images, speakers still tended to gesture in accord with the images they retained from viewing the cartoon stimuli.

Table 1: Dyads' viewpoint and trajectory orientation for overlapping and non-overlapping interspeaker coreferential gesture pairs.

                                        Overlap (N=68)   Non-overlap (N=62)
                                        N (%)            N (%)
Orientation conflict (both O-VPT)       23 (34)          39 (63)
No orientation conflict (both O-VPT)     8 (12)           0 (0)
C-VPT used by at least 1 speaker        37 (54)          23 (37)
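As flagged above, here is a minimal sketch of the interval-based overlap coding. The timestamps and tuple layout are invented for illustration; the criterion itself (any co-occurring phase, for any interval of time) is the one stated in the text.

```python
# Overlap coding for a coreferential gesture pair. Each gesture is
# represented by its start and end time in seconds, spanning all phases
# from preparation through retraction. Times below are hypothetical.

def gestures_overlap(a: tuple, b: tuple) -> bool:
    """True if the two gestures share any interval of time."""
    a_start, a_end = a
    b_start, b_end = b
    return a_start < b_end and b_start < a_end

speaker_1 = (12.4, 14.1)
speaker_2 = (13.8, 15.0)
print(gestures_overlap(speaker_1, speaker_2))  # -> True
```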
When comparing overlapping and non-overlapping pairs, however, one can also see the effect of temporal relation on interpersonal gesture coordination. The rate of conflicting gestures decreased by approximately half when coreferential gestures appeared in overlap. Coreferential, non-overlapping gestures had conflicting orientations in 63% [39/62] of the cases, but in only 34% [23/68] of the cases when gestures overlapped. An analysis of variance was performed, with temporal relation (Overlap and Non-overlap) as the independent variable and orientation type as the dependent variable (a sketch of this analysis appears at the end of this section). The results revealed a significant main effect (F(1, 128) = 16.45, p < .001). In contrast, the difference between the two languages was not significant (F(1, 128) = 0.69, p = .41, n.s.), indicating that speakers of these typologically and culturally different language groups behaved similarly in terms of how their recalled stimulus image influenced their gestures during collaborative descriptions.

In many cases, it appeared that speakers avoided discrepant representations of a unique event by employing a viewpoint that did not bring out the conflict. As shown in Table 1, dyads produced more C-VPT gestures during overlap (54% of overlapping pairs involved a C-VPT gesture by at least one speaker, compared to 37% of non-overlapping pairs). This suggests that when gestures co-occurred, the speakers became more sensitive to each other's gestures, and tried to reach a compromise between their recalled stimulus image and their partner's gesture by adopting a viewpoint which lacked relevant information regarding left-right oppositions.

Interpersonal coordination also characterizes the eight instances of O-VPT gesture pairs in which both speakers gestured along the transverse axis (parallel to the body) and yet shared a common orientation for their gesture's trajectory. In the elicitation context, this meant that gestural coordination was occurring at the expense of one speaker's underlying (left→right or right→left) image. Eight instances is a small number relative to the size of the whole dataset (131 coreferential gesture pairs); so small, in fact, as to suggest that what seems like
coordination on the surface may well be due to memory failure, instead. Crucially, however, all of these instances were found among overlapping gestures. Given that temporal proximity likely correlates with the degree to which speakers attend to each other's gestures, this makes a strong case for the view that these instances were not products of memory failure. Thus, though small in number, these instances are particularly significant because they show the degree to which the factor of coordination influences gesture output, overriding the effect of the recalled stimulus image.
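The ANOVA referred to above can be sketched as a one-way comparison of a binary conflict coding across the two temporal-relation groups. The sketch below is one plausible reading of that analysis, with invented 0/1 codings whose group sizes (68 overlapping, 62 non-overlapping pairs) follow Table 1; the chapter's exact coding of the dependent variable is not reproduced here.

```python
# One-way ANOVA on orientation conflict (1 = conflict, 0 = none),
# grouped by temporal relation. The simulated codings roughly match
# the reported conflict rates (~34% vs. ~63%); the data are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
overlap = (rng.random(68) < 0.34).astype(int)
non_overlap = (rng.random(62) < 0.63).astype(int)

f, p = stats.f_oneway(overlap, non_overlap)
df_error = len(overlap) + len(non_overlap) - 2
print(f"F(1, {df_error}) = {f:.2f}, p = {p:.2g}")
```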
5. Conclusion
In the present experiment, as far as the O-VPT gestures are concerned, the vast majority of trajectory gestures from the monologic descriptions consistently reflected the actual directionality of events on the screen during cartoon stimulus presentation, in both English and Japanese. That is, in most cases, when the left-right axis was used, the orientation of the stimulus motion trajectory events mapped directly onto that of the corresponding gestures. Moreover, the data from dyadic descriptions revealed that gestures are strongly influenced by recalled stimulus images as well, even when co-participants' gestures present incompatible information.

Why is the recalled stimulus image so powerful? One might wonder if participants allowed conflicting gestures to pass without comment despite their awareness of the discrepancy. Comments in debriefing sessions, however, indicated that the participants were unaware of the experimental manipulation. None expressed suspicion that the clips were presented in a mirror image to their co-participants. The answer was always negative when participants were asked directly if they noticed that their partner's gestures frequently went in a different direction from their own. In this regard, I point out that, in a few rare cases where the dyads were inadvertently alerted to the possibility that they had watched mirror image clips, they became aware of the discrepancy through their descriptions in speech, but not in gesture. For example, one speaker explicitly mentioned the position of an object in relation to the screen, saying sono gamen no hidari hashi ni wa chuubu ga atte, "there is a tube at the left end of the screen." However, the speakers' suspicions in this and other cases were not pursued further in the subsequent descriptions, even though their gestures presented ample evidence, right in front of their eyes, that such suspicions were justifiable. Instead, the suspicions were resolved by attributing them to one speaker's failure to memorize accurately.

An alternative and perhaps more likely account of the robustness of the recalled stimulus image is that the semantic domain of motion trajectory orientation is particularly resistant to interpretive processes. Gesture coordination relies on perception of the other speaker's gestures, followed by semantic decoding by the perceiver. For speakers to be aware of a conflict in orientation, the perceived gesture and the speaker's own recalled stimulus image of the
corresponding referent need to be compared. Previous studies (Krauss et al., 1991; Beattie & Shovelton, 1999a; 1999b) showed that gestures encoding different semantic information are recognized and decoded by listeners to varying degrees.

Despite the robustness of the recalled stimulus image, however, the present analysis of the rate of representational conflicts indicates that the temporal proximity between coreferential gestures affects how speakers perceive each other's gestures and how likely gestures are to be coordinated across speakers. In the present experiment, the effect of coordination materialized more often in contexts where both speakers entertained a visual image of the same referent at the same time, when formulating their gestures. More specifically, in such contexts, some speakers chose to adopt their partner's gesture orientation while other speakers shifted their perspectives from O-VPT to C-VPT, as a means of overcoming a problem induced experimentally with the use of mirror-image stimulus clips. Since the participants were not informed about the mirror-image discrepancy, collaboratively constructing a consistent description of trajectory events through gesture required some speakers to adjust their gestures in accordance with those of their co-participants.

To summarize, the present study tested the interpersonal effect of gesture coordination by putting it at odds with the intrapsychological factor of the recalled stimulus image derived from the speaker's own perceptual experience. The findings supported one of the central theses of GP theory, that gestures are generated by both intrapsychological and interpsychological inputs.

References

Beattie, G. & Shovelton, H. (1999a). Do iconic hand gestures really contribute anything to the semantic information conveyed by speech?: An experimental investigation. Semiotica, 123, 1-30.
Beattie, G. & Shovelton, H. (1999b). Mapping the range of information contained in the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social Psychology, 18, 438-462.
Furuyama, N. (2000). Gestural interaction between the instructor and the learner in origami instruction. In D. McNeill (Ed.), Language and gesture (pp. 99-117). Cambridge: Cambridge University Press.
Kimbara, I. (2006). On gestural mimicry. Gesture, 6(1), 39-61.
Krauss, R. M., Morrel-Samuels, P., & Colasante, C. (1991). Do conversational hand gestures communicate? Journal of Personality and Social Psychology, 61, 743-754.
McCullough, K.-E. (2005). Using gestures during speaking: Self-generating indexical fields. Unpublished doctoral dissertation, University of Chicago, Illinois.
McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.
McNeill, D. (2003). Pointing and morality in Chicago. In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet (pp. 293-306). Mahwah, NJ: Erlbaum.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Özyürek, A. (2002). Do speakers design their co-speech gestures for their addressees?: The effects of addressee location on representational gestures. Journal of Memory and Language, 46, 688-704.
Tabensky, A. (2001). Gesture and speech rephrasings in conversation. Gesture, 1(2), 213-235.
The Role of Iconic Gesture in Semantic Communication and Its Theoretical and Practical Implications

Geoffrey Beattie and Heather Shovelton
The University of Manchester
David McNeill has claimed that, “[u]tterances possess two sides, only one of which is speech; the other is imagery, actional and visuo-spatial.” One implication of this claim is that listeners should receive considerable amounts of information from the gestures that accompany talk. This chapter reviews the experimental evidence to test this hypothesis. The basic paradigm involves a comparison of the information listeners receive when they hear speech and see accompanying gestures compared with just hearing speech. This programme of research provides conclusive evidence that gestures do communicate. The chapter also tests the implications of McNeill’s theory for the design of TV adverts and provides evidence that the inclusion of gestures in these adverts is a most effective way of communicating information. The overall conclusion is that the research of McNeill has major implications for our conceptualisation of human communication as well as all attempts to make communication more effective.
1. Introduction
There is something quite compelling about watching someone speak. The involvement, the animation and the movement of the speaker all draw the eye as well as the ear of the listener. Thomas Mann (1924) wrote in The Magic Mountain that, "[s]peech is civilization itself. The word, even the most contradictory word preserves contact—it is silence which isolates." But it is talk, in our view, that preserves this contact; it is speech and all of the accompanying, intimately-timed hand and arm movements that bind the senses of the listener and draw us to the other. Some scholars have recognized the intimacy of this connection and have attempted to understand how speech and the movement of the hands and arms link together or can be linked together for rhetorical effect (Quintilian, 100/1902; Bacon, 1605/1952; Bulwer, 1644/1974; Austin, 1806/1966; Tylor, 1878), but the majority of scholars working in psychology, linguistics and semiotics have aimed to keep speech and movement apart on conceptual and functional grounds, following Wundt (1921/1973) and others. Language and speech, it is assumed, are primarily concerned with propositional thought and the communication of semantic information about the world, whereas the movements of the body, changes in facial expression and posture, and hand and arm movements are
assumed to communicate emotional information and form the basis of the social processes through which interpersonal relationships are established, developed and maintained (see Argyle, 1972; but see also Beattie, 1983; 2003, Chapter 2). But what was most striking to those who analysed speakers in action, in sufficient detail, was how close the connection really was between what was said in speech and the patterns of the accompanying hand movements. These hand movements often appear to 'illustrate' what is being said through a series of images. In their early taxonomy of nonverbal behaviour, Ekman and Friesen duly called these hand movements 'illustrators', the hands apparently floating over the surface of communication somehow drawing 'pictures', simultaneously representing what was being discussed in the speech. "Some illustrators could be considered communicative," they wrote (Ekman & Friesen, 1969, our italics), but there was no real detail and no hard evidence presented for the claim. 'Some' can mean 'a few' or it can mean 'a lot'; the extent of the communicative power of these movements was left hanging, but the actual quote itself seemed, somehow, to downplay the role of this type of hand movement in communication.
1.1 A new model of human communication
But the research of David McNeill (1979; 1985; 1992; 2000) changed how we thought about these movements; it gave them new theoretical import, and it provided them with a central role in human communication. In his view, they were as significant in the communication of thoughts and ideas as the speech itself. He argued that these 'illustrators', which he termed 'iconic' gestures (or 'metaphoric' for the more abstract), do not merely illustrate what is in the speech. Rather, they cooperate centrally with the speech to convey propositional or semantic information and, "we should regard the gesture and the spoken utterance as different sides of a single underlying mental process" (1992:1). To support such a radical idea, McNeill used examples like the following, where an adult male is describing how a cartoon character bends a tree back to the ground:
(1) 'and he [bends it way back]'1
    Hand appears to grip something and pull it back and down
McNeill pointed out that, "[i]n bending back with a single hand, the gesture displayed what was, at most, only implicit in speech—that the object bent back was fastened at one end. 'Bends it' could equally describe bending an object held at both ends" (2000:7). His theoretical conclusion was that, "Utterances possess two sides, only one of which is speech; the other is imagery, actional and visuospatial. To exclude the gesture side, as has been traditional, is tantamount to ignoring half of the message out of the brain" (2000:139). The challenge to the

1 The boundaries of the meaningful part of the gesture (the so-called 'stroke' phase of the gesture) are shown by enclosing the concurrent segments of speech in square brackets.
disciplines of psychology, linguistics, semiotics and micro-sociology was immense. They had, in effect, only been working with half the picture when it came to understanding how ideas were communicated.

McNeill's primary theoretical focus has been on the production of gesture, but his theory does have obvious and important implications for how iconic gestures might work in social interaction. If half of the messages out of the brain are represented in iconic gestures then presumably listeners may well be responsive to them. But what evidence is there that 'listeners', watching now as well as actually listening, pick up on the information encoded in these iconic gestures and somehow combine this with the information encoded in the speech itself? Unfortunately, no researcher had experimentally and conclusively demonstrated that listeners extract the information contained within naturally-occurring iconic gestures to combine with the information extracted from the speech modality. This could be seen as a major shortcoming of a theory which maintains that utterances have two sides and that gestures, like speech itself, are a core part of the communication of thoughts and ideas. What McNeill did present as evidence were interpretations of speech-gesture combinations, little bundles of behaviour, with a 'reading' of what was in the speech and a 'reading' of what was in the gesture. The example above is a good demonstration of the basic approach. The speech segment 'and he bends it way back' with its accompanying iconic gesture is interpreted by the analyst, but is never presented as an utterance to other individuals to determine what extra information they actually receive from the gesture. The assumption a reader might make is that other individuals would be able to extract the information that is encoded in the gesture, in real time and in the context of the general flux of behaviour, just like the analyst, and somehow combine it with the information encoded in the speech modality. But this, of course, is an enormous assumption.

So what empirical evidence is there that iconic gestures do contribute to semantic communication? McNeill himself carried out very few experiments to determine how listeners process the information contained within speech and within gestures. He did demonstrate that staged gesture-speech combinations, in which the gesture and the speech did not match, were combined by listeners in their memory of the event (McNeill, Cassell & McCullough, 1994). For example, a narrator, describing a Sylvester and Tweety cartoon, said, "and he came out the pipe" (a statement without any depiction of manner) whilst performing an up-and-down bouncing gestural movement (a gesture specifically depicting manner) at the same time. This utterance was recalled by listeners as the character emerging from the pipe in a particular bouncing manner. McNeill (McNeill et al., 1994) argued that this process of resolution was done quite unconsciously by listeners, and it occurred with a range of semantic features. But this experiment was based solely on staged combinations, by definition gesture-speech combinations that are odd and somewhat attention-grabbing (even if the resolution was done unconsciously). How do listeners deal with information in the gestural and speech modalities when they occur naturally, when the combinations are more mundane and ordinary, where they do not draw attention
to themselves through their strangeness? There may be a variety of both theoretical and practical reasons why listeners are unable to use the information from both natural sources. They may not, for example, be attuned to the more mundane iconic gesture, except when there is a mismatch; or the information contained within the natural gesture might be too vague; or the information contained within the gesture might be too hard to interpret in real time; or the listeners might be overcome by the sheer complexity of combining information from linguistic and non-linguistic sources.

A number of other psychologists had attempted to determine if iconic gestures are communicative, but they had all used, in our opinion, unsuitable methods. For example, Krauss, Morrel-Samuels & Colasante (1991) investigated whether people could match gestures with the actual words that they accompanied in talk, and they found that people were not very good at doing this. Their conclusion was that iconic gestures cannot be communicative because the relationship of gesture to speech is "relatively imprecise" and "unreliable." However, this study only considered individuals' ability to connect together the speech and the gesture; it only considered the relationship between the two modalities. But that is not the critical issue. The critical issue for interpersonal communication is the relationship between iconic gestures and the 'world out there' being talked about (and of course, the relationship between the speech and the 'world out there'); the relationship between the two modalities is very much secondary. Similarly, Hadar (2001) asked participants to choose the word that best described the meaning of a gesture clip shown to them. They could do this at above chance level, but in Hadar's opinion the level was too low (40%, with chance being 20%, in a forced choice condition with five questions) to suggest that iconic gestures could have a major communicative effect. Hadar concluded that, "although the shaping of gestures is clearly related to the conceptual and semantic aspects of the accompanying speech, gestures cannot be interpreted well by naïve listeners" (p. 294). The implication is that whatever gestures do, they are not communicating core information in everyday conversation. (Hadar thinks that what they actually do is to assist the speaker in encoding language, particularly in aiding lexical access from the mental lexicon; see Butterworth and Beattie, 1978; Butterworth & Hadar, 1989; Hadar & Butterworth, 1997; but see Beattie & Coughlan, 1998; 1999; Beattie & Shovelton, 2000; 2002a.)

But we would want to argue that the approaches of Krauss and Hadar are in principle unable to answer the question as to the possible communicational function of iconic gestures. If gestures are designed to communicate then they should provide critical information about the semantic domain to be encoded, rather than about the accompanying speech. Whether participants in an experiment can match up the speech and the accompanying iconic gestures will depend upon a number of other factors, the most obvious of which is, of course, whether the gestures are essentially 'co-expressive' in their mode of operation, i.e. expressing roughly the same information as in the speech, or 'complementary',
i.e. expressing information that is not contained within the speech. Iconic gestures show both types of relationship with speech in everyday talk, but co-expressive gestures, we would hypothesise, would be significantly easier to match with their accompanying speech. But regardless of whether co-expressive or complementary gestures are used, this is not the best approach to this issue.

1.2 Experimental tests of the basic proposition
We wanted a different sort of test of the theoretical idea derivable from McNeill that iconic gestures contribute to semantic communication. In our first study (Beattie & Shovelton, 1998; 1999a), we video-recorded participants narrating cartoon stories and then played either short speech segments or the gesture-speech combinations (the short speech segments plus their accompanying gesture) to two sets of participants, who were then questioned about details of the original stories. For example, some participants just heard:
(2) Billy going sliding along and causing all sorts of mayhem
The other set of participants were presented with the following gesture-speech combination:
(3) Billy going [sliding along] and causing all sorts of mayhem
    Fingers of left hand are straight and close together, palm is pointing downwards. Hand makes a rapid movement to the left.
The speech segments varied in length, but all were longer than individual clauses. In this experiment, we studied all of the 34 iconic gestures that were generated by seven encoders. In other words, there was no quality filter applied to exclude any gestures from the decoding part of the study, for example on the grounds that the gestures were too 'vague' or too 'quick'. We wanted a truly representative sample of the kinds of gestures found in everyday talk.

We then used 60 decoders—30 decoders just heard the speech segments, and another 30 were presented with the gesture-speech combinations. After each extract was presented the decoders were asked two questions relating to what was happening in the original cartoon. We generated the questions on the basis of what we, as experimenters, thought might be being depicted in the gesture. For example, in the extract above we thought that the iconic gesture could have communicated something about the direction of movement and perhaps also something about the speed of the movement. So we asked two questions after each extract was presented; the questions were to be answered by a simple 'yes' or 'no' to make the scoring completely unambiguous: (i) 'Does Billy slide to his left?' and (ii) 'Does Billy slide slowly?' We asked questions like: 'Does the table move in a circular motion as it is rising?', 'Is the net very low down?', 'Is the pole very large in relation to the table?' Thirty-one of these questions related primarily to properties of actions and
thirty-seven related to properties of objects. They covered such things as the 'identity', 'size', 'shape', 'number' and 'relative position' of any people or objects discussed; also the 'nature', 'speed' and 'direction' of any action and whether the action involved 'upward movement', 'rotation' or 'contact', as well as the 'location' of any action. The highest possible mean score for each participant for each segment was 2.00, corresponding to getting both questions correct; the score expected by chance in answering two yes/no questions was, of course, 1.00. We found that those participants who were presented with the gesture-speech combinations got an average of 1.67 questions correct, whereas those who heard only the speech extracts got 1.42 questions correct. The overall percentage correct was 83.5% for gesture-speech combinations and 71.0% for speech-only segments. In other words, those participants presented with gesture-speech combinations got significantly more information about the original story than those who only heard the speech. This, in our minds, was an important finding. Taking into account the chance level of guessing, our decoders received an extra 0.25 questions correct from the gestures, on top of the 0.42 that speech alone contributed beyond chance (Beattie & Shovelton, 1999a); in other words, 0.25/0.42, or approximately 60% more information (a worked version of this calculation appears after the question list below). These iconic gestures, in other words, are crucial to the overall message and in purely quantitative terms carry over half as much again as the verbal part of the message. The iconic gestures also seem to carry information about a range of different semantic features including the 'speed' and 'direction' of the action, whether the action involved 'rotation' or 'upward movement', the 'relative position' of the people and objects depicted, and the 'size' and 'shape' of the people and the objects depicted. 'Relative position' and 'size' emerged as the most robust effects, mainly because they seemed to be the primary dimensions (as evidenced by the actual questions generated) encoded by our particular sample of gestures. Clearly, however, these results were very much in line with the view that if you want to get the full message you need to take both the iconic gesture and the speech into consideration. Those who either fail to notice or ignore iconic gestures are clearly missing a source of much potential information.

The next study (Beattie & Shovelton, 1999b) tried to be more precise about exactly what information listeners pick up from the iconic gestures that accompany speech. The first study only asked two questions about each gesture, but there was always the possibility that each gesture contained a good deal more information than we were attempting to measure. So in this second study, after each participant heard just the speech, saw just the gesture on its own, or was presented with the gesture-speech combination, we asked eight general questions. These explored fourteen relevant types of information that we thought the iconic gestures were associated with.

1. What object(s) are identified here? (identity).
2. What are the object(s) doing? (description of action, manner).
3. What shape are the object(s)? (shape).
4. How big are each of the object(s) identified? (size).
5. Are any object(s) moving? (movement).
6. If so in what direction are they moving? (direction, rotation, upward movement).
7. And at what speed are they moving? (speed).
8. What is the position of the [moving/stationary] object(s) relative to something else? (relative position, location of action, orientation, contact).
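As flagged above, the first study's headline figure can be reconstructed in a few lines. This is simply the arithmetic from the text restated; the variable names are ours:

```python
# Worked version of the first study's information-gain calculation.
# Scores are mean questions correct out of 2.00; the score expected
# by chance on two yes/no questions is 1.00.
chance = 1.00
speech_only = 1.42
gesture_speech = 1.67

speech_above_chance = speech_only - chance        # 0.42
added_by_gesture = gesture_speech - speech_only   # 0.25
print(f"{added_by_gesture / speech_above_chance:.0%}")  # -> 60%
```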
Participants in this experiment were presented with eighteen clips (six containing only speech, six containing only gestures, and six containing gesture-speech combinations, all randomised), which were all now restricted to clause-length units on the basis that iconic gestures rarely cross clause boundaries. (This restriction allowed a greater degree of control over the size of the speech segment, which did vary in the first study.) After the clip was played the experimenter asked the participant half of the questions; the same clip was then played again, and the remaining questions were asked. These interviews, it should be added, were very intensive and lasted up to two hours each. Each participant also had to give a confidence rating on each of their judgments on a scale from 1 to 3, where 1 meant 'not confident', 2 meant 'moderately confident', and 3 meant 'very confident'.

This experiment again found that when participants were presented with the gesture-speech combination they were significantly better at answering questions about the original cartoon story than when they heard just the speech extracts. We estimated the mean percentage accuracy for the gesture-speech combinations to be 62.1%, for the speech only to be 51.3% and for iconic gestures on their own to be 20.4% (averaging across all of the different semantic categories). The estimate here of how much information the iconic gesture adds to the speech is much lower than in the first study because all semantic dimensions were considered for every gesture (and not just the semantic dimensions to which the gestures seem to be particularly geared).

There was another very important observation in this study. McNeill has always argued that iconic gestures convey meaning in a 'top-down' rather than a 'bottom-up' fashion; that is, you need to have some understanding of the overall image portrayed in the hand movement before you can understand what the component actions are representing. He says that, "[t]he wiggling fingers mean running only because we know that the gesture, as a whole, depicts someone running." But we found that an iconic gesture can convey the speed of movement, the direction of movement, and also information about the size of the entity depicted in the gesture, even when people watching the iconic gesture in isolation could not determine exactly what the entity being depicted actually was. You only had to know that something was sliding along in a particular direction, and at a certain speed, to get certain questions correct, but you didn't have to be able to say what that something actually was. So iconic gestures may operate in a 'top-down' fashion, but that does not mean that you have to get the full meaning at the highest level before any information is transmitted via the gesture. The meaning of the gesture may still be global in one sense, with the meaning of the individual parts
determined by the meaning of the gesture as a whole, but the process of decoding can operate even when there is considerable ambiguity at the highest level.

One of the most extraordinary results in this experiment concerns the performance of individual participants. Although all the participants gleaned some additional meaning from the gestures, the percentage increase in accuracy in moving from the speech only to the gesture-speech combinations ranged from 0.9% to 27.6%. We also found that the ability to interpret iconic gestures on their own correlated with the ability to interpret them when they were presented alongside speech. Why some participants are particularly good at extracting information from iconic gestures clearly requires further investigation.

This study suggested that McNeill had perhaps, if anything, underestimated the communicative power of iconic gestures. For example, consider:
(4) [she's eating the food]
Fingers on left hand are close together, palm is facing body, and thumb is directly behind index finger. Hand moves from waist-level towards mouth.
Using McNeill's line of argument, one might conclude that the sentence conveys the action involved ('eating'), but not how it is accomplished. The iconic gesture is critical to communication here because it shows the method of eating: bringing the food to the mouth with the hand. McNeill would also presumably point out that the sentence in the example above is well formed, and that the gesture therefore cannot be considered a repair or some other transformation of the sentence. The speech and gesture appear to cooperate to present a single cognitive representation.
But unlike McNeill, we determined empirically what information people actually received from the iconic gesture, by interviewing three sets of participants: those who saw the gesture together with the speech, those who saw the gesture without the speech, and those who did not see the gesture but just heard the speech. What we discovered was that they received a wider range of additional information than McNeill's typical argument would suggest. For example, all four participants who saw the gesture, in addition to hearing the speech, knew that the food was moving towards the mouth (our category of 'relative position') in the original cartoon, whereas only one out of three participants who did not see the gesture reported this. The other two thought that the food was 'below the character', presumably on a plate. Without hearing the speech (gesture-only), one out of three participants got the 'description of action' right. All four participants who watched the gesture-speech combination got the 'direction' of the movement correct: the food was being drawn upwards towards the mouth (only one out of three participants in the speech-only condition got this right).
Consider now a second example:
(5) [by squeezing his nose]
Fingers on left hand are quite straight and only slightly apart; thumb is pointing away from the fingers. Fingers and thumb then move further away from each other before moving towards each other so that hand becomes closed.
Here the sentence conveys the action involved ('squeezing') and the object involved ('nose'). When participants saw the gesture-speech combination or just heard the speech, they reported this information correctly. However, the gesture, when added to the speech, seemed to convey information about the 'shape' of the nose (oblong) being squeezed. It conveyed information about the 'relative position' of the nose with respect to the hand that is squeezing it. The gesture also conveyed information about the 'size' of the nose and, to a much lesser extent, the 'speed' of the movement.
This study also showed that the semantic dimensions 'relative position' of people and objects and 'size' of people and objects depicted were significant right across this particular sample of gestures. With respect to these particular types of information, it was also found that in the gesture-only condition participants were significantly more confident that the answers they were giving were correct than they were when answering questions about 'identity', 'description of action', 'shape', 'movement', 'direction' and 'speed'. It is not just that participants were receiving more information in these particular categories right across the board; they also knew that they were.
But which iconic gestures are most communicative? There is one fundamental property of gestures that was not considered in these early studies but which may well be critical, and that is the 'viewpoint' from which the gesture is generated. McNeill (1992) points out that two different viewpoints appear in the gestures people perform during narratives: observer-viewpoint and character-viewpoint. An observer-viewpoint gesture "excludes the speaker's body from the gesture space and his hands play the part of the character as a whole." For character-viewpoint gestures, by contrast, "we feel that the narrator is inside the story": such a gesture "incorporates the speaker's body into the gesture space", and the speaker's hands may represent the hands (paws, etc.) of the character (McNeill, 1992). McNeill's research has suggested that character-viewpoint gestures are strongly associated with transitive verbs (e.g., 'he hit the ball') and observer-viewpoint gestures with intransitive verbs (e.g., 'she is jumping').
In this experiment (Beattie & Shovelton, 2002b) we asked twenty-one participants to narrate a number of cartoon stories. In this task they displayed a total of 513 identifiable hand and arm movements, 103 of which were identified as iconic gestures. Thirty of these were selected for presentation to a set of participants. The only quality filter we used was that the gesture had to be in shot throughout and could not overlap in content with any other gesture (again, we wanted to include both apparently vague and very quick gestures, if these were the sorts of gestures spontaneously generated in this situation). Of the thirty iconic gestures we selected, fifteen were generated from a character-viewpoint and fifteen from an observer-viewpoint. The speech sample to be played to participants was restricted to the clausal unit in the immediate vicinity of the gesture (however, one of the gestures did cross a clause boundary, so in this case a slightly larger speech unit was used). It was found that all of the character-viewpoint gestures in our corpus were associated with transitive clauses and all of the observer-viewpoint gestures with
intransitive clauses.
These gestures, produced from different viewpoints, were randomly ordered onto the presentation tape. Each gesture, without its corresponding speech, was played twice, and then the participants had thirty seconds to write down their answer to the following question: "Please give as much information as possible about any actions and any objects depicted in the following gesture." Again we used eight broad semantic categories to break the meaning down into its parts, namely 'identity', 'description of action', 'shape', 'size', 'movement', 'direction', 'speed' and 'relative position'.
This experiment revealed that iconic gestures shown in isolation from speech and generated from a character-viewpoint were significantly more communicative than those generated from an observer-viewpoint (mean accuracy scores 18.8% and 10.8%, respectively). For example, when seeing the gesture 'Left hand moves quickly upwards; hand closes and a sharp downwards movement is made' (in the absence of the speech segment, "by [pulling on his tie]"), one participant wrote, "Somebody is grabbing hold of a rope with their hand." In this particular case the character-viewpoint gesture was scored as having conveyed information to this participant about the 'relative position' of the physical entities involved (i.e., the hand being wrapped around something) and the fact that 'movement' was occurring. Although none of the other semantic categories were scored as correct in the case of this participant, many other participants did extract a good deal more information from this same gesture (including the 'speed' and 'direction' of the action, the 'size' and 'shape' of the object involved, as well as the 'relative position' of the physical entities depicted in the gesture).
When the eight semantic categories were considered in detail in the analysis, it was found that 'relative position' was the category communicated most effectively by character-viewpoint gestures in comparison with observer-viewpoint gestures. Character-viewpoint gestures seem to be particularly good at conveying this category because they can directly show the position of something in relation to the actor's body, the actor's body being central to the generation of a character-viewpoint gesture. The actor's body can act as a point of reference, which is not the case with observer-viewpoint gestures, where the actor's body is necessarily absent. Indeed, the character-viewpoint gestures in the present study tended to involve 'relative position' information that fell into a particular sub-category, namely the position of the entity with respect to a particular part of another entity, and character-viewpoint gestures made up 86.6% of this sub-category. Observer-viewpoint gestures tended to contain 'relative position' information that fell into the following two sub-categories: the position of a moving entity with respect to a fixed entity (83.3% of these were observer-viewpoint gestures) and the position of a fixed entity with respect to another fixed entity (100% of these were observer-viewpoint gestures).
This experiment uncovered how a fundamental property of iconic gesture, namely the viewpoint from which it is generated, relates to its communicative effectiveness. However, something else was observed in the current study, which
may have significant implications for our understanding of how iconic gestures work in everyday talk. As mentioned earlier, McNeill (1992) had proposed that character-viewpoint gestures tend to be strongly associated with transitive clauses and observer-viewpoint gestures with intransitive clauses. We found in this experiment that there was a significant tendency for the participants to propose transitive structures (e.g., 'he's flicking a coin') in their answers after viewing character-viewpoint gestures, and these structures occurred even if the participants could not identify the specific entity involved (e.g., "he's flicking something," or, "an object is being flicked"). On the other hand, there was a significant tendency for participants to propose non-transitive answers, either involving intransitive structures or partial answers about the 'identity' of objects (e.g., "something that is long, thin and smooth"), after viewing observer-viewpoint gestures. This suggests that character-viewpoint gestures convey not only significant semantic information (particularly about the 'relative position' and, somewhat less reliably, the 'size' of the actual entities involved in the event described) but also information about the syntactic structure of the clause. The transitivity of the clause in the linguistic channel, in other words, seems to be partially signaled by the accompanying gesture.
Of course, this last experiment had its own particular limitations. It did not try to assess the power of gestures generated from different viewpoints when presented alongside speech, but only when presented in isolation from speech. We therefore added two new conditions (Beattie & Shovelton, 2001): the same thirty iconic gestures were used as in the previous experiment, and either they were shown in combination with the speech that they accompanied or the speech extracts were played on their own. For example:
(6) she starts [spewing bubbles]
Fingers on both hands point towards mouth area then point upwards away from mouth.
After viewing the above character-viewpoint gesture presented with speech, one participant wrote, "Somebody begins to spew bubbles out of their mouth and the bubbles move upwards away from their mouth." Here the gesture-speech combination was scored as having conveyed information to this participant about the categories 'identity' (bubbles), 'description of action' (spewing), 'shape' (round), 'movement' (yes), 'direction' (upwards), and 'relative position' (moving away from the mouth). No information was provided by the participant about the 'speed' at which the bubbles were moving or about the 'size' of the bubbles.
This experiment found that the overall mean accuracy score where participants could see the gestures in addition to hearing the speech was 56.8%, whereas in the speech-only condition it was 48.6%. Therefore, again, there was a significant increase in the information obtained about the semantic properties of the original cartoon when the gestures were added to the speech. The overall percentage increase from the speech-only condition to gesture-speech combinations was 10.6% for character-viewpoint gestures, but only about half that (5.7%) for observer-viewpoint gestures. Statistical tests revealed that character-viewpoint
gestures and observer-viewpoint gestures both added a significant amount of information to speech, but character-viewpoint gestures added significantly more. This experiment again demonstrated that iconic gestures contain significant amounts of information and that character-viewpoint gestures were more communicative than observer-viewpoint gestures when they were displayed alongside speech. Character-viewpoint gestures were again particularly good at conveying information about the semantic category 'relative position'. (Verbal clauses associated with character-viewpoint gestures seem to be particularly poor at conveying 'relative position' information, but the accompanying gestures more than make up for this.) This study also found that semantic features such as 'size', 'identity', 'movement', 'direction' and 'description of action' were communicated more effectively by character-viewpoint gestures than by observer-viewpoint gestures.
However, despite the overall communicational advantage of character-viewpoint gestures, observer-viewpoint gestures were actually better at communicating additional information about 'speed' and 'shape'. This might be because observer-viewpoint gestures can show 'speed' relative to a stationary observer, and they enable 'shape' to be mapped out with the hands, as if an observer were directly looking at something. The categories 'speed' and 'shape' did not reach overall statistical significance, however, because although some observer-viewpoint gestures were very effective at communicating information about these categories, their effectiveness was not consistent across all observer-viewpoint gestures.
A detailed analysis of the data reveals that there are some gestures that are highly communicative in the absence of speech, but once speech is added their contribution to the communication of meaning becomes almost redundant. In addition, there are gestures that do not communicate in the absence of speech but do communicate effectively once the speech has signaled the current theme being articulated. There are also some gestures that are consistently effective in terms of communication in both situations (and others that are consistently ineffective in both). To make this a little clearer, we rank-ordered the communicative effectiveness of each gesture on its own and in terms of what it added to speech. We found that five gestures were good communicators in the gesture-only condition but poor communicators when they were added to speech; three of these gestures were produced from an observer-viewpoint. There were also five gestures that were poor communicators in the gesture-only condition but good communicators when they were added to speech; again, three of these were produced from an observer-viewpoint. There were ten gestures that were good communicators in both conditions, and seven of these were produced from a character-viewpoint. (In the final cell there were ten gestures that were poor communicators in both conditions, six of these produced from an observer-viewpoint.)
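The rank-ordering step lends itself to a simple cross-tabulation. The chapter does not state the cutoff that separated 'good' from 'poor' communicators; a median split over each effectiveness score is one natural reading, and the sketch below (all names hypothetical) is a reconstruction rather than the authors' procedure. It reproduces only the shape of the analysis: each of the thirty gestures is assigned to one of four cells.

```python
# Hypothetical reconstruction of the good/poor cross-tabulation.
# 'alone' and 'added' hold one communicativeness score per gesture:
# accuracy in the gesture-only condition, and the gain over speech-only
# when the gesture accompanies speech. The median-split cutoff is an
# assumption; the chapter reports only the resulting 2x2 cell counts
# (5, 5, 10 and 10 gestures).
from statistics import median
from collections import Counter

def cross_tabulate(alone, added):
    cut_alone, cut_added = median(alone), median(added)
    cells = Counter()
    for a, b in zip(alone, added):
        cells[("good" if a >= cut_alone else "poor",
               "good" if b >= cut_added else "poor")] += 1
    return cells  # e.g. {("good", "poor"): 5, ("poor", "good"): 5, ...}
```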
The following is an example of a gesture that works better in the gesture-only condition than when added to speech:
(7) bouncing the ball [on the ground]
Palm of right hand points downwards; hand moves rapidly downwards and upwards three times.
When this character-viewpoint gesture was shown in the gesture-only condition it was found to convey a good deal of information about six semantic categories: 'identity' (a ball), 'description of action' (bouncing), 'shape' (round), 'movement' (yes), 'direction' (up and down), and 'relative position' (the ball moving up and down between the hand and the ground). Once speech was added to the gesture, however, the gesture became redundant with respect to all six categories (although, interestingly, 'speed', i.e., the fact that the ball was being bounced very quickly, only tended to be mentioned when participants saw the gesture-speech combinations rather than just the gesture). In no condition did participants get the 'size' of the object correct; it was, in fact, a large ball. The overall percentage accuracy score for this gesture was 50.5% in the gesture-only condition, whereas it was 75.0% in the speech-only condition, increasing to 82.5% for gesture-speech combinations. The gesture therefore added only 7.5% additional information to the speech.
Below is an example of an observer-viewpoint gesture that was good at communicating information about 'shape' (44.0% accuracy) in the gesture-only condition, but once the verbatim speech was added, the gesture became redundant (zero percent additional information about 'shape' and only 2.5% overall additional information).
(8) it's got two [long bench either side]
Hands are close together, palms are pointing towards each other, hands move apart in a horizontal direction.
In gesture-only, this gesture communicated to participants that the object being described was long (with very little information about what the object actually was), but when the speech was added the information provided by the gesture was clearly redundant.
On the other hand, some gestures can only successfully communicate about certain semantic categories once the speech has first provided basic information. Below is an example of an observer-viewpoint gesture that was relatively poor at communicating information about 'speed', or any of the other semantic categories, in gesture-only. However, once the speech was added, the gesture became more than just a flick of the hand: it became a man running very quickly out of his house, thus demonstrating the importance of the global meaning of the gesture in determining the meaning of its individual components.
(9) [runs out of his house] again
Thumb of right hand is pointing upwards, other fingers are curled together. Hand moves upwards slightly and then to the right in a rapid movement.
In the gesture-only condition 14.0% of the participants correctly identified the 'speed' of this action, whereas in the case of gesture-speech combinations 90.0% got this right (with zero percent in the speech-only condition).
There are also a number of cases where the gesture conveyed a good deal of information both in isolation from speech and working alongside speech. For example:
(10) [and gets covered in soup]
Hands move to a position in front of the face; they then move apart and follow the curve of the face.
When this gesture was added to its accompanying speech it provided participants with information about the 'relative position' of the soup with respect to the character: it is the character's face that gets covered in soup. The gesture also demonstrated the 'direction' in which the soup was moving and the 'size' of the area that gets covered. In the gesture-only condition the gesture conveyed information to participants about the same semantic categories, even though in this case they did not know what it was that was actually covering the face. So this gesture not only conveyed important information both in isolation and alongside speech, it conveyed information about the same semantic categories in both cases.
However, there are other gestures that are very effective at communicating when presented both with speech and in isolation from speech, but which convey information about quite different semantic categories in the two cases. This relationship is illustrated by the following character-viewpoint gesture, discussed above:
(11) by [pulling on his tie]
Left hand moves quickly upwards; hand closes and a sharp downwards movement is made.
This gesture provided participants with information, over and above that conveyed by the speech, particularly about the 'speed' of the action (fast) and the 'relative position' of the physical entities (the hand being wrapped around the tie). In the gesture-only condition, however, the gesture provided participants with information that mainly concerned the 'size' and 'shape' of the object involved (it was long and thin).
In summary, these experiments have provided, in our opinion, incontrovertible evidence that iconic gestures play a significant role in semantic communication and that they interact with speech in a series of complex ways to communicate meaning. McNeill's basic theoretical point, that "utterances possess two sides, only one of which is speech; the other is imagery, actional and visuo-spatial," would seem to be supported by this series of experimental investigations. These investigations carefully dissected each piece of
information that experimental participants received from the gestures that accompany speech. It was found that the imagistic, actional and visuo-spatial component of the utterance was indeed crucial to the process of communication.

2. Applying this New Model
Recently, we have begun to explore the implications of this research for applied communicational situations, and in particular for the design of effective messages. One domain that we have focused on is advertising (see Beattie & Shovelton, 2005). Part of the core function of advertising is to communicate semantic information about the distinct features of a product and to build brand image and identity. Research in this area is guided by a traditional theory of human communication, which asserts that language is the primary or sole medium of semantic communication. But does the new model of human communication (McNeill, 1985; 1992), which maintains that semantic communication depends upon both speech and accompanying iconic gesture, have implications for how we should think about the design of effective advertisements? And does this new model have implications for helping us understand how different forms of media, for example TV (which allows for gesture-speech combination) versus radio (just speech) or newspapers (just text), might compare in terms of relative communicative effectiveness?
We began by collecting a sample of natural gestures and speech by video-recording 50 undergraduates discussing cars, holidays and mobile phones, the three products to be advertised. We wanted to identify some core iconic and metaphoric representations for each of these products. Based on these recordings, three messages were scripted, one for each product. Next, TV messages were constructed using an actor who was filmed narrating the script and performing six gestures per message. The physical movements and the exact timing of the gestures with respect to the speech were choreographed (see McNeill, 1992; Beattie, 2003). This, in some sense, was the difficult part of the project: up to this point we had been concerned solely with natural, spontaneous imagistic movements, and now we were attempting to simulate them, consciously and deliberately, explicitly taking into account the timing of the preparation, stroke and retraction phases and the form of the movements themselves. For the radio message, only the audio soundtrack was used, and for the text message a verbatim transcription was produced.
Multiple-choice questionnaires were developed to measure communicative effectiveness. The questions related to semantic properties such as 'manner' (e.g., style of dancing in the holiday advert), 'size' (e.g., size of the wheels in the car advert), and 'speed' (e.g., speed of vibration in the mobile phone advert), amongst others. Each question had four alternative answers; there were twelve questions where the information required was in gesture and/or speech ('complementary gestures') and six questions where the information was in both gesture and speech ('co-expressive gestures'). There were 150 participants, 50 in each condition. The analyses revealed that, overall, significantly more questions
were answered correctly in the TV condition than in the radio or text conditions, and there was no significant difference between the text and radio conditions. Participants in the TV condition gained 10.0% more information than participants in the radio condition and 9.2% more information than participants in the text condition.
At first sight these differences, although statistically reliable, might not seem that large. However, it is important to remember that multiple-choice questionnaires with four possible alternatives were used to measure communicative effectiveness. These allowed 25% correct responses through guessing alone. When we allow for chance guessing we find that in the case of the TV condition the increase from chance to the percentage of correct answers obtained is 33.2%, whereas for the radio and text conditions combined it is 23.6%. Therefore the participants in the TV condition gained 40.7% more information than the participants in the radio and text conditions. In other words, the effect is not just statistically reliable; it is also quite a large effect.
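The chance-correction arithmetic can be made explicit. The raw scores work out to 25% (chance) plus the above-chance gains reported, i.e., 58.2% correct for TV and 48.6% for radio and text combined, and 33.2/23.6 is roughly 1.407, hence '40.7% more information'. A quick check, with the figures hard-coded from the text:

```python
# Worked check of the chance-corrected comparisons reported above.
# Raw percent-correct scores are reconstructed as 25% (chance) plus
# the above-chance gains the chapter reports.
CHANCE = 25.0  # four-alternative multiple choice

def gain_over_chance(raw_percent_correct):
    return raw_percent_correct - CHANCE

tv, radio_text = gain_over_chance(58.2), gain_over_chance(48.6)
print(f"TV advantage: {100 * (tv / radio_text - 1):.1f}% more information")
# -> 40.7%, matching the figure in the text

holiday_tv, holiday_rt = 39.3, 18.0  # above-chance gains, holiday message
print(f"Holiday message: {100 * (holiday_tv / holiday_rt - 1):.1f}%")
# -> 118.3%, matching the figure below
```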
The holiday message was particularly well communicated in the TV condition. The increase from chance to the percentage of correct answers obtained for this message in the TV condition is 39.3%, and for the radio and text conditions combined it is 18.0%. Therefore the participants in the TV condition gained 118.3% more information about the holiday message than the participants in the radio and text conditions combined.
Thus, it appears that gestures in adverts are communicatively effective, but much more so for some products (for example, the holiday) than for others. Why is this? Our hypothesis is that it is not the product per se that is important here but rather the nature of the corresponding gestures. One critical feature of a gesture is the span of the 'stroke' phase (the meaningful part of the gesture), which is the distance the hands travel from the start point of the stroke phase to its end point. Some gestures are much larger than others in terms of stroke phase and therefore much more obviously noticeable and interpretable. We therefore analysed the span of our eighteen gestures and discovered that the nine most communicative iconic gestures had a larger span of stroke phase than the nine least communicative gestures (see Beattie & Shovelton, 2005).
We had therefore discovered that TV ads seem to be a highly effective way of communicating information about a product, compared to radio or text presentations. But is the effect due to the presence of gestures in these ads, or could the significant differences be attributable to a non-specific effect of simply more attention being devoted to a TV image? This question formed the basis for the next study, which compared two TV conditions, one with and one without gesture.
A London advertising agency (Cartwright) created two broadcast-standard TV advertisements for a (then) non-existent fruit juice drink, one involving speech and image and the other speech and gesture. The agency was advised on what iconic and metaphoric gestures to use for particular properties of the product in one version of the ad (again on the basis of the spontaneous gestures generated by a sample of 50 undergraduates). Three gestures were selected for use in the speech-gesture advertisement. See Table 1.

Table 1: The images, gestures and speech used in the TV advertisements.
[Columns in the original: TV condition one: Image; TV condition two: Gesture; Grocers' spoken commentary. The image and gesture stills from the printed table are not reproduced; the spoken commentary follows.]
Grocer 1: Come on son, you invented "F" so fess¹ up, how's it done?
Grocer 2: Mango, pear, cranberries, banana and orange. Five fruit portions [crammed into every tiny little bottle].
Grocer 1: Look we're not monsters.
Grocer 2: No we're greengrocers.
Grocer 1: [Everyone's drinking it.]
Grocer 2: Delicious, [fresh], you’re muscling in on our patch.
Voiceover: "F", five daily portions of pure fruit in a oner².

¹ Cockney slang for 'confess'.
² Cockney slang for 'in one'.
The gestures represented three core properties of the product: the 'size' of the bottle (iconic gesture: hands move towards each other until they represent the size of the bottle), that "everyone" was drinking it (metaphoric gesture: right hand and arm move away from the body making a large sweeping movement), and that the fruit used was "fresh" (metaphoric gesture: hands are together in front of chest, then move away from each other abruptly as fingers stretch and become wide apart).
For the speech-image advertisement the advertising agency created their own images (based on their professional experience) to convey these same three properties (size: image of the actual bottle with respect to the hand; everyone: the Sun newspaper, Britain's best-read daily newspaper, displaying the headline
"everyone's drinking it"; freshness: juice sparkling on the fruit). The advertisements were spoof gangster-style interactions. Two greengrocers are interrogating the inventor of "F" as to how he managed to cram five portions of fruit into a little bottle; the greengrocers are worried that people will no longer need to buy as much fresh fruit from them. The advertisements were played twice to two independent groups of 50 participants. A multiple-choice questionnaire was used with five alternatives for each of these core properties.
In terms of these three core properties, we compared the proportions of correct answers and 'wrong' answers, those answers deemed most undesirable for this particular product. The 'wrong' answers here were not necessarily the residual responses (although for 'freshness' the 'wrong' answer was the residual response); rather, they were the responses that the advertising agency considered most damaging for this particular product. See Table 2.

Table 2: Speech and image versus speech and gesture: a comparison.

                 % choosing correct answer           % choosing 'wrong' answer
                 Speech & Image   Speech & Gesture   Speech & Image   Speech & Gesture
Fresh            86%              94%                14%              6%
For everyone     26%              50%                28%              12%
Bottle size      32%              46%                20%              4%
The analyses revealed that significantly more participants reported that the product was 'fresh', 'for everyone' and the 'right size', compared to the 'wrong' answers, when these properties were represented with gestures rather than with images. In other words, gestures are particularly effective at communicating core semantic properties of products, compared with non-gestural images. These results suggest that the effects we have observed are not simply attributable to the general effects of TV per se, but rather that iconic and metaphoric gestures are particularly effective at transmitting core information about a product, and are better (at least in the cases we looked at) than the images generated by a professional agency.
One possible explanation as to why gestures are so effective is that these hand movements encode just the core semantic properties of a product. For example, the gesture representing the 'size' of the bottle encodes the size dimension, and nothing else, with the hands moving closer together to uniquely identify the size of the bottle. No other aspects of the bottle, for example 'colour', 'texture', the 'shape' of the bottle or its width at the top, are communicated through this gesture to distract from the core communication. Complex visual images of the kinds used in TV ads have many properties; gestures, on the other hand, are able to isolate just the core dimensions that one wishes to communicate. This could well be why they are so effective.
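The chapter reports these differences as significant without naming the test used. Purely as an illustration of the kind of comparison involved, the sketch below runs a chi-square test on counts reconstructed from Table 2 for the 'for everyone' property, assuming 50 participants per condition; both the reconstruction and the choice of test are assumptions, not the published analysis.

```python
# Illustrative re-analysis only: the chapter does not publish its test
# or raw counts. Counts are reconstructed from Table 2 percentages,
# assuming 50 participants per condition ('for everyone' property).
from scipy.stats import chi2_contingency

n = 50
table = [
    [round(0.26 * n), round(0.28 * n)],  # speech & image: correct, 'wrong'
    [round(0.50 * n), round(0.12 * n)],  # speech & gesture: correct, 'wrong'
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```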
The results of these latter two studies have potentially significant implications for how we might think about the design of effective communication, and particularly about the construction of powerful TV ads. They also suggest that speech and gesture together are better at semantic communication than speech alone, and that gestures, particularly large-span gestures, are especially effective in TV ads and may be better than the kinds of images that advertisers would normally choose to use. Some television advertisements currently use gestures, but these gestures possess few of the semantic or temporal properties of natural gestures: they simply do not start and stop at the right points with respect to the speech, and in some the form of the gesture is completely unchanged across a new syntactic clause. The research reported here suggests that the effectiveness of television advertisements may be improved by incorporating spontaneous images of the hands with the right temporal and semantic properties.
Of course, there is an intriguing idea here: given that speech evolved in the context of (Goldin-Meadow & McNeill, 1999), and possibly through (Allott, 1992), such gestures, this research suggests that aspects of our evolutionary past may have significant implications for the content of modern advertisements. The brain has, after all, clearly evolved to deal with speech in the context of the spontaneous images created by the human hand. McNeill's theory may help us understand this process a little better at the psychological level; new developments in brain imaging may help to reveal the neurophysiological underpinnings.

3. Conclusions
In summary, these investigations tell us a great deal about how speech and gesture interact in the communication of meaning. The debt that this research owes to the work of David McNeill should be clear. McNeill's seminal research was the principal driver behind the experiments that we have been conducting, together with our judgement that researchers had not conducted the right kind of experiments to support or disprove some of the core ideas that sprang from his theory, especially the idea that gestures are a core part of semantic communication. We believe that our experimental programme has demonstrated that iconic and metaphoric gestures are crucial to communication and that they interact with speech in the communication of meaning in everyday talk in a number of significant ways.
We also believe that this research has major implications for a wide variety of communicational situations, and we have shown that in advertising, research on gestures, and McNeill's theory in particular, may have a significant role to play in terms of conceptual developments in this important field. The potential practical applications are enormous.
To return to the words of Thomas Mann that we quoted at the start of this chapter, we now understand a little more about how human talk preserves contact in human face-to-face communication (and even in human communication mediated through television). And because of the seminal research of David McNeill we are starting to catch a real glimpse of how the hands and the voice together bridge that enormous gap between human beings as they go about their everyday lives, talking, acting and listening.
References

Allott, R. (1992). The motor theory of language: Origin and function. In J. Wind (Ed.), Language origin: A multidisciplinary approach. Dordrecht: Kluwer Academic Publishers.
Argyle, M. (1972). The psychology of interpersonal behaviour (2nd ed.). London: Penguin.
Austin, G. (1806/1966). Chironomia; or, a treatise on rhetorical delivery. Carbondale and Edwardsville: Southern Illinois University Press.
Bacon, F. (1605/1952). The advancement of learning. In Great books of the western world, Volume 30: Francis Bacon. Chicago: Encyclopedia Britannica.
Beattie, G. (1983). Talk: An analysis of speech and nonverbal behaviour in conversation. Milton Keynes: Open University Press.
Beattie, G. (2003). Visible thought: The new psychology of body language. London: Routledge.
Beattie, G. & Coughlan, J. (1998). Do iconic gestures have a functional role in lexical access? An experimental study of the effects of repeating a verbal message on gesture production. Semiotica, 119, 221-249.
Beattie, G. & Coughlan, J. (1999). An experimental investigation of the role of iconic gestures in lexical access using the tip-of-the-tongue phenomenon. British Journal of Psychology, 90, 35-56.
Beattie, G. & Shovelton, H. (1998). The communicational significance of the iconic hand gestures which accompany spontaneous speech: An experimental and critical appraisal. In S. Santi, I. Guaitella, C. Cave, & G. Konopczynski (Eds.), Oralité et gestualité: Communication multimodale, interaction (pp. 371-375). France: L'Harmattan.
Beattie, G. & Shovelton, H. (1999a). Do iconic hand gestures really contribute anything to the semantic information conveyed by speech? An experimental investigation. Semiotica, 123, 1-30.
Beattie, G. & Shovelton, H. (1999b). Mapping the range of information contained in the iconic hand gestures that accompany spontaneous speech. Journal of Language and Social Psychology, 18, 438-462.
Beattie, G. & Shovelton, H. (2000). Iconic hand gestures and the predictability of words in context in spontaneous speech. British Journal of Psychology, 91, 473-492.
Beattie, G. & Shovelton, H. (2001). An experimental investigation of the role of different types of iconic gesture in communication: A semantic feature approach. Gesture, 1, 129-149.
Beattie, G. & Shovelton, H. (2002a). What properties of talk are associated with the generation of spontaneous iconic hand gestures? British Journal of Social Psychology, 41, 403-417.
Beattie, G. & Shovelton, H. (2002b). An experimental investigation of some properties of individual iconic gestures that affect their communicative power. British Journal of Psychology, 93, 473-492.
Beattie, G. & Shovelton, H. (2005). Why the spontaneous images created by the hands during talk can help make TV advertisements more effective. British Journal of Psychology, 96, 21-37.
Bulwer, J. (1644/1974). Chirologia: or the natural language of the hand, and Chironomia: or the art of manual rhetoric. Carbondale, IL: Southern Illinois University Press.
Butterworth, B. & Beattie, G. (1978). Gesture and silence as indicators of planning in speech. In R. Campbell & P. Smith (Eds.), Recent advances in the psychology of language: Formal and experimental approaches. New York: Plenum.
Butterworth, B. & Hadar, U. (1989). Gesture, speech, and computational stages: A reply to McNeill. Psychological Review, 96, 168-174.
Ekman, P. & Friesen, W. (1969). The repertoire of nonverbal behavioural categories: Origins, usage, and coding. Semiotica, 1, 49-98.
Goldin-Meadow, S. & McNeill, D. (1999). The role of gesture and mimetic representation in making language the province of speech. In M. Corballis & S. Lea (Eds.), The descent of mind (pp. 155-172). Oxford: Oxford University Press.
Hadar, U. (2001). The recognition of the meaning of ideational gestures by untrained subjects. In C. Cave, I. Guaitella, & S. Santi (Eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication (pp. 292-295). France: L'Harmattan.
Hadar, U. & Butterworth, B. (1997). Iconic gestures, imagery and word retrieval in speech. Semiotica, 115, 147-172.
Krauss, R., Morrel-Samuels, P. & Colasante, C. (1991). Do conversational hand gestures communicate? Journal of Personality and Social Psychology, 61, 743-754.
Mann, T. (1924/1995). The magic mountain. J. Woods (Trans.). New York: Knopf.
McNeill, D. (1979). The conceptual basis of language. Hillsdale, NJ: Erlbaum.
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350-371.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2000). Language and gesture. Cambridge: Cambridge University Press.
McNeill, D., Cassell, J. & McCullough, K.-E. (1994). Communicative effects of speech-mismatched gestures. Research on Language and Social Interaction, 27, 223-237.
Quintilian, M. (100/1902). Institutio oratoria. H. Butler (Trans.). London: Heinemann.
Tylor, E. (1878). Researches into the early history of mankind. London: John Murray.
Wundt, W. (1921/1973). The language of gestures. The Hague: Mouton and Co.
Intersubjectivity in Gestures
The Speaker's Perspectives toward the Addressee*

Mika Ishino
University of Chicago
Gestures encode significant semantic, pragmatic, social, cognitive, and discourse information. Consequently, gestures also show speakers’ attitudes, emotions, and perspectives (that is, their ‘subjectivity’) toward their utterances' propositional content. We distinguish subjectivity from ‘intersubjectivity’: “Intersubjectivity involves the speaker’s attention to the addressee as a participant in the speech event, not in the world talked about, and hence it arises directly from the interaction of the speaker with the addressee” (Traugott & Dasher, 2005:22). This paper examines how gestures reflect intersubjectivity in a Japanese dyad’s interaction. The focus is on the synchronic intersubjectivity that encodes speakers’ viewpoints in gestural deixis in relation to addressees.
1. Introduction
We as language users are not divine but human. Human language use reflects individual speakers' evaluations, opinions, emotions, and attitudes; that is, their individual perspectives. Instances of natural language use are thus not completely reducible to their objective propositional content. In recent decades, 'subjectivity' and 'subjectification' in language have engaged language researchers, especially with respect to how speakers' perspectives are encoded in deixis and modality, as well as how discourse functions are marked (Benveniste, 1966; Langacker, 1985; Lyons, 1982; Traugott, 1995; Traugott & Dasher, 2005; among many others). Despite the popularity of the concepts of subjectivity and subjectification, they do not have a consistent definition (cf. Traugott & Dasher, 2005; De Smet & Verstraete, 2006).
Gestures are integral to the cognitive process of language production (McNeill, 1992; 2005). Therefore, just as spoken language encodes speakers' perspectives, so too do gestures. However, there have been relatively few studies that examine the encoding of speakers' subjectivity in gestures (but see Bavelas, Black, Lemery, & Mullett, 1986; Emmorey, Tversky & Taylor, 2000; McNeill,
* I shall never forget the intellectual ideas, time, space, and gestures I shared with David McNeill in my years as a graduate student, and so I would like to thank David for that. I also want to thank the editors of this volume for their invaluable comments and helpful suggestions.
1992). For the purpose of this paper, 'subjectivity' is defined as speakers' attitudes, emotions, or viewpoints/perspectives toward what they say.
Based on Benveniste's work (1966), Traugott & Dasher (2005) distinguish 'intersubjectivity' from 'subjectivity': "Intersubjectivity crucially involves the speaker's attention to the addressee as a participant in the speech event, not in the world talked about," and hence it arises directly from the interaction of the speaker with the addressee (Traugott & Dasher, 2005:22). Traugott & Dasher (2005:23) claim that "intersubjective meanings crucially involve social deixis (i.e., attitude toward status that speakers impose on first person-second person deixis)". For example, honorifics, the choice of personal pronouns, and final particles which show affective meanings are manifestations of intersubjectivity.
Due to the nature of gestures, I use intersubjectivity somewhat differently from Traugott & Dasher (2005).¹ While those authors address diachronic intersubjectivity, my concern is with synchronic intersubjectivity, that is, how a speaker shifts viewpoints from one moment to another in discourse. In particular, I focus on how the presence of an addressee can affect the directionality of gestural deixis.
The organization of the paper is as follows. Section 2 reviews research on subjectivity and intersubjectivity in gestures, and section 3 presents a case study of a speaker shifting the 'origo' (Bühler, 1934) of her deictic gestures in discourse. I will discuss how such origo shifts reflect intersubjectivity toward the addressee.
2. Subjectivity and Intersubjectivity in Gestures
I first present a brief overview of McNeill's work on subjectivity in gestures. McNeill (1992) notes that gestures and speech often co-express the speaker's viewpoint. Observer viewpoint (henceforth, OVPT) gestures display events from the perspective of an observer, and character viewpoint (henceforth, CVPT) gestures are produced as if the speaker were a character in the story. Further, in CVPT gestures speakers enact actions as if they were taking place at the moment of speaking. Hence, CVPT gestures encode speakers' subjectivity toward the story. For example, when narrating an event in the Canary Row cartoon (specifically, a scene in which Sylvester the Cat tries to climb up to Tweety Bird via the inside of a drainpipe), one speaker makes a two-handed gesture depicting Tweety's response to Sylvester's approach, which is to thrust a bowling ball into the drainpipe (see McNeill, 1992). In this gesture, the speaker's hands become the bird's hands. With this CVPT gesture, the speaker displays her subjective perspective on the story. In conversational discourse, too, speakers may show their subjectivity in the same way, by enacting in gesture what another
¹ Following McNeill (1992), I take gestures to be global and synthetic, non-compositional, and lacking standards of form.
person does. In such gestures, speakers become another person and hence identify themselves with that other.
Now I will provide a brief introduction to findings from representative studies on intersubjectivity in gestures. While no gesture researchers have used the term 'intersubjectivity', several studies address the impact that the presence of the other participants in a discourse has on a speaker's gestures. First, Bavelas, Black, Lemery & Mullett (1986) argue that motor mimicry is an overt manifestation of interpersonal processes such as empathy toward the addressee, conveying, "I show how you feel." Thus the phenomenon of motor mimicry expresses intersubjectivity, even though the authors do not use the notion of intersubjectivity (since the notion as it is defined here was not yet in existence at the time). Second, Bavelas, Chovil, Lawrie & Wade (1992) show that the visibility of addressees affects the rate of 'interactive gestures', which index the social dimensions of interactions. Third, in a study investigating whether gestures serve a communicative or cognitive function, Özyürek (2002) shows that speakers tend to change the directionality of their gestures depending on the number of addressees; they do not, however, alter their speech systematically, and they do not show differences in conceptualization of the direction of motion. The study also shows that how the addressees are positioned relative to the speaker affects gesture directionality. Özyürek's findings are thus also manifestations of intersubjectivity in gesture, because the positions (hence, the physical presence) of the addressees affect the directionality of gestures.
3. Analysis
In this section I present a case study of the gestural manifestation of synchronic intersubjectivity in Japanese dyadic discourse. The data were elicited as follows. Two female participants, M and B, were asked to talk about something that had recently made them angry. Speaker B began to discuss a recent job interview. She worked as a part-time receptionist at a hospital; however, she wanted to become a full-time receptionist and so interviewed at a second hospital. B says that, after the interview, the interviewer called her in person, and she sensed that the interviewer wanted to turn her down for the job without saying so directly.
The speaker makes a total of seven deictic gestures with her right hand. The seven gestures are divided into three groupings. In the first, B points to herself, then she moves her hand from shared space to herself,² and finally points to the addressee, M. In the second grouping, B moves her hand from shared space to herself. In the third grouping, B points to the addressee, M, three times across an interval of discourse. I will describe how these groupings of deictic gestures are evidence of speaker-addressee intersubjectivity.
² The shared space is the overlapping personal space between the discourse participants (Özyürek, 2000).
In the following excerpt (1) from this conversation, B describes the interviewer's attitude, as well as her own feelings toward the experience.

(1) B1: soko no mensetukan mitaina hito ga {1hito}ri hitori
        there GEN interviewer like person NOM one.person one.person
    atode {2nan}ka Øme [yob-are-te] Øhe Øme iw-are-ta koto ni wa:³
        later well call-HON-and say-HON-PST thing DAT TOP
    "What (the) interviewer there called and told (me/interviewees) in person later is,"
    1. pointing <SP>: index finger in front of chest points at speaker's own face.⁴
M1: {1un} / {2un}
    uh.huh uh.huh
    "Uh huh. Uh huh."

B2: nanka Øhe toomawasini [maa Øme kotowari tai n yaroo ke]do
    well euphemistically well turn.down DES NL EVID though
    "Well, though (he) seems to want to turn (me) down euphemistically,"
    2. pointing <SP>: loosely bent thumb and index finger with palm facing up move upward while bending index finger at belly level to point at speaker.
M2: a{3:a}
    uh.huh
    "Uh huh."

B3: [{3na}nka sono Øhe Øme kotowari-kata ga:]
    well that decline-way NOM
    "Well. The way (he) turns (me) down is,"
    3. pointing <SP> & iconic: index finger with palm facing up moves from shared space to speaker herself, to point at speaker.
³ In describing speech and gestures, co-indexed curly brackets indicate speech overlap; Øpronoun indicates an unexpressed pronoun; / signifies an unfilled pause; : in speech indicates elongated phonation; square brackets show when the hands are in motion; bold fonts signify a gesture stroke phase; %laugh indicates laughter; SP and AD stand for the speaker and the addressee, respectively.
⁴ In the description of gesture examples, SP indicates that the target of the deictic gesture is the speaker and AD indicates that the target is the addressee.

In gesture (1), B points to her own face with her index finger at chest level. As shown, the direct objects of the two transitive verbs (yobu 'to call' and iu 'to say') are zero anaphors and thus not overtly expressed. Gesture (1) synchronizes with the stem of the first transitive verb, yob-, and part of the honorific suffix -are. The gesture is typical of pointing gestures in that the speaker uses her extended
index finger to point at herself, who is of course present in physical space.⁵ Since Japanese is a language in which pronouns are not overtly expressed when their referents are recoverable from discourse context, here the first person pronoun is not overtly expressed in speech (that is, it constitutes zero anaphora). However, the deictic gesture (1), also a part of language, overtly plays a pronominal role (cf. Haviland, 2003; Ishino, 2005).
In B2, gesture (2) also indexes the speaker herself. She now uses a loosely bent thumb and index finger to point to herself. This deictic gesture synchronizes with the verb root (that is, kotowari 'to turn down'). In the concurrent speech, the speaker says, "Well, though (he) wants to turn (me) down euphemistically." As with the concurrent speech for gesture (1), the direct object of this construction (the equivalent of 'me') is not overtly expressed. Hence, gesture (2) also serves the function of a first person pronoun ('me'). Both gestures, (1) and (2), are made from the speaker's perspective. In short, with respect to 'subjectivity', the speaker shows her own perspective rather than that of a third person.
Gesture (3) clearly displays directionality from the addressee to the speaker, as the extended index finger moves from shared space to the speaker. Simultaneously, the directionality of the gesture iconically depicts the subject and direct object of the verb ('to decline'), as inferred from the concurrent speech in B3. The speaker points at herself in gesture when she uses her body to make reference to herself. However, the addressee is viewed as the interviewer in gesture space, as suggested by the fact that the pointing gesture in (3) moves from the shared space between the two discourse participants, who are seated at a 90-degree angle. The origo of this pointing gesture is the addressee's location; in that respect, the origo reflects a metaphor in which the addressee is viewed as the interviewer. As suggested by this and the concurrent speech (that is, "the way he turns me down is"), the addressee is gesturally conceptualized as the interviewer. In that respect, gesture (3) is made possible by the presence of the addressee, thus encoding intersubjectivity.
In passage 2, below, speaker B shifts her perspective to that of the interviewer when quoting him in B4 and B5. Whether or not she gesturally enacts what the interviewer did cannot be known for sure. However, she quotes what the interviewer said to her, as shown in B4 and B5. The direct quotations are a manifestation of the speaker's taking another's perspective, even though, in all three quotes, the subjects and the verbs are not overtly expressed. In colloquial Japanese, utterances are often marked by the quotation particle -(ttoka)tte, as in the quoted utterances in B4 and B5. In speech, the speaker identifies herself, in the quotes, with the interviewer. Furthermore, the construction of the two sentence-final particles, yo ne, adds the speaker's attitudes toward the propositional content;
⁵ Gestures pointing to an entity or an object in physical space are called concrete deixis, as opposed to abstract deixis, in which speakers point at empty space that has been invested with referential substance (McNeill, Cassell, & Levy, 1993). Hence, gesture (1) is an instance of concrete deixis.
it is equivalent to the English 'Is it right that...?' The yo ne-marked utterances seek the addressee's agreement; hence, these encode the speaker's intersubjectivity. Since these are quotes from the interviewer, the sentence-final particles encode the interviewer's intersubjectivity as well.

(2) B4: nanka / "kekkon[:]-aite [i masu yo ne:" tte]
        well marriage-partner exist POL FP FP QT
    "(He says), '(You) have a fiancé, don't you?'"
    4. pointing & iconic: index finger, facing center, points at the addressee.
    5. pointing & iconic: loosely bent thumb and index finger point at the addressee.

    "{4kekkonsi [ma}su yo ne:" ]{5tte}
        get.married POL FP FP QT
    "(He goes like,) '(You) will get married, won't you?'"
    6. pointing & iconic: loosely bent thumb and index finger, palm facing up, point at addressee.
M4: {4haa} // {5%laugh}
    oh
    "Oh."

B5: "dakara Øyou tutome-rare-mas-en yo ne{6:" ttoka tte}
    therefore work-POT-POL-NEG FP FP QT QT
    "(He says,) 'So, (you) cannot work (here), can you?'"

M5: {6aa aa aa}
    uh.huh uh.huh uh.huh
    "Uh huh. Uh huh. Uh huh."

B6: yuuhuuni Øme motteik-are-kata ga nanka sugoi iya ya tte
    like take-PASS-way NOM well very disgusting COP and
    "(The) way (I) was led like (B4 & 5) was very disgusting and,"

In gesture (4), B points briefly at the addressee while articulating the elongated final nasal consonant (that is, n) of the noun kekkon ('marriage'). In gestures (5) and (6), the speaker points at the addressee. The three gestures pointing to the addressee are accompanied by the two yo ne-marked utterances, quoting what the interviewer said to the speaker ("(I) bet (you) have a fiancée. (You) will get married, right?"). Here, all the origos of the gestures in (4-6) are the speaker, and their targets are the addressee. Thus, the speaker takes the interviewer's perspective, and she, as interviewer, points at the addressee, who is conceptualized as the speaker. The directionality of the pointing gesture differs
from that in gestures (1-3) to that in gestures (4-6). It simultaneously indicates that the viewpoint shifts from the speaker's own viewpoint to that of the interviewer. Gestures (4-6) encode the speaker's intersubjectivity, in that the presence of the addressee enables the speaker, as the interviewer, to point at the addressee, who is the speaker in gesture. Furthermore, note that the deictic gestures in (4-6) have a dual status: while the speaker concretely points at the addressee in actual space, the deictic gestures reflect a metaphoric construal in which the speaker is viewed as the interviewer and the addressee is viewed as the speaker. (For further examples of deictic gestures with this dual status, see Ishino, forthcoming.)
In passage 3, the speaker shifts perspective back to herself, as shown in gesture (7) in B7. Speaker B says, "(I) felt like (he) should turn (me) down directly," and, as shown in B7, the speaker points at herself with index finger extended. The deictic gesture in (7) is synchronous with a part of the transitive verb (kotowaru 'to decline') and, as in the previous utterances, both the subject and the object pronouns are omitted. Gesture (7) indexes and points to the speaker herself, and serves as a zero anaphor ('me') in B7.

(3) B7: un / Øhe ha[kkiri Øme kotowar-ebaii noni] mitaina
        yeah directly decline-should PTC like
    "(I) felt like (he) should turn (me) down directly,"
    7. pointing <SP>: index finger points briefly at speaker.
Interestingly, gestures (2), (3), and (7) synchronize with the verb kotowaru, and gestures (5) and (6) with the sentence-final particles yo ne. In the latter case, intersubjectivity is encoded in both speech and gesture. To summarize the gesture observations from all three passages: while all the deictic gestures make reference to the speaker, their targets shift from speaker to addressee, reflecting the speaker's conceptualization of perspective.

4. Conclusion
Following McNeill (1992; 2005), the present study is based on the assumption that gesture and speech are intimately tied to one another. The observations described here provide further evidence that gesture and speech comprise a single unit of thinking, and they have several general implications. They support the view that gestures are a part of language and, more broadly, form part of a communicative system; as the examples show, deictic gestures and zero anaphors can occur in complementary distribution. As Traugott & Dasher (2005) claim, intersubjectivity is grounded in dyadic discourse, in that the existence of an addressee in physical space makes it possible for the speaker to point to the addressee from another's perspective.
Pointing gestures are often simple in form, but observation of gesture-speech synchrony can enrich our understanding of how intersubjectivity is manifested in the directionality of pointing gestures in face-to-face dyads. Furthermore, seemingly concrete pointing gestures reflect metaphors in which one person is conceptualized in terms of another. In the above examples (4-6), the addressee is conceptualized in terms of the speaker, who herself takes another's viewpoint. In that respect, such pointing gestures have a dual status: they are both concrete and abstract. The present study suggests that research into the intersubjectivity manifested in gesture-speech synchrony will yield further insight into both the cognitive and the interactive dimensions of gestures.

References

Bavelas, J. B., Black, A., Lemery, C. R., & Mullett, J. (1986). "I show how you feel": Motor mimicry as a communicative act. Journal of Personality and Social Psychology, 50, 322-329.
Bavelas, J. B., Chovil, N., Lawrie, D. A., & Wade, A. (1992). Interactive gestures. Discourse Processes, 15, 469-489.
Benveniste, E. (1971[1966]). Subjectivity in language. In Problems in general linguistics (M. E. Meek, Trans.) (pp. 223-230). Coral Gables, FL: University of Miami Press.
Bühler, K. (1982[1934]). Sprachtheorie: Die Darstellungsfunktion der Sprache. Stuttgart: Fischer.
De Smet, H., & Verstraete, J.-C. (2006). Coming to terms with subjectivity. Cognitive Linguistics, 17(3), 365-392.
Emmorey, K., Tversky, B., & Taylor, H. A. (2000). Using space to describe space: Perspective in speech, sign and gesture. Spatial Cognition and Computation, 2, 157-180.
Haviland, J. B. (2003). How to point in Zinacantán. In S. Kita (Ed.), Pointing: Where language, culture, and cognition meet (pp. 139-169). New Jersey: Lawrence Erlbaum Associates Publishers.
Ishino, M. (2005). Pointing in face-to-face interaction. Paper presented at the Second Congress of the International Society for Gesture Studies, Lyon, France.
Ishino, M. (forthcoming). Roles of gestures pointing to the addressees in Japanese face-to-face interaction: Attaining cohesion via metonymy. In Y. Takubo (Ed.), Japanese/Korean linguistics 16. Stanford, CA: CSLI Publications.
Langacker, R. W. (1985). Observations and speculations on subjectivity. In J. Haiman (Ed.), Iconicity in syntax (pp. 109-150). Amsterdam & Philadelphia: John Benjamins.
Lyons, J. (1982). Deixis and subjectivity: Loquor, ergo sum? In R. J. Jarvella & W. Klein (Eds.), Speech, place, and action (pp. 101-124). Chichester: John Wiley and Sons.
McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D., Cassell, J., & Levy, E. T. (1993). Abstract deixis. Semiotica, 95(1/2), 5-19.
Özyürek, A. (2000). The influence of the addressee location on spatial language and representational gestures of direction. In D. McNeill (Ed.), Language and gesture (pp. 64-83). Cambridge: Cambridge University Press.
Özyürek, A. (2002). Do speakers design their co-speech gestures for their addressees? The effects of addressee location on representational gestures. Journal of Memory and Language, 46, 688-704.
Traugott, E. C. (1995). Subjectification in grammaticalisation. In D. Stein & S. Wright (Eds.), Subjectivity and subjectivisation (pp. 31-54). Cambridge: Cambridge University Press.
Traugott, E. C., & Dasher, R. B. (2005). Regularity in semantic change. Cambridge: Cambridge University Press.
An Integrated Approach to the Study of Convention, Conflict, and Compliance in Interaction1

Starkey Duncan, Jr.
University of Chicago
Stemming from studies of early parent-child interaction, an integrated treatment of compliance, noncompliance, conflict, and related phenomena is proposed within a broader conceptual framework for describing face-to-face interaction. Basic to this framework are the notions of convention-based or rule-governed interaction, strategies taken within convention-based interactions, and ratification. Derived from these concepts, compliance, noncompliance, and conflict: (a) are defined in terms of convention-based interaction sequences, (b) may be observed in at least two different types of interaction, and (c) are regarded as pervasive in the child's everyday interaction with others. Implications of this approach for bidirectionality, parent-child negotiation, and reciprocity theory are considered. While based on studies of parent-child interaction, the basic notions are intended to be applicable to interactions of any sort.
1. Introduction
This discussion proceeds from the notion that much of face-to-face interaction in general, and parent-child interaction in particular, is convention based or rule governed. A conceptual framework, a general methodology, and several sets of results for interaction between adults and between parents and infants have been previously described (e.g., Duncan, 1991; Duncan & Farley, 1990; Duncan & Fiske, 1977; Duncan, Fiske, Denny, Kanki & Mokros, 1985). The purpose of this paper is to consider two more specialized phenomena within the more general framework: conflict and compliance in convention-based interaction. Although the notions of conflict and compliance developed here are intended to be applicable to interaction in general, examples, references, and data will be derived from studies of infant-parent interaction, generally within the infant's first year of life. This is a matter of convenience, because this more specific area of interaction has been the recent primary focus in the laboratory (e.g., Doolittle, 1995; Enders, 1997; Li, 1997).
1 Preparation of this paper was supported in part by National Institute of Mental Health grant MH-38344 and by a McCormick-Tribune grant to Starkey Duncan, Jr. Fraeda Friedman and Anne Farley contributed to the data analyses used for illustrations.
In this discussion it is presupposed not only that interaction in general is typically rule governed, but also that conflict rarely exists in a pure form without any rules constraining the actions of participants (Schelling, 1960). It is generally to the advantage of participants to limit conflicts in various ways. In any event, this discussion focuses on conflicts bounded by a set of shared rules, so that cooperation and conflict co-exist within the conflictual interaction. Schelling (1960) termed this situation a 'mixed game' (p. 89), involving neither pure coordination nor pure conflict.

In this discussion, conflict and compliance will be treated as deeply related, complementary phenomena, often coexisting sequentially within a single convention-based interaction. Treatment of conflict and compliance will be developed within the context of two underlying notions: (a) convention-based interaction, and (b) processes of ratification and nonratification in interaction. It is therefore necessary to begin with a brief and very general sketch of these notions. More extended discussion of this conceptual framework may be found in Duncan & Fiske (1977), Duncan, Fiske, Denny, Kanki & Mokros (1985), and Duncan (1991).

Two domains of conflict, compliance, and related phenomena can be distinguished. One domain has to do with the convention to be used in an interaction, termed 'concerning convention'. The second domain has to do with the course the interaction should take once a convention has been adopted for an interaction, termed 'within convention'. The principles underlying these two domains are essentially the same. Because the concerning-convention domain was considered in an earlier paper (Duncan, 1991), this discussion will focus on the within-convention domain. In the present discussion, all uses of the terms 'conflict' and 'compliance' refer to instances that are 'within convention'. If successful, the general formulation will provide expanded understanding and more precise definition of conflict and compliance, permitting more powerful, sensitive, and integrated studies of these important phenomena in face-to-face interaction.

2. Convention
The general notion of convention, or rule-governed action, is presupposed in linguistics and much of the social sciences. Closely related terms such as 'custom', 'norm', 'script', 'sanction', and 'social practice' are common in the social science literature. Broadly speaking, all of these are taken to have their base in the notion of convention. In this discussion, the terms 'convention' and 'rule governed' will be used interchangeably to refer to this general phenomenon. The term 'structure' will be used to refer to a hypothesis of a convention operating in an interaction. For purposes of communication, structures are represented as flowcharts.
2.1 Hypothesized convention for spoon-feeding Alex

It is useful to have a specific interaction structure to serve as an example in this discussion. Although the conceptual framework discussed here is intended to be entirely general with respect to face-to-face interaction, as a matter of convenience an example is chosen from recent studies of infant-parent interaction in the first year. Figure 1 shows the structure for a highly conflictual spoon-feeding interaction between Alex and his mother, extending from 9 to 12 months of age. For infants of this age, it is unusual for a single structure to continue without modification across several months. However, this circumstance is highly convenient for purposes of analysis. A more extended account of the structure shown in Figure 1, and of related phenomena, may be found in Friedman et al. (2007). Analyses of this interaction drew on a larger set of videotapes made in the home, all day, on two consecutive days each month for 13 months, beginning when the child was 9 months old.

The structure is designed to apply to each 'spoon-feeding cycle', defined as beginning when the mother spooned food from a container, and ending when the spoon was returned to the container, regardless of whether or not the child was fed. A spoon-feeding cycle constitutes one enactment of the structure.

The structure is represented as a flowchart. Each shape in the flowchart is an 'element' of the structure, involving one or more actions. Arrows connect elements, indicating appropriate sequences of actions within the structure. The rectangular shapes in the flowchart represent actions hypothesized to be obligatory for the indicated participant. These actions must be taken at the indicated points in the stream of interaction. Diamond shapes represent points at which the participant may exercise a legitimate option. The arrows leading from the shape indicate the alternatives available to the participant at each of these points. The shape with a horizontal bar labeled "joint action" represents an element of the structure that can be accomplished only through the joint, cooperative action of both participants. These joint actions are typically exchanges. In the case of feeding Alex, the child is fed only if the mother makes the food available, and the child accepts.

The shapes labeled "negative response" represent points at which the child might display a rather rich and varied set of actions to ward off the mother's attempted spoon-feeding. The occurrence of any one or more of these actions constituted a 'negative response'. Thus, these actions are considered interchangeable within the element. Negative responses included the child's looking down, leaning over the side of the tray, averting his head, screaming, raising his arm(s), putting his hand(s) over his mouth, vocalizing, crying, pushing the mother's hand, kicking, swinging his arm(s), rocking from side to side, putting his head on the tray, shaking his head, hitting the tray, and blocking or hitting at the extended spoon. These actions occurred singly or in clusters, some of which were rather large. The effect was of frequently vigorous and dramatic displays.
Figure 1: Hypothesized structure for spoon-feeding Alex (9-12 months of age).
However, these displays did not occur on every cycle, but rather from time to time on a complex basis. On many cycles Alex calmly accepted the food in an unexceptional way. In these cases, the interaction followed the straight line across the top of the flowchart. The general appearance of the flowchart is similar to those for spoon-feeding at this age in other families, except for the prominent negative-response elements.

Flowcharts such as that shown in Figure 1 are useful for representing structures. Among other things, flowcharts graphically illustrate important interaction phenomena, for example, (a) the way in which both the participants' actions and the sequences in which they occur are constrained by a commonly held structure, (b) the specific manner in which each participant's action at a particular point in the structure provides the immediate context for and influences the partner's next action, resulting in (c) rule-mediated bidirectional influence in interaction: the sequential contingency of each participant's action on that of the partner, and (d) the way in which each instance of the interaction is created by sequences of actions by both participants, forming or defining a particular transit of the flowchart from beginning to end. It is readily apparent that this transit is jointly constructed through sequences of action involving both participants.

Table 1 presents summary data for each month on the extent to which the structure fit the observed spoon-feeding cycles.

Table 1: Fitting the structure to the transcribed data.

Age (months)   N cycles   Interruptions   Uninterrupted cycles   Failures to fit   P model fits
9                   160               0                    160                 0           1.00
10                  278               1                    277                 0           1.00
11                  117               1                    116                 6           0.94
12                   33               0                     33                 0           1.00
Σ                   588               2                    586                 6           0.99
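Although the chapter presents structures as flowchart figures, a hypothesized structure is also machine-readable, which is what makes fit statistics like those in Table 1 computable. The sketch below, in Python, shows one way such a structure might be encoded as a directed graph and fit to transcribed cycles. It is a minimal illustration, not the C-Quence analysis tool cited later (Duncan & Collier, 2002); the element names and the simplified topology are assumptions standing in for Figure 1, not a transcription of it.

    # A minimal sketch: encode a hypothesized interaction structure as a
    # directed graph and check whether transcribed spoon-feeding cycles
    # constitute legal transits of it, in the spirit of Table 1.
    # Element names and topology are illustrative stand-ins for Figure 1.

    STRUCTURE = {
        # element: (actor, legal next elements)
        "spoon food":          ("M",   ["negative response 1", "extend spoon"]),
        "negative response 1": ("C",   ["end cycle", "extend spoon"]),
        "extend spoon":        ("M",   ["negative response 2", "feed"]),
        "negative response 2": ("C",   ["end cycle", "extend spoon"]),  # conflictual loop
        "feed":                ("M+C", ["clean mouth", "end cycle"]),   # joint action
        "clean mouth":         ("M",   ["end cycle"]),
        "end cycle":           ("M",   []),
    }

    def fits(cycle):
        """True if the transcribed cycle is a legal transit of the flowchart."""
        if not cycle or cycle[0] != "spoon food" or cycle[-1] != "end cycle":
            return False
        return all(nxt in STRUCTURE[cur][1] for cur, nxt in zip(cycle, cycle[1:]))

    cycles = [
        ["spoon food", "extend spoon", "feed", "end cycle"],       # calm acceptance
        ["spoon food", "extend spoon", "negative response 2",
         "extend spoon", "feed", "clean mouth", "end cycle"],      # resisted, then accepted
        ["spoon food", "feed", "end cycle"],                       # fails to fit
    ]
    print([fits(c) for c in cycles])             # [True, True, False]
    print(sum(map(fits, cycles)) / len(cycles))  # proportion fitting, as in Table 1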
2.2 Interaction strategy

In connection with interaction structure, the term 'strategy' will be used to refer to the joint construction of interactions by participants within the constraints of convention. Thus, strategy refers to sequences of rule-governed actions in interactions, as opposed to cognitive processes such as plans, goals, intentions, or motivations. The topics of this discussion—conflict and compliance—will be seen to be primarily strategy issues within the context of interaction structure.

Notice that the sort of pragmatic, everyday interaction aimed at achieving a particular goal, such as feeding or dressing the child, may be significantly more complicated in its structure than the parent-child games, such as peek-a-boo, described by the investigators cited earlier. Many of these games contain only obligatory actions, following an unvarying sequence of actions in each enactment.
Such a structure containing no options was termed a 'fixed-sequence' structure in Duncan and Farley (1990). In contrast, a 'variable-sequence' structure provides at least one participant with an option. The structure shown in Figure 1 provides both participants with several options, with the result that the feeding interaction can take a number of different courses within the constraints of the structure.

2.3 Ratification

In addition to convention, the notion of ratification regarding interaction structures is central to developing definitions of conflict and compliance in interaction. Ratification is taken in its usual sense of giving consent or approval. In the context of this discussion, 'ratification' is considered to involve a participant's agreeing to, permitting, or going along with some convention-defined preceding action by the partner. Ratification in face-to-face interaction was considered by Goffman (1967; 1971). It was further elaborated in a conceptual framework for rule-governed interaction (Duncan & Fiske, 1977; Duncan et al., 1985) and was included in an earlier, less inclusive, treatment of conflict (Duncan, 1991).

Figure 1 provides examples of ratification processes within the structure for spoon-feeding Alex. The figure illustrates both the way an action within a convention may be identified as either ratifying or not ratifying its preceding action, and the way ratification sequences may be elaborated in longer strings, embodied in paths taken through an interaction structure. Most lines between shapes are labeled with a "+" or "-", indicating that the action either ratifies or fails to ratify the preceding action, and with an "M" (mother) or "C" (child), indicating the participant. For example, "+M" indicates that the mother is ratifying the child's preceding action. The mother's choice of whether or not to clean the child's mouth is not labeled with regard to ratification. This action is considered to be contingent on some state of affairs (food on the child's face), and not as relevant to ratifying a preceding action by the child. As implied by the full structure, the child always ratifies the cleaning, even after an appreciable resistance to being fed.

As mentioned earlier, upon a transit of the loop, the structure shown in Figure 1 provides the participants with three options: (a) the mother accedes to the child's Negative Response, ending the feeding cycle, (b) the mother persists in her attempts to feed after the child's Negative Response, thereby creating another transit of the loop, and (c) the child abandons his Negative Responses, accepting the spoonful of food. Thus, the structure of this loop dooms the participants to repeat it until one of them accedes to the other [+M or +C]. In this case, then, the general interaction sequence for exiting the loop is ([-R] → [+R]), where either participant may provide [+R]. It should be noted that not all conflictual loops have this structure. Other loops may have a more complex structure and be more difficult to exit (Hardway & Duncan, 2005).
It may be seen that, when a structure is hypothesized and both participants' actions remain fully conforming to the rules of the structure, it becomes possible for the investigator to identify most actions by a participant as ratifying or failing to ratify the partner's preceding action. Additionally, there may be actions, such as cleaning in Figure 1, that may be judged to be neither ratifying nor nonratifying. As suggested in the preceding examples, when a convention is in use in an interaction, and participants are conforming to it (and thus ratifying its use), 'ratification' applies to their action sequences when using that convention. As illustrated in Figure 1, within the constraints of a mutually ratified convention, participants may ratify or fail to ratify the partner's preceding action. That is, participants may concur or fail to concur with the direction the interaction is taking at that moment. Due to length limitations, the focus of this discussion will be on ratification.
3. Defining Conflict and Compliance
We are now in a position to develop a definition of conflict in convention-based interaction. The main focus will be on the process of creating conflict, that is, defining when conflict comes into existence in an interaction. Directly implied is a definition of when an existing conflict is concluded. Table 2 presents key terms in the definition of conflict and compliance. These terms will be considered in detail in the following sections.

Table 2: Terms used in the proposed definitions of conflict and compliance.

ratification    a participant's agreeing to, permitting, or going along with some convention-defined preceding action by the partner
steady state    an unperturbed, nonconflictual interpersonal state
target action   a failure to ratify a current steady state in the interaction
conflict        two successive failures to ratify, one by each participant: a target action by one participant, followed by a failure to ratify the target action by the partner
compliance      a participant's ratification of the partner's preceding convention-based action
3.1 Creating conflict: Steady states, target actions, and failures to ratify

In considering the creation of conflict in an interaction, it is convenient to begin with the assumption that from time to time there may exist between the participants an unperturbed, nonconflictual interpersonal state, termed a 'steady state' (Duncan, 1991). In a steady interaction state, the participants either are not in focused interaction, or are in nonconflictual interaction. Defining possible steady states in interaction is important because the type of steady state determines what actions are regarded as target actions. When there is a steady state in an interaction, a participant may fail to ratify (that is, attempt to change) some aspect of it. In this context, this nonratifying initiative is termed a 'target action' (Duncan, 1991).
Assuming a current steady state in an interaction, 'conflict' will be said to be created when there are two successive failures to ratify, one by each participant. The first failure to ratify would be the target action. Conflict is created when the partner fails to ratify the target action. If the target action is ratified, the initiated change is incorporated in the interaction without conflict. If the target action is not ratified, then conflict is created, its course and duration determined by the prevailing structure and the subsequent actions of participants.

In the case of conflict, it is assumed that a particular convention is being used in the interaction, and that both participants are acting in accordance with it and thus ratifying it. At issue is the course the interaction takes within the constraints of the convention. The paths contained within the structure define the possible courses the interaction may take, including the points at which the course of the interaction may be changed in rule-governed ways. A ratifying action by a participant within a convention constitutes a steady state. An action is considered ratifying [+R] if it continues or builds on the partner's preceding action or facilitates the partner's ongoing action.

3.1.1 Target actions

An action is considered a target action, that is, a nonratifying [-R] initiative, if (a) the preceding action was ratifying [+R], and (b) its effect is to block or divert that preceding action. Thus, a 'target action' is a participant's failure to ratify a preceding ratifying action by the partner ([+R] → [-R]). The preceding ratifying action by the partner would be considered the momentary steady state within the convention. A failure to ratify within a convention has the effect of changing the current course of the interaction within the rules for the convention.

'Conflict' is defined as being created when, following a ratifying action (the steady state), there are two adjacent failures to ratify, one by each participant. This first failure to ratify (the target action) is regarded as a bid by one participant to change the direction of the interaction within the constraints of the structure. If this bid is ratified by the partner, the course of the interaction is changed without conflict. Conflict is created upon the second failure to ratify: the attempted change is resisted by the partner ([-R] → [-R]). Conflict is thus defined as being created when there is mutual nonratification within a convention. Thus, in conflict, there is mutual ratification of the convention itself but manifest disagreement regarding the course the interaction should take within the convention rules. Depending on the structure of the convention, it is entirely possible for two or more conflicts to arise within a single transit of the flowchart, or for the same conflict to become extended. It may be seen that this particular structure makes it possible to exit the conflictual loop and subsequently re-enter it within a single cycle.

3.1.2 Conflict: Conclusion

The integrated treatment of a wide range of conflictual interactions presupposes that in each case the relevant convention has been hypothesized.
Target actions and subsequent ratification or nonratification are identifiable only in terms of a specific interaction structure. Only some structures permit conflict. For example, in the no-wires game with its simple, fixed-sequence structure (Duncan & Farley, 1990), no options are available to the father and child. It is possible only to change the structure or end the game. In contrast, several different conflictual sequences are possible in the structure for spoon-feeding Alex. Thus, identifying the ratification sequences in the paths for a structure permits evaluation of the types of conflicts possible in the structure. The investigator becomes able to examine, among other things, the extent to which participants engage in these possible conflicts, what courses these conflicts take, how the conflicts are concluded, and how these conflicts change over time.

3.2 Compliance

Given the developments in the preceding sections, it will not be surprising that in this discussion compliance in face-to-face interaction will be identified with a participant's ratifying the partner's preceding action. Evaluating an action as either ratifying or failing to ratify a preceding action is based on the hypothesized convention of which the action is a part. Both compliance and noncompliance presuppose (a) that participants are in convention-based interaction, and (b) that participants are following the rules of each convention in use. Compliance within a convention is defined as a participant's ratifying the partner's preceding convention-based action, regardless of whether or not that preceding action was itself ratifying.

3.3 Applying conflict and compliance to ratification action sequences

We are now in a position to apply the definitions of compliance, noncompliance, and conflict developed in the preceding sections to strings of convention-based action. For simplicity, and to conserve space, these strings will be limited to three-action ratification sequences. Figure 1 provides examples of more extended sequences. Table 3 shows a set of three-action ratification sequences by participants A and B. In this illustration each sequence begins with an action by participant A and traces forward through the next two actions (A → B → A). For simplicity, these examples assume a steady state prior to A's beginning action.

In Line 1 there is full mutual ratification, or reciprocal compliance, across the sequence. Line 2 begins with A's nonratifying target action. However, B ratifies that nonratification, and the interaction proceeds in the new direction initiated by A. In Line 3 the sequence begins with mutual ratification. A then performs a target action.
In Line 4 A's initial failure to ratify is a target action. Conflict is averted by B's subsequent ratification. However, A follows with a second target action. B's next response will determine whether or not conflict is created. In Line 5 B performs the target action, but conflict is averted when A ratifies. The interaction proceeds in the direction initiated by B's action.

Table 3: Ratification action sequences (participants A and B). Each sequence consists of A's initial action followed by the subsequent two-action sequence (B, then A).

Line   A   B   A   Interpretation of sequence

No conflict
1      +   +   +   mutual ratification (reciprocal compliance)
2      -   +   +   A's nonratification (target action), then mutual ratification
3      +   +   -   mutual ratification, then A's target action
4      -   +   -   A's nonratification (target action); B ratifies; A's new target action
5      +   -   +   B's target action; A ratifies; no conflict

Conflict
6      +   -   -   B's target action; conflict created when A does not ratify
7      -   -   -   ongoing conflict, created when B fails to ratify A's target action
8      -   -   +   conflict, created when B fails to ratify, and concluded when A ratifies
B has the target action in Line 6. A's subsequent failure to ratify creates conflict. The string of mutual nonratifying actions in Line 7 indicates ongoing conflict, beginning with A's target action. Depending on the interaction structure, this might take the form of a loop, or a nonlooping conflict in which actions are not repeated. Finally, in Line 8 conflict is created in the first two actions: A's target action, and B's failure to ratify. The conflict is concluded immediately upon its creation when A ratifies B's nonratification. The initiative in A's target action is abandoned. The interaction proceeds in line with the steady state prior to A's target action.

For hypothesized conventions it will be apparent that these and longer sequences can be traced across entire enactments of a convention (a path within a convention structure), providing a low-level interpretation of that interaction, and shedding light on the potentially complex interplay of compliance and noncompliance, cooperation and conflict, within that single interaction. This would be true of all conventionalized interactions, regardless of their nature.
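Because the ratification sequences of Table 3 are abstract, they are straightforward to mechanize. The following Python sketch records the eight three-action interpretations and scans longer strings for the point at which conflict is created (two adjacent failures to ratify, one by each participant, the first being the target action). The '+'/'-' encoding and the label wording are ours, paraphrasing the table under its stated assumption of a steady state before A's initial action.

    # Ratification strings: '+' = ratifies, '-' = fails to ratify. Actions
    # alternate between participants, so adjacent actions are always by
    # different participants, as the definition of conflict requires.

    TABLE_3 = {  # (A, B, A) -> interpretation, paraphrasing Table 3
        ("+", "+", "+"): "mutual ratification (reciprocal compliance)",
        ("-", "+", "+"): "A's target action, then mutual ratification",
        ("+", "+", "-"): "mutual ratification, then A's target action",
        ("-", "+", "-"): "A's target action; B ratifies; A's new target action",
        ("+", "-", "+"): "B's target action; A ratifies; no conflict",
        ("+", "-", "-"): "B's target action; conflict created when A does not ratify",
        ("-", "-", "-"): "ongoing conflict, beginning with A's target action",
        ("-", "-", "+"): "conflict created, then concluded when A ratifies",
    }

    def conflict_points(sequence):
        """Indices of target actions whose nonratification creates conflict:
        a failure to ratify immediately followed by another ([-R] -> [-R])."""
        return [i for i in range(len(sequence) - 1)
                if sequence[i] == sequence[i + 1] == "-"]

    print(TABLE_3[("-", "-", "+")])    # Line 8 of Table 3
    print(conflict_points("+-++--+"))  # [4]: target action at index 4; conflict follows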
By analyzing interactions with a number of occurrences, such as spoon-feeding, it becomes possible to describe broad tendencies in action sequences (Anolli, Duncan, Magnusson, & Riva, 2005; Duncan & Collier, 2002; Friedman et al., 2007; Magnusson, 2000; 2005). It becomes obvious that the implication of each ratification or failure to ratify by a participant, including its effect on the partner, can be interpreted only in terms of the interaction structure and the broader action sequence within which it is embedded. A simple example of this is shown in Figure 1, in which the mother's extending the spoon is either ratifying or not ratifying, depending on the point in the stream of interaction at which it occurs. A single ratification may be: (a) part of a string of mutual ratifications, (b) a positive response to a target action, thus avoiding conflict, or (c) the conclusion of an ongoing conflict. Similarly, a failure to ratify may be (a) a target action, (b) the creation of conflict following a target action, or (c) part of a longer conflictual sequence. Thus, the location of a participant's action on a path within a structure both gives meaning to that action and creates implications for the partner's next action.

3.4 Summary: Compliance and noncompliance

It may be seen that potentially complex patterns of compliance and noncompliance can occur both within and across interactions in early parent-child interaction. When conventions are hypothesized, it becomes possible for investigators to examine these patterns in detail. Because ratification sequences are abstract, they can be analyzed, if desired, independently of the specific actions within a convention. For this reason recurrences of the ratification sequences shown in Table 3, as well as longer ones, may be traced (a) within a single enactment of a convention, (b) between two or more enactments of the same convention, such as successive instances of spoon-feeding during a meal or spoon-feeding at different meals, and (c) between different interactions, such as games and spoon-feeding. Observed patterns of these occurrences may provide information on those sequences most closely related to reciprocal compliance and noncompliance in families, and thus on how these phenomena manifest themselves and develop. The potential complexity of the data provides the opportunity for expanding our understanding of processes of compliance and noncompliance, cooperation and conflict.

Much interaction involves unexceptional, violation-free compliance with routinely used conventions. In this case both participants ratify (i.e., comply with) the convention. That is, both participants are acting in conformity with the shared interaction structure. This is reasonably regarded as an important form of mutual or reciprocal compliance. By the same token, there may be frequent or extended sequences of mutual or reciprocal noncompliance. In the context of compliance, there may be more complex processes of joint construction of interactions involving interesting sequences of compliance, noncompliance, conflict initiation, negotiation, and conclusion. These sequences await the scrutiny of investigators.
4. Discussion
Building on a conceptual framework for describing face-to-face interaction, an integrated treatment of compliance, noncompliance, conflict, and related phenomena is proposed. The approach is based on two underlying notions: (a) participants' use of conventions for conducting various types of everyday interaction, such as discipline or spoon-feeding, and (b) ratification. As discussed earlier, the notion of rule-governed action or convention is commonplace in linguistics and in the social sciences. Hypotheses concerning rules operating at various levels of interaction can be developed on the basis of observed regularities in interaction sequences.

4.1 Ratification-based compliance, noncompliance, and conflict

Within this framework, compliance occurs when a participant ratifies the partner's preceding action within a convention. Similarly, noncompliance is defined as a failure to ratify. Within a structure each action can be identified as ratifying, not ratifying, or neither. Interestingly, as noted in the earlier section on ratification, within certain interaction structures it is possible for the same action to be either ratifying or not ratifying, depending on the path in which it occurs. Strings of two or more consecutive ratifications ([+R] → [+R]) are considered mutual compliance, one form of reciprocal compliance. Conflict is defined as being created by reciprocal noncompliance: two successive failures to ratify ([-R] → [-R]), one by each participant. The first of these was termed the target action. Conflict is created upon the second failure to ratify and continues as long as neither participant ratifies.

Conflict assumes that both participants are conforming fully to the applicable convention. In this case the conflictual issue relates to the partner's preceding convention-based action, and thus to the path taken through the structure. That is, the conflict concerns the way the interaction is to proceed within the structure. When a structure contains one or more options, as in Figure 1, there will be more than one possible path through it. It becomes a routine matter to identify all possible paths within the structure. Further, each action in a path can be identified as ratifying, not ratifying, or neither. For each observed enactment of the convention, the investigator can determine which one of the possible paths was actually taken. Paths within the structure are interpretable as strings of ratifying and nonratifying actions. It becomes possible to observe in routine family interactions: mutual ratification (reciprocal compliance) ([+R] → [+R]), mutual nonratification (conflict) ([-R] → [-R]), unilateral nonratification (a target action) ([+R] → [-R]), and unilateral ratification (accepting the partner's target action) ([-R] → [+R]), as well as to identify the participant taking each of these actions. For many structures, such as that shown in Figure 1, longer, more complex strings can be observed.
Thus, flowchart paths contain information on, for example, (a) who is ratifying or not ratifying whom, (b) patterns of compliance, (c) the directionality of this compliance, (d) mutual compliance, (e) who is initiating changes in the course of the interaction, (f) changes in the structure of the interaction itself, (g) each participant's response to initiated changes, (h) the occurrence and duration of conflict, and (i) how that conflict is negotiated and concluded. As the interaction is repeated within the family, it becomes possible to gather data on how many times each path was taken, and on the sequence in which different paths are taken. The research possibilities, as well as the potential complexity of the phenomena, are apparent. A research tool is available that permits direct examination of each of these issues (Duncan & Collier, 2002).

The extent to which various conventions provide for the partner's legitimate nonratification of the participant's preceding action is an empirical issue. It seems reasonable to expect that nonratification may be an option in many interaction structures and may be considered desirable by the parents, at least in certain contexts and when used in moderation. Such an interaction structure provides for expression of the partner's current state (in the case of Figure 1, receptivity or nonreceptivity to being spoon-fed), and thus for the possibility of ratifying that expression, that is, adjusting the interaction in response to this state. This is one example of the broader notion of state sensitivity in conventions (Duncan & Farley, 1990). (Of course, the partner may not ratify the participant's nonratification, creating conflict, as shown in Figure 1.)

Note that a target action, even an oppositional one such as a refusal or an aggressive act, does not create conflict if the partner ratifies it. This seems important because it preserves the notion of conflict as an interactive process. Conflict cannot be unilaterally created. An apparently unintended consequence of one-opposition definitions is that they permit unilateral conflicts, because they require only that a participant oppose the partner's action. No constraints are placed either on the nature of the partner's action to which there is an opposition, or on the nature of the partner's response to the opposition. Thus, the opposition is considered alone in the stream of interaction.

4.2 Mutual contingency of action

Interaction is defined as mutual contingency of action. That is, each participant's action is influenced, at least in part, by the preceding actions of the partner (as well as by possible expectations of the partner's subsequent action). Within an interaction structure, each action by a participant directly influences the action of the partner by moving the interaction to the next point in the structure at which the partner must either perform an obligatory action, or choose from a set of options. This mutual influence occurs even when the structure contains only obligatory actions, such as the simple 'feed-Daddy' and 'no-wires' games described in Duncan & Farley (1990). For example, in Figure 1, if the child does not display a Negative Response upon the mother's spooning the food (the first diamond), the mother always extends the spoon.
If the child does display a Negative Response at this point, the mother chooses either to end the cycle, or to extend the spoon. These are the direct implications of the way the child exercises this option at this point in the interaction. Once the spoon is extended, a Negative Response creates a different set of options for the mother's next action. Among other things in this structure, the first Negative Response may occur only once during a spoon-feeding cycle, while the second Negative Response may occur multiple times.

When the structure for an interaction is hypothesized, it becomes possible to describe in highly specific terms the way in which each participant influences the other at each point in the stream of interaction. In this way, hypotheses of interaction structure provide investigators with a tool for detailed examination of mutual contingency in observed interactions. It becomes possible to trace the evolution of mutual contingency within an ongoing relationship, to compare influence processes across different types of interactions for a relationship, and to compare relationships on the specific ways members influence each other in comparable interactions.
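As noted in section 4.1, once a structure is hypothesized it becomes a routine matter to enumerate its possible paths, to read each path as a string of ratifying and nonratifying actions, and to see which participant acts at each point. The Python sketch below illustrates such an enumeration over a further simplified stand-in for the Figure 1 structure; the '+M'/'-C' edge labels are assumed for illustration rather than transcribed from the figure, and transits of the conflictual loop are capped so that the path set is finite.

    # Enumerate paths through a hypothesized structure and read each path as
    # a ratification string. Edges labeled None (e.g., the feeding exchange
    # itself) are neither ratifying nor nonratifying, as with cleaning in
    # Figure 1.

    EDGES = {  # element -> [(next element, ratification label or None)]
        "spoon food":   [("neg response", "-C"), ("extend spoon", "+C")],
        "neg response": [("end cycle", "+M"), ("extend spoon", "-M")],
        "extend spoon": [("neg response", "-C"), ("feed", "+C")],
        "feed":         [("end cycle", None)],
        "end cycle":    [],
    }

    def paths(node="spoon food", trail=(), labels=(), max_visits=2):
        """Depth-first enumeration of legal transits, capping loop repeats."""
        trail = trail + (node,)
        if not EDGES[node]:
            yield trail, labels
            return
        for nxt, label in EDGES[node]:
            if trail.count(nxt) < max_visits:
                yield from paths(nxt, trail, labels + (label,), max_visits)

    for trail, labels in paths():
        print(" -> ".join(trail), "| ratifications:", [l for l in labels if l])

Feeding the resulting label strings to the conflict scan sketched after Table 3 would locate, for each possible path, the points at which conflict could be created and concluded, which is the sense in which paths contain information on who ratifies or fails to ratify whom.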
4.3 Negotiation

Like bidirectionality, negotiation by both participants is directly implied by convention-based interaction. Negotiation has been defined as "a process by which a joint decision is made by two or more parties" (Pruitt, 1981, p. 1). From the perspective of convention-based interaction, all face-to-face interaction entails negotiation between the participants. Each path through a structure is a joint construction of the participants and thus a negotiation, regardless of whether the interaction is conflictual or nonconflictual. Each exercise of an option may be regarded as a proposal regarding the direction or path the interaction should take within the constraints of the convention. This proposal is subject to the partner's agreement, to be accepted or resisted. In either case the course of the interaction is negotiated between the participants. It should be noted that the convention structure itself, together with the actions it contains, constrains what can be negotiated, and how it may be negotiated. This provides a critically important additional input to a participant's negotiation skills. For example, it would be entirely possible for a structure to provide options only for the participant and none for the partner, or vice versa.

4.4 Changes in structure and strategy

It will be apparent that the suggested conceptual framework is based on regularities in observed interaction sequences. It proposes an approach to describing observed actions and their sequential relationship. This approach may be applied regardless of the nature of the interaction, the specific actions observed, or the nature of the structure. Thus, the general approach may be applied in the same manner to different interactions in the same family, and to interactions in different families.

By the same token, interaction can be studied longitudinally. There may be developmental changes within a family. Thus, in many families spoon-feeding structures evolve over time. For example, the structure may be elaborated or streamlined, or the child may engage in more self-feeding either by hand or with a spoon. These developments would undoubtedly result in changes in the structure. Contrasts may be found in two major aspects of interaction: (a) the structure itself, and (b) the paths taken in the structure. Even when the structure remains constant over some period of time, as is the case for Figure 1, there may be developmentally significant changes in the paths taken in the interaction (Friedman et al., 2007). That is, there may be changes in the strategy taken within the interaction structure. For example, in the face of the child's vigorous resistance, the mother may increasingly choose to end the cycle, or the child may resist more frequently over time. In either case, the structure is both sensitive to any changes in the interaction that may occur, and quite specific concerning the nature of these changes. However, the structure is entirely descriptive, avoiding a priori commitment to the nature either of the structure or of observed changes in it.

4.5 Actions comprising structures

Although this discussion has focused on the structure of interactions, it should be noted that each structure is composed of specific actions that provide its substance. That is, it is important to consider not only the abstract characteristics of the structure, but also the actions comprising the structure. What actions is the child learning to use, and to expect of others? Figure 1 provides an example. In many families, spoon-feeding is a warm and tranquil process. However, in the case of Alex, spoon-feeding involves frequent, stormy noncompliance and conflict. Notice that, extrapolating from the 277 uninterrupted spoon-feeding cycles observed in two days of taping when he was 10 months old, there are more than 4,000 cycles estimated for Alex during this month. Consider the possible implications of this one interaction, with its experience of frequent and highly demonstrative conflict, for the child's socialization. Consider the child's developing set of expectations concerning his own actions and those of the mother, including the child's developing social strategies, all within a legitimized family interaction structure.
5. Conclusion
The notion of convention-based interaction, together with the related notion of ratification, provides not only a conceptual foundation for studying the general process of interaction, but also, more specifically, a framework for studying conflict and compliance. By capitalizing on the operation of conventions in interaction, it becomes possible to provide a specific definition of conflict and compliance that accommodates the complexities of the interaction process.
The proposed approach to conflict and compliance highlights both the way in which these two phenomena are integrally related and complementary, and the way that much of interaction may involve a continual interplay between conflict and compliance. These interaction processes may be directly linked to higher-level concepts of interest to investigators, for example, bidirectionality, negotiation, and reciprocal effects. As such, the study of convention-based interaction may provide a productive perspective on the dynamics of interaction.

References

Anolli, L., Duncan, S. D., Jr., Magnusson, M. S., & Riva, G. (2005). Hidden structure of interaction: From neurons to culture patterns. Amsterdam: IOS Press.
Bavelas, A. (1950). Communication patterns in task-oriented groups. Journal of the Acoustical Society of America, 22, 725-730. Reprinted in D. Cartwright & A. Zander (Eds.) (1960), Group dynamics: Research and theory (2nd ed.) (pp. 669-682). Evanston, IL: Row, Peterson.
Doolittle, E. J. (1995). Mother-daughter interaction in lower-socioeconomic African American families and its implications for academic achievement. Unpublished doctoral dissertation, University of Chicago.
Duncan, S. D., Jr. (1991). Convention and conflict in the child's interaction with others. Developmental Review, 11, 337-367.
Duncan, S. D., Jr., & Collier, N. T. (2002). C-Quence: A tool for analyzing qualitative sequential data. Behavior Research Methods, Instruments, and Computers, 34, 108-116.
Duncan, S. D., Jr., & Farley, A. M. (1990). Achieving parent-child coordination through convention: Fixed- and variable-sequence conventions. Child Development, 61, 742-753.
Duncan, S. D., Jr., & Fiske, D. W. (1977). Face-to-face interaction: Research, methods, and theory. Hillsdale, NJ: Lawrence Erlbaum Associates.
Enders, J. K. J. (1997). Meal time interactions from infancy to toddlerhood: The case of one mother-son dyad. Unpublished doctoral dissertation, University of Chicago.
Friedman, J. F., Duncan, S. D., Jr., Sadowski, K., Hedges, L., & Li, T. (2007). Analyses of rule-governed infant-caregiver interaction. Submitted for publication.
Goffman, E. (1967). Interaction ritual. Garden City, NY: Anchor.
Goffman, E. (1971). Relations in public. New York: Basic Books.
Hardway, C., & Duncan, S. D., Jr. (2005). "Me first!" Structure and dynamics of a four-way family conflict. In L. Anolli, S. Duncan, Jr., M. Magnusson, & G. Riva (Eds.), The hidden structure of social interaction: From neurons to behavior patterns (pp. 209-222). Amsterdam: IOS Press.
Li, T. (1997). Cross-cultural comparisons between American and Chinese families on early caregiver-infant interactions at home. Unpublished doctoral dissertation, University of Chicago.
Nelson, R. J. (1968). Introduction to automata. New York: Wiley.
Pruitt, D. G. (1981). Negotiation behavior. New York: Academic.
Schelling, T. C. (1960). The strategy of conflict. New York: Oxford University Press.
Shields, M. W. (1987). An introduction to automata theory. Oxford: Blackwell.
Atypical Minds and Bodies
Discourse Focus, Gesture, and Disfluent Aphasia1

Susan Duncan
University of Chicago
Laura Pedelty
University of Illinois
The telegraphic speech of Broca's aphasics, often accompanied by deficits in verb use, gives the impression that these speakers have lost their understanding of grammar. The 'agrammatism' syndrome is often taken as support for modularist and localizationist accounts of language processing and the brain's organization for language. Goodglass et al. (1967) offer an opposing, processing limitations account, based on what is spared in agrammatic aphasic speech, calling upon the notion of 'stress-salience' of sentence constituents. Here, we extend this account based on a three-language comparative analysis of gesture-speech coexpression of discourse focal content in non-aphasic speakers. We show that verb salience in utterances varies within-language depending upon discourse context, and across languages depending on general facts about verb behavior in each language (Hopper, 1997). Finally, a case study of the discourse of an English-speaking aphasic reveals implications of multimodal discourse analysis (McNeill, 1992; 2005) for a theoretical account of the 'verb problem' of disfluent aphasia.
1. Introduction
We judge the fitness of any paradigm for understanding the human capacity for language according to a variety of criteria. Important among these is the question of how much insight application of the paradigm yields into the nature of language breakdown. As noted in the Introduction to this volume, McNeill (2005) describes the theory of language he has developed over decades (1985; 1992; 2005) as "antireductionist, holistic, dialectical, and grounded in action and material experience" (p. 4). The paradigm is fundamentally distinct from psycholinguistic and neurolinguistic research paradigms cued by formalist linguistic theories, those claiming that the human mind possesses an invariant, compartmentalized grammatical competence, autonomous from other cognitive capabilities.
1 We wish to thank Elena Levy, David McNeill, and James Goss for their helpful comments on earlier versions of this paper. We also wish to thank David McNeill for many years of inspired mentoring.
An aspect of the McNeill paradigm that has significant consequences for our understanding of language disorders is its methodological component. The method emphasizes exhaustive, multimodal microanalysis of natural, extended, connected discourses of many types, for example, storytelling, collaborative planning, and dyadic and group conversation, and from many different speaker groups—children, adults, diverse language/cultural groups, and individuals with neurogenic language disorders.

In theorizing about skills we infer to be constitutive of language, and the neurological underpinnings of language systems, the study of aphasias has held a unique place. These are clinical syndromes of disordered linguistic processing consequent on focal brain lesions. Damage to identifiable neurological structures results in distinct disruptions of language with reasonable reliability. This mapping of inferred function onto identifiable brain structures has been employed as a tool to probe the nature of the psychological representations and mechanisms of human language use. We highlight the analytic method belonging to the McNeill paradigm, above, since its application to non-aphasic and aphasic language behavior can lead observers to conclusions about the nature of the psychological representations and mechanisms of human language that differ substantially from those arrived at through application of other research methods, both psycholinguistic and neurolinguistic, that do not involve close analysis of natural discourse. In this chapter we present an example outcome of such a discourse analysis.
1.1 The clinical syndrome of Broca's aphasia
A principal distinction in the phenomenology of the aphasias emphasizes the qualities of aphasic speech. 'Nonfluent' aphasias, typically associated with lesions anterior to the Rolandic fissure and characterized by effortful, halting, and laconic speech, are contrasted with 'fluent' aphasias, typically associated with lesions posterior to the Rolandic fissure and characterized by fluent or even hyperfluent speech, together with variably severe deficits in comprehension. The syndrome known as Broca's aphasia, in which speech is sparse and effortful while comprehension is relatively spared, is typically associated with damage to the dominant-hemisphere (left, in most right-handers) frontal operculum (including, but not limited to, Brodmann's areas 44 and 45). False starts and omissions—often of closed-class function words, morphological inflections, and verbs—typify Broca's aphasia. The utterances of these speakers are often said to sound "telegraphic," reflecting the emphasis on static, content-laden words at the expense of functors. When there is a paucity of overtly organized syntactic structure, Broca's aphasic spoken language is labeled "agrammatic."

The 'verb problem' in Broca's aphasia is the tendency to omit or nominalize verbs in utterances (Miceli, Silveri, Villa & Caramazza, 1984; Zingeser & Berndt, 1990). This characteristic symptom in the language systems of Broca's aphasic speakers is the target of our attention in this chapter.
On the view that verbs are the core syntactic constituents of utterances, this symptom has been studied and interpreted by some as one piece of evidence in support of a neurologically grounded grammar 'module' that is somehow responsible for organizing the syntax of speech (language) output.
1.2 An example of disfluent aphasic speech
Excerpt (1) is from a narrative discourse produced by an adult with aphasia. She suffered a left-hemisphere ischemic stroke, resulting in the clinical syndrome of Broca's aphasia, as determined using the Boston Diagnostic Aphasia Examination (Goodglass & Kaplan, 1972). The patient was referred by her neurologist. Diagnosis was verified by her speech therapist and by core examinations as outlined in Pedelty (1987). The session from which the excerpt is taken was videotaped in an examining room at the Siegler Institute of Speech Disorders, Michael Reese Hospital, Chicago, following a meeting of the MRH Aphasia Support Group. The patient gave informed consent to be videotaped telling the story of an animated cartoon she had just viewed.2

In the excerpt, the speaker is describing to a listener an interval of the cartoon, immediately after viewing that interval. The three main characters are a cat, a bird, and the bird's owner, an old woman. Over the course of the 6.5-minute cartoon, the cat tries repeatedly to catch the bird and repeatedly fails.3 In her descriptions of the cartoon events, this aphasic's speech has the telegraphic quality often described in Broca's aphasia. Overall, the speaker's complete narration is a reasonable match to the criteria defining agrammatic aphasia, as well. In the excerpt (1), an article ("the") occurs only twice. The only two identifiable verbs, though inflected for person and number, are stammered. In contrast, the six noun tokens are uttered generally clearly and forcefully.

(1)
(.1) the (pause) vlk- (pause) uh (breath) bird? (pause) and c- (breath) cat
(.2) (pause) and uh (breath) ss- uh (pause) she ss- (breath) (pause) apartment
(.3) and ih- (pause) the (pause) uh (pause) old (pause) my (breath) ss- uh (pause) woman (pause)
(.4) and uh (pause) she ss- (pause) like (pause) uh ae- f- f-fas-t (breath)
(.5) cat (pause) and uh (pause) bird is-ss-ss (pause) (breath)
(.6) I uh (pause)
(.7) (breath) sh-sho- shows t- (pause)
(.8) a- and down (pause) t- d- down (breath)
Aphasias with the syndrome characteristics described above have been observed in many different languages (Menn & Obler, 1990). Cross-language comparative analyses of this aphasia have been concerned with the extent to which agrammatism may manifest differently, given differences among the grammars of different languages (Bates et al., 1991; Paradis, 2001). Across all languages, however, Broca's aphasics are known to exhibit impaired access to verbs in a variety of tests of production and comprehension.

2 The narrative discourse data for this aphasic speaker come from a corpus of videotaped discourses from nine aphasic speakers (four nonfluent and five fluent), on which the analyses in Pedelty (1987) were based. A full discussion may be found there.
3 See McNeill (1992) for details of the cartoon elicitation method.
Explaining agrammatic Broca’s aphasia
1.3.1 Syntactic deficits due to specific brain injury
The seemingly systematic agrammatism of Broca’s aphasia has been offered as support for modularist, ‘autonomy of syntax’ linguistic theories. The frequency with which the clinical syndrome arises from damage to the frontal operculum (including Brodmann’s areas 44/45) suggests regional specialization for those grammatical aspects of language that seem absent in Broca’s aphasic speech. This has been taken to support localizationist accounts of the brain’s evolution and organization for language, at least as concerns syntactic processing (see, for example, Grodzinsky, 2000; Friedmann, 2001). The ‘verb problem’ as a symptom of Broca’s aphasia is often discussed contrastively with Wernicke’s (fluent) aphasia. The latter is associated with damage to the temporal rather than frontal cortex of the left hemisphere. Wernicke’s aphasics are often reported to have impaired access to nouns (anomia), with verbs relatively spared. The relevance of such dissociations for neurolinguistic theories is clear. To the extent that damage to particular brain areas may result in deficiencies in the use of particular grammatical form classes, modularist and localizationist accounts of language processing are at least partially supported. The significance of the ‘verb problem’ for theorizing about Broca’s aphasia is reflected in the assertion, frequently encountered in the literature on agrammatic aphasia, that the verb is “the most important word in a sentence, because it reveals who did what to whom.” This assertion makes reference to the fact that, in many languages (Italian and Spanish, for instance), verbs are morphologically inflected for person, number, case, and so on. In non-inflecting languages, by contrast, the verb conveys sentence argument structure by virtue of its position relative to other sequentially ordered sentential constituents. To frame an approach to the ‘verb problem’ by saying that verbs are integral to the syntactic framing of utterances is to view the symptom as due to a syntactic deficit.

1.3.2 Against a syntactic deficit account
A modularist, syntactic deficit account of Broca’s aphasia generally, or of the ‘verb problem’ in particular, however, cannot account for data from, for example, Chinese Broca’s aphasics, in whom the ‘verb problem’ and the verb/noun dissociation (Bates et al., 1991a) are also observed. Non-aphasic Chinese is a morphologically very spare language, with no inflections on verbs for person, number, case, or other grammatical distinctions; further, the positioning of verbs in Chinese utterances is quite variable. In other words, verbs in Chinese utterances
cannot be said to have the same ‘syntacticity’ as verbs in, for instance, Romance languages. Yet Chinese aphasics show the same signs of impaired access to verbs as speakers of other languages (Chen & Bates, 1998). The behavior of verbs in non-aphasic English has some of the same qualities as in Chinese (Hopper, 1997). We return to this point in section 2.4, below. Other behavioral and brain data also tend to weaken modularist and localizationist accounts in general. Cross-language comparative research on aphasic language comprehension and production demonstrates that agrammatic Broca’s aphasic native speakers of different languages can be prompted to manifest knowledge of the particular syntactic properties of those languages (Bates et al., 1991b). Further, agrammatism deficits tend to be variable and intermittent as opposed to absolute (Berndt, 1990). Undiluted cases of agrammatism are rare, perhaps nonexistent. These facts are problematic for localizationist accounts of specific deficits. Finally, studies of the brain damage in aphasics show that “there is sufficient variability in lesion sites that produce clinically similar symptom pictures so that the standard ‘accepted’ lesion corresponding to a syndrome has only a probabilistic, rather than a fixed, relationship to the given pattern of language disturbances” (Goodglass, 1993:218). Taking a different perspective, some researchers have linked telegraphic Broca’s aphasic speech to the prosodic or phonological structure of language production and comprehension. Goodglass, Fodor & Schulhoff (1967), Kean (1977), and Goodglass (1993) have observed that constituents of utterances that can or do receive prosodic emphasis tend to be the spared output of disfluent aphasic language production. Kean (1977) argued that morphemes that have no role in the assignment of word stress in a sentence are the ones that will be omitted. She concluded that the deficits of Broca’s aphasia had been wrongly attributed to a loss of syntactic competence and proposed instead that deficient phonological competence is the root cause.4 Goodglass et al. (1967), in a study that also focused on prosodic factors in language comprehension and production, presented agrammatic aphasics with a sentence repetition task. Goodglass and his colleagues asked the aphasics to repeat sentences that were spoken to them. The sentence constituent receiving prosodic emphasis varied from stimulus sentence to stimulus sentence; for example, “JOHN kissed mary,” “john kissed MARY,” and “john KISSED mary.” Results showed that the aphasics tended to be able to repeat the stressed constituent of each sentence (sometimes only that constituent), regardless of the constituent’s grammatical role, its morphological characteristics, or its form class. The authors considered these results to be evidence for a processing limitations account of agrammatism: “one that relates the threshold for initiation of an utterance to the ‘saliency’ of available words in the message. ‘Saliency’ is defined roughly as a combined effect of semantic significance and word stress” (Goodglass, 1993:114-115). Findings such as these direct our attention away from notions such as ‘loss of grammatical competence’ and toward factors such as the Broca’s aphasic’s motor impairments, related to the sequencing of speaking behaviors, and how these impairments may impede or distract attention from the production of all but the most focal or significant words in each utterance the aphasic attempts. What is lost versus what is spared in the speech of disfluent aphasics may thus be more productively viewed from the perspective of discourse processes.

4 Kean’s proposal is essentially an alternative modularist explanation of the deficits.

1.4 Summary
A theory of the human capacity for language needs to address the empirical facts of language breakdown. The seeming systematicity of the spoken language deficits associated with the aphasias, together with the general claims that may be made concerning the relationship between brain lesion locations and aphasic syndromes, has tended to comport with modularist, localizationist theories of human language. The notion that a disfluent aphasic’s deficits may be due to a specific lesion that has disabled all or part of a modularized grammatical competence, however, is at odds with several types of data: (i) the mutability and variability of agrammatic symptoms, (ii) cross-linguistic comparisons showing that agrammatic speakers of different languages retain degrees of proficiency in comprehension and production of the grammatical features particular to their languages, and (iii) evidence that the specific losses comprising a clinical picture of agrammatism are permeable to manipulations of discourse context. There are non-syntactic-deficit explanations for the verb problem as well. These include category-specific lexical dissociation (e.g., Caramazza & Hillis, 1991), Luria’s (1970) distinction between nominal and predicative uses of language, and Goodglass’ ‘functional attitude’ analysis (1993:187). We lack space to sketch these alternatives here.

2. A Multimodal Discourse Perspective on Non-Aphasic and Aphasic Language
In what follows, observations of speech and coverbal spontaneous gestures in adult speakers of English, Chinese, and Spanish are the substance of an analysis of natural discourse collected from ‘neurologically typical’, non-aphasic speakers. The results of this cross-language comparative analysis are then compared to the speech and gesture of a case-study disfluent aphasic: the speaker of excerpt (1), above. These comparisons, between non-aphasic speakers and a disfluent aphasic, of how gestures combine with speech in connected discourse, are similar in spirit to the Goodglass et al. (1967) study of the relationship between prosodic emphasis and what is spared in disfluent aphasic spoken language. Specifically, we will suggest that the discourse-level language
processing phenomena that generate discourse focus and speech prosodic emphasis may be responsible for some of the specific characteristics of disfluent aphasic speech. Further, the observations permit us to formulate a prediction concerning the relative severity of the ‘verb problem’ in disfluent aphasia across languages.

2.1 Gestures in discourse: Non-aphasic speakers
In studies of neurologically typical speakers engaged in extended, connected discourse, researchers have observed that the stroke phases5 of meaningful gestures have a strong tendency to synchronize with prosodically emphasized constituents of the accompanying spoken utterances (Schegloff, 1984; Kendon, 1980; Nobe, 2000; Loehr, 2004). Utterance (2) is an excerpt from an English speaker’s cartoon narration. In it the speaker describes how the cat tries to reach the bird sitting high up on the window sill of an apartment building. In this utterance the verb-plus-particle sequence, “climbs up,” was given intonational emphasis. Precisely while uttering these two words, the speaker’s hands executed the stroke phase of the gesture (indicated with boldface type). With fingers directed away from her body, the hands flap upward alternatingly, a representation of the cat’s climbing manner of motion. Thus we see that an iconically depictive gesture of climbing up occurs in synchrony with utterance constituents expressive of similar content. This representational gesture combines with intonational emphasis (indicated with superimposed rectangles of shading) to highlight the verb-particle constituent in this utterance.6
(2) the [cat climbs up the drain]pipe
(3) this time he cli[mbs up inside the drain]pipe
In fact, it is routinely the case, in many speakers’ tellings of the cartoon event described in (2), that the action of the cat’s climbing emerges as the discourse focal element at this moment in their narrations. The cat has previously tried once and failed to reach the bird. This climb is his newly conceived strategy, and the cartoon—cinematographically speaking—focuses the viewer’s attention on the event of climbing for some moments. About one minute after (2) is uttered, in the subsequent episode of her narration, this speaker describes the same cat climbing up the same drainpipe again. In the cartoon this time, however, the cat has decided to go up inside the pipe, so as not to be seen. As (3) shows, the speaker again chooses to describe this new event using the verb-particle construction, “climb up”; however, the gesture stroke phase and speech prosodic emphasis do not synchronize with that constituent this time. Discourse focus has now shifted. Stroke and prosodic emphasis combine to highlight the figure-ground relational constituent of the utterance, “inside,” instead. The act of climbing itself is old information at this point. The new information that (3) is constructed to deliver is the position of the cat’s body with respect to the pipe. This bit of information has key relevance for how the remainder of the episode then unfolds. Duncan & Loehr (in preparation) demonstrate more fully how the semantic contents of speech and coverbal gestures, examined together, are informative as to a speaker’s specific discourse focus in each utterance of a continuous narration (see also McNeill, 2005:108-112). Their study also confirms, with instrumental assessment of vocal pitch, loudness, and syllable duration, the relationship between gesture stroke and prosodic emphasis during discourse-focal intervals of speech. What matters for the analysis of non-aphasic and aphasic speech presented below is that every such utterance, in extended, connected discourse, has a focal center that is given prosodic emphasis, and typically gestural emphasis as well; further, this focal emphasis does not necessarily extend to include the verb of the utterance.

5 The stroke is the phase of gesture production that is meaningfully interpretable in relation to concurrent speech and discourse context.
6 Gesture-speech annotation conventions in these examples: […] a gesture phrase in relation to concurrent speech; boldface: the stroke phase of the gesture phrase; underline: a motionless hold phase; shaded rectangle: peak prosodic emphasis; / unfilled speech pause; * self-interruption.

2.2 Gesture and the verb in English, Chinese, and Spanish
Research on the ‘verb problem’ in Broca’s aphasia has been guided by a strong assumption that verbs always or routinely constitute the focal centers of utterances, since, sententially, they code ‘who did what to whom.’ Many researchers seem to regard this supposed centrality of the verb in the sentence as a cross-language universal as well. Contrary to this assumption, however, the speech and gesture data from Chinese, Spanish, and English natural discourse that we present here demonstrate that the discourse focal constituents of utterances of non-aphasic speakers are frequently constituents other than the verb, and that there is variation across these three languages in how often this is the case. As with (2) and (3) in the previous section, we identify the focal centers of utterances on the basis of observations of combined gestural and prosodic highlighting of specific constituents of the speech stream. Utterances (4), (5), and (6), below, are excerpts from a comparison of full-length cartoon narrations by eight adult, native, non-aphasic speakers each of American English, Mandarin Chinese, and Spanish: a sample of 24 five- to ten-minute narrations. A variety of cartoon event descriptions were included in the comparison. However, to simplify exposition here, each pair of representative utterances below consists of one speaker’s descriptions of the same two cartoon events discussed in the previous section: first, the cat climbs up on the outside of the drainpipe and, second, the cat climbs up inside the pipe. Given our analytic focus on the location of the verb of each utterance with respect to speech prosodic emphasis and gesture stroke phase, and how this varies within- and
cross-language, the dotted-line border around the verb or phrasal verb of each utterance is a visual assist for distinguishing the relative positions of gesture stroke and verb across the excerpts in the three different languages.

2.2.1 English
The English excerpts in (4) are very similar in word choice and in gesture-speech synchrony to those in (3), above, in terms of the timing of focus-highlighting gesture strokes with respect to verbs. This, in fact, is the pattern of stroke-to-speech synchrony, with prosodic emphasis, encountered in the majority of English-language descriptions of these two cartoon events (Duncan & Loehr, in preparation). In this speaker’s description of the cat’s climb up on the outside of the drainpipe, (4.1), the stroke phase of a gesture depicting upward movement synchronizes with the verb-particle construction, “climb up.” In his description of the cat’s climb up inside the pipe, (4.2), again as in (3), a gesture depictive of upward movement skips this main verb and particle to synchronize instead with the figure-ground relational term, “through.” Again as in (3), the stroke-accompanied speech constituents are prosodically emphasized as well.7
(4) (.1) the second part [is he climbs up the drain] to get* try and get Tweety
(.2) this time he tries to climb [up in through the] drain
2.2.2 Mandarin Chinese
The excerpts in (5), from a Chinese narration, are similar in several respects to the English, though the extents of the gesture stroke phases are greater in two instances. In the stroke phase of (5.1), the speaker’s two fists move upward alternatingly, depictive of climbing manner, in synchrony with the verb of the utterance, pa (climb), as well as the words surrounding this verb. This speaker’s description of the cat’s climb up inside the pipe, (5.2), is a bit complex. We see that the verb zuan (bore) occurs three times. In the first iteration, a gesture stroke depictive of a curving upward path of motion synchronizes, not with this verb, but instead with the ground- and figure-ground relation-expressive words, shui-guan (drainpipe) and li-mian (inside). In the second iteration, zuan (bore) is initially skipped again, but then the third iteration is encompassed by a rather long-lasting gesture stroke phase that again encompasses the ground- and figure-ground relation-expressive constituents. Thus, in these representative English and Chinese utterances, we see similar patterns of gesture-speech co-occurrence highlighting discourse focal information, in ways that are typical at each of these two points in the cartoon narrations. In the first description in each language, the discourse focal constituent is a verb or includes the verb; in the second, the verb is largely excluded from discourse focus.
7 In contrast to Duncan & Loehr’s (in preparation) method, prosodic emphasis in the cases discussed here was judged by native-speaker ear.
(5) (.1) [ / ta jiu pa shui-guan / ]
/ 他 就 爬 水管 /
he then climb water-pipe
then he climbs the drainpipe
(.2) (mao) [zuan zai shui-gu][an li-mian / dui / ]
(貓) 鑽 在 水管 里面 / 對 /
(cat) bore on water-pipe in-side / right /
(the cat) bores into the drainpipe, right
[ta zuan shui-guan li-mian zuan-jin-qu / / ]
他 鑽 水管 里面 鑽進去 / /
he bore water-pipe in-side bore-enter-go
he bores into the drainpipe, bores right on in
2.2.3 Spanish
The Spanish excerpts in (6) present a picture that is similar to the English and Chinese examples with respect to expression of discourse focal content. These excerpts differ, however, in that this content is expressed by the verb in both event descriptions. The Spanish utterance descriptive of the cat’s ascent on the outside of the drainpipe, (6.1), is accompanied by a gesture stroke that is again somewhat longer than the one that accompanied the English utterance, (4.1). The Spanish speaker’s stroke depicts upward motion and encompasses the verb-gerund construction va subiendo (go ascendingly), as well as a path-ground expressive constituent, por allí (via there). As with the English and Chinese examples, the Spanish description of this first event seems to center on the act of ascending. It is in her description of the cat’s entry via the inside of the drainpipe that the Spanish speaker displays a pattern of gesture-speech co-occurrence that, overall, distinguished the Spanish event descriptions in our sample of 24 narrations from those of English and Chinese. Our Spanish narrators often chose the verb meter (enter, insert) when relating this event of the cat ascending inside the pipe. Inherent in this verb is something of the figure-ground relation that the English speakers expressed with “inside” or “through,” and Chinese speakers expressed with li-mian (inside). This was the common way for Spanish speakers to express the event content that was discourse focal at this juncture in their narrations.
(6) (.1) [se* se va subiendo por allí no? / ]
refl* refl he-go ascending via there no
he goes up that way, no?
(.2) [ahora se mete por el bajante / ][y cuando se mete ]
now refl he-enter through the drainpipe / and when refl he-enter
now he goes in through the drainpipe, and when he goes in …
Based on our gesture-speech synchrony analysis of event descriptions from all eight narrations of each speaker group, we observe that utterances with gestural and prosodic highlighting of some constituent other than the verb occur roughly twice as often in the Chinese and English narration data as in the Spanish data. In other words, verbs appear to function more often as focal constituents of utterances in Spanish than is the case in English and Chinese.

2.2.4 Summary
To summarize and conclude this sketch of our cross-language comparative analysis, our observations suggest that verbs less often have focal significance in discourse-contextualized Chinese or English utterances than in comparable Spanish utterances. In the spirit of the Goodglass et al. (1967) salience account of what is spared in disfluent aphasic speech, the implications of these observations of non-aphasic discourse in three languages for aphasic narrative discourse are that (i) to the extent that the aphasic has speaking ability sufficient to construct something of a narration, the aspects of events that non-aphasics tend to highlight with gesture and speech prosodic emphasis are the most likely to be expressed in aphasic discourse, and (ii) the constituents of utterances that speakers of an aphasic’s native language tend to prefer for emphasizing such aspects will likely be relatively spared.

2.3 Discourse focus and gesture in disfluent aphasia: An example
A small exploratory analysis of the speech and gesture of the English-speaking disfluent aphasic who contributed (1), above, tends to bear out these implications. In this ‘case study’-style analysis, we observe instances in which the aphasic’s telegraphic spoken utterances and co-occurring gestures highlight just those constituents that are typically highlighted in the gesture-speech productions of non-aphasic English speakers, as described above, and at analogous points in the continuous narration. In (7) a non-aphasic English speaker describes a scene in the cartoon in which the cat falls back down inside the drainpipe after climbing up inside it. The gesture stroke phase is a plummeting downward motion, from above the speaker’s head to about the level of her abdomen. This stroke synchronizes with the prosodically emphasized path particle, “down.” The verb of the utterance, “rolls,” is underlined, indicating a motionless hold phase, preparatory to performing the gesture that co-expresses the discourse focal content of the cat’s very dramatic fall from a great height. This verb content is not discourse focal at this moment in the narration and the gesture does not depict any rolling manner of motion.
(7) and he rolls / down the drai][nspout
(8) [a- and down] [(pause) t- d- down (breath)]
As an example, the aphasic speaker’s description of this same cartoon event, (1.8), is given again in (8), annotated to show co-occurring gesture. This is the final utterance of the longer narration interval excerpted in (1). During the interval preceding (8), the aphasic performs several gestures representing cartoon locations, entities, and events leading up to this final utterance. Examination of these preceding gesture-speech pairings, informed by our knowledge of the content of the cartoon that the aphasic speaker viewed, permits us to be quite certain that the event she is trying to relate in (8) is the same as that related by the non-aphasic speaker in (7), even though there is no verb in (8), nor actor, nor setting. The aphasic performs a gesture depictive of a downward path of motion, not unlike the non-aphasic’s gesture in (7). Her hand moves sharply down from about the level of her shoulder to her lap. This synchronizes with the isolated path-expressive particle, “down.” She repeats this gesture in synchrony with the stuttered onset of the repeated “down.” Though there are no other words in her utterance, so that a judgment of relative prosodic emphasis is not really possible, the first vocalization of “down” is loud and sharp, compared to many of the other words in the extended interval, (1). The second “down” is strongly vocalized as well. This example from our case study suggests that a disfluent aphasic attempts utterance production on the basis of a discourse model similar to that of the non-aphasic speaker; however, her motor deficits prohibit her from uttering and gesturing about any but the discourse focal elements (similarly to Goodglass et al., 1967).

2.4 The ‘verb problem’ revisited
The comparative analyses above tend to undermine the notion that verbs always play a central, organizing role in utterances in natural discourse. A more nuanced perspective on the ‘verb problem’ in Broca’s aphasia results from considering data drawn from extended natural discourses, comparing across languages in which the roles of verbs may differ, and across non-aphasic and aphasic speakers engaged in the same discourse production task. We find that sentential main verbs are often not the information-loaded, discourse-focal utterance constituents that our usual ways of thinking about them would suggest. The greater tendency of the Chinese and English speakers, relative to the Spanish speakers, to give prominence to sentential constituents other than main verbs, observed here, is in keeping with Hopper’s (1997) functionalist analysis of extended phrasal verbs in English and Chinese. Focusing on English, and drawing on a large corpus of personal narrative discourse in that language, Hopper observes a paucity of “solitary” verbs (pp. 96-97), an abundance of phrasal verbs in English, and a “dispersal of verbal elements over various parts of the predicate” (Hopper, 1997:99). Regarding Chinese, Hopper notes that, “over the centuries a stylistic premium has come to be placed on elaborate verb phrases, such that ‘the longer and more complex the verbal, the more natural and appropriate the sentence’” (Chen, n.d., cited in Hopper, 1997:100). He concludes that, “Languages [like
English and Chinese] abhor morphology and indicate distinctions of modality and aspect through auxiliary verbs, and various kinds of prepositional constructions, distributed discontinuously over the predicate and even across clause boundaries.” In the English and Chinese speakers’ highlighting of phrasal elements other than the main verb, in the relatively simple utterances descriptive of motion events examined here, we see a small reflection of the more general phenomenon identified by Hopper. Also in keeping with such claims are Tao’s (1996) observations on Chinese, that non-sentential utterances, many without main verbs, predominate in the natural discourse of neurotypical Chinese speakers. These observations by Hopper, Chen, and Tao comport with the findings of our comparative analysis. The contrast in tendency for verbs to receive focal emphasis in the utterances of non-aphasic Spanish speakers versus those of non-aphasic English or Chinese speakers plausibly connects with the fact that Spanish is the relatively ‘highly inflecting’ language in this sample. The main verbs of Spanish utterances may indeed generally carry a greater burden of information about the utterance as a whole, and thus be important to emphasize for successful communication.

3. Conclusions
Lesser & Milroy (1993) note that experimental work and descriptive accounts of language disrupted by brain damage have too often relied solely on speech data and on non-discourse-contextualized language use (though Goodwin (2003) and other efforts in the Conversation Analysis methodological tradition are exceptions to this generalization). The multimodal analysis offered here, of aphasic narrative discourse in comparison with the narrative discourses of non-aphasic speakers, demonstrates the kinds of insights that are afforded the observer by less narrow, yet still highly systematic, comparative analyses of discourse. The type of analysis illustrated in this study, with its emphasis on gesture-speech co-expressivity of meaning and on precise assessment of gesture-speech timing relationships, is the essence of the methodological approach embodied in the McNeill paradigm of psycholinguistic research. It is sometimes said of multimodal analyses of (relatively) unconstrained discourses that their results can be unfocused, “purely descriptive,” and lacking in explanatory and predictive power. The facts of Spanish, Chinese, and English discourse focus in relation to the verbs of utterances, presented here in conjunction with the seeming similarities of discourse-focal emphasis between the English-speaking aphasic and English-speaking non-aphasics, however, are clearly evidence against a modularist agrammatism account of the deficits of Broca’s aphasia. They contribute to a parsimonious alternative explanation of the deficits as being due instead to processing limitations. Further, the results reported here do yield a prediction: that the ‘verb problem’ in disfluent aphasia should be less severe in Spanish and languages like it than in languages like English and Chinese. Relative severity should be a matter of degrees,
proportionate to the tendency of verbs in each of the languages to be vehicles for discourse-focal content in extended, connected discourse. The study presented here demonstrates the utility of the multimodal discourse analytic method, practiced by McNeill and his students for many years now, for elucidating the psychological representations and mechanisms involved in non-aphasic and aphasic language use. The results have explanatory and predictive power, and they suggest a future research direction: a larger-scale, systematic comparison of verb usage across languages that differ with respect to typical verb use. Results of such a study would have the potential to solidify the conclusions reached here concerning processing capacity limitations as a root cause of a range of Broca’s aphasic symptoms.

References

Bates, E., Chen, S., Tzeng, O., Li, P., & Opie, M. (1991a). The noun-verb problem in Chinese aphasia. Brain and Language, 41, 203-233.
Bates, E., Wulfeck, B., & MacWhinney, B. (Eds.) (1991b). Special issue: Cross-linguistic research in aphasia. Brain and Language, 41(2).
Berndt, R. S. (1990). Introduction. In L. Menn & L. K. Obler (Eds.), Agrammatic aphasia: A cross-language narrative sourcebook. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Caramazza, A. & Hillis, A. E. (1991). Lexical organization of nouns and verbs in the brain. Nature, 349, 788-790.
Chen, L. (n.d.). Verbal expansion in the history of Chinese. Manuscript, Department of Linguistics and Semiotics, Rice University (cited in Hopper, 1997).
Chen, S. & Bates, E. (1998). The dissociation between nouns and verbs in Broca’s and Wernicke’s aphasics: Findings from Chinese. Aphasiology, 12(1), 5-36.
Duncan, S. & Loehr, D. (in preparation). Contrastive discourse focus and gesture production.
Friedmann, N. (2001). Agrammatism and the psychological reality of the syntactic tree. Journal of Psycholinguistic Research, 30(1), 71-90.
Goodglass, H. (1993). Understanding aphasia. San Diego, CA: Academic Press.
Goodglass, H., Fodor, I., & Schulhoff, C. (1967). Prosodic factors in grammar: Evidence from aphasia. Journal of Speech and Hearing Research, 10, 5-20.
Goodglass, H. & Kaplan, E. (1972). The assessment of aphasia and related disorders. Philadelphia: Lea and Febiger.
Goodwin, C. (Ed.) (2003). Conversation and brain damage. New York: Oxford University Press.
Grodzinsky, Y. (2000). The neurology of syntax: Language use without Broca’s area. Behavioral and Brain Sciences, 23, 1-71.
Hopper, P. (1997). Discourse and the category ‘verb’ in English. Language and Communication, 17(2), 93-102.
Kean, M.-L. (1977). The linguistic interpretation of aphasic syndromes: Agrammatism in Broca’s aphasia, an example. Cognition, 5, 9-46.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The relationship between verbal and nonverbal communication. The Hague: Mouton.
Lesser, R. & Milroy, L. (1993). Linguistics and aphasia: Psycholinguistic and pragmatic aspects of intervention. London: Longman Group.
Loehr, D. (2004). Gesture and intonation. Unpublished doctoral dissertation, Georgetown University Linguistics Department.
McNeill, D. (1985). So you think gestures are non-verbal? Psychological Review, 92, 350-371.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Menn, L. & Obler, L. K. (Eds.) (1990). Agrammatic aphasia: A cross-language narrative sourcebook. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Miceli, G., Silveri, M. C., Villa, G., & Caramazza, A. (1984). On the basis of the agrammatic’s difficulty in producing main verbs. Cortex, 20, 207-220.
Nobe, S. (1996). Representational gestures, cognitive rhythms, and acoustic aspects of speech: A network/threshold model of gesture production. Unpublished doctoral dissertation, University of Chicago Psychology Department.
Paradis, M. (2001). Manifestations of aphasia symptoms in different languages. Oxford: Pergamon Press.
Pedelty, L. L. (1987). Gesture in aphasia. Unpublished doctoral dissertation, University of Chicago Department of Behavioral Sciences.
Schegloff, E. (1984). On some gestures’ relation to talk. In J. M. Atkinson & J. Heritage (Eds.), Structures of social action (pp. 266-295). Cambridge: Cambridge University Press.
Tao, H.-Y. (1996). Units in Mandarin conversation: Prosody, discourse, and grammar. Amsterdam/Philadelphia: John Benjamins.
Zingeser, L. B. & Berndt, R. S. (1990). Retrieval of nouns and verbs in agrammatism and anomia. Brain and Language, 39(1), 14-32.
The Construction of a Temporally Coherent Narrative by an Autistic Adolescent
Co-contributions of Speech, Enactment and Gesture1

Elena Levy
University of Connecticut
The central concern of this chapter, following David McNeill’s (1992) account of the ontogenesis of gesture, is the co-contribution of speech, enactment and gesture to the development of temporally coherent discourse. The analysis builds on an earlier study (Levy & Fowler, 2005) of an autistic adolescent’s reliance on social speech when repeatedly retelling a story. In this chapter I add to the earlier analysis the role of body movement, both enactment and gesture. Two types of examples illustrate the joint contribution, when building temporal sequences, of social speech and movement: the first traces the discourse history, in speech and movement, of temporally sequenced utterance pairs, and the second presents observations of the constitutive function of gestural catchments. Following McNeill, I argue that the contribution of motoric enactment comes from the actual motion of the gestures themselves, and that such findings have implications for language-gesture acquisition more broadly.
1. Introduction
In Hand and Mind David McNeill describes a young child who, while narrating the Tweety Bird cartoon, says, “he cl-climbed up,” and simultaneously leaves the chair he’s been sitting on to climb on a second one. McNeill (1992) interprets this as part of a developmental progression. He remarks:

An explanation for the…growth in use of symbols by children can be found in a principle to which virtually every developmental psychologist has subscribed. Piaget (1954), Werner (1961), Bruner (1966), and many others have recognized that objects and events in the world are, for little children, represented in the form of the actions that the child himself performs on the objects, or in the form of the actions the thing itself performs (simulated by the child)…The child doesn’t see himself presenting meanings so much as enacting them, and in this state he is what he is symbolizing… (McNeill, 1992:297-8).
1 I would like to thank Susan Duncan, Susan Goldin-Meadow, and Andrea Tartaro for their comments on earlier versions of this chapter.
In McNeill’s view, the example shows part of an ontogenetic process of semiotic change. Early gestures are actionlike; gesture space, for example, is “full-sized, it centers on the child and moves about with him, it has local orientation, and all parts of the body are equally privileged to move in it…” (McNeill, 1992:303). As gestures approach adult iconics later in childhood, gesture space is reduced, and for adults normally consists of a “flattened disk” in front of the speaker. Even as gestures change, however, they remain a radically different semiotic mode from speech, unlike the segmented, arbitrary and conventional categories of language: “[T]he gesture, the actual motion of the gesture itself, is a dimension of meaning. Such is possible if the gesture is the very image; not an ‘expression’ or ‘representation’ of it, but is it” (McNeill, 2005:98).

This chapter is concerned as well with the co-influence of gesture and speech on the development of meaning. The meaning I refer to is a temporally coherent view of events. My focus is not on ontogenesis as described above, but on an intermediate, mesodevelopmental span of time (Lewis, 2002), the unfolding of speech across minutes, hours and days. The examples come from a database of narratives collected using a method of repeated reproductions of stories (Bartlett, 1932), the first and last examples spanning several days. I have found that across this span of time one can often witness changes in narrative coherence (Levy, 2003). My goal is to propose an account of how, in narrative discourse, temporal coherence can be built from the joint contribution of movement and speech.

The proposal is illustrated with data from an autistic adolescent, “D,” who was asked to retell a story on three successive days, during a total of four interactive sessions. At first, D had great difficulty narrating the story, and his early attempts were mostly single or incomplete clauses, sometimes enacted. Later, he combined speech with enactment to produce longer, temporally accurate sequences of utterances; and finally he produced temporally coherent sequences of speech alone, at times accompanied by adult-like gestures. Levy and Fowler’s (2005) analysis of D’s spoken retellings suggests that, as with typically developing children, the emergence of some temporally coherent discourse relies on properties of social speech. This includes adult prompts as well as D’s own earlier discourse productions, looked at, as Tomasello (1992) has noted, “as if someone else were looking at it.” Here, I add to this analysis the role of motoric enactment. My central proposal is that the kinesthetic enactment of events can co-contribute with speech to the development of temporally coherent discourse. Following McNeill, I argue that the contribution of enactment comes from the actual motion of the gestures themselves.

2. Overview of Changes in D’s Speech-Movement Combinations
The examples are taken from D’s retellings of the classic children’s film, The Red Balloon. The story is about the friendship between a young boy and a red balloon that follows him as he goes about his daily activities. D watched the
movie and was asked to retell the story immediately after (Day 1A), and again on the evening of the same day (Day 1E). Two other sessions followed on the next two days (Days 2 and 3). D’s mother (M) was present during all of the sessions, and at times other adults as well: his father (F), a home health aide (H), and one of two undergraduate students (E and C). The instructions to D during the first session, Day 1A, appear at the start of example (1).

Levy and Fowler (2005) describe how, in D’s spoken discourse, multiple-scene narratives were built up over time, creating a temporally coherent linguistic structure that D added to, deleted from, and otherwise transformed. The focus of Levy and Fowler’s analysis was the scaffolding of temporal sequences by social speech—both adult prompts and D’s own earlier utterances. As D used social speech to construct temporal sequences, he made use of the same types of linguistic activities used by nonautistic, although younger, children, including a range of degrees of reproduction of his own and others’ speech, from exact repetition to elaborated reproductions (see examples from typically developing children in Levy, 1999 and Levy & Fowler, 2005). In the rest of this section, I add to these earlier observations an overview of changes in D’s entire movement-speech combinations. The overview provides a background for the analysis, in the remainder of the chapter, of D’s construction of temporal coherence through his own repeated reproductions; from, as McNeill proposes, the joint contribution of movement and speech.

2.1 Day 1A

Example (1)2 presents the first segment from the start of Day 1A, a long interactive session that spanned nearly 1000 utterances.
2 In this and all other transcriptions, adult speech is in italics.
(1) (.1) (D and E are on couch; D is lying back and fidgeting.) E: [unintelligible] What do you remember? I don’t know. You don’t know?
(.2) M: D, you’ve got to do better than that…
(.3) E: Think about it. No. What’s the movie about? Called the red balloon. And what happened in the red balloon? The boy…the kids…the kids were happy [unintelligible], and lived happily ever after.
(.4) M: Oh you’ve got to tell us more than that, D.
(.5) (D sits up.) E: …What do you remember from the beginning of the movie?... Can you narrate it? (E moves arms up and down in shrugging motion). Can you show me what happened? (E repeats shrugging motion). Can you tell me what happened? The boy fly-y-y-y (D jumps up from couch on word “fly,” flapping arms as if flying, jumping and moving forward).
(.6) M: Was that the beginning or the end or the middle? The end (D is standing still). M: And what happened at the end? The boy flew (remains standing still). E: The boy flew?
(.7) M: And how did he fly?... (D clasps both hands overhead with arms outstretched, as if holding onto a balloon.) Tight (whispered).
(.8) F: How did he get in the air? M: by what?...By balloons (lowering arms; starts to walk toward couch). M: Oh, by holding on tight by balloons, oh okay. That’s what the words were. (D sits on couch while M talks.)

The example shows that, in his earliest retellings, D was highly resistant to retelling the story, replying, for example, “I don’t know”3 in response to “what do you remember?” (1.1), “no” in response to “think about it” (1.3), and “why don’t you tell it and then I say after you” (not shown) to other adult prompts. Most of D’s body motion was, in parallel, diffuse and unfocused: He lay on the couch or sat and fidgeted, sometimes rocking back and forth or swaying. During most of this session D produced only single utterances, as illustrated by his four short productions in (1): “the boy fly-y-y-y” (1.5), “the boy flew” (1.6), “tight” (1.7), and “by balloons” (1.8). It appears that, from D’s point of view, these were extralinguistic references to the original film, connected to one another only through adult speech. They were elicited during a rare moment of concentrated activity, starting with D getting up from the couch and ending with his return to it (hereafter referred to as the ‘couch-to-couch’ sequence; from 1.5 to 1.8). As D talked he enacted parts of the film. The speech-enactment combinations, along with the adult speech, formed the first true narrative sequence, and this in turn formed the basis for the later retellings. Toward the end of the session D agreed to “think about” the movie until E returned on the next day:
(2) E: I want you to think really hard for tomorrow, okay? Okay. Cause I have to drive far to get up here, and I want you to tell me a whole story without trying to stop, okay? Okay. And I want you to think really really really hard, okay? And I’m going to come here tomorrow afternoon, alright? [unintelligible] afternoon. Yes, we can do it, okay. (Day 1A)
2.2 Day 1E

During the next session D interacted mostly with his home health aide, H, and his mother. Much of his activity was now energetic and focused, and he began to enact sequences of events as he had in the couch-to-couch segment on Day 1A. D’s intention to act out events was reflected in his own introductions to his enacted retellings as pretend games. For example, he began the first enactment as follows:

(3) Pretend we’re at the movie show (stands with right arm up while H rolls up his sleeve, “high fives” H, and puts right hand down). H: Okay. And then, and then we watch a movie (walks to right, points at H with left hand), okay (starts to sit down). Watch a movie [unintelligible] watch a movie (sitting down, points at H with left hand). (Day 1E)4

3 Throughout the text, I use quotation marks to represent expressions that actually occur in (that is, are tokens of) the participants’ speech. Underlining in the text represents what I take to be the speakers’ intended referents.
Most of D’s enacted speech during this session consisted of reproductions from the last session; in fact, all enacted events that referred to actual events in the film had been put into words in the earlier session, 1A. Just before the final retelling on Day 1E, D and his mother acknowledged that his retellings were facilitated by acting out:
(4) M: Can you tell it better if you move? Yes. (Day 1E)
2.3 Day 2

D’s body motion in this session was again different. For most of the session he sat still and concentrated, and by the end he described several scenes in accurate order, creating a single, temporally coherent account. This was produced without full-body enactment, although accompanied at times by adult-like gestures. D’s retellings were preceded by the following remarks to E, addressing her request from Day 1A (compare to (2) above):
(5) E: Okay… tell me from the beginning…to the end (two-handed gesture from left to right). I thinked about [it] in my head (sitting still). E: Okay, good. (Day 2)
2.4 Day 3

On Day 3 D began his retelling with enthusiasm:
(6) (D and C sitting in chairs, facing one another.) Can I tell you the Red Balloon? The story goes like this… (Day 3)
D again remained still and appeared to concentrate, producing a single coherent narrative (see Figure 1). This was an elaborated reproduction of his final retelling on Day 2 and, like the earlier one, was produced with speech alone or accompanied by gestures. Looking back to Day 1A, the connected narrative on Day 3 was very different from the original, individually elicited utterances that were linked to one another through adult speech. The utterances that comprised the final retelling now appeared to be linked across D’s own speech, that is, intralinguistically, thus creating segments of cohesive discourse.5

4 Later enactments on Day 1E were preceded by, “I’m gonna act it out (standing up to start enactment),” “I just want to pretend I’m the little boy (moving and starting enactment),” and, “ladies and gentlemen…boys and girls, welcome to this movie (sitting still).”
Speech (adult speech is in italics):
Can I tell you The Red Balloon? (C: OK)… The story goes like this… Once upon a time there was a boy climbing up the pole and the little boy cut the balloon and then the little boy came down the balloon… and then went to the school bus and then the school bus took him to school and the and the balloon followed him and and then the balloon followed him and back and forth back and forth all the way to school… and then he went to the mirror store, and then he went to the train store, then he went to the bakery and then the mirror store, he cannot find the balloon and then he- and then the balloon got lost in the mirror and then and then the bakery, he ate a donut and guess what he did? (C: what?) he cannot find that balloon, he lost it, he cannot find it (C: OK), he lost it and then the balloon- then what did the kids do? they stepped on it… they st- popped it and then all the balloons gathered around and they flew away. That’s the end of my story… (M: That’s a very good telling)

D’s Body Motion (descriptions of gestures refer to underlined speech)*:
Legs crossed, As resting on knees. LA up, moves down, IF extended. LA moves down to rest on chair. RA up, moves in downward curve from R to C space, IF extended. Moves head and RH, IF extended, side to side. Sitting still. Briefly lifts LA. BHs at waist level move out & back together. Lifts BAs up in flying motion; one motion up on “flew” & a 2nd further up on “away”. As move down.

*G=gesture; R=right; L=left; C=center; B=both; H=hand; A=arm; IF=index finger
Figure 1: Retelling on Day 3 (abridged).
In summary, across the four sessions of D’s retellings, his speech-movement combinations changed from a preponderance of elicited speech with diffuse body motion on Day 1A, to enactments of reproduced speech on Day 1E, to connected speech in the absence of enactment on Days 2 and 3, at times using adult-like gestures. As his speech-movement combinations changed, his usage shifted from a reliance on adult speech to a reliance on his own earlier discourse.

5 This interpretation—of cohesively linked utterances on Day 3—is supported by the following underlined speech error, an inappropriately used pronoun “he,” and its transformation to the fully appropriate full noun phrase, also underlined, “the balloon”: He cannot find the balloon (sitting still). And then he- and then the balloon got lost in the mirror. (Day 3) This change suggests that D was now taking into account the informational status of referents in his own earlier speech, and thus, at least in this instance, linking spoken utterances on a truly intralinguistic basis.

3. Temporal Coherence from Enactments of Elicited and Reproduced Speech
In this section I trace the discourse history of two passages in D’s later speech that illustrate the typical path of development outlined above. Both passages are taken from D’s narration on Day 3 (see Figure 1). My goal is to show how borrowed links from adult speech and the enactment of events in the original film may co-contribute to the development of temporally coherent discourse.

3.1 Passage 1
Once upon a time there was a boy climbing up the pole and the little boy cut the balloon (moves left arm up and down with index finger extended)
Figure 2: Passage 1, excerpted from narration on Day 3.
The first passage is a two-utterance sequence that describes events at the start of the film (see Figure 2; compare to the first shaded portion of Figure 1). The sequence originated on Day 1A in two predicates, climb up the pole and get the balloon, used by D’s mother in a single clause:
(7) M: The boy climbed the pole to get the balloon. (Day 1A)
Soon after, D’s mother prompted him to produce a similar sequence.
(8) (D sitting on couch with E.) M: You told us that the boy found a pole and he got the balloon by doing what to the pole? How did he get the balloon? He climbed up the pole (sits up and yawns). M: He climbed up the pole so he could what the balloon?...So he can get the balloon. M: So he could get the balloon, that’s right. (Day 1A)
Both elicited utterances in (8) were elaborated reproductions of the mother’s speech: her immediately preceding prompts (in 8) and her earlier speech in (7). D’s first spontaneous use of the predicate pair occurred in the next session, 1E, now accompanied by full-body enactment:

(9) The little boy climbed up (moving arms up and down as if climbing up a pole), got the balloon (turning in a circle with arms up, as if holding a balloon over head). (Day 1E)

The order of utterances paralleled their order in the mother’s speech in (7), the order in which they were elicited in (8), and the order of events in the original
film. D then linked the two utterances four times on Day 2, each a close reproduction of earlier speech, and now in the absence both of specific adult prompts and of full-body enactment (see Figure 3). The utterance pairs reproduced the temporal sequence of the earlier, elicited and enacted utterances. Thus, utterances that were first elicited as individual utterances and linked through adult speech were later linked through speech and enactment, and in the last retellings through speech alone, or with adult-like gestures.

Day 2, first: the little boy climbed the pole / and then the little boy cut the balloon
Day 2, second: the little boy went up to the pole / and then the little boy cut the balloon
Day 2, third: once upon a time there was a boy / and then he went to… to the pole and then he got the balloon
Day 2, fourth: once upon a time the little boy climbed up to the pole / and he [cut/got] the balloon
Figure 3: D’s repeated reproductions of the predicate pair climb (up) the pole-cut/get the balloon on Day 2 (D is sitting on bed).
3.2 Passage 2
The second passage has a similar discourse history (see Figure 4; compare to the second shaded portion of Figure 1).

and then he went to the mirror store, and then he went to the train store, then he went to the bakery
And then the mirror store, he cannot find the balloon. And then he- and then the balloon got lost in the mirror
And then and then the bakery he ate a donut. He cannot find that balloon…he cannot find it, he lost it
Figure 4: Passage 2, excerpted from narration on Day 3.
As in Passage 1, this example originated in social speech (see 10), and elicited utterances were then rehearsed with enactment (see 11), with the speech accompanying the enactments in the first person.

(10) M: Can you tell me one place he went? To the mirror store…to the mirror store (sitting, looking down, moving leg)… M: And what happened at the mirror store, D? Then he lost it… M: And what happened after he lost the balloon?...He looked for it. (Day 1A)

(11) (D has been out of room and walks back in.) When I get tired I went to the movies- I went to the mirror store (remains relatively still). Here’s the mirror store (walks outside room). I looked around (not visible; still outside room). (Day 1E)
On Day 2 the same linguistic pattern (went to the mirror store/looked around) was reproduced and elaborated on, this time returning to the third person. The elaborated reproductions, carried out without full-body enactment, created a longer and more informative sequence.

(12) (.1) (Sitting, rocking in circular motion.) And then he went to the mirror store, and then he went to the- then looked around [a little], and then he saw- then he looked for the balloon.
(.2) Then he went to the train store, and then he looked over there.
(.3) And then he went to the bakery, and then he looked over there. (Day 2)

From Day 1E (11) to Day 2 (12), D transformed his earlier speech, adding, deleting and substituting words, and even rearranging whole clauses. That is, the first two utterances in (12.1), “he went to the mirror store” and “he…looked around,” were near-exact reproductions of utterances enacted in (11). This pattern was then reproduced twice, in (12.2) and (12.3), each time with linguistic substitutions: “the train store” and “the bakery” for “the mirror store,” and “looked over there” for “looked around.” Day 3 (see Figure 4, reproduced in 13) continued the same pattern of linked predicates, went to ___ and looked ___, with further linguistic elaborations.

(13) (.1) (Sitting still in chair, facing C.) And then he went to the mirror store, and then he went to the train store, then he went to the bakery.
(.2) And then the mirror store, he cannot find the balloon. And then he- and then the balloon got lost in the mirror.
(.3) And then and then the bakery he ate a donut. He cannot find that balloon…he cannot find it, he lost it. (Day 3)

D began this last segment with a list, “and then he went to the mirror store, and then he went to the train store, then he went to the bakery” (13.1), created by reproducing the first part of each utterance pattern from Day 2 (12.1-.3). The utterances that followed (13.2-.3) elaborated on this. In (13.2), a predicate from Day 2, “looked around…looked for the balloon” (12.1), was transformed to a related but new predicate, “cannot find the balloon.” The new utterance presupposed the earlier one, its production requiring a small inferential leap. Also in (13.2), the utterance, “the balloon got lost in the mirror,” was an elaborated reproduction of speech borrowed from D’s own response to his mother’s prompts on Day 1A, “he lost it” in (10). In (13.3) D referred back to the bakery, and for the first time added new information about it, “he ate a donut.” He then reproduced the same two verbs from (13.2), cannot find and lost. This example adds the following to the developmental progression described above. It shows a use to which temporally linked utterances—earlier elicited, reproduced and enacted—can be put. D ‘cuts and pastes’ his own utterances to
create new ones, and adds newly propositionalized information to already rehearsed sequences. These are processes that have been observed to occur across the repeated retellings of typically developing children as well, and may contribute to how children infer the complex mental states of others, inferring cause and effect relationships between other people’s motivations and their actions (Levy, 2003; Levy & Fowler, 2005). That is, some instances of logically coherent discourse may depend on the earlier production of temporally coherent sequences of utterances, available as a basis for the drawing of inferences.
3.3 Summary and discussion
The discourse history of Passages 1 and 2 shows how temporal coherence can be built in incremental steps. Both passages originated in social speech, elicited as separate utterances and linked through adult intervention. The elicited utterances were later reproduced and enacted in temporal sequence, the order of enactments and their accompanying speech the same as the order of events in the original film and the order in which they were elicited and/or produced by adults. When reproduced again, the utterances were linked in speech alone, or with adult-like gestures. The second example shows how linguistic sequences can provide a foundation for further linguistic change—through rearrangements of earlier speech and by adding new utterances to ‘old’, temporally coherent structures, the new information derived either from inference or from the original, visually perceived events (Levy, 2003; Levy & Fowler, 2005). Corresponding to changes in speech-movement combinations was a shift in discourse meaning. At first, events were described as single events, and later as parts of larger, temporally connected units, their meaning now including temporal relationships among reported events. I suggest that D’s appropriation of temporal sequencing relied on the joint contribution of movement and speech. This was possible because, to paraphrase the quotes from McNeill (1992) above, when D enacted events, his actual movements were themselves a dimension of meaning; he was what he was depicting, and his movements were motivated, as Vygotsky (1987) put it, by “bonds actually existing” in the original film. Children who develop more typically have been observed to make use of each process used by D to appropriate the meaning of observed events. This includes kinesthetic enactment (McNeill, 1992); the “borrowing” of entire discourse patterns, including temporal sequencing, from adult speech (Nelson, 1989; 2007; Levy & Nelson, 1994); the “cutting and pasting” of earlier speech (Levy, 2003); and the addition of new information to temporally linked linguistic structures (Levy & Fowler, 2005). My point is that the developmental path observed in D’s speech may form part of an ontogenetic progression for more typically developing children as well. This suggests that entire speech-movement combinations may need to be considered in accounts of the development of coherent discourse more broadly, and in this respect as well, “[w]e should speak
not of language acquisition, but of language-gesture acquisition” (McNeill, 1992:295).
4. Catchments and Temporal Coherence
If, in the speech of children acquiring fundamental discourse skills, kinesthetic enactment is used to build temporal coherence, what of this function in the discourse of older children and adults? Does the motor system continue to supply a channel for keeping track of events in sequence? McNeill’s research suggests that some adult gestures serve as cohesive devices, the cohesive function displayed in catchments, sequences of two or more gestures with features—such as handedness, shape and movement—that recur in part or in full (McNeill, 2005:117): “[W]hen working backwards, the recurring features offer clues to the cohesive linkages in the text with which [the catchment] co-occurs” (McNeill et al., 2001). From a different perspective, of narrators who “work forward” through a discourse, the question concerns the use of gestural features to help build coherence, not only to mark it; in other words, gestures serving a constitutive function. Susan Goldin-Meadow (2003) proposes that, more generally, gestures can serve a constitutive function when organizing information in speech. With respect to living space descriptions, McNeill et al. (2001) argue that gestural catchments can “lead the way.” What I am suggesting is an extension of these views: that similar processes may hold in the organization of temporal information in narratives. That is, some gestural catchments may help speakers create temporally coherent discourse, supplying a second channel that works together with speech to keep track of narrative events in sequence. A final example from D (in Figure 5a) may provide insight into the development of this proposed function of gestures. In the analysis of this example I return to McNeill’s (1992) account of the ontogenesis of gesture: from early movements that, full-sized, “lack full separation from action,” to gestures that take place in the flattened disk in front of the speaker, and bear a more arbitrary relationship to what they denote. The example in Figure 5a consists of three catchments that compose most of the retellings of the last scene in the original film (hereafter, the ‘gathering’, ‘flying’, and ‘holding’ catchments). Across retellings, two of these—the ‘gathering’ and ‘flying’ catchments—were reduced in size, each illustrating a different part of the ontogenetic progression proposed by McNeill (1992).
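Though the analysis here is qualitative, the definition of a catchment (two or more gestures sharing recurring features of handedness, shape and movement) can be given a concrete, purely illustrative computational statement. In the sketch below the feature names, the coding of the gestures, and the overlap threshold are all invented for the example; this is not the coding procedure used in this chapter.

```python
# A minimal, illustrative sketch of catchment detection: gestures are coded
# as feature sets (handedness, shape, movement), and a catchment is a group
# of two or more gestures whose feature sets recur in part or in full.
# Feature names and thresholds are invented, not an actual coding scheme.

gestures = [
    {"id": 1, "hand": "both", "shape": "open", "movement": "circle"},
    {"id": 2, "hand": "right", "shape": "point", "movement": "arc"},
    {"id": 3, "hand": "both", "shape": "open", "movement": "circle"},
    {"id": 4, "hand": "both", "shape": "open", "movement": "spiral"},
]

def catchments(gestures, min_shared=2, min_size=2):
    """Group gestures whose feature sets overlap in at least min_shared features."""
    groups = []
    for g in gestures:
        feats = {(k, v) for k, v in g.items() if k != "id"}
        for group in groups:
            if len(feats & group["feats"]) >= min_shared:
                group["members"].append(g["id"])
                group["feats"] &= feats  # keep only the recurring features
                break
        else:
            groups.append({"feats": feats, "members": [g["id"]]})
    return [g for g in groups if len(g["members"]) >= min_size]

for c in catchments(gestures):
    print(sorted(c["feats"]), c["members"])
# -> [('hand', 'both'), ('shape', 'open')] [1, 3, 4]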
4.1 Reduction of ‘gathering’ catchment
The ‘gathering’ catchment consisted of six gestures, including one false start, spread out in time from Days 1A to 3. All gestures accompanied the same spoken utterance, “all the balloons gathered around” (the false start accompanied
“and then the- all the balloons-”), depicting event (6) in Figure 5b (see note 6), each gesture apparently produced from the perspective of the balloons. As the catchment changed across retellings, gesture space was reduced from full-sized to the flattened disk characteristic of the gestures of adults. In the first instance (column 2 of Figure 5a and example 14.2 below), D, while sitting down, rocked in a circular motion, as if enacting balloons moving in a circle. Example (14) shows that this may have been triggered by the preceding speech-movement combination, a less pronounced circling motion that accompanied the last preceding word, “everywhere” (14.1)—and receded afterwards (14.3):
(14) (.1) …and followed him everywhere (starts to rock in a circular motion on “everywhere”). (.2) And then all the balloons gathered around (rocks in more pronounced motion). (.3) And then once a- and then that’s the end (rocks in less pronounced motion). (Day 1A)

The second instance of the catchment (column 3) carried more information than the first; it depicted a relationship between two sets of characters, the balloons, depicted by D’s arms, and the boy, depicted by D himself. Again sitting down, D stretched his right arm out and then down in a semicircular motion, to meet his left arm, which moved slightly up. This took place well outside of typical gesture space, in the ‘extreme right periphery’ identified by McNeill (1992:89). After this, with the exception of the false start on Day 3 (column 6), D’s arm movements became progressively smaller and less actionlike (columns 4, 7 and 8). The third instance (column 4) was similar to the second, but reduced in size: While sitting, D spread both arms apart and up, his left hand sweeping in, as if to indicate the balloons gathering around the boy. This now took place with both elbows bent, and thus remained in the ‘periphery’ (but not ‘extreme periphery’) of gesture space. Then, after the false start, the fifth instance (column 7) was again reduced in size, consisting of the left hand spiraling down from ‘upper left center’ to ‘lower right periphery’ and then ‘lower center periphery’. The gesture maintained the feature of circling, but now bore a more arbitrary relationship to what it depicted. The last instance of the catchment (column 8) was again more arbitrary. It was a further reduction of the second instance (column 3), with both hands moving out and back at waist level, creating a stylized depiction of the event. It
6. The list of events in Figure 5b is taken from a larger master list of events in the original film. See Passonneau, Goodkind & Levy (in press) for a description of guidelines for developing the master list.
Column 1 (Day 1A): The boy fly (flying catchment1); The boy flew tight…by what? by balloons (holding catchment1)
Column 2 (Day 1A): and then all the balloons gathered around (gathering catchment1)
Column 3 (Day 1A): and then the balloon- all the balloons gathered around (gathering catchment2)
Column 4 (Day 1E): all the balloons gathered around (gathering catchment3); and the boy hanged from the balloons (holding catchment2)
Column 5 (Day 1E): and then he floated (starts to form holding catchment3); he hanged on tight and then say wheee (holding catchment3); (flying catchment2)
Column 6 (Day 2): and then the- all the balloons- (gathering catchment4); then the kid flay- flied away (flying catchment3)
Column 7 (Day 2): all the balloons gathered around (gathering catchment5); and then the little kid flied away (flying catchment4)
Column 8 (Day 3): and then all the balloons gathered around (gathering catchment6); and they flew away (flying catchment5)
Figure 5a: Catchments that depict the final scene of the original film: ‘flying’, ‘holding’ and ‘gathering’ catchments.

1) Wind carries balloons away from twins, baby, boy in park, concession stand etc.
2) Balloons fly out of windows and doors
3) Balloons gather and fly over roofs in city
4) Boy sits on top of hill with his popped balloon in hand
5) He looks up to see balloons
6) All the balloons descend to boy
7) Boy collects strings of all the balloons
8) Balloons lift boy into the air
9) Boy flies above city and further off into the sky
Figure 5b: Description of final scene in original film
now took place in the flattened disk, the location of most adult gestures. Thus, looking from the first three instances of the catchment to the last two, the movements that composed the latter group were less highly motivated by the original, perceived events, and they became, as McNeill (1992:303) put it, increasingly less “robust.”
4.2 Reduction of ‘flying’ catchment
Changes in the ‘flying’ catchment illustrate a different part of the ontogenetic progression outlined by McNeill (1992): from full-body enactment to movement of the hands and arms alone. The catchment consisted of a series of five movements depicting the last event, (9), in Figure 5b, the movements again reduced in size and space across retellings. The first instance of the catchment (column 1 of Figure 5a) was produced as part of the couch-to-couch sequence. Example (1.5) shows that D jumped up from the couch, moved forward, and flapped his arms while saying, “fly,” the vowel extended in time. Similar to the ‘gathering’ catchment, this first instance as well may have originated in an immediately preceding speech-movement combination; here, in the speech and gestures of D’s interlocutor, E. E’s questions, “Can you narrate it? Can you show me what happened? Can you tell me what happened?” were accompanied by successive shrugs that brought her arms up and down twice. The similarity between the two movements—E’s shrugging gestures and D’s flying movements—suggests that D’s speech-movement combination was triggered by the speech-movement combination presented to him by E. The second instance of the ‘flying’ catchment was again produced while D was on his feet (column 5), jumping twice while moving forward, and moving his arms as if flying. The last three instances (columns 6, 7 and 8) were less actionlike; all produced while sitting down, with both hands moving up and out, as if to depict flying. The last (column 8) was more stylized than the instances that preceded it: D lifted both arms up in a flying motion, now divided into two parts, one movement up with the word “flew,” and the second out on “away.” Similar to the ‘gathering’ catchment, the gesture now bore a relatively more arbitrary relationship to what it depicted.
4.3 Rearrangement of Temporal Sequences
Looking at the discourse history of the final scene, the pattern of speech and movement suggests that, as in the history of Passages 1 and 2 (see Figures 2 and 4), D used his own enactments to build temporal coherence. In Passages 1 and 2 temporal sequences were put together via adult speech. In the development of the final scene, I suggest, D played a more active role, making use of enactment to build temporal sequences without immediate adult intervention.
The following example shows how, in this interpretation, D used enactment to reorganize a sequence that was originally elicited in reverse temporal order (see description of couch-to-couch sequence on Day 1A). The original sequence consisted of two speech-enactment combinations (see 15 below and column 1 of Figure 5a): the first instances of the ‘flying’ and ‘holding’ catchments.

(15) (.1) The boy fly-y-y-y (‘flying’ catchment1)…the boy flew… (.2) Tight…by balloons (‘holding’ catchment1) (recast by M as “oh, by holding on tight by balloons”). (Day 1A)

The origin of (15.1) is described above (see section 4.2). (15.2) originated in further promptings by adults, for example, “how did he fly?” (see example 1.7). D responded by first clasping his hands over his head with his arms outstretched, as if holding onto the string of a balloon, and then whispering the single word, “tight.” Asked “by what,” he put his arms down, said “by balloons,” and walked to the couch and sat down. The entire multi-person speech-movement sequence was then recast by his mother in speech, “oh by holding on tight by balloons, oh okay, that’s what the words were” (example 1.8). On Day 1E this utterance sequence was reversed to create the temporally accurate sequence in (16.2) – (16.3). (See column 5 of Figure 5a.)

(16) (.1) He floated (start of ‘holding’ catchment3). (.2) He hanged on tight (‘holding’ catchment3)… (.3) [pause] (‘flying’ catchment2)… (Day 1E)

First, D reproduced the same sequence of spoken utterances as on Day 1A, with “he floated” (16.1) substituted for “the boy flew” (15.1), and “he hanged on tight” (16.2) an elaborated reproduction of his mother’s earlier recast (15.2). Utterances (15.2) and (16.2) were each accompanied by instances of the ‘holding’ catchment. Then, in the absence of speech, D produced an instance of the ‘flying’ catchment (16.3). This, with its preceding utterance (16.2), formed an accurate temporal sequence. The point of this example is the following. If, in D’s enactments, he was what he was depicting (McNeill, 1992), the enactment that accompanied the utterance in (16.2), depicting event (7) in Figure 5b, may have helped trigger the depiction of the later event, (9), in the relative order in which the two were perceived in the original film. The second event was manifested in discourse (16.3) as the already rehearsed flying movement. In this interpretation, as D spoke on Day 1E he worked out the sequence of events through a combination of speech and movement, both of his productions drawing on his earlier speech-movement combinations. A similar argument can be made for the next instantiation of the ‘flying’ catchment on Day 2 (column 6): that the speech-movement combination, “then the kid flay- flied away,” with ‘flying’ catchment3, was triggered by the earlier
instantiation of this catchment (in column 5). A second speech-movement combination in column 5, “he hanged on tight and then say wheee,” with ‘holding’ catchment3, was dropped, and the multimodal utterance that preceded this catchment’s earlier instantiation (in column 4), “all the balloons gathered around,” with ‘gathering’ catchment3, was substituted for it, creating a ‘gathering’-‘flying’ sequence. One can argue that, of the three events depicted by the catchments that composed the final scene, the two that remained—the ‘gathering’ and ‘flying’ catchments—were the most effective at “pushing the communication forward” (Firbas, 1971). As column 8 of Figure 5a shows, these same two events—but not the less significant holding event—reappeared in the final retelling on Day 3. Thus, like the proposals of Goldin-Meadow (2003) and McNeill et al. (2001) concerning the constitutive function of gestures in other types of discourse contexts, the history of catchments in D’s retellings illustrates a proposed constitutive function of enactment in the choice of temporal sequences in narratives.
5. Gestures as Residues of Earlier Enactments
To conclude, this analysis suggests the following. The coherence-creating function of some adult gestures, such as the catchments described by McNeill, may have originated in ontogenetically earlier enactment. In this account, some gestures are residues of earlier enactments, the coherence-creating function of enactment appropriated by hand gestures alone, which as such help older children and adults keep track of spatial and temporal relationships among events. This brings up the question, more generally, of the relationship between ontogenesis and development across shorter spans of time. Perhaps earlier-developing processes, in modified form, continue to be used across shorter time spans, as in meso- and even microgenesis. From this point of view, what we are perhaps witnessing in D’s speech is the appropriation by gestures of a function served in earlier ontogenesis by full-body enactment, the motor system still used to keep track of the sequencing of events. The appropriation of function may occur first within individual ‘islands of development’, as Tomasello (1992) proposed for other domains of language acquisition. In this view, adult-like relationships between form and function are first acquired for individual tokens of a linguistic type, as in the proposed transfer of function from enactment to gesture across individual catchments. The generalization of function from specific instances to other related forms—in this case the appropriation of a coherence-creating function by other gestures with recurring features—would await later use and development.7 Adult uses of gestures as coherence-creating devices, as in the examples of gestural catchments from McNeill, may show the results of this process. Then, as McNeill (1992) put it, the “actual motion of the gesture itself” remains a dimension of meaning. This
7. Compare with the use, in crib talk, of ‘islands of development’ in a two-year-old’s acquisition of the cohesive function of spoken utterances (Levy, 1999).
is because the gesture remains “the very image; not an ‘expression’ or ‘representation’ of it”—it “is it.”
References

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.
Bruner, J. (1966). On cognitive growth, I and II. In J. Bruner, R. Olver, & P. M. Greenfield (Eds.), Studies in cognitive growth. New York: Wiley & Sons.
Firbas, J. (1971). On the concept of communicative dynamism in the theory of functional sentence perspective. Philologica Pragensia, 8, 135-144.
Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Cambridge, MA: Harvard University Press.
Levy, E. T. (1999). A social-pragmatic account of the development of planned discourse. Human Development, 42, 225-246.
Levy, E. T. (2003). The roots of coherence in discourse. Human Development, 46, 169-188.
Levy, E. T. & Fowler, C. A. (2004-2005). How autistic children may use narrative discourse to scaffold coherent interpretations of events: A case study. Imagination, Cognition and Personality, 24(3), 207-244.
Levy, E. T. & Nelson, K. (1994). Words in discourse: A dialectical approach to the acquisition of meaning and use. Journal of Child Language, 21, 367-389.
Lewis, M. D. (2002). Interacting time scales in personality (and cognitive) development: Intentions, emotions, and emergent forms. In N. Granott & J. Parziale (Eds.), Microdevelopment: Transition processes in development and learning (pp. 183-212). Cambridge: Cambridge University Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
McNeill, D., Quek, F., McCullough, K.-E., Duncan, S., Furuyama, N., Bryll, R., Ma, X.-F., & Ansari, R. (2001). Catchments, prosody and discourse. Gesture, 1(1), 9-33.
Nelson, K. (1989). Representation of real-life experience. In K. Nelson (Ed.), Narratives from the crib. Cambridge, MA: Harvard University Press.
Nelson, K. (2007). Young minds in social worlds: Experience, meaning, and memory. Cambridge, MA: Harvard University Press.
Passonneau, R. J., Goodkind, A., & Levy, E. T. (in press). Annotation of children’s oral narrations: Modeling emergent narrative skills for computational applications. To appear in: Proceedings of the FLAIRS-20 Conference, Florida Artificial Intelligence Research Society, Special Track on Applied Natural Language Processing. Key West, Florida, May 7-9, 2007. AAAI Press.
Piaget, J. (1954). The construction of reality in the child (M. Cook, trans.). New York: Basic Books.
Tomasello, M. (1992). First verbs: A case study of early language development. Cambridge: Cambridge University Press.
Vygotsky, L. (1987). Problems of general psychology. New York: Plenum.
Werner, H. (1961). Comparative psychology of mental development. New York: Science Editions.
The Body in Communication
Lessons from the Near-Human1

Justine Cassell
Northwestern University
Embodied Conversational Agents (ECAs) are simulations of human communicative behavior, as well as human-computer interfaces. In this chapter I discuss the ways in which theories of the relationship between the verbal and nonverbal modalities (often those from David McNeill and his students) have influenced the development of ECAs and, vice versa, the ways in which ECAs have contributed to the development of theories of the relationship between language, gesture, eye gaze, posture, and other nonverbal modalities in communication.
The story of the automaton had struck deep root into their souls and, in fact, a pernicious mistrust of human figures in general had begun to creep in. Many lovers, to be quite convinced that they were not enamoured of wooden dolls, would request their mistresses to sing and dance a little out of time, to embroider and knit, and play with their lapdogs, while listening to reading, etc., and, above all, not merely to listen, but also sometimes to talk, in such a manner as presupposed actual thought and feeling. (Hoffmann, 1817/1844:110)
1. Introduction
It's the summer of 2005 and I'm teaching a group of linguists in a small Edinburgh classroom, as a part of the European Summer School in Logic, Language, and Information. The lesson consists of watching intently the conversational skills of a life-size virtual human projected on the screen at the front of the room. Most of the participants come from formal semantics; they are used to describing human language in terms of logical formulae, and usually see
1. The research reported in this chapter would not have been possible without David McNeill’s intellectual guidance and generous mentorship when I was a graduate student, and then without the hard work, persistence and insight of my own graduate students, who have been so generous with their time and knowledge that they have quickly become my colleagues and teachers. Sincere thanks go to the members of Animated Conversation, Gesture and Narrative Language, and the ArticuLab. Thanks also to Ken Alder, Pablo Boczkowski, Sid Horton, Elena Levy, Jessica Riskin, Dan Schwartz, Lucy Suchman, and Matthew Stone, for careful and perceptive comments that greatly improved the quality of the manuscript. An earlier version of this chapter appeared in Riskin, J. (ed.), Genesis Redux. Chicago: University of Chicago Press.
language as an expression of a person's intentions to communicate, issuing directly out of that one person's mouth. I, on the other hand, come from a tradition that sees language as a genre of social practice, or interpersonal action, emergent and multiply-determined by social, personal, historical, and moment-to-moment linguistic contexts, and I am as likely to see language expressed by a person's hands and eyes as mouth and pen. As David McNeill’s graduate student pursuing a dual Ph.D. in linguistics and psychology in the 1980s, I had felt blessed in many things, but profoundly inadequate in the presence of these scholars: their formalized theories belong to a particular kind of technical discourse that is constructed in opposition to every-day language (Agre, 1992) and that had seemed more scientific than my messy relational and embodied understanding of how language looks. Those feelings of inadequacy—paired with the real-life experience of having articles rejected from mainstream journals and conferences—led me to try to formalize or ‘scientify’ my work, undertaking a collaboration with computer scientists in 1993 to build a computational simulation of my hypotheses, a simulation that took the form of virtual humans who act off of a ‘grammar’ of rules about human communication. In turn, that simulation has, in the manner of all iconic representations, turned out to both reveal and obscure my original goals, depending on what the technical features of the model can and cannot handle. And the simulation has, like many scientific instruments, taken on a life of its own—almost literally in this instance—as the virtual human has come to be a playmate for children, a teaching device for soldiers, and a companion on cell phones, a mode of interacting with computers as well as a simulation that runs on computers. But back to the classroom in Edinburgh. In the intervening fifteen years since graduate school, I have armed myself with a sexy demo to show other scientists and, thanks in large part to David McNeill’s continued insights into the nature of gesture and its relationship to language, times have changed and the notion that language is embodied is somewhat more accepted in linguistics today. And so these formal linguists have chosen to attend the summer school class on
‘face-to-face pragmatics’ that I am co-teaching. In the conversation today I’m trying to convince them of two points: that linguists should study videotapes and not just audiotapes, and that we can learn something important about human language by studying embodied conversational agents—fake humans who are capable of carrying on a (very limited) conversation with real humans—such as the one we call NUMACK, who is depicted in Figure 1.

Figure 1: NUMACK, the Northwestern University Multimodal Autonomous Conversational Kiosk.

I show the students a brand new video of NUMACK (the Northwestern University Multimodal Autonomous Conversational Kiosk) interacting with a real human, a simulation of our very latest work on the relationship between gesture and language during direction-giving. On the basis of an examination of 10 people giving directions to a particular place across campus, my students and I have tried to extract generalities at a fine enough level of detail to be able to understand what the humans are doing, and to use that understanding to program our virtual humans to give directions in the same way as humans do. The work exemplified in this particular video has concentrated on the shape of the people's hands as they give directions, and on what kind of information they choose to give in speech and what kind in gesture. I'm excited to share this work, which has taken over a year to complete—moment-by-moment investigations into the minutiae of human gesture and language extracted from endless examinations of videotapes that show four views of a conversation (as shown in Figure 2), followed by complicated and novel implementations of a computer system that can behave in the same way. In fact, this will be the first time I see the newly updated system myself, as I’ve been traveling and my graduate students finished up the programming and filmed the demo.
Figure 2: Analysis of videotapes allows us to draw generalizations about human-human direction-giving.
The Edinburgh linguists and I together watch the video of NUMACK giving directions to a person and it looks terrible! The small group of students tries to look down so as not to reveal that they don't think this is a fitting culmination of one year of work. I break the silence and say, “it looks ridiculous! Something is really off here. What is wrong with it? Can anybody help me figure out why it looks so non-human?” The students look surprised—after all, NUMACK looks non-human along hundreds of dimensions (starting with the fact that it is purple). Used as they are to seeing impeccably animated characters in movies and on webpages, they have expected to hurt my feelings by criticizing the virtual human’s poor rendering of reality. But, as we watch the video over and over again, what becomes salient is the way in which NUMACK’s interaction violates our intuitions about how direction-giving should look. After 3 or 4 viewings one of them suggests that the two hands of the computer-programmed agent operate independently in giving directions. The virtual human says, “take a right” and gestures with his right hand. He then says, “Take a left” and gestures with his left hand. I've never thought about this before, but in looking at the robot I am struck by the fact that we humans don't do that—we must have some kind of cohesion in our gestures that makes us use the same hand over and over for the same set of directions. Another student points out that the virtual human describes the entire route (roughly fourteen “turn left,” “turn right,” “go straight ahead” kinds of segments) at once, with only an “uh huh” on the part of the real human—no real human would do that—the directions are too long and couldn’t possibly be remembered in their entirety. I am thrilled and once again amazed at how much I learn about human behavior when I try to recreate it—in particular when, and because, my imitations are partial and imperfect. Only when I try to reproduce the processes in the individual that go into making embodied language do I get such a clear picture of what I don't yet know. For example, here I have realized that we will need to go back to our ten real human direction-givers and look at their choice of hands—can I draw any generalizations about the contexts in which they use their right hand or their left? When is the same hand used over and over, and when do they switch to a different hand? Likewise, we will need to go back to our real human direction-givers and look further at the emergent properties of the directions. What behaviors signal to the direction-giver when to pause and when to continue, when to elaborate and when to repeat? What embodied and verbal actions serve to alert the two participants that the message has been taken up and understood, and that the next part of the message can be conveyed? I am also struck once again at the extent to which people are willing to engage with the virtual human, both as participants in a conversation about how to get to the campus chapel, and as participants in a conversation about the holes in our theory of the relationship between verbal and nonverbal elements in conversation. I have learned something about the particularities of human communication here despite the fact that what I am viewing is a freak of artificial nature—a virtual human that is both generic and very particular, general and very detailed.
In fact, for the experiment to work, we depend in part on the not-so-laudable schemas and expectations of our viewers and ourselves—that there can be such a thing as the unmarked or generic human, which probably entails, for a direction-giving robot, that it is male and humanoid (albeit purple) and that its voice is Caucasian and American. As Nass & Brave (2005) point out, violating cultural assumptions about expertise and gender or race produces distrust on the part of users. But, in the current case, these largely unconscious assumptions on the part of the scientists examining the simulation are what allow them to identify as failings, not a lack of personality or cultural identity in the virtual human, but simply that the hands are not synchronized. And thus, in this simulation, I have learned something about human communication despite all of the ways in which this virtual human is not very human at all. Below I will return to question these assumptions, but for the moment let us return to the fundamental questions that guide this work. AI investigators and their acolytes, like automata makers before them, ask, “Can we make a mechanical human?” (or, in the weaker version, “a human-like machine”). I would rather ask, “What can we learn about humans when we make a machine that evokes humanness in us—a machine that acts human enough that we respond to it as we respond to another human?” (where I mean “respond” both in our status of interlocutor and of scientist). Some researchers are interested in stretching the limits and capabilities of the machine, or interested in stretching the limits of what we consider human by building increasingly human machines. In my own work, at the end of the day I’m less interested in the properties of machines than in the properties of humans. For me there are two kinds of “ah ha!” moments: to learn from my successes by watching a person turn to one of my virtual humans and unconsciously nod and carry on a conversation replete with gestures and intent eye gaze. And to learn from my failures by watching the ways in which the real human is uncomfortable in the interaction, or the interaction looks wrong, as I illustrated in the Edinburgh classroom. These simulations serve as sufficiency proofs for partial theories of human behavior—what Keller has described as the second historical stage in the use of simulation and computer modeling (Keller, 2003)—and thus my goal is to build a virtual human to whom people can’t stop themselves from reacting in human-like ways, about whom people can’t prevent themselves from applying native speaker intuitions. And key to the enterprise is the fact that those theories of human behavior and those native speaker intuitions refer to the whole body, as it enacts conversations with other bodies in the physical world. In the remainder of this chapter I’m going to talk about my work on one particular kind of virtual human called an Embodied Conversational Agent (ECA), in terms of its duality as a simulation and as an interface. That is, I will describe how these virtual humans have allowed me to test hypotheses about human conversation, and what they have taught me by their flaws. In this way, I
hope to illuminate the kinds of conversations that these virtual humans engage in when scientists use them as tools to study conversational phenomena.
2. Embodied Conversational Agents as Conversational Simulations
Just to be clear about our terms, Embodied Conversational Agents (ECAs) are cartoon-like, often life-size, depictions of virtual humans that are projected on a screen. They have bodies that look more-or-less human-like, they are capable of initiating and responding in (very limited) conversations (in pre-set domains) with real humans, and they have agency in the sense that they behave autonomously, in the moment of their deployment, without anybody pulling the strings. Of course, this agency relies on a prior pre-set network of interactions between their inventors, their users, and the sociotechnical context of their deployment. As a point of contrast, consider chat bots or chatterbots. Chat bots (such as the popular Alice, http://www.alicebot.org/, which readers can try out for themselves) rely on a mixture of matching input sentences to templates, stock responses and conversational tricks (such as “what makes you say X” [where X is what the user typed in] or “I would need a more complicated algorithm to answer that question” when they don’t understand). Chat bots often communicate with viewers only through text, but when embodied, they generally have only a head, and a head that displays only the most rudimentary of behaviors (blinking, looking left and right). Embodied Conversational Agents, on the other hand, are by definition models of human behavior, which means that at least along some dimension they must function in the same way humans do. Thus, Wang et al.’s (Wang, Johnson, Rizzo, Shaw & Mayer, 2005) pedagogical agent and Walker et al.’s (Walker, Cahn & Whittaker, 1997) virtual actor both rely on Brown and Levinson’s theory of politeness and language use (Brown & Levinson, 1987). Poggi and Pelachaud (Poggi & Pelachaud, 2000) base the facial expressions of their ECA on Austin’s theory of performatives (Austin, 1962). Likewise, ECAs are fully functioning Artificial Intelligence systems in the sense that they understand language by composing meanings for sentences out of the meanings of words, they deliberate over an appropriate response, deliver the response, and then remember what they said so as to make the subsequent conversation coherent. They mostly have both heads and bodies, and their behavior is based on an observation of human behavior.
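To make the contrast concrete, here is a minimal sketch of the template-matching logic that chat bots of this kind rely on. The patterns and canned responses are invented for illustration; they are not the actual templates used by Alice or any deployed chatterbot.

```python
# A minimal sketch of chat-bot-style template matching, as contrasted with
# ECAs above. Patterns and responses are invented for illustration; they are
# not the actual templates of Alice or any other deployed system.

import re

TEMPLATES = [
    # (pattern, response with a back-reference into the user's input)
    (re.compile(r"i (?:think|believe) (.+)", re.I), "What makes you say {0}?"),
    (re.compile(r"(?:hello|hi)\b", re.I), "Hello. How can I help you?"),
]

FALLBACK = "I would need a more complicated algorithm to answer that question."

def chatbot_reply(user_input):
    """Return a canned response: match a template or fall back to a stock line."""
    for pattern, response in TEMPLATES:
        match = pattern.search(user_input)
        if match:
            return response.format(*match.groups())
    return FALLBACK

print(chatbot_reply("I think gesture is part of language"))
# -> "What makes you say gesture is part of language?"
```

Everything in this loop is surface pattern: nothing composes meaning, models the interlocutor, or remembers the exchange, which is what separates such tricks from the ECAs described here.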
Figure 3: REA, the virtual Real Estate Agent.
Figure 3 shows an ECA named REA (for Real Estate Agent), who was programmed on the basis of a detailed examination of the behavior of realtors and clients. Over a period of roughly five years, various graduate students, postdocs and colleagues in my research group studied different aspects of house-buying talk, and then incorporated their findings into the ECA. Hao Yan looked at what features of a house description were likely to be expressed in hand gestures, and what features in speech (Yan, 2000). Yukiko Nakano discovered that posture shifts were correlated with shifts in conversational topic and shifts in whose turn it was to talk (Cassell, Nakano, Bickmore, Sidner & Rich, 2001). Tim Bickmore examined the ways in which small talk was employed to establish trust and rapport between realtor and client (Bickmore & Cassell, 1999). Earlier work by Scott Prevost on intonation (Prevost, 1996), and by Obed Torres on patterns of eye gaze (Torres, Cassell & Prevost, 1997), also went into the implementation. As our research into human conversation progressed, we also came to better understand some of the overall properties of human conversation, and those insights were also incorporated. The result was a virtual woman who tried to sell a home to whoever approached her. A small camera on top of the screen allowed her to detect the presence of real humans and initiate a conversation with them. Her knowledge of appropriate realtor conversation led her to ask questions about the person’s housing needs and then nod, seem to reflect, and pull up data on appropriate properties, describing them using a combination of descriptive hand gestures, head movements and spoken language.
3. A Sample Interaction with an ECA

Rea is gazing idly about. As Mike moves within range of the two cameras mounted above the screen, Rea turns to face him and says:
REA> Hello. How can I help you?
MIKE> I'm looking to buy a place near MIT.
Rea nods, indicating that she is following.
REA> I have a house to show you.
A picture of a house appears on-screen behind Rea, who blinks and looks at the house and then at Mike.
REA> It’s in Somerville.
MIKE> Tell me about it.
Rea briefly looks up and away while she plans what to say.
REA> It’s big.
Rea makes an expansive gesture with her hands.
Mike brings his hands up as if he is about to speak, and so Rea does not continue, instead waiting for him to speak.
MIKE> Tell me more about it.
REA> Sure thing. It has a nice garden.
Rea sketches a curved gesture with her hands, indicating that the garden extends along two sides of the house.
MIKE> How far is it?
REA> It is five minutes to the Porter Square T station.
Rea makes it clear that it is five minutes on foot from the T station by making a walking gesture with her fingers.
MIKE> How big is the house?
REA> It has four bedrooms, three bathrooms…

Figure 4: Transcript of an interaction between REA and a person.
Figure 4 illustrates an actual scenario between a human user and REA.
4. Conversational Properties
As well as demonstrating the use of several conversational modalities, such as speech, hand gestures, and head movements, in this example Rea is engaging in some very subtle human-like behavior that demonstrates four of the key properties of human face-to-face conversation that have become the axes along which our investigations of conversation have proceeded. Those four properties are (1) the distinction between interactional and propositional functions of language and conversation; (2) the distinction between conversational behaviors (such as eyebrow raises) and conversational functions (such as turn taking); (3) the importance of timing among conversational behaviors; and (4) the deployment of each modality to do what it does best. Our insights into each of these properties have come through the cycle of watching real humans, attempting to model what we see in virtual humans, and observing the result or observing people interacting with the result.
4.1 Division between propositional and interactional functions

Some of the things that people say to one another move the conversation forward, while others regulate the conversational process. Propositional information corresponds to the content (sometimes referred to as transmission of information) and includes meaningful speech as well as hand gestures that represent something, such as punching a fist forward while saying, “she gave him one” (indicating that the speaker’s meaning is that she punched him, and not that she gave him a present). Interactional information regulates the conversational process and includes a range of non-verbal behaviors (quick head nods to indicate that one is following, bringing one’s hands to one’s lap and turning to the listener to indicate that one is giving up one’s turn) as well as sociocentric speech (“huh?”, “do go on”). It should be clear from these examples that both functions may be filled by either verbal or non-verbal means. Thus, in the dialogue excerpted above, Rea’s non-verbal behaviors sometimes contribute propositions to the discourse, such as the gesture that indicates that the house in question is five minutes on foot from the T stop, and sometimes regulate the interaction, such as the head-nod that indicates that Rea has understood Mike’s utterance.
4.2 Distinction between function and behavior
When humans converse, few of their behaviors are hard-coded. That is, there is no mechanism or database ‘look-up table’ that gives the appropriate response for every possible conversational move on the part of one’s partner. Gestures and head movements are no more likely to be routinized—head nods will look different if we are looking up at a taller interlocutor or down at somebody short, if we are wearing a hat or bareheaded. And other than emblems, gestures display a great variety across people and even within one person across time. For example, speakers do not always nod when they understand. Instead they sometimes signal that they are following along by making agreement noises such as “uh huh”. In our simulation of this behavior, then, instead of hard-coding, the emphasis is on identifying the high-level structural elements that make up a conversation. These elements are then described in terms of their role or function in the exchange. Typical discourse functions include ‘conversation invitation’, ‘turn taking’, ‘providing feedback’, ‘contrast and emphasis’, and ‘breaking away’. Each function can be filled through a number of different behaviors, in one or several modalities. The form given to a particular discourse function depends on, among other things, current availability of modalities such as the face and the hands, type of conversation, cultural patterns and personal style. In the REA embodied conversational agent, Rea generates speech, gesture and facial expressions based on the current conversational state, the conversational function she is trying to convey, and the availability of her hands, head and face to engage in the desired behavior. For example, when the user first approaches Rea (“User Present” state), she signals her openness to engage in
conversation by looking at the user, smiling, and/or tossing her head. Figure 5 shows a visualization of REA’s internal state with respect to conversational behaviors and conversational states.
Figure 5: Visualization of ECA and human conversational state.
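The function-to-behavior mapping just described can be suggested in a few lines. The following is a simplification for illustration only: the function names, the behavior inventory, and the selection rule are all invented, and REA's actual generation module is considerably richer.

```python
# A minimal sketch (invented names, not REA's actual code) of mapping
# discourse functions to behaviors: each function can be realized by any of
# several behaviors, chosen according to which modalities are currently free.

BEHAVIORS = {
    "give_feedback": [("head", "nod"), ("voice", "uh huh")],
    "invite_conversation": [("gaze", "look at user"),
                            ("face", "smile"),
                            ("head", "toss")],
    "give_up_turn": [("hands", "relax to lap"), ("gaze", "look at listener")],
}

def realize(function, busy_modalities):
    """Return the first behavior whose modality is currently free."""
    for modality, behavior in BEHAVIORS[function]:
        if modality not in busy_modalities:
            return modality, behavior
    return None  # no modality free; the function goes unrealized this turn

# If the head is busy (say, mid-gesture), feedback falls back to speech:
print(realize("give_feedback", busy_modalities={"head"}))  # ('voice', 'uh huh')
```

Even at this scale the design principle is visible: the discourse function stays fixed, while the behavior that realizes it varies with whatever modalities happen to be free.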
4.3 Importance of timing
The relative timing of conversational behaviors plays a large role in determining their meaning. That is, for example, the meaning of a nod is determined by where it occurs in an utterance, all the way down to the 200-millisecond scale (consider the difference between “you did a [great job]” (square brackets indicate the temporal extent of the nod) and “you did a [. . .] great job”). Thus, in the dialogue above, Rea says, “it is five minutes from the Porter Square T station,” at exactly the same time as she performs a walking gesture. If the gesture occurred in another context, it could mean something quite different; if it occurred during silence, it could simply indicate Rea’s desire to take the turn. Although it has long been known that the most effortful part of a gesture co-occurs with the part of an utterance that receives prosodic stress (Kendon, 1972), it wasn’t until researchers needed to generate gestures along with speech in an ECA—and therefore needed to know the details of the context in which one was most likely to find contentful gestures—that it was discovered that the gesture is most likely to co-occur with the rhematic (Halliday, 1967) or new contribution part of an utterance. This means that if a speaker is pointing to her new vehicle and saying, “this car is amazingly comfortable. In fact, this car actually has reclining seats,” the phrase “amazingly comfortable” would be the rheme in the first sentence, because “car” is redundant (since the speaker is pointing to it), and “reclining seats” would be the rheme in the second sentence, because car has already been mentioned. Therefore, the speaker would be most likely to produce hand gestures with “amazingly comfortable” and “reclining seats”.
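As a toy illustration of how this finding can be turned into a generation rule, consider the following sketch. The fixed per-word duration and the hand-annotated rheme span are simplifying assumptions; a real scheduler would take its timing from the speech synthesizer rather than from a constant.

```python
# A toy version of the timing rule above: schedule the gesture stroke to
# co-occur with the rheme (the new-information span of the utterance).
# The constant word duration and pre-annotated rheme span are assumptions.

def schedule_stroke(words, rheme_span, word_duration_ms=300):
    """Return (onset_ms, offset_ms) so the stroke covers the rheme words."""
    start, end = rheme_span  # word indices; end is exclusive
    assert 0 <= start < end <= len(words)
    return start * word_duration_ms, end * word_duration_ms

words = "this car actually has reclining seats".split()
# "reclining seats" is the rheme: "car" is given (already mentioned).
print(schedule_stroke(words, rheme_span=(4, 6)))  # (1200, 1800)
```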
4.4 Using the modalities to do what they do best
In e-mail, we are obliged to compress all of our communication goals into textual form (plus the occasional emoticon). In face-to-face conversation, on the other hand, humans have many more modalities of expression at their disposal, and they depend on each of those means, and various combinations amongst them, to communicate what they want to say. They use gestures to indicate things that may be hard to represent in speech, such as spatial relationships among objects (Cassell, Stone & Yan, 2000), and they depend on the ability to simultaneously use speech and gesture in order to communicate quickly. In this sense, face-to-face conversation may allow us to be maximally efficient or, in other instances, to use conversation to do other kinds of work than information transmission (for example, we may use the body to indicate rapport with others, while language is getting task work done). In the dialogue reproduced above, Rea takes advantage of the hands’ ability to represent spatial relations among objects and places by using her hands to indicate the shape of the garden (sketching a curved gesture around an imaginary house) while her speech gives a positive assessment of it (“it has a nice garden”). However, in order to produce this description, the ECA needs to know something about the relative representational properties of speech and gesture, something about how to merge simultaneous descriptions in two modalities, and something about what her listener does and does not already know about the house in question. The need to understand how speech and gesture and facial/head movements can be produced together by ECAs has forced me to design experimental and naturalistic methodologies to look at the nature of the interaction between modalities, and has resulted in significant advances in my theorizing about the relationship between speech and gesture in humans. Thus, for example, in my current work, with not REA but the purple virtual robot NUMACK as a simulation, Paul Tepper, Stefan Kopp and I have become interested in the seeming paradox of how gesture communicates, given that there are no standards of form in spontaneous gesture—no consistent form-meaning mappings. Some gestures clearly depict visually what the speaker is saying verbally, and these gestures are known as iconics. But what is depicted on the fingers, and its relationship to what is said, can be more or less obvious. And two speakers’ depiction of the same thing can be quite different. An example comes from the comparison of two people describing the same landmark on Northwestern University’s campus: an arch that signals the beginning of the campus, and that lies at the intersection of Sheridan Road and Chicago Avenue. In order to collect these data, we hid prizes in various spots on campus, and asked one student, who knew where the prize was hidden, to give directions to the prize to a second student. If the second student was successful in finding it, the two shared the prize (and both were entered into a drawing for an iPod, probably the most motivating feature of the experiment!). The direction-giving was videotaped using four cameras trained on different parts of the bodies of the two speakers, as described above (and shown in Figure 2), and
then each gesture was transcribed, along with the speech that accompanied it, for further study. One speaker in the experiment, describing directions to a church near the arch, said, “go to the arch” and, with his fingertips touching one another and the fingers pointing upwards, made a kind of teepee shape. In this instance, the gesture seemed to indicate a generic arch. Compare that gesture to the following, made by another participant in the experiment who, while referring to that same arch, said, “you know the arch?” but this time, although his fingertips were touching one another, the fingers were pointing towards the listener and the thumbs up, making the shape of a right angle. In this instance, the gesture seems to indicate…what? An arch lying on its side? It makes, in fact, no sense to us as observers…unless we know that the arch is located at the right angle formed by Sheridan Road and Chicago Avenue. And this interpretation of the gesture is supported by the speaker’s next utterance, “it’s located at the corner of Sheridan.” So, in the absence of the relatively stable form-meaning pairing that language enjoys (the same image may not be evoked for both of us, but when I say, “right angle,” I can be relatively sure that you will interpret it to mean something along the lines of a right angle), how do gestures communicate? The answer to this question (which is outside the scope of this chapter, but has to do with the fact that gestures have a kind of interpretive flexibility, and have meanings only in situated contexts) resulted both in a new computational architecture whereby gesture and speech are computationally generated together, and in a new way of understanding how gestures communicate among humans.
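One way to see what such joint generation demands is a toy allocation rule of the following kind. The rules are placeholders (spatial content leaning toward gesture, assessments toward speech, shared knowledge licensing a bare appeal to the listener) and do not reproduce the planner actually implemented in NUMACK or REA.

```python
# A toy sketch of multimodal content allocation of the kind discussed above:
# spatial properties lean toward gesture, evaluative ones toward speech, and
# information the listener already has can be left to an appeal to shared
# knowledge. These rules are invented placeholders, not a deployed planner.

def allocate(feature, value, listener_knows):
    if feature in listener_knows:
        return {"speech": f"you know the {value}?"}   # appeal to shared knowledge
    if feature in {"shape", "location", "path"}:      # hard to say, easy to show
        return {"speech": f"the {value}", "gesture": f"depict {feature} of {value}"}
    return {"speech": f"it has a nice {value}"}       # assessments go to speech

print(allocate("shape", "garden", listener_knows=set()))
# -> {'speech': 'the garden', 'gesture': 'depict shape of garden'}
print(allocate("landmark", "arch", listener_knows={"landmark"}))
# -> {'speech': 'you know the arch?'}
```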
5. Translating Conversational Properties into Computational Architectures
The four conversational properties discussed in the previous section gave rise in 2000 to a computational architecture that can be seen in Figure 6.

Figure 6: Computational architecture of an ECA.

As this diagram makes clear, and like many systems in Artificial Intelligence, ECAs are largely linear and devoid of contingent functionality—the real human asks a question, which is collected by the input modules of the system (cameras to view the speaker’s gestures and posture, microphones to hear the speech) and then interpreted into a unified understanding of what the speaker meant. In turn, that understanding is translated into some kind of obligation to respond. That response is planned out first in “thought” or communicative intention, and then in speech and movements of the animated body, face, and hands through the use of a speech synthesizer, computer graphics engine and various other output modes. Meanwhile, so as not to wait for all of that processing to be completed before a response is generated, a certain number of hardwired responses are sent to be realized: head nods, phatic noises (mmm, uh huh) and shifts of the body. The linear nature of this architecture is one of the constraints imposed by the scientific instrument—like trying to cut out circles with straight blades.
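In outline, the pipeline just described can be caricatured in a few lines of code. The module names and the payloads passed between stages below are stand-ins for illustration, not the interfaces of any deployed ECA.

```python
# A schematic of the linear ECA pipeline described above: input modules ->
# understanding -> deliberation -> generation, with hardwired back-channels
# emitted early. All names and payloads are illustrative stand-ins.

def understand(percepts):
    """Fuse what the cameras and microphone captured into one meaning."""
    return {"speech": percepts.get("speech"), "gesture": percepts.get("gesture")}

def deliberate(meaning):
    """Plan a communicative intention in response to the understood input."""
    return {"intent": "respond", "content": meaning["speech"]}

def generate(intention):
    """Realize the intention as synchronized speech, face and hand output."""
    return f"speech+animation: {intention['content']!r}"

def eca_turn(percepts):
    outputs = []
    # Hardwired responses (nods, phatic noises, posture shifts) are realized
    # immediately, so the agent is not frozen while the slow path runs.
    if percepts.get("speech"):
        outputs.append("head nod + 'uh huh'")
    # The slow path is strictly linear: understand -> deliberate -> generate.
    outputs.append(generate(deliberate(understand(percepts))))
    return outputs

print(eca_turn({"speech": "I'm looking to buy a place near MIT."}))
```

The fast path for hardwired responses is the one concession to responsiveness; everything else runs strictly in sequence, which is exactly the linearity the text remarks on.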
When I first began to collaborate with computer scientists in 1993-1994 to build a virtual human, I asked them to build one that was responsive to itself and to its interlocutor in a number of ways. I told them that I wanted the virtual human to be able to see its own hands, and from what it saw decide what it wanted to say in the moment—the way humans often do, such as when they can’t recall a word until they make the gesture for it. And I told them I wanted some kind of entrainment or accommodation between the different participants in the conversation, such that their language and gesture grew increasingly alike, as they came to mirror one another. The response was incredulity and a request for me to be better informed before I went asking for features. The goal, I was told, was autonomy and not codependence. Of course, as Suchman has pointed out about other work in Artificial Intelligence, this means that we have not produced a truly conversational agent, since, “interaction is a name for the ongoing, contingent co-production of a shared social/material world” (Suchman, 2003:7). But the kinds of interdependence that we wish to simulate are hard to achieve given our current models. In general terms, however, building ECAs has forced researchers of human behavior to attend to the integration of modalities and behaviors in a way that merges approaches from fields that for the most part do not speak to one another: ethnomethodological, interpretive, and holistic studies of human communication with psycholinguistic, experimental, isolative studies of particular communicative phenomena. To build a human entails understanding the context in which one finds each behavior—and that context is the other behaviors. During that first collaboration with computer scientists in 1993-1994, when we were building the very first of these animated embodied conversational agents, each of the parts of the body was being implemented by a different researcher. Catherine Pelachaud was writing the algorithms to drive the character’s facial movements—head nods, eye gaze, etc.—based on conversational parameters such as who had the turn. Scott Prevost was writing rules to generate appropriate intonation—the prosody of human language—on the basis of the relationship between the current utterance and previous utterances. I myself was working on where to insert gestures into the dialogue. After several months of work, we finally had a working system. In those days, ECAs needed to be “rendered”—they
were not real-time—and so with bated breath we ran the simulation, copied it to videodisc, and then watched the video. The result was an embodied conversational agent who looked like he was speaking to very small children, or to foreigners. That is, the resultant virtual human used so many nonverbal behaviors signaling the same thing that he seemed to be trying to explain something to a listener who didn’t speak his own language or was just very stupid. This system, called Animated Conversation, was first shown at SIGGRAPH, the largest Computer Graphics conference, in front of an audience of 4000 researchers and professional animators (the folks who build cartoons and interactive characters), and they found it hilarious. To my mind, on the other hand, we had made a huge advance. We had realized that the phenomena of hand gesture, intonation and facial expression were not separate systems, nor was one a “translation” of the others, but instead had to be derived from one common set of communicative goals. That was the only explanation for the perception that each concept was being over-emphasized through a multiplicity of communicative means. The result fundamentally changed the way we build embodied conversational agents, but it was an advance in my understanding of human communication as well. It led me back to the fields of conversational analysis and ethnomethodology (not a part of the McNeill curriculum when I was a graduate student, but very much a part of what I read in the courses I took with Michael Silverstein and others), which of course had never deviated from this holistic understanding of human communication. It also led to a design methodology that I have relied on ever since, and that is represented in Figure 7.

Figure 7: Methodology for modeling human conversation and building ECAs.

Iteratively, my students and I collect data on human-human conversation, interpret those data in such a way as to build a formal model, implement a virtual human on the basis of the model, confront the virtual human with a real human, evaluate the results, and collect more data on human-human communication if needed (a side effect of this methodology is the need to confront
Figure 7: Methodology for modeling human conversation and building ECAs.
the response of lay viewers to the necessary flaws and lacunae in the implementation, but I try to think of that as character building).
It should be reiterated that building a computational system has traditionally demanded a formal or predictive model. That is, in addition to being able to interpret why a particular experience occurs in a particular context, one must also be able to predict which set of conditions will give rise to a particular experience, so that one can generate the corresponding behaviors in the ECA when those conditions arise. Unfortunately, predictive models also come with their own baggage, as they tend to underscore the way in which fixed sets of conditions give rise to fixed outputs, as opposed to highlighting the very contingent, co-produced nature of human conversation, where, on the fly, hearers and speakers influence one another’s language and indeed their very thinking patterns. In this sense, I sometimes worry that building computational simulations of this sort may set back the study of language: phenomena that cannot yet be modeled in virtual people may simply be ignored. On the other hand, before the advent of embodied conversational agents, computational linguistics and work on dialogue systems (which arose from the Cognitive Sciences—psychology, linguistics, philosophy, computer science) concentrated for the most part on the propositional functions of language, which were thought by many linguists to be the primary if not the only function of language. Before ECAs, computational models of language were capable only of simulating task talk, bereft of social context and of nonverbal behavior. And given the power of these computational models, perhaps the arrival of ECAs, with their attendant attention to the non-informational and socially contextualized functions of language, has played some positive role in the Cognitive Sciences.

More hopefully still, now that there has been a decade of research on Embodied Conversational Agents, several researchers, including myself, are beginning to explore other kinds of computational architectures and techniques that do not require deterministic, input–output-style formal models of conversation. Probabilistic techniques, such as spreading activation, Bayesian reinforcement learning, and Partially Observable Markov Decision Processes, are being applied to the newest phenomena to be modeled with ECAs. These phenomena, which tend to have more to do with social context than with local linguistic context, include the effect of emotion on verbal and nonverbal behavior in conversation, the role of personality and cultural differences, social influence, etiquette, and relationship-building.
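To give a feel for the contrast with deterministic rules, here is a toy sketch in the spirit of such probabilistic techniques: a belief about the user’s hidden social stance is updated from observed cues by Bayes’ rule, and the next conversational move is sampled from that belief rather than fired by a fixed condition–action rule. The stance labels, cues and numbers are all invented for illustration, and correspond to none of the cited systems.

import random

# Assumed prior over the user's hidden social stance (illustrative only).
PRIOR = {"wants_small_talk": 0.5, "task_only": 0.5}

# Assumed observation likelihoods P(cue | stance); invented numbers.
LIKELIHOOD = {
    "user_smiles":   {"wants_small_talk": 0.7, "task_only": 0.3},
    "short_answers": {"wants_small_talk": 0.2, "task_only": 0.8},
}

def update_belief(belief, cue):
    """Bayes' rule: posterior proportional to prior times cue likelihood."""
    posterior = {s: belief[s] * LIKELIHOOD[cue][s] for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

def choose_move(belief):
    """Sample a move from the belief, not from a fixed input-output rule."""
    p_social = belief["wants_small_talk"]
    return random.choices(["SMALL_TALK", "TASK_TALK"],
                          weights=[p_social, 1.0 - p_social])[0]

belief = dict(PRIOR)
for cue in ["short_answers", "short_answers", "user_smiles"]:
    belief = update_belief(belief, cue)
print(belief)             # the belief has drifted toward "task_only"
print(choose_move(belief))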
In all of these implementation experiments, embodied conversational agents are tools to think with, much like other computer software and hardware that allow us to evaluate our own performance in the world (Turkle, 1995). They allow us to evaluate our hypotheses about the relationship between verbal and nonverbal behavior, and to see what gaps exist in our knowledge about human communication, by seeing ourselves and our conversational partners in the machine.

How do we go about evaluating our hypotheses? As described above, we watch the virtual humans and observe our own reactions. But we also put others in front of these ECAs and examine the differences between their behavior with ECAs and their behavior with other humans. This second kind of experiment relies on the supposition that correctly implemented virtual humans evoke human-like behavior. In this instance, mechanisms that seem human lead us to attribute humanness and aliveness to them, and lead us, in turn, to act human and alive ourselves. Successful virtual humans evoke distinctly human characteristics in our interaction with them. The psychological approach to artificial life leads to functional bodies that are easy to interact with, “natural” in a particular sense: they evoke a response.

Yukiko Nakano and I carried out a study of the role of nonverbal behaviors in grounding, and of how these behaviors could be implemented in a virtual human (Nakano, Reinstein, Stocky & Cassell, 2003). Common ground can be thought of as the sum of the mutual knowledge, mutual beliefs and mutual suppositions necessary for a particular stage of a conversation (Clark, 1992). Grounding refers to the ways in which speakers and listeners ensure that the common ground is updated, such that the participants understand one another. Grounding may occur by nodding to indicate that one is following, by asking for clarification when one doesn’t understand, or by uttering requests for feedback, such as “you know what I mean?” Here too, an extensive study of human-human behavior in the domain of direction-giving paved the way for the implementation of an ECA that could ground while giving directions using a map and hand gestures. And here too, we evaluated our work by comparing people’s reactions to two versions of the virtual human, one of which demonstrated grounding behaviors, while the other had the grounding “turned off.” When the behaviors were turned off, the person simply acted as if she were in front of a kiosk and not another human—not gazing at the ECA or looking back and forth between him and the map. When the ECA did engage in grounding behaviors, the human acted strikingly . . . human, looking back and forth between the map and the ECA, as shown in Figure 8.

Figure 8: Analysis of grounding behaviors in Human-ECA conversation.
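Schematically, the judgment such an agent must make after each utterance can be sketched as a simple decision rule. The cue names and the rule itself are my own illustrative assumptions, loosely distilled from the study; the actual model in Nakano et al. (2003) is considerably more detailed.

# Sketch of a grounding judgment for a direction-giving ECA: treat listener
# nods and gaze at the map as positive evidence that the last utterance was
# grounded, and elicit feedback otherwise. Cue names are invented.

def grounded(listener_cues):
    """Can the last utterance be taken as entered into common ground?"""
    if "nod" in listener_cues:
        return True
    if "gaze_at_map" in listener_cues:
        # attending to the map right after a map-based direction also counts
        return True
    return False

def next_action(listener_cues):
    if grounded(listener_cues):
        return "CONTINUE: give the next direction"
    return "ELICIT: ask 'you know what I mean?' or repeat with a pointing gesture"

print(next_action({"gaze_at_map"}))    # CONTINUE
print(next_action({"gaze_at_agent"}))  # ELICIT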
A final example, and perhaps the most illustrative of the ways in which ECAs, properly constructed on the basis of theories elaborated from human observation, can elicit human-like behavior (and of how this behavior can be illuminating along both positive and negative dimensions), is an experiment in which we endowed REA with social chit-chat skills (Cassell & Bickmore, 2002). As mentioned above, Tim Bickmore carried out an extensive study of small talk among realtors and traveling salesmen. The results indicated that small talk was not randomly inserted into conversation, but served specific purposes, including minimizing the potential face threat of personal questions (such as “how much do you earn?”). These functions of small talk could be simulated in such a way as to allow us to implement a small-talking realtor, who used chit-chat to smooth the rails of a house-selling transaction with a human; a sketch of such a planner appears after the excerpts below. In order to test our model of human conversation, we asked people to interact with one of two versions of the ECA. One used task talk only, while the other added social chit-chat at key places in the interaction. The people who interacted with each ECA were asked to evaluate their experience: how natural they felt the interaction to be, how much they liked the ECA, how warm they felt she was, how trustworthy. We also tested the subjects on their own social skills, dividing them into extroverts and introverts using a common psychological scale. The results showed that extroverts preferred the small talk version of the ECA, while introverts preferred the ECA to keep to the task (we also discovered that it was difficult to find extroverts among the MIT students, but that’s another story). An introvert in the small talk condition remarked:

REA exemplifies some things that some people, for example my wife, would have sat down and chatted with her a lot more than I would have. Her conversational style seemed to me to be more applicable to women, frankly, than to me. I come in and I shop and I get the hell out. She seemed to want to start a basis for understanding each other, and I would glean that in terms of our business interaction as compared to chit chat. I will form a sense of her character as we go over our business as compared to our personal life. Whereas my wife would want to know about her life and her dog, whereas I really couldn’t give a damn.
An extrovert in the same condition had a very different response:

I thought she was pretty good. You know, I can small talk with somebody for a long time. It’s how I get comfortable with someone, and how I get to trust them, and understand how trustworthy they are, so I use that as a tool for myself.
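The planner behind this condition can be sketched, in highly simplified form, as promised above: interleave small talk before face-threatening task questions until the estimated closeness of the relationship is high enough. The agenda, topics, face-threat scores and closeness dynamics are invented for illustration; the published model (Cassell & Bickmore, 2002) is considerably richer.

# Illustrative small-talk planner: buffer face-threatening questions with
# chit-chat until estimated closeness exceeds the question's face threat.

TASK_AGENDA = [
    ("Where are you looking to live?", 0.2),   # (question, face threat)
    ("How many bedrooms do you need?", 0.3),
    ("How much do you earn?", 0.9),
]
SMALL_TALK = ["How about this weather?", "Have you been to the Media Lab before?"]

def plan_dialogue(closeness=0.1, gain_per_chat=0.25):
    moves = []
    topics = iter(SMALL_TALK)
    for question, threat in TASK_AGENDA:
        # smooth the rails: chit-chat until the question is safe to ask
        while closeness < threat:
            topic = next(topics, "So, tell me a little about yourself.")
            moves.append(("SMALL_TALK", topic))
            closeness += gain_per_chat
        moves.append(("TASK_TALK", question))
    return moves

for move in plan_dialogue():
    print(move)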
Clearly, the people in this experiment are evaluating the ECA’s behaviors in much the same way as they would evaluate a flesh-and-blood realtor. And clearly, our unexamined implementation of the realtor as a woman instead of a man has played into those evaluations, as much as have any of our carefully examined decisions about small talk, hand gestures and body posture. Although our goal was to obtain input into a theory of the role of small talk in task talk, this response from one of REA’s interlocutors effectively demolishes the claim that human identity can be denuded of its material aspects. Much previous work on responses to ECAs as interfaces has in fact concentrated on exactly this sort of effect, with some researchers advising industry executives to implement a female ECA to sell phone service, but a male ECA to sell cars (cf. Nass & Brave, 2005).
In response to this unintended research finding in our small talk study, my students and I have begun to use the virtual human paradigm to investigate explicitly which linguistic, nonverbal, and visual cues signal aspects of identity. Some have suggested that the race of ECAs be matched to the putative race of the user; my students and I have begun to look at the complex topic of racial identity, and at how a person’s construction of his or her own race, and recognition of the racial identity of others, may be conveyed not just by skin color but (also) by aspects of linguistic practice, patterns of nonverbal behavior and narrative style (Cassell, Tartaro, Oza, Rankin & Tse, forthcoming).
6. Embodied Conversational Agents as Interfaces
I’ve alluded to other ways in which ECAs are used, where they serve not as scientific instruments or tools to think with, but as interfaces to computers. In this function, ECAs might take the place of a keyboard, screen and mouse—the human speaks to them instead of typing. Or they might represent the user in an online chat room. ECAs can also serve as teachers or tutors in educational software—so-called “pedagogical agents.” Research in this applied science examines whether ECAs are preferable to other modalities of human-computer interaction, such as text or speech; what kinds of behaviors make ECAs most believable and most effective (as tutors, information retrievers, avatars); and what personas an ECA should adopt in order to be accepted by its users. My students and I have also conducted some of this research, looking at whether virtual children are effective learning companions for literacy skills (Ryokai, Vaucelle & Cassell, 2003), whether people are willing to be represented by ECAs in online conversations (Vilhjalmsson & Cassell, 1998), and whether tiny ECAs—small enough to fit on a cell phone—still evoke natural verbal and nonverbal responses in the people speaking with them (Bickmore, 2002). Even here, however, our research on virtual peers has led us back to an exploration of human-human communication, as we attempt to identify the features that signal to children that somebody else is a peer, is good friendship material, is worth listening to and telling stories with. In this instance our exploration of the pragmatics of the body has led us to some key features of social interaction—how rapport and friendship are negotiated—which, in turn, have led us to a better understanding of peer learning.
7. Conclusions
These five-finger exercises in building virtual people have led to advances in what we know about the interaction between verbal and nonverbal behavior in humans, about the role of small talk in task talk, about the kinds of functions filled in conversation by the different modalities of the body, and about how learning is linked to rapport in children. Learning what must be implemented in order to make Embodied Conversational Agents evoke a lifelike response, and learning what the technology can and cannot do at the present time, has also given me a sense of the meaning of humanness through human behavior. It is the ensemble of behaviors, in all of their minuteness and unconscious performance, that makes a human seem human-like.
Flaws and lacunae in that ensemble of behaviors give the scientist interlocutor a sense of what we do not know about human communication. Strengths and continuities in the theory that underlies the implementation lead to a virtual human that evokes human-like behavior in a layperson interlocutor. The sufficiency criterion in Cognitive Science consists of explaining human cognitive activity by showing how a computer program may bring about the same result when the computer is provided with the same input (Newell & Simon, 1972). In virtual human simulations, however, reproducing cognitive activity alone is not sufficient. I know that my model successfully explains human behavior when it evokes human behavior, because human communicative behavior is intrinsically relational, and cannot be understood without two humans.

As a graduate student I spent long hours in David McNeill’s lab turning the slow-motion dial on the VCR until my fingers were sore. David taught us to pay attention to behaviors, and to interactions among behaviors, that virtually nobody else thought were worth watching. From those years of learning how to see under David’s tutelage was born the desire to teach others to pay attention, to teach others that if it is a window into the mind that they are looking for, these non-speech modalities must not be ignored. Of course the videotapes that I was showing in that Edinburgh classroom are of virtual humans, and not of real humans. But that is because I hope to have added a new tool to the tool-kit that David McNeill opened for us: a way of analyzing by synthesizing, of understanding by building a simulacrum, of increasing our understanding of the interaction between speech and the non-speech modalities by taking our lessons from the near-human.

References

Agre, P. (1992). Formalization as a social project. Quarterly Newsletter of the Laboratory of Comparative Human Cognition, 14(1), 25-27. http://polaris.gseis.ucla.edu/pagre/formalization.html
Austin, J. (1962). How to do things with words. Oxford: Oxford University Press.
Bickmore, T. (2002). Towards the design of multimodal interfaces for handheld conversational characters. Proceedings of CHI. Minneapolis, MN.
Bickmore, T., & Cassell, J. (1999, November 5-7). Small talk and conversational storytelling in embodied conversational characters. Proceedings of AAAI Fall Symposium on Narrative Intelligence (pp. 87-92). Cape Cod, MA.
Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language use. New York: Cambridge University Press.
Cassell, J., & Bickmore, T. (2002). Negotiated collusion: Modeling social language and its relationship effects in intelligent agents. User Modeling and Adaptive Interfaces, 12, 1-44.
Cassell, J., Nakano, Y., Bickmore, T., Sidner, C., & Rich, C. (2001, July 17-19). Non-verbal cues for discourse structure. Proceedings of 41st Annual Meeting of the Association of Computational Linguistics (pp. 106-115). Toulouse, France.
Cassell, J., Stone, M., & Yan, H. (2000). Coordination and context-dependence in the generation of embodied conversation. Proceedings of INLG 2000 (pp. 171-178). Mitzpe Ramon, Israel: Association of Computational Linguistics.
Cassell, J., Tartaro, A., Oza, V., Rankin, Y., & Tse, C. (forthcoming). Virtual peers for literacy learning. Educational Technology, Special Issue on Pedagogical Agents.
Clark, H. H. (1992). Arenas of language use. Chicago, IL: University of Chicago Press.
Halliday, M. A. K. (1967). Intonation and grammar in British English. The Hague: Mouton.
Hoffmann, E. T. A. (1844). The Sandman (J. Oxenford & C. A. Feiling, Trans.). In J. Oxenford & C. A. Feiling (Eds.), Library of select novels, No. 42. Tales from the German, comprising specimens from the most celebrated authors (p. 110). New York: Harper & Brothers. (Original work published 1817)
Keller, E. F. (2003). Models, simulation, and “computer experiments.” In H. Radder (Ed.), The philosophy of scientific experimentation (pp. 198-215). Pittsburgh, PA: University of Pittsburgh Press.
Kendon, A. (1972). Some relationships between body motion and speech. In A. W. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 177-210). Elmsford, NY: Pergamon Press.
Nakano, Y. I., Reinstein, G., Stocky, T., & Cassell, J. (2003, July 7-12). Towards a model of face-to-face grounding. Proceedings of Annual Meeting of the Association for Computational Linguistics (pp. 553-561). Sapporo, Japan.
Nass, C. I., & Brave, S. (2005). Wired for speech: How voice activates and advances the human-computer relationship. Cambridge, MA: MIT Press.
Newell, A., & Simon, H. A. (1972). Human problem solving. Oxford, England: Prentice-Hall.
Poggi, I., & Pelachaud, C. (2000). Performative facial expressions in animated faces. In J. Cassell, J. Sullivan, S. Prevost & E. Churchill (Eds.), Embodied conversational agents (pp. 155-188). Cambridge, MA: MIT Press.
Prevost, S. A. (1996). Modeling contrast in the generation and synthesis of spoken language. Proceedings of ICSLP '96. Philadelphia, PA.
Ryokai, K., Vaucelle, C., & Cassell, J. (2003). Virtual peers as partners in storytelling and literacy learning. Journal of Computer Assisted Learning, 19(2), 195-208.
Suchman, L. (2003). Writing and reading: A response to comments on Plans and situated actions. Journal of the Learning Sciences, 12(2), 299-306.
Torres, O. E., Cassell, J., & Prevost, S. (1997, July 14-16). Modeling gaze behavior as a function of discourse structure. Proceedings of First International Workshop on Human-Computer Conversation. Bellagio, Italy.
Turkle, S. (1995). Life on the screen: Identity in the age of the Internet. New York: Simon & Schuster.
Vilhjalmsson, H. H., & Cassell, J. (1998). BodyChat: Autonomous communicative behaviors in avatars. Proceedings of Autonomous Agents 98 (pp. 269-276). Minneapolis, MN.
Walker, M. A., Cahn, J. E., & Whittaker, S. J. (1997). Improvising linguistic style: Social and affective bases for agent personality. Proceedings of Autonomous Agents 97 (pp. 96-105). Marina del Rey, CA.
Wang, N., Johnson, W. L., Rizzo, P., Shaw, E., & Mayer, R. E. (2005). Experimental evaluation of polite interaction tactics for pedagogical agents. Proceedings of International Conference on Intelligent User Interfaces. San Diego: ACM Press.
Yan, H. (2000). Paired speech and gesture generation in embodied conversational agents. Unpublished master's thesis, MIT, Cambridge, MA.
Index

Acquisition, first language, 22–24, 26, 32, 51–53, 55, 62–63, 118–119, 174, 285, 295, 300. See also Language Acquisition Device
Acquisition, second language, 9, 67, 117–122. See also Second language
Acquisition, sign language, 24, 32, 51–53, 55, 62–63, 174
Addressee, 9, 56, 57, 128, 135, 136, 173, 177, 179, 182, 183, 186, 187–190, 192, 201–204, 209, 243–250
Advertising, 235–239
African-American, 91, 93–95
American Sign Language (ASL), 24, 31, 33, 53, 173, 175, 176, 180
American Indian sign language, 15
Animated agent, 69, 78, 271, 306, 308, 314, 315
Anthropologist(s), 91, 94, 95, 149, 150
Anthropology, 13, 14, 15–18, 21
Antireductionist, 4, 270
Aphasias, 270–271, 274
Aphasic, 8, 10, 269–276, 279–282
Applied communicational situations, 235
Arabic, 91, 93–95, 98
Arbitrariness, 5, 34, 53, 286, 295–296, 298
Archaeology, 195, 205, 208, 210
Architecture, see Computational architecture
Attention, 13, 15, 16, 18, 21, 23, 24, 43, 56, 62, 63, 83, 87–89, 109, 110, 113–116, 117, 155, 162, 171, 186, 201, 203, 208, 210, 223, 236, 243, 244, 270, 274, 275, 317, 321
Autism, 8, 10, 189, 285, 286, 287
Automata, 307
Backchanneling, 92–93, 95
Behaviorism, 3, 16, 22
Birdwhistell, R., 21–22
Blended space, 176
Blending, 99–103, 105, 107, 176–177
Bloomfield, L., 22
Bot, see Animated agent
Bowing, 159, 161–162, 165–166, 169–170
Brain localizationist accounts, 269, 272–273
Broca’s aphasia, 270–273, 276, 280–281
Brown, R., 11
Bulgarian, 91, 93–95, 98
Bulwer, J., 13–14
Buoy (sign language), 180, 185–186, 190
Cartoon, 8, 69, 75, 77–79, 81, 87, 99, 100, 104–105, 107, 213–219, 222–223, 225, 227–229, 244, 271, 275, 277, 279–280, 285, 316
Catchment, 8, 75, 79–81, 87, 149, 285, 295–300
Category, 34, 115, 207, 228, 230, 232, 274
Categories, linguistic, 4, 6, 34, 85, 114, 119, 171, 286
Categorical, 5–6, 8, 33, 37, 45, 100, 110, 213
Character viewpoint, 107, 216, 218, 220, 229–234, 244
Chicago School, 3
Chinese, 38, 80, 272–274, 276–281
Chomsky, N., 17, 22
Chomskyan linguistics, 110
Cognition, 7, 8, 29, 115, 118, 149–150, 172, 198, 201, 203, 210, 320–321
Cognitive instability, 41
Cognitive load, 44–45, 91
Cognitive processes, 17, 23, 110–111, 115, 255
Cognitive foundations of language, 13
Cognitive science(s) / studies, 23, 317, 321
‘Cognitive turn’ in linguistics, 21–22, 23
Cognitive universals, 8, 92, 98
Cohesion, 306
Cohesive (discourse), 290, 295, 300
Cohesiveness, 75, 79–81, 300
Combination, 9, 23, 39–41, 99, 114, 151, 159, 170, 209, 215, 223, 225–228, 231, 233–235, 286–291, 294, 296, 298–300, 309, 313
Communication of semantic information, 221, 232, 239
Communicative act, 32, 127, 131, 151
Communicative effectiveness of gestures, 230, 232, 235, 236
Communicative status of gestures, 200, 201, 203, 209
Community, 10, 91, 205, 206, 210
Compliance, 251–252, 255, 256, 257, 259–260, 261, 262–263, 265–266
Computational architecture, 314–315
Computational model, 304, 307, 317
Computer(s), 117, 177, 178–191, 192, 193, 197, 303, 304, 305, 306, 307, 314, 315, 316, 317, 320, 321
Conceptual integration theory, 8, 101, 102, 105, 107
Conceptualizer, 68
Conceptual Basis of Language, 23
Condillac, E. B., 14, 16
Conducting, musical, 168, 169
Conflict (in interaction), 251–252, 255, 256, 257, 258, 259, 260–261, 262, 263, 264, 265, 266
Conflict(ual) representation, 100, 102, 214, 217–218, 219–220
Constitutive function of gestures, 300
Constraints, 8, 54–55, 69, 79, 110, 113–114, 255–258, 263–264, 314
Context, 11, 110, 115, 149–150, 171, 176, 186, 210, 223, 304, 306, 308, 314, 315, 317
Context, discourse, 5–6, 8, 72, 79, 80, 84, 86, 92, 100, 101, 104, 189, 191, 193, 247, 270, 274, 275, 279, 281, 300, 312, 317
Context, environmental, 7, 9, 201, 210
Context, intrapersonal and interpersonal, 4. See also Vygotsky
Context, referential, 51, 62
Context, social interactional, 6–7, 72, 127, 129–131, 134, 136, 140, 141, 172, 218, 220, 255, 257, 261, 263, 317
Context, visuospatial, 5, 104, 220, 239
Convention, 5–6, 8–10, 33, 36, 45, 51–56, 59–63, 83, 84, 87–88, 92, 111, 152, 171, 174–176, 251–253, 255–265, 286
Convention-based interaction, 251–252, 257, 259, 262, 264, 265–266
Coordination, 79, 113, 129, 136, 150, 154, 213, 217–220, 252
Coreferential gestures, 217–218, 220
Cross-language comparison, 67, 271, 273–274, 276–277, 279
De Jorio, A., 15–16
Deception, 8, 99, 101–102, 106. See also Misreport
Deictic gestures, 120, 224, 245, 249. See also Directionality of deictic gestures
Deixis, 93, 97, 98, 154, 155, 167, 243, 244, 247
Demonstration, 9, 13, 22, 71, 99, 147, 157, 159, 164, 166, 168, 176, 191, 192, 200, 202, 209, 223
Depicted, 183, 189, 190, 193, 214, 217, 225–227, 229, 230, 296, 298, 300, 305, 313
Depicting, 53–55, 103, 185, 192, 193, 244, 277, 294, 296, 298, 299
Depicting space, 178
Depiction, 52, 54, 61, 63, 72, 190, 223, 296, 299, 308, 313
Dialectic, 4–6, 8, 67, 72, 79, 86, 87, 100, 105, 107, 109, 110, 116, 117, 213, 269. See also Language–imagery dialectic
Directed signs, 179, 181
Directional signs, 182, 183, 191
Directionality of deictic gestures, 171, 183, 191, 219, 244, 245, 247, 249, 250
Direction-giving, 305–306, 313–314, 318
Directions, see Direction-giving
Discourse context, 6, 8, 72, 80, 99, 247, 269, 274, 279, 281, 300
Discourse focus, 269, 275–277, 279, 281
Disfluent aphasia, 269, 275, 279, 281
Displaced reference, 37
Dyadic conversation, 92, 270
Dyadic descriptions, 215, 217, 219
Dyadic discourse, 245, 249
Dyadic perspective, 133
Dynamic dimension (of language), 3–5, 7, 8, 67, 71, 73, 101, 105, 109–110, 111, 112, 114–116, 117, 151, 199
Dynamic view, 4, 109, 110, 111, 115, 116
ECA, see Embodied conversational agent
Ecology, 199
Emblem(s), 52, 83, 87, 120, 171, 311
Embodied conversational agent (ECA), 10, 303, 305, 307, 308, 311, 315–317, 320
Embodiment, 5, 170, 172, 203, 205
Emergence, 6, 13, 14, 17, 18, 63, 150, 286
Emergent, 103, 150, 304, 306
Emergent cognitive unfolding, 147, 172
Emergent structure, 4, 8, 101
Emotional contagion, 137–139
Empathy, 138, 139, 245
Enactment, 10, 53, 54, 61, 63, 253, 255, 260–262, 285, 286, 288–295, 298–300
English, 15, 22, 27, 33, 38, 39, 52, 54, 61, 62, 67, 69–72, 80, 91, 93–95, 98, 120–122, 137, 215–217, 219, 248, 269, 273–281
Environment, 7, 9, 17, 43, 61, 91–95, 125, 141, 149, 150, 151, 154, 175, 176, 195, 197, 198–210
Environmentally coupled gestures, 9, 195, 198, 200, 201, 203–206, 208–210
Evaluate, 24, 52, 316, 317, 319
Evaluation, 87, 243, 259, 319
Expertise, 148, 150, 210, 307
Extralinguistic reference, 288
Eye contact, 131–133, 136–139, 190, 192. See also Gaze
Eye gaze, see Gaze
Face-to-face dialogue, 9, 127–129, 134–136, 140–142
Facial displays, 128, 129, 134, 135
First language, 11, 71, 118–122, 128, 178–183, 189, 192, 193
Foregrounding, 109, 110, 112–116
Formulator, 68
Free imagery hypothesis, 68, 70, 72
Gaze, 10, 51, 54–57, 112–114, 128, 136, 165, 177, 178, 181, 183, 185–190, 192, 202, 203, 209, 303, 307, 309, 315
Gestural, 8, 9, 14, 15, 17–19, 36–38, 51–53, 55, 62, 63, 69–72, 75, 85–87, 101, 112, 114, 122, 149, 153, 154, 159, 175, 185, 196, 201, 209, 218, 223, 238, 243–245, 276, 279, 285, 295, 300
Gestural imagery, 154
Gestural semiosis, 154
Gestural typology, 159
Gesture space, 6, 79, 91, 92, 95, 97, 98, 112, 203, 209, 213, 229, 247, 286, 296
Gesture types, 53, 159
Gesture-only condition, 229, 232–234
Gestures as a mechanism of change, 8, 43, 44
Gesture’s role in problem-solving, 44
Gestures and speech, 223, 235, 237, 244
Gesture–speech mismatch, 41–43, 99–102
Goodglass, H., 269, 271, 273, 274, 280
Gradience, 8, 52, 53, 175
Gradient, 52, 53, 175
Greco-Roman, 15
Grounding, 114, 318
Growth point, 5, 6–7, 8, 9, 11, 67, 68, 72, 79–80, 83–88, 100, 102, 105, 107, 116, 117, 147, 149–151, 170, 213, 220
Growth point theory, 8, 67, 68, 72, 79, 86
Head nod, 92, 311, 314, 315
Holistic, 4, 40, 64, 269, 315–316
Homesign, 8, 33
Hopper, P., 269, 273, 280, 281
Human–computer interaction, 303, 307, 319, 320
Human-like, 307, 308, 310, 318, 321
Hybrid utterances, 209
Iconic gestures, 51, 67, 68, 71, 72, 222–232, 234, 236
Iconicity, 53
Image, 5, 80, 88, 89, 104, 106, 133, 153, 154, 172, 186, 188, 201, 202, 213, 214–220, 227, 235–238, 286, 301, 314
Imagery, 4, 5, 9, 11, 15, 23, 40, 67–70, 72, 87, 88, 93, 100, 105, 107, 116, 117, 154, 174, 175, 213, 221, 222
Imitation, 44, 132, 133, 137
Implementation, 110, 309, 316–319, 321
Inclusivity, 91–93, 95, 98
Indexicality, 53
Indicating verbs, 174, 175
Individual, 6–9, 43, 52, 91–93, 95, 98, 127, 129–131, 134–137, 139–141, 149, 150, 168, 177, 192, 198, 199, 205, 209, 215, 225, 227, 228, 230, 233, 243, 292, 300, 306
Individual(istic) theories, 127, 133
Individual as unit of analysis, 140–141
Inference, 294
Influence, 3, 6, 7, 17, 21, 67, 80, 81, 118, 128, 141, 214, 216, 255, 263, 264, 286, 317
Information, 20, 31, 39, 40–43, 45, 67, 69–72, 75, 76–78, 80, 81, 86, 88, 99–104, 106, 110, 112, 114, 115, 119, 128, 133, 173, 175, 177, 182, 183, 185, 187, 192, 193, 213, 214, 217–236, 238, 243, 261, 262, 276, 277, 280, 281, 293–296, 303, 305, 311, 313, 320
Inscription, 206–208
Interaction, 7, 9, 10, 18–21, 56, 69, 109, 110, 112, 114, 116, 122, 127, 128, 136, 139–141, 147–151, 170–172, 198, 203, 208, 213, 223, 243, 244, 251–253, 255–266, 307, 310, 311, 313, 315, 318–321
Interactional function, 311
Interactive, 110–115, 129, 149, 150, 151, 154, 157, 172, 245, 250, 263, 286, 287, 316
Interface hypothesis, 67–69, 71, 72
Interlanguage, 119, 120, 122
Interpsychic, 7
Interspeaker influence, 214
Intersubjectivity, 243–245, 247–250
  diachronic, 244
  synchronic, 243–245
Intonation, 20, 21, 52, 53, 112, 309, 315, 316
Intralinguistic reference, 290
Intraspeaker influence, 214
Islands of development, 300
Japanese, 69–71, 73, 79, 81, 120, 215–217, 219, 243, 245, 247
Joint attentional frame, 203
Kinesic(s), 20, 21–22, 120
Korean, 91, 93–95, 98
Krauss, R. M., 68, 137–139, 200, 220, 224
L1, 71, 118–122, 178–183, 189, 192, 193. See also First language; Acquisition, first language
L2, 71, 118–122, 178–189, 192, 193. See also Second language; Acquisition, second language
Language acquisition device, 22
Language origins, 13, 14, 16, 18, 21, 24
Language, resilient properties of, 31, 33–34
Language system, 86, 119, 270
Language use, 8–10, 33, 34, 36, 109–111, 114–116, 127–130, 210, 243, 270, 281, 282, 308
Language–gesture acquisition, 51, 285, 295
Language–imagery dialectic, 213
Levelt, W. J. M., 68–70
Lexical access, 224
Lexical semantics hypothesis, 68, 70, 72
Liddell, S., 9, 52, 53, 63, 173–176, 179, 180, 185
Lists or alternatives, 93, 96
Literal meaning, 171
Map, 8, 198, 206, 207, 213, 318
Mastery, 148, 150, 170, 205
McNeill, D., 3–11, 13, 23, 24, 32, 33, 38–40, 45, 51, 57, 58, 64, 67, 69, 72, 73, 75, 76, 79, 80, 83, 84, 86, 87, 89, 91, 100, 102, 103, 107, 109, 110, 116–119, 121, 122, 142, 148, 151, 174, 175, 195, 203, 209, 213, 214, 216, 221–223, 225, 227–229, 231, 235, 239, 243, 244, 249, 269, 270, 276, 281, 282, 285–287, 294–296, 298, 299, 300, 303, 316, 321
McNeill’s theory, 3, 4, 9, 67, 72, 79, 86, 109, 117, 119, 130, 239
Meaning, 4, 5, 8, 10, 33, 34, 40, 45, 53, 83, 85–87, 89, 91, 92, 100, 103, 107, 109, 114, 115, 119, 121, 141, 142, 149, 170, 171, 173–175, 178, 191, 195, 198, 199, 201, 210, 213, 214, 217, 224, 227, 228, 230, 232–234, 236, 239, 244, 261, 275, 281, 285, 286, 294, 300, 308, 311, 312–314, 320
Meaning construction, 109, 175
  in mesodevelopment, 286
  in microgenesis, 4–6, 300
Mental lexicon, 224
Mental space(s), 63, 100–105, 107, 175–177
Mesodevelopment, 286
Metalanguage, 150
Metaphor, 8, 80, 101, 109–116, 140, 147, 149, 169, 170, 222, 235–239, 247, 249, 250
Microgenesis, 4–6, 300
Micro-social, 9, 127–137, 139–142
Mime, 147, 150, 155, 157, 161, 163, 164
Mimicry, 61, 127, 129–140, 245
Mirror image, 213–215, 219, 220
Misreport, 99–107. See also Deception
Modularist accounts, 269, 272, 273, 281
Monologic talk, 150
Monologic description, 215–216, 217, 219
Morphology, 118, 281
Motion events, 69, 71, 80, 118, 121, 122, 214, 281
Motion trajectory, 216, 217, 219
Motor mimicry, 61, 127, 129–140, 245
Multimodal, 10, 83, 84, 89, 100, 105, 109, 110, 115, 116, 147, 153, 156, 166, 196–199, 269, 270, 274, 281, 282, 300, 304, 305
Multimodal language, 83, 84, 89, 100
Multimodality, 9, 114, 149, 159, 172
Musical notation, 152, 155, 156
Musician, 147–150, 153, 156, 157, 159, 168–170, 172
Mutual contingency of action, 263–264
Mutual orientation, 203, 209
Narrative(s), 10, 34, 36, 69, 75, 77–80, 121, 128, 135, 173, 177, 178, 189, 191, 214, 229, 271, 279–281, 285–289, 295, 300, 303, 320
Negotiation, 203, 214, 251, 261, 264, 266
Nelson, K., 294
Nine-dot problem, 129, 130, 134, 136, 140
Nonverbal, 10, 23, 25, 304, 306
Nonverbal communication, 20–21, 119–120, 142
Nonverbal signals / behavior(s), 44, 92, 128, 131, 222, 316–318, 320
Notation, musical, 152, 155–156
Object(s), 34, 36, 39, 40, 61, 62, 95, 101, 150, 183, 196, 199, 205, 208, 210, 213, 214, 226, 229, 230, 231, 285, 313
Observer viewpoint, 216, 217–218, 219–220, 229–233, 244
Octave balance, 151–153, 171
Ontogenesis of gestures, 285, 295
Paradigm(s), 33, 34, 37, 269
Parallel processes, 138
Parent-child interaction, 251, 255, 261
Participation framework, 195, 202–204, 208–210
Path, 38, 41, 62, 70–72, 85, 112, 121, 122, 178, 215, 260–264, 277–280, 291, 294
Peirce, C., 53, 54
Performance, 10, 17, 99–101, 105, 110, 120, 147–151, 155–159, 161, 163, 167–172, 228, 317, 321
Perspective(s), 3, 4, 7–10, 20, 21, 99–101, 105, 107, 112, 117, 119, 121, 122, 127–130, 133, 142, 149, 205, 209, 216, 217, 220, 243, 244, 247–249, 264, 266, 273, 274, 280, 295, 296
Pointing, 10, 35, 36, 54–58, 63, 92, 101, 154, 158, 167, 180, 185, 196, 197, 201, 225, 228, 233, 234, 246, 247–250, 312, 314
Posture shift, 309, 314, 319
Practice, 15, 148, 159, 196, 198, 199, 202–206, 208–210, 252, 282, 304, 320
Presentational, metapragmatic, 157
Primacy/recency effects, 77
Prior talk, 201
Problem-solving, 44
Professional vision, 204, 205, 208, 210
Propositional function, 310, 317
Psycholinguistics, 4, 11, 23, 140
Quintilianus, 14
Rapport, 138, 139, 309, 313, 320
Real space, 9, 175–179, 181, 183–185, 188–193
Real space blends, 9, 175–177, 184, 188, 191–193
Rearrangement of temporal sequences, 298
Recalled image, 213, 217
Recency effect, 77
Reciprocity, 128, 136, 251
Recursion, 34, 35, 37
Reductionism, 141
Reference, 7, 10, 37, 57, 60, 61, 71, 72, 75, 77, 79–81, 87, 93, 120, 128, 137, 176, 179, 184, 191, 201, 203, 217, 247, 249, 251, 272, 288
Reference maintenance, 75, 79–81. See also Cohesiveness
Relative position, 226–234, 277
Relative size, 226, 229
Repeated reproductions, 286, 287, 296
Representation, 8, 34, 40, 45, 68–69, 70–72, 99–101, 104–107, 118, 150–153, 156, 172–175, 214, 216–218, 220, 228, 235, 270, 275, 282, 286, 301, 304
Representational, 45, 68, 69, 83, 147, 150, 220, 275, 313
Resilient properties of language, see Language, resilient properties of
Robinson Crusoe myth, 140
Roman era, 13
Routinization, 88
Salience, 5, 114, 269, 279
Salient, 38, 114, 306
Satellite-framed language, 38, 80, 121
Scaffolding of temporal sequences, 287
Second language, 9, 71, 72, 118–121, 178–189, 192, 193
  acquisition, 9, 67, 117–122
Segmentation, 39, 40
Selection of information, 75
Semantic categories / domains, 219, 220, 224, 227, 230, 232–234
Semantic features / properties, 68, 223–224, 226–227, 229, 231–232, 235, 238–239
Semiosis, 154
Sign, 8, 9, 13–18, 21, 24, 31–33, 36, 38, 39, 51–63, 78, 85, 86, 159, 173–176, 178–183, 185, 186, 191, 193, 196, 198, 199, 203, 204, 207, 210
Sign language, 8, 9, 14–17, 21, 24, 31–33, 36, 39, 51–53, 56, 62, 63, 173–176, 178, 180, 191
  acquisition, 51
Sign Language of the Netherlands (SLN), 53, 55–57, 59–63
Simulation, 303–305, 307, 308, 311, 313, 316, 317, 321. See also Computational model
Small talk, 309, 318–320
Spanish, 38, 39, 80, 121, 272, 274, 276, 278–281
Spatial conceptualizations, 173, 175, 177, 180, 191
Spatio-temporal layout, 78
Speaker-specific gesture(s), 83, 87, 88
Speech–image advertisement, 237
Spontaneous, 8, 9, 33, 36, 40, 43, 44, 83, 88, 91–93, 111, 136, 174, 175, 229, 235, 236, 239, 274, 291, 313
  images, 239
Story-telling, 37
Stress-salience, 269
String quartet, 9, 147, 148, 150, 151, 159, 168, 170
Subjectivity, 243–245, 247–250
Talmy, L., 38, 70, 80, 118, 121
Teaching, musical, 148–150, 156, 157
Temporally coherent narratives, 10, 285
Temporal sequence(s), 285, 287, 292, 294, 298–300
Theme buoy, see Buoy
Thinking for speaking, 9, 68, 79, 109, 116–118, 121, 122
Tomasello, M., 55, 203, 286, 300
Turkana, 91, 94–98
Turkish, 38, 39, 69–71
TV messages, 235
Underlying representation, 100, 107
Verb-framed language, 38, 80, 121
Verb problem, 269, 270, 272, 274–276, 280, 281
Viewpoint(s), 79, 107, 216–218, 229–234, 243, 244, 248–250
Virtual human, 303–308, 310, 314–318, 320, 321
Vision, 64, 204, 205, 208, 210
Visual imagery, 174, 175
Visuospatial, 5, 87, 213, 222
Vygotsky, 4–5, 6–7, 117, 120, 294
Word order, 34, 35, 37, 39
Wundt, W., 14–16, 19, 222