VOLUME FIFTY-THREE
THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
Series Editor Brian H. Ross Beckman Institute and Department of Psychology University of Illinois at Urbana-Champaign Urbana, Illinois
THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory EDITED BY
BRIAN H. ROSS Beckman Institute and Department of Psychology University of Illinois at Urbana-Champaign Urbana, Illinois
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA 30 Corporate Drive, Suite 400, Burlington, MA 01803, USA 32 Jamestown Road, London, NW1 7BY, UK Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
Copyright © 2010, Elsevier Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting “Obtaining permission to use Elsevier material”.
Notice: No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
ISBN: 978-0-12-380906-3
ISSN: 0079-7421
For information on all Academic Press publications visit our website at elsevierdirect.com
Printed and bound in USA 10 11 12 13 10 9 8 7 6 5 4 3 2 1
CONTENTS

Contributors

1. Adaptive Memory: Evolutionary Constraints on Remembering
James S. Nairne
  1. Introduction: Nature’s Criterion
  2. The Mnemonic Value of Fitness-Relevant Processing
  3. Memory Theory and Nature’s Criterion
  4. Remembering with a Stone-Age Brain
  5. Conclusions
  Acknowledgments
  References

2. Digging into Déjà Vu: Recent Research on Possible Mechanisms
Alan S. Brown and Elizabeth J. Marsh
  1. Introduction
  2. Perceptual Explanation
  3. Implicit Memory Explanation
  4. Physiological Explanation
  5. Reports in Anomalous Individuals
  6. Continuing Issues
  7. Concluding Remarks
  References

3. Spacing and Testing Effects: A Deeply Critical, Lengthy, and At Times Discursive Review of the Literature
Peter F. Delaney, Peter P. J. L. Verkoeijen, and Arie Spirgel
  1. Introduction
  2. A Field Guide to the Spacing Literature: Spotting Impostors
  3. The Failure of Existing Spacing Theories
  4. Extending a Context Plus Study-Phase Retrieval Account of Spacing Effects
  5. The Testing Effect
  6. Spacing and Testing in Educational Contexts
  7. Conclusions
  References

4. How One’s Hook Is Baited Matters for Catching an Analogy
Jeffrey Loewenstein
  1. Introduction
  2. Key Roles for Retrieving Analogies
  3. Underlying Structure and Retrieving Analogies
  4. Facilitating the Retrieval of Analogies at Retrieval Time
  5. Implications
  6. Conclusion
  References

5. Generating Inductive Inferences: Premise Relations and Property Effects
John D. Coley and Nadya Y. Vasilyeva
  1. Introduction
  2. Effects of Premise Relations on Inference Generation
  3. Effects of Property on Inference Generation
  4. Inference Generation: Conclusions and Implications
  Acknowledgments
  References

6. From Uncertainly Exact to Certainly Vague: Epistemic Uncertainty and Approximation in Science and Engineering Problem Solving
Christian D. Schunn
  1. Introduction
  2. Linguistic Pragmatics of Uncertainty and Approximation
  3. Coding Approximation and Uncertainty from Speech
  4. Coding Uncertainty from Gestures
  5. Uncertainty, Approximation, and Expertise
  6. From Uncertainty to Approximation via Spatial Reasoning
  7. Summary and Discussion
  8. Future Directions
  Acknowledgments
  References

7. Event Perception: A Theory and Its Application to Clinical Neuroscience
Jeffrey M. Zacks and Jesse Q. Sargent
  1. Introduction
  2. Event Segmentation Theory
  3. Schizophrenia
  4. Obsessive-Compulsive Disorder
  5. Parkinson’s Disease
  6. Lesions of the Prefrontal Cortex
  7. Aging
  8. Alzheimer’s Disease
  9. Conclusions
  Acknowledgments
  References

8. Two Minds, One Dialog: Coordinating Speaking and Understanding
Susan E. Brennan, Alexia Galati, and Anna K. Kuhlen
  1. Introduction: The Joint Nature of Language Processing
  2. Dialog: Beyond Transcripts
  3. Process Models of Dialog
  4. The Role of Cues in Grounding
  5. Partner-Specific Processing
  6. Neural Bases of Partner-Adapted Processing
  7. Conclusions
  Acknowledgments
  References

9. Retrieving Personal Names, Referring Expressions, and Terms of Address
Zenzi M. Griffin
  1. Introduction
  2. Psychological Research on Personal Name Production
  3. Personal Names and Reference Across Cultures
  4. Direct Address in Spoken Language
  5. Conclusion
  Acknowledgments
  References

Subject Index
Contents of Recent Volumes
CONTRIBUTORS

Susan E. Brennan
  Department of Psychology, Stony Brook University, Stony Brook, NY, USA
Alan S. Brown
  Department of Psychology, Southern Methodist University, Dallas, TX, USA
John D. Coley
  Department of Psychology, Northeastern University, Boston, MA, USA
Peter F. Delaney
  Department of Psychology, The University of North Carolina at Greensboro, Greensboro, NC, USA
Alexia Galati
  Department of Psychology, Stony Brook University, Stony Brook, NY, USA
Zenzi M. Griffin
  Department of Psychology, University of Texas at Austin, Austin, TX, USA
Anna K. Kuhlen
  Department of Psychology, Stony Brook University, Stony Brook, NY, USA
Jeffrey Loewenstein
  McCombs School of Business, University of Texas at Austin, TX, USA
Elizabeth J. Marsh
  Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
James S. Nairne
  Department of Psychological Sciences, West Lafayette, IN, USA
Jesse Q. Sargent
  Department of Psychology, Washington University, St Louis, MO, USA
Christian D. Schunn
  LRDC, University of Pittsburgh, Pittsburgh, PA, USA
Arie Spirgel
  Department of Psychology, The University of North Carolina at Greensboro, Greensboro, NC, USA
Nadya Y. Vasilyeva
  Department of Psychology, Northeastern University, Boston, MA, USA
Peter P. J. L. Verkoeijen
  Department of Psychology, Erasmus University Rotterdam, Rotterdam, The Netherlands
Jeffrey M. Zacks
  Department of Psychology, Washington University, St. Louis, MO, USA
CHAPTER ONE
Adaptive Memory: Evolutionary Constraints on Remembering
James S. Nairne

Contents
1. Introduction: Nature’s Criterion
2. The Mnemonic Value of Fitness-Relevant Processing
  2.1. The Survival Processing Paradigm
  2.2. Explaining the Survival Processing Advantage
3. Memory Theory and Nature’s Criterion
  3.1. The Encoding–Retrieval Match
  3.2. Levels of Processing
  3.3. Episodic Future Thought
  3.4. Rational Analysis of Memory
4. Remembering with a Stone-Age Brain
  4.1. Building the Case for Cognitive Adaptations
  4.2. Ancestral Priorities in Survival Processing
  4.3. What Is the Adaptation?
5. Conclusions
Acknowledgments
References
Abstract

Human memory evolved subject to the constraints of nature’s criterion: differential survival and reproduction. Consequently, our capacity to remember and forget is likely tuned to solving fitness-based problems, particularly those prominent in the ancestral environments in which memory evolved. Do the operating characteristics of memory continue to bear the footprint of nature’s criterion? This is ultimately an empirical question, and I review evidence consistent with this claim. In addition, I briefly consider several explanatory assumptions of modern memory theory from the perspective of nature’s criterion. How well-equipped is the toolkit of modern memory theory to deal with a cognitive system shaped by nature’s criterion? Finally, I discuss the inherent difficulties that surround evolutionary accounts of cognition. Given there are no fossilized memory traces, and only incomplete knowledge about ancestral environments, is it possible to develop an adequate evolutionary account of remembering?

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53001-9. © 2010 Elsevier Inc. All rights reserved.
1. Introduction: Nature’s Criterion

Imagine you were given the task of designing a human memory system from scratch. What features would you include and why? As memory’s architect, you would need a criterion, a metric against which you could judge the acceptability of design features. Modern memory theorists use a task-based criterion: People are asked to remember information for a test, such as recall or recognition, and proffered features must help predict or explain performance on the test. Although rarely justified, the choice of criterial task is obviously important. It constrains theory development and colors the theory’s final form. For example, theories of free recall lean heavily on a construct called “temporal context” because free recall requires people to remember information in the absence of explicit cues (e.g., Howard & Kahana, 2002; Raaijmakers & Shiffrin, 1981).

Yet, the capacity to remember and forget did not emerge from the mind of a memory theorist; it evolved through a tinkering process called natural selection (Darwin, 1859; Jacob, 1977). Design through natural selection has its own stringent criterion: Structural features, once they arise, are maintained if they enhance fitness, that is, survival en route to differential reproduction. If the capacity to remember failed to confer a fitness advantage, modern brains would likely lack a tendency to reference the past. Memory systems need to be adaptive or, at least, they needed to have been adaptive at some point in our evolutionary past (e.g., Symons, 1992). In building a memory system from scratch, then, the lesson of evolutionary biology is clear: pay heed to nature’s criterion.

In this chapter, I consider how nature’s criterion potentially shaped the human capacity to remember and forget. In Section 2, I review recent empirical evidence indicating that our memory systems may be specially tuned to remember information that is processed for fitness. Memory evolved because it helped us survive and reproduce and, not surprisingly, it shows sensitivity to fitness-relevant processing in these domains. In Section 3, I scan the landscape of modern memory theory through the lens of nature’s criterion. How well-equipped are the postulates, principles, and perspectives of modern memory theory to deal with a cognitive system shaped by nature’s criterion? Finally, I discuss strategies for developing an evolutionary account of remembering. There are no fossilized memory traces, our knowledge about the heritability of ancestral memory “traits” is limited, nor can we pinpoint the exact environments in which selection took place. Can we ever hope to develop a truly evolutionary account of remembering?
2. The Mnemonic Value of Fitness-Relevant Processing

If memory evolved, crafted by the forces of natural selection, then its operating characteristics likely bear some imprint of ancestral selection pressures (Klein, Cosmides, Tooby, & Chance, 2002; Nairne & Pandeirada, 2008a). This is true throughout the physical body, where the footprints of nature’s criterion are easily observed. Each of the body’s major organs plays a crucial role in helping us survive and reproduce and, typically, the fit between form and function is tight. Remnants of the original adaptive problem, or selection pressure, are readily gleaned from the organ’s architecture. The function of the heart is to pump blood and its physical structure reflects that end; the eye’s function is to transduce electromagnetic energy, and retinal cells are uniquely tuned to this task. The body also divides its labor into component parts, each designed to accomplish a particular goal (pumping and filtering blood, collecting and processing oxygen, and so on). Only adaptive problems (problems directly applicable to survival or reproductive fitness) can engage the design tools of natural selection, so specificity abounds.

The central thesis of evolutionary psychology is that the architecture of the mind (our cognitive processes) shows similar specificity (Tooby & Cosmides, 1992). Particular selection pressures, or adaptive problems, fueled the development of human memory systems; consequently, the proximate mechanisms that enable us to remember and forget are likely tuned to solving such problems, particularly those prominent in the ancestral environments in which memory evolved. Although we can never fully know ancestral environments, it is reasonable to suppose that our ancestors faced recurrent adaptive problems, ones that remained relatively constant across situations. Table 1 lists some potential candidates that apply specifically to remembering (from Nairne & Pandeirada, 2008a). Note that each entry potentially relates to fitness, either through affecting the likelihood of survival, protecting kin, or increasing the chances of successful reproduction.

Human memory researchers rarely investigate the specific functional problems listed in Table 1, but relevant data do exist. For example, we have long known that fitness-relevant events can produce salient long-term retention. One compelling example is flashbulb memories, which track the retention of significant life events (Brown & Kulik, 1977; for a recent review, see Luminet & Curci, 2009). Both children and adults report strong and vivid memories for highly emotional events, such as situations in which their lives were in danger, although such memories tend to be reconstructive (e.g., Buss, 2005; Winograd & Neisser, 1992). Additional evidence comes from the study of cultural transmission: What kinds of information are most likely transferred from person to person and across generations?
Table 1. Potential Candidates for Domain-Specific Mnemonic Processes.

Type of fitness-relevant selection pressure | Examples of potential mnemonic targets relevant to each type of selection pressure
Survival-related events | Food (edible vs. inedible), water, shelter, medicinal plants, predators, prey
Navigation | Landmarks, constellations, weather patterns
Reproduction | Physical and/or social characteristics of potential mating partners and/or rivals
Social exchange | Altruistic acts, reciprocation, violation of social contracts, social status or hierarchy
Kin | Physical features and social actions of kin versus nonkin

Note: For each category, our memory systems might be tuned to remember the examples on the right; for example, remembering the locations of edible food, medicinal plants, the meaning of weather patterns, family members, and altruistic acts.
Not surprisingly, fitness-relevant information, such as information about social interactions or heroic exploits, tends to transmit easily and effectively (Mesoudi & Whiten, 2008; Rubin, 1995). In the laboratory, studies have consistently found that people can easily associate fitness-relevant stimuli, such as snakes and spiders, with aversive events (Öhman & Mineka, 2001). So-called “taboo” words, which are often sexual in nature, are also remembered particularly well and may induce prioritized “binding processes” between items and their context (Guillet & Arndt, 2009; Schmidt & Saari, 2007). In the word recognition literature, people are faster at recognizing words that rate highly along a “usefulness to survival” dimension relative to matched controls (e.g., Wurm, 2007). There is also evidence for a kin-related bias in autobiographical memory: Unpleasant events resulting from social interactions with kin are remembered as having occurred farther back in time than similar interactions with nonkin (Lu & Chang, 2009). People are also particularly good at attributing statements about the violation of social contracts to faces in a source attribution paradigm (Buchner, Bell, Mehl, & Musch, 2009). Not surprisingly, people also tend to remember attractive faces better than average-looking faces, although the effect is larger for female than male faces (see Kenrick, Delton, Robertson, Becker, & Neuberg, 2007).
2.1. The Survival Processing Paradigm

As the studies just described illustrate, one can attempt to identify fitness-relevant events or situations and assess their mnemonic power. Collectively, the evidence suggests that our memory systems effectively retain
information pertinent to situations such as those listed in Table 1. However, studies of this sort suffer from an inherent methodological problem because comparisons are typically made across different items, for example, taboo words and nontaboo words. One can attempt to equate the stimuli on other relevant dimensions, but item-selection effects are always a lingering concern. One can never be completely certain that the stimuli differ only along the particular dimension of interest.

Our laboratory has taken a different approach. Rather than comparing retention across item-type (fitness-relevant or not), participants in our experiments are asked to remember the same information (usually unrelated words). What differs across conditions is how those items are processed prior to a subsequent memory test, that is, either in terms of fitness-relevance or not. Table 2 lists the typical survival processing scenario we have used along with two relevant control conditions (Nairne, Thompson, & Pandeirada, 2007). Words are presented individually and people respond by producing a rating: for example, would this item be relevant if stranded in the grasslands of a foreign land without any survival materials? Surprise recall or recognition performance for the survival rating condition is then compared to performance in the “control” conditions, which also require meaningful, or “deep,” processing (Craik & Tulving, 1975).

Table 2. Scenarios Used in Nairne et al. (2007).

Survival | In this task we would like you to imagine that you are stranded in the grasslands of a foreign land, without any basic survival materials. Over the next few months, you will need to find steady supplies of food and water and protect yourself from predators. We are going to show you a list of words, and we would like you to rate how relevant each of these words would be for you in this survival situation. Some of the words may be relevant and others may not; it is up to you to decide.
Moving | In this task we would like you to imagine that you are planning to move to a new home in a foreign land. Over the next few months, you will need to locate and purchase a new home and transport your belongings. We are going to show you a list of words, and we would like you to rate how relevant each of these words would be for you in accomplishing this task. Some of the words may be relevant and others may not; it is up to you to decide.
Pleasantness | In this task, we are going to show you a list of words, and we would like you to rate the pleasantness of each word. Some of the words may be pleasant and others may not; it is up to you to decide.
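The dependent measure in this paradigm, proportion correct recall, can be illustrated with a minimal scoring sketch. It is not the scoring procedure used in the studies reviewed here; the function name `score_recall`, the normalization rules (case-insensitive matching, repeated responses counted once), and the example words are all assumptions made for illustration.

```python
def score_recall(studied, recalled):
    """Score one free-recall protocol against the study list (illustrative only)."""
    studied_set = {w.strip().lower() for w in studied}
    if not studied_set:
        raise ValueError("study list must contain at least one word")

    # Normalize responses and drop repeats so each word is counted once.
    responses = []
    for w in recalled:
        w = w.strip().lower()
        if w and w not in responses:
            responses.append(w)

    hits = [w for w in responses if w in studied_set]
    # Responses that were never on the study list count as nonlist intrusions.
    intrusions = [w for w in responses if w not in studied_set]
    return {
        "proportion_correct": len(hits) / len(studied_set),
        "intrusions": intrusions,
    }


# Hypothetical study list and recall protocol, not data from any experiment.
study_list = ["truck", "mountain", "book", "shoe"]
protocol = ["Truck", "book", "river", "book"]
result = score_recall(study_list, protocol)
print(result["proportion_correct"], result["intrusions"])
```

Under these assumptions, a condition mean would simply be the average of `proportion_correct` across the participants assigned to that rating task.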
Figure 1. Proportion correct recall for words rated for their relevance to a survival scenario, a scenario involving moving, or for pleasantness (data adapted from Nairne et al., 2007). [Bar chart; y-axis: proportion correct recall; conditions: Survival, Moving, Pleasantness.]
Figure 1 shows the standard finding: Survival processing enhances retention relative to other forms of meaningful processing. In this particular example, survival processing produces better retention than processing for pleasantness, a condition known to be a highly effective form of deep processing (Packman & Battig, 1978). The “moving” condition is included as a schematic or thematic control. One might argue that survival processing is effective simply because it forces people to encode information into a rich and coherent schema, one that is particularly salient and accessible at retrieval. In fact, both survival processing and the moving control do tend to produce more nonlist intrusions in recall compared to pleasantness processing, suggesting that some kind of schematic processing may be involved, but survival processing still produces the best retention (Nairne et al., 2007).

The survival processing effect has been replicated a number of times in our laboratory and in other laboratories as well (Kang, McDermott, & Cohen, 2008; Weinstein, Bugg, & Roediger, 2008). The effect occurs in both within- and between-subject designs, when either recall or recognition is used as the retention measure, and when pictures instead of words are used as the to-be-remembered stimuli (Otgaar, Smeets, & Van Bergen, 2010). Perhaps most impressively, a few seconds of survival processing produces better long-term recall than a veritable “who’s who” of classic encoding manipulations. Nairne, Pandeirada, and Thompson (2008) used a between-group design to compare the effects of survival processing against
Figure 2. Proportion correct recall for words rated for their relevance to a survival scenario along with recall proportions for a host of other recognized encoding techniques (data adapted from Nairne et al., 2008). [Bar chart; y-axis: proportion correct recall; conditions: Survival, Pleasantness, Imagery, Self-reference, Generation, Intentional.]
forming visual images, self-reference (relating the item to a personal experience), generating an item from an anagram, and intentional learning. Each of these comparison conditions is widely recognized to enhance retention— in fact, these are the encoding manipulations typically championed in human memory textbooks—yet survival processing produced the best retention. The relevant data are shown in Figure 2. Once again, everyone in these experiments is asked to remember exactly the same stimuli, so survival advantages cannot be attributed to the inherent qualities of the to-be-remembered items. Rather, it is the nature of the processing that produces the enhancement. Inducing participants to process information in a survival ‘‘mode’’ leads to effective long-term retention, regardless of whether information is rated as relevant to survival or not (see Nairne et al., 2007). This last result may seem surprising—one might have expected that only survival-relevant stimuli would be remembered well. In fact, participants usually are more likely to remember items given a high survival relevance rating (see Butler, Kang, & Roediger, 2009; Nairne et al., 2007), but such comparisons suffer from the item-selection concerns noted
earlier. In addition, the fit, or congruence, between to-be-remembered material and the encoding context is an important determinant of retention as well (Craik & Tulving, 1975; Schulman, 1974). Items deemed highly relevant to survival could be remembered better simply because they are more congruent with the survival-based encoding scenario. This makes comparisons between survival relevant and irrelevant stimuli difficult in the survival processing paradigm. The fact that survival processing enhances retention even for items that are seemingly unrelated to fitness provides another indication of its mnemonic power. Any stimulus bathed in the spotlight of survival processing seems to receive some kind of mnemonic boost. Of course, in natural settings it will be the fitness-relevant stimuli that typically receive the spotlight of processing attention—irrelevant events, unlike in the laboratory, will either be ignored or processed with less vigor. At the same time, importantly, fitness-relevance is not an inherent property of most stimuli; instead, fitness-relevance is context-dependent. As Nairne and Pandeirada (2008a) put it: ‘‘food is survival relevant, but more so at the beginning of a meal than at its completion; a fur coat has high s-value at the North Pole, but low at the Equator’’ (p. 240). Even mundane stimuli, such as a pencil, can become quite fitness-relevant under the right circumstances (e.g., a pencil can be used as a weapon in an attack). For this reason we have suggested that survival processing may be the key to long-term enhancement, although stimuli that are naturally fitness-relevant (at least most of the time) might show better retention as well. As noted earlier, words rated as useful to survival are recognized faster and more accurately in a lexical decision task than are matched control words (e.g., Wurm, 2007).
2.2. Explaining the Survival Processing Advantage

Still, the fact that survival processing yields particularly good retention does not tell us much about the proximate mechanisms that produce the advantage. The survival advantage is an a priori prediction of an evolutionary analysis, but standard memory principles might explain it. For example, survival processing could simply lead to greater emotional arousal than control conditions, boosting later recall of information encoded in such a context (see Nairne et al., 2007; Weinstein et al., 2008). Such an account would be consistent with an evolutionary locus; that is, nature solved the adaptive problem of remembering fitness-relevant information indirectly by linking memory to emotional arousal (e.g., McGaugh, 2003, 2006).

2.2.1. Emotional Processing

However, there does not appear to be any simple link between memory and emotional processing; the relevant literature is filled with complex and conflicting findings. Increased arousal does not always lead to enhanced
retention and may, in fact, reduce retention in some circumstances (Kensinger, Garoff-Eaton, & Schacter, 2007; LaBar & Cabeza, 2006). In addition, if emotional arousal mediates the survival advantage, the size of the effect should depend on the emotional rating or valence of the processed stimuli. Otgaar et al. (2010) obtained separate measures of arousal and valence for pictures and assessed recall after people rated the pictures for survival relevance, moving to a foreign land, or pleasantness. Both arousal and valence affected recall performance overall, but failed to interact with the size of the survival recall advantage. Nairne et al. (2007) performed a similar analysis with word stimuli and also failed to find any relationship between emotionality rating and the size of the survival processing advantage. Research on memory for emotional words often shows design effects as well—that is, retention advantages for emotional words are confined to mixed designs in which both emotional and neutral words are contained in the same list (e.g., Schmidt & Saari, 2007). Such a pattern suggests that emotional words tend to be remembered well only when they ‘‘stand out’’ or are distinctive relative to neutral words presented in the same context. As noted earlier, the survival processing effect remains highly robust in both within- and between-subject designs. In fact, we have directly compared survival processing in within- and between-subject designs—for example, survival and pleasantness processing occurred either randomly intermixed in the same list or in different lists—and the size of the survival advantage in recall remains essentially the same in both designs. This last finding—that survival processing advantages do not show design effects—also helps distinguish the effect of survival processing from many other standard findings in the memory literature. 
For example, the generation effect (generated items are remembered better than read items), the effect of bizarre imagery (forming a bizarre image of an item produces better memory than a common image), the enactment effect (subject-performed actions are remembered better than experimenter-performed actions), and the perceptual interference effect (perceptually masked words are remembered better than unmasked words) all show strong design effects, at least when free recall is used as the retention measure. Each effect is typically stronger in a within-subject design and may even fail to materialize in a between-subject design (for other examples, see McDaniel & Bugg, 2008). Again, survival processing shows no such sensitivity.

2.2.2. Thematic Processing

One could also argue that survival processing is effective simply because the rated information is processed in a rich thematic context. Thematic processing affords a number of mnemonic benefits, including enhanced relational processing, that are absent or minimized in item-based processing tasks of the type compared in Figure 2. In our original work we attempted to counter this interpretation by comparing survival to another thematic scenario: moving to
a foreign land. Although we matched the moving and survival scenarios as closely as possible, one could still argue that thinking about survival is inherently more arousing, interesting, or novel than moving. Since our original report (Nairne et al., 2007), we have replicated the survival benefit using a number of alternative thematic scenarios. For example, we have compared survival to scenarios in which (a) people are asked to imagine themselves vacationing at a fancy resort with all of their needs taken care of, (b) eating dinner at a restaurant, and (c) planning a charity event with animals at the local zoo (Nairne & Pandeirada, 2007; Nairne et al., 2007, 2008)—in each case, a survival processing advantage was found. Our survival scenario also produces better memory than one involving the planning and execution of a bank heist (Kang et al., 2008). In this case, the bank heist scenario was chosen because Kang et al. felt our original moving condition was somewhat mundane, lacking the novelty and excitement of the survival scenario. Nairne and Pandeirada (2008b) also found robust survival processing advantages when people rated words in categorized lists. We reasoned that survival processing might induce people to encode unrelated words into an ‘‘ad hoc’’ category representing ‘‘things that occur in a survival situation.’’ Once primed by the rating task, the ad hoc category could then provide an efficient retrieval structure relative to item-based tasks such as pleasantness processing. However, such an account predicts that if the to-be-rated words are inherently related (i.e., the list is categorized), then any relational processing induced by the survival rating task should be less beneficial to retention. 
Many studies have shown that relational processing of items in a related list, such as sorting items from an obviously categorized list into categories, yields few mnemonic advantages compared to identical processing of words in an unrelated list (e.g., Hunt & Einstein, 1981). In fact, encoding procedures that focus on the item itself, such as rating the item for pleasantness, produce the best recall when a list is categorized. Nairne and Pandeirada (2008b) found that survival processing continued to produce better recall than pleasantness processing, even when the lists were categorized and the items were drawn from survival-relevant categories. Perhaps the most convincing evidence against a thematic or relational processing account, though, comes from a recent study using more focused survival scenarios (Nairne, Pandeirada, Gregory, & Van Arsdall, 2009). Evolutionary psychologists often argue that extant cognitive processes evolved primarily during the Pleistocene when our species survived largely as foragers or hunter-gatherers. We continue to house a ‘‘stone-age mind,’’ one filled with adaptations uniquely designed to handle problems relevant to early hunter-gatherer environments (e.g., Tooby & Cosmides, 2005). With this in mind, we developed scenarios to tap prototypical ‘‘hunting’’ and ‘‘gathering’’ activities. In the hunter scenario, people were asked to imagine themselves living in the grasslands as part of a small group; their task was to contribute needed meat to the tribe by hunting big game, trapping small
Adaptive Memory: Evolutionary Constraints on Remembering
animals, or fishing in a nearby lake. In the gathering condition, the task was to gather food for the tribe by scavenging for edible fruits, nuts, or vegetables. In line with our earlier work, participants were asked to rate the relevance of random words to these activities prior to a surprise memory test. Of main interest are the two control conditions. In the gathering control condition, participants were asked to rate the relevance of words to a task involving searching for and locating food items, but under the guise of a nonfitness-based scavenger hunt. The hunter scenario was compared to a matched control in which participants rated the relevance of words to participating in a hunting contest. Importantly, both control scenarios required people to imagine tracking and hunting for food—the same activities required in the survival scenarios—but only in the survival versions were the activities fitness-relevant (necessary for continued survival). As shown in Figure 3, significantly better recall performance was found when the scenarios induced people to process information in a survival mode (Nairne et al., 2009).
2.2.3. Special Adaptation?
Does processing information in a survival ‘‘mode’’ engage special mnemonic machinery—perhaps some kind of targeted adaptation uniquely sculpted by the processes of natural selection? I address this possibility in more detail later in the chapter, but some clear conclusions are possible at this point. First, at a
Figure 3 (y-axis: proportion correct recall; conditions: Gatherer, Scavenger, Hunter, Hunting contest) The left-hand side shows proportion correct recall for words rated with respect to a ‘‘gathering’’ scenario, which was fitness-relevant, and a matched ‘‘scavenger hunt’’ scenario, which was not. The right-hand side shows data for the ‘‘hunting’’ and matched ‘‘hunting contest’’ scenario (data adapted from Nairne et al., 2009).
James S. Nairne
purely empirical level, survival processing produces excellent retention— better, in fact, than virtually all known encoding techniques. For example, as just noted, survival processing produces better retention than pleasantness processing in a categorized list—the latter is generally considered to be the ‘‘gold standard’’ against which effective encoding techniques are compared (see Hunt & McDaniel, 1993). From the perspective of nature’s criterion, this is the anticipated result—memory needs to be adaptive, particularly with respect to the maintenance and use of information related to fitness. Second, as the experiments discussed in this section illustrate, it is unlikely that domain-general factors, such as interest, novelty, emotional or thematic processing, will easily account for the retention advantages found after survival processing. ‘‘Standard’’ memory processes may yet explain the advantage, but the proximate mechanisms involved remain unknown. In the next section, I consider some of the standard explanatory mechanisms used by memory theorists in more detail, but viewed from the unique perspective of nature’s criterion. To preview, most memory theorists rely on general purpose processes to explain retention, ones that fail to consider either nature’s criterion or any specific purposeful end. It is widely accepted that our sensory systems evolved to solve a set of highly specified problems—for example, detecting edges, extracting wavelength information, maintaining shape constancy—but little is known about the comparable problems that drive our capacity to remember. Instead, researchers focus on explaining retention performance in a few well-specified tasks, such as free recall or recognition, rather than isolating the adaptive problems that memory presumably evolved to solve. Regardless of the proximate mechanisms that actually underlie the advantage, however, survival processing remains an extremely effective encoding technique. 
To maximize retention in both normal and impaired populations, it is critical to develop encoding techniques that are congruent with the natural design of memory systems. Semantic-based processing and self-referential processing have been used for years in clinical settings to improve retention (e.g., Bird, 2001; De Vreese, Neri, Fioravanti, Belloi, & Zanetti, 2001; Mimura et al., 2005), yet a few seconds of survival-based processing produces better free recall than either of these encoding tasks. Thus, understanding the functional problems that drive remembering, and the particular role that fitness-relevant processing contributes to long-term retention, should help to improve retention in a variety of populations and retrieval settings.
3. Memory Theory and Nature’s Criterion
As noted earlier, very few of the topics listed in Table 1 have received much attention in the human memory literature. This is partly due to the emphasis that cognitive psychologists place on understanding tasks, but
there is the propensity to rely on domain-general learning and memory processes as well. Most memory theorists accept that memory evolved, but fail to factor nature’s criterion into their task analyses. The possibility that there are a host of domain-specific memory processes, each uniquely crafted to solve particular fitness-relevant problems, is either ignored or rejected by the community of modern memory researchers (for some exceptions, see Klein, Cosmides, et al., 2002; Paivio, 2007; Sherry & Schacter, 1987). Instead, theorists appeal to a few general constructs or principles to explain how retention varies across situations. I discuss two of the most popular constructs below—encoding specificity and levels of processing—and then consider some recent functionally themed approaches that fit more snugly with nature’s criterion.
3.1. The Encoding–Retrieval Match
One of the most widely used theoretical constructs in memory theory is encoding specificity or, more generally, the principle of the encoding–retrieval match (see Tulving, 1983; Tulving & Thomson, 1973). This principle can be summarized as follows: Conditions present at encoding establish memory records that, in turn, are differentially accessible depending on the retrieval environment. What ultimately determines retention, at least with respect to a particular target event, is the relative match between the encoded record and the retrieval cue(s) in effect. The better the match, or more precisely the extent to which the retrieval cue matches the target better than other possible retrieval candidates, the more likely the target memory will be retrieved (see Jacoby & Craik, 1979; Nairne, 2002). Things remembered best are those with memory records that match or resemble the cues likely to be present in the testing or retrieval environment.
The key element in this principle is equipotentiality: Neither events, processes, nor retrieval cues are assumed to have any special mnemonic properties (see Surprenant & Neath, 2009; Tulving, 1983). What matters is simply the functional match between the encoding and retrieval environments. Consider the picture-superiority effect: One generally finds that pictures are easier to remember than words, but the advantage is dependent on the nature of the retrieval environment (usually recall or recognition). Retrieval environments can be arranged in which words are remembered better than pictures—one merely needs to employ retrieval cues at test that are more diagnostic of previously encoded words than they are of pictures (e.g., using word fragments as cues at retrieval; see Weldon & Roediger, 1987). It is the relationship between the encoding and retrieval conditions that reigns supreme, not the content of information or the manner in which it is processed.
As a result, survival processing must be beneficial because it produces diagnostic memory records—that is, those that are likely to be matched in
later retrieval environments. By itself, of course, this reasoning is circular; additional assumptions are needed to explain why one type of encoding produces more ‘‘matchable’’ retrieval records than another. Historically, memory researchers have appealed to ‘‘elaboration’’ or ‘‘spread of encoding’’ to help solve this problem (e.g., Craik & Tulving, 1975). Effective encoding procedures are those that promote the generation of multiple retrieval cues through the linking of the target item to other information in memory. As the number of linkages—or ‘‘spread’’ of the encoding— increases, the chances that an effective retrieval cue will be encountered later increase as well. But the process itself is domain-general. Retention is controlled by the presence of a diagnostic retrieval cue; environmental factors, rather than information content alone, determine when (or if) an effective retrieval cue will be present. There are no inherent memory ‘‘tunings,’’ only taxonomies relating encoding and retrieval contexts. Viewed through the lens of nature’s criterion, of course, equipotentiality seems unworkable. How could such a system evolve—that is, one that does not discriminate among the adaptive consequences of the processed event? There are simply too many critical problems for the developing human to solve—avoiding predators, locating nourishment, selecting an appropriate mate—to rely on such a general, content-free principle. Again, the engine that drives natural selection and structural change is fitness enhancement. Any evolved system, as a result, likely guarantees that fitness-relevant events receive some processing priority, at least relative to events that are largely fitness-irrelevant. Nature builds physical structures that solve specific problems—livers, hearts, visual systems—not general systems that remain insensitive to content. Moreover, a system that relies merely on the match between encoded records and retrieval environments remembers continuously. 
On-line experiences always yield cues that will match some elements of previous experience, so restrictions are essential for the memory system to function. Decisions need to be made about how to restrict the retrieval cues that are processed, as well as the range of allowable memory records that can be matched. Selection advantages will accrue to memory systems that remember appropriately—that is, to systems that remember information pertinent to improving survival and reproduction. The match between encoding and retrieval environments may be important, perhaps even critical to successful retention, but its role in remembering must ultimately be understood in terms of some larger functional agenda.
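The logic of the relative match, and of elaboration as a way to raise the odds of meeting an effective cue, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chapter gives no formula, so the ratio rule used here, the independence assumption behind `chance_of_effective_cue`, and every match strength and probability are hypothetical choices made for the example, not estimates from any study.

```python
def relative_match(match_strengths, target):
    """Ratio rule (an assumed formalization): the target's match to the cue,
    divided by the summed match of every candidate record to that cue."""
    total = sum(match_strengths.values())
    if total <= 0:
        raise ValueError("at least one candidate must match the cue")
    return match_strengths[target] / total


def chance_of_effective_cue(n_links, p_per_link):
    """Chance that at least one of n links later supplies an effective cue.
    Treating links as independent is an illustrative assumption."""
    return 1.0 - (1.0 - p_per_link) ** n_links


# Hypothetical cue-to-record match strengths (arbitrary units).
# The shared cue matches the target MORE in absolute terms, but it
# matches the competitors just as well, so it is less diagnostic.
distinctive_cue = {"target": 0.8, "competitor_a": 0.1, "competitor_b": 0.1}
shared_cue = {"target": 0.9, "competitor_a": 0.9, "competitor_b": 0.9}

if __name__ == "__main__":
    print("distinctive cue:", relative_match(distinctive_cue, "target"))
    print("shared cue:     ", relative_match(shared_cue, "target"))
    for n_links in (1, 3, 6):
        print(n_links, "links:", chance_of_effective_cue(n_links, 0.2))
```

Note that nothing in either function refers to what the target is about. That content-blindness is the equipotentiality assumption that this section questions.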
3.2. Levels of Processing
A similar argument applies to the popular encoding theory known as the levels of processing framework (Craik & Lockhart, 1972; Craik & Tulving, 1975). According to this view, successful retention depends on the depth of
processing that an item receives, in which ‘‘depth’’ is defined as the extent of meaningful or conceptual processing. Empirically, it is well established that thinking about the meaning of an item produces excellent long-term retention compared to more superficial forms of processing, such as attending to the shape or sound of a verbal item (e.g., Hyde & Jenkins, 1973). However, as with the picture-superiority effect, the advantage of meaningful processing depends on the characteristics of the retrieval environment. One can arrange retrieval environments in which nonmeaningful (shallow) forms of processing lead to comparatively better retention (Stein, 1978; also see Roediger, Gallo, & Geraci, 2002). But for traditional retrieval environments (e.g., free recall or recognition) processing for meaning remains an excellent vehicle for long-term retention. At first glance, the levels of processing framework seems like a domain-specific theory—our memory systems are ‘‘tuned’’ to the processing of meaning. Yet, the theory ultimately subscribes to a kind of equipotentiality as well—meaningful processing ‘‘works’’ only because it promotes the encoding of information into highly organized and differentiated retrieval structures. Craik (2007) has used the analogy of a library:
If a new acquisition is ‘encoded deeply’ it will be shelved precisely in terms of its topic, author, date, etc., and the structure of the library catalog will later enable precise location of the book. If the new book was simply categorized in terms of its surface features (‘blue cover, 8" × 10", weighs about a pound’) it would be stored with many similar items and be difficult or impossible to retrieve later. The ability to process deeply is thus a function of a person’s expertise in some domain—it could be mathematics, French poetry, rock music, wine tasting, tennis, or a multitude of other types of knowledge. (p. 131)
From the standpoint of theory, then, memory is conceptualized in a completely domain-general way. Successful retention depends on fitting to-be-remembered material into rich, established knowledge structures that are easy to access when retrieval is needed. Moreover, it is experience, or expertise, that is the ultimate arbiter of effectiveness. Although recurrent aspects of the environment may lead to common knowledge structures across people, an individual’s unique interests and life experiences build those domains of expertise that afford the best opportunity for excellent long-term retention. As Craik (2007) notes, the levels of processing framework ‘‘postulates no special ‘store’ or ‘faculty’ of memory—or even special memory processes’’ (p. 132). Again, is it reasonable to assume that such a domain-general process is well suited for solving the wide range of mnemonic problems that humans faced throughout their evolutionary history, everything from remembering food locations and predator routes to potential mate choices, cheaters on social contracts, and so on? One might argue that fitness-relevant knowledge
structures, those germane to survival and reproduction, are simply better described than nonfitness-relevant events—that is, more organized and differentiated. However, it is unlikely that these characteristics, if present, developed with experience or expertise. Most people have limited experience with survival situations, particularly those involving predators in the grasslands of a foreign land. More importantly, remembering fitness-relevant information is too important to rely on the whims of environments that may or may not deliver the experiences necessary to build appropriate retrieval structures. Empirically, as reviewed earlier, a few seconds of survival processing leads to enhanced retention relative to traditional ‘‘deep’’ processing tasks, including ones that should activate highly organized and differentiated retrieval structures. For example, Nairne et al. (2008) compared survival processing to a self-reference task. People were asked to make survival relevance ratings about words or to rate the ease with which the word brought an important personal experience to mind. So-called ‘‘self schemas’’ are highly organized and differentiated, and well practiced, yet survival processing produced better retention. Moving and spending time at a restaurant are also well practiced compared to surviving in the grasslands, and should activate highly organized knowledge structures, yet it is survival processing that produces the better memory. Finally, as Nairne et al. (2009) have shown, one can use rating scenarios that trigger exactly the same activities (e.g., hunting or searching for food) but retention depends importantly on whether the activities are deemed fitness-relevant or not. There is little question that depth of processing and the encoding–retrieval match are important to retention; it would be folly to suggest otherwise. 
Decades of research have established that retention is retrieval-cue dependent and improved by encoding techniques that maximize the chances that effective cues will be present when needed (see Tulving & Craik, 2000). However, to suggest that these two principles are sufficient to capture the essential properties of memory’s evolved architecture is nonsense. The idea that our memory systems are insensitive to content—that neither events, processes, nor retrieval cues are ‘‘special’’—ignores the specificity and defining characteristics of nature’s criterion.
3.3. Episodic Future Thought
Although the majority of memory researchers remain focused on understanding specific retrieval environments, invoking general constructs such as encoding specificity or levels of processing to explain retention, more functionally oriented perspectives do exist. One relatively recent idea is that our memory systems are fundamentally prospective—that is, oriented toward the future rather than the past (Schacter & Addis, 2007; Szpunar & McDermott, 2008). Of course, from the perspective of nature’s criterion
such a conclusion must be true. It is the ability to use the past, in combination with the present, that produces adaptive behavior (Nairne & Pandeirada, 2008a; Suddendorf & Corballis, 2007). The core idea behind the emerging concept of episodic future thought is adaptive simulation (Atance & O’Neill, 2001). People possess the unique ability to imagine, or pre-experience, events that may happen in the future, thereby enabling them to cope more effectively with future events. To consider an obvious case, the nervous teenager anticipating his first date actively envisions scenarios (what his partner might do or say) and practices witty retorts. One can also re-create scenarios from the personal past—for example, a botched job interview—and cast alternative versions of events in the hope of performing more effectively in the future. Mental simulation is widely practiced across domains because of its adaptive value. Golfers, for example, often mentally picture the trajectory of a shot before addressing the ball. It has been suggested that one of the primary functions of episodic memory, one reason why the system might have evolved, is to provide the key elements or building blocks from which future thoughts can be constructed (Schacter & Addis, 2007). Indeed, evidence from a variety of sources—neuropsychological, neuroimaging, and behavioral data—indicates a close relationship between episodic retrieval and future thought simulation. For example, individuals who have lost the capacity to remember personal episodes from the past have trouble imagining personal events in the future (Klein, Loftus, & Kihlstrom, 2002; Tulving, 2002). Neuroimaging studies suggest that a common core brain network may be engaged during both episodic remembering and episodic future thought (Buckner & Carroll, 2007; Szpunar, Watson, & McDermott, 2007). 
Behaviorally, the ability of normal people to imagine vivid and detailed future scenarios depends on the availability of relevant past episodes (Szpunar & McDermott, 2008). If our memory systems truly evolved to anticipate and plan for the future, then processing information in a future-oriented ‘‘planning mode’’ might produce particularly good retention. Evidence consistent with this idea has been reported recently by Klein, Robertson, and Delton (2010). People were asked to make ratings about objects in the context of a camping trip in the woods. In one condition, focused on the past, people were asked to rate the likelihood that particular objects had been taken on a past camping trip; in a second atemporal condition, people were asked simply to imagine a camping site and to rate the chances that objects were contained in the image; in the future-oriented planning condition, people were asked to rate the likelihood that they would plan to take a particular object with them on a future trip. A surprise recall test revealed that future-oriented planning produced the best retention—even better, in fact, than yet another condition in which people were asked to rate the survival value of each of the objects.
Again, to satisfy nature’s criterion, cognitive systems must be adaptive— that is, they need to produce behavior that directly or indirectly increases survivability and reproduction. A system designed merely to remember the past could not have easily evolved. The past can never occur again, at least in exactly the same form, so memory systems gain their adaptive edge by improving future responding (Suddendorf & Corballis, 1997). At the same time, a memory system that is designed simply to simulate the future falls short of nature’s criterion as well—the concept is too general. If the ability to construct future scenarios is an evolved characteristic, it arose because it ultimately enhanced fitness. Consequently, as with survival processing, we might expect the imprint of nature’s criterion to be observable in the operating characteristics of episodic future thought. For example, we might anticipate that people will simulate future events more effectively when those events are relevant to fitness than when they are not. At this point, though, the mark of nature’s criterion on episodic future thought remains to be investigated. Interestingly, the survival processing paradigm can be conceived as one that induces episodic future thought. People are asked to imagine a grasslands scenario and then to rate the relevance of events to surviving in such a context. It is easy to imagine that people in these experiments are actively simulating the scenario and anticipating how the presented events apply (or not). Memory is enhanced relative to conditions in which events receive only item-based processing, such as rating for pleasantness or forming a visual image—but also to simulated scenarios that involve activities that are not fitness-relevant (such as moving to a foreign land or participating in a hunting contest). 
However, one difference between a simulated survival scenario and ‘‘planning’’ a specific future event, such as a camping trip, is that the survival scenario is more likely to rely on generic knowledge than on personally relevant episodes. Few, if any, college-age participants have a background in grasslands-based survival situations, so it is unlikely that a survival simulation is constructed from personally relevant episodes (Klein, Loftus, et al., 2002; Szpunar, 2010). Unraveling the connections between the building blocks of episodic future thought and their evolutionary roots should prove to be a productive avenue for future research.
3.4. Rational Analysis of Memory
Another functional perspective on retention proposes that our memory systems evolved, in part, to reflect the statistical regularities of events in the environment. These ‘‘rational’’ models of memory adopt a Bayesian framework, assuming that one important function of memory is to calculate the conditional probabilities associated with event occurrence. There is presumably a cost to remembering, so it is adaptive to consider the probabilities that
particular memories will be relevant, and therefore needed, in a particular environment (Anderson & Milson, 1989; Shiffrin & Steyvers, 1997). In fact, our retention functions do seem to track the way events actually occur and recur in the environment. Forgetting functions are negatively accelerated, meaning that most of the retention loss occurs early in the function and slows thereafter. It turns out that the statistical properties of event occurrence follow essentially the same form. For example, Anderson and Schooler (1991) assessed the probability that a particular word would appear in the headlines of the New York Times as a function of the number of days that passed from an initial occurrence. So, if the phrase ‘‘Cap and Trade’’ appears in the headlines today, there is a relatively good chance that the same phrase will appear tomorrow. But the odds fall off with each successive day in a form that mimics the classic forgetting function. Anderson and Schooler’s (1991) results suggest that ‘‘forgetting’’ is simply an optimal reflection of the way events actually occur and recur in the environment. We are less likely to remember a specific occurrence with time, but that is because the event is less likely to occur again and be needed (for other supporting applications, see Anderson & Schooler, 2000). The rational approach successfully captures the idea that our cognitive systems are inherently constrained by nature—we think and remember in particular ways in order to optimize expected utilities (gains vs. costs) in a given situation. Unlike most approaches to human memory, the rational viewpoint is also functional; it assumes that memory systems are purposeful and crafted to solve specific problems in the environment. However, from the perspective of nature’s criterion, two caveats deserve mention. 
First, evolutionary psychologists generally believe that our brains developed to solve adaptive problems prevalent during the so-called environment of evolutionary adaptedness (e.g., Symons, 1992; Tooby & Cosmides, 1992). Thus, although our cognitive systems may be optimally designed, they evolved to solve problems in ancestral environments, particularly those associated with foraging lifestyles. This means that our memory systems may not be optimal in modern environments, and it may be a mistake to assume that they are designed merely to detect statistical regularities in such environments (for a discussion of optimality modeling and evolution, see Gangestad & Simpson, 2007). The second caveat emerges from a recurrent theme of this chapter—the engine that drives structural change through natural selection is the enhancement of fitness. Consequently, it is unlikely that our memory systems evolved simply to reflect the statistical properties of events—in either modern or ancestral environments; instead, the content (or, more specifically, the fitness-relevance) of the information needs to be taken into account. Our memory systems should be optimally designed to reflect the occurrence and recurrence of fitness-relevant information, rather than information in general. In fact, Anderson and Schooler (2000) suggest that it
might be less costly to process or retrieve certain kinds of memories, based on the content or ‘‘importance’’ of the events involved, although they have not pursued the issue empirically.
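The parallel drawn in this section between forgetting and the recurrence of events can be made concrete with a short Python sketch. It assumes, purely for illustration, that both curves follow a power function of elapsed time; the chapter describes the curves only as negatively accelerated, and the `scale` and `exponent` values below are hypothetical rather than fitted to any data set.

```python
def power_law(days, scale=1.0, exponent=0.5):
    """Value of an assumed power function at `days` since the last occurrence.
    It can be read either as retention or as the odds that the item recurs
    (and so is needed again). Parameters are hypothetical."""
    if days <= 0:
        raise ValueError("days must be positive")
    return scale * days ** (-exponent)


def loss_between(start_day, end_day, **params):
    """How much the curve falls between two delays."""
    return power_law(start_day, **params) - power_law(end_day, **params)


if __name__ == "__main__":
    # Negative acceleration: a one-day step costs far more early than late.
    print("loss from day 1 to day 2:  ", loss_between(1, 2))
    print("loss from day 10 to day 11:", loss_between(10, 11))
```

Because the same function is used for both readings, the sketch carries no information about what the remembered event is. Weighting the curve by fitness relevance, as the second caveat proposes, would need an additional content-dependent term that this sketch does not attempt to specify.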
4. Remembering with a Stone-Age Brain
Throughout the chapter, I have developed logical arguments for specially tuned memory systems, those sculpted by the processes of natural selection. For example, memory’s tunings are unlikely to have emerged entirely from experientially based learning mechanisms—on-line experiences often do not deliver the information necessary to respond appropriately. In addition, memory systems could not have evolved to record and remember everything—problems of combinatorial explosion arise quickly so selectivity in storage is required (see Ermer, Cosmides, & Tooby, 2007). Instead, given the severity of nature’s criterion, cognitive systems likely come equipped with ‘‘crib sheets’’ or built-in biases about how to respond rapidly and efficiently to fitness-relevant input (Tooby & Cosmides, 1992). At the same time, there is a difference between recognizing that our memory systems are functionally designed—that is, ‘‘tuned’’ to solve particular kinds of problems—and discovering the ultimate origins of those tunings. Identifying an adaptation, especially a cognitive one, is notoriously difficult. There are no ‘‘fossilized’’ memory traces, and we have only limited knowledge about the ancestral environments in which our memory systems actually evolved (Buller, 2005). Adaptive solutions to recurrent problems can arise indirectly, by piggybacking on adaptations that evolved for different reasons (exaptations), or as a result of natural constraints in the environment (e.g., the physical laws of nature or genetic constraints). The proximate mechanisms that enable us to read and write, for example, could not have evolved directly for those ends even though reading and writing are very adaptive abilities. 
To establish that a given cognitive mechanism, such as a mnemonic tuning, reflects an adaptation—that is, a mechanism arising directly as a consequence of evolution through natural selection—requires satisfying multiple criteria (e.g., Brandon, 1990; Williams, 1966). In principle, one would need to establish that the trait can be inherited, or passed along across generations through differential reproduction. One would also want to show that at some point in our ancestral past there were individual differences among people along the trait dimension, and that certain forms (such as a special memory tuning for fitness-relevant information) were selected because they promoted differential survival and reproduction relative to other forms. Obtaining this kind of evidence is difficult, if not functionally
impossible, for most of the cognitive adaptations of interest to evolutionary psychologists (e.g., see Richardson, 2007). It is also important to recognize that our cognitive systems were not built from scratch—natural selection ‘‘tinkers,’’ which means that changes emerge from preexisting structures. The design of these structures, in turn, introduces constraints that color how the adaptive problems that drive evolution are ultimately solved. Thus, even if we could correctly identify the ancestral selection pressures that drove the development of our memory systems, it would still be difficult to predict how nature solved the relevant adaptive problems. As noted above, the task becomes even more difficult with the recognition that adaptations can be co-opted to solve problems that are ostensibly unrelated to their functional design (Gould & Vrba, 1982). Even worse, some mnemonic phenomena may even be artifacts—so-called ‘‘spandrels’’ or incidental byproducts of other design features. For example, sensory persistence (e.g., iconic memory) may occur simply as a byproduct of the fact that neural responses are extended in time (Haber, 1983; Loftus & Irwin, 1998).
4.1. Building the Case for Cognitive Adaptations
Despite these inherent problems, one can still build compelling arguments in favor of evolutionary loci (see Andrews, Gangestad, & Matthews, 2002). Evolutionary biologists, ethologists, and comparative psychologists have been proposing adaptationist hypotheses for generations, without satisfying the stringent criteria mentioned above (see Bolles & Beecher, 1988; Shettleworth, 1998). Most scholars agree that cognitive adaptations exist in humans—for example, sensory and perceptual systems—although the evolutionary lineage is unavailable or difficult to track even in the most obvious cases. Moreover, the absence of relevant evidence is not sufficient to falsify adaptationist arguments, nor does it mean that nonadaptationist hypotheses are correct (e.g., all memory ‘‘tunings’’ emerge from experience). Both adaptationist and nonadaptationist hypotheses need to be constructed (and judged) on an empirical base. One can infer the existence of an adaptation, generate and test empirical predictions, and systematically rule out alternative explanations (Williams, 1966). As the research reviewed in this chapter illustrates, it is possible to adopt a functional/evolutionary perspective and generate empirically testable hypotheses. One common complaint against evolutionary psychology is the proliferation of ‘‘just-so’’ stories—that is, post-hoc explanations for phenomena that seem apt from an evolutionary perspective but lack relevant empirical grounding (e.g., giraffes have long necks because they could reach more easily for food; Gould & Lewontin, 1979). By themselves, these kinds of accounts have explanatory value, but mainly to the extent that they can be used to generate empirically testable predictions. Recognizing that
James S. Nairne
our memory systems evolved, and were subject to the constraints of nature’s criterion, led to the prediction that processing information for fitness would lead to especially good retention. Because of how labor was divided during early environments of adaptation, Silverman and Eals (1992) generated the clear prediction that women may be better equipped than men to remember the locations of objects set in fixed locales (see also New, Krasnow, Truxaw, & Gaulin, 2007). Similarly, recognizing that males and females differ in their relative amounts of parental investment generates predictions about sex-based mating strategies and parental behavior that, in turn, can be confirmed or disconfirmed empirically (Buss, 2006). It is also possible to attempt comparative analyses, across cultures and species, either to establish the universality of the trait or to demonstrate that it occurs only in environments affording the relevant selection pressures. Given an evolutionary locus, one would presumably expect to find fitness-based retention advantages across species and peoples. This may seem like a trivial prediction but, in fact, an early criticism of our work was that survival processing advantages might have arisen from exposure to culture-specific media, such as the television program Survivor—they do not (see Nairne et al., 2007; Weinstein et al., 2008). At the same time, comparative analyses, by themselves, do not provide unequivocal support for the presence of an adaptation. Universality can arise for many reasons—for example, common experiences or natural constraints across environments—and the presence of a trait across species does not necessarily mean that common adaptations are involved. Comparative analyses can be effective in helping to eliminate alternative hypotheses and may serve as one piece in a larger argument in favor of an adaptationist account. One can also look for tunings or specificity in development. 
For example, many scholars believe that language learning in children is biologically prepared—the capacity for language develops easily and reliably and follows rules that cannot be readily gleaned from everyday experiences (e.g., Pinker, 1994). Moreover, the human ear and vocal tract seem perfectly tailored to meet the needs of speech, and there are specific regions in the brain that control the production and comprehension of spoken language. Many developmental psychologists argue as well that babies are born knowing all kinds of things about the world, everything from an intuitive sense of motion and the physical world to differences between animate and inanimate objects (Bloom, 2005; Gelman, 2003). Babies may also be born with a bias to recognize and remember faces and gender-specific voices, and to fear natural predators such as snakes (DeLoache & LoBue, 2009). Again, none of these data, by themselves, can decisively confirm an adaptationist locus—for example, these abilities could be exaptations, co-opted from other adaptations—but they help to bolster an adaptationist case. It would be interesting to know, for example, whether fitness-based retention effects arise easily in
Adaptive Memory: Evolutionary Constraints on Remembering
children or fitness-relevant processing activates regions in the brain that overlap (or not) with other forms of mnemonic processing. Another criterion that is sometimes used to defend an evolutionary locus is optimality. Proximate mechanisms resulting from evolved adaptations should show a special ability to maximize adaptive behavior. So, one can investigate the operating parameters of remembering and forgetting and establish (usually through some form of quantitative model) that the system rationally maximizes benefits and minimizes costs (e.g., Anderson & Milson, 1989). However, as noted earlier, adaptations, by definition, are rooted in the past; consequently, we should not necessarily expect to detect optimal behavior from an evolved system operating in a modern environment. Instead, at least in principle, we should expect to find ancestral priorities—that is, we should find that the system is tuned to operate most effectively in past environments, particularly environments associated with our foraging past. Evidence of this sort is particularly compelling for adaptationist accounts because it is difficult to see how general learning mechanisms could possibly account for an ancestral priority. In fact, evidence consistent with ancestral priorities exists in several cognitive domains. For example, New, Cosmides, and Tooby (2007) found that people are faster and more accurate at detecting animals, both human and nonhuman, than inanimate objects using the change-detection paradigm, a procedure in which people are asked to detect changes in rapidly alternating images. People were slower at detecting changes in familiar vehicles across images than they were at detecting changes in rarely experienced animal species. In the learning domain, some studies have found that ancestrally relevant fear stimuli, such as snakes and spiders, are easier to associate with aversive stimuli than modern fear-relevant stimuli such as guns and electrical outlets (see Öhman & Mineka, 2001). 
In addition, specific phobias are more apt to develop to ancestral stimuli (e.g., spiders) than to aversive stimuli experienced exclusively in modern environments (e.g., weapons; De Silva, Rachman, & Seligman, 1977). Although not definitive, these data are consistent with the notion that some aspects of cognitive processing may be better tuned to ancestral than to modern priorities.
4.2. Ancestral Priorities in Survival Processing
There is evidence indicating that ancestral priorities may help drive retention performance in the survival processing paradigm as well (Nairne & Pandeirada, 2010; Weinstein et al., 2008). Weinstein et al. asked people to process the relevance of words to a survival situation, but varied whether the scenario described an ancestral or a modern setting. In one condition, using the typical survival scenario (see Table 2), people were asked to imagine themselves stranded in the grasslands of a foreign land without basic survival
materials. Over the next few months, they would need to find steady supplies of food and water and protect themselves from predators. In a second condition, exactly the same scenario was used but two critical words were changed: city was substituted for grasslands and predators was replaced by attackers. Escaping from predators in the grasslands, the authors reasoned, is a closer fit to the problems faced in the environment of evolutionary adaptation; as a result, it should produce better memory than processing in a modern context, even though the latter is arguably more familiar and likely to lead to greater amounts of elaboration. Consistent with their hypothesis, better retention for the rated words was found for the group processing the ancestral scenario. Our laboratory has recently replicated this work and extended it to two new domains—attempting to cure an infection and finding necessary nourishment. In the first case, the survival scenario was once again set either in the grasslands or in a city, and participants were asked to imagine they had been hurt and a dangerous infection might be developing. Participants were instructed to rate the relevance of words to the task of finding ‘‘relevant medicinal plants’’ to cure the infection (ancestral) or finding ‘‘relevant antibiotics’’ (modern). In a second experiment, again employing either a grasslands or a city scenario, people were asked to imagine they had not eaten for several days and needed to ‘‘search for and gather edible plants’’ (ancestral) or ‘‘search for and buy food’’ (modern). In all other respects the scenarios were matched exactly. The rating task was followed by a surprise recall test for the rated words. The main results of interest are shown in Figure 4. In both experiments, people who imagined themselves in an ancestral context remembered more of the rated words than those who imagined themselves in a city. 
Importantly, both of the scenarios depicted survival situations and the adaptive problems involved (curing an infection and finding nourishment) were essentially the same. Moreover, typical for the survival processing paradigm, everyone in both experiments was asked to remember exactly the same stimuli. Despite the fact that the scenarios were very closely matched—differing in only a few words—processing an item in an ancestral survival context led to better retention than processing the same item in a modern survival context. It is tempting to conclude from these data that the ancestral scenarios induced a unique form of survival processing, one congruent with the selection pressures that originally fed the processes of natural selection.
4.3. What Is the Adaptation?
Assuming that mnemonic adaptations exist, and account partly for the fitness-based ‘‘tunings’’ seen in the survival processing paradigm, what form would these adaptations be likely to take? Do we have minds filled
[Figure 4 axes: proportion correct recall (0.40 to 0.60) by condition: medicinal plants, antibiotics, ancestral food, modern food]
Figure 4 Proportion correct recall for the ‘‘ancestral’’ conditions (searching for medicinal or edible plants) and the matched ‘‘modern’’ conditions (searching for antibiotics or shopping for food). Data are from Nairne and Pandeirada (2010).
with highly specialized memory adaptations, each crafted to solve a particular kind of memory problem (e.g., remembering faces, edible plants, or predator types)? Or, did we evolve a few general systems defined more by flexibility than by domain-specificity? Memory researchers sometimes propose multiple memory systems (e.g., Schacter & Tulving, 1994), but those systems are typically defined by the source of information rather than by its content (see Tooby & Cosmides, 2005). For instance, we may have evolved systems for dealing with personal autobiographical events, general knowledge, or perceptual representations, but not for specific situations related to fitness (e.g., predators, food sources, or potential mates). Some neuroscientists have argued for domain-specific knowledge systems in the brain (e.g., Caramazza & Shelton, 1998), but such proposals are rarely considered by mainstream cognitive psychologists. As argued throughout, adaptations develop to solve adaptive problems, those defined by nature’s criterion. Evolutionary psychologists tend to reject content-free architectures because it is difficult to see how such structures could evolve. Structural features evolve because they enhance fitness—so, in the case of memory, our capacity to remember and forget likely developed because our memory systems helped us solve fitness problems of the sort listed in Table 1. Adaptive problems can be solved by general systems, but general systems are rarely engineered by natural selection. For example, consider a retention system based merely on meaning—information that is
processed for meaning is remembered better than information processed along more ‘‘shallow’’ perceptual dimensions (Craik & Lockhart, 1972). One could argue that processing in a survival ‘‘mode’’ induces meaningful processing, and concomitant ‘‘elaborations,’’ and therefore fitness-relevant information would typically be remembered well. However, as noted earlier, failing to differentiate between important and unimportant material (i.e., the assumption of equipotentiality) leads to a host of potential problems (e.g., combinatorial explosion of information). It is more likely that a system evolved to detect and remember fitness-relevant information, a system that could then be co-opted to remember generally. At the same time, we probably did not evolve any simple kind of ‘‘survival module.’’ The concept of survival is too general as well. As Nairne et al. (2007) argued, the retention advantages that accrue from survival processing could easily result from ‘‘multiple modules working in concert—each activated to one degree or another by the survival processing task’’ (p. 270). From an evolutionary perspective, specific processing systems may have developed for dealing with particular foods, predators, potential mating partners, and the like (e.g., see Barrett, 2005). It is probably necessary to differentiate among retention environments as well. For example, most of the work conducted to date on survival processing has used free recall as the retention measure. Free recall requires a search engine, or retrieval process, that accesses stored information using a criterion of recent occurrence. It is an episodic task, one that requires people to recall information that occurred at a specific time, in a specific location, as defined by the experiment. For some kinds of fitness-relevant problems—perhaps remembering the location of a predator or a food source—enhanced episodic retrieval might be especially beneficial. 
However, for other fitness-relevant problems, such as remembering whether someone is a cheater or a potential mate, remembering temporal and spatial information may be less useful. At this point, it is not possible to characterize mnemonic adaptations in any satisfactory fashion. We can use the lessons of evolutionary biology to speculate—for example, adaptations tend to be domain-specific and functionally designed—but logic alone is not a substitute for building a strong empirical case. Again, as the data reviewed in this chapter clearly show, it is possible to generate a priori empirical predictions about the possible functions and evolutionary roots of our memory systems. Future research will need to compare and contrast alternative accounts and ‘‘visions’’ of memory’s evolved architecture. However, regardless of the proximate mechanisms that are ultimately uncovered, it will be important to recognize initially that cognitive systems are functionally designed. Our memory systems are purposeful—they evolved to solve adaptive problems—and memory’s architecture is likely to reflect those functional ends.
5. Conclusions
Theories naturally evolve, based on the criterion of successfully predicting and describing performance on a criterial task. In the case of memory theory, psychologists have relied on an ever-expanding toolkit of memory measures—for example, recall, recognition, fMRI scans—but rarely explain or justify why one task should be preferred over another. Adopting such a ‘‘structuralist’’ mindset means, of course, that our theories tend to be task-based and rarely connected to actual problems (see Nairne, 2005). Which is likely to provide the clearest window into what it means to remember—free recall, recognition, or some other task? Most notably absent from current memory debates, however, is the recognition that nature designed our memory systems with her own criterial task—reproductive fitness. For a memory system to evolve, it must satisfy the constraints of nature’s criterion; it must easily solve the kinds of adaptive problems that engineer change through natural selection (e.g., situations of the type listed in Table 1). Accordingly, one might hypothesize, the imprints—or footprints—of those criterial problems should remain visible in the operating characteristics of memory systems. This is ultimately an empirical question, but recent research suggests that our memory systems may indeed be ‘‘tuned’’ to remember information and events that are relevant to fitness. In fact, as discussed earlier, a few seconds of survival processing produces better free recall performance than a veritable ‘‘who’s who’’ of established memory encoding techniques (Nairne et al., 2008). Recognizing a role for nature’s criterion in the design and function of memory systems has implications for our theoretical conceptions of memory as well. The crux of the functionalist agenda is the recognition that memory is functionally designed (Klein, Cosmides, et al., 2002; Nairne, 2005; Sherry & Schacter, 1987). 
Our memory systems are not engineered to remember everything; decisions need to be made about storage and retrieval and content matters. It is much more important to have memory systems that track the locations of predators and food, or the statements of potential mating partners, than other random events in the environment—regardless of the ultimate origins of those biases or tunings. Yet, many modern memory theorists continue to champion equipotentiality, expressed in the form of domain-general constructs such as encoding specificity or levels of processing. Again, the ultimate arbiter of whether our memory systems are indeed domain-specific, and whether it is appropriate to propose multiple highly specialized memory systems, is empirical. Adopting a truly functional perspective, recognizing that our memory systems are designed to solve adaptive problems, should help to establish productive empirical pathways in the future.
Finally, despite the compelling logic of an evolutionary perspective, it is important to acknowledge the difficulties that surround the search for adaptations, cognitive or otherwise. As noted, there are no fossilized memory records, the heritability of cognitive processes remains largely unknown, and we can only speculate about the selection pressures that operated in ancestral environments. There is also the troubling temptation to concoct adaptationist accounts based on plausibility rather than empirical fact (i.e., ‘‘just-so’’ stories; Gould & Lewontin, 1979). At the same time, relevant evidence can be collected about our foraging past (Tooby & Cosmides, 2005); and, as illustrated throughout, it is certainly possible to generate empirically-testable predictions about how recurrent adaptive problems impact modern memory functioning. Few scholars question the assertion that cognitive adaptations must exist, but to build a convincing empirical case for their existence requires much more. Recent research on the evolutionary determinants of memory is seeking to provide an empirical foundation on which just such a case can be made.
ACKNOWLEDGMENTS
Special thanks are due to Josefa Pandeirada for many helpful comments on the manuscript. This research was supported, in part, by a grant from the National Science Foundation (BCS-0843165).
REFERENCES
Anderson, J. R., & Milson, R. (1989). Human memory: An adaptive perspective. Psychological Review, 96, 703–719. Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396–408. Anderson, J. R., & Schooler, L. J. (2000). The adaptive nature of memory. In E. Tulving & F. I. M. Craik (Eds.), The Oxford handbook of memory (pp. 557–570). New York: Oxford University Press. Andrews, P. W., Gangestad, S. W., & Matthews, D. (2002). Adaptationism: How to carry out an exaptationist program. Behavioral and Brain Sciences, 25, 489–553. Atance, C. M., & O’Neill, D. K. (2001). Episodic future thinking. Trends in Cognitive Sciences, 5, 533–539. Barrett, H. C. (2005). Adaptations to predators and prey. In D. Buss (Ed.), The handbook of evolutionary psychology (pp. 200–223). Hoboken, NJ: Wiley. Bird, M. (2001). Behavioural difficulties and cued recall of adaptive behaviour in dementia: Experimental and clinical evidence. Cognitive Rehabilitation in Dementia, 3, 357–375. Bloom, P. (2005). Descartes’ baby. New York: Basic Books. Bolles, R. C., & Beecher, M. D. (Eds.), (1988). Evolution and learning. Hillsdale, NJ: Lawrence Erlbaum Associates. Brandon, R. (1990). Adaptation and environment. Princeton: Princeton University Press. Brown, R., & Kulik, J. (1977). Flashbulb memories. Cognition, 5, 73–99.
Buchner, A., Bell, R., Mehl, B., & Musch, J. (2009). No enhanced recognition memory, but better source memory for faces of cheaters. Evolution and Human Behavior, 30, 212–224. Buckner, R. L., & Carroll, D. C. (2007). Self-projection and the brain. Trends in Cognitive Sciences, 11, 49–57. Buller, D. J. (2005). Adapting minds: Evolutionary psychology and the persistent quest for human nature. Cambridge, MA: The MIT Press. Buss, D. M. (2005). The murderer next door: Why the mind is designed to kill. New York: The Penguin Press. Buss, D. M. (2006). Strategies in human mating. Psychological Topics, 2, 239–260. Butler, A. C., Kang, S. H. K., & Roediger, H. L. III. (2009). Congruity effects between materials and processing tasks in the survival processing paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1477–1486. Caramazza, A., & Shelton, J. R. (1998). Domain-specific knowledge systems in the brain: The animate–inanimate distinction. Journal of Cognitive Neuroscience, 10, 1–34. Craik, F. I. M. (2007). Encoding: A cognitive perspective. In H. L. Roediger III, Y. Dudai, & S. M. Fitzpatrick (Eds.), Science of memory: Concepts (pp. 129–135). New York: Oxford University Press. Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684. Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268–294. Darwin, C. (1859). On the origin of species. London: John Murray. De Silva, P., Rachman, S., & Seligman, M. (1977). Prepared phobias and obsessions: Therapeutic outcome. Behavioral Research and Therapy, 15, 65–77. De Vreese, L. P., Neri, M., Fioravanti, M., Belloi, L., & Zanetti, O. (2001). Memory rehabilitation in Alzheimer’s disease: A review of progress. International Journal of Geriatric Psychiatry, 16, 794–809. DeLoache, J. S., & LoBue, V. (2009). 
The narrow fellow in the grass: Human infants associate snakes and fear. Developmental Science, 12, 201–207. Ermer, E., Cosmides, L., & Tooby, J. (2007). Functional specialization and the adaptationist program. In S. W. Gangestad & J. A. Simpson (Eds.), The evolution of mind: Fundamental questions and controversies (pp. 86–94). New York: Guilford Press. Gangestad, S. W., & Simpson, J. A. (2007). Whither science of the evolution of mind. In S. W. Gangestad & J. A. Simpson (Eds.), The evolution of mind: Fundamental questions and controversies (pp. 397–437). New York: Guilford Press. Gelman, R. (2003). The essential child: Origins of essentialism in everyday thought. Oxford: Oxford University Press. Gould, S. J., & Lewontin, R. C. (1979). The spandrels of San Marco and the Panglossian paradigm: A critique of the adaptationist programme. Proceedings of the Royal Society B: Biological Sciences, 205, 581–598. Gould, S. J., & Vrba, E. S. (1982). Exaptation: A missing term in the science of form. Paleobiology, 8, 4–15. Guillet, R., & Arndt, J. (2009). Taboo words: The effect of emotion on memory for peripheral information. Memory & Cognition, 37, 866–879. Haber, R. N. (1983). The impending demise of the icon: A critique of the concept of iconic storage in visual information processing. Behavioral and Brain Sciences, 6, 1–10. Howard, M. W., & Kahana, M. J. (2002). A distributed representation of temporal context. Journal of Mathematical Psychology, 46, 269–299. Hunt, R. R., & Einstein, G. O. (1981). Relational and item-specific information in memory. Journal of Verbal Learning and Verbal Behavior, 20, 497–514. Hunt, R. R., & McDaniel, M. A. (1993). The enigma of organization and distinctiveness. Journal of Memory and Language, 32, 421–445.
Hyde, T. S., & Jenkins, J. J. (1973). Recall for words as a function of semantic, graphic, and syntactic orienting tasks. Journal of Verbal Learning and Verbal Behavior, 12, 471–480. Jacob, F. (1977). Evolution and tinkering. Science, 196, 1161–1166. Jacoby, L. L., & Craik, F. I. M. (1979). Effects of elaboration of processing at encoding and retrieval: Trace distinctiveness and recovery of initial context. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 1–21). Hillsdale, NJ: Erlbaum. Kang, S., McDermott, K. B., & Cohen, S. (2008). The mnemonic advantage of processing fitness-relevant information. Memory & Cognition, 36, 1151–1156. Kenrick, D. T., Delton, A. W., Robertson, T., Becker, D. V., & Neuberg, S. L. (2007). How the mind warps: A social evolutionary perspective on cognitive processing disjunctions. In J. P. Forgas, M. G. Haselton, & W. von Hippel (Eds.), Evolution and the social mind: Evolutionary psychology and the social mind. New York: Psychology Press. Kensinger, E. A., Garoff-Eaton, R. J., & Schacter, D. L. (2007). Effects of emotion on memory specificity: Memory trade-offs elicited by negative visually arousing stimuli. Journal of Memory and Language, 56, 575–591. Klein, S. B., Cosmides, L., Tooby, J., & Chance, S. (2002). Decisions and the evolution of memory: Multiple systems, multiple functions. Psychological Review, 109, 306–329. Klein, S. B., Loftus, J., & Kihlstrom, J. F. (2002). Memory and temporal experience: The effects of episodic memory loss on an amnesic patient’s ability to remember the past and imagine the future. Social Cognition, 20, 353–379. Klein, S. B., Robertson, T. E., & Delton, A. W. (2010). Facing the future: Memory as an evolved system for planning future acts. Memory & Cognition, 38, 13–22. LaBar, K. S., & Cabeza, R. (2006). Cognitive neuroscience of emotional memory. Nature Reviews Neuroscience, 7, 54–64. Loftus, G. R., & Irwin, D. E. (1998). 
On the relations among different measures of visible and informational persistence. Cognitive Psychology, 35, 135–199. Lu, H. J., & Chang, L. (2009). Kinship effect on subjective temporal distance of autobiographical memory. Personality and Individual Differences, 47, 595–598. Luminet, O., & Curci, A. (Eds.), (2009). Flashbulb memories: New issues and new perspectives. New York: Psychology Press. McDaniel, M. A., & Bugg, J. M. (2008). Instability in memory phenomena: A common puzzle and a unifying explanation. Psychonomic Bulletin & Review, 15, 237–255. McGaugh, J. L. (2003). Memory and emotion: The making of lasting memories. New York: Columbia University Press. McGaugh, J. L. (2006). Make mild moments memorable: Add a little arousal. Trends in Cognitive Sciences, 10, 345–347. Mesoudi, A., & Whiten, A. (2008). The multiple roles of cultural transmission experiments in understanding human cultural evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 363, 3489–3501. Mimura, M., Komatsu, S. I., Kato, M., Yoshimasu, H., Moriyama, Y., & Kashima, H. (2005). Further evidence for a comparable memory advantage of self-performed tasks in Korsakoff’s syndrome and nonamnesic control subjects. Journal of the International Neuropsychological Society, 11, 545–553. Nairne, J. S. (2002). The myth of the encoding–retrieval match. Memory, 10, 389–395. Nairne, J. S. (2005). The functionalist agenda in memory research. In A. F. Healy (Ed.), Experimental psychology and its applications (pp. 115–126). Washington, DC: American Psychological Association. Nairne, J. S., & Pandeirada, J. N. S. (2007). Adaptive memory: Is survival processing special? Paper presented at the 48th Annual Meeting of the Psychonomic Society. Nairne, J. S., & Pandeirada, J. N. S. (2008a). Adaptive memory: Remembering with a stone-age brain. Current Directions in Psychological Science, 17, 239–243.
Nairne, J. S., & Pandeirada, J. N. S. (2008b). Adaptive memory: Is survival processing special? Journal of Memory and Language, 59, 377–385. Nairne, J. S., & Pandeirada, J. N. S. (2010). Adaptive memory: Ancestral priorities and the mnemonic value of survival processing. Cognitive Psychology (2010), doi:10.1016/j.cogpsych.2010.01.005. Nairne, J. S., Pandeirada, J. N. S., Gregory, K. J., & Van Arsdall, J. E. (2009). Adaptive memory: Fitness-relevance and the hunter-gatherer mind. Psychological Science, 20, 740–746. Nairne, J. S., Pandeirada, J. N. S., & Thompson, S. R. (2008). Adaptive memory: The comparative value of survival processing. Psychological Science, 19, 176–180. Nairne, J. S., Thompson, S. R., & Pandeirada, J. N. S. (2007). Adaptive memory: Survival processing enhances retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 263–273. New, J., Cosmides, L., & Tooby, J. (2007). Category-specific attention for animals reflects ancestral priorities, not expertise. Proceedings of the National Academy of Sciences of the United States of America, 104, 16598–16603. New, J., Krasnow, M. M., Truxaw, D., & Gaulin, S. J. C. (2007). Spatial adaptations for plant foraging: Women excel and calories count. Proceedings of the Royal Society B: Biological Sciences, 274, 2679–2684. Öhman, A., & Mineka, S. (2001). Fears, phobia, and preparedness: Toward an evolved module of fear and fear learning. Psychological Review, 108, 483–522. Otgaar, H., Smeets, T., & van Bergen, S. (2010). Picturing survival memories: Enhanced memory after fitness-relevant processing occurs for verbal and visual stimuli. Memory & Cognition, 38, 23–28. Packman, J. L., & Battig, W. F. (1978). Effects of different kinds of semantic processing on memory for words. Memory & Cognition, 6, 502–508. Paivio, A. (2007). Mind and its evolution: A dual coding theoretical approach. Mahwah, NJ: Erlbaum. Pinker, S. (1994). The language instinct. New York: HarperCollins. 
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93–134. Richardson, R. C. (2007). Evolutionary psychology as maladapted psychology. Cambridge, MA: The MIT Press. Roediger, H. L., III, Gallo, D. A., & Geraci, L. (2002). Processing approaches to cognition: The impetus from the levels-of-processing framework. Memory, 10, 319–332. Rubin, D. C. (1995). Memory in oral traditions. The cognitive psychology of epic, ballads, and counting-out rhymes. New York: Oxford University Press. Schacter, D. L., & Addis, D. R. (2007). The cognitive neuroscience of constructive memory: Remembering the past and imagining the future. Philosophical Transactions of the Royal Society B: Biological Sciences, 362, 773–786. Schacter, D. L., & Tulving, E. (1994). What are the memory systems of 1994? In D. L. Schacter & E. Tulving (Eds.), Memory systems (pp. 1–38). Cambridge, MA: The MIT Press. Schmidt, S. R., & Saari, B. (2007). The emotional memory effect: Differential processing or item distinctiveness? Memory & Cognition, 35, 1905–1916. Schulman, A. I. (1974). Memory for words recently classified. Memory & Cognition, 2, 47–52. Sherry, D. F., & Schacter, D. L. (1987). The evolution of multiple memory systems. Psychological Review, 94, 439–454. Shettleworth, S. J. (1998). Cognition, evolution, and behavior. New York: Oxford University Press. Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—Retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145–166.
Silverman, I., & Eals, M. (1992). Sex differences in spatial abilities: Evolutionary theory and data. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary theory and the generation of culture (pp. 531–549). New York: Oxford Press. Stein, B. S. (1978). Depth of processing reexamined: The effects of precision of encoding and test appropriateness. Journal of Verbal Learning and Verbal Behavior, 17, 165–174. Suddendorf, T., & Corballis, M. C. (1997). Mental time travel and the evolution of the human mind. Genetic, Social, and General Psychology Monographs, 123, 133–167. Suddendorf, T., & Corballis, M. C. (2007). The evolution of foresight: What is mental time travel, and is it unique to humans? Behavioral and Brain Sciences, 30, 299–313. Surprenant, A. M., & Neath, I. (2009). Principles of memory. New York: Psychology Press. Symons, D. (1992). On the use and misuse of Darwinism in the study of human behavior. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 137–159). New York: Oxford University Press. Szpunar, K. K. (2010). Episodic future thought: An emerging concept. Perspectives on Psychological Science, 5, 142–162. Szpunar, K. K., & McDermott, K. B. (2008). Episodic memory: An evolving concept. In D. Sweat, R. Menzel, H. Eichenbaum, & H. L. Roediger III (Eds.), Learning and memory: A comprehensive reference (pp. 491–510). Oxford: Elsevier. Szpunar, K. K., Watson, J. M., & McDermott, K. B. (2007). Neural substrates of envisioning the future. Proceedings of the National Academy of Sciences of the United States of America, 104, 642–647. Tooby, J., & Cosmides, L. (1992). The psychological foundations of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary theory and the generation of culture (pp. 19–136). New York: Oxford Press. Tooby, J., & Cosmides, L. (2005). Conceptual foundations of evolutionary psychology. In D. 
Buss (Ed.), The handbook of evolutionary psychology (pp. 5–67). Hoboken, NJ: Wiley. Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press. Tulving, E. (2002). Episodic memory: From mind to brain. Annual Review of Psychology, 53, 1–25. Tulving, E., & Craik, F. I. M. (Eds.), (2000). The Oxford handbook of memory. Oxford: Oxford University Press. Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352–373. Weinstein, Y., Bugg, J. M., & Roediger, H. L. (2008). Can the survival recall advantage be explained by basic memory processes? Memory & Cognition, 36, 913–919. Weldon, M. S., & Roediger III, H. L. (1987). Altering retrieval demands reverses the picture superiority effect. Memory & Cognition, 15, 269–280. Williams, G. C. (1966). Adaptation and natural selection. Princeton: Princeton University Press. Winograd, E., & Neisser, U. (1992). Affect and accuracy in recall: Studies of ‘‘flashbulb memories’’ New York: Cambridge University Press. Wurm, L. H. (2007). Danger and usefulness: An alternative framework for understanding rapid evaluation effects in perception? Psychonomic Bulletin & Review, 14, 1218–1225.
CHAPTER TWO
Digging into Déjà Vu: Recent Research on Possible Mechanisms
Alan S. Brown and Elizabeth J. Marsh
Contents
1. Introduction
2. Perceptual Explanation
  2.1. Jacoby and Whitehouse (1989)
  2.2. Split Perception: Study 1
  2.3. Split Perception: Study 2
  2.4. Split Perception: Study 3
  2.5. Superficial Glance = Shallow Processing?
3. Implicit Memory Explanation
  3.1. Episodic Experience
  3.2. Single-Element Familiarity Explanation
  3.3. Gestalt Familiarity Explanation
  3.4. Hypnosis
4. Physiological Explanation
  4.1. Neural Transmission Asynchrony
  4.2. Surgical Elimination of Déjà Vu
  4.3. Surgical Elicitation of Déjà Vu
5. Reports in Anomalous Individuals
  5.1. Blindness
  5.2. Chronic Déjà Vu
6. Continuing Issues
  6.1. Aging
  6.2. Dreams
  6.3. Single versus Multiple Causes
  6.4. Jamais Vu
7. Concluding Remarks
References
Abstract
The déjà vu experience has piqued the interest of philosophers and physicians for over 150 years, and has recently begun to connect to research on fundamental cognitive mechanisms. Following a brief description of the nature of this recognition anomaly, this chapter summarizes findings from several laboratories that are related to this memory phenomenon. In our labs, we have found support for three possible mechanisms that could trigger déjà vu. The first is split perception, which posits that a déjà vu is caused by a brief glance at an object or scene just prior to a fully aware look. Thus, the perception is split into two parts and appears to be eerily duplicated. A second mechanism is implicit memory, whereby a prior setting actually has been experienced before by the person but stored in such an indistinct manner that only the sense of familiarity is resurrected. Another example of an implicit memory effect involves a single part of a larger scene that is familiar but not identified as such, with the result that the strong sense of familiarity associated with this portion inappropriately bleeds over onto the entire scene. Others have found support for gestalt familiarity, that the framework of the present setting closely resembles something experienced before in outline but not in specifics. We also present physiological evidence from brain and cognitive dysfunctions that relate to our understanding of déjà vu. Finally, some important but unresolved issues in déjà vu research are noted, ones that should guide future research on the topic.
Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53002-0. © 2010 Elsevier Inc. All rights reserved.
1. Introduction
We have all some experience of a feeling that comes over us occasionally of what we are saying or doing having been done in a remote time – of our having been surrounded dim ages ago by the same faces, objects, and circumstances – of our knowing perfectly well what will be said next, as if we suddenly remembered it.
David Copperfield, Charles Dickens (1849, p. 630)
Perhaps the most exciting insights into the nature of cognitive function happen when normal processes break down. Roediger (1996) notes that the field of perceptual psychology embraced, early on, the study of illusions as a conduit to better understand normal perceptual processes. Yet memory researchers have not been as enthusiastic about such an approach, perhaps because memory dysfunction (compared to perceptual dysfunction) is more closely associated with global mental and physical pathology (cf. Brown, 2004). While a few memory illusions have been extensively investigated, such as false recall (Roediger & McDermott, 1995) and conjunction errors (Jones & Atchley, 2006), déjà vu is perhaps the most interesting and dramatic of memory illusions because it involves a clash of two rational and routine cognitive evaluations: familiarity versus unfamiliarity. During déjà vu, one feels that a setting or event is strongly familiar, yet rationally “knows” that it is not.
Stepping back into the realm of perceptual psychology, there are two different classes of illusions: those that are not attention grabbing (Müller-Lyer) and those that are (wagon wheel). With the Müller-Lyer illusion, one simply perceives the line capped with arrow heads to be shorter than the one capped with arrow tails, without surprise or awareness of one’s error. In contrast, in the wagon wheel illusion, the spokes of the stagecoach wheel appear to be turning backwards, as in an old cowboy movie, jolting our awareness. We know that the wheels are not really turning in the reverse direction, and that the movie frames are simply out of sync with the wheel spokes.
Turning back to the realm of memory, there are also two categories of illusions: those that we are aware of, and those that we are not. When we fail to recognize an old friend in a crowd as they walk past us, we are unaware of it and it does not capture our attention. On the other hand, when we fly to Key West for the first time and our rented vacation condo feels strikingly familiar, we experience a sense of uncomfortable mental incongruity that grabs hold of us and elicits a déjà vu.
The literature on the déjà vu experience is extensive, going back 150 years (cf. Brown, 2004). Most early reports involve personal reflections in the form of literary descriptions and personal anecdotes. A few attempted to document a connection between déjà vu and various medical (epilepsy) and psychological (schizophrenia) dysfunctions, but the application of scientific scrutiny to déjà vu has been slow to evolve. This sluggish involvement of systematic empirical investigation is perhaps a result of déjà vu’s unfortunate association with things mysterious and unempirical, such as reincarnation and extrasensory perception (cf. Funkhouser, 1983). Another factor impeding research progress may be the rarity of the experience, which typically occurs only once or twice a year even among those most prone to it (young adults) (Brown, 2003).
But perhaps the most important hindrance to research on déjà vu is the lack of a clear eliciting stimulus. In culling through personal descriptions, it is nearly impossible to find a clear or consistent trigger for déjà vu. Nearly all published descriptions focus on the nature of the cognitive disruption, one’s personal reaction, or what one feels during the experience. The quote by Dickens at the start of this chapter is typical of published descriptions. Thus, it is a serious challenge to identify stimuli that could reliably elicit a déjà vu in the lab. Later in this chapter, we will describe ways in which current research has attempted to scientifically evaluate this phenomenon. Rather than attempting to recreate a full-blown déjà vu experience, most research approaches this topic indirectly: how can we increase the probability of a false positive familiarity illusion?
Simply put, déjà vu is a recognition failure, an involuntary false alarm. Under normal circumstances, we experience familiarity for objects and situations that we have encountered before, and unfamiliarity for those that we have not. With déjà vu, we have a sense of strong positive familiarity for items that we know to be novel: “any subjectively inappropriate impression of familiarity of a present experience with an undefined past” (Neppe, 1983, p. 3).
Given the rarity of déjà vu, most information has been gathered retrospectively through surveys. Such data reveal that déjà vu is experienced by two-thirds (67%) of respondents, with the incidence highest among those in their late teens and 20s, and dropping off steadily with increasing age (Brown, 2003, 2004). Among experients (those who report ever having the experience), it is reported much less frequently as one ages. The experience happens more often among more educated, more liberal (politically/religiously), and more traveled individuals, and is unrelated to gender or race. Déjà vu is typically associated with an entire setting, rather than with specifiable elements (objects, people, or sounds). It also accompanies the preseizure aura in a small percentage of temporal lobe epileptics. Apart from specific temporal lobe pathology (seizure; tumor), déjà vu has not been clearly connected with any physical or psychological pathology.
The vague nature of the experience provides a fertile ground for theoretical speculation, with few clear constraints. Over 50 explanations have been proposed, the most viable of which are subsumed under three different categories: perceptual, memory, and physiological (cf. Brown, 2004). All can connect to theories and findings that have emerged in research on cognition and neuroscience. In fact, we are at a propitious point in the evolution of our research designs/tools, where we can begin to conduct more precise tests of such theoretical speculation. This chapter is intended primarily to summarize research findings on déjà vu published since previous summaries (Brown, 2003, 2004) and to give a sense of where the field is heading.
2. Perceptual Explanation
Usually referred to as perceptual gap or split perception, this explanation holds that a déjà vu may occur when a person processes the present sensory input twice, in rapid succession. The first input experience is brief, degraded, occluded, and/or occurs while the person is distracted. The second perception, immediately following, then seems strangely familiar because it connects to the immediately prior input (unbeknownst to us). As with each category of explanation, many variations exist that can be traced back over a century (Angell, 1908). This particular explanation is exceptional because it received formal attention from a pioneer of modern cognitive science:
. . . you are about to cross a crowded street, and you take a hasty glance in both directions to make sure of a safe passage. Now your eye is caught, for a moment, by the contents of a shop window; and you pause, though only for a moment, to survey the window before you actually cross the street. . . . the preliminary glance up and down, that ordinarily connects with the crossing in a single attentive experience, is disjointed from the crossing; the look at the window, casual as it was, has been able to disrupt the associative tendencies. As you cross, then, you think “Why, I crossed this street just now”; your nervous system has severed two phases of a single experience, both of which are familiar, and the latter of which appears accordingly as a repetition of the earlier. (Titchener, 1928, pp. 187–188)
2.1. Jacoby and Whitehouse (1989)
Titchener’s quote was a focal point for the first scientifically rigorous test of a possible mechanism underlying déjà vu. Jacoby and Whitehouse (1989) modeled Titchener’s “hasty glance” through a brief visual exposure in a controlled laboratory setting. If this explanation is true, then a subthreshold glance at a word should create a heightened sense of familiarity for it when it is viewed in full, moments later. Jacoby and Whitehouse’s design involved two stages: first, an input list of words; second, an old/new recognition test. The recognition test presented one word at a time. Each test word was preceded by a briefly flashed stimulus consisting of (a) the word itself (identical), (b) a word different from the test word (different), or (c) no word (none). The key finding was that when the prior glance involved the test word itself (identical), subjects were more likely to misidentify a new word as having occurred on the prior list, relative to new words in the different or none prime conditions. This finding was replicated both within the Jacoby and Whitehouse article, and in subsequent research (Bernstein & Welch, 1991; Gellatly, Banton, & Woods, 1995; Joordens & Merikle, 1992; Klinger, 2001).
This demonstration of a false positive familiarity illusion captured the imagination of many, as reflected in a phenomenal number of subsequent articles (over 200) that have cited the Jacoby and Whitehouse study. It captured our attention as well. Rather than forcing subjects to stay mentally within the confines of a laboratory in making familiarity assessments, we wanted to know whether a false positive familiarity illusion could be pushed much further back into one’s personal past, prior to the lab (Brown & Marsh, 2009). If so, this could move a step closer to modeling actual déjà vu experiences.
Thus, our goal was to capture some sense of the amorphous temporal quality that typifies déjà vu: “this experience has happened sometime before in my life, but I don’t know exactly when.” Brown, Porter, and Nix (1994) confirmed that subjects have difficulty identifying just when the prior experience supposedly happened: survey respondents were evenly distributed on whether the illusory prior encounter happened days, weeks, months, or years ago.
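As a rough illustration of the trial structure described in this section, the following Python sketch builds a recognition test in which each test word is preceded by an identical prime, a different prime, or no prime. The function name, the word list, the even rotation of conditions, and the use of another test word as the different prime are all assumptions made for illustration; the actual materials, counterbalancing, and exposure durations used by Jacoby and Whitehouse (1989) are not reproduced here.

```python
import random

PRIME_CONDITIONS = ("identical", "different", "none")

def build_test_trials(test_words, seed=0):
    """Assign each test word a prime condition and return the trials in random order."""
    rng = random.Random(seed)
    trials = []
    for i, word in enumerate(test_words):
        # Hypothetical scheme: rotate through the three conditions so they occur equally often.
        condition = PRIME_CONDITIONS[i % len(PRIME_CONDITIONS)]
        if condition == "identical":
            prime = word                      # the test word is flashed ahead of itself
        elif condition == "different":
            # In this sketch, any other test word stands in as the different prime.
            prime = rng.choice([w for w in test_words if w != word])
        else:
            prime = None                      # no prime precedes the test word
        trials.append({"prime": prime, "target": word, "condition": condition})
    rng.shuffle(trials)
    return trials

trials = build_test_trials(["table", "river", "candle", "window", "garden", "pencil"])
for trial in trials:
    print(trial)
```

Each trial then pairs a brief prime with a fully visible target that the subject judges as old or new; the comparison of interest is the false alarm rate to new targets across the three conditions.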
2.2. Split Perception: Study 1
We attempted to increase the verisimilitude of Jacoby and Whitehouse’s design via two experimental design changes (Brown & Marsh, 2009). First, we eliminated the input list and used only a test list. This alteration would, we hoped, force our subjects to attribute any sense of enhanced familiarity to experiences prior to the current lab session: “have you had a pre-experimental encounter with this symbol?” The second issue involved stimulus materials. Asking about a pre-experimental encounter rules out the use of words, because practically all words have been seen prior to the experiment. Instead, we gathered a collection of 300 relatively unfamiliar black-and-white line drawings, and asked a pilot group of subjects to rate how familiar each symbol was. Based upon these ratings, we sorted symbols into three sets: novel, low familiarity, and high familiarity. A sample of each type is shown in Figure 1.
[Figure 1 panels: novel symbols; low-familiarity symbols; high-familiarity symbols. Images not reproduced.]
Figure 1 Novel, low-familiarity, and high-familiarity symbols from Brown and Marsh (2009).
To recap, we followed the Jacoby and Whitehouse (1989) procedure of preceding each figure with a brief flash of (a) the same stimulus (identical), (b) a different stimulus (different), or (c) nothing (none). As is obvious from the examples in Figure 1, we expected that high-familiarity stimuli (e.g., a heart) would have been seen prior to the study, but included them so that all subjects could respond “yes” on some trials. However, we were uninterested in analyzing these high-familiarity stimuli because their ratings should be at a ceiling, limiting the possibility of increasing judged familiarity.
Our primary finding replicated Jacoby and Whitehouse (1989): a brief glance at a figure just before judging its familiarity significantly increased the sense that it had been seen before. For novel figures, subjects were five times more likely to claim a pre-experimental encounter in the identical prime condition (15% rated as seen before) than in either the different (3%) or no prime (3%) conditions. The same significant effect also occurred with low-familiarity stimuli: an identical prime roughly doubled the probability of claiming a prestudy encounter (28%), compared to the different (16%) or no prime (13%) conditions.
Thus, we successfully created an illusion of a previous experience, by simply flashing the stimulus briefly ahead of itself. This again confirmed Jacoby and Whitehouse’s finding that a “new” stimulus word (or symbol) can be misattributed as having been seen before. However, we showed that this effect can be induced for stimuli that the subject probably has never seen before (novel symbols), and demonstrated that this misattribution can extend to a time frame and place outside the laboratory. Our intent was to test the split perception theory of déjà vu by pushing familiarity around, and we did not anticipate that our manipulation would be powerful enough to produce a full-blown déjà vu experience.
Checking on this item by item would have been ill-advised from several perspectives. Not only would it have considerably slowed the procedure, but we were also concerned that it would create an expectation bias. But just to check on the possibility, we asked subjects after the procedure was over whether they had experienced a déjà vu at some point during the study. Surprisingly, 50% said that they had. There was no way to confirm that these experiences happened on an identical prime trial, rather than on a different or none prime trial. However, given that most of these same subjects (71%) reported that déjà vu occurred less frequently than once a month, this finding was intriguing.
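The “five times” and “roughly doubled” comparisons reported for Study 1 follow directly from the percentages given in the text. The short Python sketch below recomputes those ratios; the dictionary layout and function name are illustrative choices, and only the percentages themselves come from the chapter.

```python
# Percentage of symbols claimed as encountered before the experiment (Study 1, as reported).
claim_rates = {
    "novel": {"identical": 15, "different": 3, "none": 3},
    "low_familiarity": {"identical": 28, "different": 16, "none": 13},
}

def priming_ratios(rates):
    """Ratio of the identical-prime claim rate to each baseline claim rate."""
    return {baseline: rates["identical"] / rates[baseline]
            for baseline in ("different", "none")}

for symbol_type, rates in claim_rates.items():
    ratios = priming_ratios(rates)
    print(symbol_type, {k: round(v, 2) for k, v in ratios.items()})
# novel {'different': 5.0, 'none': 5.0}
# low_familiarity {'different': 1.75, 'none': 2.15}
```

These are ratios of group-level percentages only; they say nothing about the statistical tests that established significance in the original report.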
2.3. Split Perception: Study 2
We conducted several follow-up investigations to Brown and Marsh (2009) that required a more complex evaluation of familiarity. Requiring that subjects assign any sense of increased familiarity only to pre-experimental encounters may have made the measure less sensitive to subtle changes in familiarity. What if the familiarity enhancement was modest and insufficiently intense for subjects to consider it as emanating from a prestudy exposure? In the first follow-up, we used the same design as Brown and Marsh (2009), except that subjects rated symbols on a more general familiarity scale of “have you ever encountered this design before?” (1 = definitely no; 6 = definitely yes).
Congruent with our published report, a brief exposure significantly increased familiarity ratings for both novel and low-familiarity symbols. For novel figures, mean familiarity for both the different (1.8) and none (2.1) prime conditions was significantly lower than that for the identical prime (4.3). As in Study 1, this effect replicated with low-familiarity symbols. Compared to the different (2.5) or none (2.7) prime conditions, a brief exposure to itself (identical prime) significantly increased rated familiarity (4.8).
To gather more detail on the familiarity attribution, on each trial where a symbol was rated as familiar (ratings 4, 5, and 6), subjects also assessed the familiarity source: (1) prior exposure during the study, (2) prestudy encounter, or (3) unsure. In the identical prime condition, subjects attributed their sense of familiarity most often to in-study exposure (81%) rather than to prestudy (13%) or unsure (6%). For the different and no prime conditions, the positive familiarity attributions were more evenly distributed between in-study (43%) and prestudy (41%), with a few unsure (16%) responses. This finding suggests that the familiarity enhancement generated by a quick glance will be primarily attributed to a recent within-experiment experience, if subjects are given this option. Thus, our published report may actually underestimate the impact of our manipulation.
If one has a strong feeling of having already seen this particular symbol, the predominant attribution may be to a recent exposure, earlier in the series of just-rated symbols. Perhaps subjects in Brown and Marsh (2009) were inclined to attribute the identity prime familiarity boost to a recent encounter within the study, and were thus less likely to attribute it to a prelab encounter. Such speculation aside, the most important finding in Study 2 is a replication of the familiarity enhancement found in Study 1. Interestingly, a postexperiment inquiry again revealed that about half (46%) of the subjects experienced déjà vu during the procedure.
2.4. Split Perception: Study 3
In Study 2, both the prime and the target symbol were presented foveally, in the center of the computer screen. We wondered whether the effect would change if the prime symbol was processed off to one side, in the parafoveal area. One explanation of déjà vu is that it results from an initial peripheral perception of one object while focusing on something else near it. You drive to a restaurant for the first time, and while you approach the front door an unusual flowering plant beside the entrance captures your attention. When you then look directly at the distinctive doorway, you are struck by an unsettling feeling of familiarity. It is possible that the visual information (doorway) was briefly preprocessed in the parafoveal area while you were looking at the plant, and when this impression matched the subsequent fully processed view, a déjà vu resulted.
. . . it is very common for people to be in situations where there are many unattended stimuli outside their immediate focus of attention that are not consciously experienced. . . . For this reason, the experimental conditions in studies in which unattended stimuli are presented at spatial locations removed from the current focus of attention more closely resemble the conditions under which visual stimuli are perceived in everyday situations. . . . (Merikle, Smilek, & Eastwood, 2001, p. 122)
Research on inattentional blindness lends credence to the potency of unattended, parafoveal stimuli (Mack, 2003; Mack & Rock, 1998). Participants perform a fairly simple visual task, such as judging which of the two arms of a briefly presented cross (vertical; horizontal) is longer. The procedure extends for many trials, and on a few of these trials another stimulus (a symbol, word, or letter) accompanies the cross, off to one side. Surprisingly, most participants fail to report seeing the additional item, although they show priming for this stimulus on a subsequent indirect memory test, indicating that it was processed without their awareness.
To evaluate the peripheral priming possibility for déjà vu, Study 3 modified the design used in Study 2. Rather than the identical or different symbol appearing in the same foveal location as the subsequently rated symbol, it appeared offset toward one of the four corners of the computer screen. The outcome essentially replicated Study 2. For novel symbols, the identical prime boosted the rated familiarity substantially (4.1) over the different (2.0) and none (1.8) prime conditions. For the low-familiarity symbols, identity prime symbols were again rated much more familiar (4.5) than those in either the different (2.4) or none (2.4) conditions.
Taken together, this series of studies supports the possibility that a perceptual double-take (i.e., a superficial glance followed by a close look) can elicit an exaggerated sense of familiarity for a stimulus. This enhancement occurred for both novel and low-familiarity symbols across three different studies. The boost in assessed familiarity was found with both an ambiguous (during vs. before experiment) source rating (Studies 2 and 3) and a pre-experiment source rating (Study 1; Brown & Marsh, 2009), with the latter finding more directly supporting the concept of déjà vu.
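To compare the size of the effect across the two rating studies, the mean familiarity ratings reported for Studies 2 and 3 can be reduced to a single boost score per condition. The Python sketch below takes the identical-prime mean minus the average of the two baseline means; that particular baseline is an assumption made for illustration, not an analysis reported in the chapter, and the data structure and function name are likewise hypothetical.

```python
# Mean familiarity ratings on the 1-6 scale, as reported for Studies 2 and 3.
mean_ratings = {
    ("Study 2", "novel"): {"identical": 4.3, "different": 1.8, "none": 2.1},
    ("Study 2", "low familiarity"): {"identical": 4.8, "different": 2.5, "none": 2.7},
    ("Study 3", "novel"): {"identical": 4.1, "different": 2.0, "none": 1.8},
    ("Study 3", "low familiarity"): {"identical": 4.5, "different": 2.4, "none": 2.4},
}

def familiarity_boost(ratings):
    """Identical-prime mean minus the average of the different and none means."""
    baseline = (ratings["different"] + ratings["none"]) / 2  # assumed pooled baseline
    return round(ratings["identical"] - baseline, 2)

for (study, symbols), ratings in mean_ratings.items():
    print(study, symbols, familiarity_boost(ratings))
```

Under this assumed baseline, the boost is a little over two scale points in every cell, which is consistent with the description of Study 3 as essentially replicating Study 2.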
2.5. Superficial Glance = Shallow Processing?
Given that the split perception explanation seems viable, what mechanism(s) might underlie this? In other words, what forces a subjective temporal separation between these two adjacent perceptual experiences? One possibility is that the initial glance involves shallow processing, where only superficial physical attributes are extracted from the stimulus. And perhaps stimuli that are processed in a shallow manner seem older to us, when contrasted to deeply processed stimuli. We tested this by presenting a list of words, some of which were processed deeply (can you carry this?) and others shallowly (does it have an “e”?). After a short distractor task, subjects identified whether each word had appeared in the 1st, 2nd, or 3rd third of the input list. There was a general bias to guess “middle third” for both shallow and deep words, probably reflecting a middle-of-the-road response default when unsure. However, there was a clear difference between deep and shallow items on whether subjects believed that they had been presented in the first third or last third of the list (Figure 2). Overall, 34% of deeply processed words were judged as more proximal (end of list; last 3rd), compared to 21% of shallow words. In contrast, distal (beginning of list; first 3rd) judgments related to level of processing in the opposite manner: 24% of deeply processed words seemed to have occurred earlier in the list (first 3rd) compared to 37% of the shallowly processed words. Remarkably, this bias remained consistent across items actually appearing in the 1st, 2nd, and 3rd thirds of the list.
This outcome can potentially clarify why the split perception experience leads to a sense of déjà vu. The initial, shallowly processed impression gets temporally pushed back (older), while the subsequent deep look gets pulled forward (recent). This contrast in the moment, with two impressions duplicated in immediate succession drifting in opposing temporal directions, may exaggerate the actual time separation, which then leads to a sense of déjà vu.
[Figure 2: bar chart, not reproduced. Y-axis: percentage of items (0 to 40). X-axis: items from the 1st, 2nd, and 3rd thirds of the list, judged as early or as late. Series: shallow, deep.]
Figure 2 Mean percentage of items in each list third judged as early (appearing in first third) or late (appearing in last third), for shallowly and deeply processed words.
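The temporal judgment percentages reported above can be summarized in a few lines of Python. The sketch assumes that the three response options (first, middle, last third) are exhaustive, so that the share of “middle third” judgments is whatever remains after the early and late percentages; that remainder is an inference from the reported figures and is not a value stated in the text. Variable and function names are illustrative.

```python
# Percentage of words judged as coming from the first or last third of the list (as reported).
judged_early = {"deep": 24, "shallow": 37}
judged_late = {"deep": 34, "shallow": 21}

def temporal_summary(level):
    """Return the implied middle-third share and the net late-minus-early shift."""
    early, late = judged_early[level], judged_late[level]
    middle = 100 - early - late   # assumption: the three responses are exhaustive
    net_shift = late - early      # positive means judged as more recent overall
    return {"middle": middle, "net_shift": net_shift}

for level in ("deep", "shallow"):
    print(level, temporal_summary(level))
# deep {'middle': 42, 'net_shift': 10}
# shallow {'middle': 42, 'net_shift': -16}
```

On this reading, the implied middle-third share is the same for both processing levels, in line with the general middle bias described, while the net shift has opposite signs: deep items lean recent and shallow items lean old.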
3. Implicit Memory Explanation
There are a number of different versions of the implicit memory interpretation of déjà vu. All are grounded in the assumption that a déjà vu occurs because some aspect of the current situation has actually been experienced before. When the present stimuli hook into previously stored memories that lack temporal or contextual tags to assist in the conscious identification of the source of “oldness,” the sense of familiarity that is aroused cannot be explicitly identified. Several lines of research tie into this general explanation.
3.1. Episodic Experience
One of the most reasonable and straightforward interpretations for déjà vu is that a person actually has experienced this situation or setting before, but has simply forgotten it. Given the enormous amount of information that we process, it seems likely that there are stored memories of many different types of outdoor scenes, palaces, verbal phrases, plot themes, social situations, hotel lobbies, and melodies, many of which may have lost their explicit memory tag. When a current stimulus connects with one of the episodically disconnected and orphaned memories, this unnoticed resurrection of the stored representation could yield a vague and unsettling sense of prior experience. Because the objective data that we sort through in the moment are insufficient to support this familiarity, we interpret it as a discomfiting memory illusion.
A marvelous commercial by Hotels.com (Deja View) (http://www.elsevierdirect.com/companions/9780123809063/Supplemental/material/1) illustrates this scenario. A couple enters a hotel room, and against a background of spooky music, the moderately distressed man says “I’ve been in this room before!” His nonchalant woman partner replies “What?” to which he emphatically repeats “I’ve been here before!” The woman quickly solves his quandary by reminding him that “You took the virtual tour on Hotels.com.” While this serves as a great relief to the man, it illustrates how readily such information may become planted in our experiential memory at a shallow level, and then subsequently connected with the real situation that is playing out in front of us, causing momentary memorial distress.
3.1.1. Episodic Experience: Study 1
To model this possibility in the lab, we used our captive audience of undergraduate students to create a plausible memory dilemma (Brown & Marsh, 2008). Most college students visit numerous college campuses prior to their final selection, and we used this fact to help evoke a false sense of prior experience. Students signed up for a two-stage study. During the first, they saw a variety of different scenes: mountain ranges, courtyards, campus buildings, serene lakes, etc. Embedded in each was a small cross, and their task was to identify which quadrant of the picture this cross was located in. We pushed them along at a good clip, so that they would process the pictures in a relatively superficial manner. Mixed in among these pictures were some campus shots from a university that they were not attending. We did verify, postexperimentally, that the “other” campus had not been visited and excluded the handful of Duke students who had actually visited SMU, and those SMU students who had toured Duke.
Our main objective was to plant unfamiliar campus images in the students’ memories, in a way that could subsequently evoke a false impression of an actual prior visit. To model déjà vu, it was important to ask not simply if the scene was familiar, but if the student had actually been to the location depicted in the photo. Both mundane and unique scenes from both campuses were included, because anecdotal reports suggest that déjà vu can occur in both ordinary circumstances (hanging out with friends, relaxing, watching TV) as well as unusual settings (Brown, 2004).
This difference is illustrated by these two open-ended survey responses:
I was sitting in this guy’s apartment talking about something and I got little flashes like I had been there talking about the same thing and I know it never happened before.
I was going to a rock concert in downtown Fort Worth. When we got to the parking lot, I looked up and noticed all the buildings around me. At that moment, I felt as if I had experienced that exact same scene before, although I had never been to downtown Fort Worth.
Examples of these unique (chapel; famous monument) and mundane (dorms; academic classrooms) campus settings are shown in Figure 3. Presentation frequency (once or twice) was varied during the initial cross-detection phase. This manipulation did not have a theoretical underpinning, but was included to see if memory strength might influence false visit attributions. After completing the rapid cross-detection task, subjects returned one week later for session 2, during which they viewed scenes from their home campus and the unfamiliar campus. Home-campus shots did not appear in
45
Recent Research on De´ja` Vu
Unique locations
Mundane locations
Figure 3 Examples of unique and mundane campus locations used in Brown and Marsh (2008).
session 1, but were added at session 2 to assure that each subject could respond that they had actually visited some of the locations. Each photo was shown briefly (half a second) to limit analytical processing, and subjects were instructed to respond quickly based on first impression. After each photo was presented and removed, subjects evaluated whether they had actually been at that particular location using a four-point scale: no, might, probably, definitely. Visit ratings for the critical (away) campus shots were significantly higher for those exposed before, in session 1, compared to those that had not. However, there was no difference between scenes viewed once versus twice in session 1. As expected, mundane shots were given higher visitation ratings than unique shots, because there were fewer clues available to discount a possible visit. But the boost in visit ratings from prior exposure was consistent across unique and mundane scenes. 3.1.2. Episodic Experience: Study 2 These results were essentially replicated in a second study (Brown & Marsh, 2008, Experiment 2). Presentation frequency was again manipulated (one or two exposures in session 1), in addition to retention interval between
Alan S. Brown and Elizabeth J. Marsh
sessions 1 and 2: one versus three weeks. Prior exposure in session 1 again boosted subsequent personal visit assessments, and presentation frequency and retention interval had no effect on the degree of this enhancement. Similar to the split perception studies described earlier in this chapter (Sections 2.2–2.4; Brown & Marsh, 2009), a postprocedural interview revealed that nearly half of the subjects admitted to having a déjà vu experience sometime during the procedure: 46% in Experiment 1; 49% in Experiment 2. As explained earlier, we cannot determine which specific item(s) elicited déjà vu, as this would require an item-by-item query during the procedure. However, their general responses provide encouragement that this paradigm may model real-life déjà vu experiences. More specifically, déjà vu could occur when the present scene or setting duplicates one experienced before in the form of a magazine, movie, PowerPoint presentation, website, or newspaper.
3.2. Single-Element Familiarity Explanation
In the above experiments with campus scenes, we explored the possibility that déjà vu could stem from having seen an entire scene before. But another implicit-memory possibility is that déjà vu could be triggered when a small part of a scene is familiar. Imagine walking into a friend’s living room for the first time and being struck by a feeling of eerie familiarity. It is only later that you realize this familiarity stems from a lamp on her end table that is identical to one in the basement recreation room of your best friend during high school. The source of this intense familiarity, triggered by that single element, is not immediately identified and over-generalizes to the entire scene. Consider another, related example: you walk across campus when two people approach you, talking with each other. You definitely recognize the person on the left, but then feel like you must know the person with them but cannot figure out from where. Does your familiarity for person A affect your sense of familiarity for person B? Both of these examples involve the possible spill-over of familiarity from one element, whether it affects the familiarity of an entire scene (example 1) or another element (example 2).
3.2.1. Single-Element Familiarity: Study 1
We began with a laboratory investigation of the second example by asking whether the familiarity of one single element can “bleed over” and influence the familiarity evaluation of a second item (Brown & Marsh, 2007; Marsh & Brown, 2010). Would low-familiarity symbols, selected from our symbol pool from the split-perception studies (Brown & Marsh, 2009), increase or decrease in rated familiarity, depending on the familiarity level of the symbol that was shown with them? More specifically, could we bias subjects to give a higher rating if a high-familiarity symbol accompanied the target, and would subjects reduce a target symbol’s rated familiarity
if accompanied by a novel symbol? Two factors were manipulated: how long the target appeared on the screen (100 vs. 1000 ms), and whether the target appeared (a) alone, (b) with a novel symbol, or (c) with a high-familiarity symbol. The procedure for the first experiment is summarized in Figure 4. Subjects were told “your job is to decide how familiar the target symbol is. In other words, you are to judge how well you are acquainted with the target symbol in everyday life.” On both two-symbol and one-symbol trials, the judgment was made after the symbol(s) disappeared and a question mark appeared in the location of the to-be-rated symbol. In sum, a ready prompt was followed by the symbol(s), which were then briefly masked and replaced by a question mark indicating the target symbol. On the scale of 0 (very unfamiliar) to 5 (very familiar), mean performance on filler trials indicated that subjects were using the scale properly: novel = 0.80; high familiarity = 4.23. More importantly, test context mattered. Mean judged familiarity for a low-familiarity symbol was lower when accompanied by a novel (1.55) compared to a high-familiarity (2.10) symbol, and intermediate when presented alone (1.81). This effect did not depend upon symbol presentation time.
Figure 4 Experimental procedure used in single-element familiarity studies. Labels recoverable from the figure: Study 1: Ready?, 2000 ms, 500 ms, 100 or 1000 ms, 500 ms, ?, Judgment. Study 2: Ready?, 2000 ms, 2000 ms, Judgment.
3.2.2. Single-Element Familiarity: Study 2
Study 1 required subjects to remember which symbol had been presented where. A question mark appeared in the location where the target had been, but the symbol itself was not in view for the judgment. Thus, subjects may have occasionally judged the wrong symbol because they had forgotten where it had been shown. To address this, the second experiment modified the procedure (see Figure 4): following a “ready” prompt, the symbol(s) appeared for 2 s
before a box appeared around the target. Replicating the first experiment, a low-familiarity symbol was judged to be more familiar (2.16) if paired with a high-familiarity symbol than if alone (1.97). However, the novel symbol no longer influenced familiarity judgments: the target (low-familiarity) symbol was equally familiar when tested alone (1.97) or with a novel symbol (2.00). This outcome suggests that familiarity is easier to enhance than to decrement, so in the remaining experiments in this series we focused on whether high-familiarity neighbors could pull target familiarity up. Our use of both more and less familiar neighbors in Study 1 was mainly for academic curiosity, to see if symmetrical effects exist. Déjà vu relates primarily to increasing familiarity through a familiar accompanying element. Decrementing familiarity ties in with jamais vu, a lesser known phenomenon which is related to déjà vu and described later in this chapter (Section 6.4). However, as in real life, our jamais vu model appears to be less reliable than déjà vu.
3.2.3. Single-Element Familiarity: Study 3
One simple, but reasonable, alternative explanation for the boost in familiarity rating described above is that familiarity increases when two symbols are shown on the screen compared to one (control condition), and not because of the presence of a high-familiarity neighbor. To address this, we compared the effects of a high-familiarity neighbor symbol with a low-familiarity neighbor. A low-familiarity symbol accompanied by another low-familiarity symbol received a similar familiarity rating (1.81) to when it appeared alone (1.82). In contrast, pairing a low- with a high-familiarity symbol (M = 1.94) increased its perceived familiarity. Thus the earlier effects were not simply due to seeing two symbols at one time. Rather, a more familiar neighboring symbol increases perceived familiarity of a less familiar target.
3.2.4. Single-Element Familiarity: Study 4
We also tested a perceptual explanation of the familiarity effect. Perhaps the high-familiarity symbol changed the interpretation of the target symbol. For example, does a random squiggly symbol look more like a nameable object when paired with a familiar handicap symbol? To test this, we changed our dependent measure from rating familiarity to identifying the meaning of the symbol. Participants were told that we were interested in their ability to identify symbols, and that some would be very easy to identify and for others they would have no idea of the meaning. They were warned against guessing, and instructed to type “I don’t know” if they did not know the meaning of a symbol. The same procedure was used, with only the evaluation measure changed. Following a “ready” prompt, the symbol(s) appeared for 2 s. Then, a text box appeared and subjects answered the question “What does the target drawing mean to you?” We scored the data in two ways. First, we computed the proportion of symbols that subjects could label,
regardless of the nature of the label. Second, in the pair condition, we examined whether the label given to a target symbol was related to the meaning of the accompanying high-familiarity symbol. Overall, subjects were good at identifying the meaning of high-familiarity filler symbols, being correct 87% of the time. As expected, they were much less likely to ascribe meanings to low-familiarity targets, labeling just 33%. This also indicates that they followed the instruction not to guess or make up meanings. Critically, seeing meaning in low-familiarity target symbols was not influenced by pairing with a high-familiarity symbol. Subjects generated interpretations for 32% of alone targets and 33% of paired targets. Furthermore, when subjects did assign meaning to the target, it was rarely related to the high-familiarity neighbor (3%). These results suggest that the effects of the high-familiarity flanker were not due to influencing the interpretation of the target symbol. In short, having memory for part of a scene—the high-familiarity symbol, in our paradigm—can influence one’s feeling of familiarity for other elements of the scene. Less clear is how much this is under conscious control. If subjects are told not to let the familiarity of one object affect their judgment of another, can they avoid its influence? We are currently collecting these data, and our hunch is that subjects will be unable to control the influence of the familiar symbol, in the same way that people are unable to avoid attributing their emotions from one stimulus to another neutral one (Payne, Cheng, Govorun, & Stewart, 2005).
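The rating shifts reported for Studies 1 to 3 are easier to compare side by side. The following minimal Python sketch tabulates each neighbor condition's mean rating minus the alone-condition mean, using only the means quoted in the text. The dictionary layout and the names `MEANS` and `context_shifts` are illustrative assumptions, not part of the original studies' materials or analyses.

```python
# Mean familiarity ratings (0-5 scale) for low-familiarity targets,
# copied from the text for Studies 1-3. The data structure and the
# function name are hypothetical conveniences for illustration only.
MEANS = {
    "Study 1": {"alone": 1.81, "novel neighbor": 1.55,
                "high-familiarity neighbor": 2.10},
    "Study 2": {"alone": 1.97, "novel neighbor": 2.00,
                "high-familiarity neighbor": 2.16},
    "Study 3": {"alone": 1.82, "low-familiarity neighbor": 1.81,
                "high-familiarity neighbor": 1.94},
}


def context_shifts(means):
    """Return each neighbor condition's mean minus the alone-condition mean."""
    shifts = {}
    for study, conditions in means.items():
        baseline = conditions["alone"]
        shifts[study] = {
            condition: round(value - baseline, 2)
            for condition, value in conditions.items()
            if condition != "alone"
        }
    return shifts


if __name__ == "__main__":
    for study, by_condition in context_shifts(MEANS).items():
        for condition, shift in by_condition.items():
            print(f"{study}: {condition}: {shift:+.2f}")
```

These are descriptive differences between reported means only. They say nothing about statistical reliability, which depends on variability information that is not given here.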
3.3. Gestalt Familiarity Explanation
In addition to seeing an entire scene before (episodic experience) or a piece of a scene (single-element familiarity), another type of implicit-memory explanation for the déjà vu experience is that the general framework of the current circumstance or setting resembles one experienced before. Assume that you are a college student making a trip to a new campus to see a high school buddy. During the drive through the main drag on campus, you are struck by an eerie sense of having been here before. What may be familiar is a general layout: a central quadrangle, surrounded by a white chapel on the left and a fountain in the middle and two brick classroom buildings on the right. Although no specific feature is identical to one with which you are familiar, the general layout follows a well-etched mental template. As with other déjà vu interpretations, this one also reaches back over a century (Sander, 1874), and Dashiell (1937) includes a great street-scene visual illustration of how this could work (cf. Brown, 2004).
3.3.1. Familiarity without Identification Research
Cleary, Ryals, and Nomi (2009) designed a clever study to evaluate this gestalt model of déjà vu. But before describing this study, some background on Cleary’s (2004, 2008) research would help. In her study of familiarity-based recognition, or recognition without identification (Cleary, 2008), a general sense of familiarity appears to guide recognition decisions, even when we do not have access to the specific prior experience which elicits this feeling. To illustrate this, if subjects first study a list of celebrity names, and then provide celebrity names to face cues, subjects can discriminate between the celebrity names which did, versus did not, appear in the initial name list, even when they cannot produce the celebrity’s name in phase two (Cleary & Specker, 2007). It is as if the familiarity spread from the person’s name to their face, so that it received implicit activation. This activation was sufficient to support the recognition that it was connected with a prior experience, but insufficient to facilitate name retrieval. Recognition without identification also has been demonstrated with famous scenes. Similar to Cleary and Specker (2007), Cleary and Reyes (2009) had subjects first study names of famous landmarks and locations (Stonehenge, Taj Mahal), and then provide the names for pictures of such places. Among pictures that remained unnamed, subjects could discriminate those whose name had, versus had not, appeared on the prior list. This again illustrates that a sense of prior experience can be triggered by a face or edifice cue, even when the prior experience and specific studied name cannot be recalled.
3.3.2. Gestalt Familiarity Study
Cleary et al. (2009) constructed a direct test of the gestalt theory of déjà vu, using Cleary’s recognition without identification paradigm. Black-and-white line drawing stimuli depicting various scenes were constructed in pairs, resembling each other in overall configuration. A sample configural pair in Figure 5 depicts an arbor (left) and castle drawbridge (right).
Figure 5 Configurally familiar scene pair from Cleary et al. (2009).
Subjects were asked to remember each study scene and the accompanying verbal description of it (arbor). At test, none of the original scenes were shown. Rather, half of the test scenes (castle drawbridge) configurally resembled one of the studied scenes and half did not. The configurally similar scene served as the memory cue, and subjects attempted to identify (provide the label for) the input list picture that it resembled. As before, when subjects were unable to recall a corresponding input scene they still showed evidence of recognition without identification. Familiarity ratings were higher for test scenes that resembled input scenes, compared to those that did not. After each familiarity decision, subjects were asked if they had experienced déjà vu, and these reports mirrored familiarity assessments: déjà vu occurred more often for test scenes resembling input scenes, compared to those with no resemblance. Given that these two ratings were always done in the same order (familiarity, then déjà vu), the familiarity rating may have biased the déjà vu rating. Accordingly, in Experiment 2a, Cleary et al. (2009) had subjects report only déjà vu experiences (no familiarity rating). As before, déjà vu was more likely with configurally related test scenes, compared to unrelated ones. Cleary et al. (2009) argue that their findings suggest that a single process underlies both déjà vu and familiarity. They base this speculation on two lines of evidence. First, configural resemblance produces similar effects for both déjà vu and familiarity. Second, a questionnaire study revealed that 79% of respondents define déjà vu as logical familiarity (re-experiencing something old that you know is old). Only 7% defined déjà vu as illogical familiarity (something new that feels old). This survey outcome should serve as a general caution about assuming that subjects doing déjà vu ratings actually understand the accurate or technical definition of the term.
3.4. Hypnosis
Banister and Zangwill (1941a, 1941b) attempted to elicit déjà vu experiences in the laboratory, to model the implicit memory explanation that déjà vu occurs because this particular experience has happened before but has been forgotten (Brown & Marsh, 2008). They presented pictures (Banister & Zangwill, 1941a) or odors (Banister & Zangwill, 1941b) to hypnotized subjects, followed by a posthypnotic suggestion to forget the encounter. One day later, in a normal waking state, subjects were tested about their recollection (and familiarity) for these same pictures or odors. While this approach holds promise, serious problems exist with this particular application (Brown, 2004). Recently, O’Connor, Barnier, and Cox (2008) conducted an investigation improving on this hypnosis design, using a unique puzzle task as the memory target. All subjects attempted to solve the puzzle while hypnotized. Some were given the posthypnotic suggestion to be amnesic about the puzzle, while others were told that the puzzle would later feel familiar. Later, during a nonhypnotized session, five of six
subjects in the familiarity group experienced a strong sense of déjà vu when encountering this puzzle, whereas none of six subjects in the amnesia group felt strong déjà vu. This study raises the tantalizing possibility that the sense of déjà vu can be recreated in a laboratory setting with the right parameters and procedures (hypnotic suggestion). Cleary (2008; Cleary et al., 2009) has echoed O’Connor et al.’s (2008) optimism, suggesting that given how much familiarity without recollection resembles déjà vu, we may eventually be able to reliably elicit déjà vu using laboratory manipulations which are proven to successfully affect familiarity ratings.
4. Physiological Explanation
Turning to the third class of explanations, one of the earliest interpretations of déjà vu is that it reflects an alteration in the normal brain functions that utilize multiple pathways of information transmission. Osborn (1884) speculated that the sensory signals transmitted from the eyes to the occipital area separate and follow different tracks to the right and left hemispheres. This information then merges together at the occipital lobe to produce one unified perceptual impression. On occasion, the messages become slightly asynchronous, producing a sensation of déjà vu. The slight temporal delay in one track results in two visual impressions rather than one as they arrive successively (rather than together) at their destination. The trailing sensation seems to be a duplication of the first. These transmissions become slightly dysphasic due to a neurological event, such as a slight synaptic deficiency at some point on one of the two pathways. The brain misinterprets this slight separation as reflecting temporally distinct experiences, and the logical interpretation is that the present experience duplicates one from an earlier time and place (Brown, 2004).
4.1. Neural Transmission Asynchrony
Current technology allows an experimental test of this pathway asynchrony. Bogdan Kostic at Colorado State University used brief visual presentations of a common stimulus (words; faces), sent separately to both the right and left hemispheres. An asynchronous presentation of an identical image to both the right (left visual field) and left (right visual field) hemispheres, offset slightly (20 ms apart), should result in an enhanced sense of familiarity. Kostic did find partial support for such familiarity enhancement with presentation asynchrony, but the results were not straightforward. A word presented in the right before the left visual field was judged to be significantly more familiar than
the reverse—left before right. Simultaneous presentation resulted in a familiarity rating intermediate between the two asynchronous conditions. Kostic speculates that the right-first asynchrony enhances familiarity, relative to left-first, due to the left hemisphere advantage in language processing. If this explanation is true, then nonverbal stimuli should result in left-first familiarity enhancement, compared to a right-first presentation. Unfortunately, face stimuli did not result in a left-first advantage, with no familiarity rating difference between asynchronous and simultaneous presentation. These findings are very intriguing, but Kostic points out that the length of the delay between presentations that he used (20 ms) may be too long, and that endogenous delays in the nervous system that produce this outcome may be much shorter.
4.2. Surgical Elimination of Déjà Vu
The earliest scientific research on déjà vu was based on the assumption that it indicates brain pathology: seizure activity currently exists or is likely to develop. This speculation originated from the observation that some individuals with temporal lobe epilepsy (TLE) experience déjà vu in their preseizure aura (Brown, 2003, 2004), but the accumulated data do not support a stronger conclusion of brain pathology. Despite this erroneous early assumption, research on TLEs has continued to provide useful evidence about the nature of recognition processes involved with false familiarity. Bowles, Crupi, Mirsattari, Pigott, Parrent, et al. (2007) describe a young woman who developed TLE in her preteen years, and her preseizure auras routinely included déjà vu. These seizures could not be managed by medication, and surgical correction was required. The surgery removed a brain tumor and surrounding tissue, which included the amygdala, entorhinal cortex, and perirhinal cortex. Both her seizures and déjà vu experiences were eliminated. But an interesting result of surgery is that her ability to assess familiarity was eliminated, while recollection was preserved. Using experimental tests involving list learning procedures with the remember/know task (Gardiner, Ramponi, & Richardson-Klavehn, 1998), the patient performed better than a control group on recollection (do you recall the item’s presentation?) while showing a pathological absence of familiarity (does the item seem familiar?). This was confirmed across four cognitive tasks using a variety of different encoding and response manipulations. The clear implication of Bowles et al. (2007) is that déjà vu is associated with a separate cognitive system that governs familiarity, apart from brain structures involved with contextually guided recognition evaluations.
4.3. Surgical Elicitation of Déjà Vu
A second study dovetails nicely with Bowles et al. (2007). Prior to surgically removing tissue in epileptics, surgeons often implant depth electrodes in various areas of the brain that appear to be the origin sites of seizure activity.
These electrodes can both stimulate and record electrical activity. While procedural sophistication has evolved over recent years, the accumulated findings have not provided a reasonably precise or replicable picture concerning where déjà vu experiences may originate (Brown, 2004). While déjà vu can be created through stimulation of electrodes planted in and around the temporal area, inconsistent results and procedural problems (e.g., spread of stimulation) cloud these findings. A recent study is notable for the reliability with which it was able to elicit déjà vu in TLEs. Bartolomei, Barbeau, Gavaret, Guye, McGonigal, et al. (2004) found that déjà vu experiences could be triggered via stimulation of the rhinal cortex in seven (of 24) patients, and that repeated stimulation produced the same déjà vu response. Replicable electrical elicitation of déjà vu was a first, but they were also able to differentiate between the perirhinal and entorhinal cortices. Recall that Bowles et al. (2007) (above) discovered that removal of both perirhinal and entorhinal cortices eliminated déjà vu (and familiarity) in their patient. Bartolomei et al. were able to differentiate between these two structures by finding that the entorhinal cortex is the key: 3% of perirhinal stimulations resulted in déjà vu, whereas 17% of entorhinal stimulations elicited déjà vu. A second investigation implicates other areas that may be involved in déjà vu, or at least capable of creating the sensation through indirect pathways via spillover activation. Kovacs, Auer, Balas, Zambo, Klivenyi, et al. (2009) present a case study where déjà vu was repeatedly elicited through stimulating the globus pallidus. Remarkably, this woman had never previously experienced a déjà vu. However, there are several qualifications on this report.
Déjà vu only occurred with a relatively high level of electrical stimulation, raising the possibility that the experience resulted from indirect activation of neighboring brain regions. Furthermore, the illusions only happened with her eyes open, and were reported only in response to a direct query. She would not volunteer reports of déjà vu, and only acknowledged it if asked. Data from this particular patient must also be qualified by an early brain injury that altered her normal hemispheric language lateralization. Thus, this patient provides additional evidence that déjà vu can be reliably elicited through stimulation of a single brain location, but the specific role of the globus pallidus needs further verification.
5. Reports in Anomalous Individuals
5.1. Blindness
Déjà vu research has primarily emphasized the visual dimension in anecdotal reports, theoretical speculation, and empirical demonstrations (Brown, 2004; Brown et al., 1994; Neppe, 1983). However, many reports involve an auditory component, particularly where a conversation seems eerily familiar:
. . . an impression that we have previously been in the place where we are at the moment, or a conviction that we have previously said the words we are now saying, while as a matter of fact we know that we cannot possibly have been in a given situation, nor have spoken the words. Angell (1908, p. 235)
The visual bias in déjà vu may stem from the fact that most cognitive research involves visual rather than auditory processing, thus naturally pulling theoretical speculation in this direction. With this context in mind, O’Connor and Moulin (2008) document déjà vu in a male who has been blind since birth, and who reports that “hearing and touch and smell often seem to intermingle in the déjà vu experiences” (p. 247). It would be very useful for our understanding of déjà vu if our current theoretical interpretations could be applied to, or tested in, other sensory modes. For example, could the split-perception paradigm that has been successful with visual materials (Brown & Marsh, 2009) extend to auditory identity priming? Would a brief and barely audible (at threshold) presentation of a word, just prior to a clear presentation, result in enhanced familiarity? Perhaps the single-element familiarity (Brown & Marsh, 2007) research with visual symbols could be modeled by presenting an auditory fragment (“bah”) preceding the full spoken version (“bottle”). And full spoken phrases, sentences, or short paragraphs might be a viable extension of the visual implicit memory demonstration of déjà vu (Brown & Marsh, 2008).
5.2. Chronic Déjà Vu
Two recent case studies report chronic déjà vu in four individuals who experience the sensation on essentially a daily basis. Given that déjà vu happens only a few times a year even in those most prone (Brown, 2004), a daily rate is extraordinary. This is even more exceptional because all persons in these reports are middle aged or older, an age range where déjà vu experiences are rare. In one report, O’Connor and Moulin (2008) document a 39-year-old TLE patient who experiences déjà vu up to three times per day, always associated with the preseizure aura. This annoyance motivated the patient to try active strategies to terminate the sensation: turning his attention to something else, or looking away from what he judged to be the eliciting visual stimulus. These efforts were to no avail, as déjà vu “follows my line of vision and hearing” (p. 145). O’Connor and Moulin (2008) suggest that this argues against a data-driven (bottom-up) etiology of déjà vu. They reason that if déjà vu were caused by visual sensations, then altering such stimulation should end déjà vu. Although a reasonable position, evidence against an external perceptual trigger does not prove that it can never occur through this route, only that it is not the exclusive triggering stimulus for a déjà vu. A second report of chronic déjà vu describes three elderly subjects, all 65 or older (Thompson, Moulin, Conway, & Jones, 2004), whose frequent
déjà vu experiences made everyday living problematic. They discontinued routine daily activities like watching TV, reading the newspaper, or listening to the radio because it felt as if they had seen or heard these before. Similar to O’Connor and Moulin (2008), Thompson et al. suggest that such cases demonstrate that déjà vu is a central nervous system dysfunction, unrelated to specific external perceptual triggers. Incidentally, each of these subjects had some brain pathology (atrophy; hemorrhage), and it is unclear how this might relate to the chronicity of the memory illusion. As a segue to the following section, Thompson et al. propose that their clinical observations suggest that déjà vu increases as one ages, a position counter to a large body of evidence (Brown, 2003, 2004). They further suggest that the prevalence of déjà vu is underreported because it gets lost in the higher incidence of many other more serious memory problems that pop up as one ages.
6. Continuing Issues
6.1. Aging
One of the biggest empirical puzzles about déjà vu is its decline with age. This systematic decrease is reflected in the percentage of individuals who admit to ever having a déjà vu experience (Chapman & Mensh, 1951), and for individuals who do have déjà vu the incidence of the experience declines across their life (Brown, 2003, 2004). Superficially, these findings appear contrary to general findings regarding aging and memory. Familiarity assessment seems to remain relatively stable with age, whereas the capacity to recollect specific temporal and contextual details about experiences decreases (Mantyla, 1993). Déjà vu represents a strong sense of subjective familiarity in the absence of any objective evidence, and these two functions should show greater divergence as we grow older. Thus, déjà vu should increase rather than decrease with age. What are some possible reasons for the observed decline? It may be a measurement issue, involving age-related changes in recall (déjà vu is more apt to be forgotten), response bias (older adults are more reticent to admit to déjà vu), or cohort (older adults are less aware of the concept) (Brown, 2004). However, it is also possible that older adults learn to rely more on familiarity than recollection, given that the former memory function is more stable. Thus, they are more likely to dismiss a discrepancy between familiarity and the absence of recollection (Cleary, 2008). Also, older adults may be less attentive to details of their surroundings that could possibly trigger a déjà vu, and they may also visit fewer places on a regular basis (and thus experience fewer possible triggers). Finally, in the face of an overall increase in memory difficulties, subtle issues like déjà vu may not be as noticeable. Incidentally, Thompson et al. (2004)
propose that déjà vécu, a variant of déjà vu where the present experience seems to have been lived through before, increases with age. They base this upon their impression of older adults who come to their memory clinic, and further propose that the experience is underreported by older adults (see above).
6.2. Dreams
Following the appearance of an article on déjà vu research in a major national newspaper, over 500 e-mails poured in. Most were diligently answered, even though the senders’ desire for a definitive explanation could not be satisfied. The most curious dimension of these reactions from the general public is that most felt that the “prior experience” had occurred in a dream. Survey data show that one in five college students agree with this dream-origin interpretation. This dream impression needs to be logically explained, in order to remove déjà vu from the realm of the occult (cf. Brown, 2004). Our best hunch is that the surreal impression created by a déjà vu fits with the cognitive texture of a dream, rather than a real experience, and finding ways to specify this more empirically would be helpful in the development of research on déjà vu.
6.3. Single versus Multiple Causes
Several published reports openly challenge the notion that déjà vu is initiated by external stimuli, and suggest that it is only triggered by a biological dysfunction (O’Connor & Moulin, 2006, 2008; Thompson et al., 2004). All cognitive interpretations discussed earlier (split perception, implicit memory, single-element, gestalt) are predicated on the assumption that déjà vu is initiated by an external perceptual experience. The difference is whether that stimulus connects with itself from a few moments ago (split perception), the same scene experienced weeks or years ago (implicit memory), a piece of a prior real experience (single-element), or a familiar format (gestalt). The alternative is that déjà vu is all in the brain. Support for this alternative position is drawn from individuals where the déjà vu experience: (a) occurs with extraordinary frequency, (b) is not tied to the physical setting, and (c) cannot be ended or altered by willfully changing the perceptual input (O’Connor & Moulin, 2008; Thompson et al., 2004). We believe that there are multiple possible causes for déjà vu. Just as a stomach ache can have different causes (e.g., overconsumption, flu, food poisoning, medications, stress), the same is true of déjà vu (Brown, 2003, 2004). If a déjà vu experience can be identified as a likely result of one possible mechanism, this does not necessarily rule out others (cf. Cleary et al., 2009). Similarly, forgetting where you put your car keys could be
58
Alan S. Brown and Elizabeth J. Marsh
traced to biological (fatigue, stress, low blood sugar) as well as psychological (distraction, multitasking) circumstances. Proving one cause for a particular incident does not rule out other possibilities. There is a considerable amount of accumulating evidence supporting de´ja` vu as caused through data-driven procedures: split perception (Bernstein & Welch, 1991; Brown & Marsh, 2009; Jacoby & Whitehouse, 1989), implicit memory (Brown & Marsh, 2008), single-element familiarity (Brown & Marsh, 2007), and gestalt resemblance (Cleary et al., 2009). No theory of de´ja` vu should be eliminated as precondition of accepting another. We are in an early phase in the exploration of this experience, and different interpretations can provide a rich source of ideas that may yield important findings to cognitive phenomena apart from de´ja` vu.
6.4. Jamais Vu
Normally, we experience a perfect alignment between objective and subjective recognition: things that we know feel familiar and settings/people that have not been experienced feel unfamiliar. Déjà vu is a mismatch between the two, with positive subjective recognition in the face of negative objective recognition. Jamais vu is the opposite: negative subjective recognition contrasted with positive objective recognition. For example, you walk into the dining room in the home that you grew up in, and it appears momentarily unfamiliar as if you are seeing it for the first time. Jamais vu is much rarer than déjà vu, and research on the subject is scant with only a few published reports on its nature or incidence (cf. Brown, 2004).
Jamais vu was briefly noted in Jacoby and Whitehouse (1989) but their speculation did not get traction in subsequent research. The most captivating aspect of their study (discussed earlier) was that a brief presentation of a prime identical to the immediately succeeding target word enhanced its perceived familiarity. Another finding, however, caught the attention of Jacoby and Whitehouse. In their different prime condition, where the preceding prime word differed from the target, the likelihood of a false alarm decreased relative to the control (no word) condition. Jacoby and Whitehouse suggest that:
the processing of a test word is disrupted when its presentation is preceded by a nonmatching context word, and this reduction in fluency gives rise to a lack of familiarity, a feeling of strangeness. (p. 134)
Although found in their Experiment 1, this difference disappeared in Experiment 2. Nonetheless, if fluency enhancement can artificially enhance false positive recognition, why should the opposite not happen? This would provide a tidy symmetry to déjà vu and jamais vu, but such does not seem to be the case.
The lifetime incidence of jamais vu is much lower than that of déjà vu among college students. Whereas it is difficult to find an undergraduate who has not experienced déjà vu, barely a third of students admit to having experienced jamais vu (Brown, 2004). However, there is a more common experience that resembles jamais vu: word blindness. A survey at SMU revealed that most college students (60%; N = 167) have experienced a familiar word suddenly looking unrecognizable, so that it momentarily appears to be a nonword. Females report this more than males (56% vs. 38%) and older students (junior/senior) more than younger students (freshman/sophomore) (58% vs. 35%). Of those who admit to word blindness, most have it at least every few months. Words that respondents report becoming “blind” to are surprisingly simple, such as “were, through, is, of, mine, grow, from, actual.” A few were longer (“preservation, statutory”), and most are abstract nouns or function words.
Also related to jamais vu is semantic satiation, where the meaning of a word dissolves after repeated oral presentations or pronunciations. However, this is a poorer model because meaning dissolves only after forced repetition. Jamais vu, on the other hand, seems to occur without apparent repetition. Jamais vu may be more common than reported, but is not noticed as readily as is déjà vu. Perhaps when subjective unfamiliarity contrasts with objective familiarity, it is not as attention-grabbing as déjà vu or it can be dismissed more easily. Current evidence has not provided a clear link between the two phenomena (Brown, 2004), but this would be a fruitful avenue to pursue to help clarify the mechanisms underlying déjà vu.
7. Concluding Remarks
The déjà vu illusion has received considerable attention over the past century and has stimulated over 40 different interpretations (Brown, 2004). Empirical evaluations of some of these theoretical positions have recently appeared in the published literature. This chapter summarized tentative support for déjà vu as possibly caused by: two perceptions that occur in rapid succession, a momentarily inaccessible prior experience of the present scene, an overly generalized familiarity emanating from one portion of a scene, and a general-form match between the present and a past experience. Evidence from brain pathology and stimulation suggests that we may be close to identifying specific brain structures involved in this illusion of false positive recognition. Examining a subtle cognitive dysfunction like déjà vu among cognitively disturbed or medicated patients will always be difficult (Brown, 2004), but findings relating déjà vu to milder forms of cognitive dysfunction (e.g., dissociation; Adachi, Akanu, Adachi, Adachi, Ikeda, et al., 2008) and medication side effects (Kalra, Chancellor, & Zeman, 2007) may elucidate biological and cognitive dimensions of the experience.
In closing, the accumulating body of intriguing research on déjà vu will hopefully encourage us to spend more effort delving into memory illusions as a means of understanding normal memory function (cf. Roediger, 1996). These obtuse messages from the brain are potentially packed with fascinating secrets about cognitive function.
REFERENCES
Adachi, N., Akanu, N., Adachi, T., Adachi, Y., Ikeda, H., Ito, M., et al. (2008). Déjà vu experiences are rarely associated with pathological dissociation. Journal of Nervous and Mental Disease, 196, 417–419.
Angell, J. R. (1908). Psychology. New York: Henry Holt.
Banister, H., & Zangwill, O. L. (1941). Experimentally induced olfactory paramnesias. British Journal of Psychology, 32, 155–175.
Banister, H., & Zangwill, O. L. (1941). Experimentally induced visual paramnesias. British Journal of Psychology, 32, 30–51.
Bartolomei, F., Barbeau, E., Gavaret, M., Guye, M., McGonigal, A., Regis, J., et al. (2004). Cortical stimulation study of the role of rhinal cortex in deja vu and reminiscence of memories. Neurology, 63, 858–864.
Bernstein, I. H., & Welch, K. R. (1991). Awareness, false recognition, and the Jacoby–Whitehouse effect. Journal of Experimental Psychology: General, 120, 324–328.
Bowles, B., Crupi, C., Mirsattari, S. M., Pigott, S. E., Parrent, A. G., Pruessner, J. C., et al. (2007). Impaired familiarity with preserved recollection after anterior temporal-lobe resection that spares the hippocampus. Proceedings of the National Academy of Sciences of the United States of America, 104(41), 16382–16387.
Brown, A. S. (2003). A review of the déjà vu experience. Psychological Bulletin, 129, 394–413.
Brown, A. S. (2004). The déjà vu experience. New York: Psychology Press.
Brown, A. S., & Marsh, E. J. (2007). Object familiarity can be altered in the presence of other objects. Paper presented at the annual convention of the Psychonomic Society, Long Beach, CA.
Brown, A. S., & Marsh, E. J. (2008). Evoking false beliefs about autobiographical experience. Psychonomic Bulletin & Review, 15, 186–190.
Brown, A. S., & Marsh, E. J. (2009). Creating illusions of past encounter through brief exposure. Psychological Science, 20, 534–538.
Brown, A. S., Porter, C. L., & Nix, L. A. (1994). A questionnaire evaluation of the déjà vu experience. Paper presented at the annual convention of the Midwestern Psychological Association, Chicago, IL.
Chapman, A. H., & Mensh, I. N. (1951). Déjà vu experience and conscious fantasy in adults. Psychiatric Quarterly Supplement, 25, 163–175.
Cleary, A. M. (2004). Orthography, phonology, and meaning: Word features that give rise to feelings of familiarity in recognition. Psychonomic Bulletin & Review, 11, 446–451.
Cleary, A. M. (2008). Recognition memory, familiarity, and déjà vu experiences. Current Directions in Psychological Science, 17, 353–357.
Cleary, A. M., & Reyes, N. L. (2009). Scene recognition without identification. Acta Psychologica, 131, 53–62.
Cleary, A. M., Ryals, A. J., & Nomi, J. S. (2009). Can déjà vu result from similarity to a prior experience? Support for the similarity hypothesis of déjà vu. Psychonomic Bulletin & Review, 16, 1082–1088.
Cleary, A. M., & Specker, L. E. (2007). Recognition without face identification. Memory & Cognition, 35, 1610–1619.
Dashiell, J. F. (1937). Fundamentals of general psychology. Boston, MA: Houghton Mifflin Company.
Dickens, C. (1849). The personal history of David Copperfield. London: Penguin Books.
Funkhouser, A. T. (1983). A historical review of déjà vu. Parapsychological Journal of South Africa, 4, 11–24.
Gardiner, J. M., Ramponi, C., & Richardson-Klavehn, A. (1998). Experiences of remembering, knowing, and guessing. Consciousness and Cognition: An International Journal, 7, 1–26.
Gellatly, A., Banton, P., & Woods, C. (1995). Salience and awareness in the Jacoby–Whitehouse effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1374–1379.
Jacoby, L. L., & Whitehouse, K. (1989). An illusion of memory: False recognition influenced by unconscious perception. Journal of Experimental Psychology: General, 118, 126–135.
Jones, T. C., & Atchley, P. (2006). Conjunction errors, recollection-based rejections, and forgetting in a continuous recognition memory test: Little evidence for recollection. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 374–379.
Joordens, S., & Merikle, P. M. (1992). False recognition and perception without awareness. Memory and Cognition, 20, 151–159.
Kalra, S., Chancellor, A., & Zeman, A. (2007). Recurring déjà vu associated with 5-hydroxytryptophan. Acta Neuropsychiatrica, 19, 311–313.
Klinger, M. R. (2001). The roles of attention and awareness in the false recognition effect. American Journal of Psychology, 114, 93–114.
Kovacs, N., Auer, T., Balas, I., Zambo, K., Klivenyi, P., Horvath, K., et al. (2009). Neuroimaging and cognitive changes during déjà vu. Epilepsy & Behavior, 14(1), 190–196.
Mack, A. (2003). Inattentional blindness: Looking without seeing. Current Directions in Psychological Science, 12, 180–184.
Mack, A., & Rock, I. (1998). Inattentional blindness. Cambridge, MA: The MIT Press.
Mantyla, T. (1993). Knowing but not remembering: Adult age differences in recollective experience. Memory & Cognition, 21, 379–388.
Marsh, E. J., & Brown, A. S. (2010). Becoming familiar by the company you keep. Under review.
Merikle, P. M., Smilek, D., & Eastwood, J. D. (2001). Perception without awareness: Perspectives from cognitive psychology. Cognition, 79, 115–134.
Neppe, V. M. (1983). The psychology of déjà vu: Have I been here before? Johannesburg, South Africa: Witwatersrand University Press.
O’Connor, A. R., Barnier, A. J., & Cox, R. E. (2008). Déjà vu in the laboratory: A behavioral and experiential comparison of posthypnotic amnesia and posthypnotic familiarity. International Journal of Clinical and Experimental Hypnosis, 56, 425–450.
O’Connor, A. R., & Moulin, C. J. A. (2006). Normal patterns of déjà experience in a healthy, blind male: Challenging optical pathway delay theory. Brain and Cognition, 62, 246–249.
O’Connor, A. R., & Moulin, C. J. A. (2008). The persistence of erroneous familiarity in an epileptic male: Challenging perceptual theories of déjà vu activation. Brain and Cognition, 68, 144–147.
Osborn, H. F. (1884). Illusions of memory. North American Review, 138, 476–486.
Payne, B. K., Cheng, C. M., Govorun, O., & Stewart, B. (2005). An inkblot for attitudes: Affect misattribution as implicit measurement. Journal of Personality and Social Psychology, 89, 277–293.
Roediger, H. L. (1996). Memory illusions. Journal of Memory and Language, 35, 76–100.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803–814.
Sander, W. (1874). Ueber erinnerungstäuschungen. Archiv für Psychiatrie und Nervenkrankheiten, 4, 244–253.
Thompson, R. G., Moulin, C. J. A., Conway, M. A., & Jones, R. W. (2004). Persistent déjà vu: A disorder of memory. International Journal of Geriatric Psychiatry, 19, 906–907.
Titchener, E. B. (1928). A text-book of psychology. New York: Macmillan.
CHAPTER THREE
Spacing and Testing Effects: A Deeply Critical, Lengthy, and At Times Discursive Review of the Literature
Peter F. Delaney, Peter P. J. L. Verkoeijen, and Arie Spirgel

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53003-2. © 2010 Elsevier Inc. All rights reserved.

Contents
1. Introduction
2. A Field Guide to the Spacing Literature: Spotting Impostors
   2.1. Recency Effects
   2.2. Intentional Learning and Mixed Lists: Rehearsal Effects and Strategy-Switching
   2.3. Primacy and Recency Buffers: The Zero-Sum Effect
   2.4. Deficient-Processing Effects
   2.5. Incidental Learning and Mixed Lists: List-Strength Effects
   2.6. Summary: The Impostor Effects and Confounds in Spacing Designs
3. The Failure of Existing Spacing Theories
   3.1. Intention Invariance
   3.2. Age-Invariance
   3.3. Species Invariance
   3.4. The Glenberg Surface
   3.5. Deliberate Contextual Variability at the Item Level Doesn’t Help
   3.6. Recognition Required for Spacing Benefits
   3.7. Semantic and Perceptual Priming Accounts for Cued-Memory Tasks
   3.8. Hybrid Accounts
   3.9. Summary: Theories and Key Phenomena
4. Extending a Context Plus Study-Phase Retrieval Account of Spacing Effects
   4.1. An Account of the List-Strength Effect Using SAM
   4.2. A Modified One-Shot Account of Spacing?
   4.3. Some Experiments Linking Context and Spacing
   4.4. Directed Forgetting as a List-Strength Phenomenon
   4.5. Summary and Untested Predictions of the Account
5. The Testing Effect
   5.1. Early Research: Tests Slow Forgetting
   5.2. The Importance of Retention Interval
   5.3. The Return of Deficient-Processing Accounts
   5.4. Transfer-Appropriate Processing Accounts
   5.5. Retrieval Effort and Desirable Difficulty
   5.6. Why Does Testing Help More Than Restudy?
   5.7. Testing Effects for Integrated Stimuli
   5.8. Summary: The Testing Effect
6. Spacing and Testing in Educational Contexts
   6.1. Do Spacing and Testing Improve Learning or Just Memory?
   6.2. How Prevalent Are Spacing and Testing in Classroom Settings?
   6.3. How Can One Improve Learners’ Use of Spacing and Testing?
   6.4. Are There Individual Differences in Spacing and Testing?
7. Conclusions
References
Abstract
What appears to be a simple pattern of results (distributed-study opportunities usually produce better memory than massed-study opportunities) turns out to be quite complicated. Many “impostor” effects such as rehearsal borrowing, strategy changes during study, recency effects, and item skipping complicate the interpretation of spacing experiments. We suggest some best practices for future experiments that diverge from the typical spacing experiments in the literature. Next, we outline the major theories that have been advanced to account for spacing studies while highlighting the critical experimental evidence that a theory of spacing must explain. We then propose a tentative verbal theory based on the SAM/REM model that utilizes contextual variability and study-phase retrieval to explain the major findings, as well as predict some novel results. Next, we outline the major phenomena supporting testing as superior to restudy on long-term retention tests, and review theories of the testing phenomenon, along with some possible boundary conditions. Finally, we suggest some ways that spacing and testing can be integrated into the classroom, and ask to what extent educators already capitalize on these phenomena. Along the way, we present several new experiments that shed light on various facets of the spacing and testing effects.
1. Introduction
This chapter reflects our best attempt to review the state of theoretical and empirical knowledge on the family of memory effects that deal with the impact of studying the same thing several times: the distributed-practice family. Extra study opportunities produce better memory, but how we distribute those study opportunities is also important for memory.
The distributed-practice family of effects comprises a variety of phenomena, including the spacing effect, lag effect, and testing effect. Cognitive psychologists have produced hundreds of papers over the last century arguing that there is a spacing effect, that is, a memory advantage to restudying something with a delay between the repetitions compared to immediate restudy. The spacing effect is often viewed as an instance of the broader lag effect, in which longer spacing intervals are associated with changes in later recall. Specifically, the lag effect reveals that short spacing results in lower recall relative to moderate spacing, and that recall begins to decline again at very long spacing. Finally, the spacing effect’s first cousin is the testing effect, which refers to the advantage of testing an item relative to just studying it again. Thus, the distributed-practice family includes several of memory theory’s favored children because of their obvious implications for improving education.
We intend this chapter to serve as a comprehensive review of the spacing and testing literature and their associated theories, circa 2010. We are due for a long narrative review of the spacing literature anyway. This review, like many others, culminates with a theoretical proposal that attempts to explain the vast range of empirical results in the spacing literature. We also present some new data and draw attention to some recent papers whose importance might otherwise be missed. What we think our review contributes beyond that is a careful experimental analysis of the task used in spacing experiments: verbal list learning. No one is inherently excited about word lists, but they have been used in the preponderance of studies on the spacing effect, and therefore understanding what people are doing in these experiments is critical.
We will take the rather strange stance that there is a “real” spacing effect somewhere and that all of the other phenomena (e.g., rehearsal borrowing, strategy changes during study) are “impostors” that masquerade as the spacing effect. Just because many different phenomena have a similar observable outcome (namely, better memory for spaced repetitions than for massed repetitions) does not mean that all of these phenomena are the same. It would be like arguing that giving extra study time and asking people to process items for survival value are “really the same thing” because they both result in better memory for studied items. We cannot rely on similar outcomes in recall rates as the sole diagnostic criterion for identifying the spacing effect. For example, “deficient processing” accounts of spacing propose that when people encounter a massed repetition, they exert less encoding effort on the second presentation than they do on the second presentation of a spaced item. Several studies have demonstrated that deficient processing does happen in some cases, and it produces a spacing effect. Furthermore, the deficient-processing effect can be discriminated from other spacing effects because it weakens the benefit of massed repetitions over single presentations rather than enhancing the recall of spaced items relative to massed repetitions. Therefore, although it is phenomenologically similar, the deficient-processing effect is not the “real” spacing effect; it is an impostor.
These impostors often produce effect sizes as large as or larger than the “real” spacing effect. Furthermore, they may operate in the same direction as the real spacing effect, thereby greatly exaggerating its impact, or they may operate in the opposite direction from the real spacing effect, canceling it out. Without a careful experimental analysis of participants’ behavior during the verbal learning task, it is quite difficult to understand the circumstances under which the “real” spacing effect occurs and the circumstances under which it does not. This confusion has produced a bewildering thicket of experimental results that seemingly contradict one another. In this chapter, we do our very best to untangle the thicket briar by briar, identifying the impostor phenomena and providing guidelines for running future impostor-free spacing experiments.
A crucial part of this effort involves the analysis of the strategies that participants use when they study lists of words. For the past 10 years, our laboratories have worked to understand what people do when they encounter an instruction to “study words for a later memory test” and how the strategies they choose interact, often in surprising ways, with the number of lists people study, whether items are repeated in a massed or spaced fashion, and whether massed and spaced repetitions are mixed together on the list or kept on separate lists. We think that very few experimental studies meet rigorous standards for comparing theoretical views about the “true” cause of spacing effects, because human participants do not cooperate with researchers by “just behaving normally” during memory experiments. Instead, they devise a variety of clever strategies for memorizing lists of words, and these strategies interact in surprising ways with the structure of the lists to affect memory. For example, we will see that rote rehearsal strategies sometimes enhance and sometimes reduce the impact of spacing, depending on the structure of the list.
2. A Field Guide to the Spacing Literature: Spotting Impostors
There have been three major meta-analyses of the spacing literature conducted in the past decade (Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006; Donovan & Radosevich, 1999; Janiszewski, Noel, & Sawyer, 2003), which produced conflicting results that depend on what studies were included. The most comprehensive meta-analysis of verbal learning was the most recent (Cepeda et al., 2006), which identified confounds in some earlier studies and included the largest number of studies. For each study, they assessed the lag between repetitions (i.e., how much time passes between each repetition), and the retention interval (i.e., how much time passes between the last repetition and the test). For any given retention interval, there is an optimal lag between repetitions that maximizes memory. Shorter-than-optimal and longer-than-optimal lags between repetitions produce suboptimal memory. Furthermore, as the retention interval increases, so does the optimal lag between repetitions. Therefore, memory is a function of both the retention interval and the lag between repetitions. Finally, they found that the same pattern held for both free recall and cued-recall tests.
Their analysis represents the best current conclusions regarding the spacing effect. However, their reliance on verbal learning data is problematic due to the large number of confounds present in existing spacing studies. Specifically, there are a variety of impostor spacing effects that deserve their own names, and should be carefully watched for in studies that attempt to measure the “true” spacing effect. Table 1 outlines the major phenomena we will review here that affect the conclusions of many spacing studies.
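The two timing quantities used in the meta-analysis, lag and retention interval, can be made concrete with a minimal sketch. It assumes a study schedule recorded as (onset time, item) pairs and measures lag as the difference between successive onsets of the same item; the function name, the schedule, and the time units are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def lag_and_retention(schedule, test_time):
    """Summarize each item's lags and retention interval.

    schedule  -- list of (onset_time, item) presentations (hypothetical format)
    test_time -- time at which the memory test begins
    Lag is measured here as the difference between successive onsets of the
    same item; retention interval runs from the last onset to the test.
    """
    onsets = defaultdict(list)
    for onset, item in schedule:
        onsets[item].append(onset)

    summary = {}
    for item, times in onsets.items():
        times.sort()
        lags = [later - earlier for earlier, later in zip(times, times[1:])]
        summary[item] = {
            "lags": lags,
            "retention_interval": test_time - times[-1],
        }
    return summary


# Illustrative schedule: "cat" is massed, "dog" is spaced, "pen" appears once.
schedule = [(0, "cat"), (1, "cat"), (2, "dog"), (3, "pen"), (6, "dog")]
summary = lag_and_retention(schedule, test_time=10)
```

In this toy schedule the spaced item also ends up with a shorter retention interval than the massed item, which is one reason lag and retention interval have to be tracked separately.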
Table 1. Five Impostors: Spacing-Like Phenomena.
1. The recency effect. Even if we control rehearsal, there is an extended recency function. Failing to account for this can artificially enhance the memory of spaced items, because their last presentations are more recent and therefore stronger.
2. Rehearsal-borrowing effects on mixed lists. Mixed lists encourage rehearsal borrowing, which artificially inflates the spacing effect on mixed lists relative to pure lists. The degree of borrowing varies depending on list structure as well, so one can create some super-spaced items unintentionally. This effect is often wrongly discounted as unimportant because spacing effects emerge in incidental learning, and because people often change encoding strategies during study (see Delaney & Knowles, 2005).
3. The zero-sum effect on pure lists. Because people rehearse during study, there is no guarantee, particularly with pure-list designs, that the primacy items won’t receive differential practice on some types of lists compared to others. A spacing effect occurs on pure lists if you throw away the beginning of the list, but only because the beginning of the list benefits tremendously from displaced rehearsal on all-massed lists.
4. Deficient-processing effects. There is a family of deficient-processing effects, including the deficient-processing effect, in which processing is reduced; the Rose effect, in which people choose to spend less time on massed items when they have control over study time; and the speed effect, in which too-fast presentation rates encourage people to mass items or to skip spaced items.
5. List-strength effects. In free recall, there are output effects at recall that favor spaced items over massed items. These effects appear only on mixed lists and vanish on pure lists.

2.1. Recency Effects
It is fitting to begin with recency effects, an impostor that is so well known that it stars in virtually every introductory psychology textbook’s discussion of memory. The problem with recency confounds in spacing studies is an old one in the literature, highlighted by Crowder’s (1976) review. Specifically, because spaced items must occur in multiple locations on the list, their final presentation tends to be more recent than an equal number of massed items unless care is taken to equate the final positions. Because recent items are more easily recalled than older items, an artifactual spacing effect can be observed. One approach to solving this problem, whose discovery was attributed to Melton (1967) by Crowder, was to use primacy and recency “buffer” items that would not be tested, or just not counted for free recall. In fact, this approach was used earlier by Waugh (1962), but it is not terribly effective at controlling recency. Zimmerman (1975), for example, found an extended recency function that produced 20% higher recall for later-presented than earlier-presented items, even though he included primacy and recency buffers. He required participants to focus on only the current item, which eliminated the primacy effect, but resulted in an extended recency function.
Even in recent work, recency control has been a problem. Toppino and Bloom (2002), in their Experiment 1, replicated an experiment of Greene (1989) that compared free recall following incidental and intentional learning. The lists contained some massed and some spaced items, with spaced items of varying lag. Greene tried to control for recency biases by counterbalancing the assignment of words to quadrants of the list. The Toppino and Bloom study was virtually an exact replication of the experiment, except that it more carefully controlled recency by controlling the position of the second presentation of words instead of just the quadrant. Surprisingly, this subtle change eliminated the spacing effect for incidental learning observed by Greene. The study highlights the fact that seemingly minor recency biases can inflate or deflate the magnitude of spacing effects, altering our conclusions about the magnitude of the spacing effect, or even its presence or absence under varying conditions.
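The arithmetic behind the recency confound can be sketched in a few lines. The example assumes a hypothetical design in which massed and spaced items are matched on the position of their first presentation, and it counts lag as the number of list positions intervening between the two presentations; the positions and the lag of 5 are illustrative values, not taken from any of the studies discussed.

```python
def final_positions(first_positions, lag):
    """List position of each item's second (final) presentation.

    lag is the number of list positions intervening between the two
    presentations, so lag 0 corresponds to a massed repetition.
    """
    return [position + lag + 1 for position in first_positions]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical design: both item types share the same first positions.
first_positions = [0, 10, 20, 30]
massed_mean = mean(final_positions(first_positions, lag=0))
spaced_mean = mean(final_positions(first_positions, lag=5))

# How much closer to the test the spaced items' final presentations fall.
recency_gap = spaced_mean - massed_mean
```

Under these assumptions the spaced items' final presentations fall later in the list by exactly the lag, so matching conditions on the position of the second presentation, rather than the first, is what removes the gap.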
2.2. Intentional Learning and Mixed Lists: Rehearsal Effects and Strategy-Switching
Our second spacing impostor is the rehearsal-borrowing effect. Like recency, rehearsal is a well-known phenomenon, but it also provides a convincing impostor spacing effect when rehearsal favors spaced items over massed items. It is a serious problem for most spacing studies, because most spacing studies use a mixed-list design, meaning that they have massed repetitions and spaced repetitions on the same list. Furthermore, they use nonspecific instructions to study the words on the list for a later memory test, and therefore they do not really control how long people study each item. Such designs encourage rehearsal borrowing that redistributes study time away from massed items and awards it to spaced items. The obvious result of spending a much longer time in studying the spaced items than the massed items is that the spaced items are better remembered on a test. Hall (1992a) went so far as to revive the theory that rehearsal borrowing was the only mechanism necessary to explain the emergence of a spacing effect in most studies.
The borrowing explanation was first advanced in the original Atkinson and Shiffrin (1968) “modal model” paper. Atkinson and Shiffrin argued that people comply with instructions to study a list of words by reading each item and then rehearsing earlier-presented items in a short-term memory buffer. Because the buffer had limited capacity, adding new items to the rehearsal buffer resulted in dropping some earlier words. The time in the buffer (equivalent to the number of rehearsals the item received) would then predict its later strength and hence probability of final recall. Such a mechanism would naturally produce a spacing effect and a lag effect, because spaced items (but not massed items) appear in multiple places on the list. The longer the lag between presentations, the more likely it was that the item had already received a “full run” through the buffer when it was next encountered. Upon being refreshed, it would get a new run through the buffer, receiving extra rehearsals. However, massed items appear in only one location on the list, and therefore get only one “full run” through the rehearsal buffer. The result is more rehearsals for spaced than for massed items.
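The buffer mechanism just described can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions: the capacity of three items, the rule that the oldest item is displaced, and the accounting of one rehearsal per buffered item per presentation are simplifications chosen for illustration, not details taken from Atkinson and Shiffrin (1968). The study list is also hypothetical.

```python
from collections import Counter, OrderedDict


def simulate_rehearsals(study_list, capacity=3):
    """Count rehearsals per item in a toy fixed-capacity rehearsal buffer.

    Assumptions (illustrative only): each presented item enters the buffer,
    the oldest item is displaced when the buffer is full, an item that is
    re-presented while still in the buffer is refreshed (moved to the
    newest slot), and every buffered item gets one rehearsal per step.
    """
    buffer = OrderedDict()  # keys ordered from oldest to newest
    rehearsals = Counter()
    for item in study_list:
        if item in buffer:
            buffer.move_to_end(item)        # refresh: restart the item's run
        else:
            if len(buffer) >= capacity:
                buffer.popitem(last=False)  # displace the oldest item
            buffer[item] = True
        for held in buffer:
            rehearsals[held] += 1
    return rehearsals


# "M" is a massed repetition, "S" a spaced repetition; the rest are fillers.
study_list = ["p1", "p2", "p3", "M", "M", "a", "b",
              "S", "c", "d", "e", "S", "f", "g", "h"]
counts = simulate_rehearsals(study_list)
```

With this list the spaced item receives two full runs through the buffer, while the massed item receives one run extended by a single step, so the spaced item accumulates more rehearsals than the massed item, which in turn accumulates more than a once-presented filler.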
Rundus (1971) verified this prediction using rehearse-aloud protocols, and discovered that the probability of rehearsing an item was directly predictive of its probability of later recall. He further showed that spaced items received more rehearsal than did massed items, demonstrating rehearsal borrowing. If borrowing is pervasive on mixed lists, we would expect that mixed lists greatly overestimate the true benefit of spacing. Furthermore, if Hall’s (1992a) contention were correct and rehearsal borrowing were the only mechanism necessary to explain the spacing effect, then spacing would be virtually useless as a learning tool. The goal of spacing practice is to improve memory for all of the to-be-learned items, not to selectively improve memory for a few of the items at the expense of the rest! Because of this concern, Hall used pure lists—that is, lists composed of only spaced items or only massed items—to see if the spacing effect would disappear once people could no longer borrow time from massed items to help the spaced items. In three experiments, he showed that studying pure lists eliminated the spacing effect on a free recall test, using presentation times ranging from 1 to 4 s per item. Furthermore, compared to a mixed list, the pure lists resulted in lower
Peter F. Delaney et al.
recall of spaced items and higher recall of massed items. The latter result is consistent with Hall’s argument that for mixed lists, rehearsal borrowing awards extra study time to the spaced items at the expense of the massed items. In another study, Hall (1992b) compared pure lists of spaced items with pure lists of once-presented items that were presented for the same total duration. At 2, 4, and 6 s per item presentation rates, he obtained no spacing advantages with free recall tests. Taken together, the results suggested that rehearsal borrowing might be a serious problem for our conclusions about the spacing effect, since virtually all of the studies in the literature use mixed-list designs together with intentional learning. Two later studies seemed to overturn Hall’s (1992a) conclusions, however. An important paper by Toppino and Schneider (1999) demonstrated that you could still get spacing effects on pure lists, provided multiple study lists were employed (with a free recall test after each list). We will later see that the inclusion of multiple lists within the session is important because people change how they study throughout the course of an experiment. Toppino and Schneider also included a condition that used a mixed list, but where each half of the list was pure. That is, the first half of the list contained only spaced or only massed items, while the second half contained the opposite type of item. These “special” lists would presumably reduce the extent of rehearsal borrowing across item types (if that borrowing tended to come from recent items). Their most crucial evidence against the rehearsal-borrowing explanation was that the pure lists and the “special” mixed lists produced relatively similar spacing effects (8% for the “special” mixed lists and 7% for the pure lists). It is worth noting, however, that Hall found that “regular” mixed lists produced spacing effects roughly twice as large (14%).
A later paper by Kahana and Howard (2005) also obtained spacing effects in free recall using pure lists, and further demonstrated that the lag effect was present. Results such as these, especially when combined with earlier papers that obtained spacing effects using pure lists¹ (Underwood, 1969, 1970), seemed to indicate that rehearsal was less important than Hall (1992a) had believed. However, more recent work has suggested that the story is more complicated, and we will discuss this more recent research next.
¹ Underwood’s (1969, 1970) studies were atypical, however, in that they used very long presentation rates (10 s per item) and often many repetitions, which would tend to produce deficient processing effects; see below for more on deficient processing.
2.2.1. People Do Not All Rehearse, and They Change Strategies with Practice
Hall (1992a) assumed that most people comply with the instructions to study words for a later memory test by rehearsing. But do they really? Ironically, there are almost no studies that have asked the straightforward
Spacing and Testing Effects
behavioral question, “What do people do when you tell them to study words for a later memory test?” When we create cognitive models, we typically implicitly assume that people (a) all do pretty much the same thing, and (b) do pretty much the same thing from one trial to the next. To the first author, who was trained in the problem-solving tradition, these assumptions seemed rather flimsy. After all, rather ordinary people can obtain digit spans greater than 70 with a few months’ practice (e.g., Chase & Ericsson, 1981), and they rapidly discover better strategies than rote rehearsal. At the extreme, a memory expert like the memorist Rajan will discover, in just a few days of practice, new mnemonic strategies to deal with memory tasks deliberately created to interfere with his existing mnemonic techniques (Ericsson, Delaney, Weaver, & Mahadevan, 2004). We therefore conducted a series of studies using methods typically reserved for the thinking literature. We asked participants to study lists of words, but afterwards asked them to tell us what they were thinking as they studied the words. We then coded these verbal reports into strategy groups (Delaney & Knowles, 2005; Sahakyan & Delaney, 2003). It turns out that on the first list of words that people study, about 70% use a rote rehearsal strategy in which they read each item as it appears and then rehearse earlier items. However, rote rehearsal is not a terribly effective memory strategy, and if people receive a test after each list, they will often abandon rote rehearsal for something else. The second most frequent strategy after rehearsal was the story mnemonic (Bower & Clark, 1969; Drevenstedt & Bellezza, 1993; Reddy & Bellezza, 1983), in which people make up a story using all the words on the list. There are various other “deep” mnemonics that people use, like linking each word to their own personal experiences or making up sentences using each word.
On the first list, about 16% of participants used a deep encoding strategy. However, by the fourth study list, about equal numbers of people (43–44%) were using a deep strategy and the rote rehearsal strategy. Thus, when people study multiple lists, they tend to abandon rote rehearsal in favor of more effective strategies. Tests are one way to induce people to switch strategies. In fact, you do not even have to explicitly test people; metacognitive judgments or various disruptions of the rehearsal strategy between two lists also result in strategy changes (see Sahakyan & Delaney, 2003, 2005; Sahakyan, Delaney, & Kelley, 2004). Strategy changes favoring better encoding on later-studied lists may also work to ameliorate the deleterious effects of proactive interference build-up in cases when people are instructed to study word lists without any specific instructions on the strategy to use during study (Szpunar, McDermott, & Roediger, 2008). We have summarized the impact of encoding strategy on the magnitude of the spacing effect in Table 2, based on several recent studies conducted in our laboratories. Delaney and Knowles (2005) explored the role of study strategy in the spacing effect on pure lists of words. In Experiment 1, they
Table 2 Magnitude of the Spacing Effect in Free Recall by Encoding Strategy and List Type.
Strategy                       Mixed lists    Pure lists
Rehearse each item alone       Small          Small
Rehearse the items together    Large          Null
Story mnemonic                 Large          Small
Note: Assuming a list of 32 items presented twice and free recall testing, a small effect is about a 6% spacing advantage, a large effect is around 15%, and a null effect is less than 2%. Mixed lists contain both spaced and massed items, while pure lists contain only spaced or massed items (but not both).
partitioned their data into participants who used rote rehearsal and those who used a “deep” encoding strategy like the story mnemonic. Replicating Hall (1992a), when people reported using rote rehearsal, there was no significant spacing effect on pure lists; at best, there was a small (1–2%) advantage. There was no spacing effect regardless of how many lists they had studied, provided they stuck with rote rehearsal throughout. However, for people who switched to a deep encoding strategy, the spacing effect emerged on pure lists. Thus, Delaney and Knowles concluded that Hall’s participants, who saw only a single list, were mostly using rote rehearsal, and thus showed no spacing effect. However, later studies like Toppino and Schneider’s (1999) had people study multiple lists, which caused people to abandon the rote rehearsal strategy. Consequently, they obtained a significant spacing effect even on pure lists. In a second experiment, Delaney and Knowles (2005) controlled the study strategy their participants used by instructing them to use either a rote rehearsal strategy or the story mnemonic. They again found no reliable spacing effect in the rote rehearsal condition, but a significant spacing effect in the story mnemonic condition, confirming their earlier results. Paivio and Yuille (1969) had earlier shown similar strategy-switching for cued recall. They found that participants often start by using a rehearsal strategy, but switch to a mediation or imagery-based strategy. Thus, the concern about the number of lists employed in spacing experiments, and the particular mix of strategies used, is not limited to free recall and single-item recognition experiments, although no one has specifically repeated the Delaney and Knowles (2005) experiments using cued recall.
Bahrick and Hall (2005) have argued for item-specific strategy changes in cued recall, such that when people see a pair again, if they retrieve their earlier association they will strengthen it. However, if they fail to retrieve that association, then they generate a new one. In a Darwinian selection/retention process, successful mediators are retained while unsuccessful ones are replaced, resulting in better memory following long spacing of items.
2.2.2. Rote Rehearsal and the Borrowing Hypothesis Revisited
In a recent paper, we examined the rote rehearsal strategy in order to learn how rehearsal interacts with list structure (Delaney & Verkoeijen, 2009). Specifically, we asked our participants to rehearse using the rote rehearsal strategy as described to us by people who used it in our earlier laboratory studies. Our participants described a process we called the rehearse-together strategy, in which they would read each word as it appeared on screen and then use any remaining time to rehearse earlier items. Consistent with the Delaney and Knowles (2005) studies, we found that the rehearse-together strategy resulted in a null spacing effect on pure lists. However, it resulted in a large spacing effect on mixed lists. The same results were obtained with both free recall and recognition tests. In order to understand how rehearsing groups of items affected memory, we compared the rehearse-together conditions to a rehearse-alone condition, in which participants read each word and then repeated only that item until the next item appeared (see also Wright & Brelsford, 1978; Zimmerman, 1975). In several experiments, we found identical small spacing effects on pure and mixed lists using the rehearse-alone condition. The experiments are particularly dramatic because they show that the “real” spacing effect, as manifest in the rehearse-alone condition, can be doubled in magnitude on mixed lists and eliminated on pure lists simply by changing how people study the lists. Another way of saying this is that the rehearsal confounds in a typical spacing experiment are larger than the spacing effect that the experiments are designed to study. An earlier study by Wright and Brelsford (1978) also compared rehearse-alone and rehearse-together instructions, although they used only the mixed-list conditions.
In Experiment 1, they compared rehearse-alone and rehearse-together using overt rehearsal, and obtained no spacing effect with rehearse-alone instructions, but a significant spacing effect with rehearse-together instructions. However, their results were vulnerable to a floor effect interpretation (see p. 637), and we found that a spacing effect does emerge on mixed lists with rehearse-alone; it is just smaller than in the rehearse-together condition (Delaney & Verkoeijen, 2009). In their Experiment 2, they let people rehearse covertly, which may have allowed some of them to violate the instructions. However, they found results more similar to ours in that they obtained a larger spacing effect for rehearse-together than for rehearse-alone. Furthermore, in their rehearse-together condition, the massed items were recalled at a rate similar to singletons, consistent with displaced rehearsal. Why does the rehearse-together strategy affect memory so differently on pure and mixed lists? One part of the story is that rehearse-together strategies manipulate recency effects in interesting ways. In the rehearse-alone condition, we obtained an extended recency effect (better memory for the end of the list) and no primacy effect (better memory for the
beginning of the list). This is similar to what one observes when people do not expect a test and do not try to study the words at all. When people rehearsed items together, we obtained both primacy and recency effects. People tend to rehearse early items on the list throughout the entire duration of the list, making them artificially recent (cf. Tan & Ward, 2000). Another effect is that rehearsing earlier-studied items turns them into a kind of spaced item. When we asked people to rehearse out loud, we found that they tended to rehearse spaced items more frequently than massed items on the mixed lists (see also Rundus, 1971). In contrast, on pure lists, massed items benefit because they are more likely to receive distributed rehearsal than they would if people focused only on the current item, making them functionally similar to spaced words.
2.2.3. Summary
In summary, research often fails to control encoding strategy in spacing experiments, which results in participants adopting increasingly better study strategies across lists. Because different encoding strategies result in different magnitudes of the spacing effect, averaging across multiple lists, even when the order is counterbalanced, can produce misleading estimates of the true effect size. Encoding strategies that encourage rehearsal borrowing tend to result in much larger spacing effects on mixed lists than on pure lists. In the typical studies conducted in the past, people have used mixed lists and intentional rehearsal, which encourage borrowing. Since the borrowing effect is as large as or larger than the actual spacing effect, such studies cannot provide accurate estimates of the true magnitude of the spacing effect.
2.3. Primacy and Recency Buffers: The Zero-Sum Effect
Our third impostor is also related to rehearsal borrowing, and we call it the “zero-sum effect” (Verkoeijen & Delaney, 2008). The zero-sum effect is a consequence of the common experimental practice of throwing away some of the items on the list and measuring recall of the rest. Waugh (1962) introduced the practice of including items at the beginning and end of the list (called primacy and recency buffers, respectively) that were not counted and served only to reduce the impact of primacy and recency biases on massed versus spaced comparisons. This practice has apparently been enforced by generations of spacing researchers, as it is used in the majority of studies. One of the unusual features of the Delaney and Knowles (2005) and Delaney and Verkoeijen (2009) studies is that they do not include any primacy or recency buffers. Consistent with our general position that everything spacing researchers think is good is really bad, we think primacy and recency buffers are problematic, especially if they are used on pure lists.
To understand why, it is important to first note that Toppino and Schneider (1999) showed that the serial position functions of pure-spaced and pure-massed lists differ in interesting ways. Specifically, the pure-massed lists show an enhanced primacy effect compared to the pure-spaced lists, resulting in a crossover interaction such that massing produced better memory for the beginning of the list while spacing produced better memory in the rest of the list. Toppino and Schneider termed this the enhanced primacy effect. However, we have already proposed that the spacing effect observed in Toppino and Schneider’s (1999) study reflected a mixture of strategies. When one plots the serial position function for the rehearse-together strategy, one obtains an enhanced primacy effect, but no overall spacing effect (Delaney & Knowles, 2005; Delaney & Verkoeijen, 2009). The serial position function for the story mnemonic produces no primacy and a weak recency effect, with a spacing effect throughout the list. If one mixes together some rehearse-together participants and some story mnemonic participants (as we think Toppino and Schneider’s study naturally did), one would obtain a function that displays enhanced primacy, but also has a spacing effect. As that is exactly the pattern they obtained, the strategy mixing seems quite plausible. Just because one uses pure lists does not mean rehearsal borrowing stops; it just means participants cannot borrow from massed items to help spaced items. One can still rehearse some items more often than others. There are well-established rehearsal frequency differences that depend on serial position, such as the primacy effect, which results from extensive rehearsal of the early items on the list (e.g., Tan & Ward, 2000). On pure-massed lists, the extra rehearsal for primacy items is likely to be greater than on pure-spaced lists, because each of those primacy items is presented right away for twice as long.
On the pure-spaced lists, in contrast, primacy items are already being replaced with new items soon after they are introduced. According to this logic, the enhanced primacy effect on massed lists is a result of rehearsal patterns that strengthen items at the start of the list. A corollary of this argument is that the apparent spacing effect in the rest of the list might be due to rehearsal borrowing, such that the strong primacy-region items steal rehearsal time away from the rest of the list on the massed lists. To test this idea, Verkoeijen and Delaney (2008) recently conducted a series of pure-list spacing experiments in which we required participants to use the rehearse-together strategy. As in our earlier studies, the spacing effect was small and nonsignificant. Our next step was to plot the serial position functions and to ask whether people who showed a bigger enhanced primacy effect—that is, a bigger massing advantage in the first quadrant—were the same people who showed a bigger spacing effect throughout the list. To illustrate, Figure 1 shows two participants, A and B. Participant A shows a large enhanced primacy effect, because she focuses on rehearsing the beginning of the massed list to a greater extent than
[Figure 1. Panels: Participant A and Participant B. X-axis: list quadrant (1–4). Curves: massed and spaced lists.]
Figure 1 The zero-sum hypothesis proposes that if you show a bigger enhanced primacy advantage (Quadrant 1 is better recalled on massed than spaced lists), then you will show a larger spacing advantage throughout Quadrants 2–4. Participant A rehearsed the beginning of the massed list quite a lot, resulting in lower recall of the rest of the massed list. Participant B showed a smaller enhanced primacy effect, and hence better recall of the rest of the massed list.
Participant B. However, this extra rehearsal of the beginning of the massed list comes at a cost; compared to Participant B, she shows less memory for the rest of the massed list, resulting in a spacing advantage throughout the rest of the list. Verkoeijen and Delaney called this the zero-sum hypothesis, as it suggests that the better you do on one part of the list, the worse you are likely to do on the rest of the list. Indeed, this is exactly the pattern we found: people who showed larger enhanced primacy effects were the same people who showed larger spacing effects in the rest of the list. People who showed little or no enhanced primacy effect also showed little or no spacing effect in the rest of the list, suggesting trade-offs in memory. Turning back to the issue of primacy and recency buffers, it should be clear that they are part of the list to be studied from the perspective of the participants. Before we throw those parts of the list away, we should check whether the list structure affects the recall of the primacy and recency buffer items. Pure lists cease to be pure if they have primacy and recency buffers, because those items then receive rehearsal. Primacy buffer items, for example, are likely to be rehearsed throughout the list. This effect can be magnified if they are followed by a large number of massed repetitions, during which people will continue to rehearse the primacy buffer items. At this time, we also have no way of knowing whether on mixed lists the primacy buffer items receive more rehearsal during massed than during spaced repetitions. Hence, we do not favor the inclusion of primacy and recency buffer items—which is unfortunately a feature of the majority of spacing studies.
In summary, even designs that throw away the primacy and recency regions can result in rehearsal-borrowing effects that differ between spaced and massed lists. This is because people may not distribute practice to the primacy and recency regions equally in spaced and massed lists. Thus, rehearsal-borrowing problems persist even with pure lists. Many of these problems can be ameliorated by controlling encoding strategy and by measuring recall rates for the entire list, and not just a portion of it.
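The individual-differences check behind the zero-sum hypothesis can be sketched in a few lines. This is only an illustration of the logic described above, not the analysis Verkoeijen and Delaney (2008) actually ran: the function names are hypothetical, the recall proportions are invented for the example, and a plain Pearson correlation is assumed as the measure of association. For each participant, the sketch computes the enhanced primacy effect (massed minus spaced recall in Quadrant 1) and the spacing advantage in Quadrants 2–4, then correlates the two across participants.

```python
from math import sqrt


def enhanced_primacy(massed, spaced):
    """Massing advantage in Quadrant 1 (recall proportions listed by quadrant)."""
    return massed[0] - spaced[0]


def spacing_advantage_rest(massed, spaced):
    """Mean spacing advantage (spaced minus massed) across Quadrants 2-4."""
    diffs = [s - m for m, s in zip(massed[1:], spaced[1:])]
    return sum(diffs) / len(diffs)


def pearson_r(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical recall proportions by list quadrant; NOT data from any study.
participants = {
    "A": {"massed": [0.80, 0.30, 0.30, 0.40], "spaced": [0.50, 0.45, 0.45, 0.50]},
    "B": {"massed": [0.60, 0.40, 0.40, 0.45], "spaced": [0.55, 0.42, 0.43, 0.46]},
    "C": {"massed": [0.70, 0.35, 0.35, 0.40], "spaced": [0.52, 0.43, 0.44, 0.48]},
}

primacy = [enhanced_primacy(p["massed"], p["spaced"]) for p in participants.values()]
rest = [spacing_advantage_rest(p["massed"], p["spaced"]) for p in participants.values()]
print("r =", round(pearson_r(primacy, rest), 3))
```

A positive correlation is the pattern the zero-sum hypothesis predicts: participants with a larger massing advantage at the start of the list show a larger spacing advantage in the remainder. The same computation only makes sense when recall is scored for the whole list, which is one reason to avoid discarding buffer items.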
2.4. Deficient-Processing Effects
One of the earliest proposed explanations for the spacing effect involved deficient processing, which is our fourth impostor. The idea behind deficient-processing explanations was that the second time an item is encountered, processing the item is somehow easier than it was the first time. In verbal learning studies where people study individual words, there is not usually very much “processing” that people need to do; they read the word and activate its meaning. Deficient processing makes more sense when people need to generate something on each repetition. For example, if we ask people to rate a twice-presented word for pleasantness, they have no need to think about their answer on the second occurrence unless they have forgotten their original answer. An even clearer example of deficient processing was demonstrated by Jacoby (1978), who asked people to solve word puzzles that consisted of two words. The first word was a cue that helped participants to solve the puzzle, and the second word had some missing letters. For example, he might present shoe – F _ _ T, and the answer would be FOOT. Jacoby found that when people had recently seen the word FOOT, these puzzles became trivial, and later memory for the word was much lower on a surprise cued-recall test compared to puzzles they had solved themselves (see also Cuddy & Jacoby, 1982). A classic demonstration of deficient processing was a study by Thios (1972), who presented participants with sentences whose subject and object were sometimes repeated in a later sentence. Repetitions either used the same “sense” of the subject and object, or a homographic “sense” of the subject and object.
For example, if participants read, ‘‘The electric drill cut into the cinder block,’’ then a same-sense repetition might be, ‘‘The hi-powered drill entered the masonry block.’’ A homographic repetition might be, ‘‘The fire drill emptied the city block.’’ After 80 sentences, they were cued with the subject words and had to recall the object words. The major result of the study was that there was a spacing effect in both conditions, but by comparing the massed repetitions to once-presented sentences, they determined that massed homographic repetitions improved memory more than did massed same-sense repetitions. The results suggest that sentences that were more dissimilar reduced the massed-item
processing deficit. (In contrast, for spaced items, the lag effect was larger with same-sense repetitions.) Similar results were reported by Dellarosa and Bourne (1985). In Experiment 1, they either repeated sentences verbatim or paraphrased them. They further varied the lag, using massed repetitions and spaced repetitions with lags out to eight sentences. Changing the surface form of the sentence improved memory for massed repetitions, but had small and inconsistent impact on spaced repetitions. In Experiment 2, sentences were repeated using either the same-gender voice or a different gender voice. Switching the gender of the speaker improved memory for the massed sentences substantially, but improved memory for spaced sentences only slightly. Both of these results are consistent with a deficient-processing explanation whereby identical or nearly identical repetitions provide little benefit to memory when they are repeated without any lag. Another source of deficient processing can be participants’ own choices about how long to study. Zimmerman (1975) gave participants the option to control the rate at which items appeared on screen for study. By hitting the space bar, they could terminate the presentation and move to the next item. He found that people would terminate study of massed items more quickly than they would spaced items, suggesting that people would intentionally induce deficient processing on the massed items. Furthermore, people terminated study of short-lag items sooner than long-lag items, producing a lag effect. A study conducted by Shaughnessy, Zimmerman, and Underwood (1972, Experiment 3) produced similar results. A recent study by Toppino, Cohen, Davis, and Moors (2009) raises another possibility for deficient processing—though in this case, for spaced repetitions. Toppino et al. manipulated the difficulty of study items, and showed that for more difficult items, participants often failed to fully perceive them at rapid presentation rates. 
Under these circumstances, they showed better memory for massed than spaced repetitions. The Toppino et al. study suggests that if the presentation rates in a typical spacing study are too fast, people may have no choice but to skip some of the items to cope with the fast pace. If so, they might favor massed items, which they feel they have time to process fully, and skip many of the spaced items. The item-skipping approach predicts that if the presentation time is very fast, you might observe a reverse spacing effect (i.e., better memory for massed items). It turns out that is exactly what one finds. Metcalfe and Kornell (2003) used Spanish–English word pairs to demonstrate that at a 0.5-s presentation rate, the spacing effect reverses itself, and at a 1-s presentation rate, it is a null effect (for further null spacing effects at 1-s presentation rates, see Waugh, 1963, 1967, 1970). In sum, there are several conditions under which people will show marked deficient processing of massed items (e.g., the deficient-processing effect), and a few cases when they will show deficient processing of spaced
items (e.g., fast presentation). These results obviously complicate the interpretation of spacing effects observed in many experiments; just as with rehearsal effects, they can sometimes magnify and sometimes diminish the effects of spacing on learning.
2.5. Incidental Learning and Mixed Lists: List-Strength Effects
We would like to raise one final issue that is often important in considering spacing effects, and that is the presence of list-strength effects in free recall. However, far from being a negative feature of spacing experiments, we think list-strength effects provide important evidence regarding the source of spacing benefits. Therefore, while the list-strength effect makes the impostors list, we think it may be a consequence of the “true” spacing effect rather than a confound (more on that later, in Section 4). The list-strength effect was first demonstrated by Tulving and Hastie (1972), who showed that items presented multiple times on a study list reduced recall of the once-presented items. This inhibitory effect was consistent with global memory models like SAM that assumed that repeated items accumulate context strength and that stronger items are therefore sampled more frequently when the context is used as a cue to retrieve them (Ratcliff, Clark, & Shiffrin, 1990). However, subsequent studies posed a problem for global memory models because they demonstrated convincingly that once rehearsal was controlled, recognition memory did not show global competition effects (e.g., Hirshman, 1995; Yonelinas, Hockley, & Murdock, 1992). A more general conclusion is that more difficult tasks that invoke recollective processes tend to show the list-strength effect (Diana & Reder, 2005; Murnane & Shiffrin, 1991; Norman, 2002). However, simple cued-recall or recognition tests are unlikely to show a list-strength effect (see also Bäuml, 1997). The signature list-strength pattern is obtained by comparing recall on pure lists (i.e., all-spaced or all-massed lists) to mixed lists (i.e., lists with some spaced and some massed items). The list-strength effect consists of two effects when switching from pure to mixed lists. First, the spaced items show better recall on mixed than on pure lists.
Second, the massed items show poorer recall on mixed than on pure lists. If this sounds familiar by now, it is because it is exactly the pattern obtained by Delaney and Verkoeijen (2009) in our studies on rehearsal. The concern that covert rehearsal was responsible for earlier list-strength effects led to extreme attempts to control encoding, but the final resolution of this work seems to be that list-strength effects emerge in incidental learning (Sahakyan, Delaney, & Waldum, 2008; Yonelinas et al., 1992). We also obtained a list-strength effect for free recall but not for recognition when we forced participants to use a rehearse-alone strategy (Delaney & Verkoeijen, 2009). A further twist to this story is that the list-strength effect has been observed only with spaced repetitions (Malmberg & Shiffrin, 2005;
Sahakyan et al., 2008). Other methods of strengthening items, such as extra presentation time or deeper orienting tasks, increase recall of the stronger items, but do not produce the list-strength pattern; they produce only a main effect of strength, such that the strong items are recalled better than the weak items on both pure and mixed lists. Therefore, the list-strength effect can be equated with the spacing effect, and it directly predicts a larger spacing effect in free recall on mixed than on pure lists. Perhaps the list-strength effect, like the other encoding effects mentioned in this section, is a confound that must be eliminated to understand the “real” spacing effect. However, another possibility is that the list-strength effect is an indicator as to the true source of the spacing effect. Specifically, we will argue later that a theory that incorporates some assumptions about how context is stored with a trace and how different types of tests use context can provide a viable explanation of the spacing effect, once the encoding confounds described in this section are taken into account.
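The two-part list-strength pattern, and its consequence for the size of the spacing effect, can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical and the recall proportions are invented for the example rather than taken from any of the studies cited above.

```python
def spacing_effect(recall):
    """Spaced minus massed recall for one list type."""
    return recall["spaced"] - recall["massed"]


def shows_list_strength_pattern(pure, mixed):
    """True if spaced items gain and massed items lose when moving from pure to mixed lists."""
    return mixed["spaced"] > pure["spaced"] and mixed["massed"] < pure["massed"]


# Hypothetical recall proportions; NOT data from any study.
pure = {"spaced": 0.40, "massed": 0.38}
mixed = {"spaced": 0.46, "massed": 0.31}

print("pattern present:", shows_list_strength_pattern(pure, mixed))
print("pure-list spacing effect:", round(spacing_effect(pure), 2))
print("mixed-list spacing effect:", round(spacing_effect(mixed), 2))
```

Whenever both parts of the pattern hold, the mixed-list spacing effect must exceed the pure-list one, because the difference between the two effects equals (mixed spaced minus pure spaced) plus (pure massed minus mixed massed), and both terms are positive.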
2.6. Summary: The Impostor Effects and Confounds in Spacing Designs
Our review of the impostor phenomena provides a bleak view of the spacing literature as a whole. Based on the above review, the “ideal” study should use presentation rates slow enough that people do not skip items. It should control recency very carefully, as even small biases in favor of spaced items can inflate estimates of the magnitude of the spacing effect. It should use pure-list designs (and perhaps compare those designs to mixed lists), and preferably have no primacy and recency buffers. Furthermore, it should carefully control the strategies participants use to study, preferably by using incidental-learning procedures. How many of the hundreds of spacing studies have used a design of this type? The answer is vanishingly few. As we then consider the theories of spacing and the evidence against each of those theories, it may be worth keeping in mind that we are using flawed data to reject most of these theories, albeit lots of flawed data collected in multiple laboratories using multiple methods.
3. The Failure of Existing Spacing Theories
Before indicating what theoretical position we favor, we will examine the successes and failures of earlier theories. We cannot explore every theoretical perspective ever advanced in our limited space, so we will focus on theories that have been seriously considered by at least one
researcher in the past 20 years. Furthermore, we will mostly restrict our review to accounts of what we termed the ‘‘real’’ spacing effect, trying to ignore the various impostor effects that produce benefits of spacing over massing, but that apply in limited circumstances. To evaluate the theories, we will lay out what we see as the most important phenomena that spacing theories need to explain. Table 3 lists these major phenomena. In some cases, we will note that a phenomenon, although important, may need to be replicated under controlled circumstances in order to be sure that it is real. By the end, we will be poised to offer our thrilling alternative.
Table 3. Major Spacing Phenomena.
1. Intention invariance. Spacing effects emerge with both incidental and intentional learning, using a wide range of materials.
2. Age invariance. Children (including infants), young adults, and older adults all show the spacing effect.
3. Species invariance. Everything from marine mollusks (Carew, Pinsker, & Kandel, 1972) to honeybees (Menzel et al., 2001) to mice (Scharf et al., 2002) shows spacing effects of some sort.
4. The Glenberg surface. The effect of lag is jointly determined by retention interval and type of test. Typically, the relationship between memory and lag is U-shaped, with the peak of the U-curve moving further to the right as the retention interval increases.
5. Manipulating contextual variability seldom helps recall. There are numerous failures to get multiple retrieval routes to help recall compared to a single repeated retrieval route.
6. Recognition is required. Items people fail to recognize on later repetitions show little or no spacing benefit.
7. Perceptual priming effects. A priming account might handle material that is not semantically coded, like faces and nonwords, but it can’t handle semantic information.
3.1. Intention Invariance
We already outlined (in Section 2) the rehearsal-borrowing effect. However, one of the earliest theories of spacing effects was that there was no “true” spacing effect, and it was all due to rehearsal borrowing (Atkinson & Shiffrin, 1968). While we agree that rehearsal borrowing is an important problem when interpreting the spacing literature, it cannot be the full explanation of the spacing effect because spacing effects still emerge robustly in incidental learning (Braun & Rubin, 1998; Challis, 1993; Glenberg & Smith, 1981; Greene, 1989; Paivio, 1974; Rose & Rowe, 1976; Sahakyan et al., 2008; Shaughnessy, 1976; Toppino & Bloom, 2002; Verkoeijen,
Rikers, & Schmidt, 2005). All of these experiments used mixed lists, so it is not clear from the literature whether spacing effects emerge following incidental learning on pure lists or not. Each of them is therefore vulnerable to a list-strength effect critique.
Additionally, some of the experiments demonstrating incidental-learning effects may be vulnerable to deficient-processing explanations. For example, Greene (1989) wrote that “. . . asking subjects to make the same response to an item every time it occurs. . . may lead the subject to base the response to a second occurrence on memory for the response to the first occurrence.” Indeed, Jensen and Freund (1981) conducted two experiments in which they compared incidentally-learned lists containing either a single semantic judgment (done twice) or two different semantic judgments (done once each). The lists were mixed with respect to spacing and massing, and also included once-presented items. In both studies, mixing encoding strategies lowered subsequent free recall of once-presented and spaced items relative to using only a single dimension. However, mixing encoding strategies actually helped massed items. Very similar results were obtained with children in the first, third, and sixth grades by Toppino and DeMesquita (1984). Such results suggest a possible switch cost for using two different encoding strategies, but also that massed items likely suffered from a processing deficit when rated twice on the same dimension. In other words, there was a deficient-processing effect for incidentally processed items when they were rated twice on the same dimension.
There have been some attempts to argue that some incidental-learning instructions might encourage rehearsal-like processes that favor spaced items by forcing retrieval of earlier items. If people make ratings by comparing the current item to previously encountered items, for example, they would have to retrieve earlier-presented items. 
Because spaced items occur in more places on the list, they are more likely to be recent at any given time, and therefore may be differentially often used as the basis of comparisons, thus strengthening them. Of course, there is absolutely no empirical evidence to support this, but the study is easy enough to conduct—simply ask participants to do a rating task and ask them to report whether they make their judgment by comparing the item to another word (and if so, which one), or if they are rating it without reference to any other items. Having tried the task out on ourselves, we suspect the latter is more common, but it may vary depending on the difficulty of the rating task such that more sensitive scales (e.g., 1–9) may result in more covert retrieval than less sensitive scales (e.g., yes/no). Incidental-learning tasks that encourage people to look for a rule in a sequence may be the most likely to show covert rehearsal effects (e.g., Greene, 1989; Paivio, 1974), as the task requires comparison across items. In sum, it would be nice if there were a clearer demonstration of incidental-learning effects that could not be attributed to any of the impostors outlined in Section 2. While the balance of evidence seems to suggest
there is a “true” spacing effect in incidental learning, there is as yet no convincing demonstration using pure lists. However, in an unpublished study, Delaney and Verkoeijen asked 85 participants to view two lists of 32 medium-frequency nouns. Each word was presented twice for 2 s on each presentation, with a 1-s interstimulus interval. The study used a 3 (Lag: massed, spaced lag 2, or spaced lag 12) × 2 (Intentionality: incidental vs. intentional) design, with intentionality manipulated within subjects and lag manipulated between subjects. That is, every participant saw two pure lists, one of which was learned incidentally and the other intentionally. The order of these lists was counterbalanced so that half of the people received intentional instructions first and the other half received incidental instructions first. The incidental-learning instructions told participants to indicate for each word either (a) whether it was man-made or not, if they saw an “mm” symbol; or (b) whether it was pleasant or not, if they saw a “;-)” symbol. They always received one of the instructions on the first presentation and the other on the second presentation. The intentional-learning instructions told them to rehearse the words aloud in order to learn the list. At the end of each list, there was a free recall test. Participants gave no indication that they expected the test, but of course it is always possible that they expected it. To summarize the results, there was a spacing effect in the incidental condition, but not in the intentional condition. Figure 2 shows the pattern of recall.
Figure 2. Proportion recall (y-axis) as a function of the lag between repetitions (massed, spaced-2, spaced-12) on pure lists for lists learned either via rote rehearsal (intentional) or incidentally. From an unpublished study by Delaney and Verkoeijen.
Consistent with our other studies (Delaney & Knowles, 2005; Delaney &
Verkoeijen, 2009), pure lists with instructions to study via rehearsal produced no overall spacing effects. In contrast, incidental learning produced a significant spacing effect, although the lag effect was not significant (F < 1). Therefore, it seems that spacing effects do emerge on pure lists when studied incidentally, though we did not obtain a significant lag effect.
3.2. Age Invariance
A second clue that rehearsal cannot fully explain the spacing effect is that it occurs throughout the lifespan, even in children too young to rehearse. There are now numerous studies showing that the spacing effect emerges in children, using both recognition (Cahill & Toppino, 1993; Toppino, Kasserman, & Mracek, 1991; Vlach, Sandhofer, & Kornell, 2008) and free recall (Seabrook, Brown, & Solity, 2005; Toppino, 1993; Toppino & DeMesquita, 1984; Toppino & DiGeorge, 1984; Wilson, 1976). It persists over 48 h, at least in recognition (Cahill & Toppino). Although one study failed to obtain the effect with preschoolers (Toppino & DiGeorge, 1984), many later studies obtained it with preschool-age children (e.g., Rea & Modigliani, 1987; Toppino, 1991, 1993; Toppino et al.). Furthermore, the effect occurs with spacing lags up to 1 day for autobiographical events (Price, Connolly, & Gordon, 2006). These studies are important in part because preschool children are too young to implement a rehearsal strategy, and therefore the results cannot be attributed to rehearsal biases.
Even infants show the spacing effect. Using habituation, Cornell (1980) showed babies a photo four times, with the repeated exposures spaced either “massed-like” with 3 s between viewings, or “spaced-like” with 60 s between viewings. The baby would then see the same photo again, along with a novel photo. Because babies usually like to look at novel things, they would be expected to spend less time looking at the previously seen photo if they remembered it better. In fact, babies looked longer at the massed-like photos than they did at the spaced-like photos, suggesting they had better memory for the spaced-like photos. This was true when the delay until the test was 1 min, 5 min, or 1 h. (An added advantage of the infant design is that it is not vulnerable to a list-strength effect interpretation.) 
Habituation is probably mediated by a kind of perceptual priming, suggesting that perceptual priming may be important for the spacing effect, especially with nonsemantic materials—a point we will return to later. Another infant study used operant conditioning of a foot kick in response to a toy mobile in 8-week-old infants (Vander Linde, Morrongiello, & Rovee-Collier, 1985). On a final test two weeks later, the response was retained better when 18 min of training were split into three sessions separated by 1 or 2 days compared to 18 min on a single day. The effect was quite large, with 48-h spacing resulting in an average of 25 kicks on the final test as compared to only 15 for massed study. As operant
conditioning relies on motor responses, this benefit is unlikely to be due only to perceptual priming. What about older adults? Perhaps unsurprisingly, older adults show spacing effects roughly comparable to those of young adults (Balota, Duchek, & Paullin, 1989; Kausler, Wiley, & Phillips, 1990). Benjamin and Craik (2001) found that for both older and younger adults, spacing made it easier to discriminate studied from unstudied items than massing did. However, when two lists were studied and the task was to respond only to the items from one of the lists (that is, when a source judgment was required), older adults were more likely to mistakenly endorse items from the wrong list. Younger adults showed no such trend. The study suggests that while item memory is improved with spacing in both older and younger adults, older adults do not show a spacing effect for source memory. In sum, the spacing effect seems to emerge throughout the lifespan and with many types of materials, which suggests that simple strategic explanations are insufficient to account for the results. Results like these suggest that very basic neural phenomena could be involved in producing the spacing effect.
3.3. Species Invariance
A further piece of evidence that spacing effects might arise from basic memory processes comes from comparative psychological studies. One interesting study by Menzel, Manz, Menzel, and Greggers (2001), for example, used classical conditioning procedures to condition honeybees to extend the proboscis (in response to various stimuli such as carnations, propionic acid, and hexanol). They varied the spacing between acquisition trials to produce massed trials (<30 s between trials) and spaced trials (3, 10, 20, or 30 min between trials). Spacing sped acquisition of the conditioned response in the honeybees, as opposed to the slowing of acquisition that spacing produces during learning in human beings (e.g., Schmidt & Bjork, 1992). They further varied the retention interval, with intervals ranging from relatively short (30 min) out to several days. The results suggested that spacing advantages on a final test were absent at very short retention intervals, but after 3 days the advantage of spacing was pronounced, with massed trials mostly forgotten and spaced trials showing memory rates similar to the end of the acquisition period.
An advantage of using the honeybees is that memory based on protein synthesis in the honeybee develops rather slowly, and the time-course is well known. Blocking protein synthesis did not affect acquisition, but it prevented the spacing effect from emerging on the final test after a delay. Specifically, after 1–2 days’ retention, blocking protein synthesis harmed spaced but not massed retention, dropping spaced recall to the level of massed recall. After 3–4 days, it harmed both spaced and massed retention, with both spaced and massed recall dropping to a low level. The results
suggest that for honeybees, the spacing advantage during acquisition is independent of protein synthesis, but that to display a spacing advantage over longer periods of time requires consolidation processes. Similar results have been obtained with other organisms, including other insects like drosophila (flies) and mice—the latter of which, like humans, have a hippocampus (DeZazzo & Tully, 1995; Scharf et al., 2002). Results such as these suggest that neural consolidation processes might be involved in the advantage of spaced memories, and that these consolidation effects might occur over relatively long periods of time. Forms of consolidation theories were proposed early, notably by Landauer (1969), who proposed that when an item is repeated, the second repetition needs to be delayed sufficiently to allow for consolidation of the first response before additional learning benefits can be seen. These early consolidation theories were rejected based mainly on a celebrated study by Bjork and Allen (1970), who presented a word triplet, then repeated it either following an ‘‘easy’’ distractor task or a ‘‘hard’’ distractor task. After the second presentation, a filler task was followed by a recall prompt. Contrary to consolidation accounts, the harder task did not impair the consolidation of the first trace; in fact, harder tasks improved the benefit of restudy. However, consolidation theories need not assume that the second presentation disrupts consolidation of the first. The honeybee studies, for example, suggest that protein synthesis in the brain occurs because the spaced presentations show superior consolidation when the same neurons are repeatedly activated, suggesting the second and subsequent presentations would show slower forgetting over time—a result consistent with several mathematical models of the spacing effect (Pavlik & Anderson, 2005; Reed, 1977).
3.4. The Glenberg Surface
One of the most important discoveries about the spacing effect was that spacing is not an all-or-none proposition. Melton (1967) showed that there is also a lag effect such that the longer the spacing is between repetitions, the more memory is helped. Peterson, Wampler, Kirkpatrick, and Saltzman (1963) modified Melton’s conclusions, showing that the relationship between lag and memory was actually an inverted U-shape, such that longer lags initially improve recall up to a local maximum, with longer lags than that resulting in lower recall. The U-curve was replicated by a number of early spacing researchers, including Atkinson and Brelsford (cited in Atkinson & Shiffrin, 1968) and Young (1971). Glenberg (1976, 1977, 1979) subsequently demonstrated that the retention interval between the final study episode and the test was also important. Specifically, he demonstrated that the peak point of the U-curve is proportional to the retention interval, with longer retention intervals yielding longer optimal spacings.
One further wrinkle to this story is that Glenberg (1976) argued that the U-curve occurs only for cued recall, not for free recall. However, he may just have used too short of a lag. Verkoeijen et al. (2005) showed the U-curve in free recall rather clearly, using both intentional and incidental learning. Toppino and Bloom (2002) used free recall and found that the peak of the U-curve depended not on the number of items that intervened between repetitions, but rather on the time between the two repetitions. Using different presentation rates and different lags between repetitions, they were able to show that some manipulations that apparently eliminated the spacing effect were just far enough along the U-curve that they produced negligible spacing benefits. Thus, it seems that the U-curve holds for both free recall and for cued recall. A major meta-analytic review by Cepeda et al. (2006) concluded that the optimal spacing typically was around 10–20% of the retention interval (see also Pashler, Rohrer, Cepeda, & Carpenter, 2007). The optimal spacing calculations, however, depend on studies where rehearsal borrowing was not controlled, and therefore may be inaccurate in some circumstances.
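As a rough illustration of the 10–20% rule of thumb attributed above to Cepeda et al. (2006), the arithmetic can be sketched in a few lines of Python. The function name `optimal_gap_range` and its parameters are hypothetical, chosen only for this illustration; the 0.10 and 0.20 bounds simply restate the range quoted above. Given the caveat about uncontrolled rehearsal borrowing, the output is best read as a ballpark figure and not as a prescription.

```python
def optimal_gap_range(retention_interval, low=0.10, high=0.20):
    """Return the (shortest, longest) suggested gap between two study episodes.

    Assumption (illustrative only): the optimal gap is a fixed proportion,
    between `low` and `high`, of the retention interval, restating the
    10-20% range quoted in the text. The result is expressed in the same
    time unit as `retention_interval`.
    """
    if retention_interval < 0:
        raise ValueError("retention_interval must be non-negative")
    if not 0 <= low <= high:
        raise ValueError("expected 0 <= low <= high")
    return (low * retention_interval, high * retention_interval)


if __name__ == "__main__":
    # Hypothetical example: a final test 30 days after the last study episode.
    shortest, longest = optimal_gap_range(30)
    print(f"Suggested gap: roughly {shortest:.0f} to {longest:.0f} days")
```

For a retention interval of 30 days, this gives a suggested gap of roughly 3 to 6 days between the two study episodes.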
3.5. Deliberate Contextual Variability at the Item Level Doesn’t Help
To be considered successful, any theory of spacing needs to be able to reproduce the Glenberg surface, which therefore places a major constraint on theory. One of the earliest attempts to produce the Glenberg surface was contextual variability theory; in fact, it is also the theory Glenberg himself championed. Melton (1967) was the first to propose that contextual variability could account for the lag effect. The basic idea behind contextual variability accounts is that spaced items occur in multiple contexts, and therefore have more retrieval routes by which they can be later accessed than massed items do. (The assumption that events are encoded with respect to some background context, and that the context at test is used as one of the retrieval cues to help aid recall has a long history in psychology, and underlies many modern memory models.) At short lags, context does not vary much between repetitions, producing overlapping contexts at study (and hence less resistance to forgetting). However, compared to massed items, even the contextual variability between repetitions in close proximity would still provide additional retrieval routes, favoring retrieval of spaced items.
Producing the downward slope of the U-curve is more challenging. Glenberg (1976) favored an explanation in terms of the match between the test context and the study contexts. He reasoned that the study context of the first repetition would, for very long lags, mismatch the test context too
much, resulting in complete reliance on the second repetition at recall. Hence, the spacing effect should become small at long retention intervals. There were a number of reasons why people abandoned the original contextual variability accounts, but chief among them was a paper by Ross and Landauer (1978) which reasoned that if contextual variability increases the likelihood of retrieving two repetitions of the same item, it should also help with recalling at least one of two different items. That is, if two words appear at a longer distance from one another, then recall of at least one of those words should be higher than if they are nearby one another, because there are multiple contextual routes to retrieve the items. This turns out not to be the case, which is a problem for the classic version of contextual variability. In addition, attempts to deliberately induce contextual variability had mixed—but generally negative—results. Early attempts to test encoding variability were based on early notions that the semantic connotation of words was biased by its neighbors, and that therefore it was encoded in different ways at different places on the list (Madigan, 1969). Therefore, researchers quite reasonably used homographs, which are words with multiple meanings, to create maximally-different connotations for words. As an example, Johnston, Coots, and Flickinger (1972) presented homographs twice together with a cue word that was supposedly to help them remember the words. For some of these, the word was deliberately chosen to bias one or the other meaning of the word (e.g., river-BANK, money-BANK) or was seemingly unrelated to the meaning (e.g., dog-BANK, spoon-BANK). They also manipulated whether people saw the same cue on both repetitions or a different biasing cue, and the lag between the repetitions. 
Encoding variability theory should predict that at wider spacings, different cues should result in better memory than the same cues, but that is not what Johnston et al. found. Figure 3 shows their results, which represent a fairly common pattern in the spacing literature. For biasing cues, changing the cue helped on massed items but had no impact on spaced items. For neutral cues, massed items were unaffected, but spaced items were better recalled when the same cue was present on each occasion than when different cues were presented. These results are clearly problematic for classic encoding variability theories. Other similar studies using homographs produced similar results (Bobrow, 1970; D’Agostino & DeRemer, 1973; Hintzman, Summers, & Block, 1975; Madigan; Thios, 1972), although there is one aberrant paper that might be worth trying to replicate (Gartman & Johnson, 1972).
Figure 3. Proportion free recall (y-axis) of capitalized words after viewing the words twice, either with the same biased cue each time (sports-FAN, sports-FAN) or different biasing cues each time (sports-FAN, electric-FAN) for massed or spaced (lag 3 or 7) repetitions. Adapted from Johnston et al. (1972).
As a quick aside, Johnston et al. (1972) also included pure lists of once-presented items as a control. They made a big deal about failing to obtain a list-strength effect on their once-presented items. However, extracting means from their data, once-presented items were recalled numerically less often on mixed lists containing some twice-presented items (27%)
than on the pure lists containing only other once-presented items (33%). This is a 6-percentage-point list-strength effect, which is as large as the spacing effect in experiments where encoding strategy is controlled (e.g., Delaney & Knowles, 2005). It may not be reliable statistically, but it hardly provides a strong failure to replicate Tulving and Hastie’s (1972) original demonstration of the list-strength effect.
Later authors argued that homographs provided a poor test of the encoding variability hypothesis because the two meanings of the homographs were, in many ways, two different words entirely (e.g., Hintzman, 1974; Maskarinec & Thompson, 1976). Therefore, a second round of encoding variability tests tried to keep the items and their semantic meanings the same, but varied whether the cues presented together with those items were the same or different. For example, in Experiment 1 of their oft-cited paper, Postman and Knecht (1983) presented people with a list of words embedded in sentences. Participants were told that they were supposed to remember the words for a later memory test. The lists were presented three times, so that each sentence was shown three times for a total presentation time of 15 s. Participants either saw three different short sentences using the word, or the same sentence using the word three times. Finally, participants either had a free recall test on the words, or they were tested using the cue sentences they had studied. Contrary to encoding variability accounts, the different “contexts” created by the multiple sentences did not produce better free recall, and actually produced worse memory on cued-recall tests. If having multiple retrieval routes should help recall, then why did it not? Their subsequent experiments were similar. Experiment 2 used incidental learning and included a test either immediately or after a 24-h delay.
The results were essentially identical to Experiment 1, with variable encoding producing either equivalent or worse memory. Experiment 3 attempted to use three different instances of a noun (e.g., wine bottle, medicine bottle, and thermos bottle) or a single repeated instance of the noun and obtained results identical to Experiment 2. So much for deliberate attempts to manipulate the retrieval cues. Another approach was to vary intrinsic properties of the stimulus. One attempt was to vary the language used in the experiment. For example, an early study by Kolers (1966) used French–English bilinguals and presented words either once or several times in a spaced fashion. On some lists, words were repeated in the same language each time, while on other lists, they were repeated in each language (French and English). There were also some lists that had some items repeated in the same and some in a different language. The results suggested that there was no difference between repeating an item in the same language or in a different language, which is contrary to the predictions of encoding variability theory. A later study by Glanzer and Duarte (1971) tried something similar using Spanish–English bilinguals; they presented word pairs twice either in English, Spanish, or once in each language. Statistically, the results supported the simple interpretation that changing the language helped massed items, but had less and less effect at longer lags. However, their data in fact show that the probability of recall for a pair presented in each language (0.56) was roughly equivalent to pairs presented twice in Spanish (0.57). However, two repetitions in English showed very poor recall (0.44). Hence, their result may be an artifact of poor recollection of English-only pairs, or a result of output interference driven by recalling English-only pairs last. 
To our knowledge, only the massed portion of the Glanzer and Duarte (1971) study has ever been replicated (by Durgunoğlu & Roediger, 1987, who further showed similar results with yes/no recognition, that is, better recognition of mixed-language pairs than same-language pairs). It would be interesting to see whether the spaced pattern emerged in both free recall and recognition under more controlled conditions, or on unmixed lists where output interference could be ruled out as an explanation. Nonetheless, these studies suggest that encoding variability helped massed items much more than it helped spaced items, if the results are in fact replicable. In sum, the encoding variability theory was originally proposed to account for the Glenberg surface. However, by the mid-1970s, the evidence for encoding variability was looking pretty grim, at least for within-list contextual variation. Many studies attempted to vary the local context around an item and found that contextual variation either did not help, or even hurt, memory. Furthermore, Ross and Landauer (1978) had earlier argued that most versions of the encoding variability account implied that two different items at two places on the list should also show a memory benefit, which they do not. We will later see that a version of encoding
variability theory can be proposed that is generally consistent with the results outlined here, but not without introducing some different assumptions about the nature of context.
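The Ross and Landauer (1978) argument summarized in this section can be made concrete with a small probability sketch. Suppose each of two traces is retrievable at test with some probability, and that the more similar their study contexts are, the more strongly the two retrieval outcomes are correlated. The function below is a hypothetical helper written for this illustration: the name `p_at_least_one`, the Bernoulli framing, and the use of a correlation `rho` as a stand-in for contextual overlap are all assumptions, and none of them comes from the original paper.

```python
import math


def p_at_least_one(p_a, p_b, rho):
    """Probability that at least one of two traces is retrieved at test.

    Assumptions (illustrative only): each trace is retrieved as a Bernoulli
    event with probability p_a or p_b, and rho is the correlation between
    the two events, used here as a stand-in for contextual overlap.
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Joint probability of two correlated Bernoulli events.
    p_both = p_a * p_b + rho * math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    # Reject correlations that no joint distribution can realize.
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if p_both < lower - 1e-12 or p_both > upper + 1e-12:
        raise ValueError("rho is not attainable for these probabilities")
    # Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
    return p_a + p_b - p_both


if __name__ == "__main__":
    # Nearby presentations: fully overlapping contexts (rho = 1).
    print(p_at_least_one(0.5, 0.5, 1.0))
    # Widely separated presentations: independent contexts (rho = 0).
    print(p_at_least_one(0.5, 0.5, 0.0))
```

With equal retrieval probabilities of 0.5, the probability of retrieving at least one trace rises from 0.5 under complete overlap to 0.75 under independence. Nothing in the calculation depends on whether the two traces are repetitions of one item or two different items, which is why the classic account predicts a lag benefit for two different items as well.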
3.6. Recognition Required for Spacing Benefits
The next class of theories to survive into the modern day involves study-phase retrieval, a fancy term for “recognizing that something is repeated when you see it.” Hintzman and Block (1973) first proposed the study-phase retrieval account, which assumes that in order to obtain the spacing effect, people must retrieve the prior presentation. Study-phase retrieval provides a straightforward explanation for the lag U-curve, because as the lag between repetitions increases, so does the difficulty of the retrieval. Harder retrievals are thought to result in more strengthening of the original trace. Furthermore, as the retrieval becomes too difficult, the probability of successful retrieval on the second presentation will drop. If the previous repetition is not retrieved, then there is no benefit of spacing, producing the downward slope of the U-curve at long lags.
The earliest evidence supporting study-phase retrieval was that when people are asked to indicate how far apart two repetitions of an item occurred, their estimates track the actual distance between presentations (Hintzman & Block, 1973; Hintzman et al., 1975). The same turns out to be true for related items and homographs (Hintzman et al.), even though for two unrelated items, people are essentially at chance. This suggests that people noticed the related items during study, and tagged how far apart they were.
An important study by Johnston and Uhl (1976, Experiment 2) tested some of the predictions of study-phase retrieval theory. They used a continuous recognition paradigm, whereby participants had to attempt to recognize whether a word had been seen earlier in the list. As the lag between repetitions increased, the probability of successful recognition of an item as “old” went down slightly (though it was still 91% at a lag of 13). 
Furthermore, the spacing benefit was observed on a final free recall test only for items that were successfully recognized as “old” during the initial study phase. There are interpretations of this study that do not require study-phase retrieval in ordinary spacing studies, however. The introduction of the continuous recognition procedure makes this a testing effect study rather than a spacing study, and we will later see that testing only benefits items that are successfully retrieved during the test (e.g., Carpenter & DeLosh, 2005). Therefore, the study may not tell us much about “normal” study-phase retrieval, which may not happen spontaneously. A clever experiment in the same vein by Braun and Rubin (1998, Experiment 3) noted that a strict version of the study-phase retrieval account suggests that it is the first presentation of the item that benefits
from study-phase retrieval, not the second presentation. Therefore, if there were a way to differentiate the first and second presentation, one might be able to demonstrate study-phase retrieval effects directly. To get at this issue, they used the same continuous recognition paradigm as Johnston and Uhl (1976), but with a twist: instead of exact repetitions, people looked for words that had the same first three letters (e.g., BURden, BURlap). As with Johnston and Uhl’s study, they found that the longer the lag, the lower the probability of successfully recognizing that a word was a repetition during the study phase. On a final recall test, participants received the stem (e.g., BUR) and two blanks. They were instructed to write whatever words they remembered and to fill both blanks if they recalled two words with that stem. This procedure allowed Braun and Rubin to distinguish whether it was the first or second presentation that benefitted from spacing, or both. On the one hand, they found that the first word with a given stem was better recalled than the second, consistent with study-phase retrieval explanations. On the other hand, they also found that both the first and second presentation showed a spacing benefit. This latter result could be due to output facilitation, such that recalling the first member of the pair at test facilitates recall of the second member of the pair. However, if so, one would expect that if the cued-recall test were replaced with a recognition test, then the first presentation would lose its advantage. Instead, the same pattern emerged with recognition testing: spacing effects on both repetitions, with better recognition of the first than second presentation. Therefore, their results are puzzling if one assumes that spacing effects are entirely due to study-phase retrieval. Furthermore, even for massed items they obtained better memory for the first than for the second item in the pair. 
It would have been nice to know whether this result was due to their continuous recognition procedure (which makes this study a testing effect study), or if the same results would hold if people were not explicitly asked to watch for repetitions. A particularly nice study by Sahakyan and Goodmon (2007) took advantage of the extensive research on what words remind people of what other words (e.g., Nelson, McKinney, Gee, & Janczura, 1998). In their studies, they created lists of words that were unidirectionally related; that is, one of the words automatically reminds people of the other, but not vice versa. They then created lists where List 2 and List 1 deliberately had this unidirectional relationship. When List 2 words reminded participants of List 1 words, the List 1 words benefitted on a later free recall test, consistent with a study-phase retrieval effect. However, when the List 1 words reminded people of List 2, memory was no different than in control conditions where List 1 and List 2 words were unrelated. There was one exception to the latter rule: sometimes, they observed output facilitation such that at the time of recall, the List 1 words were output first and then reminded people of the semantically related List 2 words. When output
order was controlled to force output of List 2 first, this effect disappeared. Output facilitation cannot happen with “real” spaced repetitions except on frequency judgment tests, since once you recall a word, it does not matter if it reminds you of its other presentation; you already output that word. These results are consistent with a study-phase retrieval account, and one of the nice things about this study is that there was no instruction to retrieve, showing that people did it spontaneously. The study-phase retrieval account provides some other nice predictions regarding the difficulty of recognizing items. Items that are harder to recognize should have a shorter optimal lag than items that are easier to recognize. It can therefore explain rather complicated patterns of data on the interaction between different types of items, lag, and retention interval. A celebrated study by Paivio (1974) used words and easily named pictures, the latter of which typically show a memory advantage over the former. In Experiment 1, the lists contained items presented once and items presented twice with a rather long lag (48 items), with all combinations of pictures and their corresponding words (i.e., repeated word, repeated picture, word then the corresponding picture, and picture then the corresponding word). Once-presented items were mixed with twice-presented items, but spaced and massed items did not appear on the same list. Furthermore, Paivio included an incidental-learning condition in which participants were unaware they would be tested and had to predict whether the next item would be a picture or a word (there was no true rhyme or reason to the order). For repeated pictures and repeated words, he obtained a spacing effect, but for repetitions where the type of the item changed, he obtained no spacing effects. In Experiment 2, he compared once-presented items with twice-presented items at varying lags (massed, spaced/24, and spaced/48).
In this experiment, he obtained different results for intentional and incidental learning. For twice-presented words, he obtained no spacing effect with incidental learning but did obtain one with intentional learning. For twice-presented pictures, he obtained spacing effects for both intentional and incidental learning. Finally, for the mixed-type items, he obtained no spacing effects with incidental learning and a U-shaped curve for intentional learning. If we were to summarize these results using the study-phase retrieval account, they show that the more difficult it is to recognize the previous presentation, the shorter the lag should be to obtain a significant spacing effect. Pictures are better recalled than words, so they should be recognized at a longer lag than words are, producing spacing effects even at long lags. Intentional learning leads to rehearsal, which also strengthens memorability of the items and leads to spacing effects at longer lags. Finally, mixing the type of item on the repetition tends to reduce the chance that people detect a repetition as well. Another set of results that only the study-phase retrieval account can handle involves inhibited items. For example, in retrieval-induced
forgetting, people study a list of category–exemplar pairs, such as fruit–lemon or profession–scientist. Subsequently, some of the items receive retrieval practice, while others do not. The result is that on a final test, memory for the practiced items improves, while memory for the unpracticed items from the same category suffers (e.g., Anderson, Bjork, & Bjork, 1994). This inhibitory process provides a way to weaken some items below a once-studied baseline. An important recent study by Storm, Bjork, and Bjork (2008) used retrieval-induced forgetting to weaken some studied items. When the inhibited items were then presented for restudy, they subsequently showed better memory than comparable items that had been studied one time but not inhibited. This result is difficult to explain until one realizes that the difficulty of the retrieval predicts the benefit of a restudy trial in study-phase retrieval accounts. Therefore, weakening items below once-presented items can nonetheless allow for more difficult retrieval later on, thus strengthening them. In sum, the study-phase retrieval account had a number of advantages over earlier accounts. It could explain why recognition was required for a spacing benefit to emerge, why inhibited items show bigger spacing benefits than noninhibited items, and why difficult-to-learn material might result in shorter optimal lags. It therefore persists as one of the major theories of spacing to this day.
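The prediction that harder-to-recognize items should have shorter optimal lags can be illustrated with a toy calculation. The sketch below is not a model taken from any of the studies reviewed here. It simply assumes that the probability of recognizing the earlier presentation decays exponentially with lag, that the benefit of a successful study-phase retrieval grows with lag (because the retrieval is more difficult), and that the expected benefit is their product. The functional forms and the parameter names `decay` and `growth` are illustrative assumptions.

```python
import math


def p_recognize(lag, decay):
    """Assumed probability that the earlier presentation is recognized at this lag."""
    return math.exp(-decay * lag)


def benefit_if_retrieved(lag, growth=0.1):
    """Assumed benefit of a successful retrieval; larger when retrieval is harder."""
    return 1.0 - math.exp(-growth * lag)


def expected_benefit(lag, decay, growth=0.1):
    """Expected benefit of a repetition: it only pays off if retrieval succeeds."""
    return p_recognize(lag, decay) * benefit_if_retrieved(lag, growth)


def optimal_lag(decay, max_lag=100, growth=0.1):
    """Lag (in intervening items) that maximizes the expected benefit."""
    return max(range(max_lag + 1),
               key=lambda lag: expected_benefit(lag, decay, growth))


# Hypothetical item types: a larger decay means the item is harder to recognize.
easy_items = optimal_lag(decay=0.02)
hard_items = optimal_lag(decay=0.10)
print("optimal lag, easy items:", easy_items)
print("optimal lag, hard items:", hard_items)
```

Under these assumptions the expected benefit is zero for massed repetitions, rises to a peak, and then falls as recognition failures become common. The peak sits at a shorter lag for the item type with the faster-decaying recognition probability, which is the qualitative pattern the account needs.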
3.7. Semantic and Perceptual Priming Accounts for Cued-Memory Tasks
In the 1990s, a family of priming-based accounts of spacing in cued recall and recognition emerged. They are technically deficient-processing explanations (similar to the impostors seen in Section 2), but are assumed to provide the explanation of the “real” spacing effect, at least for cued recall and recognition. Our view is that important findings from the spacing effect literature are difficult to reconcile with the priming account, but it is worth reviewing in detail anyway for two reasons. First, intriguing findings with nonverbal materials provide constraints on theories of spacing effects. Second, no one has yet done a major narrative review of this literature, as it is relatively new. Challis (1993) proposed the priming account, which suggests that semantic processing is critical to obtaining spacing effects. According to his account, the semantic representation of an item is activated by its first occurrence, and it remains active for a short period. During massed repetitions, the second repetition of the item occurs while the semantic representation of the item’s first presentation is still activated. During spaced repetitions, enough time has passed that the semantic representation has partially deactivated. Consequently, less total semantic processing will be devoted to later occurrences of the massed items than later occurrences of
spaced items, producing spacing effects. This explanation is based on the finding that semantic-associative priming is a short-lasting phenomenon, usually obtained when the prime immediately precedes the target, but not when more than one item intervenes between the prime and target words (Bentin & Feldman, 1990; Dannenbring & Briand, 1982; Kirsner, Smith, Lockhart, King, & Jain, 1984; McNamara, 1992). To test the semantic priming account, Challis (1993) designed two experiments using mixed lists of massed and spaced words. Participants were either instructed to use various orienting tasks that encouraged processing the words at a semantic level (e.g., rate them for pleasantness) or at a graphemic level (e.g., count the number of descending letters such as g and y). After viewing the list, they received a frequency-judgment test (Experiment 1) or a cued-recall test (Experiment 2). Both experiments confirmed Challis’ predictions—the spacing effect was present only with semantic and not with graphemic encoding manipulations. The semantic priming account’s central prediction is that priming should be higher for massed repetitions than for spaced repetitions, and consequently massed repetitions should be less accessible than spaced repetitions on a subsequent cued-memory test. To address this issue, Rose (1984, Experiment 2) instructed participants to answer semantic questions about words repeated at various lags. Half of the participants received the same question on all three repetitions, while the rest received a different question on each repetition. After the list, they received a surprise old/new recognition test (as well as other tests). Crucially, when a different question was presented on each repetition, answer time was unrelated to lag. In contrast, when the same question was used on each repetition, answer times were faster at shorter lags than at longer lags. Massed items were sped up the most by moving from different questions to the same question. 
Furthermore, the recognition data showed a spacing effect in the same-question condition, but not in the different-question condition. Although measures of semantic priming were not calculated in Rose’s second experiment, the combination of reaction time data and recognition data is consistent with the semantic priming account. Specifically, the semantic priming account predicts (in line with the experiment) that when semantic priming is reduced by asking different semantic questions about the occurrences of a repetition, the magnitude of the spacing effect should be reduced. In another study, Wagner, Maril, and Schacter (2000) tested two central predictions of the semantic priming account. Participants in an fMRI scanner did incidental semantic processing of massed and spaced words, followed by a final old/new recognition test. One prediction of the semantic priming account is that memory should be better for spaced repetitions than for massed repetitions (i.e., a spacing effect), whereas semantic priming should be lower for spaced repetitions than for massed repetitions, a result that Wagner et al. obtained. The semantic priming account further predicts
that within massed and spaced repetition, the magnitude of a participant’s semantic priming effect should be inversely correlated with their memory performance. For spaced repetitions, Wagner and colleagues obtained the predicted negative correlation between mean priming level and memory performance. By contrast, within massed repetitions, no reliable correlation between priming and memory performance was found, a finding that is at variance with the semantic priming account. However, some aspects of the Wagner et al. (2000) study were suboptimal, and therefore it is difficult to draw firm conclusions about the semantic priming account on the basis of the reported outcomes. To begin, and as acknowledged by Wagner and colleagues, the inconsistent findings regarding the within-repetition type correlations may have been due to the small sample size (n = 12). Furthermore, priming scores are difference scores and such scores are known to be less reliable than the constituent scores. Consequently, the power of statistical analyses that involve difference scores is often lower than analyses of the single constituent scores. Considering both of these factors, their analyses could have missed even medium-sized correlations. Furthermore, Wagner and colleagues did not attempt to control or assess baseline memory performance in their sample, which may affect priming scores. Baseline reaction times may either be positively (higher reaction times indicate a better task focus) or negatively (higher reaction times indicate lack of motivation) related to memory performance. In the former situation, the resulting positive bivariate correlation between priming and memory performance will be incorrectly interpreted as evidence against the semantic priming account, whereas in the latter situation the resulting negative bivariate correlation will be incorrectly interpreted as evidence in favor of the semantic priming account.
Therefore, when assessing the relationship between priming and memory, it is important to control for baseline effects. In sum, the correlations between priming and memory performance within repetition type are difficult to interpret.
3.7.1. Difficulties for the Semantic Priming Account: Related Items
A central prediction of the semantic priming account is that spacing effects should emerge not only for repetitions but also for semantically related words. There appears to be no reason why the influence of semantic priming on memory performance should differ between repetitions and associated word pairs. However, a study by Hintzman et al. (1975) suggested that spacing effects for repetition pairs and associated pairs differ tremendously (see Greene, 1990 and Stern & Hintzman, 1979, for similar studies using synonyms as stimulus materials). Hintzman et al. did not design their experiment to test the semantic priming account; the target items in the study list were organized such that pairs were associated in a backward direction. That is, the second presented word of a pair evoked the first
word, but the first word did not evoke the second word. Given that semantic priming (for a review see Neely, 1991) is assumed to operate in a forward direction (i.e., from the first word to the second word), the demonstrated reversed spacing effect cannot be taken as evidence against the semantic priming explanation. Verkoeijen, Rikers, Pecher, Zeelenberg, and Schmidt (2010) conducted a study to address this issue. In their Experiment 2, lists containing either massed and spaced word pairs with forward associations (e.g., fork–knife) or massed and spaced repetitions (e.g., knife–knife) were shown. They were encoded incidentally using a semantic yes/no judgment as to whether they would fit into a big box (the experiment was conducted in Dutch and ‘‘big box’’ refers to a specific size). At the end, a yes/no recognition test was administered. Semantic priming, as assessed by reaction times during the study phase, was larger for massed pairs than for spaced pairs both in the associated-pairs condition and in the repetition condition. In addition, the recognition data showed a reversed spacing effect in the associated-pairs condition, whereas a standard spacing effect was revealed in the repetition condition. The finding—observed in the associated-pairs condition—that a larger priming effect was associated with a better memory performance was interpreted as evidence against the semantic priming account of spacing effects in cued-memory tasks. Some researchers (e.g., Mammarella, Russo, & Avons, 2002) have argued that Challis (1993) was referring to semantic repetition priming rather than to semantic associative priming, which would render results using associated pairs irrelevant. To us, however, Challis’ account seems more consistent with an associative-semantic priming interpretation than with a semantic repetition priming interpretation. 
After all, associative priming is a short-lived phenomenon and repetition priming is not (e.g., Dannenbring & Briand, 1982; Kirsner et al., 1984; Zeelenberg & Pecher, 2002). Repetition priming can endure a retention interval of several days (e.g., Jacoby, 1983), whereas associative priming disappears when one or two items intervene between the prime and the target (e.g., Dannenbring & Briand; McNamara, 1992). In addition, several of the papers referred to by Challis suggest he had an associative priming explanation in mind when proposing his priming account as the experiments reported in these papers used semantically associated word pairs (e.g., Neely, 1977; Smith, Theodor, & Franklin, 1983). Thus, clearly Challis was not excluding the possibility that spacing effects were due to semantic-associative priming. Arguably, even a semantic repetition account has trouble accounting for the Verkoeijen et al. (2010) study’s results. According to the semantic repetition priming account, the spacing effect emerges because priming of semantic features is stronger for massed repetitions than for spaced repetitions. When associated pairs are used, priming of semantic features will also take place, albeit to a lesser extent than with repetitions. So, if priming of semantic features is
indeed underlying the spacing effect, a straightforward prediction seems to be that the spacing effect is smaller for associated pairs than for repetitions. Verkoeijen et al.’s inverse spacing effect for associated pairs runs counter to this prediction. The question that remains then is why priming of semantic features produces a spacing effect in repetitions and an inverse spacing effect in associated pairs. Furthermore, the semantic priming account has difficulties with the results of a study by Peterson, Hillner, and Saltzman (1962; see also Peterson et al., 1963, Experiment 3), who presented a list of paired associates containing some massed repetitions and some spaced repetitions, with the latter 8 s apart. Retention interval was varied from 2 to 16 s. The semantic priming account dictates that the spacing effect emerges during study due to deficient processing of massed repetitions as compared to spaced. Furthermore, according to the semantic priming account there is no reason to expect that the spacing effect interacts with the retention interval. However, Peterson and colleagues obtained a reverse spacing effect at retention intervals of 2–4 s, and a regular spacing effect at retention intervals of 8–16 s—a pattern of results that clearly contradicts the semantic priming account. The semantic priming account also applies only to a restricted range of lags and retention intervals, given the relatively short duration of priming. It cannot predict the nonmonotonic relationships between spacing interval and retention interval which are observed in cued-recall tasks at very long delays (see Cepeda et al., 2009; 2006; Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008), because it predicts that memory performance should increase with spacing until the interval at which priming no longer occurs, after which memory performance will remain at a constant, asymptotic level. 
Also, this predicted relationship between spacing and memory performance should be independent of the retention interval. (Of course, these longer-term spacing effects could rely on a completely different mechanism.)
3.7.2. Mounting Evidence for a Structural-Perceptual Priming Mechanism
Another serious problem with the semantic priming account is that it incorrectly predicts null spacing effects for complex nonverbal materials that are processed perceptually and not semantically (Russo, Parkin, Taylor, & Wilks, 1998). However, spacing effects are obtained with nonsense shapes (Cornoldi & Longoni, 1977) and unfamiliar faces (Parkin, Gardiner, & Rosser, 1995; Russo et al.). In defense of the semantic priming account, one might argue that participants use some kind of semantic processing mode to encode the pictorial targets. Russo et al. sought to rule out this alternative explanation, reasoning that semantic processing of faces will most probably not occur when participants perform
orienting tasks that focus on perceptual features of the faces. In their Experiment 3, participants studied massed and spaced unfamiliar faces while judging each on symmetry and length, the equivalent of a “graphemic” manipulation for faces. Nevertheless, a robust spacing effect emerged, which is at odds with the predictions of Challis’ (1993) theory. To account for the results with complex perceptual materials, Russo and colleagues proposed a theoretical framework that distinguished two qualitatively different priming mechanisms. For semantically processed materials, the spacing effect in cued-memory tests was thought to be produced by Challis’ (1993) semantic priming mechanism. Alternatively, for stimulus materials unlikely to be processed semantically, a structural–perceptual priming mechanism is assumed to underlie the spacing effect. The structural–perceptual priming explanation is conceptually analogous to the semantic priming mechanism, with the exception that the structural–perceptual mechanism operates at an item’s orthographic, rather than at its semantic level of representation. Several recent studies provide support for the structural–perceptual priming explanation of the spacing effect for nonsemantic materials. Russo, Mammarella, and Avons (2002) presented participants with mixed lists containing massed and spaced nonwords. On each repetition, participants performed two graphemic orienting tasks. On half of the trials, both repetitions were in the same font, whereas for the rest each occurrence was in a different font. This was expected to reduce perceptual priming, and hence reduce the spacing effect. Consistent with their predictions, there was a spacing effect for the same-font nonwords, but no spacing effect for the different-font nonwords.
Furthermore, supporting their claim that structural–perceptual priming is a mechanism specific to the spacing effect for unfamiliar stimuli, when words were substituted for the nonwords in another experiment, a spacing effect emerged with both same and different fonts. Their Experiment 3 provided the most convincing corroboration of the structural–perceptual priming mechanism. Participants performed a lexical decision task on words and nonwords that were presented once, repeated twice in a massed fashion, or repeated twice with lags of three or six items between the repetitions. As in their other experiments, the font was changed between the repetitions for half of the items and kept the same for the rest. Consistent with the priming account, repetition priming for words decreased as a function of inter-repetition lag, with a comparable decrease for words repeated in the same font as for words repeated in a different font. In contrast, for nonwords presented in the same font, repetition priming sharply declined as a function of lag. However, nonwords repeated in a different font showed much less priming decrement. The memory data closely reproduced the data from their earlier studies, and could be mapped on to the priming data. Taken together, the Russo and
colleagues experiments provide converging evidence that a structural–perceptual priming mechanism causes the spacing effect in cued-memory tests for unfamiliar materials. Other studies partially replicated or extended the results of Russo et al. (2002). Russo and Mammarella (2002) asked participants to evaluate perceptual features of either words or nonwords that were repeated in a massed or a spaced fashion (see also Mammarella, Avons, & Russo, 2004). Subsequently, participants were given a yes–no recognition test on the studied items. Similar to the findings obtained by Russo and colleagues in the 2002 study, Russo and Mammarella found a spacing effect for nonwords but not for words. Comparable studies with faces and “nonfaces” were conducted by Mammarella et al. (2002) with similar results. Their Experiment 1 provided, in our view, the strongest test of the structural-priming mechanisms. Faces and nonfaces were shown repeated in the same pose or in a different pose at three different inter-repetition lags: massed, lag 2, and lag 4. For each item, participants indicated whether it was a face or not. At the end of the list, there was an old/new recognition test. As in Russo et al.’s (2002) Experiment 3, changing the pose between two occurrences of a repetition reduced perceptual priming, and this decrease was larger for massed items than for spaced items. Also, the pose change eliminated the spacing effect in yes–no recognition. In sum, with both nonwords (Russo et al., 2002) and unfamiliar faces (Mammarella et al., 2002), manipulations that reduced structural repetition priming also eliminated the spacing effect. However, caution should be exerted when interpreting this correlation, as an unknown third variable might explain both the reduction of structural repetition priming and the reduction of the spacing effect.
3.7.3. Conclusions about Priming Mechanisms
The empirical evidence seems stacked against the notion that semantic priming explains spacing effects with cued-memory tests. Among other failures, the semantic priming account cannot explain why words that semantically prime later-seen words produce better final-test memory of the primed words, when the priming account predicts the opposite. The evidence against the perceptual priming explanation for meaningless materials is currently weaker. Indeed, there are some reasons to think it may be a real phenomenon, though the jury is still out. Even if the latter holds, it may be more properly classified as an “impostor” phenomenon that applies in very restricted circumstances, as it does not explain spacing effects with meaningful materials. Finally, there are competing accounts that may be able to explain the results from the meaningless materials more parsimoniously, a point we will consider in the next section of our review.
3.8. Hybrid Accounts
The modern trend has been to suggest that spacing effects cannot be explained by a unitary mechanism, a trend that we wholeheartedly endorsed in our lengthy exposé of impostor spacing effects in Section 2. An oft-cited paper by Greene (1989) laid out a major theory that proposed two separate processes are involved in spacing effects. He was one of the first to note that the spacing effect depended on the type of test used. Specifically, he distinguished between cued tests (which include cued recall, frequency judgments, and recognition, where the “cue” is the item itself) and free recall. Greene never tested cued recall, so his “cued” tests were recognition and its cousin frequency judgment. For cued tests, he argued that rehearsal was extremely important because the spacing effect and the lag effect both occur under intentional-learning conditions, but not under incidental-learning conditions. Therefore, he reasoned, rehearsal processes were likely responsible for the spacing and lag effects in such cases. However, for free recall, he obtained spacing and lag effects regardless of whether intentional or incidental learning was used. He believed this represented study-phase retrieval effects. Our work has shown that rehearsal is important in both recognition and free recall (Delaney & Verkoeijen, 2009) and contributes to both similarly. It is curious that Greene (1989) failed to obtain a spacing effect with recognition following incidental learning, given that spacing effects emerge when participants are encouraged to rehearse only the current item (e.g., Delaney & Verkoeijen). Later research suggested that it was the low level of semantic processing encouraged by his encoding instructions that may have eliminated the spacing effect, which is sometimes observed with incidental learning, provided semantic orienting tasks are employed (Challis, 1993; Greene & Stillwell, 1995).
It may also be that the long 10-s presentation rate and instructions to look for a rule resulted in unusual rehearsal patterns; for example, perhaps people focused heavily on the reasons why some items were massed, resulting in unusual attention to massed repetitions. Finally, the anomalous result may simply reflect that with a very long presentation time—most of which was not used for studying—and incidental learning, items were very weakly encoded and so the optimal lag was quite short. A more recent hybrid model by Raaijmakers (2005) combined encoding variability theories and study-phase retrieval theories. Raaijmakers only sought to explain cued-recall paradigms, but we will see later that extending the model to free recall is not terribly difficult with some additional assumptions. The model—which is based on Raaijmakers and Shiffrin’s (1980, 1981) SAM model of recall—proposes that when an item is repeated, if people recognize the item as old, then they strengthen the memory for that item. If not, then they store the presentation as a new trace.
It is worth digressing into the SAM model of recall, because our own approach is grounded in SAM as well, and differs only a little from the Raaijmakers (2005) model. In SAM, memory traces contain three types of information: item content information, item context information, and associative information. Item content information refers to features of the item itself, such as its semantic properties, phonemic properties, and so on. Item context information refers to the link between the item and its context, and context is usually list membership in memory experiments. Finally, associative information stores relationships between traces in memory. (In fact, item context information is treated as a special case of associative information where the associate is the background context.) A useful thing about discriminating item content from item context information is that they may interact with the type of test in interesting ways. The SAM model assumes that retrieval involves two types of processes, which are called sampling and recovery. When people attempt to retrieve, they start by sampling a memory trace from the sea of all memory traces. The sampling process begins with the cues present on the test (including the test context). Therefore, for cued recall and recognition, the associative information is usually the most relevant to sampling, with items that are more strongly linked to the test cues being most frequently retrieved. However, context information also plays a role, as the test context is always implicitly part of the test cues. In free recall, there is usually no test cue present, and so people rely almost exclusively on the context cues (at least initially). Importantly, sampling is a function not only of the strength of the ‘‘correct’’ item but of its strength relative to all the other items in memory. 
Thus, the chance of sampling the ‘‘correct’’ answer goes up as a function of the strength of the link between the test cues and the item, but it also depends critically on the strength of all the other items in memory to the same test cues. Only once an item’s image in memory is sampled do people attempt to recover the image. An item that cannot be successfully recovered will not be remembered. Recovery also depends on strength, but absolute strength (not relative strength). That is, for recovery it is not important how strong other competing items are; it only matters how strongly the cues activate the target item. In SAM, all of the strengths are usually incremented whenever study is happening. Longer study results in stronger links between the cues and stronger item strength. (We will later propose that for free recall, this rule may be different, but Raaijmakers did not alter these default assumptions.) Raaijmakers introduced additional assumptions in order to account for the lag effect. Specifically, he assumes that contextual elements change over time, and that when an item is encountered a second (or third) time, a retrieval attempt is made. If the item is still active in short-term
memory, then there is no need to make a retrieval attempt, and so no further information is stored. However, if the item has already dropped out of short-term memory, then the usual sampling and recovery process is engaged. If the retrieval of the prior presentation is successful, then additional contextual elements are stored with the image. If retrieval fails, a new image is generated. This mechanism captures the study-phase retrieval mechanism because a successful retrieval is required to store additional information about an item. At long lags, the context information mismatches, and so the probability of a successful retrieval drops. It also captures the basic lag effect, because the more time has passed since the first presentation, the larger is the contextual change between the two repetitions, resulting in a stronger link to the context. Finally, Ross and Landauer’s critique that once-presented items in multiple places on the list should benefit from context variability (they do not) does not apply to the model, since additional contextual information is only stored following a successful study-phase retrieval. It does not predict better memory for two unrelated items when they are spaced apart, because context information is only incremented when a successful retrieval occurs. Hence, the Raaijmakers model incorporates both study-phase retrieval mechanisms and contextual fluctuation mechanisms.
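The repetition rule just described can be sketched in a few lines of code. What follows is a minimal illustration under simplifying assumptions, not the Raaijmakers (2005) implementation. It tracks a single context strength per trace, lets the match between stored and current context decay exponentially with lag, uses a strength-ratio rule for sampling and an exponential rule for recovery, and sets the amount of new context stored after a successful retrieval proportional to the context change. The names `Trace`, `stm_span`, `context_drift`, and `increment`, and all parameter values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trace:
    item: str
    context_strength: float  # strength of the link between list context and trace


def context_match(lag, context_drift=0.05):
    """Assumed overlap between the stored context and the current context."""
    return math.exp(-context_drift * lag)


def retrieval_probability(traces, index, lag, context_drift=0.05):
    """Probability that study-phase retrieval of traces[index] succeeds.

    Sampling depends on relative strength (target versus all traces),
    recovery on the target's absolute strength.
    """
    cue = traces[index].context_strength * context_match(lag, context_drift)
    others = sum(t.context_strength for i, t in enumerate(traces) if i != index)
    p_sample = cue / (cue + others)
    p_recover = 1.0 - math.exp(-cue)
    return p_sample * p_recover


def apply_repetition(traces, index, lag, retrieved, stm_span=2,
                     context_drift=0.05, increment=1.0):
    """Update memory when traces[index] is presented again after `lag` items."""
    if lag <= stm_span:
        return "in_stm"  # still in short-term memory: nothing further is stored
    if retrieved:
        # Success: add the context elements that changed since the first presentation.
        gain = increment * (1.0 - context_match(lag, context_drift))
        traces[index].context_strength += gain
        return "strengthened"
    # Failure: the repetition is stored as a separate new trace.
    traces.append(Trace(traces[index].item, increment))
    return "new_trace"


memory = [Trace("knife", 1.0), Trace("fork", 1.0), Trace("spoon", 1.0)]
print(retrieval_probability(memory, 0, lag=5))
print(retrieval_probability(memory, 0, lag=40))
print(apply_repetition(memory, 0, lag=40, retrieved=True), memory[0])
```

In this sketch longer lags lower the chance that the retrieval succeeds but raise the amount of context stored when it does, which is the trade-off that produces a lag effect. A once-presented item never gains extra context, because the increment is applied only after a successful study-phase retrieval.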
3.9. Summary: Theories and Key Phenomena
Our review proposed that rehearsal explanations were insufficient to fully explain the spacing effect. Among other findings that argue against the rehearsal explanation is our demonstration that pure lists with incidental-learning instructions still show a spacing effect. Another set of findings that suggest rehearsal is not the end of the story is that spacing effects are obtained with infants and honeybees, who do not rehearse. Classical versions of encoding variability theories, which propose that spaced items have more varied neighbors during study, were proposed to explain the Glenberg surface, which related optimal lag and retention interval. However, they also failed in a long series of studies that showed that deliberate attempts to vary the neighbors of items did not help memory. Another theory that largely failed is the semantic priming account, which attributed spacing effects to priming-related processing deficits. Although a perceptual priming version of the account survives for ‘‘meaningless’’ materials, the semantic priming account failed on numerous counts. The study-phase retrieval theory, which attributes spacing effects to the benefits of covert retrieval of previously seen items when they are re-exposed, fares better and is able to explain many important phenomena, including why difficult-to-learn items have shorter optimal lags and why inhibited items show a larger spacing effect than noninhibited items. More recent versions of the encoding variability account, built
around the SAM model, are likewise able to handle many more of the phenomena outlined here. Hybrid accounts incorporating some aspects of study-phase retrieval and encoding variability theory seem likely to provide the best-developed theoretical accounts of spacing.
4. Extending a Context Plus Study-Phase Retrieval Account of Spacing Effects
Our goal in this section is not to propose a full-fledged theory of spacing, but rather to add support to accounts similar to the SAM account proposed by Raaijmakers (2005). We will propose some likely extensions to his approach while providing additional experimental evidence that is consistent with the general account. We remind the reader that testing these accounts requires carefully controlling for all of the impostor spacing phenomena outlined in Section 2, and that many of the puzzling results in the literature that seem contrary to their predictions can be handled by noting that those results are vulnerable to one of the many criticisms raised in that section.
The Raaijmakers model as written has some shortcomings. First, quantitative tests of the model have been restricted to cued recall and intentional learning. Predictions regarding free recall seem less clear. Second, rehearsal-based confounds are possible in all of the data that Raaijmakers modeled. The data may therefore be vulnerable to critiques based on some of the impostor spacing phenomena in Section 2. Third, the short-term memory mechanism in the Raaijmakers model does not allow for enhanced context storage when an item is encountered again while it is still in short-term memory, a mechanism essential for capturing the deficient-processing impostor phenomenon in the model. However, this mechanism may be too generous, at least for cued recall, because it overestimates the probability that short spacings produce no memory benefits (Pavlik & Anderson, 2005). Fortunately, a possible solution to the third problem has been published by Malmberg and Shiffrin (2005) as part of their research on the list-strength effect. We will therefore consider their solution next.
4.1. An Account of the List-Strength Effect Using SAM
Malmberg and Shiffrin (2005) have shown that a version of SAM/REM (REM, Retrieving Effectively from Memory, is a revision of the original SAM theory) can elegantly handle the list-strength effect results outlined in Section 2.5 by incorporating some assumptions about how item content and item context information accrue during study. To remind the reader, the list-strength effect occurs with mixed lists that contain some ‘‘strong’’ and some ‘‘weak’’
items. Recall of the strong and weak items on the mixed list is compared to recall on pure lists; pure-strong lists contain only strong items, and pure-weak lists contain only weak items. Compared to the pure-strong list items, strong items on mixed lists are better recalled. Compared to the pure-weak list items, weak items on mixed lists are more poorly recalled. Thus, on mixed lists, the strong get stronger and the weak get weaker. Furthermore, the effect happens robustly in free recall, but not frequently in cued recall or recognition, indicating that at least in free recall, one will often observe dissociations such that mixed lists show larger spacing effects than pure lists.
Malmberg and Shiffrin (2005) demonstrated that spacing is a special sort of strengthening in that it produces list-strength effects. Other kinds of strengthening manipulations such as giving each item extra massed-study time increased recall, but they did not produce a list-strength effect. Therefore, they proposed that spaced repetitions strengthen the link between items and their contexts, whereas other strengthening manipulations affect only the item strength. Specifically, Malmberg and Shiffrin proposed that each time an item is encountered, people store one ‘‘shot’’ of context, provided sufficient study time is allowed. During additional study time beyond that minimum (or during massed repetitions, which amount to the same thing as extra study time), no additional ‘‘shots’’ of context are stored; only the item strength is increased by massed study. However, if sufficient lag occurs between two repetitions, then both the item and context information are incremented. Their computational model based on these assumptions was able to successfully model important aspects of their experiments.
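The one-shot storage rule just described can be written as a short sketch. This is only an illustration of the verbal rule, not the Malmberg and Shiffrin implementation: the function `encode`, the unit item gain, and the `min_lag` threshold (the number of intervening items that counts as a sufficient lag) are all assumptions introduced here.

```python
def encode(positions, item_gain=1.0, min_lag=1):
    """Return (item_strength, context_shots) for one item presented at
    the given serial positions of a study list.

    Item strength grows with every presentation. A new shot of context
    is stored on the first presentation and on any later presentation
    separated from the previous one by at least `min_lag` other items.
    """
    item_strength = 0.0
    context_shots = 0
    previous = None
    for position in positions:
        item_strength += item_gain  # item information always accrues
        if previous is None:
            context_shots += 1      # first encounter: one shot of context
        else:
            intervening = position - previous - 1
            if intervening >= min_lag:
                context_shots += 1  # spaced repetition: another shot
        previous = position
    return item_strength, context_shots

massed = encode([3, 4])   # back-to-back repetition
spaced = encode([3, 9])   # five items intervene
print(massed, spaced)
```

Under these assumptions the massed and spaced items end up with the same item strength, but only the spaced item stores a second shot of context.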
The Malmberg and Shiffrin version of SAM/REM preserves the distinction between two retrieval processes—sampling, which depends on the strength of items relative to all others in memory, and recovery, which depends only on that item (and its associative links to the probe cues). However, unlike in some other versions of SAM, the context strength plays a role mainly during sampling, and not during recovery. According to their model, the list-strength effect is a relative strength phenomenon produced by the sampling process. That is, it depends on the competition between strong and weak items at test, not on the absolute strength of the items. In free recall, people use the context to sample items. For pure lists, there is no competition between strong and weak items (because they are not on the same list), so there is no sampling advantage for strong over weak items. However, on mixed lists, the strong items are sampled more frequently than the weak items. Consequently, the strong items benefit (relative to pure lists) and the weak items suffer (relative to pure lists). Comparing the Malmberg and Shiffrin model to the Raaijmakers model, Malmberg and Shiffrin make no assumptions about contextual drift over time. The Malmberg and Shiffrin model therefore does not predict a lag
effect (which is wrong, at least in some circumstances). The Malmberg and Shiffrin model also does not make the assumption that items in short-term memory are not strengthened; in fact, it rather explicitly makes the opposite prediction, that items receive item content information as long as they are still being processed. Finally, the Malmberg and Shiffrin model works for free recall and has not been extended to cued recall, whereas the Raaijmakers model works only for cued recall. A productive activity for modeling researchers would be to merge the models into a more general SAM/REM account that can make accurate predictions on a broader range of data.
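The claim that the list-strength effect arises from competition during sampling can be made concrete with a toy calculation. The sketch below assumes that spaced items carry two units of context strength and massed items one, and that free recall samples items in proportion to context strength; these values and the helper names are hypothetical choices for illustration only.

```python
def sampling_probabilities(context_strengths):
    """Probability of sampling each item, proportional to its context strength."""
    total = sum(context_strengths.values())
    return {name: s / total for name, s in context_strengths.items()}

def make_list(n_spaced, n_massed, spaced_context=2.0, massed_context=1.0):
    """Build a study list as a mapping from item name to context strength."""
    items = {f"spaced_{i}": spaced_context for i in range(n_spaced)}
    items.update({f"massed_{i}": massed_context for i in range(n_massed)})
    return items

pure_spaced = sampling_probabilities(make_list(10, 0))
pure_massed = sampling_probabilities(make_list(0, 10))
mixed = sampling_probabilities(make_list(5, 5))

# On pure lists every item competes with equals, so strength gives no
# sampling advantage. On the mixed list the spaced items gain and the
# massed items lose relative to their pure-list counterparts.
print(pure_spaced["spaced_0"], mixed["spaced_0"])
print(pure_massed["massed_0"], mixed["massed_0"])
```

The sketch covers only the sampling stage; recovery, which depends on absolute strength, is left out.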
4.2. A Modified One-Shot Account of Spacing?
There are many possible ways to unify the two models, and actual modeling efforts will be needed in order to identify which is correct. However, we will take a stab at proposing a ‘‘verbal theory’’ that incorporates some quantitative assumptions in order to demonstrate the plausibility of a model based on these principles. (At the very least, future cognitive models will be able to use our list of phenomena to compare competing models; at best, our proposed theory can form the basis of a rigorous quantitative model that captures our intuitions.)
The basic theory is similar to the Malmberg and Shiffrin (2005) theory, except that we add additional assumptions to account for cued-recall and recognition tests. Specifically, we assume that the first study of an item stores context, associative, and item information. Item content information continues to accumulate as long as the item is seen; context information rises to a maximum value (‘‘one shot’’) and then stops. This assumption is lifted directly from the Malmberg and Shiffrin (2005) model. Our possibly controversial addition to their account is that extra study time should not strengthen item-to-item associative information either; it also rises to a maximum value and then stops, unless study-phase retrieval occurs. We need this assumption in order to explain cued-recall spacing effects, because cued recall is less reliant on background context than it is on the association between the test cue (which is present both at study and at test) and the item. (Actually, a more plausible assumption may be that there are two kinds of associative information, one that is used mainly in sampling and the other that is used mainly in recovery. It would be the former kind that discriminates spacing from massing.)
To explain the spacing effect, we assume that when an item is seen, people automatically initiate a search of memory for identical or highly related items. This reminding process is the study-phase retrieval part of the account and follows the usual SAM/REM conventions; memory images are sampled using the current item and the background context as the cue and
followed by an attempt to recover sampled images. If an item is recovered— usually the same item, but highly related items may also be found—then the accumulation process begins again for the recovered item, strengthening item content, context, and associative information. It is worth stressing that the recovered trace is strengthened, and a new trace is not stored. If, however, an item has been seen before but the person fails to retrieve it, then a new image is stored instead. This differential strengthening produces the recovery advantage of items studied for a long time without increasing their sampling advantage. Next, to explain the lag effect, we must introduce a mechanism that strengthens context and associative information when a successful retrieval occurs. The magnitude of this strengthening must be proportional to the difficulty of the retrieval, such that more difficult retrievals (along some dimension of difficulty) result in more strengthening (a closed loop phenomenon; see Murdock, 2003). We favor an account similar to the Malmberg and Shiffrin (2005) account, but that is extended to provide increasing context and associative information storage as a function of the lag between repetitions. Specifically, we assume that the increase in context and associative strength is proportional to the difference between the maximum possible strengths and the current strengths at the time of retrieval. Thus, for a short-lag item, the current strength will be relatively strong when its second and subsequent presentations come along, resulting in a ‘‘shot’’ of context and associative information that is relatively small. However, for long-lag items, the current strength will be weaker when its second and subsequent presentations appear, producing a ‘‘shot’’ of context and associative information that is relatively large. 
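The proportional-strengthening rule proposed above can be sketched numerically. The rule itself (an increment proportional to the difference between the maximum and the current strength) comes from the text; the exponential decline of current strength with lag, the function names, and all parameter values are assumptions made here purely for illustration.

```python
import math

def current_strength(stored, lag, drift=0.1):
    """Strength available at the time of the repetition. An exponential
    decline with lag is assumed here; the text does not commit to a form."""
    return stored * math.exp(-drift * lag)

def context_shot(stored, lag, max_strength=1.0, rate=0.5, drift=0.1):
    """Size of the shot stored after a successful study-phase retrieval:
    proportional to (maximum strength - current strength)."""
    return rate * (max_strength - current_strength(stored, lag, drift))

short_lag_shot = context_shot(stored=1.0, lag=1)
long_lag_shot = context_shot(stored=1.0, lag=20)
print(short_lag_shot, long_lag_shot)  # the long-lag shot is the larger one
```

Because the increment is bounded by `rate * max_strength`, the gain from ever longer lags levels off rather than growing without limit.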
However, we note that there are many alternative formulations of this general rule that could be explored, so we will leave it at the level of a ‘‘verbal theory’’ for now.
At the test, the sampling and recovery implications of the ‘‘modified’’ one-shot hypothesis produce different effects depending on the type of test. For cued tests like recognition and cued recall, the impact of contextual information is minimized because there are associative cues present at both study and test. The list-strength effect is absent because at test people rely mainly on associative strength for sampling, and not on context strength. However, there is still a spacing benefit, because associative strength is increased during spaced restudy. For free recall, people rely less on associative cues and more on context strength. This produces a list-strength effect according to the same types of mechanisms described in Section 4.1.
An account like this can readily handle the phenomena in Table 3. The account produces intention invariance, as the reminding process is obligatory. It can be interpreted to produce age invariance, as there is nothing strategic about the process. The Glenberg surface emerges because the gain in associative and contextual strength is bounded, producing a local maximum gain at a certain retrieval distance. Furthermore, as the distance
between an item and its repetition increases, the probability of successful reminding goes down (a recency effect). Together, these mechanisms produce the inverted-U-shaped curve. Manipulating contextual variability will seldom help recall, because changing the cue between restudy opportunities means that the associative strength does not go up. Finally, a study-phase retrieval mechanism is used, so recognition is required to get a spacing benefit. Last, we note that this account turns the list-strength effect from an impostor effect to a signature effect that is directly predicted by the model. Rather than viewing the list-strength effect as a problem, we view it as fully consistent with our context-based account.
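The way the two mechanisms above combine into a nonmonotonic lag function can be sketched as follows. The expected benefit of a repetition is taken to be the probability of successful reminding (falling with lag) times the bounded gain in strength (rising with lag). Both functional forms, the parameter values, and the function names are assumptions for illustration, and the sketch covers only the lag dimension of the Glenberg surface, not the retention interval.

```python
import math

def reminding_probability(lag, forgetting=0.05):
    """Chance that the earlier presentation is retrieved at restudy
    (assumed to fall exponentially with lag)."""
    return math.exp(-forgetting * lag)

def bounded_gain(lag, drift=0.2, max_gain=1.0):
    """Strength gained after a successful retrieval (assumed to rise
    toward a ceiling as lag grows)."""
    return max_gain * (1.0 - math.exp(-drift * lag))

def expected_benefit(lag):
    """Expected gain from a repetition at this lag."""
    return reminding_probability(lag) * bounded_gain(lag)

lags = list(range(0, 41))
benefits = [expected_benefit(lag) for lag in lags]
best_lag = max(lags, key=expected_benefit)
print(best_lag, benefits[0], benefits[best_lag], benefits[-1])
```

With these hypothetical parameters the benefit is zero at lag 0, peaks at an intermediate lag, and then declines, so the optimum is neither the shortest nor the longest spacing.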
4.3. Some Experiments Linking Context and Spacing
The previous account assumes that context strengthening is critical to producing spacing effects. By ‘‘context’’ we mean neither the semantic connotation of a word nor the other words that surround an item on a long list; rather, we are referring to incidental background stimuli that are present during encoding. For example, when the physical environment during test mismatches the physical environment during study, memory is reduced (Godden & Baddeley, 1975; Smith, 1979, 1984; Smith, Glenberg, & Bjork, 1978). Internal states can also be part of the context, such as mood (e.g., Eich, 1980).
While Raaijmakers’ version of SAM assumes that list context fluctuates as people move through the list, it may well be that within a short list, the context does not change very much between the items. In such cases, the ‘‘list’’ context is relatively stable, and can be approximated without modeling the small drift in context during the list. Anderson and Bower (1972), for example, successfully modeled many list learning paradigms with a model called FRAN that linked studied items to a global ‘‘list’’ node. However, between lists or over time, the background context is likely to fluctuate more dramatically. Hence, context may change at different rates depending on what people are doing.
One piece of evidence that context fluctuates at a different rate within-list and between-list is that if you ask people to tell you how far apart two unrelated words were within a list, they are not very calibrated (Hintzman & Block, 1973; Hintzman et al., 1975). However, if you ask them to tell you which list an item came from, they are often quite accurate.
Intrusion errors across lists are surprisingly rare following relational encoding like that in typical spacing studies, and people are capable of recalling items from a previously studied list even when a subsequent list intervenes, suggesting that people have good memory for list membership (e.g., Shiffrin, 1970). An important study by Smith et al. (1978) demonstrated that studying a list in two distinct environmental contexts reduced the forgetting when
tested in a third context relative to studying a list twice in the same environmental context. This result is important because it shows that encoding variability at the list level is quite predictive of later recall, even if at the item level it is not. In general, studying in multiple contexts protects against forgetting caused by mismatch of the test and study contexts. One reason this might be true is that the test context has a better chance of matching some components of multiple contexts than it does of matching components of a single study context—essentially, an encoding variability phenomenon. Contexts may be more likely than individual cues (e.g., a in an a–b pair) to contain components that repeat across different contexts, as they are composed of a large number of features compared to the relatively few features present in a single cue. Turning now to within-list context, a recent paper from one of our laboratories (Verkoeijen, Rikers, & Schmidt, 2004) explicitly manipulated the background against which words were studied. Experiment 1 used solid background colors, while Experiment 2 used a cityscape and a forest image. As with many studies that varied properties of the stimulus, changing the background improved memory for massed items compared to keeping the background the same. This is probably because of one of the impostors outlined in Section 2, deficient processing. However, for spaced items, changing the background reduced the probability that people would retrieve the earlier presentation, thereby lowering subsequent recall. These results are very similar to the results of the homograph studies and the change-of-encoding studies reviewed earlier (see Section 3.5).
4.4. Directed Forgetting as a List-Strength Phenomenon
Another piece of evidence that context is important to the spacing effect comes from a recent study from one of our laboratories (Sahakyan et al., 2008). We asked participants to rate mixed lists of spaced and massed words for pleasantness and animacy by giving a simple yes/no judgment on each dimension. To reduce deficient-processing explanations of our results, we used a different rating dimension on each presentation. After the first list, half of our participants were told to try to forget the previously studied list (i.e., directed forgetting) because it was just for practice and the real list would be coming next. The other participants were told that it was just the first half of the list, and that they should keep rating words on the second list. After the second list, there was a distractor task followed by a free recall test on List 1.
Directed forgetting is often explained as a context effect (Sahakyan & Kelley, 2002). Sahakyan and Kelley argued that people comply with a forget instruction by ‘‘thinking of something else’’ which results in a new mental context being set up for the second list. At the time of the test, the test context better matches the second list than the first list, producing impaired
recall of List 1 items, but enhanced recall of List 2. Consistent with their explanation, Sahakyan and Kelley found that when the original context was reinstated using guided retrieval, memory for List 1 items recovered (and List 2 items suffered). Furthermore, when participants are told to keep remembering but are instructed to engage in a distracting thought, such as imagining themselves invisible or imagining their parents’ house, they still show impaired List 1 recall (Sahakyan & Delaney, 2003; Sahakyan & Kelley).
From the perspective of SAM/REM models, directed forgetting is conceptually identical to a list-strength effect. List 1 items are associated with a context that mismatches the test context, and so List 2 items are sampled more often than List 1 items are. Once sampled, their recovery is unaffected. This is why it is difficult to obtain forgetting using recognition tests (e.g., Sahakyan & Delaney, 2005): just as in the list-strength effect, the primary cue that people rely on during recognition tests is the item shown, not the context. One would expect that with very difficult recognition tests that rely on retrieving specific stimulus information and not on familiarity alone, people would tend to rely more on the context. Indeed, it has been shown that asking people to discriminate whether items were presented in their singular or plural form (e.g., baker vs. bakers) and other recognition tests that rely heavily on recollection produce both list-strength phenomena (Diana & Reder, 2005; Norman, 2002) and directed forgetting of List 1 items (Sahakyan, Waldum, Benjamin, & Bickett, 2009).
Another directed forgetting finding that falls directly out of SAM/REM is that List 2 must be studied to produce forgetting. Because SAM/REM assumes that directed forgetting is a sampling phenomenon, without competing items to preferentially sample, there should be no directed forgetting.
This prediction has also been confirmed in the directed forgetting literature (Bjork, 1989) and using Sahakyan and Kelley’s (2002) context-change paradigm (Pastötter & Bäuml, 2007). Furthermore, one should expect that the more List 2 encoding occurs, the greater the competition, as there are more items that could potentially be sampled, a prediction confirmed by Pastötter and Bäuml (2010).
The Sahakyan et al. (2008) study effectively combined a list-strength paradigm on List 1 with the two-list directed forgetting paradigm. A direct prediction is that since spaced items on List 1 are more strongly linked to the context than the massed items on List 1, then spaced items should also suffer more forgetting than massed items. Indeed, this is exactly what we found in our study, whose main results are reproduced as Figure 4. The greater forgetting of spaced than massed items is a direct, if counterintuitive, prediction of the SAM/REM account because spaced items are more closely linked to List 1 context. When the test context changes, their usual sampling advantage is lost, and they drop to near the level of massed items.
One detail of Figure 4 that is often puzzling is that massed items apparently show no directed forgetting at all. However, this is exactly what the model predicts! Specifically, as the test context is no longer such a good match for List 1 spaced items following intentional forgetting, List 1 spaced items no longer enjoy such a large sampling advantage over massed items. As with the list-strength effect, strengthening the link between items and their context not only enhances their memory relative to weak competitors but also drives down recall of the weak competitors. Consequently, when the spaced items lose that advantage, massed items show spontaneous recovery. Therefore, while massed items are being ‘‘forgotten’’ because List 2 items are now competing with them more effectively, they are also simultaneously receiving less competition from the weakened List 1 spaced items, and hence the List 1 massed items become more memorable compared to the remember condition. These countervailing effects largely cancel one another out, producing no forgetting of massed items.
Figure 4. Proportion of words recalled in free recall (y-axis: proportion recalled) as a function of spacing (massed vs. spaced) and intentional forgetting cue (forget vs. remember). Adapted from Sahakyan et al. (2008).
In sum, the Sahakyan et al. (2008) experiment is an important contribution because it confirms counterintuitive predictions made by the SAM/REM model of the list-strength effect. The full pattern of results of the study is difficult to explain without understanding what the model would predict, and is fully consistent with the context-based theory of spacing outlined here.
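The argument that directed forgetting is a sampling phenomenon, and therefore requires List 2 competitors, can be illustrated with a toy calculation. The sketch assumes that every item's sampling weight is its match to the test context, that a forget cue simply lowers the match of all List 1 items by a common factor, and that both lists match equally well in the remember condition; the function name and the numbers are hypothetical.

```python
def list1_sampling_share(n_list1, n_list2, list1_match, list2_match=1.0):
    """Share of all sampling attempts that land on List 1 items, with
    sampling proportional to each item's match to the test context."""
    list1_total = n_list1 * list1_match
    list2_total = n_list2 * list2_match
    return list1_total / (list1_total + list2_total)

# Remember condition: both lists match the test context equally well.
remember = list1_sampling_share(10, 10, list1_match=1.0)
# Forget condition: the context change halves the List 1 match.
forget = list1_sampling_share(10, 10, list1_match=0.5)
# Forget cue with no List 2 learning: nothing competes with List 1.
forget_no_list2 = list1_sampling_share(10, 0, list1_match=0.5)
# Forget cue followed by a longer List 2: more competitors.
forget_long_list2 = list1_sampling_share(10, 20, list1_match=0.5)

print(remember, forget, forget_no_list2, forget_long_list2)
```

In this simplified setting the context mismatch costs List 1 nothing when no List 2 items exist, and the cost grows as more List 2 items are encoded, in line with the two findings cited above.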
4.5. Summary and Untested Predictions of the Account
To summarize Section 4, we proposed that the Raaijmakers (2005) account of the spacing effect is largely correct. However, we proposed some extensions based on the Malmberg and Shiffrin (2005) SAM/REM account of the list-strength effect, which refers to the larger spacing effect observed with mixed lists than with pure lists. The hybrid account is able to explain
why the list-strength effect emerges in free recall but not in recognition or cued recall, and correctly predicts greater list-method directed forgetting from spaced items than from massed items on mixed lists. It also correctly predicts the pattern of recall in experiments that manipulate background incidental context. Taken together, this tentative account provides a number of correct qualitative predictions on which a computational model could be constructed.
The tentative theory described herein also makes a number of as-yet untested predictions. We are in the process of testing some of these, but consider it worth outlining them now to set others thinking along the same direction. The first and perhaps most amazing omission is that no one has demonstrated that longer lags between spaced repetitions produce list-strength effects. One should generally predict that mixing long-lag and short-lag items in a free recall study should produce a list-strength effect. This is not predicted by the original Malmberg and Shiffrin (2005) account, but it would be predicted from our expanded version of their model. Long-lag items would store more context than short-lag items, resulting in a sampling advantage at the test. They should then be output sooner and with higher frequency than the short-lag items. A failure to obtain a lag list-strength effect would falsify the prediction of greater context storage following more difficult retrievals.
One potentially surprising prediction is that the spacing effect should interact with retention interval differently depending on the type of test. As the retention interval gets longer, there is a greater mismatch between the test context and the study context. In free recall, spacing results in a stronger link between the item and the study context.
Hence, if the test context mismatches the study context, then spacing will confer relatively little advantage over massing (for evidence that this pattern is observed in directed forgetting, see Sahakyan et al., 2008). Therefore, we anticipate that in free recall, increasing the retention interval will tend to reduce the advantage of spaced items. In contrast, in cued tests like recognition and cued recall, target items are associated during study with the cue. This cue will be presented again at the test, suggesting that for such tests the spacing effect should get stronger and stronger over time. This latter prediction has been confirmed in a number of studies, as reviewed in a recent meta-analysis (Cepeda et al., 2006). However, the former prediction has never been tested.
5. The Testing Effect
Up until this point, we have been discussing how distributing practice grants memorial advantages over massing practice. The studies described above mostly repeat items by granting additional study opportunities.
However, when an item is tested, it also constitutes a kind of restudy, and so most of the same things that take place in spacing studies also take place when the additional study opportunity is replaced with a test. Furthermore, it is now well known that inserting tests into a learning sequence produces better memory for the presented material than a similar amount of time spent studying. Researchers have extensively investigated this beneficial influence of testing on memory in recent years (see Roediger & Karpicke, 2006a, for an excellent review), and frequently make pleas for employing testing as a learning aid in educational practice.
Empirical work on the testing phenomenon, as well as calls for its practical application in the classroom, has a long tradition. For example, in an early review of the results from experimental psychology, Offner (1911) wrote, ‘‘Witasek hat, was Ebbinghaus und Pilzecker schon berührten, an Silbenreihen umständlich gezeigt, daß das bloße Lesen einen geringeren Einprägungswert hat als das Rezitieren.’’ Loosely translated, this means that Witasek demonstrated, consistent with what Ebbinghaus and Pilzecker had already suspected, that the memory trace of nonsense syllables is stronger when participants regularly recited (i.e., tested themselves) during learning than when they read the syllables multiple times. Furthermore, Offner, who was a preparatory-school principal in Munich, noted that the recitation method was gaining ground in school practice. He advised students to attend to the meaning of the material and to try to retrieve it from memory instead of re-reading it, at least for material like verse. Thus, it appears that in the beginning of the twentieth century, some German students were already being advised to use self-tests instead of rereading to learn (Offner, 1911, p. 52: ‘‘Vielfach gibt man den Schülern den Rat, beim Memorieren eines Gedichtes, einer Regel u. dgl. (. . .), wenn das Hersagen oder Vortragen nicht glatt von statten gehen will, ins Buch oder ins Konzept zu blicken, sondern sich aufs Folgende oder auf den Zusammenhang zunächst noch zu besinnen und erst wenn dies ohne Erfolg bleibt, nachzusehen.’’ Roughly: students are often advised, when memorizing a poem, a rule, and the like (. . .), if the recitation will not go smoothly, [not] to look into the book or the draft, but first to try to recall what follows or the context, and to look it up only if this remains unsuccessful).
In this section, we will follow a format similar to Section 3.
We will simultaneously identify some phenomena that need to be explained, which we summarize in Table 4, and describe the development and periodic rejection of theories that have been proposed to account for the testing effect. As there are far fewer theories to explain testing than to explain spacing, this is a much briefer endeavor.
5.1. Early Research: Tests Slow Forgetting
One classic study on the testing effect was conducted by Gates (1917), who conducted a large-scale study aimed at comparing memory after restudy versus self-testing. Children in first, fourth, sixth, and eighth grade studied lists of nonsense syllables and brief biographies taken from Who’s Who in
Table 4. Some Possible Testing Phenomena.
1. Testing effects grow over time. While restudy often produces better memory than testing after a short delay (at least without feedback on the test), testing tends to produce better memory after a long delay. Compared to restudy conditions, tests result in slower forgetting over time.
2. Test type invariance. Tests benefit memory on other types of tests, not just the original type of test. Furthermore, all types of test benefit memory.
3. Asymmetry. Testing usually produces asymmetric recall benefits, whereas restudy results in symmetric recall. Specifically, if a–b pairs are studied, then a-? tests are provided, a-? tests benefit more than ?-b tests.
4. Difficulty enhances testing effects. More difficult retrievals typically result in bigger testing benefits. Both weakening the cues and increasing the lag between study and test result in bigger testing benefits.
5. Testing reduces proactive interference. Inserting a test after a study event seems to reduce or eliminate build-up of proactive interference on subsequent material.
6. Integration weakens testing effects. There is some preliminary evidence that integrated materials may weaken the testing effect.
Note that these effects are generally less well-established empirically than the effects in Table 3.
America. The children read the material, and at some point were told to stop reading and look away from the material in order to mentally retrieve whatever they could from their reading. The amount of time children spent on self-testing was varied (from 0% to 90% of the total study time). Immediately after learning, the children received a written free recall test. After 3–4 h, they were tested again. If the children were unable to recall, they could look back at the material during their self-tests—a feature that provides high ecological validity, as glancing back at the material during self-testing is probably what students do during self-testing. Obviously, however, this aspect of the procedure also loosened experimental control. The general conclusion from Gates’ study was that a combination of study and self-testing produces better memory than studying the same material over and over again. This result held for both nonsense syllables and biographies, and for most of the grade levels tested, with greater percentages of self-test resulting in better final test performance. However, the positive effect of recitation over restudying seemed to be moderated by a number of factors. First, age interacted with the effect of self-testing, as for nonsense syllables the effect did not occur for the youngest children (Grade 1). Second, the effect of self-testing was stronger for meaningful materials (biographical facts) than for meaningless nonsense syllables—a point we will return to later. Third, at least for meaningful materials tested after a 3–4 h delay, when self-testing took more than 60% of the time there was a
Spacing and Testing Effects
downturn in effectiveness. This suggests that a certain amount of study is required before self-testing can facilitate learning. Another classic study involved over 3600 students, the entire population of 91 Iowa elementary schools (Spitzer, 1939). Spitzer’s participants read an approximately 600-word instructional text on bamboo. Each student then received two or three tests within the next 63 days. The general result was that on a final test, students who received intervening tests performed better than those who had merely studied the text (but not received a test). Furthermore, it appeared from his data that the tests greatly slowed the forgetting rate over time. Around 1940, the interest in the effect of testing on learning waned, only to emerge again in the 1960s. Hanawalt and Tarr (1961) examined the effect of an intermediate free recall test on final recognition performance. In their experiment, participants studied 23 statements that each contained a subject, copula, and final predicate adjective (e.g., Brown eggs are expensive). Following the study phase, participants in the intermediate-test group had to recall as many of the adjectives as possible, while participants in a study-once group engaged in an unrelated activity. After either 8 min or 48 h, participants received a final five-choice recognition test on the previously studied adjectives (there was also a condition that received the final test after 52 h, but its results were similar to those of the 48-h delay group). After 8 min, the intermediate-test and study-once groups had similar recognition accuracy, but after 48 h the intermediate-test group had substantially better recognition accuracy than the study-once group did. Taken together, the early studies showed that intermediate tests improved memory compared to study alone. However, the benefits of testing typically grow larger over time, perhaps because the forgetting rate is slower following a test.
5.2. The Importance of Retention Interval
Subsequent studies focused heavily on the effect of retention interval. For example, Allen, Mahler, and Estes (1969) had participants study 27 paired associates, each consisting of a three-letter English noun and a two-digit number. On Day 1, the session began with ten cycles of 18 study trials; nine paired associates appeared in all ten cycles (training condition 10), nine appeared in the first five cycles (training condition 5F), and nine appeared in the last five cycles (training condition 5L). During each study trial, a paired associate was presented on a projector screen for approximately 2 s. Participants were instructed to repeat the item appearing on screen as often as possible. Immediately after the training phase, one-third of the paired associates in each training condition received five test trials, one-third
Peter F. Delaney et al.
received one test trial, and one-third were not tested. At a test trial, a noun was presented on screen, and a participant was required to enter the two-digit number associated with this noun. Participants received no corrective feedback on their responses. After the test trials had been administered, participants were dismissed and instructed to return 24 h later for the second session. In the second session, participants were tested on the paired associates they had learned on the previous day. The test format was identical to that used on Day 1 in the intermediate-test conditions, with each Day-1 item being tested four times. The dependent variable was the percentage of errors on the cued-recall test. For the present purposes, the most interesting outcome pertains to the comparison of two conditions, namely the condition in which items were studied ten times without an intermediate test (henceforth termed the repeated-study condition), and the condition in which items were studied five times (the combined training conditions 5F and 5L) and received five intermediate tests (henceforth termed the study-test condition). Allen, Mahler, and Estes showed that after 24 h, the average test performance in the study-test condition was better than in the repeated-study condition. This finding suggests that a relatively long retention interval is required before the memory benefit of intermediate testing over restudying emerges. Hogan and Kintsch (1971) compared intermediate testing with restudying at multiple retention intervals. In their first experiment, there were seven conditions that differed in terms of the training schedule and the final test. Most important for now is the distinction between repeated-study conditions and study-test conditions. Participants in the repeated-study conditions studied a list of 40 words three times with short breaks between presentations.
Alternatively, participants in the study-test conditions studied the list once and then received two intermediate tests. These tests were either two free recall tests or two two-alternative forced-choice recognition tests. In both conditions participants took a final test (a free recall or recognition test) immediately after the last study trial or after the last test trial. In addition, a second test (again a free recall test or a recognition test) was administered after 2 days. The study produced some interesting findings. First, when free recall was used in the intermediate tests and in the final tests, the repeated-study group recalled more words than the study-test group on the immediate test. However, after a 2-day retention interval, mean free recall did not differ between the study-test group and the repeated-study group. Second, when recognition was used in the intermediate tests and free recall in the final tests, mean free recall performance after a 2-day retention interval was higher for the study-test group than for the repeated-study group. Third, when recognition was used in both the intermediate tests and the final tests, the repeated-study group performed as well as the study-test group. To strengthen the experimental manipulation from their first
experiment, Hogan and Kintsch conducted a second experiment. Participants studied the same 40 words as in Experiment 1 four times (repeated-study condition) or they studied the words once and took three consecutive free recall tests (study-test condition). A final test (either free recall or recognition) was administered after 2 days. The results showed that participants in the study-test condition outperformed those in the repeated-study condition on the final free recall test, whereas the reverse pattern was observed on the final recognition test. On the basis of the results reported by Hogan and Kintsch (1971) we can provide a definition of the testing effect: the testing effect refers to the finding that an intervening test leads to better memory performance on a delayed test than restudying the material for the same amount of time. Furthermore, this positive effect of testing on long-term retention emerges because more forgetting occurs following restudying than following intervening testing. The testing effect has proven to be a very robust phenomenon as it has been demonstrated in laboratory studies with simple stimulus materials such as word lists or paired associates, and with a variety of memory tests (e.g., Cull, 2000; Karpicke & Roediger, 2007; Wheeler, Ewers, & Buonanno, 2003), as well as in laboratory studies with relatively complex stimulus materials, for example, prose materials or short papers (e.g., Agarwal, Karpicke, Kang, Roediger, & McDermott, 2008; Kang, McDermott, & Roediger, 2007; Nungester & Duchastel, 1982; Roediger & Karpicke, 2006b), visuospatial maps (Carpenter & Pashler, 2007), and obscure facts (Carpenter, Pashler, Wixted, & Vul, 2008).
5.3. The Return of Deficient-Processing Accounts
A problematic aspect of many early testing studies, at least when it comes to drawing conclusions about the effect of intermediate testing on retention, is that the observed mnemonic benefits of intermediate testing may simply be due to re-presentation of (some of) the studied material during a test rather than to the testing per se. In other words, testing may introduce additional processing of items compared to restudy. If this sounds eerily familiar, it is because it is the deficient-processing theory of spacing applied to the testing effect. To rule out this somewhat trivial explanation of the testing effect, an intermediate-testing condition ought to be pitted against a repeated-study condition. Such a comparison was made in a study by Tulving (1967). In his Experiment 2, participants learned lists of 36 nouns in several different ways. One group studied the list, then was tested using oral free recall, then studied the list again, then repeated the test (the STST condition). This pattern was repeated six more times. In the study condition, participants received three study trials followed by a test trial (again six times). In the repeated-test condition, they studied the list once then received three test trials (again six times). Interestingly, Tulving found that the learning curves were almost identical in the three conditions. Thus, it seems fair to conclude
that when the total presentation time is controlled for, a study trial produces as much learning as a test trial. This seems to rule out the simplest form of the deficient-processing account. However, some researchers have argued that the testing effect should be attributed to overlearning of the successfully tested items (e.g., Slamecka & Katsaiti, 1988; Thompson, Wenger, & Bartling, 1978). Overlearning occurs when an item that is already well known continues to receive practice. Hence, these overlearned items are at ceiling-level recall during the test, and even after their strength drops, they remain so well learned that they show no apparent forgetting. If tested items were overlearned relative to restudied items, then when forgetting happens with a delay, the restudied items will drop off the ceiling and show forgetting. The tested items, however, will weaken but still be at ceiling-level recall for some time, producing a slower forgetting rate. Only once they drop off the ceiling will they show the same forgetting rate as other items. One way to overcome overlearning is to ensure that participants achieve a largely errorless intermediate-test performance. In Thompson and colleagues’ third experiment this was done by testing and re-presenting series of five-item sublists. The intermediate test on each sublist consisted of writing the five previously studied items down three times, with a short distractor task between the three free recall tests. Although maximum retrieval was not attained (participants recalled on average four of the five sublist items), overlearning was greatly diminished. Inconsistent with the overlearning account, a small advantage of repeated testing over restudying was found after a 48-h retention interval; no difference was observed after a short 20-min delay (see Kuo & Hirshman, 1996, for a similar short-delay result with a similar procedure).
Furthermore, other studies (e.g., Carrier & Pashler, 1992; Toppino & Cohen, 2009) have demonstrated strong testing effects using experimental procedures which seemingly prevent overlearning. In addition, the overlearning account predicts that intermediate testing will result in a superior final memory test performance at all retention intervals. However, most studies in the literature show that the testing effect only emerges after a relatively long delay; in fact, at short retention intervals restudying often produces a better memory performance than intermediate testing. This pattern is clearly at variance with the overlearning explanation of the testing effect. Lastly, when highly integrated texts are used as stimulus materials, it has been demonstrated that the beneficial effect of testing can “spill over” to information not tested during the intermediate test (e.g., Chan, 2009; Chan, McDermott, & Roediger, 2006). It is unclear how the overlearning account can accommodate such findings. In sum, in view of the above-presented empirical evidence, we feel (in line with other researchers; see for instance Roediger & Karpicke, 2006a) that the testing effect cannot be attributed to either additional exposure or overlearning. That said, both additional exposure and
overlearning might be important “impostor” testing effects in certain circumstances. It remains to be seen whether clear evidence for either will emerge in experimental tests, but both are quite plausible problems.
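The ceiling logic of the overlearning account discussed in this section can be made concrete with a toy model. The Python sketch below assumes a latent memory strength that decays exponentially at the same rate for tested and restudied items, with observed recall capped at 1.0. The decay rate, the starting strengths, and the exponential form are all hypothetical choices made for illustration; none comes from the studies cited.

```python
import math

CEILING = 1.0          # maximum observable recall proportion
DECAY_RATE = 0.02      # hypothetical decay per hour, identical for both item types

TESTED_START = 1.5     # hypothetical: overlearned, latent strength above the ceiling
RESTUDIED_START = 1.0  # hypothetical: latent strength exactly at the ceiling


def latent_strength(initial_strength, delay_hours):
    """Unobserved strength, assumed to decay exponentially."""
    return initial_strength * math.exp(-DECAY_RATE * delay_hours)


def observed_recall(initial_strength, delay_hours):
    """Observed recall is the latent strength clipped at the ceiling."""
    return min(CEILING, latent_strength(initial_strength, delay_hours))


def hours_at_ceiling(initial_strength):
    """How long observed recall stays at ceiling before apparent forgetting begins."""
    return max(0.0, math.log(initial_strength / CEILING) / DECAY_RATE)


if __name__ == "__main__":
    for delay in (0, 10, 24, 48):
        tested = observed_recall(TESTED_START, delay)
        restudied = observed_recall(RESTUDIED_START, delay)
        print(f"{delay:>2} h: tested = {tested:.2f}, restudied = {restudied:.2f}")
```

In this sketch the tested items appear to forget more slowly only because their decline is hidden above the ceiling. Note also that tested recall never falls below restudied recall at any delay, which is the prediction that conflicts with the short-delay restudy advantage described above.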
5.4. Transfer-Appropriate Processing Accounts
A second mechanism that has been frequently proposed to explain the testing effect is transfer-appropriate processing (e.g., Roediger & Karpicke, 2006a). According to this approach, the testing effect emerges because the mental processes that enhance performance on a final memory test are more closely reflected in the processes occurring during an intermediate test than in those occurring during study. The basic tenet of transfer-appropriate processing (that transfer from the learning phase to a final test is optimal when there is a close match between learning and test) is very useful. For instance, if a teacher encourages learning strategies that lead to conceptual understanding of the class material, then a final examination should ideally also emphasize conceptual understanding. While transfer-appropriate processing is therefore useful to educators, there are a number of empirical findings militating against accepting the transfer-appropriate processing account as a theoretical explanation of the testing effect. Like the overlearning account, the transfer-appropriate processing account has difficulties explaining why the testing effect is found after a long retention interval, but typically not after a short retention interval, and it cannot accommodate the finding that testing can also enhance memory for untested items (e.g., Chan, 2009; Chan et al., 2006). Furthermore, an important prediction of the transfer-appropriate processing account is that memory performance is best when there is a close match between the intermediate test and the final test. However, studies aimed at assessing this prediction failed to obtain strong support for it. For instance, Glover (1989) in his Experiments 4A–C instructed participants to study a 300-word essay describing the fictitious state of Mala.
Subsequently, in the control condition participants were dismissed, whereas in the other three conditions participants received a free recall test, a cued-recall test, or a recognition test on the previously studied essay. Two days after the first session, participants returned to the laboratory for a final test, which was either a free recall test (Experiment 4A), a cued-recall test (Experiment 4B) or a recognition test (Experiment 4C). Contrary to the transfer-appropriate processing account, final test performance was not best when the intermediate test and the final test were identical. Instead, in each of the three experiments, final test performance was always best when participants had received an intermediate free recall test (see also Carpenter & DeLosh, 2006). Results reported by Carpenter, Pashler, and Vul (2006) also argue against the transfer-appropriate processing account. Their participants studied 40 weakly bidirectionally-associated noun-pairs (e.g., coffee-morning). After
the entire list, they received a restudy opportunity for 20 of the pairs, while the other 20 pairs were tested by presenting the cue and asking for the target. On tested items, the pair was further re-presented briefly after each test. A final test was administered 18–48 h after the study session. The most interesting test conditions were the forward condition, when participants had to provide the target in response to the cue (e.g., coffee-?), and the backward condition, when participants had to retrieve the cue given the target (e.g., ?-morning). According to the transfer-appropriate processing account, the advantage of testing over restudying should be largest when the final test and the intermediate test are identical (i.e., forward). However, Carpenter and colleagues found that the magnitude of the testing effect was comparable across the four conditions. Interestingly, though, slight changes to the experimental procedure produce outcomes consistent with the transfer-appropriate processing approach. A new study by Carpenter, Pashler, and Jones (2008) used a procedure similar to the Carpenter et al. (2006) study, but this time using semantically unrelated pairs. As before, participants studied the pairs and then received either a restudy or a test opportunity on the pairs, with feedback provided after the test pairs. Testing benefitted both forward and backward conditions compared to restudy. Restudy produced symmetric recall in the forward and backward direction, but testing produced an asymmetry such that the direction that was used during the test resulted in superior recall compared to the reverse direction. Similar results have been reported by Zeelenberg, Pecher, and Tabbers (2008), whose participants studied a list of 24 unrelated word pairs five times in a row. A different list of 24 unrelated word pairs was studied three times and subsequently tested twice by providing the cue and asking participants to generate the target. No feedback was given.
After the study phase, a final test was administered on half of the items after 5 min and the rest after 1 week. Consistent with the Carpenter et al. (2008) results, after 1 week, they found that the testing effect was larger when the final test was in the same direction as the intermediate test than when the final test was in the opposite direction. There is now a substantial amount of evidence that when two items X and Y are studied together, they form a bidirectional representation such that X serves as a retrieval cue for Y, and Y also serves as a retrieval cue for X. Anderson and Lebiere (1998), in their ACT-R models, have always assumed that study produces bidirectional relationships. However, when people repeatedly retrieve one member of the pair using the other, people may learn specific procedural rules that create asymmetric recall. That is, if I repeatedly use X to retrieve Y, then eventually I will bypass the usual declarative memory mechanisms and create a procedural memory of the form “If X, then retrieve Y.” As such rules are inherently asymmetrical, the ACT-R theory predicts that testing should result in asymmetric benefits for the direction of the test. Thus, the recent results demonstrating that testing benefits the direction of the test may not be evidence for transfer-appropriate
processing so much as they are evidence for a transition in the type of memory being formed by tests as compared to restudy. (Then again, perhaps it is the asymmetric production-based memory that is “transfer appropriate,” as it is the type of memory that will be accessed at test.)
5.5. Retrieval Effort and Desirable Difficulty
If additional exposure, overlearning, and transfer-appropriate processing cannot adequately explain the testing effect, then what mechanism can? We will argue here that the desirable-difficulties framework and the related concept of effortful retrieval are very useful in understanding the testing effect. Bjork (1994) suggested that long-term retention is promoted when techniques that encourage students to engage in more effortful encoding operations during learning are used. Examples of these desirable-difficulties techniques are spaced practice, delayed feedback, and testing. Relative to restudying, taking an additional test requires more effort, and this may even slow initial learning. However, in the long run, testing will lead to better retention than restudying. A straightforward prediction of the desirable-difficulties framework is that the beneficial effect of testing on a final test increases when the retrieval effort during an intervening test is greater, at least as long as an item is successfully retrieved. Several studies have provided support for this prediction using a variety of experimental manipulations. For example, both Glover (1989) and Carpenter and DeLosh (2006) compared the effect of three different types of intervening tests (free recall, cued recall, and recognition) on memory on a subsequent final free recall, cued recall, or recognition test. If retrieval effort is an important factor in the emergence of the testing effect, and if we assume that free recall requires more retrieval effort than cued recall and recognition, then an intervening free recall test should produce the largest testing effect regardless of the final-test type. The outcomes of Glover’s experiments and of Carpenter and DeLosh’s experiment confirmed this prediction. Other studies have substantiated the idea that information is better retained when it is harder to retrieve initially.
Karpicke and Roediger (2007) showed that long-term retention is better with a longer time interval between the presentation of information and the initial test than with a shorter one. Furthermore, Carpenter and DeLosh (2006, Experiments 2 and 3) examined final retention as a function of the number of cues participants needed to retrieve an item during an intervening test. They found that fewer cues, and hence a greater retrieval effort, led to a better final test performance after a 5-min retention interval. Also, Pyc and Rawson (2009) provided more direct evidence for the retrieval-difficulty hypothesis (which is inherently related to the desirable-difficulties framework) by showing that difficult but successful retrievals produce better memory than easier successful retrievals both after a short and a long retention interval.
Thus, the results of the above-presented studies are clearly in line with the desirable-difficulties framework. However, it is not clear what exactly is enhanced by more difficult retrievals. There are many possible ways that memory could be enhanced by more difficult retrievals, all of which would be broadly consistent with the desirable-difficulties framework. It is therefore worth asking whether more detailed psychological mechanisms could be specified that produce results consistent with the desirable-difficulties framework.
5.6. Why Does Testing Help More Than Restudy?
One possible explanation for the testing effect is that testing enhances encoding variability. McDaniel and Masson (1985), in their Experiment 1, had participants encode words using either semantic or phonemic cues. A control group left after encoding, while the other participants received an immediate cued-recall test. After a 1-day delay, everyone returned and received a final cued-recall test using either semantic cues (category cues) or phonemic cues (rhyming cues). When the original encoding and the type of final test matched, there was no advantage of having received a test (21% recall) compared to not having received one (22% recall). However, when the final test mismatched the original encoding, a test produced better recall (18%) than no test (11%). Hence, it seems that a test primarily helped make judgments that differed from the original type of encoding. In Experiment 3, the single-study condition from Experiment 1 was compared against a restudy condition. The restudy condition was almost identical to the intervening-test condition from Experiment 1 except that the intervening cued-recall test was replaced with a restudy opportunity. As before, restudying yielded a larger final-test advantage when final-test cues mismatched the original encoding than when the final-test cues matched the original encoding. McDaniel and Masson interpreted these findings in terms of an encoding variability mechanism. If an intervening test or an additional study opportunity somehow adds new information elements to the existing memory trace, then the number of retrieval cues increases. Furthermore, and entirely consistent with the results found by McDaniel and Masson, these additional retrieval cues will particularly facilitate final-test performance when the final-test cue is different from the original encoding.
Although McDaniel and Masson did not directly compare restudying with taking intervening tests, the combined findings from Experiments 1 and 3 suggest that encoding variability may underlie the testing effect. Specifically, an intervening test, but not (or only to a more limited extent) an extra study opportunity, serves to increase the number of retrieval cues encoded with an item’s memory trace, and this will provide tested items with a memory advantage over restudied items on a delayed final memory test.
Recently, it has been demonstrated that testing insulates against the build-up of proactive interference (Szpunar, McDermott, & Roediger, 2007; Szpunar et al., 2008). This finding can also be interpreted as being consistent with the encoding variability explanation of the testing effect. Consider the third experiment in the study of Szpunar et al. (2008), in which three groups of participants had to study five 18-word lists. In one group, participants received a free recall test after each list. In a second group, participants restudied each previous list, and in the third group, participants studied each list only once. Thirty minutes after the fifth list had been (re)studied or tested, a final free recall test was administered to all participants. The critical comparison between the three groups pertained to List-5 performance because proactive interference should be strongest for the last list in the initial study sequence. It was demonstrated that the intervening-test group outperformed both the restudy group and the study-once group on List-5 memory. In addition, the last two groups did not differ in terms of List-5 memory. These findings provide a strong argument for the idea that relative to restudying lists, or studying lists once, intervening tests protect against proactive interference. To explain this phenomenon, Szpunar and colleagues proposed an encoding variability mechanism. They suggest that “testing adds contextual elements to a memory trace—over and above those added by a restudy episode—that enhance subsequent discriminability of recalled materials” (p. 1397). Given the presented empirical evidence, encoding variability may be the mechanism underlying the testing effect. However, Carpenter (2009) put forward the elaborative retrieval hypothesis as an alternative explanation of the testing effect.
In this view, which is heavily based on spreading activation theories of memory, retrieval involves searching memory for a specific target, which activates a network of related concepts. The generation of this elaborative structure becomes helpful on a delayed final test because it provides multiple retrieval routes to an item. By contrast, during a restudy trial, it is less likely that a participant will generate such an elaborative structure, because an item is directly available. Therefore, on a delayed final test, tested items will be better remembered than restudied items. Carpenter provided support for the elaborative retrieval hypothesis by comparing memory performance on previously studied paired associates. In Experiment 1, participants studied cue-target pairs that were either weakly associated, such as basket–bread, or strongly associated, such as dentist–teeth. During the initial encoding phase, all pairs were presented one by one on the computer screen, and participants had to rate the degree of relatedness between the words. In all pairs, the target appeared in a bold, underlined font. Following the initial study phase, half of the weakly associated and strongly associated pairs were rated again on their relatedness (restudied pairs), whereas the other half were tested. During a test trial, the cue was presented and participants had to enter the studied word.
No corrective feedback was given after the response. Five minutes after the pairs had been restudied or tested, a final free recall test was administered asking participants to recall as many of the underlined targets as possible. The final test results demonstrated that tested items were better recalled than restudied items. This outcome is quite interesting because it runs counter to the frequently observed finding in the testing literature that memory performance for restudied items surpasses that of tested items after a short retention interval (but see Carrier & Pashler, 1992). Furthermore, it was demonstrated that weakly associated pairs were better retained than strongly associated pairs; that is, the difference between the proportion correct at the intervening cued-recall test and the proportion correct free recall at the final test was smaller for weakly associated pairs than for strongly associated pairs. These findings were replicated with a slightly different procedure in Experiment 2. Carpenter took the results of her study as evidence in favor of the elaborative retrieval hypothesis. Under the assumption that target retrieval requires more elaboration with a weak cue than with a strong cue, it follows that the number of pathways to a target is larger for targets from weakly associated cues than for targets from strongly associated cues. Consequently, targets from weakly associated cues will suffer less from forgetting than targets from strongly associated cues. An interesting prediction that follows from the elaborative retrieval hypothesis is that an initial test should benefit memory not only of the tested information but also of related, but untested information. Chan (2009) and Chan et al. (2006) have corroborated this prediction.
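The claim that more retrieval routes make an item more recoverable can be illustrated with a simple probability sketch. The Python function below assumes that each route independently succeeds with the same probability, so that recall fails only if every route fails. This independence assumption, the function name, and the example numbers are all hypothetical; Carpenter (2009) does not specify such a model.

```python
def recall_probability(n_routes, p_route):
    """Chance that at least one of n independent retrieval routes succeeds.

    Assumes (hypothetically) equal and independent success probabilities.
    """
    if n_routes < 0 or not 0.0 <= p_route <= 1.0:
        raise ValueError("n_routes must be >= 0 and p_route must lie in [0, 1]")
    return 1.0 - (1.0 - p_route) ** n_routes


if __name__ == "__main__":
    # Hypothetical comparison: a strong cue leaves one route to the target,
    # whereas a weak cue, through extra elaboration, leaves three.
    for routes in (1, 3):
        print(f"{routes} route(s): P(recall) = {recall_probability(routes, 0.3):.3f}")
```

Under these assumptions, going from one route to three raises the recall probability from 0.3 to about 0.66, which is the direction of the effect the elaborative retrieval hypothesis predicts for weakly associated cues.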
5.7. Testing Effects for Integrated Stimuli
In the vast majority of testing effect studies, the stimulus materials are lists of items with a low level of integration; individual list items, such as words or paired associates, are not in any way connected to each other. However, there is both an empirical and a theoretical argument that the testing effect may be smaller for integrated than for nonintegrated materials. The empirical argument is based on a finding reported in a study by Chan et al. (2006; Experiment 1). They demonstrated with an integrated text about the toucan bird that restudying led to a better recall of text information than intermediate testing after a retention interval of 24 h. However, and as pointed out by Chan and colleagues, this finding should be interpreted with caution because the retention interval was short compared with other testing studies. Therefore, it may be possible that the restudy superiority would disappear and reverse, i.e., turn into a testing advantage, with a longer delay. Alternatively, there is also a theoretical argument for the idea that the testing effect will be weaker for integrated than for nonintegrated materials. Specifically, when participants study materials that are integrated either as
a result of material characteristics (e.g., all items in a word list are from the same category) or due to instruction characteristics (e.g., participants have to make a story of initially unrelated items, cf. Delaney & Knowles, 2005), they can construct a gist-feature that binds the studied items together. Furthermore, this gist-feature may serve as such a strong retrieval cue on a final memory test that it reduces the beneficial effect of intermediate testing over restudying. Recently, we conducted two experiments that provide information about the role of integration in the testing effect. In one experiment (Verkoeijen & Delaney, 2010a), participants studied a list containing unrelated words using either a continuous rehearsal strategy (i.e., keep rehearsing as many words from the list as possible) or a story strategy (i.e., make a story of the words in the list). Apart from the learning strategy, we manipulated study type (restudy vs. intervening test) and the retention interval from the last study episode to the final test (5 min vs. 7 days) as between-subjects factors. Furthermore, free recall was used at both the intervening and the final test. Remarkably, the analysis of final-test free recall performance revealed a three-way interaction. For the rehearsal strategy, a classic testing pattern emerged with more forgetting occurring for restudied words than for tested words. By contrast, in the story-strategy condition, we found main effects of study type and retention interval, without a trace of a study type by retention interval interaction. On average, restudying led to a better final-test performance than testing after 5 min and after 7 days. In addition, the forgetting rates were nearly identical for both study types. In the other experiment (Verkoeijen & Delaney, 2010b), we asked participants to learn four categorized lists and manipulated study type (restudy vs. intervening test) and retention interval (5 min vs.
7 days) within-subjects. At both the intervening and the final test, a free recall test was administered to the participants. The final-test results showed that average performance was the same for restudied and for tested items after 5 min and after 7 days (also, overall performance was worse after 7 days than after 5 min). The above-presented results suggest that the testing effect may be smaller for integrated than for nonintegrated materials. However, this preliminary finding needs to be corroborated by other empirical evidence. The experiment with the categorized lists also seems to indicate that the testing effect is absent with integrated materials. Yet, to strengthen our position, an extra experiment needs to be run in which categorized and noncategorized lists are compared with respect to the testing effect.
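To make the study type by retention interval interaction concrete, the following minimal Python sketch computes the amount of forgetting per study type and the resulting interaction contrast from cell means. The cell means are hypothetical placeholders chosen only to mimic the two qualitative patterns described above; they are not data from Verkoeijen and Delaney (2010a, 2010b), and the function names are our own illustration.

```python
# Cell means are given as percent recalled at (5 min, 7 days).
# All numbers below are hypothetical and serve only as an illustration.

def forgetting(cell_means, study_type):
    """Drop in percent recalled from the short to the long retention interval."""
    short_delay, long_delay = cell_means[study_type]
    return short_delay - long_delay


def interaction_contrast(cell_means):
    """Forgetting after restudy minus forgetting after testing.

    A positive value means restudied items were forgotten faster than
    tested items (the classic testing pattern); a value near zero means
    parallel forgetting, i.e., no study type by retention interval interaction.
    """
    return forgetting(cell_means, "restudy") - forgetting(cell_means, "test")


# Hypothetical pattern resembling the rehearsal-strategy condition.
classic_pattern = {"restudy": (70, 30), "test": (60, 45)}

# Hypothetical pattern resembling the story-strategy condition:
# restudy is better at both delays, but forgetting is parallel.
parallel_pattern = {"restudy": (70, 50), "test": (60, 40)}

if __name__ == "__main__":
    print(interaction_contrast(classic_pattern))
    print(interaction_contrast(parallel_pattern))
```

In this sketch a contrast of zero corresponds to two main effects without an interaction, which is the pattern reported for the story-strategy condition.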
5.8. Summary: The Testing Effect
Early results from the testing literature indicated that tests seem to slow forgetting. Often on an initial test there is little advantage of testing over restudying, but due to the differences in forgetting rates, testing ultimately
Peter F. Delaney et al.
results in better retention. As with spacing, deficient processing might produce some of the apparent testing benefits, and we should be mindful of this possibility. One way that could happen is through overlearning, with some items showing ceiling-level performance and then slower forgetting over time. However, studies seem to show that testing effects occur even with items that are not at the ceiling, so deficient processing is unlikely to provide the whole story. One explanation for the testing effect focused on the match between study and test processes (transfer-appropriate processing), but several studies show that tests help even when study and test processes mismatch. Later accounts focused mainly on the difficulty of retrieval, suggesting that more effortful retrievals produce more resilient memory traces. The latter account is quite similar to the study-phase retrieval account of spacing, and can explain most of the critical phenomena in the testing literature. Among the pieces of evidence for the retrieval difficulty account was that increasing the lag between study and test and reducing the specificity of the cue during the test both increase retrieval difficulty and enhance the impact of a test. Furthermore, unlike restudy, testing sometimes creates an asymmetric memory benefit such that the portion of the material that is retrieved benefits to a greater degree than the cue used for retrieval. Finally, we presented some new data which indicate that integrated materials may show smaller testing effects than nonintegrated materials because the former rely less on contextual information and more on item-to-item associations formed during study. At this time, there is no formal computational model of the testing effect, although the ACT-R model has had some success in this direction. Future research should be directed at creating an integrated computational model of spacing and testing.
6. Spacing and Testing in Educational Contexts
Historically, studies on spacing and testing have been conducted in tightly controlled laboratory settings in which competing theories have been developed and tested. However, extending laboratory findings to educational settings is equally important. Applied studies, where longer delays and educationally relevant materials are used, have yielded results that are analogous to basic findings. For example, in a 6-week web-based Brain and Behavior course, being quizzed relative to rereading course material produced superior subsequent recall on a final exam (McDaniel, Anderson, Derbish, & Morrisette, 2007), which is the standard testing effect. Additionally, as retrieval difficulty increased on the initial quiz, so did performance on the final exam, a finding that is consistent with laboratory research
(e.g., Carpenter & DeLosh, 2006). Other research has found spacing and testing effects to extend to a variety of educationally relevant materials including scientific prose (e.g., Roediger & Karpicke, 2006b), maps (e.g., Carpenter & Pashler, 2007), foreign languages (e.g., Bahrick, Bahrick, Bahrick, & Bahrick, 1993), history facts (e.g., Carpenter, Pashler, & Cepeda, 2009), and math learning (e.g., Rohrer & Taylor, 2006). Recognizing that spacing and testing are excellent candidates for improving memory for factual knowledge, researchers have strongly recommended that educators include spaced practice and frequent testing in schools as ways to improve educational outcomes (e.g., Pashler et al., 2007; Roediger, Agarwal, Kang, & Marsh, 2010). Advocacy for the inclusion of spacing and testing in schools stems from the fact that they are empirically supported methods for improving memory. As applicable to education as spacing and testing are, we argue that there are at least four unaddressed questions that prevent spacing and testing from having a greater impact on learning. The thesis of our argument is that cognitive psychologists have been successful at identifying how spacing and testing improve memory, but that there remain unaddressed concepts central to improving education. What follows are descriptions of those four questions and how they can be addressed. Because the application of spacing and testing has been recently reviewed elsewhere (e.g., Cepeda et al., 2006; Pashler et al., 2007; Roediger & Karpicke, 2006a; Roediger et al., 2010), we will only review prior research as it relates to our commentary. The questions we pose, our criticisms, and our recommendations apply only to research that seeks to make direct contributions to education.
6.1. Do Spacing and Testing Improve Learning or Just Memory?
Applied research on spacing and testing typically asks participants to study novel information (e.g., foreign vocabulary) and examines how some treatment (e.g., testing or spacing) impacts memory relative to a control group (e.g., restudying or massing). These studies have taught us a great deal about how spacing between study opportunities (e.g., Bahrick et al., 1993), retrieval difficulty (e.g., McDaniel et al., 2007), feedback (e.g., Butler, Karpicke, & Roediger, 2007), and retention interval (e.g., Cepeda et al., 2008) can be optimized to produce superior memory. However, the typical focus of spacing and testing research is on memory, not on other kinds of learning. Although some work has shown that spacing benefits skill learning (e.g., Rickard, Lau, & Pashler, 2008; Rohrer & Taylor, 2006), rote memory is the usual dependent variable investigated in spacing and testing experiments. Kintsch (1994) drew a distinction between remembering and learning, where remembering involves being able to recall or identify a set of previously seen items. Learning, according
to Kintsch, implies deeper understanding of a subject where knowledge can be used flexibly. Thus, despite sometimes impressive spacing and testing effects, it is unclear whether these manipulations enhance memory alone, or both memory and learning. In terms of making recommendations to educators, this is an important distinction. In schools, memory often is the primary outcome measure (e.g., most multiple choice exams), but one job of schools is to prepare people for employment where success depends on applying knowledge to novel situations. For example, remembering the historical causes of a societal collapse would allow one to perform well on an exam in school, but making a contribution outside of a school setting would require inferring what downfalls of past societies can tell us about prevention of our own societal failure. Learning, in other words, allows one to use prior knowledge to make novel connections and aid in solving an array of problems. Several studies have demonstrated that remembering and learning (in the sense described here) are independent constructs. For example, before reading a technical article about microbes, Mannes and Kintsch (1987) gave participants background material that was presented either in the same order or a different order than the article. Although participants in the same-order condition outperformed the participants in the different-order condition on later free recall and sentence verification tasks about the article, participants in the different-order condition outperformed the same-order participants on inference and problem-solving tasks. Kintsch (1994) explained those results by arguing that the difficulty of deriving coherence between the background text and the target text forces people to create a richly interconnected mental representation of the two. When background material matches the target text, there is little interference or need to develop a new mental model to integrate the two.
Although this match facilitates rote memory, it is not as conducive to problem-solving or inference-making abilities. Kintsch’s ideas are consistent with fuzzy trace theory (Brainerd & Reyna, 1990), where it has been found that studying material verbatim leads to relatively better memory, but activities that promote more gist-like encoding produce a deeper understanding of the material (Wolfe, Reyna, & Brainerd, 2005). At this point, it is unclear if spacing and testing have any effects on problem solving beyond contributing increased knowledge, or if they facilitate more sophisticated mental models. One recent study has explicitly evaluated learning instead of merely memory (Kornell & Bjork, 2008). Motivated by Rothkopf’s quote that “spacing is the friend of recall, but the enemy of induction” and by research showing a massing effect in inductive learning, the authors set out to investigate whether massing is in fact more conducive to inductive learning than spacing. In Experiment 1a, participants were shown six different paintings from each of 12 different artists. Six of the artists’ works were presented in
spaced format, and the other six artists’ works were presented in massed format. Experiment 1b was exactly the same, except spacing and massing were manipulated between subjects. At test, participants were shown new paintings one at a time from the previously seen artists and indicated which artist they thought painted the piece. In both experiments, participants were better able to identify the artist of a new painting when they had learned that artist’s work through spaced presentation. Because the results could be explained by participants simply remembering better which artist painted which painting in the spaced condition (a finding that would say nothing about inference), a second experiment was conducted that was almost identical to Experiment 1a; the only difference was that the test required participants to discriminate between familiar and unfamiliar artists. The results again revealed an advantage for spaced presentations. A similar study with children was recently published. As with undergraduates learning artists, it turns out that spacing instances of categories improves children’s ability to induce whether a new item is a member of the category or not (Vlach et al., 2008). A recent study by Johnson and Mayer (2009) explored whether testing benefits comprehension relative to restudy. In this study, participants learned a narrated animation about how lightning works. This animation, which was 140 s long, was presented on a computer screen. Afterwards, some participants had to study the same animation again (restudy condition), other participants were given a retention test, and the third group of participants received a transfer test consisting of four questions. Subsequently, half of the participants in each of the three conditions received a final retention test and a final transfer test after 5 min, whereas the other half of the participants received these tests after 7 days.
It should be noted that the final retention test was identical to the intervening test; the final transfer test consisted of two questions from the intervening test and two new questions. For the present purpose, the most important finding was that at the 7-day delay, Johnson and Mayer found a testing effect on new transfer questions. That is, participants who had received an intervening transfer test scored better on the new transfer questions of the final test than the participants in the restudy condition. However, one peculiar aspect of Johnson and Mayer’s study was the animated computer lesson, which was presented without any learner control. This type of material and the presentation format are not typically used in educational settings. Furthermore, the new transfer score was based on only two items, which is problematic in terms of the reliability and the validity of the test scores. Hence, it remains to be seen whether Johnson and Mayer’s results can be substantiated in upcoming research. Outside of the spacing and testing literature, researchers have spent many years studying distant transfer, that is, applying knowledge from one domain to solve problems in a relatively unrelated domain. An example of distant transfer is an army general applying his knowledge of chess to
battlefield tactics. The conditions under which people are able to execute distant transfer are not well understood (Barnett & Ceci, 2002), but it does remain a construct worth studying with respect to spacing and testing. When business and education leaders call for graduates with complex thinking skills, they are often speaking of distant transfer. In other words, they believe school should give students the knowledge and the skills to take what they learned in the classroom to generate ideas and solve problems in the real world. Using spacing and testing to develop students with such far-reaching abilities would require that cognitive researchers move beyond memory performance as the primary dependent variable in their research. Taking a cue from education researchers, cognitive psychologists might aim to better understand how spacing and testing impact skills such as critical thinking (e.g., Quitadamo & Kurtz, 2007), comprehension (e.g., Konopak, Martin, & Martin, 1990), and interpretation (e.g., Beins, 1993).
6.2. How Prevalent Are Spacing and Testing in Classroom Settings?
Many researchers point out that spacing and testing are rare in classrooms and that expanding their use would benefit education (e.g., Dempster, 1996; Pashler et al., 2007). Based on how effective spacing and testing are at improving memory, this is a logical conclusion. Anecdotally, high school teachers and college professors seem to teach in a linear fashion without repetition and give three or four noncumulative exams. Rohrer (2009) alleges that mathematics textbooks usually present blocked practice on a given topic, and only more rarely present review problems that would constitute spaced tests (see also Stigler, Fuson, Ham, & Kim, 1986). Such structural problems seemingly preclude the possibility of frequent spaced and retrieval practice. However, measuring the prevalence of spaced and retrieval practice solely on the nonrepetition of lesson plans and the paucity of tests might underestimate their true frequency. Spaced practice is implicit in many domains. In statistics, ANOVA might be learned early in a semester and regression late in the semester, but many teachers likely review ANOVA when presenting regression for the first time. Even if instructors do not review information verbatim, there is evidence suggesting that when the second presentation of an item is a gist version of the first, massed items may be remembered just as well as spaced items (Dellarosa & Bourne, 1985; Glover & Corkill, 1987). As in statistics, units of information in other domains do not exist in isolation but are integrated with other units of information. Learning newer information often requires restudying and retrieval of older information. The interconnected nature of knowledge might therefore inherently encourage spacing and testing, even if instructors do not deliberately try to build spacing and testing into their courses.
Prevalence estimates of spacing and testing in classrooms may also shortchange the value of in-class discussions. Discussion with classmates about a topic includes listening to what other people say (a form of restudying or spaced practice) and retrieving prior knowledge (a form of a test); research on writing shows that retrieving information in order to form arguments improves memory compared to rote retrieval (Wiley & Voss, 1996). Furthermore, instructors who pose questions to the class, even if they are rhetorical, might prompt students to covertly retrieve information. In sum, the real amount of spacing and testing in classrooms may need to be assessed by observing real classrooms. Broadly speaking, it is important that researchers in the field develop a better understanding of spacing and testing in classrooms. Rohrer and Taylor (2006) have provided estimates indicating that spacing of problems in mathematics textbooks is the exception rather than the rule. Beyond that, we do not know much about how common spacing and testing are in classrooms. More specifically, we think that researchers have paid insufficient attention to what defines spacing and testing. For example, in many studies on the spacing effect, spaced practice is compared to massed practice, and in studies on the testing effect, testing is compared to restudying. If spacing and testing can in fact be something other than formal opportunities for restudy or tests, then future research might aim to uncover which constructivist classroom activities encompass spacing and testing. For example, a study might compare students who are tested versus students who engage in class discussions. This design would allow researchers to uncover informal instances of spacing and testing in the classroom.
If class discussions prove to be just as effective mnemonic devices as traditional spacing and testing, research might be doing students a disservice by trading class discussion time for traditional restudy and retrieval practice.
6.3. How Can One Improve Learners’ Use of Spacing and Testing?
Recent calls to use more spacing and testing have generally focused on classroom instructors. However, given that much of the learning we do happens outside of the classroom, one wonders how much more could be achieved by helping learners to space their own practice and to effectively test themselves. Given that when students study, they usually have control over which items they will study, it is important to know whether they even think spacing and testing are helpful. It is also important to know if, given the choice, they will space their own practice or not. If students are already doing substantial spaced practice on their own, then teachers’ attempts to encourage spaced practice in the classroom may help very little (if at all).
One way to find out whether people are aware of the benefits of spaced practice is to compare perceptions of training regimens after people have experienced them. An important study by Baddeley and Longman (1978) involved teaching postal workers to type. The authors varied how much the training on subskills was spaced (interleaved) or massed (blocked). At the end of training, they asked the postal workers to indicate how satisfied they were with the training, and found that the objectively most effective training method, which involved the most spacing, was the least liked and that many postal workers would even refuse to participate if asked to train like that again. In contrast, the objectively least-effective regimen, which involved the most blocking and massing, was the most liked. Our interpretation of these results is that people find spaced practice effortful and unrewarding, at least when there is a lot of task-switching involved as well. Consistent with these results, Simon and Bjork (2001) gave people massed or spaced practice on a motor learning task and found that while massed practice resulted in faster acquisition of each response, spaced practice resulted in far better retention. Nonetheless, when people were asked which they preferred, they thought massing was better and that it promoted learning to a greater degree than spacing. Similar results were found in the Kornell and Bjork (2008) study in which participants learned painters’ styles (see Section 6.1): more than 80% of participants classified more paintings correctly with spaced repetitions, but right after study, an approximately equal percentage believed massed presentation was at least as effective as spaced presentation. 
Taken together, these results suggest that people have little insight as to whether massed or spaced presentation promotes learning, and may be tempted to mistake the fluency of performance during study for effectiveness of training in the long run (see footnote 5). Furthermore, it appears that students left to their own devices rarely space their study. A recent anonymous survey of over 200 University of North Carolina at Greensboro introductory psychology students conducted by the first author found that most indicated that they did not space their study; instead, they would study a single chapter straight through, and then move to the next, without ever revisiting the earlier one. Additionally, the majority of students indicated that they study only the night before an exam, although there was a sizeable minority who indicated that they study “a little every day.” (There was also a not inconsiderable minority who indicated doing neither; they reported that they “rarely or ever” study at all.) In contrast, Karpicke, Butler, and Roediger (2009) reported that students’ most favored study strategy is rereading.
Footnote 5. It seems strange to us that nobody to our knowledge has conducted the identical study for vocabulary memory, where there is no task switching. Task switching is generally effortful and unpleasant, but spacing is perfectly possible in vocabulary learning without any task switching at all.
Given that students do not space their study sessions, perhaps they nonetheless space their practice within a given session. If so, then given the choice, they should prefer to space items rather than to mass them. To find out, Ciccone and Brelsford (1976) allowed participants to choose the order of presentation of CVC paired associates (e.g., MAQ–TOJ) by pushing a button during the first presentation of the pair. The participants’ goal was to learn all of the pairs in a set of 16. Participants chose lags of 2–5 approximately 70% of the time, suggesting that they favored short-lag spacing, and studied items on average between 11 and 14 times. After a 24-h break, they recalled an impressive 88% of the responses correctly on a surprise test. However, the most important aspect of the study was that having control of one’s own study lag improved recall tremendously compared to a yoked control participant who received the same schedule. Apparently, people avoided lags that were too short or too long, and probably studied the items they personally found difficult more times. Aside from the fact that having control over one’s study improved learning and retention, the study suggests that people are smart enough to avoid the massed item deficit. More recent studies have examined whether students are sensitive to item difficulty when making decisions about spacing or massing items. An ingenious study by Son (2004) presented participants with a list of words to study, some of which were more difficult to learn than others. They then could choose whether they wanted to see the same item again immediately, or whether they wanted to “save” it for a spaced presentation. She found that people tended to mass the harder items and space easier items. Benjamin and Bird (2006), however, forced participants to space exactly half of the items. Under these conditions, participants preferred to space the harder items. These divergent results were reconciled neatly in a recent study by Toppino et al.
(2010), who noted that presentation rate was a major difference between the earlier studies, with Son using a faster pace than Benjamin and Bird. Toppino et al. showed that for difficult items, participants could not fully encode the item in the time given, so they elected to mass the items. However, for easier items, they were more often able to fully encode the item, and so they spaced them. At slower presentation rates like those used in the Benjamin and Bird study, participants always elected to space the items. In a second experiment, they showed that participants often reported not perceiving the words if they were difficult and passing by too quickly, consistent with their argument that participants massed items to avoid skipping them entirely. In sum, learners are fairly savvy when it comes to making item-by-item decisions about spacing. However, they are easily fooled by the fluency induced by massed practice into thinking that massed learning is superior to spaced learning. Speed of learning is not always a good indicator of effective retention in the future. Finally, students are not usually good about spacing their study sessions, even if many are normatively aware of the long-term
benefits of spacing study sessions. This problem may be exacerbated by the fact that massed study sometimes yields good scores on tests that occur shortly after the massed study; cramming may work if you know that the test will never be repeated again. Educators may want to consider clever methods for encouraging students to space their study at home.
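The notion of lag in the self-paced studies discussed in this section (e.g., Ciccone & Brelsford, 1976) can be made concrete with a short Python sketch. We assume here that lag is defined as the number of other presentations that intervene between two successive presentations of the same item, so that a lag of 0 corresponds to massed repetition. The function names and the example sequence are hypothetical and are not taken from any of the cited studies.

```python
def repetition_lags(sequence):
    """Return the lag of every repetition in a presentation sequence.

    Lag is taken to be the number of intervening presentations between
    two successive presentations of the same item (0 = massed).
    """
    last_position = {}
    lags = []
    for position, item in enumerate(sequence):
        if item in last_position:
            lags.append(position - last_position[item] - 1)
        last_position[item] = position
    return lags


def proportion_in_range(lags, low, high):
    """Proportion of repetitions whose lag lies between low and high (inclusive)."""
    if not lags:
        return 0.0
    return sum(low <= lag <= high for lag in lags) / len(lags)


# Hypothetical self-chosen study order for four items.
sequence = ["A", "A", "B", "C", "A", "B", "D", "C"]

if __name__ == "__main__":
    lags = repetition_lags(sequence)
    print(lags)
    print(proportion_in_range(lags, 2, 5))
```

Applied to a learner's self-chosen study order, a summary of this kind gives the proportion of repetitions falling in a lag range such as 2 to 5, which is the type of statistic reported for self-paced schedules.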
6.4. Are There Individual Differences in Spacing and Testing?
Over recent decades, researchers have investigated numerous variables that can be manipulated to optimize spacing and testing effects. Unfortunately, almost none of these studies have assessed individual differences. The drawback of sweeping advocacy for spacing and testing in schools is that a learning schedule that benefits one student might have neutral effects for another student, or even come at a cost to more effective study strategies for others. Because the benefits of spacing and testing may emerge from memory retrieval of previously seen items (Greene, 1989), baseline memory abilities might be a source of individual differences in the spacing and testing effects. The inverted U-shape of memory performance as a function of lag between items in spacing studies is evidence for this (e.g., Verkoeijen et al., 2005). Based on this, it is presumed that optimal lag differs depending on baseline memory abilities, but there is only indirect evidence for this hypothesis. For example, Sperber (1974) showed that in children with intellectual disabilities, spacing practice is sometimes detrimental to those with lower IQs compared to those with higher IQs. In addition, Verkoeijen and Bouwmeester (2008), using a latent class regression analysis technique, demonstrated that under certain conditions, the spacing effect is smaller for college students with an overall lower memory-performance level than for students with an overall higher memory-performance level. This same general relationship may also exist between testing and working memory. One factor that helps to optimize the testing effect is successfully retrieving an item once it has been cleared from working memory (Karpicke & Roediger, 2007). However, if an item has been cleared from working memory and it can no longer be retrieved, the benefits of testing are likely to be smaller (Baddeley, 1990).
Based on this premise, we presume that a person with lower memory abilities would benefit from retrieval practice at some time sooner than a person with higher memory abilities. This hypothesis is yet to be tested (although Latasha Holden in the Delaney lab is currently conducting a study on this topic). It is probably unrealistic to think that we could assess every student’s memory ability and use that estimate to create personalized spacing and testing practice schedules. However, even without a personalized profile for every student, different students might benefit from metacognitive
techniques to optimize learning. For example, if students could be taught to recognize the time when restudying or testing themselves on a particular item is difficult enough that it improves memory, but not so difficult that the item cannot be retrieved, this would allow for learning improvement for a wider range of students regardless of individual differences. Prior research supports the notion that memory in educational settings can be improved through metacognitive training (Metcalfe, Kornell, & Son, 2007) and monitoring for retrieval failures (Bahrick & Hall, 2005).
7. Conclusions
Our review sought to make sense of the conflicting and frequently bewildering results in the spacing literature. We began by pointing out that spacing-like effects can probably be produced by many different means, and that likewise there are things that people choose to do in our experiments that may obscure the “true” spacing effect. Before we can make sense of the spacing literature from the perspective of retrieval processes, we must understand how people act in our studies and how their strategic decisions affect memory. Often, people’s decisions about how to study are obscured by the procedures we use in laboratory memory studies, and yet we have demonstrated repeatedly that the effects of these strategies may be larger than the “true” spacing effect itself. Most studies of spacing use word lists and ask people to study those words for a later memory test. Not only can rehearsal interact in unexpected ways with list order to produce larger or smaller spacing effects (Delaney & Verkoeijen, 2009), but people often change their study strategies as they encounter several lists (Delaney & Knowles, 2005). These encoding strategy differences can often obscure whether spacing effects “should” be present or not. For example, there is no reason why spacing effects should be absent on pure lists when people use rote rehearsal, but they generally are. The reason is that one of the “impostor” phenomena actually works against the spacing effect on such lists, and cancels it out. Hence, we think that controlling encoding strategies is going to be increasingly important if we want to make theoretical progress on spacing and testing. The knowledge that one or more “impostor” spacing phenomena are present in a study casts doubt on the validity of the theoretical conclusions drawn from that study.
Our list of impostors can be understood as a potential set of critiques that reviewers can raise when judging whether a new paper should be used for making theoretical arguments, or whether it needs to be repeated with a cleaner design before we can trust the results. We understand that this exercise is fundamentally destructive in that it casts doubt on a large number of well-known empirical results. However, in
order to build a theory that can explain the spacing effect, we first need rules by which studies can be considered trustworthy or suspicious. Next, we reviewed some existing major theoretical perspectives and tried to evaluate whether they could explain the “true” spacing effect. Various theories have been largely discarded because they fail to capture important aspects of the data (see Table 3 for some of these aspects). Our conclusion was that contextual variability and study-phase retrieval could probably be combined into a model that provides a successful account of most of the important spacing phenomena (cf. Raaijmakers, 2005). However, such a model requires quantitative tests before we can be confident that it works. Nonetheless, the verbal theory on which that model would be built makes a number of correct predictions that seemed to us to be counterintuitive, but that survived empirical tests. For example, it predicts dissociations with delay in free recall and recognition, for which we have some evidence. It also predicts that directed forgetting will have a larger impact on spaced than on massed items, which was demonstrated by Sahakyan et al. (2008). In our view, a good theory ought to go beyond explaining (some of the) existing data, and make good predictions about future experiments. By that standard, the expanded version of Raaijmakers’ SAM/REM account seems to be quite successful. We next reviewed theoretical accounts of the testing effect in a fashion similar to our review of the spacing effect (see Table 4 for empirical phenomena). The transfer-appropriate processing notion is useful at a practical level, in that when intervening tests are used as an educational tool, it is helpful if characteristics of the intervening test mimic those of the final test.
However, as a theoretical explanation of the testing effect, the transfer-appropriate processing account falls short because it has difficulty accommodating important findings from the testing-effect literature. That leaves us with the desirable-difficulties framework and the associated concept of effortful retrieval. The idea that retrieval effort plays a pivotal role in the emergence of the testing effect is consistent with important findings from the testing-effect literature. However, why retrieval effort produces the testing effect is not yet clear. On the one hand, there are reasons to suspect that retrieval effort creates a memory advantage over a restudy episode because it adds extra information elements to a memory trace (i.e., retrieval effort leads to encoding variability). On the other hand, retrieval effort may bring about its beneficial effect because it activates an elaborative structure of related concepts. It remains to be seen whether these accounts can be distinguished with respect to their predictions about the testing effect.

While a hybrid encoding variability and retrieval account is appealing, especially since it postulates similar mechanisms for the spacing and testing effects, it is not entirely clear how such an account deals with the typical interaction between testing and the length of the retention interval. At first sight it seems to follow that testing should produce better final test
performance than restudying at any retention interval. However, the testing effect emerges only after long retention intervals; at short retention intervals restudying often leads to better final test performance than testing. In contrast, the spacing effect can emerge even on time scales of seconds. Results like these suggest that consolidation-like mechanisms might provide a better account of existing data.

Finally, previous reviews of the spacing and testing literature have emphasized the importance of these phenomena in education (e.g., Dempster, 1988). We agree that implementing spacing and testing in school settings is a promising endeavor both practically and empirically. Borrowing from Daniel and Poole’s (2009) educational philosophy, we advocate that spacing and testing research grow out of its current “memory first” approach and embrace a “pedagogical ecology” approach. A pedagogical ecology approach has an interdisciplinary focus and observes students in context with the goal of identifying interactions that lead to various outcomes (Daniel & Poole, 2009). The careful control of the laboratory environment is critical for making theoretical progress, but we must be wary of assuming that the results of our laboratory studies can be applied directly to educational settings. Learners and teachers both need to be aware of the benefits of spacing and testing, and to be guided to make choices that maximize learning in the long term instead of minimizing the pain of training. Furthermore, it is worth assessing the value and frequency of existing educational practices by determining whether they encourage spaced practice, are sustainable, and are desired by students.
REFERENCES

Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L., III, & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876. Allen, G. A., Mahler, W. A., & Estes, W. K. (1969). Effects of recall tests on long-term retention of paired associates. Journal of Verbal Learning and Verbal Behavior, 8, 463–470. Anderson, M. C., Bjork, R. A., & Bjork, E. L. (1994). Remembering can cause forgetting: Retrieval dynamics in long-term memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1063–1087. Anderson, J. R., & Bower, G. H. (1972). Configural properties in sentence memory. Journal of Verbal Learning and Verbal Behavior, 11, 594–605. Anderson, J. R., & Lebiere, C. (1998). Atomic components of thought. Hillsdale, NJ: Erlbaum. Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory, 2 (pp. 89–195). New York: Academic Press. Baddeley, A. (1990). Human memory: Theory and practice. Needham Heights, MA: Allyn & Bacon. Baddeley, A. D., & Longman, D. J. (1978). The influence of length and frequency of training session on the rate of learning to type. Ergonomics, 21(8), 627–635.
Bahrick, H. P., Bahrick, L. E., Bahrick, A. S., & Bahrick, P. E. (1993). Maintenance of foreign language vocabulary and the spacing effect. Psychological Science, 4, 316–321. Bahrick, H. P., & Hall, L. K. (2005). The importance of retrieval failures to long-term retention: A metacognitive explanation of the spacing effect. Journal of Memory and Language, 52, 566–577. Balota, D. A., Duchek, J. M., & Paullin, R. (1989). Age-related differences in the impact of spacing, lag, and retention interval. Psychology and Aging, 4, 3–9. Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128, 612–637. Bäuml, K.-H. (1997). The list-strength effect: Strength-dependent competition or suppression? Psychonomic Bulletin & Review, 4, 260–264. Beins, B. C. (1993). Writing assignments in statistics classes encourage students to learn interpretation. Teaching of Psychology, 20, 161–164. Benjamin, A. S., & Bird, R. D. (2006). Metacognitive control of the spacing of study repetitions. Journal of Memory and Language, 55, 126–137. Benjamin, A. S., & Craik, F. I. M. (2001). Parallel effects of aging and time pressure on memory for source: Evidence from the spacing effect. Memory & Cognition, 29, 691–697. Bentin, S., & Feldman, L. B. (1990). The contribution of morphological and semantic relatedness to repetition priming at short and long lags: Evidence from Hebrew. Quarterly Journal of Experimental Psychology, 42A, 693–711. Bjork, R. A. (1989). Retrieval inhibition as an adaptive mechanism in human memory. In H. L. Roediger & F. I. M. Craik (Eds.), Varieties of memory and consciousness: Essays in honour of Endel Tulving (pp. 309–330). Hillsdale, NJ: Erlbaum. Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). Cambridge, MA: MIT Press. Bjork, R. A., & Allen, T. W. (1970).
The spacing effect: Consolidation or differential encoding? Journal of Verbal Learning and Verbal Behavior, 9, 567–572. Bobrow, S. A. (1970). Memory for words in sentences. Journal of Verbal Learning and Verbal Behavior, 9, 363–372. Bower, G. H., & Clark, M. C. (1969). Narrative stories as mediators for serial learning. Psychonomic Science, 14, 181–182. Brainerd, C. J., & Reyna, V. F. (1990). Gist is the grist: Fuzzy-trace theory and the new intuitionism. Developmental Review, 10, 3–47. Braun, K., & Rubin, D. C. (1998). The spacing effect depends on an encoding deficit, retrieval, and time in working memory: Evidence from once-presented words. Memory, 6, 37–65. Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13, 273–281. Cahill, A., & Toppino, T. C. (1993). Young children’s recognition as a function of the spacing of repetitions and the type of study and test stimuli. Bulletin of the Psychonomic Society, 31, 481–484. Carew, T. J., Pinsker, H. M., & Kandel, E. R. (1972). Long-term habituation of a defensive withdrawal reflex in Aplysia. Science, 175, 451–454. Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, & Cognition, 35, 1563–1569. Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name-learning. Applied Cognitive Psychology, 19, 619–636. Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268–276.
Carpenter, S. K., & Pashler, H. (2007). Testing beyond words: Using tests to enhance visuospatial map learning. Psychonomic Bulletin & Review, 14, 474–478. Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance 8th grade students’ retention of U.S. history facts. Applied Cognitive Psychology, 23, 760–771. Carpenter, S. K., Pashler, H., & Jones, J. (2008). The effect of retrieval practice on associative recall of word pairs. In: Poster presented at the 49th Annual Meeting of the Psychonomic Society. Chicago, IL. Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review, 13, 826–830. Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36, 438–448. Carrier, M. L., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 633–642. Cepeda, N. J., Coburn, N., Rohrer, D., Wixted, J. T., Mozer, M. C., & Pashler, H. (2009). Optimizing distributed practice: Theoretical analysis and practical implications. Experimental Psychology, 56, 236–246. Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380. Cepeda, N. J., Vul, E., Rohrer, D., Wixted, J. T., & Pashler, H. (2008). Spacing effects in learning: A temporal ridgeline of optimal retention. Psychological Science, 19, 1095–1102. Challis, B. H. (1993). Spacing effects on cued-memory tests depend on level of processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 389–396. Chan, J. C. K. (2009). When does retrieval induce forgetting and when does it induce facilitation? Implications for retrieval inhibition, testing effect, and text processing. Journal of Memory and Language, 61, 153–170. Chan, J. C. K., McDermott, K. B., & Roediger, H. L., III. (2006). 
Retrieval-induced facilitation: Initially nontested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553–571. Chase, W. G., & Ericsson, K. A. (1981). Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 141–189). Hillsdale, NJ: Lawrence Erlbaum Associates. Ciccone, D. S., & Brelsford, J. W. (1976). Spacing repetitions in paired-associated learning: Experimenter versus subject control. Journal of Experimental Psychology: Human Learning and Memory, 2, 446–455. Cornell, E. H. (1980). Distributed study facilitates infants’ delayed recognition memory. Memory & Cognition, 8, 539–542. Cornoldi, C., & Longoni, A. (1977). The MP-DP effect and the influence of distinct repetitions on recognition of random shapes. Italian Journal of Psychology, 4, 65–76. Crowder, R. G. (1976). Principles of learning and memory. Hillsdale, NJ: Erlbaum. Cuddy, L. J., & Jacoby, L. L. (1982). When forgetting helps memory: An analysis of repetition effects. Journal of Verbal Learning and Verbal Behavior, 21, 451–467. Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology, 14, 215–235. D’Agostino, P. R., & DeRemer, P. (1973). Item repetition in free and cued recall. Journal of Verbal Learning and Verbal Behavior, 11, 54–58. Daniel, D. B., & Poole, D. A. (2009). Learning for life: An ecological approach to pedagogical research. Perspectives on Psychological Science, 4, 91–96. Dannenbring, G. L., & Briand, K. (1982). Semantic priming and the word repetition effect in a lexical decision task. Canadian Journal of Psychology, 36, 435–444. Delaney, P. F., & Knowles, M. E. (2005). Encoding strategy changes and spacing effects in the free recall of unmixed lists. Journal of Memory and Language, 52, 120–130.
Delaney, P. F., & Verkoeijen, P. P. J. L. (2009). Rehearsal strategies can enlarge or diminish the spacing effect: Pure versus mixed lists and encoding strategy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1148–1161. Dellarosa, D., & Bourne, L. E. (1985). Surface form and the spacing effect. Memory & Cognition, 13, 529–537. Dempster, F. N. (1988). Informing classroom practice: What we know about several task characteristics and their effects on learning. Contemporary Educational Psychology, 13, 254–264. Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In E. L. Bjork & R. A. Bjork (Eds.), Memory (pp. 317–344). San Diego, CA: Academic Press. DeZazzo, J., & Tully, T. (1995). Dissection of memory formation: From behavioral pharmacology to molecular genetics. Trends in Neurosciences, 18, 212–218. Diana, R. A., & Reder, L. M. (2005). The list strength effect: A contextual competition account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 1289–1302. Donovan, J. J., & Radosevich, D. J. (1999). A meta-analytic review of the distribution of practice effect: Now you see it, now you don’t. Journal of Applied Psychology, 84, 795–805. Drevenstedt, J., & Bellezza, F. S. (1993). Memory for self-generated narration in the elderly. Psychology and Aging, 8, 187–192. Durgunoğlu, A. Y., & Roediger, H. L. (1987). Test differences in accessing bilingual memory. Journal of Memory and Language, 26, 377–391. Eich, J. E. (1980). The cue-dependent nature of state-dependent retrieval. Memory & Cognition, 8, 157–173. Ericsson, K. A., Delaney, P. F., Weaver, G. A., & Mahadevan, S. (2004). Uncovering the structure of a memorist’s superior “basic” memory capacity. Cognitive Psychology, 49, 191–237. Gartman, L. M., & Johnson, N. F. (1972). Massed versus distributed repetition of homographs: A test of the differential-encoding hypothesis. Journal of Verbal Learning and Verbal Behavior, 11, 801–808.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 40, 104. Glanzer, M., & Duarte, A. (1971). Repetition between and within languages in free recall. Journal of Verbal Learning and Verbal Behavior, 10, 625–630. Glenberg, A. M. (1976). Monotonic and nonmonotonic lag effects in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal Behavior, 15, 1–16. Glenberg, A. M. (1977). Influences of retrieval processes on the spacing effect in free recall. Journal of Experimental Psychology: Human Learning and Memory, 3, 282–294. Glenberg, A. M. (1979). Component-levels theory of the effects of spacing of repetitions on recall and recognition. Memory & Cognition, 7, 95–112. Glenberg, A. M., & Smith, S. M. (1981). Spacing repetitions and solving problems are not the same. Journal of Verbal Learning and Verbal Behavior, 20, 110–119. Glover, J. A. (1989). The “testing” phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399. Glover, J. A., & Corkill, A. J. (1987). Influence of paraphrased repetitions on the spacing effect. Journal of Educational Psychology, 79, 198–199. Godden, D. R., & Baddeley, A. D. (1975). Context-dependent memory in two natural environments: On land and underwater. British Journal of Psychology, 66, 325–331. Greene, R. L. (1989). Spacing effects in memory: Evidence for a two-process account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 371–377. Greene, R. L. (1990). Spacing effects on implicit memory tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 1004–1011.
Greene, R. L., & Stillwell, A. M. (1995). Effects of encoding variability and spacing on frequency discrimination. Journal of Memory and Language, 34, 468–478. Hall, J. W. (1992a). Unmixing effects of spacing on free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 608–614. Hall, J. W. (1992b). Recall of lists of prolonged and repeated (spaced) words. Bulletin of the Psychonomic Society, 18, 183–186. Hanawalt, N. G., & Tarr, A. G. (1961). The effect of recall on recognition. Journal of Educational Psychology, 62, 361–367. Hintzman, D. L. (1974). Theoretical implications of the spacing effect. In R. L. Solso (Ed.), Theories in cognitive psychology: The Loyola symposium (pp. 77–99). Hillsdale, NJ: Erlbaum. Hintzman, D. L., & Block, R. A. (1973). Memory for the spacing of repetitions. Journal of Experimental Psychology, 99, 70–74. Hintzman, D. L., Summers, J. J., & Block, R. A. (1975). Spacing judgments as an index of study-phase retrieval. Journal of Experimental Psychology: Human Learning and Memory, 104, 31–40. Hirshman, E. (1995). Decision processes in recognition memory: Criterion shifts and the list-strength paradigm. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 302–313. Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning and Verbal Behavior, 10, 562–567. Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667. Jacoby, L. L. (1983). Perceptual enhancement: Persistent effects of an experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 21–38. Janiszewski, C., Noel, H., & Sawyer, A. G. (2003). A meta-analysis of the spacing effect in verbal learning: Implications for research on advertising repetition and consumer memory. Journal of Consumer Research, 30, 138–149. 
Jensen, T. D., & Freund, J. S. (1981). Persistence of the spacing effect in incidental free recall: The effect of external list comparisons and intertask correlations. Bulletin of the Psychonomic Society, 18, 183–186. Johnson, C. I., & Mayer, R. E. (2009). A testing effect with multimedia learning. Journal of Educational Psychology, 101, 621–629. Johnston, W. A., Coots, J. H., & Flickinger, R. G. (1972). Controlled semantic encoding and the effect of repetition lag on free recall. Journal of Verbal Learning and Verbal Behavior, 11, 784–788. Johnston, W. A., & Uhl, C. N. (1976). The contributions of encoding effort and variability to the spacing effect on free recall. Journal of Experimental Psychology: Human Learning and Memory, 2, 153–160. Kahana, M. J., & Howard, M. W. (2005). Spacing and lag effects in free recall of pure lists. Psychonomic Bulletin & Review, 12, 159–164. Kang, S. H. K., McDermott, K. B., & Roediger, H. L., III. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528–558. Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in student learning: Do students practice retrieval when they study on their own? Memory, 17, 471–479. Karpicke, J. D., & Roediger, H. L., III. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719. Kausler, D. H., Wiley, J. G., & Phillips, P. L. (1990). Adult age differences in memory for massed and distributed repeated actions. Psychology and Aging, 5, 530–534.
Kintsch, W. (1994). Text comprehension, memory, and learning. American Psychologist, 49, 294–304. Kirsner, K., Smith, M. C., Lockhart, R. S., King, M. L., & Jain, M. (1984). The bilingual lexicon: Language-specific units in an integrated network. Journal of Verbal Learning and Verbal Behavior, 23, 519–539. Kolers, P. A. (1966). Interlingual facilitation of short-term memory. Journal of Verbal Learning and Verbal Behavior, 5, 314–319. Konopak, B. C., Martin, S. H., & Martin, M. A. (1990). Using a writing strategy to enhance sixth-grade students’ comprehension of content material. Journal of Reading Behavior, 22, 19–37. Kornell, N., & Bjork, R. A. (2008). Learning concepts and categories: Is spacing the “enemy of induction”? Psychological Science, 19, 585–592. Kuo, T., & Hirshman, E. (1996). Investigations of the testing effect. American Journal of Psychology, 109, 451–464. Landauer, T. K. (1969). Reinforcement as consolidation. Psychological Review, 76, 82–96. Madigan, S. A. (1969). Intraserial repetition and coding processes in free recall. Journal of Verbal Learning and Verbal Behavior, 8, 828–835. Malmberg, K. J., & Shiffrin, R. M. (2005). The “one-shot” hypothesis for context storage. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 322–336. Mammarella, N., Avons, S. E., & Russo, R. (2004). A short-term perceptual priming account of spacing effects in explicit cued-memory tasks for unfamiliar stimuli. European Journal of Cognitive Psychology, 16, 387–402. Mammarella, N., Russo, R., & Avons, S. E. (2002). Spacing effects in cued-memory tasks for unfamiliar faces and nonwords. Memory & Cognition, 30, 1238–1251. Mannes, S. M., & Kintsch, W. (1987). Knowledge organization and text organization. Cognition and Instruction, 4, 91–115. Maskarinec, A. S., & Thompson, C. P. (1976). The within-list distributed practice effect: Tests of the varied context and varied encoding hypothesis. Memory & Cognition, 4, 741–746. McDaniel, M.
A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513. McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371–385. McNamara, T. P. (1992). Theories of priming: I. Associative distance and lag. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1173–1190. Melton, A. W. (1967). Repetition and retrieval from memory. Science, 158(3800), 532. Menzel, R., Manz, G., Menzel, R., & Greggers, U. (2001). Massed and spaced learning in honeybees: The role of CS, US, the intertrial interval, and the test interval. Learning & Memory, 8, 198–208. Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530–542. Metcalfe, J., Kornell, N., & Son, L. K. (2007). A cognitive-science based program to enhance study efficacy in a high and low-risk setting. European Journal of Cognitive Psychology, 19, 743–768. Murdock, B. (2003). The mirror effect and the spacing effect. Psychonomic Bulletin & Review, 10, 570–588. Murnane, K., & Shiffrin, R. M. (1991). Word repetitions in sentence recognition. Memory & Cognition, 19, 119–130. Neely, J. H. (1977). Semantic priming and retrieval from lexical memory: Roles of inhibitionless spreading activation and limited-capacity attention. Journal of Experimental Psychology: General, 106, 226–254.
Neely, J. H. (1991). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In D. Besner & G. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 264–336). Hillsdale, NJ: Erlbaum. Nelson, D. L., McKinney, V. M., Gee, N. R., & Janczura, G. A. (1998). Interpreting the influence of implicitly activated memories on recall and recognition. Psychological Review, 105, 299–324. Norman, K. A. (2002). Differential effects of list strength on recollection and familiarity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 1083–1094. Nungester, R. J., & Duchastel, P. C. (1982). Testing versus review: Effects on retention. Journal of Educational Psychology, 74, 18–22. Offner, M. (1911). Das Gedächtnis: Die Ergebnisse der experimentellen Psychologie und ihre Anwendung in Unterricht und Erziehung. Berlin, Germany: Reuther & Reichard. Paivio, A. (1974). Spacing of repetitions in the incidental and intentional free recall of pictures and words. Journal of Verbal Learning and Verbal Behavior, 13, 497–511. Paivio, A., & Yuille, J. C. (1969). Changes in associative strategies and paired-associate learning over trials as a function of word imagery and type of learning set. Journal of Experimental Psychology, 79, 458–463. Parkin, A. J., Gardiner, J. M., & Rosser, R. (1995). Functional aspects of recollective experience in face recognition. Consciousness & Cognition, 4, 387–398. Pashler, H., Rohrer, D., Cepeda, N. J., & Carpenter, S. K. (2007). Enhancing learning and retarding forgetting: Choices and consequences. Psychonomic Bulletin & Review, 14, 187–193. Pastötter, B., & Bäuml, K.-H. (2007). The crucial role of postcue encoding in directed forgetting and context-dependent forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 977–982. Pastötter, B., & Bäuml, K.-H. (2010). Amount of postcue encoding predicts amount of directed forgetting.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 54–65. Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cognitive Science, 29, 559–586. Peterson, L. R., Hillner, K., & Saltzman, D. (1962). Time between pairings and short-term retention. Journal of Experimental Psychology, 64, 550–551. Peterson, L. R., Wampler, R., Kirkpatrick, M., & Saltzman, D. (1963). Effect of spacing presentation on retention of a paired associate over short intervals. Journal of Experimental Psychology, 66, 206–209. Postman, L., & Knecht, K. (1983). Encoding variability and retention. Journal of Verbal Learning and Verbal Behavior, 22, 133–152. Price, H. L., Connolly, D. A., & Gordon, H. M. (2006). Children’s memory for complex autobiographical events: Does spacing of repeated instances matter? Memory, 14, 977–989. Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60, 437–447. Quitadamo, I. J., & Kurtz, M. J. (2007). Learning to improve: Using writing to increase critical thinking performance in general education biology. Life Sciences Education, 6, 140–154. Raaijmakers, J. G. W. (2005). Modeling implicit and explicit memory. In C. Izawa & N. Ohta (Eds.), Human Learning and Memory: Advances in Theory and Application (pp. 85–105). Mahwah, NJ: Lawrence Erlbaum Associates. Raaijmakers, J. G. W., & Shiffrin, R. M. (1980). SAM: A theory of probabilistic search in associative memory. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, 14 (pp. 207–262). New York: Academic Press.
Raaijmakers, J. G. W., & Shiffrin, R. M. (1981). Search of associative memory. Psychological Review, 88, 93–134. Ratcliff, R., Clark, S., & Shiffrin, R. M. (1990). The list-strength effect: I. Data and discussion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 163–178. Rea, C. P., & Modigliani, V. (1987). The spacing effect in 4- to 9-year-old children. Memory & Cognition, 15, 436–443. Reddy, B. G., & Bellezza, F. S. (1983). Encoding specificity in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 167–174. Reed, A. V. (1977). Quantitative prediction of spacing effects in learning. Journal of Verbal Learning and Verbal Behavior, 16, 693–698. Rickard, T. C., Lau, J. S., & Pashler, H. (2008). Spacing and the transition from calculation to retrieval. Psychonomic Bulletin & Review, 15, 656–661. Roediger, H. L., Agarwal, P. K., Kang, S. H. K., & Marsh, E. J. (2010). Benefits of testing memory: Best practices and boundary conditions. In G. M. Davies & D. B. Wright (Eds.), New frontiers in applied memory. (pp. 13–49). Brighton UK: Psychology Press. Roediger, H. L., III, & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210. Roediger, H. L., III, & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255. Rohrer, D. (2009). The effects of spacing and mixing practice problems. Journal for Research in Mathematics Education, 40, 4–17. Rohrer, D., & Taylor, K. (2006). The effects of overlearning and distributed practice on the retention of mathematics knowledge. Applied Cognitive Psychology, 20, 1209–1224. Rose, R. J. (1984). Processing time for repetitions and the spacing effect. Canadian Journal of Experimental Psychology, 83, 537–550. Rose, R. J., & Rowe, E. J. (1976). 
Effects of orienting task and spacing of repetitions on frequency judgments. Journal of Experimental Psychology: Human Learning and Memory, 2, 142–152. Ross, B. H., & Landauer, T. K. (1978). Memory for at least one of two items: Test and failure of several theories of spacing effects. Journal of Verbal Learning and Verbal Behavior, 17, 669–680. Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental Psychology, 89, 63–77. Russo, R., & Mammarella, N. (2002). Spacing effects in recognition memory: When meaning matters. European Journal of Cognitive Psychology, 14, 49–59. Russo, R., Mammarella, N., & Avons, S. E. (2002). Spacing effects in cued memory tasks for unfamiliar faces and nonwords. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 819–829. Russo, R., Parkin, A. J., Taylor, S. R., & Wilks, J. (1998). Revising current two-process accounts of spacing effects in memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 161–172. Sahakyan, L., Waldum, E. R., Benjamin, A. S., & Bickett, S. P. (2009). Where is the forgetting with list-method directed forgetting in recognition? Memory & Cognition, 37, 464–476. Sahakyan, L., & Delaney, P. F. (2003). Can encoding differences explain the benefits of directed forgetting in the list-method paradigm? Journal of Memory and Language, 48, 195–201. Sahakyan, L., & Delaney, P. F. (2005). Directed forgetting in incidental learning and recognition testing: Support for a two-factor account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 789–801.
Sahakyan, L., Delaney, P. F., & Kelley, C. M. (2004). Self-evaluation as a moderating factor in strategy change in directed forgetting benefits. Psychonomic Bulletin & Review, 11, 131–136. Sahakyan, L., Delaney, P. F., & Waldum, E. R. (2008). Intentional forgetting is easier after two “shots” than one. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 408–414. Sahakyan, L., & Goodmon, L. B. (2007). The influence of directional associations on directed forgetting and interference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 1035–1049. Sahakyan, L., & Kelley, C. M. (2002). A contextual change account of the directed forgetting effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 1064–1072. Scharf, M. T., Woo, N. H., Lattal, K. M., Young, J. Z., Nguyen, P. V., & Abel, T. (2002). Protein synthesis is required for the enhancement of long-term potentiation and long-term memory by spaced training. Journal of Neurophysiology, 87, 2770–2777. Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms suggest new concepts for training. Psychological Science, 3, 207–217. Seabrook, R., Brown, G. D. A., & Solity, J. E. (2005). Distributed and massed practice: From laboratory to classroom. Applied Cognitive Psychology, 19, 107–122. Shaughnessy, J. J. (1976). Persistence of the spacing effect in free recall under varying incidental learning conditions. Memory & Cognition, 4, 369–377. Shaughnessy, J. J., Zimmerman, J., & Underwood, B. J. (1972). Further evidence on the MP-DP effect in free-recall learning. Journal of Verbal Learning and Verbal Behavior, 11, 1–12. Shiffrin, R. M. (1970). Forgetting, trace erosion or retrieval failure? Science, 168, 1601–1603. Simon, D. A., & Bjork, R. A. (2001). Metacognition in motor learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 907–912. Slamecka, N. J., & Katsaiti, L. T. (1988).
Normal forgetting of verbal lists as a function of prior testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 716–727. Smith, S. M. (1979). Remembering in and out of context. Journal of Experimental Psychology: Human Learning and Memory, 5, 460–471. Smith, S. M. (1984). A comparison of two techniques for reducing context-dependent forgetting. Memory & Cognition, 12, 477–482. Smith, S. M., Glenberg, A. M., & Bjork, R. A. (1978). Environmental context and human memory. Memory & Cognition, 6, 342–353. Smith, M. C., Theodor, L., & Franklin, P. E. (1983). The relationship between contextual facilitation and depth of processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 697–712. Son, L. K. (2004). Spacing one’s study: Evidence for a metacognitive control strategy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 601–604. Sperber, R. D. (1974). Developmental changes in effects of spacing of trials in retardate discrimination learning and memory. Journal of Experimental Psychology, 103, 204–210. Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30, 641–656. Stern, L. D., & Hintzman, D. L. (1979). Spacing and retention of synonyms. Bulletin of the Psychonomic Society, 13, 363–366. Stigler, J. W., Fuson, K. C., Ham, M., & Kim, M. S. (1986). An analysis of addition and subtraction word problems in American and Soviet elementary mathematics textbooks. Cognition and Instruction, 3, 153–171.
146
Peter F. Delaney et al.
Storm, B. C., Bjork, E. L., & Bjork, R. A. (2008). Accelerated relearning after retrievalinduced forgetting: The benefit of being forgotten. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 230–236. Szpunar, K. K., McDermott, K. B., & Roediger, H. L., III. (2007). Expectation of a final cumulative test enhances long-term retention. Memory & Cognition, 35, 1007–1013. Szpunar, K. K., McDermott, K. B., & Roediger, H. L., III. (2008). Testing during study insulates against the build-up of proactive interference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1392–1399. Tan, L., & Ward, G. (2000). A recency-based account of the primacy effect in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1589–1625. Thios, S. J. (1972). Memory for words in repeated sentences. Journal of Verbal Learning and Verbal Behavior, 11, 789–793. Thompson, C. P., Wenger, S. K., & Bartling, C. A. (1978). How recall facilitates subsequent recall: A reappraisal. Journal of Experimental Psychology: Human Learning and Memory, 4, 210–221. Toppino, T. C. (1991). The spacing effect in young children’s free recall: Support for automatic-process explanations. Memory & Cognition, 19, 159–167. Toppino, T. C. (1993). The spacing effect in preschool children’s free recall of pictures and words. Bulletin of the Psychonomic Society, 31, 27–30. Toppino, T. C., & Bloom, L. C. (2002). The spacing effect, free recall, and two-process theory: A closer look. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 437–444. Toppino, T. C., & Cohen, M. S. (2009). The testing effect and the retention interval. Experimental Psychology, 56, 252–257. Toppino, T. C., Cohen, M. S., Davis, M., & Moors, A. (2009). Metacognitive control over the spacing of practice: When is spacing preferred? Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1352–1358. Toppino, T. C., & DeMesquita, M. (1984). 
Effects of spacing repetitions on children’s memory. Journal of Experimental Child Psychology, 37, 637–648. Toppino, T. C., & DiGeorge, W. (1984). The spacing effect in free recall emerges with development. Memory & Cognition, 12, 118–122. Toppino, T. C., Kasserman, J. E., & Mracek, W. A. (1991). The effect of spacing repetitions on the recognition memory of young children and adults. Journal of Experimental Child Psychology, 51, 123–138. Toppino, T. C., & Schneider, M. A. (1999). The mix-up regarding mixed and unmixed lists in spacing-effect research. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1071–1076. Tulving, E. (1967). The effects of presentation and recall of material in free-recall learning. Journal of Verbal Learning and Verbal Behavior, 6, 175–184. Tulving, E., & Hastie, R. (1972). Inhibition effects of intralist repetition in free recall. Journal of Experimental Psychology, 92, 297–304. Underwood, B. J. (1969). Some correlates of item repetition in free recall learning. Journal of Verbal Learning and Verbal Behavior, 9, 573–580. Underwood, B. J. (1970). The spacing effect: Additions to the theoretical and empirical puzzles. Memory & Cognition, 4, 391–400. Vander Linde, E., Morrongiello, B. A., & Rovee-Collier, C. (1985). Determinants of retention in 8-week-old infants. Developmental Psychology, 21, 601–613. Verkoeijen, P. P. J. L., & Bouwmeester, S. (2008). Modeling bimodality in spacing effect data. Journal of Memory and Language, 59, 545–555. Verkoeijen, P. P. J. L., & Delaney, P. F. (2008). Rote rehearsal and spacing effects in the free recall of pure and mixed lists. Journal of Memory and Language, 58, 35–47.
Spacing and Testing Effects
147
Verkoeijen, P. P. J. L., & Delaney, P. F. (2010a). The testing effect depends on the type of encoding strategy. Manuscript in preparation. Verkoeijen, P. P. J. L., & Delaney, P. F. (2010b). The effect of testing on memory for category lists. Manuscript in preparation. Verkoeijen, P. P. J. L., Rikers, R. M. J. P., Pecher, D., Zeelenberg, R., & Schmidt, H. G. (2010). Evidence against the semantic priming account of spacing effects in recognition memory. Unpublished manuscript. Verkoeijen, P. P. J. L., Rikers, R. M. J. P., & Schmidt, H. G. (2004). Detrimental influence of contextual change on spacing effects in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 796–800. Verkoeijen, P. P. J. L., Rikers, R. M. J. P., & Schmidt, H. G. (2005). Limitations to the spacing effect: Demonstration of an inverted U-shaped relationship between interrepetition spacing and free recall. Experimental Psychology, 52, 257–263. Vlach, H. A., Sandhofer, C. M., & Kornell, N. (2008). The spacing effect in children’s memory and category induction. Cognition, 109, 163–167. Wagner, A. D., Maril, A., & Schacter, D. L. (2000). Interactions between forms of memory: When priming hinders new episodic learning. Journal of Cognitive Neuroscience, 12, 52–60. Waugh, N. C. (1962). The effect of intralist repetition on free recall. Journal of Verbal Learning and Verbal Behavior, 1, 95–99. Waugh, N. C. (1963). Immediate memory as a function of repetition. Journal of Verbal Learning and Verbal Behavior, 2, 107–112. Waugh, N. C. (1967). Presentation time and free recall. Journal of Experimental Psychology, 73, 39–44. Waugh, N. C. (1970). On the effective duration of a repeated word. Journal of Verbal Learning and Verbal Behavior, 5, 587–595. Wheeler, M. A., Ewers, M., & Buonanno, J. F. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571–580. Wiley, J., & Voss, J. F. (1996). The effects of ‘‘playing’’ historian on learning in history. 
Applied Cognitive Psychology, 10, 63–72. Wilson, W. P. (1976). Developmental changes in the lag effect: An encoding hypothesis for repeated word recall. Journal of Experimental Child Psychology, 22, 113–122. Wolfe, C. R., Reyna, V. F., & Brainerd, C. J. (2005). Fuzzy-trace theory: Implications for transfer in teaching and learning. In J. P. Mestre (Ed.), Transfer of learning from a modern multidisciplinary perspective (pp. 53–88). Greenwich, CT: Information Age Publishing. Wright, J., & Brelsford, J. (1978). Changed in the spacing effect with instructional variables in free recall. American Journal of Psychology, 91, 631–643. Yonelinas, A. P., Hockley, W. E., & Murdock, B. B. (1992). Tests of the list-strength effect in recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 345–355. Young, J. L. (1971). Reinforcement-test intervals in paired-associate learning. Journal of Mathematical Psychology, 8, 58–81. Zeelenberg, R., & Pecher, D. (2002). False memories and lexical decision: Even twelve primes do not cause long-term semantic priming. Acta Psychologica, 109, 269–284. Zeelenberg, R., Pecher, D., & Tabbers, H. K. (2008). The effect of testing on memory: Does enhanced retention transfer to new test situations? Poster presented at the 49th Annual Meeting of the Psychonomic Society, Chicago, IL. Zimmerman, J. (1975). Free recall after self-paced study: A test of the attention explanation of the spacing effect. American Journal of Psychology, 88, 277–291.
CHAPTER FOUR

How One’s Hook Is Baited Matters for Catching an Analogy

Jeffrey Loewenstein

Contents
1. Introduction
2. Key Roles for Retrieving Analogies
2.1. Problem Solving and Retrieving Analogies
2.2. Creativity and Retrieving Analogies
2.3. Acquisition of Domain Knowledge and Retrieving Analogies
3. Underlying Structure and Retrieving Analogies
3.1. Encoding the Underlying Structure in Examples
3.2. Using Underlying Structure in Retrieval
4. Facilitating the Retrieval of Analogies at Retrieval Time
4.1. The “Own Memory” Studies: Retrieving Analogies from Autobiographical Memory
4.2. The Controlled Memory Set Studies
4.3. MAC/FAC Simulation Modeling
5. Implications
5.1. Implications for Problem Solving and Creativity
5.2. Implications for the Acquisition of Domain Knowledge
6. Conclusion
References
Abstract

Memory provides an ocean of possibilities, so it is necessary to find good ways to bait one’s hook to ensure catching something worthwhile. Intelligent action requires linking useful, previously learned examples with current problems, which very often means intelligent action requires retrieving analogies. We already know that effectively encoding the underlying structure in examples during initial learning facilitates later retrieving them as analogies. My colleagues and I have recently found that it is also possible to facilitate retrieving analogies by effectively encoding the example serving as a probe to memory, without relying on any special encoding of the stored examples. Thus, people do not have to learn examples well initially to still make good use of them, a finding with useful implications for problem solving, creativity, and acquiring domain knowledge.

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53004-4. © 2010 Elsevier Inc. All rights reserved.
1. Introduction

Acting intelligently requires information, leading to the fundamental problem of having the right information at the right time. Perhaps the key challenge making this problem difficult is readily retrieving a small bit of relevant information from a vast amount of available information. As an example of the scale and importance of the retrieval challenge, in a mundane search of the Web, people usually look at no more than a handful of the search results retrieved; those results are drawn from a pool of over a trillion pages on the Web (as of July 2008); and the market value of the Web search industry is on the scale of hundreds of billions of dollars. Compared to retrieving Web pages, retrieving information from one’s own memory is a more challenging and more critical problem. So, it is important to study the challenges people have in retrieving information from memory (Gentner, Rattermann, & Forbus, 1993; Ross, 1984). It is also why my colleagues and I (Gentner, Loewenstein, Thompson, & Forbus, 2009; Kurtz & Loewenstein, 2007) have been examining whether there is a new avenue for retrieving information effectively.

Retrieving the right information at the right time is only sometimes challenging for people. Often, people do so effortlessly, so we consider the retrievals mundane: remembering what a dog is or how to get dressed is not typically recognized as an achievement. More generally, the world is often kind (Gentner, 1989; Gentner & Medina, 1998; Medin & Ross, 1989), in the sense that surface appearances are often correlated with the underlying structures, principles, or essences required to act intelligently. This allows people a shortcut. They can encode surface properties and retrieve information from memory based on surface properties. Of course, one cannot always judge a book by its cover. It is in these sorts of situations, when relevant information comes from a source without surface properties in common, that people find it challenging to retrieve information and as a result fail to act intelligently. These are situations in which people need to retrieve examples that have different surface properties but the same underlying structures. People need to retrieve analogies (Gentner, 1983). Thus, the biggest challenge people face in retrieving the right information at the right time from their own memories is how to retrieve analogies.

The dominant trend in research on retrieving analogies has been to emphasize the importance of how information is initially indexed or encoded (e.g., Gick & Holyoak, 1983; Schank, 1982). It is reasonable that for information to be later retrieved at the right times, people need to encode it well from the start. It also makes sense in an educational context to consider how people can learn information well initially such that they are likely to be able to use it later when it is relevant. A less obvious reason to focus on initial encoding is that in addition to learning new examples, initial
encoding is the point at which people can also be learning new indexing terms (or more broadly, learning new encoding vocabularies; Forbus, Gentner, & Law, 1995; Kolodner, 1993). If people learn new indexing terms and then use those terms to encode future examples, this should facilitate retrieving the initial examples encoded with the same indexing terms. This ‘‘learning to encode’’ account (Gentner et al., 2009), together with the educational focus and the interest in people encoding information well from the start, provides reasons for focusing on how information is initially indexed when considering the retrieval of analogies. The largely unasked question has been whether improving people’s initial encoding of examples is the only influence on retrieving analogies. This is important, because if changing initial encoding is the only means for influencing the retrieval of analogies, then we have little to say to someone currently confronted by a challenge in retrieving an analogy. Their prior knowledge is either already well encoded (and hence retrieving analogies should not be so challenging) or it is effectively useless because it cannot be retrieved. But it appears that there is another means for facilitating the retrieval of analogies. My colleagues and I (Gentner et al., 2009; Kurtz & Loewenstein, 2007) have recently documented that people can change their ability to retrieve analogies at the time of retrieval, without changing initial learning.1
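The distinction drawn in this section, between retrieving by surface properties and retrieving by underlying structure, can be sketched in a few lines of Python. Everything in the sketch is an illustrative assumption rather than a model from this chapter: the stored examples, their feature sets, the `Example` class, and the use of Jaccard overlap as a similarity score are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """A hypothetical memory item with separate surface and structural encodings."""
    name: str
    surface: frozenset    # objects, settings, and other surface properties
    structure: frozenset  # relations that make up the underlying structure


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(probe, memory, key):
    """Return the stored example most similar to the probe on one encoding."""
    return max(memory, key=lambda ex: jaccard(getattr(probe, key), getattr(ex, key)))


# Hypothetical stored examples (assumed encodings, not data from any study).
memory = [
    Example("lake", frozenset({"water", "fish", "shore"}),
            frozenset({"contains(lake, fish)"})),
    Example("heat flow", frozenset({"coffee", "ice cube", "metal bar"}),
            frozenset({"greater(level_a, level_b)", "causes(difference, flow)"})),
]

# The probe shares a surface feature with "lake" but its structure with "heat flow".
probe = Example("water flow", frozenset({"water", "beaker", "pipe"}),
                frozenset({"greater(level_a, level_b)", "causes(difference, flow)"}))

surface_hit = best_match(probe, memory, "surface")
structure_hit = best_match(probe, memory, "structure")
print(surface_hit.name, "|", structure_hit.name)
```

In this toy memory the surface-based lookup returns the "lake" item while the structure-based lookup returns the "heat flow" item, which mirrors the case described above: the useful example shares underlying structure with the probe but not its surface properties.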
2. Key Roles for Retrieving Analogies

The difficulties posed by retrieving analogies appear as specific challenges in many areas, consistent with effective information retrieval being a fundamental problem. Retrieving analogies is the central challenge behind effective knowledge transfer during problem solving (Gick & Holyoak, 1980; Ross, 1989). Retrieving analogies is a central challenge for creativity, as people need to connect new kinds of relevant information to current situations to produce novel and useful outcomes (Markman, Wood, Linsey, Murphy, & Laux, 2009). Retrieving analogies is also a key challenge in the process of acquiring domain knowledge, both in the course of child development (Brown, 1989, 1990) and in the course of developing expertise (Chi & Ohlsson, 2005). This is because people need to learn, apply, and organize their knowledge using the underlying principles in the domain rather than surface properties. Consequently, before considering a new way in which people can retrieve analogies, I examine prior research on retrieving analogies in problem solving, creativity, and the acquisition of domain knowledge.

1 As will be apparent, this stream of research and my thinking on it have benefitted greatly from the influence of my mentor and collaborator, Dedre Gentner; the errors are mine alone.
2.1. Problem Solving and Retrieving Analogies

Research on problem solving has long been concerned with people’s ability (or lack of ability) to transfer knowledge from one problem or situation to help solve further problems (Reeves & Weisberg, 1994; Ross, 1984; Whitehead, 1929). For example, people negotiating the syndication rights to a television show might recall an earlier negotiation in which the cost of shipping was contingent on the timing of the goods’ arrival, and realize that they could make the payments for the television show contingent on the show’s popularity (Thompson, Gentner, & Loewenstein, 2000). Spontaneously retrieving a prior example because it has matching underlying structure (e.g., contingent payments) rather than matching surface features (e.g., negotiating over a television show) seems to be the main sticking point. The evidence is that people working on a problem who fail to retrieve a prior analogous example spontaneously can very often succeed at making use of the earlier example if given a hint to use it (Gick & Holyoak, 1980). Even without the potential distraction of working on a problem, people who read an example and are asked to recall an earlier example typically fail to retrieve an analogy, but can select the appropriate analogical match if given a forced choice (Gentner et al., 1993). Accordingly, how to apply a prior example to help solve a current problem is usually less of a challenge (but still a concern, e.g., Bassok & Holyoak, 1989; Blessing & Ross, 1996) than the challenge of retrieving the prior example.

Research provides varying degrees of support for factors that enable people to retrieve analogies during problem solving. There is widespread agreement that having drawn a comparison between initial examples facilitates retrieving them later when confronted by an analogous problem or situation. This effect holds when people compare initial examples as part of a learning phase (e.g., Catrambone & Holyoak, 1989; Gentner, Loewenstein, & Thompson, 2003; Gick & Holyoak, 1983) and when people compare initial examples in the course of problem solving (e.g., Dixon & Bangert, 2004; Ross & Kennedy, 1990; see also Gentner, Loewenstein, & Hung, 2007; Loewenstein & Gentner, 2001). There is also evidence from a variety of sources that linking an example not to a second example but to an abstract principle (a description of the underlying structure in an example) can promote later analogical transfer (Goldstone & Wilensky, 2008). This effect has been found, for example, for people who read a principle embedded within an example (Ross & Kilbane, 1997), people who have to select which of a small number of principles applies to an example (Seifert, McKoon, Abelson, & Ratcliff, 1986), and people who compare a principle and an example (Gentner, Loewenstein, & Thompson, 2004). Related work, although not showing effects on problem solving, does show that people with sufficient relevant background knowledge and encouragement to study single examples can derive the underlying structure
in those examples through self-explanation (Ahn, Brewer, & Mooney, 1992) or discussion (Schwartz, 1995). It is plausible that self-explanation or joint discussion could therefore facilitate later analogical retrieval of the studied example and promote problem solving. There is a small amount of evidence that even without an explanation, just using a few words to emphasize the underlying structure in the initial example facilitates using that example later: Loewenstein and Gentner (2005, Experiment 5) found transfer effects between mapping tasks, and Clement, Mawby, and Giles (1994) found effects on retrieving analogies. An alternative approach to emphasizing underlying structure is to remove potential distractions from examples. There is a body of literature supporting the idea that both using sparse examples with few surface features (DeLoache, 1995; Gentner & Rattermann, 1991) and progressively removing concrete details to make an example more schematic (Goldstone & Son, 2005) promote later transfer to solving an analogous problem. Finally, a new finding is that presenting written examples, as is commonly done in studies of analogical retrieval and problem solving, engenders lower rates of analogical retrieval than presenting those examples in spoken form (Markman, Taylor, & Gentner, 2007). It is possible that the advantage of hearing rather than reading examples is that listening encourages people to emphasize the gist of the examples. A potential generalization across all these findings is that people need encouragement to generate encodings of examples that (1) articulate the underlying structure in examples, (2) articulate that underlying structure in a generic fashion, apart from contextual details, and (3) emphasize that underlying structure. It cannot simply be that people fail to encode underlying structure and these interventions help them to do so—the primary stumbling block is retrieving the analogy, not using the analogy once it is retrieved. 
If people did not encode the information, it would not be available for application. Thus the effects of comparing examples, linking examples to abstract principles, self-explanations of the underlying structure in examples, language for the underlying structure in examples, and fading away details must be more about how and how much of the underlying structure is encoded. The various interventions seem to encourage people to encode the complete underlying structure in examples and to do so in a generic way. A generalization or schema resulting from comparison should capture all the commonalities between examples. Because it has to apply to both examples it must be at least somewhat generic—how generic will depend on how different the examples are. Abstract principles are necessarily generic. A component of self-explanations is developing and improving one’s understandings of the underlying structure in examples. Language for the underlying structure in examples can provide generic categories for characterizing the key underlying structure in examples and can point out
critical aspects of that structure, thereby minimizing the likelihood of partial understandings. Fading away details also makes examples less specific and more generic, and makes the full underlying structure easier to perceive. Finally, gist encodings of examples should also tend toward the generic. To be clear, all of these routes to fostering complete and generic encodings of examples are efforts at removing details and preserving the underlying structure in the examples. They are not fostering the derivation of vapid abstractions (“something happened”) that gloss over the coherent systems of roles and relations in the examples.

The final commonality to the interventions is that they foster emphasizing the underlying structure in the examples. This is not to say that people fail to encode other aspects of examples. For example, we know that people continue to use surface features in the retrieval and application of prior examples even after they retrieve analogies (Bassok & Holyoak, 1989; Blessing & Ross, 1996; Novick, 1988). It is a matter of pushing that surface property information to the background and bringing underlying structure to the fore.

One way to think about the interventions is that by fostering an emphasis on complete, generic encodings of underlying structure in examples, they are encouraging people to encode examples like domain experts. Presumably experts focus their encodings on the underlying structure in examples, consistent with research that experts organize problems based on their underlying principles rather than their surface properties (e.g., Chi, Feltovich, & Glaser, 1981). If so, novices should have the most difficulty retrieving analogies, whereas experts should have less difficulty retrieving analogies. In support of this claim, Novick (1988) found that students with higher mathematics test scores were better able than those with lower test scores to retrieve analogies to solve problems. Also in support of this account, Dunbar and Blanchette (2001) report that microbiology researchers (who were presumably experts in the domain of microbiology) generated more spontaneous analogies between areas of microbiology than they did to examples outside the domain. Also consistent with this account, my colleagues and I (Gentner et al., 2009) have found that higher base rates for retrieving analogies from one’s own memory were associated with more years of domain experience. I will provide specifics on our data, and supplement the published data with unpublished data, because our work extends prior research on the relationship between experience and analogical retrieval. Collapsing across several studies that used the same methodology, we have found that the quality (i.e., degree of structural match, measured on a 0–2 scale) of people’s retrievals based on a negotiation example was associated with the extent of their business experience. We found low-quality retrievals from those with no business experience (M = 0.24, n = 17), better quality retrievals from people with 1–7 years of business experience (M = 0.52, n = 81), and
still higher quality retrievals from people with 15–40 years of business experience (M = 0.84, n = 80), χ²(2, N = 178) = 10.81, p < 0.05. This is weak evidence given that it consists of cross-study comparisons, but it is suggestive of the general pattern claimed in the literature and covers a more substantial range of experience than previous research.

Taken together, the research on retrieving analogies in problem solving can largely be summarized by claiming that better encodings of examples today lead to better retrievals of those examples tomorrow. If people are encouraged to encode examples in more sophisticated ways than they apparently normally do (for example, because they drew analogical comparisons), then they are more likely to be able to later retrieve the example when they encounter an analogous problem.
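As a quick arithmetic check on the experience data reported earlier in this section, the three group means can be pooled into a single sample-size-weighted mean. This is only a sketch using the reported means and group sizes; the variable names are illustrative, and the pooled value is a derived figure, not a statistic reported in the source.

```python
# Reported group means (0-2 structural-match scale) and group sizes.
groups = {
    "no business experience": (0.24, 17),
    "1-7 years": (0.52, 81),
    "15-40 years": (0.84, 80),
}

# Total sample size across the three experience groups.
total_n = sum(n for _, n in groups.values())

# Sample-size-weighted mean of the three group means.
pooled_mean = sum(mean * n for mean, n in groups.values()) / total_n

print(total_n)
print(round(pooled_mean, 2))
```

The group sizes sum to 178, which agrees with the N in the reported test, and the pooled mean comes to roughly 0.64 on the 0–2 scale.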
2.2. Creativity and Retrieving Analogies

Research on creativity has long emphasized both the value of analogies (Finke, Ward, & Smith, 1992; Hesse, 1966; Hofstadter & The Fluid Analogies Research Group [FARG], 1995; Koestler, 1969; Weisberg, 1993) and the value of connecting ideas from different domains (Mednick, 1962) as a way to generate novel and useful outcomes. However, creativity research has not always linked these two discussions, perhaps because creativity research is interested in all useful associations, and even random associations (Campbell, 1960). Still, analogies provide an important avenue for the generation of creative outcomes (Holyoak & Thagard, 1995), and accordingly analogical retrieval is important for creativity and innovation (Markman et al., 2009). For example, a topic of strong current interest within the design and innovation community is biomimicry (e.g., biomimicry.net; asknature.org), or the generation of designs based on analogies to nature. Benyus (1997) describes the modification of a train’s design so that it entered tunnels more efficiently based on an analogy to how kingfisher birds enter water. As this example indicates, retrieving an appropriate analogy to support creativity can be daunting, as it was within the domain of problem solving. Perhaps this is unsurprising, given that some researchers treat creativity as a kind of problem solving (e.g., Amabile, 1996). Still, creativity research has developed along somewhat different lines than problem-solving research and provides some distinct insights into the retrieval of analogies.

One area of creativity research has examined providing various sorts of memory cues and their influence on analogical retrieval. This is related to the research discussed earlier on the world being kind, which suggested that surface properties can serve as the primary basis for retrievals that can subsequently be used as analogies (Blessing & Ross, 1996; Ross, 1987).
Research on brainstorming (Dugosh, Paulus, Roland, & Yang, 2000) and on design tasks (Christensen & Schunn, 2009) suggests that providing
multiple retrieval cues as people work on creative tasks facilitates their performance, and at least some of these gains are due to retrieving analogies. For example, Dugosh et al. found that if people heard lists of ideas as they were brainstorming, their own performance was improved relative to those not hearing lists of ideas, and the more ideas presented the better their performance. These sorts of memory cues are often fast and cheap to generate. Yet the more arbitrary the cues and the less kind the world, the less efficient this mechanism will be in triggering useful analogical retrievals. Random and arbitrary cues are nonetheless commonly discussed by practitioners, perhaps because of the effectiveness of variable reinforcement schedules for generating superstitious learning (Ferster & Skinner, 1957), or more charitably, because the value of retrieving analogies for creativity and innovation is sufficiently high that any improvement from people’s typically low baseline performance is notable. In effect, this work highlights the straightforward claim that if the base rates for analogical retrieval are low, making more retrieval attempts should be more effective than making fewer retrieval attempts. Still, what is most important for current purposes is that this intervention to trigger analogical retrieval occurs at the time of retrieval, rather than at the time of initial learning, as was commonplace in the problem-solving research discussed earlier. A second area of creativity and innovation research has examined social factors that are predictive of retrieving analogies. As just discussed, providing multiple different cues to memory is one means of accessing information from multiple different areas of memory and improving one’s odds given a low overall base rate of analogical retrievals. 
Another means for making multiple retrieval attempts is to present the same cue to multiple people—and ideally people who have sufficiently different memories such that their retrievals are non-overlapping. For example, Dunbar’s (1995) work implies that if one asked a group of microbiology researchers to explain a puzzling research finding, the more diverse the specialty areas of the researchers, the more likely it is that one will generate an analogy that provides an appropriate explanation. The idea of tapping diverse groups of individuals to spur creativity extends beyond retrieving analogies, but many of those working with diverse groups (and who are often not, at least at the outset, analogy researchers) tend to end up emphasizing the retrieval and application of analogies in discussing their findings (Burt, 2004; Dunbar, 1995; Hargadon & Sutton, 1997; Paletz & Schunn, 2010). Perhaps the most relevant aspect of these findings for our understanding of the individual retrieval of analogies is that there are gains of having access to multiple domains of knowledge. It is possible that individuals who themselves have competence in multiple domains and consider retrieving examples from each of those domains should be better able to retrieve analogies than those whose competence or retrieval attempts lie solely within one domain.
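The base-rate argument above can be illustrated with a short calculation. This is a sketch under a strong simplifying assumption that is not made in the source: each retrieval attempt (a new cue, or a new person with non-overlapping memories) succeeds independently with the same probability `p`. The function name and the value of `p` are hypothetical.

```python
def p_at_least_one(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p) ** attempts


# Hypothetical low base rate of 5% per retrieval attempt.
for k in (1, 5, 20):
    print(k, round(p_at_least_one(0.05, k), 3))
```

Under these assumptions, the chance of at least one analogical retrieval rises steadily with the number of attempts even when each attempt is unlikely to succeed. If attempts draw on overlapping memories, the independence assumption fails and the gain would be smaller, which fits the emphasis above on cues and people that tap different areas of memory.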
How One’s Hook Is Baited Matters for Catching an Analogy
A third area of creativity research has examined incubation or preparedness effects (Christensen & Schunn, 2005; Moss, Kotovsky, & Cagan, 2007; Seifert, Meyer, Davidson, Patalano, & Yaniv, 1995). This work, which straddles creativity and problem solving, derives from the observation that people, on rare occasions, spontaneously generate solutions to a problem when they are not actively working on the problem. These are the ‘‘a-ha’’ moments that can strike when walking in the park, getting on a bus, or taking a shower. Here, unlike in problem solving and analogical transfer, the problem or need for creativity comes first, and the solution, which is often an analogy, comes later and retrieves the earlier problem (‘‘oh, my problem can be solved in this same way’’). The new aspect for the discussion of analogical retrieval that this work raises is that an unresolved prior example (an open problem or need for creativity) is likely to be retrieved when encountering an analogous example. Thus, an additional factor influencing initial encoding of examples that could facilitate later analogical retrieval is whether those examples are finished and settled, or still open concerns. Incubation effects suggest that if people have open concerns (or, in the language of social cognition, have concerns that are chronically accessible; Higgins, 1996), this should facilitate analogical retrieval by leading people to make more retrieval attempts, and retrieval attempts in multiple contexts. This limited review of research on creativity contributes two useful points to the discussion of analogical retrieval. First, it demonstrates the practical importance of retrieving analogies and consequently the value of factors that can increase people’s ability to retrieve analogies. Second, it raises the possibility that retrieving analogies is not only about what is stored. 
We can also attend to the situation at the time of retrieval and consider factors at that point that could spur useful retrievals.
2.3. Acquisition of Domain Knowledge and Retrieving Analogies
In discussing problem solving earlier, expertise appeared to be related to retrieving analogies. Specifically, the prior discussion suggested that expertise facilitates analogical retrieval because experts encode examples in their domains using consistent, sophisticated systems of concepts (Forbus et al., 1995). The question then arises as to how experts learn those sophisticated systems of concepts that form the heart of their domain knowledge. Drawing analogies likely plays a key role here (Gentner & Loewenstein, 2002), and there is potentially a key role for analogical retrieval, as well. One possibility for why learning sophisticated systems of concepts might hinge on analogical retrieval is that such concepts must apply to examples that do not share surface properties. For example, within the domain of negotiation, there are a relatively small number of kinds of contract structures that enable parties to form agreements that are beneficial for all involved, and those
Jeffrey Loewenstein
contract structures apply to a limitless number of negotiation contexts. A straightforward contract structure along these lines is called a ‘‘tradeoff,’’ and it involves parties conceding on low-priority issues so they can reap concessions from other parties on high-priority issues. Tradeoffs appear in a wide array of negotiation situations. A new graduate negotiating a job offer might concede and accept a limited benefits package (which is costly to employers but less pressing to the young and healthy) in return for gaining a high starting bonus to help pay for pressing needs such as a home and car. Or, the United States might drop foreign tariffs in return for a country’s human rights concessions. Recognizing that varied examples are instances of the same underlying structure—and so forming a relational category (Gentner & Kurtz, 2005)—could involve analogical retrieval to link the varied instances. To clarify, the use of the term ‘‘experts’’ here is not limited to chess experts, radiological experts, or other experts in advanced domains, but rather applies more generally to all developed domain knowledge. The complex declarative learning central to acquiring domain knowledge is perhaps the major accomplishment of cognitive development as well as the development of expertise in a formal domain (Chi & Ohlsson, 2005; Mason, 2007). The research challenge in studying complex declarative learning is to understand conceptual change or knowledge restructuring. That is, the goal is to understand the learning that results in knowledge that is qualitatively different than what came before and revises one’s framework for understanding the domain, rather than simply being the accretion of factual details added to an existing framework (Carey, 1991; Stern, 2005). 
Although there is still disagreement and a lack of knowledge about the mechanisms underlying conceptual change (Lewandowsky, Kalish, & Griffiths, 2000), analogy is on most shortlists (Chi & Ohlsson; Gentner, Brem, Ferguson, Markman, Levidow, et al., 1997; Smith, Solomon, & Carey, 2005; Vosniadou & Brewer, 1987). The reason is that drawing analogies is a mechanism that can produce generalizations of the underlying structures in examples (Gentner & Wolff, 2000; Hummel & Holyoak, 2003). These are structures that people might previously have thought to be idiosyncratic rather than being organizing principles or relational categories. For example, novice negotiators may recognize that they formed an agreement specifying that the payments for a television show vary based on the show’s eventual ratings, but fail to realize that this is a general type of contract structure—a contingent contract—that can be used in a broad array of negotiation situations (Bazerman & Gillespie, 1999). Because analogies can lead people to recognize generic underlying structure in examples, this means that analogy can produce new representational resources for encoding examples. In brief, analogy plays a fundamental role in the acquisition of domain knowledge because once people are thinking about an analogy, it can lead them to generate sophisticated concepts and systems of concepts (see Gentner & Wolff).
The question then becomes how people come to draw the analogies that yield new generalizations about domain principles. There are at least two established sources of analogies. One source is progressive alignment (Gentner et al., 2007; Kotovsky & Gentner, 1996): by comparing items that are similar both in their surface properties and underlying principles, people are sensitized to both kinds of commonalities and become increasingly able to notice other examples sharing the same underlying principles, even if they have different surface properties. For example, children who are asked to compare OXO to oxo are better able to notice a similarity with ABA than children just seeing OXO (and not also oxo) initially (Kotovsky & Gentner). Progressive alignment uses comparison to improve the quality of people’s initial encodings, thereby facilitating the later drawing of analogies. The difference is that it starts with items with similar surface properties, removing the need for analogical retrieval to initiate the process.
A second basis for people drawing analogies to promote learning domain knowledge is social guidance (Carey, 2004; Gentner & Loewenstein, 2002; Tomasello, 1999). People who are relative experts in a domain can guide learners to draw appropriate analogies and thereby foster their knowledge acquisition. This can be done directly, through formal or informal instruction. It can also be done indirectly, by using words that stand for the important underlying principles in the domain consistently across examples, and thereby guiding learners to compare examples labeled with the same words. For example, it is not always obvious what it means to cooperate, but tracking how the word cooperation is used across situations could guide people to understand this complex kind of social interaction (Keller & Loewenstein, 2010). 
The words make the world kinder for learners, in the sense that they serve to add a salient surface property (Yamauchi & Markman, 2000) to examples with common underlying principles. What is striking about these two means for using analogies is that they, like the earlier work in problem solving, largely rely on people’s encodings of the initial examples. They promote analogical retrieval by encoding examples in useful ways from the start, leveraging people’s ability to link items on the basis of surface features. This raises the question of what happens to all the examples people learn that are not encoded in useful ways from the start. These examples might be only minimally useful, mostly lying latent as inert knowledge in memory because they are unlikely to be recalled by an analogy and linked to other examples on the basis of common underlying principles. If the knowledge experts have is more abstract, interconnected, and consistent than novices’ knowledge (Chi & Ohlsson, 2005), and is at least in part incommensurable with prior understandings (Smith et al., 2005), then examples learned early may indeed be mostly wasted. They would be mostly wasted unless it is possible for people to revisit earlier examples and re-encode them using later, more sophisticated understandings (cf., de Leeuw & Chi, 2003). Perhaps, people can leverage
new understandings by integrating them with prior knowledge, and thereby update and learn from that prior knowledge. Perhaps, new learning can not only influence what one can do in the future but also one’s understanding of the past. It could, if people could retrieve analogies.
3. Underlying Structure and Retrieving Analogies
One of the most central claims about human memory retrieval (Higham, 2002) is that it is a function of encoding specificity (Tulving, 1983; Tulving & Thomson, 1973), or the match between what was initially encoded about some exemplar and the probe that is later used to cue memory retrieval. As a simple example, MBA students who study tables of company financial information are better able to recall that information later if they are presented with an empty table that is otherwise the same rather than an empty table whose rows and columns were transposed (Ryack & Kida, 2006). Encoding specificity has implications for retrieving analogies. As already noted, analogies rest on a match between the underlying structures in two exemplars, which is the core insight articulated and elaborated on by the Structure-Mapping theory of analogy (Gentner, 1983; Gentner & Markman, 1997). Thus, retrieving analogies from memory must result from people encoding something about the underlying structures in both the stored and probe exemplars, as otherwise there would be no basis for an analogical match. Examining whether, when, and how people encode underlying structure in examples is therefore an important concern. Critically for current purposes, though, encoding specificity is concerned with a match between what is stored and the cue or memory probe. Although nearly all prior research on analogical retrieval focuses on how stored examples are encoded, encoding specificity provides equal motivation to be concerned with how memory probes are encoded.
3.1. Encoding the Underlying Structure in Examples
There are innumerable ways to encode exemplars, and given that people use their knowledge in multiple ways, people have good reasons for encoding exemplars in multiple ways (Markman & Ross, 2003). For example, for pragmatic reasons people frequently draw attention to different aspects of and different perspectives on exemplars. Based on the situation, we might decide to call a person “the company CFO,” “my boss,” “an accountant,” “an African-American,” or “a member of the Executive Committee,” and our choice conveys a particular encoding of the situation at hand
(Clark, 1996). Such encoding choices should be consequential for memory retrieval, because of encoding specificity. Young children, novices early in learning a domain, and the lazy or rushed (Gentner & Rattermann, 1991; Hammond, Seifert, & Gray, 1991) may encode little or no underlying structure in examples, but otherwise it is reasonable to assume that people encode some underlying structure. For example, 5- to 6-year-olds tend to generate interpretations of the metaphor “A tape recorder is like a camera” that are focused on surface properties (e.g., they are both black), but 9- to 10-year-olds and adults generate interpretations that are focused on underlying structure (e.g., they both record something for later; Gentner, 1988). The question then turns to the sources of variation and normal tendencies in how people encode underlying structure. It is already clear from the various interventions that advance problem solving discussed earlier that if emphasizing underlying structure makes a difference for performance, it must be the case that people routinely fail to emphasize the underlying structure in examples. I suggest that there are at least four sources of variation in encoding underlying structure: ambiguity, context specificity, completeness, and weighting. The first is the same ambiguity and contextual variability seen in the CFO, boss, etc., example. For example, the act of shaking hands can be interpreted as a greeting (like a wave or a spoken “hello”) or as a signal of agreement (like signing a contract). Variation in encoding the underlying structure in examples implies a challenge in choosing which of the multiple possible underlying structures to encode. 
In most computational models of analogical retrieval (and I will use MAC/FAC, Forbus et al., 1995, and LISA, Hummel & Holyoak, 1997, as running examples because they are arguably the most prominent models), this kind of variation would be instantiated as a difference in which representational units or predicates would be used in a knowledge representation to account for the underlying structure. For example, MAC/FAC takes as input predicate logic representations, composed of units such as GREET(party A, party B) or AGREE(party A, party B). The issue of ambiguity, that is, which underlying structure to encode, is unavoidable and has influences at multiple levels of analysis. Gentner’s (1982; Gentner & Boroditsky, 2001; Gentner & Bowerman, 2009) natural partitions, relational relativity, and typological prevalence hypotheses map out large-scale reasons for variation and the role of language in guiding people toward some underlying structures as opposed to others. People’s own areas of expertise as well as their particular needs and goals in a given situation must also influence what they encode. The second source of variation in encoding underlying structure is how context specific it is. Underlying structure can be encoded in ways that are specific to the example or in ways that are more generic. For example, one description of a negotiation situation might be ‘‘The Management at Arlington Inc. proposed a 10% pay cut for all hourly workers, which
prompted those workers to announce they were considering going on strike.’’ A more generic description of this situation might be ‘‘One party asked for a concession, and the other party refused and made a threat.’’ Computational models of analogical retrieval could instantiate this kind of variation as a difference in which predicates are used in the knowledge representation just as with ambiguity and contextual variability. The distinction is that here, the different descriptions would necessarily draw upon predicates that have different distributions in the larger pool of examples used in the simulation. More generic predicates are those that are used in a wider array of contexts, and take a broader range of arguments. If distributed representations were used to encode examples (as in Hummel & Holyoak, 1997), then an additional difference would likely be that more generic predicates would have fewer components or features. The reason for variation in context specificity is that both specificity and generality are functional. Specificity retains more information and is more tightly linked to the situations of known relevance, while generality provides the potential for powerful and efficient systems of rules. Medin and Ross (1989) outlined reasons why people, and particularly relative novices, are likely to have a bias toward context-specific encodings rather than generic encodings. As a result, people likely preserve at least some of the basis for noting multiple underlying structures (i.e., preserve the basis for ambiguity, the first source of variation). However, this means that the underlying structures in examples are likely to be latent rather than manifest (Clement et al., 1994) in people’s representations. The third source of variation in encoding underlying structure is how complete it is. 
For example, the use of tradeoffs to form negotiated agreements, as noted earlier, involves exchanging concessions on low-priority issues for gains on high-priority issues. After reading an example exemplifying a tradeoff, some of our participants (Gentner et al., 2009, Experiment 3) seemed to encode only part of the underlying structure, writing descriptions such as ‘‘both parties conceded,’’ ‘‘they compromised,’’ ‘‘everyone benefitted,’’ or ‘‘they were creative.’’ Another form of partial structure is to encode the lower level events, but fail to consider the overarching pattern (or high-order relational structure) that brings coherence to those events (Clement et al., 1994, study 3; Gentner & Toupin, 1986; Loewenstein & Gentner, 2005). Computational models of analogical retrieval would instantiate this kind of variation as a difference both in how many predicates are used to represent the underlying structure in the example and in the elaborateness of the system of relations among those predicates. I am not aware of any general claims about the completeness of one’s encodings of underlying structure in examples. Certainly one’s skill, motivation and opportunity are plausible factors influencing completeness. The conservative claim, given bounded rationality (Simon, 1947), would have to be that people tend to encode only part of the underlying structures in examples.
The fourth source of variation in encoding underlying structure is the weight or importance placed on that structure relative to other aspects of the example. For example, greetings may play a very important or a trivial role in a negotiation example. This could be instantiated in computational models of analogical retrieval by varying the number of predicates and elaborateness of the system of relations involving those predicates, as with the discussion of completeness. Encoding specificity implies that the match between the entire probe example and stored example is what matters. As a result, the more of the example that is dedicated to encoding the underlying structure, the more the underlying structure should determine what matches the example. For example, in MAC/FAC, each component in a representation has an associated weight that indicates its importance in the initial matching process (the MAC stage), and the weights of all the components are normalized such that together they sum to one. It follows that adding more surface properties to a knowledge representation necessarily means lowering the weight given to the predicates representing the underlying structure. Or, on the flip side, simplifying a knowledge representation by removing surface properties necessarily implies adding weight to the predicates representing the underlying structure. Given the earlier claim that people, particularly relative novices, have a tendency to encode examples in context-specific ways, it follows that their encodings maintain surface properties and hence give relatively low weightings to underlying structure. The picture of people’s encodings of underlying structure in examples that emerges from this discussion is one in which relative novices generate context-specific encodings. 
They probably only encode part of the underlying structure, grant it relatively low weight because they maintain ample surface properties, and encode the underlying structure in a way that leaves latent the similarity between the example and other examples of the same underlying structure set in other contexts. Having encoded examples in these ways, it is relatively uncommon for them to retrieve analogies. In contrast, relative experts likely do encode examples using their underlying structures and likely mostly disregard surface properties (Myles-Worsley, Johnston, & Simons, 1988). Having encoded examples in this way, it should be relatively more common for experts to retrieve analogies. Finally, the manipulations that seemed to influence analogical retrieval in problem solving and creativity are readily interpretable as ones that influence the encoding of (at least initial) examples in ways that address typical novice shortcomings.
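The weighting argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every component of a representation starts with the same raw weight and that weights are normalized to sum to one. The predicate and attribute names are hypothetical, and this is not the MAC/FAC implementation.

```python
from typing import Dict, List


def normalize(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale component weights so that together they sum to one."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def structure_share(structure: List[str], surface: List[str]) -> float:
    """Fraction of an example's total weight carried by structural predicates.

    Assumption: every component gets the same raw weight before normalizing.
    """
    normalized = normalize({name: 1.0 for name in structure + surface})
    return sum(normalized[name] for name in structure)


# Hypothetical encoding of the tradeoff structure in the job-offer example.
tradeoff = ["CONCEDE(a, low_priority_issue)", "GAIN(a, high_priority_issue)"]

# A lean encoding keeps one surface property; a rich one keeps many.
lean_surface = ["job_offer"]
rich_surface = ["job_offer", "new_graduate", "benefits_package",
                "starting_bonus", "home", "car"]

lean_share = structure_share(tradeoff, lean_surface)  # 2 of 3 components
rich_share = structure_share(tradeoff, rich_surface)  # 2 of 8 components
print(f"lean: {lean_share:.2f}, rich: {rich_share:.2f}")
```

Under these assumptions, adding surface properties to the same two structural predicates lowers the share of weight on the underlying structure from two thirds to one quarter, which is the dilution described in the text.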
3.2. Using Underlying Structure in Retrieval
Given that underlying structure is at least sometimes encoded, and that encoding specificity implies retrievals should be governed by what is encoded, we must predict that analogical retrieval can and does occur on
the basis of underlying structure. The most basic component of structure is role bindings: most parents want to know if it was John who hurt Mary, or Mary who hurt John. Wharton, Holyoak, Downing, Lange, Wickens, and Melz (1994) showed that for sentences almost as simple as these, people were capable of retrieving earlier sentences with not just similar content (Jack, Marie, hit), but also matching role bindings (that Jack was the one doing the hitting, and it was Marie who got hit). A second important concern is whether people can retrieve prior examples based on many relations, not just one. An elegant study by Clement et al. (1994) demonstrated that people can, and that generic encodings of underlying structure matter. They showed that people were less likely to retrieve an analogy if the examples were written with context-specific verbs (e.g., verbs specific to writing, such as edit ... marked ... typed in; verbs typically encode the relations that make up underlying structure) than if the examples were written with generic verbs (e.g., verbs not specific to writing that can apply to many contexts, such as fixed ... replaced). Using generic verbs served to facilitate people’s analogical retrievals. A final concern is whether people can retrieve analogical examples sufficiently well to make use of them. Ross and Kennedy (1990) showed that people who first linked two examples with a common underlying structure were later able to use these early examples to help solve a subsequent problem. Perhaps the most extensive evidence in support of the use of underlying structure in retrieval comes from the studies showing that people who draw comparisons between examples and derive generalizations of their commonalities, relative to those who see just one initial example or two examples separately, are later more likely to solve an analogous problem (e.g., Catrambone & Holyoak, 1989; Gick & Holyoak, 1983; Loewenstein, Thompson, & Gentner, 1999, 2003). 
The explanation for these effects rests on people retrieving either the earlier examples or a generalization derived from those examples upon encountering the subsequent analogous problem. In other words, it rests on analogical retrieval. The question is, how does analogical retrieval work? The approach taken by the main models of analogical retrieval generally starts from content matches and moves to structural matches that take role bindings into consideration. MAC/FAC (Forbus et al., 1995) uses two distinct phases. In the first, all content in a probe example is matched against all content in the examples in memory without consideration for structure. In the second, the most similar items from memory are examined in a way that takes the entire structure of the examples into consideration. LISA (Hummel & Holyoak, 1997) uses less distinct phases, matching subsets of the content in examples in separate waves (in effect using content and a little information about structure) and subsequently adding in a fuller accounting of structural consistency. A further, subtler commonality across the models is that they normalize the weights of the items within a
single knowledge representation (forcing each example to have a total weight of one, rather than having each predicate within a representation have a weight of one) to prevent examples with large numbers of predicates from being constantly retrieved (Forbus et al.; Hummel & Holyoak). As a result of this normalizing, not just the number of predicates articulating the underlying structure but their proportion relative to the rest of the example should influence the rate of retrieving analogies. Accordingly, the most basic explanation that models of analogical retrieval give to explain why people retrieve analogies is that examples are encoded in generic rather than context-specific form, with complete underlying structures, and with relatively few surface properties so their underlying structures gain sufficient weight. Why generic encodings facilitate analogical retrieval is complex. One possibility is that generically encoded examples better retrieve other generically encoded examples than contextually encoded examples retrieve other contextually encoded examples (assuming that the examples are from different contexts). This was the contrast Clement et al. (1994) examined: examples from different contexts were both given generic verbs, and this facilitated analogical retrieval relative to when both examples were given context-specific verbs. Of interest for understanding the mechanism, Clement and colleagues found comparably strong results even if the generic verbs in the two examples were not the same, but merely synonymous. The issue of generic encodings also arises in the work on schemas in problem solving. Here, people derive schemas initially, and then later attempt to solve a problem set in a specific context. Having generated the schema earlier facilitates problem solving, presumably because of the retrieval and application of the schema. 
It is possible that the schema is retrieved because having generated a schema makes people more likely to encode the later problem using generic rather than context-specific representations of the underlying structure. To some extent this raises the question of how people know to use a more generic encoding of the seemingly specific problem, but perhaps one could argue that people’s encoding vocabularies have changed as a result of learning the schema. Alternatively, or in addition, it could be that generic encodings not only match other generic encodings, but can also match specific encodings. For example, using the materials of Clement et al. (1994), perhaps examples with generic verbs, like “fixed” and “replaced,” would retrieve prior analogous examples with context-specific verbs, like “edit” and “typed in.” This would be an interesting area for future research. Some kind of predicate decomposition (Gentner & Wolff, 2000), minimal ascension to a superordinate category (Falkenhainer, 1990), or distributed representation scheme (as used by the LISA model, Hummel & Holyoak, 1997) seems necessary to allow a generic encoding to match a specific encoding.
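To make the last point concrete, the following is a minimal sketch of how ascension to a superordinate category could let a generic encoding match a specific one. The verb hierarchy is a hypothetical assumption for illustration (the text does not say which generic verb corresponds to which specific verb), and the overlap measure is a simple stand-in, not the mechanism of any published model.

```python
from typing import Dict, List

# Hypothetical hierarchy: each context-specific verb maps to a generic parent.
# The particular pairings are assumptions made only for this illustration.
SUPERORDINATE: Dict[str, str] = {
    "edit": "fixed",
    "typed in": "replaced",
}


def ascend(verb: str) -> str:
    """Return the generic parent of a verb, or the verb itself if it has none."""
    return SUPERORDINATE.get(verb, verb)


def verb_overlap(probe: List[str], stored: List[str], use_ascension: bool) -> float:
    """Proportion of shared verbs, optionally after ascending to generic verbs."""
    if use_ascension:
        probe = [ascend(verb) for verb in probe]
        stored = [ascend(verb) for verb in stored]
    probe_set, stored_set = set(probe), set(stored)
    largest = max(len(probe_set), len(stored_set))
    if largest == 0:
        return 0.0
    return len(probe_set & stored_set) / largest


generic_probe = ["fixed", "replaced"]
specific_stored = ["edit", "typed in"]

literal_match = verb_overlap(generic_probe, specific_stored, use_ascension=False)
ascended_match = verb_overlap(generic_probe, specific_stored, use_ascension=True)
print(literal_match, ascended_match)
```

With literal matching the generic probe shares no verbs with the context-specific stored example. Once both are mapped to their generic parents, they match fully. Whether people's retrieval behaves this way is the open empirical question noted above.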
The weighting of aspects of a representation also has attendant complexities. Specifically, it suggests that there could be a difference between schemas or generalizations on the one hand and examples on the other. This is because examples will necessarily include surface properties. If the example is sufficiently large and complex, it could also include extraneous underlying structure distinct from the main thrust of the example. If normalizing the total weight of an example to one is an appropriate assumption, then this suggests that removing surface properties and tangential underlying structure will result in increasing the weight on the main underlying structure in the example. This, in turn, will increase the likelihood of the example being retrieved as an analogy. If this is so, the more that examples are turned into schemas or generalizations that encode primarily underlying structure and little in the way of surface properties, the more they should be able to be retrieved as analogies, and serve as general purpose rules (Gentner & Medina, 1998). Alternatively, it could be that the normalizing assumption needs to be made more complex, such that people can maintain the surface properties in examples but lessen their weight (or learn their irrelevance; Ross & Kennedy, 1990). This would allow examples to function like schemas without forcing them to lose all context specificity. A more general complexity to note is that analogical retrieval must be able to succeed on the basis of matches of partial underlying structures, rather than entire systems of underlying structure (Johnson & Seifert, 1992; Kurtz & Loewenstein, 2007). For example, the saying ‘‘don’t count your chickens before they are hatched’’ might come to mind upon hearing a story about a man making elaborate preparations to spend an inheritance of a still-living relative, even before finding out that the relative did not include the man in her will (cf., Johnson & Seifert). 
More generally, analogical problem solving hinges on retrieving prior examples that serve as bases from which people can infer solutions for their current problems. Accordingly, the match between a prior example and a current problem must be based on only partial underlying structures, because the current problem has no solution to include in the matching process. The prior example and the current problem must match on the basis of the description of the problem situation, even if it is the solution that is the purpose of having retrieved the analogy. Schemas might be useful for providing generic descriptions of solutions, but unless people run through a list of all possible solutions they know, this effect can only be important for applying the schema, not for retrieving it. Although analogical retrieval is relatively rare, it is not because underlying structures need to match completely, just substantively. The final point to highlight about analogical retrieval is that there is nothing in encoding specificity and in the matching processes in computational models of analogical retrieval that makes them directional. There is a fundamental asymmetry in memory retrieval in the sense that there is
one probe and a vast number of items stored in memory, but this is a distinct issue from the process of evaluating any given match between the probe and a given example in memory. Even though it is the match between the probe and the stored, initial example that should matter, nearly all the research on analogical retrieval has examined changes to the encoding of the initial examples. However, all the main concerns (generic encodings, complete underlying structures, minimal surface properties, emphasized or weighted underlying structure) could just as readily be applied to the probe example. Maybe the same factors that allow comparison and other interventions to foster forward transfer to new problems can also foster backward retrieval of previously learned analogous examples. It is possible that transfer does not go backward, and that prior research has been correct to focus on the encoding of initial examples. Perhaps more generic encodings of early examples change the encoding vocabulary used to encode later examples, and this is why the initial examples are later recalled. A more generic encoding of a probe example presumably would not retroactively change how the content of previously learned examples is encoded, and hence might not be effective. Still, if encoding specificity guides retrieval, then it is possible that altering the encodings of probe items to emphasize their underlying structure could influence the rate of retrieving analogous examples.
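The two-phase retrieval scheme and the non-directionality of matching discussed in this section can be sketched as follows. This is a deliberately simplified illustration, not the MAC/FAC or LISA algorithm. It assumes that examples are lists of (relation, agent, patient) statements, that the first stage ignores role bindings, and that the second stage requires exact role-bound matches. Real models align relational structure across different entities.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]  # (relation, agent, patient)
Example = List[Statement]


def content_score(a: Example, b: Example) -> float:
    """Stage 1: overlap of content tokens, ignoring who did what to whom."""
    bag_a = {token for statement in a for token in statement}
    bag_b = {token for statement in b for token in statement}
    union = bag_a | bag_b
    return len(bag_a & bag_b) / len(union) if union else 0.0


def structure_score(a: Example, b: Example) -> float:
    """Stage 2: overlap of whole statements, so role bindings must agree."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def retrieve(probe: Example, memory: Dict[str, Example], shortlist_size: int = 2) -> str:
    """Shortlist by content, then pick the best structural match."""
    shortlist = sorted(
        memory,
        key=lambda name: content_score(probe, memory[name]),
        reverse=True,
    )[:shortlist_size]
    return max(shortlist, key=lambda name: structure_score(probe, memory[name]))


# Hypothetical memory contents, echoing the John and Mary example in the text.
memory: Dict[str, Example] = {
    "john_hurt_mary": [("hurt", "John", "Mary")],
    "mary_hurt_john": [("hurt", "Mary", "John")],
    "bob_fed_cat": [("fed", "Bob", "cat")],
}
probe: Example = [("hurt", "Mary", "John")]

best = retrieve(probe, memory)
print(best)
```

Both scoring functions are symmetric in their two arguments, so nothing in this sketch distinguishes the probe from the stored example when a single match is evaluated. The only asymmetry is that one probe is compared against many stored items, which mirrors the point made in the text.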
4. Facilitating the Retrieval of Analogies at Retrieval Time
My colleagues and I (Gentner et al., 2004, 2009; Kurtz & Loewenstein, 2007) set out to examine whether changing people’s encoding of an example could influence their ability to retrieve an analogy already stored in memory. We chose comparison as a means for changing people’s encodings of an example, as it is well tested and robustly fosters complete and generic encodings that deemphasize surface properties. Also, we know that people encounter comparisons in the course of their work, their education, and even through art and literature; it is possible that those comparisons trigger not only ideas for the future but also reflections on prior analogous examples. In our studies, rather than have people compare two examples then test whether they were able to retrieve that learning in response to some subsequent example or problem, we instead asked people to compare two examples and tested whether this facilitated retrieving a previously learned analogous example. We found that it can.
4.1. The ‘‘Own Memory’’ Studies: Retrieving Analogies from Autobiographical Memory
Some research starts in the lab, but this research started by examining management consultants from a major firm as they were learning negotiation skills through a professional training seminar. This is a compelling sample and setting for several reasons. We were interested in people’s ability to retrieve analogies, and this is a task that management consultants are paid to do. They are general problem solvers, often working in multiple industries, with different kinds of companies, in different parts of the country and even different countries. As a result, they have a professional need to retrieve analogies as they confront new problems. Still, the consultants we worked with were engaged in a training seminar, and so were learning about negotiation, a domain with which they were familiar but in which they were not expert. Accordingly, we could still examine them learning and retrieving examples with underlying structures that they had not already mastered. Also, because they were engaged in a training seminar they found important and useful to their careers, and were training alongside their peers at their organization, they were highly motivated to learn, to impress their peers, and to avoid embarrassing themselves. Finally, working with this group, we could assess their retrievals of examples learned days, months, or even years earlier because they had familiarity with the domain under study. That is, we could examine people’s analogical retrievals from their own memories as they struggled to learn something important to them. The limitation, of course, is that we do not know how the stored examples were initially encoded. But this approach does allow us to examine a natural situation of significant importance: given scattered learning over a period of years, how likely are people to retrieve an analogy to a current example?
The domain under study was negotiation, which, as was mentioned earlier, is a compelling domain to study analogical retrieval because the same underlying structures recur in widely different contexts. It is also a topic about which our participants were highly motivated to learn. We examined people’s performance retrieving tradeoffs in one study, but in most studies, we examined people’s performance retrieving contingent contracts, or agreements whose terms vary according to the outcome of a future event. There are many standardized subtypes, such as bets, options contracts, and pay-per-performance contracts, as well as countless idiosyncratic examples. It is a class of negotiated agreement structures with known properties that can be used to motivate performance, manage risk, overcome biased perceptions, bridge timing differences, and test for deception (Bazerman & Gillespie, 1999). In prior research in the negotiation context using contingent contracts and tradeoffs with undergraduates, MBA students, and executives as participants, my colleagues and I have found that encoding
training examples by drawing comparisons between them facilitates forward knowledge transfer (Gentner et al., 2003; Loewenstein et al., 1999, 2003; Thompson et al., 2000). In most of these studies, we showed that drawing comparisons facilitated later using the agreement structure from the examples in a face-to-face negotiation. In the ‘‘own memory’’ studies, we were focused on whether drawing comparisons facilitated retrieving analogous examples from autobiographical memory. In our first study, we randomly assigned consultants to analyze two examples of the contingent contract structure either one example at a time, or both examples together with an instruction to compare. After writing about the examples we provided, the consultants retrieved matching examples. About 10% did not retrieve any example, which, given the sample and the notes many of those 10% wrote, mostly indicates that they could not retrieve an example they felt had a sufficiently close match to the examples we gave them. Analogical retrieval is indeed difficult. Still, the comparison manipulation had a clear effect. In the probe comparison group, 55% retrieved a complete analogy match, whereas 35% in the separate case analysis group did so. In studies replicating and extending the effect with less experienced samples, we found similarly substantial differences: 74% versus 29% for masters of accounting students (with about 1 year of work experience), and 50% versus 16% for MBA students (with about 5 years of work experience; these last percentages are collapsed across two rounds of retrievals, one for contingent contract examples and one for tradeoff examples). Drawing comparisons between memory probes at the time of retrieval facilitates analogical retrieval from one’s own memory. 
The data further support linking analogical retrieval to the quality of participants’ encodings of the memory probes, just as the quality of participants’ encodings of training examples has been shown to predict forward transfer (e.g., Gick & Holyoak, 1983; Loewenstein et al., 1999). From participants’ written descriptions of the examples, we derived a measure of the completeness of their understandings of the underlying structure in the examples. (It would be interesting in further research to code how generic people’s descriptions are and how many surface properties they list.) We found that on average, the understandings from the probe comparison group were rated as more complete than the understandings from the separate case analysis group, and that completeness was correlated with analogical retrieval. Thus, across over 250 participants with three different levels of domain experience, we have clear support that drawing a comparison between examples spurs more complete descriptions of the underlying structures, and in turn more analogical retrievals (Gentner et al., 2009). Fostering more complete understandings of the underlying structure in probe examples can facilitate retrieving analogies from one’s own memory.
4.2. The Controlled Memory Set Studies
A second set of studies examined analogical retrieval from a controlled memory set, rather than from people’s autobiographical memories. This allowed us to examine whether the analogies retrieved from people’s own memories were dependent on any special sort of initial encoding. It is possible that the retrieval advantage found in the ‘‘own memory’’ studies only holds for stored examples that people encoded in an unusually effective manner (i.e., with complete, generic underlying structures and few surface properties). The simplest controlled memory set study involved presenting undergraduate participants with a set of seven examples of negotiations, inserting a brief, filled delay, then assigning them to either the probe comparison group or the probe separate cases group (Gentner et al., 2009). Rates of analogical retrieval were considerably lower than in the own memory studies, but still there was an advantage of probe comparison (27% analogical retrieval) over probe separate case analysis (6%). This suggests that even routinely encoded examples can be later retrieved on the basis of underlying structure, provided people have encoded the probe example effectively. The task in the studies described thus far was simply to recall a prior matching example, but as the earlier discussion indicated, a primary dependent measure used to study analogical retrieval is problem-solving transfer. For this reason, Ken Kurtz and I (Kurtz & Loewenstein, 2007) studied retrieval-time encoding effects during problem solving. Specifically, we used Gick and Holyoak’s (1980, 1983) classic convergence materials for studying analogical problem solving: Duncker’s radiation problem, whose solution requires the use of multiple, low-power rays arranged to converge on and eliminate an inoperable tumor, and an analogous story about a general who arranges to have multiple small groups of troops converge on and capture a fortress.
The difference between our studies and prior research was that, instead of focusing on how participants encoded the fortress story initially, we focused on how participants encoded the tumor problem. Participants read the fortress story, and then were given one of a variety of tasks. The critical group was asked to compare the tumor problem with a second, analogous problem (Gick and Holyoak’s Red Adair story, turned into a problem whose solution hinges on how to extinguish a fire using many small hoses simultaneously). Then participants were asked to solve the problems. If the comparison led participants to generate a more complete or more generic encoding of the underlying structure of the problem, they should be more likely than other participants not engaging in a comparison to retrieve the earlier fortress story and generate convergence solutions. In support of this logic, we found that participants who had compared problems were more likely to generate convergence solutions to the tumor problem than participants who only received the tumor problem (54% vs. 15%; Experiment 1) or who worked on the two problems separately
(38% vs. 15%; Experiment 2). As an extension, we replicated this second study, contrasting the problem comparison and separate problems conditions, using a roughly 20 min, filled delay between reading the initial fortress story and later attempting to solve the problems (Gentner et al., 2009). We found a similar pattern of results, albeit with a lower overall rate of solutions. Those who compared the two problems (31%) were more likely to solve the tumor problem than those who read the problem separately (5%). These studies are consistent with the claim that an improved encoding of problem descriptions, not solution descriptions, can facilitate analogical retrieval. Unlike for studies of forward transfer, when people are comparing stories in support of later problem solving, in these last studies the comparison was between the problems themselves, which meant we needed an additional kind of control. We needed to test whether comparing two problems, in itself, was the basis of the problem-solving advantage, rather than it being an advantage because it facilitated retrieving the earlier analogous story. Accordingly, we contrasted two groups that both compared the two problems, but one group did and one group did not first read the original fortress story (Kurtz & Loewenstein, 2007). The evidence points to retrieving the earlier story as the key advantage, as those receiving the fortress story analogy (51%) were more likely to solve the tumor problem than those not receiving it (34%). Further evidence in support of analogical retrieval comes from a post-task questionnaire. Those who had read the fortress story initially were asked how they attempted to solve the problems. Of those generating a convergence solution to the tumor problem, 53% said they retrieved an earlier story to use as an analogy, whereas only 15% of those who did not generate a convergence solution reported retrieving an analogy. 
Taken together, these problem-solving studies suggest that people can spontaneously retrieve and use a prior, single example on the basis of underlying structure, provided they have an effective encoding of the problem they are trying to solve.
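The rates reported in Sections 4.1 and 4.2 can be lined up side by side. What follows is only a bookkeeping sketch in Python. The percentages are those given in the text; the shorthand labels, the `RatePair` name, and the choice of percentage-point gain and ratio as summary figures are illustrative assumptions rather than analyses from the original studies. Sample sizes per condition are not given here, so no significance tests are attempted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatePair:
    """One reported contrast: comparison condition vs. its control, in percent."""
    label: str           # shorthand label added for this sketch
    comparison_pct: int  # rate when probes/problems were compared
    control_pct: int     # rate in the contrasting condition

    @property
    def point_gain(self) -> int:
        """Difference in percentage points."""
        return self.comparison_pct - self.control_pct

    @property
    def ratio(self) -> float:
        """Comparison rate divided by control rate."""
        return self.comparison_pct / self.control_pct


# Percentages as reported in the text (retrieval rates for the first four
# rows, convergence-solution rates for the tumor problem rows).
REPORTED = [
    RatePair("Own memory, consultants", 55, 35),
    RatePair("Own memory, accounting students", 74, 29),
    RatePair("Own memory, MBA students", 50, 16),
    RatePair("Controlled memory set", 27, 6),
    RatePair("Tumor problem, Experiment 1", 54, 15),
    RatePair("Tumor problem, Experiment 2", 38, 15),
    RatePair("Tumor problem, filled delay", 31, 5),
]


def summarize(pairs):
    """Format each contrast as a one-line summary string."""
    return [
        f"{p.label}: {p.comparison_pct}% vs {p.control_pct}% "
        f"(+{p.point_gain} points, ratio {p.ratio:.1f})"
        for p in pairs
    ]


for line in summarize(REPORTED):
    print(line)
```

In every contrast the comparison condition shows the higher rate, which is the pattern the text describes; the size of the gain varies considerably across samples and tasks.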
4.3. MAC/FAC Simulation Modeling
We conducted a series of four simulation studies to help examine the plausibility of obtaining different rates of analogical retrievals based on how examples are encoded (Gentner et al., 2009). The goal was to examine whether a consistent retrieval process could account for both the previously established forward transfer effects based on using comparison to encode training examples stored in memory and the new analogical retrieval effects based on using comparison to encode examples used as memory probes. It was not feasible to simulate the ‘‘own memory’’ studies directly; generating vast numbers of knowledge representations in a new domain is nontrivial. Instead, we took the approach of addressing the theoretical issues using arguably the largest and most widely used set of knowledge
representations used in analogical retrieval studies, the ‘‘Karla the Hawk’’ materials (Gentner et al., 1993). We added to this story set by generating additional analogs (there were already two analogs per story set; we added a third for each set), and generating schemas (one per story set, from two of the analogs). Then we examined the retrievals generated if a story and if a schema were used as probe items (to test the new analogical retrieval effects), or if a schema was stored in memory (to test forward transfer effects). Consistent with the experimental studies, we showed that schemas used as probe items tended to retrieve analogous examples, schemas used as memory items tended to be retrieved by examples used as probes, and examples as probe items tended to retrieve examples with similar surface properties rather than analogous examples. The interesting part of simulation studies though is why they generate the pattern of results they do. In these simulations, schemas had complete representations of the underlying structure in examples, and few surface properties. Examples also had complete representations of the underlying structure, and many surface properties. Thus, in these simulations, surface feature matches and low weights on underlying structure are what led the examples to be relatively ineffective for analogical retrieval. The absence of surface feature matches and the high weights on underlying structure are what led the schemas to be effective for analogical retrieval, both when serving as probes retrieving analogous examples, and when in memory being retrieved by analogous examples. We did not simulate, but it would be straightforward to show, that MAC/FAC would generate a similar pattern if we had manipulated the completeness of the underlying structures rather than their weights. 
It is less straightforward to examine the role of generic versus context-specific representations of the underlying structure in examples; this would be a worthwhile endeavor for future modeling work to consider. Even so, just from these simulations, it is notable that the same mechanisms that can account for schema abstraction and forward transfer can be parsimoniously extended to account for probe encoding and analogical retrieval. Forward transfer need not be entirely due to learning to encode examples more effectively or due to encoding solutions effectively. Taken together, the theoretical, experimental, and simulation work my colleagues and I have done combines to establish a new avenue for research. It is possible to facilitate analogical retrieval without changing how people initially encode examples. This is not to say that the quality of initial encodings is irrelevant; that is just an already established point. Instead, it was an open question as to whether people needed to have high-quality encodings of stored examples to be able to retrieve those examples later on the basis of underlying structure. This does not appear necessary. Effectively encoding current examples supports analogical retrieval to examples encoded in mundane ways.
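The qualitative pattern described for the simulations can be illustrated with a deliberately small toy in Python. This is not MAC/FAC and does not use the ‘‘Karla the Hawk’’ materials: it scores items with a flat feature-overlap measure (cosine similarity over bags of features) and performs no structure mapping. Every feature name, the distractor story, and the feature counts (three relational features, four surface features) are hypothetical and were chosen so that the pattern is visible. The one idea it is meant to convey is that a single retrieval rule can favor surface matches when the probe is a detailed example, yet favor analogs when the probe is a schema, because stripping surface features gives the relational features proportionally more weight.

```python
from collections import Counter
from math import sqrt


def encode(relations, surface=()):
    """Bag-of-features encoding: relational features plus surface features."""
    return Counter(list(relations) + list(surface))


def cosine(a, b):
    """Cosine similarity between two feature bags (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(probe, memory):
    """Return the name of the stored item most similar to the probe."""
    return max(memory, key=lambda name: cosine(probe, memory[name]))


# Hypothetical features, invented for this illustration.
CONVERGENCE = ["rel:divide-force", "rel:converge-on-target", "rel:avoid-collateral-damage"]
OTHER_PLOT = ["rel:misdiagnose", "rel:seek-second-opinion", "rel:recover"]
MEDICAL = ["doctor", "patient", "tumor", "hospital"]
MILITARY = ["general", "troops", "fortress", "roads"]

TUMOR_PROBLEM = encode(CONVERGENCE, MEDICAL)        # detailed example used as a probe
FORTRESS_STORY = encode(CONVERGENCE, MILITARY)      # analog: same relations, new surface
SURFACE_DISTRACTOR = encode(OTHER_PLOT, MEDICAL)    # same surface, different relations
CONVERGENCE_SCHEMA = encode(CONVERGENCE)            # relations only, no surface features

stories_in_memory = {"fortress_story": FORTRESS_STORY, "distractor": SURFACE_DISTRACTOR}
schema_in_memory = {"schema": CONVERGENCE_SCHEMA, "distractor": SURFACE_DISTRACTOR}

# Example as probe: the surface-similar distractor wins (4/7 vs. 3/7).
print(retrieve(TUMOR_PROBLEM, stories_in_memory))
# Schema as probe: the analog wins (the distractor shares no features with it).
print(retrieve(CONVERGENCE_SCHEMA, stories_in_memory))
# Schema stored in memory, example as probe: the schema is retrieved.
print(retrieve(TUMOR_PROBLEM, schema_in_memory))
```

With different feature counts the third case can reverse, which is a reminder that the toy demonstrates a possibility under stated assumptions rather than reproducing the reported simulations.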
5. Implications
The most straightforward implication of the findings on comparing probe items facilitating analogical retrieval is that inert knowledge can be revived. Examples learned prior to completely or generically understanding underlying structures and prior to understanding which surface properties are unimportant can nonetheless be retrieved later on the basis of an analogy to a well-encoded probe. To elaborate on this central implication, it is useful to return to the earlier discussions of problem solving, creativity, and the acquisition of domain knowledge.
5.1. Implications for Problem Solving and Creativity
The studies just discussed showed effects of comparing examples and problems on analogical retrieval and problem solving. One striking possibility raised, particularly by the Kurtz and Loewenstein (2007) studies, is that sometimes having two problems is better than having one problem. Still, in these studies participants were provided with problems or examples to compare. It is possible for people to generate or notice effective comparisons themselves, but it seems far more likely that most of the time, people will have a single current situation or problem to address. Consequently, the key point for problem solving and creativity may well turn out to be that these studies help to establish that altering encodings of probe examples can increase analogical retrievals. As people are unlikely to generate a single, effective comparison on their own, it will be up to future research to show how to exploit the encoding of probe examples using interventions that individuals can more readily implement themselves when attempting to solve problems or generate creative outcomes. I have discussed multiple interventions that can influence the completeness of, generic articulation of, and emphasis on underlying structure (cf. Burstein, Collins, & Baker, 1991). For example, writing out one’s problem and then systematically replacing the context-specific verbs with more generic verbs, following Clement et al.’s (1994) work, could enhance analogical retrieval in the service of deriving new solutions or creative outcomes. Self-explanations or joint explanations with others could also help to foster a focus on underlying structure in examples, which would enhance analogical retrieval.
Although the explanation process is unlikely to include generating perfect comparisons, it is possible that in attempting to explain a problem, people will generate a series of comparisons with partial matches, and in so doing refine their understandings and emphasize underlying structure. Considering how to explain a problem to multiple audiences might also encourage people to alter their encodings in useful ways. One reason is that this will lead people to highlight or entertain questions
about different aspects of the problem. It might also encourage a focus on gist and away from surface details. Studying these and other interventions and assessing the different kinds of variation in encoding they induce would provide the basis for establishing a flexible array of interventions that people could use, on their own, to facilitate analogical retrieval and thereby advance problem solving and creativity. Within research on creativity and innovation, there is a tradition of focusing on how people are encoding the task or problem at hand (e.g., Csikszentmihalyi & Getzels, 1971). But it has mainly been more recently that encodings of one’s problem have been clearly linked to analogical retrieval. Markman et al. (2009) discuss several means for fostering what I have been calling complete, generic encodings of problems’ underlying structure. Ward (2009) emphasizes using synonyms and category taxonomies to generate multiple alternative encodings of a problem. This should enable people to capitalize on the ambiguity of the problem’s underlying structure and retrieve a broader array of analogies. There are also suggestions that generating counterfactuals can influence creative outcomes (e.g., Markman, Lindberg, Kray, & Galinsky, 2007). The implication is that considering not just the underlying structure in the example, but also closely related underlying structures, might be useful. The larger point is that analogical retrievals in the service of creativity and innovation need not be left to chance, but can be directly and systematically fostered. A further possibility, over and above the prior mentioned interventions for influencing people’s encodings, is to consider the potential effects of construal level (Trope & Liberman, 2003). Construal level theory suggests that, roughly, if people perceive events to be distant, they will encode them in more abstract ways, whereas if people perceive events to be proximal, they will encode them in more detailed ways. 
Construal level influences similarity judgments (Day & Bartels, 2008). Consequently, it is feasible that construal level could influence probe encodings and hence analogical retrieval. It is possible that distant construals would yield more analogical retrievals than proximal construals, if distant construals place greater weight on abstract, underlying structure or reduce attention to surface properties. The caveat is that distance could lead to oversimplification. I am unaware of research on construal level effects on how people encode examples with complicated underlying structures. All of these are suggestions about how to use the pathway established by the new research showing that changing the encoding of a current example can change the likelihood of retrieving analogies from memory. The studies themselves presented people with appropriate comparisons as a means to usefully alter their encodings of current examples, even though people themselves are unlikely to have generated such appropriate examples on their own. Still, there are multiple alternative interventions apart from comparison that are easier for individuals to implement on their own.
5.2. Implications for the Acquisition of Domain Knowledge
An exciting phenomenon that arises in teaching is when a student, upon learning some principle, has an immediate reminding: ‘‘that explains a puzzling event I never understood’’ or ‘‘that happened to me once!’’ The studies just discussed provide an explanation for this phenomenon. Students are being guided to generate effective representations of a current example that then triggers an analogical retrieval. More generally, the phenomenon points to a new occasion and pathway for integrating domain knowledge. If generating more effective encodings of current examples makes those examples more useful bases for retrieving analogous examples from memory, then any occasion of insight into a current example can be a trigger to useful reflection. Specifically, a useful comparison generated by an instructor or on one’s own, an act of self-explanation, or the use of any of the other interventions previously discussed could yield a more complete, generic encoding emphasizing underlying structure. This could prompt a reminding to an analogous example that was stored in an ordinary fashion. This in turn could prompt the re-encoding of the prior example using the more complete and generic encoding of the underlying structure. The result could be the beginnings or the further refinement of a sophisticated category for understanding the domain. Although this account is tentative, the studies described earlier have already provided evidence for all but the last of the steps in this chain. This chain could be the basis for ongoing, routine bootstrapping of current insights to revisit and reorganize, rather than leave unused, prior learning. It is a means for knowledge integration.
In emphasizing knowledge integration and a specific chain of events for integrating new insights with old knowledge, I am finessing a debate in the conceptual change literature (e.g., diSessa & Sherin, 1998; Smith, 2007) about what is changing and how much changes in conceptual change. The sort of knowledge integration through analogical retrieval and re-encoding of old knowledge I am discussing could yield minor updates to one’s domain knowledge. It could produce changes to domain knowledge slowly, leaving domain knowledge fragmented, as knowledge from various contexts within the domain need not be integrated at the same time. It is also possible that the chain of events I have described could generate differences in degrees of understanding that, after sufficient numbers of passes through the cycle occur, generate qualitative differences in the kind of knowledge people use to encode domain examples. A gradual changing over of knowledge mainly focused on surface properties and context-specific encodings of underlying structure could shift to knowledge mainly focused on generic encodings of underlying structure with qualitatively different possibilities. Finally, it is also possible that the knowledge integration process I have outlined could be a means by which a striking or radical insight of the sort Carey (2004) discusses in the domain of number could
have a broad and systemic influence by triggering the analogical retrieval and re-encoding of a whole range of old knowledge. This could be coupled with a broader effort of deliberate self-explanation and reflection-driven conceptual change (e.g., de Leeuw & Chi, 2003). My point is that the mechanism my colleagues and I have described, whereby an effective encoding can retrieve analogous examples that were encoded in less effective ways, provides a means for new understandings to change and integrate prior knowledge, thereby extending the reach and impact of those new understandings. This discussion is aimed at highlighting a possibility for how conservative learning can be later revisited once people develop more sophisticated domain understandings. It is worth considering how thoroughly the old examples are re-encoded in light of the new, more complete, more generic understandings. The reason is that new insights can be wrong, still later learning can suggest alternative generic encodings, and more generally it is plausible for there to be multiple waves of conceptual change. There are many reasons; conceptual change research has tended to focus on basic mathematical and scientific understandings. Yet the theories presumably apply just as much to still changing areas of science, social science, and the humanities. These are areas with less stability in the systems of generic underlying structures in use by experts. This line of thinking gives new reason to consider whether, for example, drawing comparisons alters people’s encodings of the examples themselves, generates schemas or abstract principles that are distinct from the examples themselves, both, or something else. The effects may be highly similar for the local act of solving a current problem, but they may be quite distinct at the level of acquiring and re-acquiring domain expertise, particularly for still-changing domains.
The recent research my colleagues and I have done suggests people do not have to learn it right the first time, but it is not yet clear how many times people can relearn it.
6. Conclusion
Memory provides an ocean of examples from which people draw to act intelligently. Usually people reach close at hand for an example, and for many mundane tasks this yields satisfactory outcomes. But for more challenging problems, for which expertise matters, people often need to cast their lines farther to retrieve a useful analogy. Prior research has emphasized efforts at bringing those other examples closer at hand. The new research presented here suggests it is also possible to better bait one’s hook such that a longer cast catches a keeper.
REFERENCES
Ahn, W. K., Brewer, W., & Mooney, R. (1992). Schema acquisition from a single example. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 391–412.
Amabile, T. M. (1996). Creativity in context. Boulder, CO: Westview Press.
Bassok, M., & Holyoak, K. J. (1989). Interdomain transfer between isomorphic topics in algebra and physics. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 153–166.
Bazerman, M. H., & Gillespie, J. J. (1999). Betting on the future: The virtues of contingent contracts. Harvard Business Review, 77(5), 155–160.
Benyus, J. M. (1997). Biomimicry: Innovation inspired by nature. New York: William Morrow.
Blessing, S. B., & Ross, B. H. (1996). Content effects in problem categorization and problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(3), 792–810.
Brown, A. L. (1989). Analogical learning and transfer: What develops? In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 369–412). New York: Cambridge University Press.
Brown, A. L. (1990). Domain-specific principles affect learning and transfer in children. Cognitive Science, 14, 107–133.
Burstein, M. H., Collins, A., & Baker, M. (1991). Plausible generalization: Extending a model of human plausible reasoning. The Journal of the Learning Sciences, 1(3–4), 319–359.
Burt, R. S. (2004). Structural holes and good ideas. American Journal of Sociology, 110(2), 349–399.
Campbell, D. T. (1960). Blind variation and selective retention in creative thought as in other knowledge processes. Psychological Review, 67(6), 380–400.
Carey, S. (1991). Knowledge acquisition: Enrichment or conceptual change? In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and cognition (pp. 257–291). Hillsdale, NJ: Erlbaum.
Carey, S. (2004). Bootstrapping and the origin of concepts. Daedalus, 133(1), 59–68.
Catrambone, R., & Holyoak, K. J. (1989). Overcoming contextual limitations on problem-solving transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(6), 1147–1156.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Chi, M. T. H., & Ohlsson, S. (2005). Complex declarative learning. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 371–399). New York: Cambridge University Press.
Christensen, B. T., & Schunn, C. D. (2005). Spontaneous access and analogical incubation effects. Creativity Research Journal, 17(2), 207–220.
Christensen, B. T., & Schunn, C. D. (2009). ‘‘Putting blinkers on a blind man’’: Providing cognitive support for creative processes with environmental cues. In A. B. Markman & K. L. Wood (Eds.), Tools for innovation: The science behind the practical methods that drive new ideas (pp. 48–74). New York: Oxford University Press.
Clark, H. H. (1996). Using language. New York: Cambridge University Press.
Clement, C. A., Mawby, R., & Giles, D. E. (1994). The effects of manifest relational similarity on analog retrieval. Journal of Memory and Language, 33, 396–420.
Csikszentmihalyi, M., & Getzels, J. W. (1971). Discovery-oriented behavior and the originality of creative products: A study with artists. Journal of Personality and Social Psychology, 19(1), 47–52.
Day, S. B., & Bartels, D. M. (2008). Representation over time: The effects of temporal distance on similarity. Cognition, 106, 1504–1513.
178
Jeffrey Loewenstein
de Leeuw, N., & Chi, M. T. H. (2003). The role of self-explanation in conceptual change learning. In G. Sinatra & P. Pintrich (Eds.), Intentional conceptual change (pp. 55–78). Mahwah, NJ: Erlbaum. DeLoache, J. S. (1995). Early symbol understanding and use. In D. Medin (Ed.), The psychology of learning and motivation, Vol. 32. New York: Academic Press. diSessa, A., & Sherin, B. (1998). What changes in conceptual change? International Journal of Science Education, 20, 1155–1191. Dixon, J. A., & Bangert, A. S. (2004). On the spontaneous discovery of a mathematical relation during problem solving. Cognitive Science, 28, 433–449. Dugosh, K. L., Paulus, P. B., Roland, E. J., & Yang, H.-C. (2000). Cognitive stimulation in brainstorming. Journal of Personality and Social Psychology, 79(5), 722–735. Dunbar, K. (1995). How scientists really reason: Scientific reasoning in real-world laboratories. In R. J. Sternberg & J. Davidson (Eds.), Mechanisms of insight (pp. 365–395). Cambridge MA: MIT press. Dunbar, K., & Blanchette, I. (2001). The in vivo / in vitro approach to cognition: The case of analogy. Trends in Cognitive Sciences, 5(8), 334–339. Falkenhainer, B. (1990). A unified approach to explanation and theory formation. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation (pp. 157–196). Los Altos, CA: Morgan Kaufmann. Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. Washington, DC: American Psychological Association. Finke, R. A., Ward, T. B., & Smith, S. M. (1992). Creative cognition: Theory, research, and application. Cambridge, MA: MIT Press. Forbus, K. D., Gentner, D., & Law, K. (1995). MAC/FAC: A model of similarity-based retrieval. Cognitive Science, 19(2), 141–205. Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In S. A. Kuczaj (Ed.), Language, thought and culture (pp. 301–334). Language development, Vol. 2, (pp. 301–334). 
Hillsdale, NJ: Lawrence Erlbaum Associates. Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7, 155–170. Gentner, D. (1988). Metaphor as structure mapping: The relational shift. Child Development, 59, 47–59. Gentner, D. (1989). Mechanisms of analogical learning. In S. Vosniadou & A. Ortony (Eds.), Similarity and analogical reasoning (pp. 199–241). New York: Cambridge University Press. Gentner, D., & Boroditsky, L. (2001). Individuation, relativity and early word learning. In M. Bowerman & S. Levinson (Eds.), Language acquisition and conceptual development (pp. 215–256). Cambridge, UK: Cambridge University Press. Gentner, D., & Bowerman, M. (2009). Why some spatial semantic categories are harder to learn than others: The typological prevalence hypothesis. In J. Guo, E. Lieven, S. Ervin¨ zc¸aliskan, & K. Nakamura (Eds.), Cross-linguistic approaches to the Tripp, N. Budwig, S. O psychology of language: Research in the tradition of Dan Isaac Slobin (pp. 465–480). New York: Lawrence Erlbaum Associates. Gentner, D., Brem, S., Ferguson, R. W., Markman, A. B., Levidow, B. B., Wolff, P., et al. (1997). Analogical reasoning and conceptual change: A case study of Johannes Kepler. The Journal of the Learning Sciences, 6(1), 3–40. Gentner, D., & Kurtz, K. (2005). Relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. W. Wolff (Eds.), Categorization inside and outside the lab (pp. 151–175). Washington, DC: American Psychological Association. Gentner, D., & Loewenstein, J. (2002). Relational language and relational thought. In E. Amsel & J. Byrnes (Eds.), Language, literacy, and cognitive development: The development and consequences of symbolic communication (pp. 87–120). Mahwah, NJ: Erlbaum.
How One’s Hook Is Baited Matters for Catching an Analogy
179
Gentner, D., Loewenstein, J., & Hung, B. (2007). Comparison facilitates children’s learning of names for parts. Journal of Cognition and Development, 8(3), 285–307. Gentner, D., Loewenstein, J., & Thompson, L. (2003). Learning and transfer: A general role for analogical encoding. Journal of Educational Psychology, 95(2), 393–408. Gentner, D., Loewenstein, J., & Thompson, L. (2004). Analogical encoding: Facilitating knowledge transfer and integration. In K. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the twenty-sixth annual conference of the Cognitive Science Society (pp. 452–457). Chicago, IL: Cognitive Science Society. Gentner, D., Loewenstein, J., Thompson, L., & Forbus, K. D. (2009). Reviving inert knowledge: Analogical abstraction supports relational retrieval of past events. Cognitive Science, 33, 1343–1382. Gentner, D., & Markman, A. B. (1997). Structure mapping in analogy and similarity. American Psychologist, 52, 45–56. Gentner, D., & Medina, J. (1998). Similarity and the development of rules. Cognition, 65, 263–297. Gentner, D., & Rattermann, M. J. (1991). Language and the career of similarity. In S. A. Gelman & J. P. Brynes (Eds.), Perspective on thought and language: Interrelations in development (pp. 225–277). New York: Cambridge University Press. Gentner, D., Rattermann, M. J., & Forbus, K. D. (1993). The roles of similarity in transfer: Separating retrievability and inferential soundness. Cognitive Psychology, 25, 524–575. Gentner, D., & Toupin, C. (1986). Systematicity and surface similarity in the development of analogy. Cognitive Science, 10, 277–300. Gentner, D., & Wolff, P. (2000). Metaphor and knowledge change. In E. Dietrich & A. Markman (Eds.), Cognitive dynamics: Conceptual change in humans and machines (pp. 295–342). Mahwah, NJ: LEA. Gick, M. L., & Holyoak’s, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12, 306–355. Gick, M. L., & Holyoak’s, K. J. (1983). Schema induction and analogical transfer. 
Cognitive Psychology, 15, 1–38. Goldstone, R. L., & Son, J. Y. (2005). The transfer of scientific principles using concrete and idealized simulations. The Journal of the Learning Sciences, 14(1), 69–110. Goldstone, R. L., & Wilensky, U. (2008). Promoting transfer through complex systems principles. Journal of the Learning Sciences, 26(1), 465–516. Hammond, K. J., Siefert, C. M., & Gray, K. C. (1991). Functionality in analogical transfer: A hard match is good to find. The Journal of the Learning Sciences, 1, 11–152. Hargadon, A. B., & Sutton, R. I. (1997). Technology brokering and innovation in a product development firm. Administrative Science Quarterly, 42, 716–749. Hesse, M. B. (1966). Models and analogies in science. Notre Dame, IN: University of Notre Dame Press. Higgins, E. T. (1996). Knowledge activation: Accessibility, applicability, and salience. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 133–168). New York: Guilford Press. Higham, P. A. (2002). Strong cues are not necessarily weak: Thompson and Tulving (1970) and the encoding specificity principle revisited. Memory & Cognition, 30(1), 67–80. Hofstadter, D.The Fluid Analogies Research Group. (1995). Fluid concepts and creative analogies. New York: Basic Books. Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought. Cambridge, MA: MIT Press. Hummel, J. E., & Holyoak, K. J. (1997). Distributed representations of structure: A theory of analogical access and mapping. Psychological Review, 104, 427–466. Hummel, J. E., & Holyoak, K. J. (2003). A symbolic-connectionist theory of relational inference and generalization. Psychological Review, 110, 220–264.
180
Jeffrey Loewenstein
Johnson, H. M., & Seifert, C. M. (1992). The role of predictive features in retrieving analogical cases. Journal of Memory and Language, 31, 648–667. Keller, J., & Loewenstein, J. (2010). The cultural category of cooperation: A Cultural Consensus Model analysis for China and the US. Organization Science, (in press). Koestler, A. (1969). The act of creation. New York: Macmillan. Kolodner, J. (1993). Case-based reasoning. San Francisco, CA: Morgan Kaufmann. Kotovsky, L., & Gentner, D. (1996). Comparison and categorization in the development of relational similarity. Child Development, 67, 2797–2822. Kurtz, K. J., & Loewenstein, J. (2007). Converging on a new role for analogy in problem solving and retrieval: When two problems are better than one. Memory & Cognition, 35(2), 334–341. Lewandowsky, S., Kalish, M., & Griffiths, T. L. (2000). Competing strategies in categorization: Expediency and resistance to knowledge restructuring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(6), 1666–1684. Loewenstein, J., & Gentner, D. (2001). Spatial mapping in preschoolers: Close comparisons facilitate far mappings. Journal of Cognition and Development, 2(2), 189–219. Loewenstein, J., & Gentner, D. (2005). Relational language and the development of relational mapping. Cognitive Psychology, 50, 315–363. Loewenstein, J., Thompson, L., & Gentner, D. (1999). Analogical encoding facilitates knowledge transfer in negotiation. Psychonomic Bulletin & Review, 6(4), 586–597. Loewenstein, J., Thompson, L., & Gentner, D. (2003). Analogical learning in negotiation teams: Comparing cases promotes learning and transfer. Academy of Management Learning and Education, 2(2), 119–127. Markman, K. D., Lindberg, M. J., Kray, L. J., & Galinsky, A. D. (2007). Implications of counterfactual structure for creative generation and analytical problem solving. Personality and Social Psychology Bulletin, 33, 312–324. Markman, A. B., & Ross, B. H. (2003). 
Category use and category learning. Psychological Bulletin, 129(4), 592–613. Markman, A. B., Taylor, E., & Gentner, D. (2007). Auditory presentation leads to better analogical retrieval than written presentation. Psychonomic Bulletin and Review, 14(6), 1101–1106. Markman, A. B., Wood, K. L., Linsey, J. S., Murphy, J. T., & Laux, J. P. (2009). Supporting innovation by promoting analogical reasoning. In A. B. Markman & K. L. Wood (Eds.), Tools for innovation: The science behind the practical methods that drive new ideas (pp. 85–103). New York: Oxford University Press. Mason, L. (2007). Introduction: Bridging the cognitive and sociocultural approaches in research on conceptual change: Is it possible? Educational Psychologist, 42(1), 1–7. Medin, D. L., & Ross, B. H. (1989). The specific character of abstract thought: Categorization, problem-solving, and induction. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence, Vol. 5 (pp. 189–223). Hillsdale, NJ: Erlbaum. Mednick, S. A. (1962). The associative basis of the creative process. Psychological Review, 69(3), 220–232. Moss, J., Kotovsky, K., & Cagan, J. (2007). The Influence of Open Goals on the Acquisition of Problem-Relevant Information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 876–891. Myles-Worsley, M., Johnston, W. A., & Simons, M. A. (1988). The influence of expertise on x-ray image processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 553–557. Novick, L. (1988). Analogical transfer, problem similarity, and expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 510–520. Paletz, S. B. F., & Schunn, C. D. (2010). A social-cognitive framework of multidisciplinary team innovation. Topics in Cognitive Science, 2(1), 73–95.
How One’s Hook Is Baited Matters for Catching an Analogy
181
Reeves, L. M., & Weisberg, R. W. (1994). The role of content and abstract information in analogical transfer. Psychological Bulletin, 115, 381–400. Ross, B. H. (1984). Remindings and their effects in learning a cognitive skill. Cognitive Psychology, 16, 371–416. Ross, B. H. (1987). This is like that: The use of earlier problems and the separation of similarity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(4), 629–639. Ross, B. H. (1989). Distinguishing types of superficial similarities: Different effects on the access and use of earlier problems. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(3), 456–468. Ross, B. H., & Kennedy, P. T. (1990). Generalizing from the use of earlier examples in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 42–55. Ross, B. H., & Kilbane, M. C. (1997). Effects of principle explanation and superficial similarity on analogical mapping in problem solving. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(2), 427–440. Ryack, K., & Kida, T. (2006). Recall of financial information for investment decisions: The impact of encoding specificity and mental imagery. Journal of Behavioral Finance, 7(4), 214–221. Schank, R. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge: Cambridge University Press. Schwartz, D. L. (1995). The emergence of abstract representations in dyad problem solving. The Journal of the Learning Sciences, 4(3), 321–354. Seifert, C. M., McKoon, G., Abelson, R. P., & Ratcliff, R. (1986). Memory connections between thematically similar episodes. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12(2), 220–231. Seifert, C. M., Meyer, D. E., Davidson, N., Patalano, A. L., & Yaniv, I. (1995). Demystification of cognitive insight: Opportunistic assimilation and the prepared-mind hypothesis. In R. J. Sternberg & J. E. 
Davidson (Eds.), The nature of insight (pp. 65–124). Cambridge, MA: MIT Press. Simon, H. A. (1947). Administrative behavior. New York: Macmillan. Smith, C. L. (2007). Bootstrapping processes in the development of students’ commonsense matter theories: Using analogical mappings, thought experiments, and learning to measure to promote conceptual restructuring. Cognition and Instruction, 25(4), 337–398. Smith, C. L., Solomon, G. E. A., & Carey, S. (2005). Never getting to zero: Elementary school students’ understanding of the infinite divisibility of number and matter. Cognitive Psychology, 51, 101–140. Stern, E. (2005). Knowledge restructuring as a powerful mechanism of cognitive development: How to lay an early foundation for conceptual understanding in formal domains. In P. D. Tomlinson, J. Dockrell, & P. Winne (Eds.), Pedagogy—Teaching for learning, British Journal of Educational Psychology Monograph Series II , No. 3 (pp. 153–169). Leicester: British Psychological Society. Thompson, L., Gentner, D., & Loewenstein, J. (2000). Avoiding missed opportunities in managerial life: Analogical training more powerful than individual case training. Organizational Behavior and Human Decision Processes, 82(1), 60–75. Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press. Trope, Y., & Liberman, N. (2003). Temporal construal. Psychological Review, 110, 403–421. Tulving, E. (1983). Elements of episodic memory. New York: Oxford University Press. Tulving, E., & Thompson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80(5), 352–373.
182
Jeffrey Loewenstein
Vosniadou, S., & Brewer, W. F. (1987). Theories of knowledge restructuring in development. Review of Educational Research, 57(1), 51–67. Ward, T. B. (2009). ConceptNets for flexible access to knowledge. In A. B. Markman & K. L. Wood (Eds.), Tools for innovation: The science behind the practical methods that drive new ideas (pp. 153–170). New York: Oxford University Press. Weisberg, R. W. (1993). Creativity: Beyond the myth of genius. New York: Freeman. Wharton, C. M., Holyoak, K. J., Downing, P. E., Lange, T. E., Wickens, T. D., & Melz, E. R. (1994). Below the surface: Analogical similarity and retrieval competition in reminding. Cognitive Psychology, 26, 64–101. Whitehead, A. N. (1929). The aims of education. New York: Macmillian. Yamauchi, T., & Markman, A. B. (2000). Inference using categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 776–795.
CHAPTER FIVE

Generating Inductive Inferences: Premise Relations and Property Effects

John D. Coley and Nadya Y. Vasilyeva

Contents
1. Introduction
1.1. Inference Generation
1.2. Induction and Relations among Concepts
1.3. Goals of this Chapter
2. Effects of Premise Relations on Inference Generation
2.1. The Role of Premise Relations in Argument Evaluation
2.2. Relative Salience of Conceptual Relations
2.3. Study One: Investigating Effects of Premise Relations on Inference Generation
2.4. Summary: Effects of Premise Relations on Inference Generation
3. Effects of Property on Inference Generation
3.1. Property Effects in Argument Evaluation
3.2. Study Two: Investigating Effects of Property on Inference Generation
3.3. Summary: Effects of Property on Inference Generation
4. Inference Generation: Conclusions and Implications
4.1. What Have We Learned About Inference Generation?
4.2. Implications
4.3. Conclusions
Acknowledgments
References
Abstract

Categorical inductive inference is the process by which we project features believed to be true of one class to another related class. Traditional approaches to studying inductive inference have focused on the evaluation of inductive arguments. In this chapter, we introduce a new approach by examining the way people generate inductive inferences. We focus on how relations among premise categories, and the nature of the property being projected, impact the kind of inferences generated. Participants were taught that two animal species shared a novel substance, disease, or gene, and were asked what other species might also have the property, and why. Results show that people attend to salient relations between premise categories, determine their relevance based on the property they are asked to project, and then generate inferences consistent with those relations. Participants drew a broad range of inferences based on taxonomic similarity, contextual similarity, and causal relations. Inference generation was constrained both by salient premise relations and the nature of the projected property. We discuss how these findings expand the list of challenges for the models of induction, question the primacy of taxonomic relations in guiding inductive inference, encourage further investigation into the process by which inductive inferences are generated, and emphasize the knowledge-driven and flexible nature of human inductive reasoning.

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53005-6. © 2010 Elsevier Inc. All rights reserved.
1. Introduction

Categorical induction is the process by which we project features believed to be true of one class to another related class. This process is essentially knowledge-driven; when we learn that A has a novel property, we use what we know about the relations between A and B to compute the likelihood that B will also have the property. Unlike deductively valid arguments like “Socrates is a man, all men are mortal, therefore Socrates is mortal,” inductive arguments like “Coatimundis have disease X, therefore wombats have disease X” are not inherently valid or invalid. Rather, they are strong or weak to the degree that they are supported by relevant knowledge. If we know something about relations between coatimundis and wombats that would warrant a common affliction, this argument may seem strong, but that strength derives entirely from our prior knowledge.

The preceding is a more or less standard opening paragraph for a paper on inductive reasoning. And make no mistake; we stand by it. However, our goal in this chapter is to present a decidedly nonstandard approach to understanding inductive inference. Specifically, instead of looking for regularities in the ways in which people evaluate inductive arguments, we focus on the ways in which people generate inductive inferences.
1.1. Inference Generation

Most previous research on inductive reasoning has involved the evaluation of complete inductive arguments in one form or another. Typically, participants are presented with one or more premises in which a property is attributed to a category, along with a conclusion in which the property is attributed to a different category, and asked to evaluate the argument, or rate the likelihood that the conclusion is true given that the premises are true.
This general approach can take several forms. Participants might be presented with an entire argument (e.g., Frogs have property X, therefore toads have property X ) and asked to evaluate its strength. Alternatively, participants might be asked to make yes/no judgments about whether one category has a property given that another one does (Frogs have property X. Do toads have property X? Do raccoons have property X? Fish?). Another popular format involves presenting a premise and forcing a choice of the better conclusion from among two or more alternatives (Frogs have property X. Is it more likely that raccoons or toads also have property X?). Our intention is not to call for the abolition of argument evaluation as a way of understanding inductive reasoning. We have learned a great deal about categorical induction from argument evaluation; it has revealed that inductive reasoning is both systematic and flexible (see Heit, 2000, for a review). We have used argument evaluation ourselves, and plan on continuing to do so. Rather, the point we want to make is that we cannot and should not base a psychology of inductive reasoning solely on argument evaluation. Argument evaluation depends on the participant recognizing and evaluating specific hypotheses generated by the experimenter. The participant is required to compare premise and conclusion categories, notice the relation(s) under investigation or lack thereof, evaluate their relevance to supporting an inference about the property—which may or may not be informative—and then render a judgment. As such, this approach is akin to a recognition memory task or a multiple-choice exam where people choose the best answer from a pool of prefabricated alternatives. We believe that this approach potentially misses an entire spectrum of inductive phenomena in which people generate plausible inferences from given information rather than evaluate hypotheses that are given to them. 
To begin to explore this spectrum of inductive reasoning, we utilized a novel open-ended induction task.¹ Rather than evaluating the strength of arguments composed of premises and conclusions, or making a forced choice from among a limited set of alternatives, we allowed participants to generate open-ended projections about novel properties from pairs of premise categories (e.g., A and B have property X. What else do you think would have property X, and why?). In other words, we present participants with premises and ask them to generate their own conclusions. As such, our approach is more like a recall memory task or an essay exam. We manipulated the relations between A and B, and made corresponding predictions about the ensuing inferences if such relations guide inductive projections. This allowed us to map out the ways in which participants’ beliefs about relations between premise categories, and the property we ask them to reason about, influence the kinds of inferences they spontaneously generate. This approach has a number of advantages.

¹ This methodology is not entirely novel. Proffitt, Coley, and Medin (2000), in their Experiment 3, report results from a similar method, but present it in a different framework.
First, it represents (as far as we know) the first attempt to look at how people actually construct inductive inferences. Any viable theory or model of induction must explain not only how people evaluate the strength of arguments, but more centrally, how they generate novel inductive inferences. This chapter does not propose such a model or theory, but we believe that it represents a valuable contribution to the stockpile of raw materials for such a theory or model.

Second, asking participants to generate open-ended inferences gives them a greater opportunity to draw on knowledge they spontaneously deem relevant, rather than depending on the experimenter to anticipate potential responses. Likewise, the open-ended method is less likely to artificially force a choice between basing an inference on one of two salient relations when both may seem plausible. Instead, participants are free to base inferences on multiple salient relations, if they so desire.

Third, as we shall see, this approach allows participants to utilize relatively abstract or inchoate knowledge to guide inductive inference. As Keil (2003) has shown, our explanatory knowledge and understanding of causal mechanisms are often much more superficial and vague than we know or would like to admit. Nevertheless, inductive inferences can bridge gaps in specific factual knowledge; indeed, making uncertain guesses about the unknown is what induction is all about. Many researchers have suggested that fairly abstract principles (domain theories, schemata) provide inductive constraints in concept learning (e.g., Heibeck & Markman, 1987; Keil, 1981), language acquisition (Chomsky, 1980), and inductive inference (e.g., Coley, Hayes, Lawson, & Moloney, 2004; Coley, Medin, & Atran, 1997; Goodman, 1955). As demonstrated by Kemp, Perfors, and Tenenbaum (2007), it is possible to learn abstract knowledge from observations before acquiring specific knowledge at lower levels of abstraction.
If so, inductive reasoning may be especially likely to rely on abstract ideas in the absence of specific knowledge. For example, if you learn that coatimundis have a particular disease, you might think it likely that anything that eats coatimundis might also have the disease, even if you have no idea what might do so. In an open-ended response format, you might confidently assert that “Anything that preys on coati-whatevers would also have disease X” and even explain your response along the lines of “because they could contract it from eating tainted coati meat.” However, if presented with an argument like “Coatimundis have disease X, therefore ocelots have disease X,” you might not know that ocelots are potential coati predators, and therefore might simply shrug your shoulders and rate the argument as relatively unlikely. Because you were unable to apply your (relatively abstract) knowledge about disease to evaluate the argument, you rated it as weak when in fact you believed it to be strong, but just didn’t realize it.

Fourth, an open-ended format increases the ecological validity of induction research by capturing the previously unacknowledged open-ended
nature of everyday inductive reasoning. Seldom in everyday life are we asked to evaluate fully formed arguments; rather, we have some facts, and are free to generalize from those facts as we see fit. For example, I may know that butter and bacon are to be avoided on a low-cholesterol diet; it seems more ecologically valid for me to generate inferences about “what else should be avoided” than to evaluate a series of arguments such as “Butter and bacon have high cholesterol, therefore lettuce has high cholesterol,” “Butter and bacon have high cholesterol, therefore lamb has high cholesterol,” and so on.

In sum, by focusing on how people generate, rather than evaluate, inductive inferences, we allow participants to draw upon any knowledge that they deem relevant in the context of an ecologically valid inductive problem. Our goal is to use an open-ended inductive inference task to take a detailed look at how people spontaneously recruit different kinds of knowledge to generate inductive inferences.
1.2. Induction and Relations among Concepts

A central problem in inductive reasoning is deciding what knowledge should inform a given inference. There is a vast amount of information associated with any given concept that can be used in induction. Some information is more likely to support inferences than other information; it is safe to say that a person who knows that a frog is an amphibian and who saw a frog yesterday is more likely to project a novel property learned about a frog to other amphibians than to other things seen yesterday. In the following we will discuss three kinds of conceptual relations that have been shown to be relevant to evaluating the strength of inductive arguments: taxonomic (intrinsic) similarity, contextual (extrinsic) similarity, and causal relatedness.

1.2.1. Taxonomic Similarity

Similarity is an intuitively appealing candidate for guiding induction. It plays an important role in many natural categories, and categorizing novel instances based on their similarity to known category exemplars or to a category prototype provides a basis for forming novel expectations about them. And indeed, taxonomic, or intrinsic, similarity, especially in the form of common category membership, has repeatedly been shown to be an especially strong predictor of the strength of an inductive argument. For example, the argument “Frogs have property X, therefore toads have property X” might seem strong because frogs and toads are both amphibians; since they are similar kinds of animals that share many known features, it is likely that they will share a newly learned feature as well. Reasoning of this kind is well described by models of category-based induction that emphasize the importance of similarity, shared features and/or common category membership in guiding inductive inferences (e.g., Osherson, Smith, Wilkie, Lopez, &
Shafir, 1990; Rips, 1975; Sloman, 1993; Sloutsky & Fisher, 2004). According to such models, arguments are perceived as strong to the degree that premises and conclusions are generally similar, share specific features, or belong to a common taxonomic class.

However, it turns out that this notion of similarity may be both too broad and too narrow to provide a sufficiently detailed and exhaustive account of the range of inductive inferences people make. The construct is too broad because people make use of more precise subtypes of intrinsic similarity. One influential demonstration of this was done by Heit and Rubinstein (1994), who found that people prefer to base inferences on anatomical similarity (e.g., projecting a property from a bat to a mouse) or on behavioral similarity (e.g., projecting a property from a bat to a sparrow) depending on the property being projected. As the authors point out, these results logically eliminate the possibility that a single similarity measure can predict inferences about animals. At the same time, the construct is too narrow because intrinsic taxonomic similarity, even broken down into different subtypes, does not exhaust the types of similarity available for guiding inductive inferences.

1.2.2. Extrinsic Similarity

In addition to being a member of a certain taxonomic class, frogs participate in a number of contextual relationships with other species that could potentially provide a basis for induction. For example, a property known to be true of frogs might reasonably be extended to fish; in other words, the argument “Frogs have property X, therefore fish have property X” might seem strong not because frogs are generally similar or taxonomically related to fish, but because they share one potentially important extrinsic contextual feature: an aquatic environment.
That is, if one is aware of a specific contextual relation between premise and conclusion, that relation can increase the perception of argument strength to the degree that it is deemed relevant. There is evidence that adults utilize extrinsic similarity to evaluate inductive arguments. For example, when making categorical inferences about food, people use both taxonomic categories (like fruit or meat), based on shared intrinsic features or composition, and script categories (like breakfast foods), based on the time, location, or setting in which the foods are eaten, to evaluate potential inductive projections (Nguyen & Murphy, 2003; Ross & Murphy, 1999; Vitkin, Coley, & Feigin, 2005). Lin and Murphy (2001) demonstrated that participants view a wide range of thematic relations (e.g., camel–desert, cat–litter box, bees–honey, Michael Jordan–basketball, Hawaii–beach) as plausible bases for inferences. Shafto, Coley, and Baldwin (2007) showed that when people reason about animals, inferences are strengthened by extrinsic relations (specifically, shared habitat) between premise and conclusion species, as well as by common membership in taxonomic categories.
In sum, although extrinsic contextual similarity has received less attention than intrinsic taxonomic similarity from researchers studying categorical induction, it represents an important alternative basis for inductive inference.
1.2.3. Causal Relations
Although adding extrinsic similarity to our list of potential bases for induction is a step in the right direction, it is important to point out that similarity, however flexibly construed, does not exhaust the kinds of knowledge potentially relevant to guiding inductive inference. For example, we also possess causal knowledge about the way frogs interact with other species and their environment. If you learned that frogs have a property, you might infer that raccoons would also have this property, knowing that because raccoons eat frogs, they could potentially contract the property through ingestion. That is, if one is aware of a causal chain linking premise to conclusion, such as a food chain relation, it can inform evaluation of an inductive argument. In other words, the argument “Frogs have property X, therefore raccoons have property X” is potentially strong not because frogs and raccoons are similar in any way, but because we have knowledge of a causal chain that links the two and is potentially relevant to property projections. In support of this idea, Medin, Coley, Storms, and Hayes (2003) demonstrated sensitivity to causal relations between premises and conclusions in a number of ways. For example, participants rated arguments where premise and conclusion were taxonomically dissimilar but shared a salient causal relation (e.g., “Bananas have property X, therefore monkeys have property X”) to be as strong as arguments where premise and conclusion were taxonomically more similar but causally unrelated (e.g., “Mice have property X, therefore monkeys have property X”).
Salient causal relations also lead people to commit the conjunction fallacy (Tversky & Kahneman, 1983) by rating arguments with a conjunctive conclusion emphasizing a causal chain (e.g., “Grain has property X, therefore mice and owls have property X”) as stronger than arguments with a single constituent category as a conclusion (e.g., “Grain has property X, therefore owls have property X”). Feeney, Shafto, and Dunning (2007) replicated this inductive conjunction fallacy effect, and showed that causal relations led to stronger and more persistent fallacies than taxonomic relations. Reliance on causal relations in reasoning has been shown to increase with relevant expertise. For example, López, Atran, Coley, Medin, and Smith (1997) found that Itza’ Maya, indigenous people of Guatemala who rely on hunting and agriculture and live in close contact with nature, appeal to specific causal ecological relations between animals when asked to evaluate inductive arguments about local species. Proffitt, Coley, and Medin (2000) demonstrated a similar effect with North American tree experts who were asked to reason about inductive problems involving disease distribution among trees. Rather than appealing to overall or categorical similarity of tree types, tree experts used their knowledge to construct
sophisticated explanations of how diseases might be transmitted from one tree to another. Likewise, Shafto and Coley (2003) showed that when projecting novel diseases among local marine species, commercial fishermen used causal knowledge of food webs to evaluate arguments. However, even relative novices (undergraduates) actively use causal relations to evaluate arguments when tested about familiar categories (e.g., Feeney et al., 2007; Medin et al., 2003) or when specifically trained about novel causal systems (Shafto, Kemp, Bonawitz, Coley, & Tenenbaum, 2008). Moreover, the expectation that causal relations provide a useful basis for inferences is present early; Muratore and Coley (2009) showed that 8-year-old children, when they have necessary knowledge about ecological interactions between animals, use causal information to make inferences. As demonstrated by Sloman (1994), inductive arguments can spontaneously trigger causal reasoning. When participants could construct a single explanation of why both premise and conclusion have a property, arguments were seen as more plausible than when two separate explanations were required to connect property to the premise and to the conclusion. In sum, people use a variety of conceptual relations to evaluate categorical inductive arguments. Taxonomic similarity—based on shared category membership and/or shared intrinsic features—is one common metric, and it has been widely studied and modeled. However, extrinsic similarity—based on shared context, or common links to the outside world—and causal relatedness—coherent causal pathways that could explain how or why a property is shared by premise and conclusion categories—are also potentially powerful guides for inductive inference. In this chapter, we examine factors that impact the frequency with which people generate inferences based on these three kinds of relations.
1.3. Goals of this Chapter
We have argued that people use taxonomic, contextual, and causal relations among categories to evaluate inductive arguments. What factors guide recruitment of these relations during inference generation? In this chapter we utilize a novel open-ended induction task to examine how people generate inductive inferences about plants and animals. We chose the domain of folk biology because there is a rich literature on inductive reasoning using biological categories, and because the different kinds of salient and potentially orthogonal relations among living things (e.g., biological family, behavior, shared habitat, ecological niche, predator–prey) naturally lend themselves to supporting a range of different kinds of inferences. In particular, we examine how people use intrinsic taxonomic similarity, extrinsic ecological similarity, and causal relations to generate inferences about what animal species are likely to share novel properties. We focus on the following questions: First, to what degree do salient relations among
premise categories determine the nature of inferences generated from those categories? In general, we expect that participants will compare premise categories, extract salient relations, and generate inferences consistent with those relations. Thus, premises with salient taxonomic relations will yield taxonomic inferences, whereas premises with salient spatiotemporal or causal relations will yield corresponding inferences. Second, to what degree does the property being projected influence the process of inference generation? Previous research suggests that property can have a large effect on how arguments are evaluated. We know little about the effects of property on how arguments are generated; of particular interest is whether property serves as an overall biasing factor, or whether it changes the salience of particular premise relations in guiding inferences. Overall, our goal is to extend the knowledge base available to inform theories and models of inductive reasoning. We have no theoretical axe to grind; rather, we seek to expand the range of phenomena on inductive inference that any theory or model must account for by examining the nature of spontaneous inductive projections and explanations for those projections. In the next section, we consider the effects of relations among premise categories on inference generation. We review previous studies suggesting that salient relations—especially taxonomic relations—among premise and/or conclusion categories influence perceived strength of inductive arguments. We then present evidence that premise relations are important for guiding inference generation, and that taxonomic relations are less privileged than we might expect. In the following section, we consider property effects on inference generation. 
We first review research showing systematic effects of property being projected on argument evaluation, and then present evidence that property has a number of important effects on inference generation, including overall biasing toward a particular basis of inference, and changing the salience of relations among premise categories. Finally, in the last section, we summarize our findings and draw conclusions.
2. Effects of Premise Relations on Inference Generation
2.1. The Role of Premise Relations in Argument Evaluation
Previous research gives us some reason to expect relations among premise categories to be an important influence on the kinds of inferences people make. For example, Medin et al. (2003) have shown that salient relations among premise categories can lead to violations of normative logic or similarity-based predictions. Consider the diversity principle, which suggests that dissimilar premise categories should provide stronger evidence for a generalization to an inclusive category than similar premise categories.
Medin et al. showed that arguments with taxonomically diverse yet causally linked premise categories (e.g., Robins have property X and worms have property X therefore goldfish have property X ) were rated as weaker than arguments with less diverse yet unrelated premise categories (e.g., Robins have property X and iguanas have property X therefore goldfish have property X ). Likewise, arguments with taxonomically diverse premise categories that also share a salient and potentially relevant feature (e.g., Polar bears have property X and penguins have property X therefore all animals have property X ) were rated as weaker than arguments with less diverse yet unrelated premise categories (e.g., Polar bears have property X and antelopes have property X therefore all animals have property X ). In both cases, comparison of premise categories yielded a salient linking relation, be it causal (robins eat worms) or featural (polar bears and penguins are both found in cold climates). This relation provided an explanation for the presence of the property in both species, and therefore weakened the general argument relative to a less diverse yet unrelated pair of premise categories (but see Heit & Feeney, 2005). Likewise, salient relations among premise categories also resulted in nonmonotonicity, whereby arguments with fewer premise categories that are proper members of the same superordinate as a conclusion category (e.g., Brown bears have property X therefore buffalo have property X ) were rated as stronger than arguments with more such premise categories (e.g., Brown bears have property X, polar bears have property X, and grizzly bears have property X therefore buffalo have property X ) if the additional premises reinforced a relation shared by premise categories but not the conclusion category. In sum, Medin et al. (2003) demonstrate that salient shared features or causal relations among premise categories can have a marked effect on inductive inferences. 
By rendering specific categories and/or relations such as bears, arctic animals, or predator–prey highly salient, these manipulations serve to overcome more general default approaches to evaluation of inductive arguments. These findings suggest that participants are comparing premise categories and extracting salient relations between them in order to inform their evaluations of categorical inductive arguments. Consistent with this view, McDonald, Samuels, and Rispoli (1996) proposed that subjects view the premises of a categorical argument as evidence and the conclusion as a hypothesis. As such, the argument should be perceived as weak to the degree that competing hypotheses are brought to mind by the evidence (that is, the premises). To test this, McDonald et al. asked one group of participants to imagine they were scientists who had just discovered ‘‘substance X’’ in certain sets of organisms. Using this ‘‘evidence,’’ they were asked to construct plausible conclusions as to ‘‘general categories of organisms that might reasonably be expected to contain substance X’’ (p. 204). Another group rated the strength of arguments taking each set of organisms as premises and a more general class as a conclusion. Results showed a strong relation between responses to the
two tasks; the more competing hypotheses were generated in the first task, the weaker the argument was rated in the second task. These results imply that people may spontaneously generate alternatives to a given conclusion based on salient relations among premise species, and utilize these alternatives to evaluate the strength of an argument. Indeed, Feeney, Coley, and Crisp (2010) showed that while reading premises of an inductive argument, participants actively construct hypotheses about which relations among premise categories might be relevant; when a premise is inconsistent with a current hypothesis, that premise takes longer to read, and has a larger effect on ratings of argument strength, than when the same premise is consistent with a current hypothesis. For example, participants were faster to read the third premise of the argument “Magpies have property X, panda bears have property X, zebras have property X, therefore...” than to read the third premise of the argument “Brown bears have property X, panda bears have property X, zebras have property X, therefore...” despite the fact that the third premise and the preceding second premise are identical in both arguments. This suggests that people form a hypothesis about likely relevant relations based on comparing the categories in the first two premises. In the first argument, this is likely something like “black and white animals,” so that “zebras” is consistent with the hypothesis and therefore processed quickly. In the latter argument, the likely hypothesis is probably “bears,” and because it is inconsistent with this hypothesis, “zebras” takes longer to process. Together, this evidence suggests that people may compare premise categories to glean information about likely conclusions, and that salient relations among premise categories may serve to bias or constrain the kinds of inferences generated from those premises.
2.2. Relative Salience of Conceptual Relations
We have argued that inductive projections from one category to related categories can be based on a number of different kinds of intercategory relations, including common membership in a taxonomic category and intrinsic similarity, extrinsic similarity with respect to some shared contextual or environmental relation, or causal relations between categories. We have also argued that salient relations among premise categories are among the factors that influence what knowledge is used to guide an inductive inference. However, there is reason to believe that knowledge of these relations may differ systematically in baseline salience; in particular, taxonomic knowledge may be especially salient and accessible, and correspondingly privileged for guiding inductive inference, at least for North American university undergraduates reasoning in domains in which they lack expertise (Coley, Medin, Proffitt, Lynch, & Atran, 1999; Coley, Shafto, Stepanova, & Barraff, 2005; Shafto, Coley, & Vitkin, 2007). For example, verification of taxonomic category membership of foods (e.g., apple is a fruit) is faster
than verification of script category membership (apple is a snack), and priming facilitates script category verification but not taxonomic verification (Ross & Murphy, 1999; Vitkin, Coley, & Feigin, 2005). Moreover, priming taxonomic category membership inhibits script categorization, whereas priming script categories has no effect on taxonomic categorization (Vitkin et al., 2005). Likewise, time pressure inhibited inductive projections among animals that shared habitat, but had no effect on projections among taxonomically related animals (Shafto, Coley, & Baldwin, 2007). Inferences based on shared category membership are present from early in development (Gelman, 2003; Gelman & Coley, 1990). Novice adults widely utilize taxonomic relations to guide inferences (López et al., 1997; Osherson et al., 1990; Shafto & Coley, 2003), as do experts when the property being projected provides little information about what other relations might be relevant (Shafto & Coley, 2003). Together, these findings suggest that taxonomic knowledge may be more accessible than other kinds of knowledge, and that taxonomic similarity may be an especially important foundation for inductive inferences. However, as argued above, people are willing and able to recognize that inferences based on other kinds of relations, including extrinsic similarity and causal relatedness, can also be inductively strong. This raises the question of how frequently people spontaneously utilize these different relations to guide inductive inferences. In this study, we examine how participants with no special expertise in a domain recruit knowledge of taxonomic, causal, and contextual relations to generate inductive inferences.
2.3. Study One: Investigating Effects of Premise Relations on Inference Generation
To investigate the degree to which relations among premise categories guide open-ended inductive inferences, we constructed pairs of local animals that varied in their taxonomic and ecological relatedness. Participants were told that each pair shared a novel internal substance and were asked what other animals they thought might have the substance and why. We reasoned that a novel “substance” was sufficiently ambiguous to avoid overly constraining the nature of the inference; if construed as an innate physiological substance (e.g., analogous to serotonin), it could plausibly be projected along taxonomic lines, whereas if construed as an environmentally transmitted substance (e.g., analogous to DDT), it might plausibly be projected on the basis of extrinsic similarity or causal interaction. Finally, we assessed participants’ beliefs about the relatedness of the premise pairs, which allowed us to directly examine how such beliefs predicted patterns of inference.
In general, we hypothesized that salient relations among premise categories would influence the basis of inductive inference; we expected participants to compare premise categories, extract salient relations, and generate inferences consistent with those relations. If so, then salient taxonomic relations among premises should yield taxonomic inferences, whereas premises with salient spatiotemporal or causal relations should yield corresponding inferences. We were also interested in examining the degree to which taxonomic similarity may be a privileged basis for guiding inference generation. A taxonomic bias might manifest itself in several ways. We might simply observe that inferences based on taxonomic similarity are generated more frequently than those based on extrinsic similarity or causal relatedness. Alternatively, taxonomic relations among premise categories might exert a stronger influence on inferences than ecological relations. In the following, we explore these possibilities.
2.3.1. Method
2.3.1.1. Research Design and Procedure
Thirty-one Northeastern University undergraduates were recruited from introductory psychology classes and participated for course credit. We chose 12 pairs of animal species native to Massachusetts to independently manipulate the presence of salient taxonomic and ecological relations (see Table 1). Pairs were either taxonomically near (drawn from the same or a closely related superordinate biological class) or taxonomically far (drawn from different superordinate biological classes). Orthogonally, pairs were either ecologically related (via a predator/prey relation, shared habitat, or ecological niche) or ecologically unrelated. Results of posttests (see below) confirmed that participants viewed the relatedness of the premise pairs in the manner in which we intended.
Table 1
Stimulus Pairs, Study One.

                          Taxonomic distance
Ecological relatedness    Taxonomically near           Taxonomically far
Related                   Coyote/Bobcat                Beaver/Spotted Turtle
                          Water Snake/Green Frog       Red-tailed Hawk/Field Mouse
                          Heron/Duck                   Garter Snake/Owl
Unrelated                 River Otter/Deer             Chipmunk/Bullfrog
                          Hummingbird/Canada Goose     Chickadee/Salamander
                          Box Turtle/Gray Tree Frog    Muskrat/Woodpecker
Participants were tested individually or in small groups in the laboratory. They were presented with a packet containing 12 pairs of animal names; each pair was presented on a separate page. Participants reasoned about a substance found in the bloodstream. Pairs were presented in random order. Instructions read “On each page of this packet you will find a pair of local animals which have been discovered to have a specific, naturally occurring substance in their bloodstreams. All you know about the substance is that these two kinds of animals have it. You will be asked to list other animals or kinds of animals you think might also have that substance, as well as reasons for your answers.” For each pair, participants wrote down other species they expected would share the property and an explanation for why they projected the property from the premise pair to these animals. Following the inference generation task, participants completed a belief posttest in which they were asked directly about how they thought each premise pair was related. Specifically, for each item, participants were asked the following: “Do these animals belong to the same biological category?” “Do these animals live in the same habitat?” “Does one of these animals eat the other?” Questions were presented in this fixed order for all participants, who simply checked “yes” or “no.”
2.3.1.2. Coding
Each response consisted of a list of participant-generated conclusion categories and an explanation for why a property true of the premise pair was likely to be shared by those categories. In order to systematically quantify these responses, we developed a coding system to characterize the basis of each inference. Responses were coded based on both (1) relations between experimenter-generated premise categories and participant-generated conclusion categories, and (2) the explanation generated by the participant.
Coding categories were not mutually exclusive; a given response could receive multiple codes. (Multiple-code assignments did not represent ambiguous responses, but rather responses in which participants invoked different kinds of reasons to support a particular inference.) Four or five trained coders blind to the hypotheses coded each response independently. Consensus (defined as agreement among N − 1 coders) was reached on over 90% of codes. Disagreements were resolved by discussion. The coding scheme is summarized in Table 2. Inferences based on category membership and similar habitat were most frequent, followed by inferences based on food chain interaction, behavioral similarity, and perceptual similarity. The remaining types of inferences were relatively infrequent; means are presented in Table 3.
To examine the effects of relations between premise categories on the three broad classes of inferences discussed above, we collapsed the initial coding categories into three groups: those that reflect reasoning based on common category membership, appearance, or other shared intrinsic features (henceforth, taxonomic inferences); those that reflect reasoning based on extrinsic similarity or shared contextual features like similar diet or habitat (henceforth, extrinsic inferences); and those that reflect reasoning based on some causal mechanism, including co-occurrence in space and time or direct contact through behavior or predation (henceforth, causal inferences). The precise makeup of these broad categories can be found in Table 2. If a given response was coded as any of the component inference types, it was scored as the corresponding taxonomic, extrinsic, or causal inference. Again, these categories were not mutually exclusive. For example, a response coded as category membership, behavioral similarity, and habitat interaction would be counted as both a taxonomic inference and a causal inference; a response coded as behavioral interaction and predatory interaction would be counted as a causal inference. These three general categories accounted for over 92% of all responses.

Table 2 Coding Scheme for Characterizing Basis of Inference.
Basis of inference: Definition and example
Taxonomic inferences
  Category membership: P and C belong to the same class or category. [Heron/Duck] Geese and cranes because the disease seems to be related to birds.
  Perceptual similarity: P and C are similar with respect to some aspect of superficial surface appearance. [Box turtle/Gray tree frog] Lizards because they are green just like turtles and frogs.
  Behavioral similarity: P and C are similar with respect to some aspect of behavior. [Chipmunk/Bullfrog] Squirrel and rabbit because they are fast-moving animals.
  Physiological similarity: P and C are similar with respect to specific organs or systems. [Box turtle/Gray tree frog] Rodents and turtles because they have similar genetic makeup.
  General similarity: P and C are alike or have similarities without further specifying the nature of the similarity. [Muskrat/Woodpecker] Rodents and birds because they are similar to muskrat and woodpecker.
Extrinsic inferences
  Similar diet: P and C are similar with respect to diet or eating the same kind of thing. [Chickadee/Salamander] Other plant-eating or insect-eating animals because both examples eat plant matter and insects.
  Similar habitat: P and C share similar or the same habitat without specification that the property is transmitted via habitat. [Owl/Garter snake] Bears and tigers because they can all be found in the woods.
Causal inferences
  Predatory interaction: P and C interact with respect to predation, that is, one or both Ps eats or is eaten by C. [Field mouse/Red-tailed hawk] An owl; red-tailed hawks could get substance B from eating field mice. Owls eat small animals like field mice.
  Habitat interaction: P and C share or pass a property by coming into contact through the same habitat. [Beaver/Spotted turtle] Other animals of their local area; two animals sharing the same bodies of water probably get the substance from it, either by ingestion or absorption, since they spend a lot of time in the water.
  Behavioral interaction: P and C interact via some aspect of behavior. [Water snake/Frog] Beaver - beaver has the tendency to fight snakes and if the disease C is contagious, the beaver might end up contracting it.
  General interaction: P and C interact without further specification of the nature of the interaction. [River otter/Deer] Ticks, flies, river fish; anything that would come in contact with both the deer and the river otter.
Note: P = (given) premise categories; C = (participant generated) conclusion categories.

Table 3 Mean Relative Frequencies for each Inference Type, Studies One and Two.
                              Study One       Study Two
Type of inference             Substance       Substance       Disease         Gene
Taxonomic                     0.55 (0.240)    0.55 (0.185)    0.45 (0.225)    0.74 (0.167)
  Category membership         0.34 (0.200)    0.34 (0.177)    0.36 (0.178)    0.26 (0.223)
  Perceptual similarity       0.10 (0.129)    0.12 (0.131)    0.06 (0.085)    0.26 (0.195)
  Behavioral similarity       0.11 (0.148)    0.09 (0.104)    0.06 (0.078)    0.25 (0.125)
  Physiological similarity    0.05 (0.084)    0.02 (0.043)    0.01 (0.025)    0.03 (0.050)
  General similarity          0.06 (0.092)    0.06 (0.087)    0.06 (0.099)    0.10 (0.086)
Extrinsic                     0.38 (0.169)    0.43 (0.174)    0.39 (0.206)    0.27 (0.146)
  Similar habitat             0.31 (0.161)    0.35 (0.190)    0.29 (0.165)    0.24 (0.151)
  Similar diet                0.09 (0.122)    0.11 (0.095)    0.12 (0.122)    0.04 (0.061)
Causal                        0.18 (0.187)    0.22 (0.237)    0.44 (0.257)    0.03 (0.044)
  Predatory interaction       0.12 (0.136)    0.17 (0.170)    0.26 (0.188)    0.03 (0.044)
  Habitat interaction         0.06 (0.100)    0.04 (0.068)    0.13 (0.159)    0.00 (0.000)
  Behavioral interaction      0.01 (0.026)    0.01 (0.023)    0.04 (0.053)    0.00 (0.011)
  General interaction         0.00 (0.000)    0.02 (0.047)    0.03 (0.050)    0.00 (0.000)
Note: Standard deviations appear in parentheses.

2.3.2. Results and Discussion
2.3.2.1. Relative Frequency of Inferences
Results demonstrate that undergraduates used a wide range of knowledge to generate inferences about how a novel substance might be distributed among animal species. Overall, 55% of inferences were taxonomic (e.g., an inference from the heron/duck pair to “other birds because substance F might occur naturally in birds”), and almost 38% were extrinsic (e.g., given that herons and ducks have substance F, “a kingfisher might have substance F in its bloodstream because it feeds off fish like the heron and lives by streams like the duck”). Causal inferences (e.g., a projection from heron/duck to “other birds or fish in the area because they may have gotten the substance from a common pond or body of water”) were least common, but still accounted for 18% of responses. One-way ANOVA confirmed that the differences between all three means were statistically significant (F(2, 60) = 21.66, p < 0.0001, ηp² = 0.42). These results demonstrate the spontaneous use of knowledge about taxonomic relatedness, spatiotemporal contiguity, and causal transmission to guide inductive reasoning.
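As a rough consistency check on the reported effect size, partial eta squared can be recovered from the F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below assumes the one-way ANOVA treated inference type as a three-level within-participant factor over the 31 participants, which is an inference from the reported degrees of freedom rather than something the text states; the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Assumption: three inference types (taxonomic, extrinsic, causal) compared
# within the same 31 participants, which yields the reported df of (2, 60).
n_participants = 31
n_levels = 3
df_effect = n_levels - 1
df_error = (n_levels - 1) * (n_participants - 1)

eta_p2 = partial_eta_squared(21.66, df_effect, df_error)
print(df_effect, df_error, round(eta_p2, 2))
```

Under these assumptions the recovered value rounds to the effect size given in the text.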
It is important to stress that participants in this experiment were not simply reporting how the premise species were related, nor were they evaluating the plausibility of a fully formed argument. Rather, they were forming hypotheses about the distribution of a novel substance on the basis of information extracted via comparison of the premise species, and then using that hypothesis to generate plausible inferences. Sometimes these inferences were quite specific (e.g., an inference from garter snake/owl to “insects (ants, crickets) and spiders, mice because garter snakes eat insects, I think mice eat garter snakes and I would think owls eat at least mice if not both so maybe substance C travels the food chain”), and sometimes they were very vague (e.g., from garter snake/owl to “Any insects or animals eaten by either; the substance will pass from their blood to the animals”), but the critical point is that a broad range of knowledge about contextual similarity and interaction among premise categories, as well as taxonomic similarity, guided inference generation. Moreover, our open-ended methodology was sufficiently sensitive for us to detect such inferences even in the absence of
specific factual knowledge (about, e.g., what owls and garter snakes eat). More generally, these results revealed that category membership and intrinsic similarity, while clearly important, do not nearly exhaust the spectrum of knowledge utilized to guide inductive inferences.
2.3.2.2. Effects of Premise Relations
Our central question in Study One was the degree to which relations among premise categories guide and constrain patterns of inferences. To address this we manipulated relations among premise categories to look for corresponding differences in inference patterns. We originally conceived of this manipulation as a 2 (taxonomic distance) × 2 (ecological relatedness) design, but results of posttests suggested that although participants viewed the overall relatedness of the premise pairs in the manner in which we intended, individual variability in salience of relations both within and between our planned item classes was larger than we anticipated. The world, apparently, is not a factorial design. Also, the distribution of predation versus shared habitat relations was uneven in our “ecologically related” cells, and this turned out to be a critical distinction. As such, we deemed it more appropriate to construe the salience of taxonomic, habitat, and predation relations between each pair of premise categories as a continuous variable (ranging from weak to strong) rather than a categorical variable (present/absent) as originally conceived. We used multiple regression analyses to look at the data in this way. For each item, we calculated the relative frequency of positive responses to each belief question (“do these two animals belong to the same biological category,” “do these two animals live in the same habitat,” and “does one of these animals eat the other”) averaged across all participants. We took these scores as indices of the salience of taxonomic relatedness, shared habitat, and predatory relations between premise species in each item.
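The item-level salience indices described above amount to the proportion of “yes” answers per item and belief question. A minimal sketch follows, using invented responses from four hypothetical participants on two of the stimulus pairs; the data, the variable names, and the helper functions are all assumptions made for illustration. The regressions in this section use arcsine transforms of such scores, but the text does not say which variant was applied, so the arcsine square-root form shown here is also an assumption.

```python
import math

# Invented posttest answers (not the study's data): one dict per participant,
# mapping each item to (same category?, same habitat?, one eats the other?).
responses = [
    {"Heron/Duck": (True, True, False), "Red-tailed Hawk/Field Mouse": (False, True, True)},
    {"Heron/Duck": (True, True, False), "Red-tailed Hawk/Field Mouse": (False, False, True)},
    {"Heron/Duck": (True, False, False), "Red-tailed Hawk/Field Mouse": (False, True, True)},
    {"Heron/Duck": (False, True, False), "Red-tailed Hawk/Field Mouse": (False, True, False)},
]

QUESTIONS = ("taxonomic", "habitat", "predation")

def salience_indices(responses):
    """Proportion of 'yes' answers per item and question, across participants."""
    n = len(responses)
    return {
        item: {
            question: sum(r[item][i] for r in responses) / n
            for i, question in enumerate(QUESTIONS)
        }
        for item in responses[0]
    }

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion (assumed variant)."""
    return math.asin(math.sqrt(p))

indices = salience_indices(responses)
for item, scores in indices.items():
    print(item, {q: round(arcsine_sqrt(p), 3) for q, p in scores.items()})
```

Treating each index as a proportion between 0 and 1 is what allows salience to be handled as a continuous predictor rather than as a present/absent factor.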
We also assigned each item scores corresponding to the mean relative frequency of taxonomic, extrinsic, and causal inferences for that item, again averaged across all participants. Using arcsine transforms of these scores, we conducted three item-wise multiple regression analyses using beliefs about premise relations to predict the relative frequencies of each type of inference; standardized regression coefficients for these multiple regressions are presented in Figure 1. This analysis demonstrated that the primary predictor of taxonomic inferences was the salience of the taxonomic relatedness of a premise pair; the frequency of taxonomic inferences increased with taxonomic salience, but was unrelated to the salience of shared habitat or predatory relations (R² = 0.65, p = 0.031). In contrast, extrinsic inferences were positively related to the salience of shared habitat, but unrelated to the salience of taxonomic relatedness. Extrinsic inferences were also negatively related to the salience of predatory relations between premise pairs (R² = 0.64, p = 0.033). Finally, frequency of causal inferences was positively related
Generating Inductive Inferences
[Plot omitted: bar chart; y-axis: standardized regression coefficient (−1.0 to 1.0); x-axis: inference type (Taxonomic, Extrinsic, Causal); separate bars for same biological family, same habitat, and one eats the other; asterisks mark significant coefficients.]
Figure 1 Relations between salience of premise relations and frequency of taxonomic, extrinsic, and causal inferences, Study One. (Note: *p < 0.05, **p < 0.005.)
to salience of predatory relations, but unrelated to salience of taxonomic relations or shared habitat (R² = 0.80, p = 0.004). Thus, as predicted, we observed strong effects of relations among premise categories on the nature of inferences generated from those categories. Taxonomic inferences varied with the salience of taxonomic relations among premise species, whereas extrinsic and causal inferences varied with the salience of ecological relations among premise species. Indeed, the correspondence between premise relations and nontaxonomic inferences was quite specific; salient shared habitat led to increased extrinsic reasoning, whereas salient predatory relations led to increased causal reasoning and decreased extrinsic reasoning. This last finding hints that participants preferred to draw inferences based on a salient causal mechanism when possible (e.g., projecting from owl/garter snake to “eagle, hawk, water snake, and garden snake: I think that these animals could have this substance because they are predators and if the animal is eaten with the substance, it could get into the other animal’s bloodstream”), and fell back
on extrinsic similarity when no such mechanism was readily apparent (e.g., projecting from chickadee/salamander to “animals that share their neighborhood: chickadees and salamanders don’t eat exactly the same items; if their shared dirt, air and water have the same substance, several animals could be affected”).
2.3.2.3. Privileged Taxonomic Inferences?
Somewhat to our surprise, our results did not provide support for the privileged status of taxonomic relations in inference generation. If we compare the frequency of taxonomic inferences to that of “nontaxonomic” inferences (i.e., inferences based on extrinsic similarities or causal interaction), the two do not differ (0.56 vs. 0.53; t(30) = 0.35, p = 0.726). Moreover, there was no evidence that taxonomic relations among premise categories exerted undue influence on which inferences were generated. Taxonomic salience predicted taxonomic inferences but did not suppress extrinsic or causal inferences; on the contrary, the most influential premise relations were predation relations, which simultaneously promoted causal inferences and inhibited extrinsic inferences.
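The comparison of taxonomic and nontaxonomic inference frequencies reported above can be illustrated with a short sketch. The reported degrees of freedom, t(30), are consistent with a within-participant comparison, so the sketch assumes a paired-samples t-test. That choice, the function name `paired_t`, and the data values are illustrative assumptions and not details given in the chapter.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return mean(diffs) / standard_error, n - 1


# Hypothetical per-participant relative frequencies (not the study's data).
taxonomic = [0.6, 0.5, 0.7, 0.4, 0.6]
nontaxonomic = [0.5, 0.6, 0.5, 0.5, 0.6]

t, df = paired_t(taxonomic, nontaxonomic)
print(f"t({df}) = {t:.2f}")
```

A small t relative to its degrees of freedom, as in the reported result, indicates that the two frequencies cannot be distinguished reliably.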
2.4. Summary: Effects of Premise Relations on Inference Generation
We want to emphasize four conclusions from Study One. First, our participants found the task perfectly natural, and spontaneously generated inferences based on contextual similarity and causal relations as well as inferences based on intrinsic similarity and category membership. For example, upon learning that a water snake and a frog both have a certain substance in their bloodstream, participants used what they know about relations between water snakes and frogs to guide further inferences about the substance. Second, there was a tight linkage between the perceived relations between premise categories and the inferences those categories engendered. Taxonomic relations supported taxonomic inferences; if the belief that snakes and frogs are (somewhat) related taxonomically was most salient, a participant was likely to generate an inference based on taxonomic similarity (e.g., “salamander, bullfrog, turtle: they are all reptiles”). In contrast, shared habitat promoted extrinsic inferences, and predation relations supported causal inferences. If a participant noticed that water snakes and frogs share a habitat, or that water snakes like to eat frogs, they were more likely to generate inferences based on extrinsic similarity or causal interaction (e.g., “bullfrog, fish (freshwater), plants: they live in the same environment and could’ve contracted from the same thing or the snake could’ve just eaten the infected frog,” or “other freshwater life, other snakes, animals that eat small water animals: again it seem to begun on the frog and it could move up the
food chain”). Thus, salient relations between premise categories served as a foundation for generating inferences. Third, we found that when premises were causally related via predation, not only were causal inferences promoted, but extrinsic inferences were also suppressed, suggesting that inferences with some causal component were preferred over inferences based on similarity. And finally, we found little evidence for the primacy of taxonomic relations in guiding inductive inferences. Instead, it appears that when people generate inferences in an open-ended task, they consider a wide range of relations as bases for constructing inductive hypotheses.
3. Effects of Property on Inference Generation
In the previous section, we presented evidence that salient relations among premise categories strongly influence the kinds of open-ended inductive inferences that university students spontaneously generate. The open-ended inference generation task proved to be sensitive to sophisticated reasoning patterns our participants employed even in the absence of specific biological knowledge. In this section, we focus on another source of information that can provide relevant constraints on generation of inductive inferences: the nature of the property being projected.
3.1. Property Effects in Argument Evaluation
The nature of the property being generalized can have a profound impact on the strength of inductive arguments. For example, if you learn that frogs have gene X, you might infer that toads would also have the gene on the basis of close biological affinity among them and/or common membership in the same taxonomic class. In contrast, if you learn that frogs have disease X, you might reasonably expect fish to also have the disease because both live in aquatic environments, or raccoons might even have the disease from eating sick frogs. More generally, the nature of the property influences the likelihood that you would generate one type of inference over another. Because what we know about genes includes the knowledge that members of the same biological family are genetically similar, this property may increase the salience of taxonomic knowledge about frogs, thereby increasing the likelihood of a taxonomic inference. Likewise, what we know about disease includes knowledge about contagion via contact or eating contaminated food. As such, this property may increase the salience of knowledge about frogs’ habitat and role in the food chain, thereby increasing the likelihood of an inference based on extrinsic similarity or interaction. This
systematic link between the kind of property being projected and the likelihood of different inferences has been termed inductive selectivity.
An early demonstration that property could affect perceived argument strength came from Heit and Rubinstein (1994) who asked participants to estimate the probability that a pair of animal species would share a property. The pairs were anatomically similar (e.g., mouse and bat) or behaviorally similar (e.g., sparrow and bat). Likewise, properties were either anatomical (e.g., “having a liver with two chambers that act as one”) or behavioral (e.g., “traveling in back and forth, or zig-zag, trajectory”). They found that probability judgments were determined by the match between the relation shared by the animals in a pair and the projected property. Arguments in which the premise and conclusion were anatomically similar were judged stronger for an anatomical property than for a behavioral property, whereas arguments with a behaviorally similar premise and conclusion were judged stronger for behavioral than anatomical properties. The authors concluded that property influenced the perceived similarity between premises and conclusions, which in turn determined the strength of the inference.
Ross and Murphy (1999) showed that the nature of the property influenced the use of taxonomic or script categories to guide inferences about food. Participants were presented with triads consisting of a target food (e.g., cereal) and two alternatives, one taxonomic (noodles, another member of the breads & grains category), and the other one script (milk, another breakfast food). They were taught that the target food (cereal) had a biochemical property (a novel enzyme) or a situational property (eaten at a particular ceremony in an unfamiliar culture), and asked to project that property to one of the alternatives.
Participants preferred taxonomically related conclusions when projecting a biochemical property, but conclusions related via a common script when projecting situational properties (see also Nguyen & Murphy, 2003; Vitkin, Coley, & Feigin, 2005).
Inductive selectivity is present early on. By kindergarten and perhaps earlier, children are able to selectively utilize taxonomic similarity among species to guide inferences about novel physiological properties, and ecological relations to guide inferences about disease (Coley, Vitkin, Seaton, & Yopchick, 2005; Coley et al., 2007; Vitkin, Vasilyeva, & Coley, 2007). This suggests that from relatively early in development, children are sensitive to different relations among living things and their selective inductive potential. Likewise, Kalish and Gelman (1992) showed that when reasoning about artifacts, preschoolers projected unfamiliar dispositional properties (e.g., will get fractured if put in really cold water) on the basis of material kind (e.g., glass), but projected unfamiliar functional properties (e.g., used for partitioning) on the basis of object kind (e.g., scissors). Shafto and Coley (2003) showed that when projecting a novel disease, commercial fishermen utilized knowledge of predator–prey relations among local marine creatures to guide causal inferences; in contrast, when reasoning
about a completely blank property (e.g., has property X), fishermen showed no sign of causal reasoning, and instead utilized taxonomic similarity to guide inferences.
Along these same lines, Tenenbaum and colleagues (e.g., Griffiths & Tenenbaum, 2005; Shafto, Kemp, Bonawitz, Coley, & Tenenbaum, 2008; Tenenbaum, Griffiths, & Kemp, 2006) propose a family of models of statistical inference over structured knowledge representations. These models posit separable knowledge structures based on intuitive theories that are called on to guide inductive reasoning in different contexts. In support of this idea, Shafto et al. (2008) explicitly taught biologically naïve undergraduates about food web and taxonomic relations among species. They found that a causal model of the food web predicted inferences about novel diseases but not genes, whereas a tree structure of taxonomic relations among species predicted inferences about novel genes but not diseases. This is consistent with the view that property serves to indicate which knowledge structure is relevant for evaluating the strength of a given argument.
In sum, there is ample evidence that the property being projected influences the way in which inductive arguments are evaluated. Typically, these effects are interpreted in terms of the property biasing the computation of premise-conclusion relatedness. However, we know little about how the property being projected might influence the way in which inductive inferences are generated.
3.2. Study Two: Investigating Effects of Property on Inference Generation
To examine this question, we utilized the same open-ended inductive inference task that we used in Study One. Our primary goal was to look at the effects of property on patterns of inference generation. To this end, we asked participants to reason about a novel gene, disease, or substance. Gene was chosen as a taxonomically biasing property; participants are likely to believe that genes are distributed along taxonomic lines (Shafto, Coley, & Baldwin, 2007). If property affects what relations between premises are noticed and/or constrains what inductive hypotheses are generated, we expect that thinking about the distribution of a novel gene should render taxonomic relations particularly salient and thereby facilitate inferences based on taxonomic similarity. Likewise, disease was chosen as an ecologically biasing property; participants are likely to believe that disease can be transmitted along ecological lines via contact and/or contamination (e.g., Shafto, Coley, & Baldwin, 2007; Shafto, Coley, & Vitkin, 2007; Shafto et al., 2008). Thus, reasoning about a disease should highlight shared habitat or predation relations, and thereby facilitate reasoning based on extrinsic similarity or interaction. We included substance in order to replicate results of Study One. Moreover, since we observed all three types of inferences of interest among subjects reasoning
about substance in Study One, it represented a relatively ambiguous and unbiased property.
Study One revealed a surprising sensitivity to the specific nature of the ecological relation among premise species. Therefore, in Study Two we were more careful about the specific nature of the ecological relation between related pairs; we manipulated whether the pair was related via shared habitat only (e.g., koala–kangaroo), or via predation as well as habitat (e.g., lion–zebra). We also tripled the number of items used. Based on results of Study One, we expected taxonomic relations to promote taxonomic inferences and predation relations to promote causal inferences. We also expected extrinsic inferences to increase with the salience of shared habitat and decrease with the salience of predation.
We were particularly interested in potential interactions between property and premise relations. One possibility is that premise relations and property may have relatively independent effects on inferences: each may serve to render particular classes of inferences more or less likely. Participants’ naïve theories about how different properties are distributed may increase the overall likelihood of property-congruent types of inferences. For example, reasoning about disease may render causal inferences likely regardless of whether premises themselves are related via predation. Another possibility is that property may influence the way that premise relations influence inferences. For example, predation relations between premise categories may increase the likelihood of causal inferences about disease, but not about gene. These two possibilities are not mutually exclusive. In Study Two, we examine both of them.
3.2.1. Method
3.2.1.1. Research Design and Procedure
Seventy-two Northeastern University undergraduates were recruited from introductory psychology classes and participated for course credit. For this study we developed 36 pairs of animal species.
As in the previous study, pairs were either taxonomically near or far. However, we were more precise in manipulating the ecological relations among species. Specifically, pairs were either ecologically related via predation (one species typically preyed on the other), ecologically related via shared habitat (both species are typically found in the same habitat but do not prey on each other), or ecologically unrelated (species are typically found in nonoverlapping habitats³). As in Study One, ecological relatedness was orthogonal to taxonomic distance. A complete list of stimuli is presented in Table 4.
³ We also manipulated whether species were exotic or local: 18 pairs (6 from Experiment 1 and 12 new pairs) were local species (native to Massachusetts) and another 18 pairs (all new) represented exotic species (not typically occurring naturally in New England). However, we collapsed across that variable for purposes of this chapter.
Table 4 Stimulus Pairs, Study Two.
Ecological relation   Taxonomically near                  Taxonomically far
Predation             Lion/Zebra*                         Herring/Penguin*
                      Harp seal/Polar bear*               Anteater/Leaf-cutter ant*
                      Hammerhead shark/Sardine*           Lemming/Snowy owl*
                      Green frog/Water snake              Salmon/Black bear
                      Fox/Rabbit                          Brown bat/Mosquito
                      Fly/Spider                          Hawk/Field mouse
Shared habitat        Tarantula/Scorpion*                 Dolphin/Seahorse*
                      Koala bear/Kangaroo*                Elephant/Crocodile*
                      Toucan/Parrot*                      Macaw/Jaguar*
                      Porcupine/Moose                     Spotted turtle/Beaver
                      Heron/Duck                          Lobster/Tuna
                      Bee/Butterfly                       Owl/Deer
Unrelated             Gorilla/Caribou*                    Clownfish/Tiger*
                      Emu/Flamingo*                       Giraffe/Puffin*
                      Emerald tree boa/Komodo dragon*     Gecko/Peacock*
                      Hummingbird/Canada goose            Bullfrog/Chipmunk
                      Newt/Box turtle                     Raccoon/Pelican
                      Humpback whale/Squirrel             Chickadee/Salamander
Note: * denotes exotic items.
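The design behind Table 4 can be written down as a small data structure, which makes it easy to check that the 3 (ecological relation) × 2 (taxonomic distance) cells are balanced. The sketch below is only an illustration: the nested-dictionary layout, the names `STIMULI` and `cell_counts`, and the assignment of pairs to cells (read off the order in which the table lists them) are assumptions, not part of the original materials.

```python
# Stimulus pairs as read from Table 4; a trailing "*" marks an exotic item.
# Layout and names are illustrative assumptions.
STIMULI = {
    "predation": {
        "near": ["Lion/Zebra*", "Harp seal/Polar bear*", "Hammerhead shark/Sardine*",
                 "Green frog/Water snake", "Fox/Rabbit", "Fly/Spider"],
        "far": ["Herring/Penguin*", "Anteater/Leaf-cutter ant*", "Lemming/Snowy owl*",
                "Salmon/Black bear", "Brown bat/Mosquito", "Hawk/Field mouse"],
    },
    "shared habitat": {
        "near": ["Tarantula/Scorpion*", "Koala bear/Kangaroo*", "Toucan/Parrot*",
                 "Porcupine/Moose", "Heron/Duck", "Bee/Butterfly"],
        "far": ["Dolphin/Seahorse*", "Elephant/Crocodile*", "Macaw/Jaguar*",
                "Spotted turtle/Beaver", "Lobster/Tuna", "Owl/Deer"],
    },
    "unrelated": {
        "near": ["Gorilla/Caribou*", "Emu/Flamingo*", "Emerald tree boa/Komodo dragon*",
                 "Hummingbird/Canada goose", "Newt/Box turtle", "Humpback whale/Squirrel"],
        "far": ["Clownfish/Tiger*", "Giraffe/Puffin*", "Gecko/Peacock*",
                "Bullfrog/Chipmunk", "Raccoon/Pelican", "Chickadee/Salamander"],
    },
}


def cell_counts(stimuli):
    """Return {(relation, distance): (n_pairs, n_exotic)} for every design cell."""
    counts = {}
    for relation, by_distance in stimuli.items():
        for distance, pairs in by_distance.items():
            n_exotic = sum(pair.endswith("*") for pair in pairs)
            counts[(relation, distance)] = (len(pairs), n_exotic)
    return counts


counts = cell_counts(STIMULI)
total_pairs = sum(n for n, _ in counts.values())
total_exotic = sum(e for _, e in counts.values())
print(total_pairs, total_exotic)
```

Under this reading, the totals agree with the 36 pairs and 18 exotic pairs described in the text and footnote.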
In addition to manipulating relations among premise species, we also manipulated the property that participants were asked to reason about. As discussed above, participants reasoned about novel substances, genes, or diseases. In sum, each participant reasoned about a single property and generated inferences from 18 local or exotic pairs of animals. Taxonomic distance and ecological relation were manipulated within-subjects. The instructions and procedure were the same as in Study One, except that in the disease condition, the animal pairs were “discovered to have a certain disease” rather than “a specific, naturally occurring substance in their bloodstreams” and in the gene condition they were “discovered to have a certain gene.” Following the inference task, participants completed a belief posttest as in Study One.
3.2.1.2. Coding
The coding procedure was identical to that used in Study One. Consensus (defined as agreement between N-1 coders) was reached on over 93% of codes. Disagreements were resolved by discussion. As in
Study One, the most common inferences were those based on category membership (M = 31%), similar habitat (M = 29%), and food chain interaction (M = 15%). Means for all inference types in all three property conditions are given in Table 3. We again collapsed the initial coding categories into those that reflect taxonomic, extrinsic, and causal inferences. Together, these coding categories accounted for 91% of all responses.
3.2.2. Results and Discussion
3.2.2.1. Relative Frequency of Inferences
Again, undergraduates used a wide range of knowledge to generate inferences about how novel properties might be distributed among animal species. Across all conditions, 58% of inferences were taxonomic (e.g., an inference from hummingbird/Canada goose to “Quail, all geese types, because it seems likely that since hummingbirds and Canada geese have it all bird relatives might have substance P in their bloodstream”). Likewise, 35% of inferences were extrinsic (e.g., one participant projected a substance from green frog/water snake to “eels, tadpoles, alligators, crocodiles: they all for the most part dwell in water”). Finally, 23% of inferences were causal (e.g., projecting “disease E” from lobster/tuna to “clams, seaweed, or algae, because lobster could eat the clams and that could suggest the whole water species has it or it could’ve been something the tuna ate, i.e., seaweed/algae that infected it and then once the lobster ate the infected tuna, it too became ‘E positive’”). These differences were significant via one-way ANOVA (F(2,138) = 50.12, p < 0.0001, ηp² = 0.42). These results again demonstrate the spontaneous use of knowledge about taxonomic relatedness, spatiotemporal contiguity, and causal transmission to guide inductive reasoning. Moreover, the absolute means were remarkably similar across the two studies, providing some confidence that the distribution of inferences observed in Study One was not due solely to idiosyncrasies of premise pairs.
3.2.2.2. Property Effects
A central question in Study Two was the degree to which properties guide and constrain the generation of inductive inferences. To address this question, we first collapsed across all items in order to look at the overall impact of property on the frequencies of our three types of inferences. In general, we expected substance to show the same overall pattern as in Study One. In contrast, we thought disease might bias participants toward extrinsic and/or causal inferences and away from taxonomic inferences, and conversely, gene might bias participants away from extrinsic or causal inferences and toward taxonomic inferences. To test these hypotheses, we conducted a 3 (Property) × 3 (Inference Type) ANOVA. As expected, the distribution of inferences differed markedly depending on the kind of property being projected (Inference Type × Property interaction, F(4,138) = 17.32, p < 0.0001, ηp² = 0.33).
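The partial eta squared values reported alongside these F ratios can be recovered from the F statistic and its degrees of freedom, because partial eta squared equals SS_effect / (SS_effect + SS_error), which is algebraically F × df_effect / (F × df_effect + df_error). The short sketch below applies this standard identity to the two ANOVA results reported above; the function name is an illustrative choice.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# One-way ANOVA on inference type: F(2,138) = 50.12
print(round(partial_eta_squared(50.12, 2, 138), 2))  # 0.42

# Inference Type x Property interaction: F(4,138) = 17.32
print(round(partial_eta_squared(17.32, 4, 138), 2))  # 0.33
```

Both values match the effect sizes given in the text, which is a useful consistency check when reading reported statistics.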
[Plot omitted: bar chart; y-axis: relative frequency of inference (0.0 to 1.0); x-axis: property (Substance, Disease, Gene); separate bars for taxonomic, extrinsic, and causal inferences.]
Figure 2 Mean relative frequency of taxonomic, extrinsic, and causal inferences in each property condition, Study Two. Error bars represent one standard error of the mean.
We analyzed this interaction in two ways. First, we looked at the relative distribution of inferences in each property condition, as depicted in Figure 2. Second, to examine the degree to which reasoning about disease and gene changed the frequency of different inferences relative to the neutral substance condition, we compared the frequency of each type of inference for people reasoning about disease and gene to the frequency of the inference among people reasoning about substance. Difference scores are presented in Figure 3.
3.2.2.2.1. Substance
The distribution of inferences about substance replicated Study One; taxonomic inferences were marginally more frequent than extrinsic inferences, and both were more frequent than causal inferences, which were nevertheless fairly prevalent (F(2,46) = 13.08, p < 0.0001, ηp² = 0.36). However, taxonomic inferences were not more
[Plot omitted: bar chart; y-axis: change in relative frequency from substance (−0.3 to 0.3); x-axis: type of inference (Taxonomic, Extrinsic, Causal); separate bars for Disease and Gene.]
Figure 3 Mean change in relative frequency of inferences from substance condition, Study Two. Error bars represent one standard error of the mean.
frequent than nontaxonomic inferences (extrinsic and causal inferences taken together; 0.58 vs. 0.55, t(23) = 0.40, p = 0.691), again replicating Study One.
3.2.2.2.2. Disease
As expected, reasoning about disease led to a marked 22% increase in causal inferences relative to the substance condition (t(46) = 3.06, p = 0.004, Cohen’s d = 0.89) and a 10% decrease in taxonomic inferences (t(46) = 1.65, p < 0.11, Cohen’s d = 0.48; see Figure 3).⁴ The small decrease in extrinsic inferences did not approach significance. As a result, when compared to each other, the relative frequency of the three inference types in the disease condition did not differ (F(2,46) < 1; see Figure 2). However, the relative frequency of nontaxonomic inferences considered together was significantly higher than the frequency of taxonomic inferences (0.67 vs. 0.47, t(23) = 2.41, p = 0.024, Cohen’s d = 0.84). This pattern of inferences suggests that to some degree, reasoning about disease selectively supported inferences based on causal relations at a cost to inferences based on taxonomic similarity. However, it is clear that all three types of inferences were deemed relevant for reasoning about disease. In support of this view, responses in the disease condition were more likely to receive multiple codes than in the other two conditions (F(2,69) = 6.17, p = 0.003, ηp² = 0.15). This suggests that individual inferences about disease were more complex than inferences about substance or gene. In sum, although inferences generated about disease were more causal and less taxonomic than inferences generated about substance, they were by no means exclusively causal or extrinsic, and thus represented an increase in multidimensional reasoning.
⁴ Although the t-test by subjects only approached statistical significance, the t-test by items was highly significant (t(35) = 3.37, p < 0.002) and a sign test revealed that for 28/36 items, taxonomic similarity-based inferences were equally or less common for disease than for substance (p = 0.007). Together, these results plus the moderate size of the effect give us confidence in the reliability of the difference.
3.2.2.2.3. Gene
In contrast, gene was a strongly taxonomically biasing property, as we expected it to be. Compared to the substance condition, taxonomic inferences increased dramatically for people reasoning about genes, whereas extrinsic and causal inferences both decreased markedly (t(46) > 3.44, p ≤ 0.001, Cohen’s d ≥ 1.00; see Figure 3). Consequently, as seen in Figure 2, taxonomic similarity inferences were much more frequent than extrinsic similarity inferences, which in turn were much more frequent than causal inferences (F(2,46) = 141.35, p < 0.0001, ηp² = 0.86). Not surprisingly, taxonomic inferences were also more frequent than nontaxonomic inferences (0.75 vs. 0.25, t(23) = 9.03, p < 0.0001, Cohen’s d = 3.11). Thus, in both an absolute sense and a relative sense, reasoning about genes greatly increased the likelihood of generating taxonomic inferences.
Interestingly, a close look at Table 3 reveals that the increase in taxonomic reasoning about genes was not due to an increase in category-based inferences, which actually decreased in frequency (albeit not reliably). Rather, the increase in taxonomic reasoning stemmed from an increase in inferences based on perceptual similarity (t(46) = 2.88, p = 0.006, Cohen’s d = 0.83) and behavioral similarity (t(46) > 4.61, p < 0.0001, Cohen’s d = 1.33) relative to substance. This suggests that rather than simply falling back on category membership, participants may have attempted to connect the hypothetical gene with specific perceptual or behavioral attributes of premise species, and then base projections on those specific attributes. For example, one participant projected a gene from humpback whale/squirrel to “opossum, mole, gray mouse, dolphin: they are all gray in color” and from raccoon/pelican to “squirrel, seagull, pigeon: these are animals that rummage through things.” This raises the interesting possibility that when people think about genes, they give more weight to their potential to give rise to certain observable characteristics than to their general association with a taxonomic class. In other words, people in this task seemed to be projecting “gray color genes” or “rummaging genes” rather than “mammal genes” or “bird genes.”
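The between-condition comparisons in this section (a change in relative frequency from the substance condition, accompanied by Cohen's d) can be sketched as follows. The chapter does not say how d was computed, so the pooled-standard-deviation formula here is an assumption, as are the function names and the data values. The group size of 24 per condition is also an inference, based on 72 participants divided among three property conditions.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def d_from_t(t, n_a, n_b):
    """Equivalent effect size recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n_a + 1 / n_b)


# Hypothetical per-participant frequencies of causal inferences (not the study's data).
disease = [0.5, 0.4, 0.6, 0.3, 0.5]
substance = [0.2, 0.3, 0.1, 0.3, 0.2]

change = mean(disease) - mean(substance)  # difference score relative to substance
d = cohens_d(disease, substance)
print(f"change = {change:.2f}, d = {d:.2f}")

# Assuming 24 participants per condition, t(46) = 3.06 corresponds to d of about 0.88.
print(round(d_from_t(3.06, 24, 24), 2))
```

Under the equal-group-size assumption, the value recovered from t(46) = 3.06 is close to the reported d = 0.89, with the small gap plausibly reflecting rounding of t.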
In sum, property had large effects on the relative frequency with which different inferences were generated. Inferences about substance mirrored those of Study One. Inferences about disease were more complex and multidimensional than for other properties; relative to substance, inferences about disease were more likely to be causal, and less likely to be taxonomic, although all three types of inferences were seen as equally appropriate. In contrast, inferences about genes were strongly biased toward taxonomic similarity. We next examine effects of premise relations on inference generation, and in particular, the degree to which premise relations and property interact in constraining inference generation.
3.2.2.3. Effects of Premise Relations
One motivation for conducting Study Two was to be more careful in our manipulations of ecological relations among premise species. As such, we strove to choose pairs that were related via predation and shared habitat, pairs that were related via shared habitat only, and unrelated pairs. Results of posttests suggested that although participants viewed the relatedness of the premise pairs in the manner in which we intended, individual variability in salience of relations both within and between our planned item classes was again larger than we anticipated. Therefore, as in Study One, we decided to trust our participants’ beliefs about premise relatedness rather than our a priori expectations, and to construe the salience of taxonomic, habitat, and predation relations between each premise pair as continuous variables (ranging from weak to strong, based on participants’ ratings) rather than as categorical variables (present/absent) as originally conceived. We present multiple regression analyses comparable to those in Study One (using salience of shared habitat, predation, and taxonomic relations to predict item-wise frequency of each type of inference) rather than ANOVA.
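The item-wise regression approach described above can be sketched in a few lines. The sketch assumes the arcsine square-root form of the arcsine transform (Study One mentions arcsine transforms without giving the exact form) and obtains standardized coefficients by solving the normal equations on z-scored variables. All function names and data values are hypothetical, chosen only to show the shape of the computation.

```python
import math
from statistics import mean, pstdev


def arcsine(p):
    """Arcsine square-root transform of a proportion in [0, 1] (assumed form)."""
    return math.asin(math.sqrt(p))


def zscores(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_betas(predictors, outcome):
    """Return (standardized coefficients, R squared) for outcome regressed on predictors."""
    zx = [zscores(column) for column in predictors]
    zy = zscores(outcome)
    n = len(outcome)

    def corr(u, v):
        return sum(a * b for a, b in zip(u, v)) / n

    r_xx = [[corr(u, v) for v in zx] for u in zx]  # predictor intercorrelations
    r_xy = [corr(u, zy) for u in zx]               # predictor-outcome correlations
    betas = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return betas, r_squared


# Hypothetical item-level scores for six items (not the study's data).
taxonomic_salience = [0.9, 0.8, 0.2, 0.1, 0.5, 0.4]
habitat_salience = [0.1, 0.7, 0.8, 0.2, 0.6, 0.3]
predation_salience = [0.0, 0.6, 0.1, 0.5, 0.2, 0.7]
taxonomic_inferences = [0.8, 0.7, 0.3, 0.2, 0.5, 0.5]

predictors = [[arcsine(p) for p in column]
              for column in (taxonomic_salience, habitat_salience, predation_salience)]
outcome = [arcsine(p) for p in taxonomic_inferences]
betas, r_squared = standardized_betas(predictors, outcome)
print([round(b, 2) for b in betas], round(r_squared, 2))
```

Each coefficient expresses the change in the (transformed) inference frequency, in standard deviation units, associated with a one standard deviation change in the salience of one relation while the other two are held constant. This is the quantity plotted in the chapter's regression figures.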
First, we averaged across property conditions to get an overall picture of how premise relations predicted inferences. Based on the results of Study One, we expected different inferences to be sensitive to different premise relations; of interest was whether Study Two replicated the specific relations between premises and inferences we observed in Study One. Results of this analysis are presented in Figure 4. Two things are notable in Figure 4. First, the way in which premise relations facilitated inferences was identical to what we observed in Study One. Second, unlike in Study One, the salience of shared habitat rendered taxonomic inferences less likely. Specifically, the frequency of taxonomic inferences increased with taxonomic salience, decreased with the salience of shared habitat, but was unrelated to the salience of predation relations (R² = 0.68, p < 0.0001). In contrast, extrinsic inferences were positively related to the salience of shared habitat, but unrelated to taxonomic salience (R² = 0.31, p = 0.007). The negative relation between the salience of predatory relations and the frequency of inferences based on
[Plot omitted: bar chart; y-axis: standardized regression coefficient (−1.0 to 1.0); x-axis: inference type (Taxonomic, Extrinsic, Causal); separate bars for same biological family, same habitat, and one eats the other; asterisks mark significant coefficients.]
Figure 4 Relations between salience of premise relations and frequency of taxonomic, extrinsic, and causal inferences averaged across property conditions, Study Two. (Note: **p < 0.005, ***p < 0.0005.)
extrinsic similarity observed in Study One was marginally significant overall; as we shall see, this particular relation varied by property. Finally, causal inferences were positively related to salience of predatory relations, but unrelated to salience of taxonomic relations or shared habitat (R² = 0.69, p < 0.0001) (as discussed below, this relation also varied somewhat with property).
In sum, as in Study One, we observed a tight coupling between the salience of relations among premise categories and inferences drawn from those categories. Taxonomic relations promoted taxonomic inferences, shared habitat promoted extrinsic inferences, and predation relations promoted causal inferences. It is notable that, unlike in Study One, the salience of shared habitat strongly inhibited taxonomic inferences, suggesting that in the presence of a salient alternative relation, the appeal of taxonomic inferences faded. One possible explanation is that taxonomic inferences serve as a default, and when people notice a salient habitat relation they may tend to believe that this is
what was being specifically ‘‘communicated’’ to them by this premise pair (according to the relevance theory of Medin et al., 2003), rendering them less likely to make a default taxonomic inference. Alternatively, the presence of salient habitat relations may have led participants to develop alternative contextual hypotheses that reduced the strength of taxonomic hypotheses, consistent with findings of McDonald et al. (1996). In either case, since we do not see consistent reciprocal effects of taxonomic relations on other types of inferences, we can speculate that people may have an internal ‘‘relevance ranking’’ of different relations, with contextual relations ranked fairly high.
3.2.2.4. Does Property Influence how Premise Relations Generate Inferences?
So far, results show clear effects of property and of premise relations on generation of inductive inferences. However, we were particularly interested in whether these effects were independent of each other, or whether the way premise relations led participants to generate inferences varied by property. To examine this question, we performed separate multiple regressions on item-wise salience and inference scores for each property condition. Standardized regression coefficients are presented in Figure 5; below we discuss results for each type of inference in turn.
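The per-condition analysis described above can be illustrated with a minimal sketch. It assumes one row per premise pair, three salience predictors (taxonomic, habitat, predation), and one inference-frequency outcome. The function name, the number of items, and all data values are hypothetical; the chapter does not report which software was used, and the numbers below are synthetic, not the study's data.

```python
import numpy as np

def standardized_betas(X, y):
    """Return standardized regression coefficients and R^2 for y regressed on X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score every predictor column and the outcome (sample SD, ddof=1)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # All variables are centered, so the intercept is zero and can be omitted
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    residuals = yz - Xz @ betas
    r_squared = 1.0 - (residuals @ residuals) / (yz @ yz)
    return betas, r_squared

# Synthetic illustration only (hypothetical item count and effect sizes)
rng = np.random.default_rng(0)
n_items = 30
salience = rng.uniform(0, 1, size=(n_items, 3))  # columns: taxonomic, habitat, predation
# Made-up outcome: rises with taxonomic salience, falls with habitat salience
taxonomic_freq = (0.5 + 0.4 * salience[:, 0] - 0.3 * salience[:, 1]
                  + rng.normal(0, 0.05, n_items))

betas, r2 = standardized_betas(salience, taxonomic_freq)
print("standardized coefficients:", np.round(betas, 2))
print("R^2:", round(float(r2), 2))
```

Running the same function once per property condition (substance, disease, gene) would yield one set of three coefficients per condition, which is the layout plotted in Figure 5.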
3.2.2.4.1. Taxonomic inferences
As seen in Figure 5, taxonomic inferences increased with salience of taxonomic relations between premise categories, decreased with the salience of shared habitat, and were unaffected by the salience of predation relations in all three property conditions (Substance: R² = 0.48, p < 0.0001; Disease: R² = 0.55, p < 0.0001; Gene: R² = 0.32, p = 0.005). This suggests that the property being projected had little influence on the way in which premise relations licensed taxonomic inferences. Although the absolute level of taxonomic inferences varied from 74% for gene to 45% for disease, in all cases, salient taxonomic relations among premises facilitated the generation of taxonomic inferences, whereas salience of shared habitat inhibited them. Thus, property and premise relations exerted independent effects on taxonomic inferences.
3.2.2.4.2. Extrinsic inferences
Extrinsic inferences were more weakly predicted by premise relations, and the nature of the relationship varied by property. As depicted in Figure 5, for those reasoning about substance, frequency of extrinsic inferences increased with the salience of shared habitat, but was unrelated to the salience of taxonomic and predation relations (R² = 0.31, p = 0.007), whereas for gene, extrinsic inferences increased with the salience of shared habitat, and decreased with the salience of both taxonomic and predation relations (R² = 0.35, p = 0.003). This pattern suggests that, unlike for taxonomic inferences, property changed the way premise relations promoted extrinsic inferences. While any detailed explanation of this pattern of results would be speculation, results clearly
demonstrate the interplay of background knowledge about distribution of properties on the one hand, and salient relations among premise categories on the other.
[Figure 5: three bar-chart panels (Taxonomic inferences, Extrinsic inferences, Causal inferences). Y-axis: standardized regression coefficient (-1.0 to 1.0). X-axis: property (Substance, Disease, Gene). Bars: same biological family, same habitat, one eats the other.]
Figure 5 Relations between salience of premise relations and frequency of taxonomic, extrinsic, and causal inferences in each property condition, Study Two. (Note: *p < 0.05, **p < 0.005, ***p < 0.0005.)
In contrast, for disease, frequency of extrinsic inferences was unrelated to any premise relations (R² = 0.08, p = 0.422). However, it is important to point out that even though extrinsic inferences about disease were not predicted by premise relations, their frequency was nevertheless relatively high. Thus, disease appears to independently promote extrinsic inferences. Such a pattern could be due to participants relying on a general theory, or overhypothesis (Goodman, 1955), stating that diseases are distributed via
spatial or contextual relations, which would make extrinsic inferences appealing regardless of premise relations.
3.2.2.4.3. Causal inferences
As seen in Figure 5, the way in which premise relations predicted causal inferences also varied by property, but less so. In all property conditions, generation of causal inferences increased with the salience of predation relations, and was unrelated to salience of shared habitat. Additionally, for participants reasoning about disease (but not substance or gene), causal inferences decreased with the salience of taxonomic relations (Substance: R² = 0.52, p < 0.0001; Disease: R² = 0.68, p < 0.0001; Gene: R² = 0.19, p = 0.075). In sum, causal reasoning was consistently promoted by salience of predation relations between premise categories, but unrelated to salience of shared habitat. This suggests that contextual similarity was necessary but not sufficient to promote causal inferences, which were rendered particularly tempting when participants were reminded of predator–prey interactions among premise species. This reminding may have provided a salient causal mechanism to explain a shared property. Even for those reasoning about genes, despite the relative dearth of causal inferences (3%), such inferences were still positively predicted by the salience of predation relations among premise species. Although effects of property on the kinds of knowledge recruited to guide causal inferences were not dramatic, they confirm that the nature of the property can influence the way premise relations are used to guide inference generation.
3.3. Summary: Effects of Property on Inference Generation
Results of Study Two show that property influenced inference generation at two levels. First, naïve theories about the nature of the properties affected the relative frequency with which participants generated taxonomic, extrinsic, and causal inferences. Reasoning about substance replicated Study One, whereas reasoning about genes strongly biased participants toward taxonomic inferences, and reasoning about disease promoted causal reasoning, but also resulted in a more complex and multidimensional inference pattern. Second, property influenced both the degree to which relations are recruited to guide inferences and the quality of the effects of premise relations on inferences, creating a property-specific facilitation/inhibition profile. In addition, for extrinsic and causal inferences, the effects of premise relations varied by property, whereas for taxonomic inferences, they did not. Finally, Study Two also replicated the overall distribution of inferences, and the effects of premise relations on inference generation, from Study One. Salient taxonomic relations increased taxonomic inferences, salient habitat relations increased extrinsic inferences, and salient predation
relations increased causal inferences. The one departure from Study One was the finding that salience of shared habitat consistently inhibited taxonomic inferences.
4. Inference Generation: Conclusions and Implications
In two experiments utilizing a novel open-ended induction task, we have demonstrated that salient relations among premise categories, and the nature of the property being projected, both guide and constrain the ways in which people generate inductive inferences about novel properties of animals. In this section, we summarize our main findings about the process of inference generation and discuss possible implications for the broader study of inductive reasoning.
4.1. What Have We Learned About Inference Generation?
In contrast to traditional methods used in the study of inductive inference, which require participants to evaluate the strength of inductive arguments, participants in our open-ended induction task generated their own inferences from the premise categories and properties we supplied. This approach encouraged them to generate a variety of inferences. Not surprisingly, taxonomic inferences, based on common category membership or shared intrinsic features, were generated most frequently (e.g., an inference from lemming/snowy owl to ‘‘other species of owl and similar species of lemming because of biological similarities between similar animals’’ or from tiger/clownfish to ‘‘a zebra because clownfish and tigers both have stripes. A zebra also has stripes’’). Extrinsic inferences, based on shared situational or contextual features, were also quite common (e.g., an inference from lobster/tuna to ‘‘crabs, catfish, salmon, oysters, shrimp, because they all live in similar environmental conditions’’). Perhaps most striking was the finding that 20% of inferences generated by participants were based on causal relatedness or interaction (e.g., projecting a substance from salmon/black bear to ‘‘other bears and fish, because the bear might get substance A in their bloodstream by eating salmon, which also has substance A. So any other animal that eats salmon would probably have it also’’ or projecting a disease from ant/anteater to ‘‘birds because the disease may come from the ants themselves. By eating them the anteater got the disease, as would birds’’). Clearly, a broad range of knowledge is used in the process of generating inductive inferences. Moreover, the type of knowledge used to generate inferences varied systematically with the specifics of each inductive problem. Salient relations
among premise categories had a pronounced effect on the nature of inferences generated from those categories. Participants often explicitly referred to relations among premise species to explain their inferences. For example, one participant projected a substance from humpback whale/squirrel to ‘‘other mammals because whales and squirrels are both mammals.’’ Another projected a substance from owl/deer to ‘‘rabbit because all are found in woods.’’ A third projected a substance from elephant/crocodile to ‘‘rhino, hippo, alligator because all have tough, thick skins. Maybe substance E has to do with producing leathery skin.’’ Even more telling was the fact that many participants found themselves at a loss to generate an inference from an unrelated premise pair. The response of one participant, when confronted with the bullfrog/chipmunk pair, was typical: ‘‘No clue. I can’t think of a relationship between the two.’’ Indeed, the links between premise relations and inferences were quite specific. The salience of taxonomic relatedness consistently predicted taxonomic inferences, the salience of shared habitat consistently predicted extrinsic inferences, and the salience of predation relations consistently predicted causal inferences. Premise relations also had inhibitory effects. Most strikingly, salience of shared habitat reliably (in Study Two, at least) inhibited taxonomic inferences. In addition to premise relations, property also had a large effect on the inferences participants generated. One way in which property influenced inference generation was to invoke naïve theories about how kinds of properties are likely to be distributed or transmitted. Substance served as a more or less neutral property; taxonomic and nontaxonomic inferences about substances were equally frequent (although more specifically, taxonomic inferences were more frequent than extrinsic inferences, which were more frequent than causal inferences).
Compared to substance, participants reasoning about novel genes were biased in the direction of taxonomic inferences, whereas those reasoning about novel diseases were biased in the direction of causal inferences. Even more strikingly, property influenced what relations among premise categories were seen as relevant. To illustrate, in the gene condition participants responding to the lion/zebra item tended to generate taxonomic inferences like ‘‘tiger, gazelle, horse, because they all have 4 legs, with similar features,’’ and ‘‘tigers and giraffes, because tigers and lions are similar animals, and zebras and giraffes are similar animals.’’ In contrast, in the disease condition participants tended to generate causal inferences from the same pair, like ‘‘hyenas, and lion prey, because lion could have gotten the disease from eating the zebra and spread it to any other animal it came in contact with,’’ and ‘‘Tigers/scavengers that eat zebras because zebras may carry the disease.’’ Thus, not only did different properties engender different inferences from the very same premise pair, but they also rendered different relations among the premise categories salient. Reasoning about genes rendered taxonomic knowledge salient because of what we believe about
genes and how they work; therefore, what seemed most relevant about lions and zebras is that both are quadrupedal mammals. In contrast, reasoning about disease rendered knowledge of spatiotemporal interactions salient because of what we believe about diseases and how they work; therefore, what seemed most salient about lions and zebras is the fact that lions eat zebras. One final and striking finding was the frequency with which people generated vague inferences (e.g., one participant projected a substance from leaf-cutter ant/anteater to ‘‘an animal that is a predator to an anteater. I can’t think of any ‘cause I’m not an animal expert. Anteater eats ants, and they both have this substance. So I assume whatever eats an anteater will have it too or receive it by eating it.’’) Inferences like this were quite common and reinforce the idea that people can generate sophisticated and subtle inferences based on framework theories, often despite the lack of specific knowledge. Indeed, this pattern of response is strongly reminiscent of the idea of overhypotheses. Goodman (1955) suggested that people possess abstract beliefs describing the scope of properties, and that these beliefs could constrain possible hypotheses about how properties could be projected. When one of our participants projected a novel gene from leafcutter ant/anteater to ‘‘other animals in the same family as the anteater and leaf cutter ant, because related animals have similar genes,’’ they unwittingly exemplified this idea perfectly. In sum, our results suggest that people generate inductive inferences by extracting salient relations from premise categories in light of what they understand about the property being projected, and then drawing inferences consistent with those relations. This process emphasizes the degree to which categorical induction is both flexible and knowledge-driven. We next consider the broader implications of these findings.
4.2. Implications
Taken together, these results show that salient relations derived from comparison of premise categories, in concert with knowledge activated by the property being projected, provide important constraints on the generation of inductive inferences. In some sense, these results should be reassuring in that they reinforce findings that have emerged from the use of argument evaluation. We knew that property influenced how people evaluate arguments (e.g., Heit & Rubinstein, 1994; Kalish & Gelman, 1992; Ross & Murphy, 1999; Shafto & Coley, 2003; Shafto, Coley, & Baldwin, 2007), and now we know it also influences how they generate inferences. We knew that premise relations had an impact on argument evaluation (McDonald et al., 1996; Medin et al., 2003), and now we know they also have an impact on inference generation. In other words, the picture of inductive reasoning that emerges from considering inference generation in
addition to argument evaluation seems to be a coherent one. However, we believe that our perspective has also highlighted aspects of inductive reasoning that might otherwise have remained in the shadows.
4.2.1. Salience of Taxonomy in Category-Based Induction
Traditional accounts have emphasized the role of taxonomic similarity in evaluating category-based inductive arguments. In contrast, our results clearly show that when generating inferences, participants spontaneously appealed to extrinsic similarity and causal relatedness as often as taxonomic similarity. In particular, the prevalence of causal reasoning in these experiments is surprising given previous research showing such reasoning is common among experts, but rare among folk biological novices like the undergraduates who participated in these experiments (e.g., Coley, Shafto, et al., 2005; Coley, Vitkin, et al., 2005; Coley et al., 1999). Past research utilizing argument evaluation has shown that experts tend to flexibly utilize knowledge of taxonomic, extrinsic, and causal relations, whereas novices are strongly biased toward taxonomic inferences (e.g., López et al., 1997; Shafto & Coley, 2003). As discussed above, forced-choice or argument-evaluation tasks require participants to recognize relations between given premise and conclusion categories. In contrast, our task allowed participants to generate their own inferences, and the way we coded responses gave participants credit for the knowledge underlying their inferences, even if it was vague (e.g., projecting a disease from parrot/toucan to ‘‘other birds that live in the tropical climates’’ or from newt/box turtle to ‘‘other creatures that eat newts and box turtles...’’) or factually incorrect (e.g., projecting a substance from snowy owl/lemming to ‘‘all owls because lemmings and snowy owls are both owls’’ or a disease from penguin/herring to ‘‘ostriches and guinea hens, because ostriches and penguins both can’t fly, and I’m not sure what a herring is but I think it might be related to a guinea hen’’). Thus, despite the lack of specific factual knowledge about tropical birds, what might eat a box turtle, or what a lemming is, this format enabled participants to nevertheless generate and articulate relatively sophisticated causal and extrinsic inferences. This suggests that the relative paucity of ecological and causal reasoning among folk-biologically naïve participants in previous research may be due in part to the fact that they were being asked to recognize such relations, rather than generate them. Besides potentially taking taxonomic inferences down a peg or two, our results also have implications for the salience of taxonomic relations. A number of studies have shown that taxonomic knowledge dominates other conceptual relations in terms of salience, speed of access (e.g., Ross & Murphy, 1999; Vitkin et al., 2005), and use in guiding inductive inferences (e.g., Shafto, Coley, & Baldwin, 2007). In contrast, our results provide little evidence that taxonomic relations between premise categories are privileged in terms of their impact on inference generation. Indeed, if
anything, we observed the opposite; the presence of salient ecological relations among premises was more likely to suppress taxonomic reasoning than vice versa. We have several thoughts on these findings. First, because they were generating their own inferences, rather than evaluating our best guesses as to what they deem plausible arguments, participants were not constrained by lack of specific knowledge (nor, indeed, by facts or reality). As such, informationally vague yet causally sophisticated inferences, which would not be detectable in an argument evaluation paradigm, were relatively common. Second, because our task did not involve any time pressure or speeded responding, and was in fact deliberately reflective in that participants were asked to explain their inferences as well as generate them, baseline differences in knowledge accessibility (Shafto, Coley, & Baldwin, 2007; Shafto, Coley, & Vitkin, 2007) were probably not a factor. In other words, the results of tasks involving time pressure suggest that taxonomic knowledge might be initially more accessible, but our results suggest that given sufficient time, other knowledge is readily recruited to guide inductive inferences. Third, inference generation may involve stronger differentiation between specific kinds of relatedness than argument evaluation does. Beyond assessing whether the premises are sufficiently related in a general way consistent with the projected property (as required for argument evaluation), our participants had to generate novel hypotheses and then articulate the relationships between premises and their hypotheses. Internally labeling taxonomic and ecological relations among premises as such might promote discounting of irrelevant relations and focus attention on more relevant relations.
If taxonomic relations are highly salient, yet on some occasions they are viewed as irrelevant for projection, such a selective approach would diminish the effect of taxonomic relatedness on inference generation compared to argument evaluation. Finally, it may also be that, more generally, taxonomic and relational categories have differing cognitive functions. Ross and Murphy (1999) point out that in the domain of food, taxonomic categories—based on intrinsic properties—are useful for categorization and identification, whereas script categories—based on habitual co-occurrence in space and time—are useful for generating solutions to problems like ‘‘what should I have for breakfast?’’ Likewise, in folk biology, relational categories like pond animals, or even noncategorical relations like predator–prey, may be especially useful for generating solutions to problems like ‘‘what other species are likely to have this substance/disease’’ because they embody relations seen as causally relevant for explaining how such properties could come to be shared among species. As such, by focusing on inference generation we may have tapped into precisely the kind of cognitive task that such categories are most useful for.
4.2.2. Challenges for Models of Category-Based Induction
4.2.2.1. What Needs to be Explained?
Our findings expand the range of inductive phenomena that any successful theory of inductive reasoning needs to explain. First of all, any successful model has to incorporate a variety of potential bases for inductive inference; at the very least, these must include both taxonomic and extrinsic similarity, and causal relations, but we make no claim about whether this list is exhaustive.⁵ We emphatically reinforce the point (made elsewhere, e.g., Coley, Shafto, et al., 2005; Coley, Vitkin, et al., 2005; Lassaline, 1996; Medin et al., 2003) that similarity alone, no matter how flexibly conceived, cannot be sufficient to explain inductive reasoning. Second, any theory of inductive reasoning must take into account the fact that the kinds of knowledge used to generate an inference depend critically on the property being projected and on salient relations among premise categories. We think that the kind of models being developed by Tenenbaum and colleagues (e.g., Griffiths & Tenenbaum, 2005; Shafto et al., 2008; Tenenbaum et al., 2006), which rely on property to indicate which knowledge structure might be most relevant for assessing a given argument, are a step in the right direction. However, our results suggest that not only does property influence the kinds of knowledge upon which participants base their inferences, but it also influences their interpretation of relations among premise categories, and the way in which those relations influence inferences. Any successful theory of inductive reasoning must take into account this interplay between domain knowledge, beliefs about premise relatedness, and beliefs about the property. Finally, any successful theory must take into account the fact that inferences vary widely in their specificity.
This is reminiscent of Keil’s (Keil, 2003; Rozenblit & Keil, 2002) proposals about the ‘‘illusion of explanatory depth’’ in the sense that participants probably do not have a detailed understanding about mechanisms of epidemiology or genetics, or about specifics of food webs, but that relatively abstract and cursory framework principles can nevertheless effectively guide inductive reasoning. Likewise, this finding fits with Coley and colleagues’ work on hierarchical induction (Coley et al., 1997, 2004). Although they investigated a very different issue, namely the degree to which knowledge of concepts at different levels of abstraction corresponded to the relative strength of inferences to those concepts, they found that the level at which participants expected category members to share novel properties differed from participants’ knowledge of actual properties shared by category members. Coley et al. (2004) conclude that ‘‘inductive inference is driven by
expectations about conceptual structure that go beyond what is known about particular category members’’ (p. 249). Our findings on inference generation support that conclusion.
⁵ It may well be exhaustive in the context of folk biological inductive reasoning, but for other domains such as reasoning about artifacts or about social categories, no doubt other kinds of inferences might be generated.
4.2.2.2. Focus on Process
By focusing on how people generate inductive inferences, that is, how they use what they know to make sensible guesses about what they do not know, we hope to direct some attention to the little-studied but critical issue of process in inductive reasoning. In most previous work on inductive inference, the target behavior has been evaluation of a complete argument or choice from among a limited set of alternatives. As such, the questions to be explained (and therefore the natural and appropriate goals of empirical and theoretical investigation) have concerned factors that predict argument strength ratings or choices. These have tended to focus on characteristics of arguments (and implicitly or explicitly, the interactions of these characteristics with the knowledge of the reasoner) that render them strong or weak (e.g., McDonald et al., 1996; Medin et al., 2003; Osherson et al., 1990; Sloman, 1993). There is no reason why studies of argument evaluation cannot in principle examine process issues; indeed, a few have done so. For instance, Shafto, Coley, and Baldwin (2007) showed that time pressure lowers strength ratings for inductive arguments based on extrinsic relations, but has no effect on arguments based on taxonomic relations. Likewise, Feeney et al. (2010) have shown that premise reading times are related to the changes in argument strength brought about by those premises; larger changes in argument strength are associated with longer reading times, and presumably deeper processing. Rather, the focus on process inherent in the inference generation approach is more a difference in emphasis.
When the target behavior is inference generation, rather than argument evaluation, the questions to be explained concern what inferences are generated under different conditions, why they are generated, and how they are generated. These questions naturally focus attention on the characteristics of the process of inductive inference. We have assumed that the process of inference generation involves accessing knowledge about premise categories and the property being projected, making decisions about what knowledge is relevant, and then generating an actual response. Clearly, there are many processes involved in even this cursory description. These include searching semantic memory for relevant knowledge, comparing premise categories for salient relations, accessing explanatory theories about the nature of the property, and potentially searching for relevant conclusion categories once a basis for inference has been determined, to name just a few. At the moment, we lack answers about the role of any of these processes in inference generation. We do not, however, lack questions. For example,
what is the mechanism by which properties constrain inference generation? Do they focus the search at a relatively early point and thereby limit the candidate inferences that are evaluated? Or do they serve mainly to cull an exhaustive list of possible inferences generated via premise comparison down to a few likely candidates? We hope that our initial look at inference generation prepares the ground and invites further work examining the processes that underlie flexible inductive reasoning.
4.3. Conclusions
At the risk of repeating ourselves, we cannot and should not base a psychology of inductive reasoning solely on studies of argument evaluation. In the hope of putting ‘‘reasoning’’ back into the study of category-based induction, we have presented an initial look at how people generate inductive inferences. Our results show that salient relations derived from comparison of premise categories, in concert with knowledge about the property being projected, provide important constraints on the generation of inductive inferences. We have also shown that such inferences vary widely in their specificity, and make contact with a broad range of real-world knowledge. In taking this approach, we hope to draw attention to the process of inductive reasoning as well as the outcome, and to emphasize the knowledge-driven and creative nature of human inductive inference.
ACKNOWLEDGMENTS
This chapter is based upon work supported by the National Science Foundation under Grant No. 0236338. We are indebted to Anna Vitkin and Allison Baker for their important contributions to the research reported here. We thank Brett Hayes and Gregory Murphy for careful and thoughtful comments on previous incarnations of this paper. We are especially grateful to Kaitlyn Amato, Yui Anzai, Nicole Ciampanelli, Lindsey Davis, Konstantin Feigin, Ruiwen Hu, Janelle LaMarche, Brianna Roche, Claire Seaton, Carissa Shafto, Courtney Steller, and Jennelle Yopchick for their Herculean efforts to collect and code the data reported here.
REFERENCES
Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.
Coley, J. D., Hayes, B., Lawson, C., & Moloney, M. (2004). Knowledge, expectations, and inductive inferences within conceptual hierarchies. Cognition, 90, 217–253.
Coley, J. D., Medin, D. L., & Atran, S. (1997). Does rank have its privilege? Inductive inferences within folkbiological taxonomies. Cognition, 64, 73–112.
Coley, J. D., Medin, D. L., Proffitt, J. B., Lynch, E. B., & Atran, S. (1999). Inductive reasoning in folkbiological thought. In D. L. Medin & S. Atran (Eds.), Folkbiology (pp. 205–232). Cambridge, MA: MIT Press.
Coley, J. D., Shafto, P., Stepanova, O., & Barraff, E. (2005). Knowledge and category-based induction. In W. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. Wolff (Eds.), Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin (pp. 69–85). Washington, DC: American Psychological Association.
Coley, J. D., Vitkin, A. Z., Seaton, C. E., & Yopchick, J. E. (2005). Effects of experience on relational inferences in children: The case of folk biology. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th annual conference of the Cognitive Science Society (pp. 471–475). Mahwah, NJ: Lawrence Erlbaum Associates.
Coley, J. D., Vitkin, A. Z., Vasilyeva, N. Y., & Amato, K. (2007). Experience increases flexible ecological reasoning. Paper presented at the 15th Biennial Conference of the Australasian Human Development Association. Sydney, New South Wales, Australia.
Feeney, A., Coley, J. D., & Crisp, A. (2010). The relevance theory of category-based induction: Evidence from garden path arguments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36.
Feeney, A., Shafto, P., & Dunning, D. (2007). Who is susceptible to the conjunction fallacy in category-based induction? Psychonomic Bulletin & Review, 14, 884–889.
Gelman, S. A. (2003). The essential child: Origins of essentialism in everyday thought. New York: Oxford University Press.
Gelman, S. A., & Coley, J. D. (1990). The importance of knowing dodo is a bird: Categories and inferences in 2-year-old children. Developmental Psychology, 26, 796–804.
Goodman, N. (1955). Fact, fiction, and forecast. Indianapolis, IN: Bobbs-Merrill.
Griffiths, T. L., & Tenenbaum, J. B. (2005). Structure and strength in causal induction. Cognitive Psychology, 51, 354–384.
Heibeck, T., & Markman, E. (1987). Word learning in children: An examination of fast mapping. Child Development, 58, 1021–1024.
Heit, E. (2000). Properties of inductive reasoning. Psychonomic Bulletin & Review, 7, 569–592.
Heit, E., & Feeney, A. (2005). Relations between premise similarity and inductive strength. Psychonomic Bulletin & Review, 12, 340–344.
Heit, E., & Rubinstein, J. (1994). Similarity and property effects in inductive reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 411–422.
Kalish, C. W., & Gelman, S. A. (1992). On wooden pillows: Multiple classifications and children’s category-based induction. Child Development, 75, 1871–1885.
Keil, F. C. (1981). Constraints on knowledge and cognitive development. Psychological Review, 88, 197–227.
Keil, F. C. (2003). Folkscience: Coarse interpretations of a complex reality. Trends in Cognitive Sciences, 7, 368–373.
Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10, 307–321.
Lassaline, M. E. (1996). Structural alignment in induction and similarity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 754–770.
Lin, E. L., & Murphy, G. L. (2001). Thematic relations in adults’ concepts. Journal of Experimental Psychology: General, 130, 3–28.
López, A., Atran, S., Coley, J. D., Medin, D., & Smith, E. E. (1997). The tree of life: Universal and cultural features of folkbiological taxonomies and inductions. Cognitive Psychology, 32, 251–295.
McDonald, J., Samuels, M., & Rispoli, J. (1996). A hypothesis-assessment model of categorical argument strength. Cognition, 59, 199–217.
Medin, D., Coley, J. D., Storms, G., & Hayes, B. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10, 517–532.
226
John D. Coley and Nadya Y. Vasilyeva
Muratore, T. M., & Coley, J. D. (2009). The role of knowledge in folk biological induction. In: Paper presented at the international conference on Biological Understanding and Theory of Mind. Reims, France. Nguyen, S. P., & Murphy, G. L. (2003). An apple is more than just a fruit: Crossclassification in children’s concepts. Child Development, 6, 1783–1806. Osherson, D. N., Smith, E. E., Wilkie, O., Lopez, A., & Shafir, E. (1990). Category-based induction. Psychological Review, 97, 185–200. Proffitt, J. B., Coley, J. D., & Medin, D. L. (2000). Expertise and category-based induction. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 811–828. Rips, L. J. (1975). Inductive judgments about natural categories. Journal of Verbal Learning and Verbal Behavior, 14, 665–681. Ross, B. H., & Murphy, G. L. (1999). Food for thought: Cross-classification and category organization in a complex real-world domain. Cognitive Psychology, 38, 495–553. Rozenblit, L. R., & Keil, F. C. (2002). The misunderstood limits of folk science: An illusion of explanatory depth. Cognitive Science, 26, 521–562. Shafto, P., & Coley, J. D. (2003). Development of categorization and reasoning in the natural world: Novices to experts, naı¨ve similarity to ecological knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 641–649. Shafto, P., Coley, J. D., & Baldwin, D. (2007). Effects of time pressure on context-sensitive property induction. Psychonomic Bulletin & Review, 14, 890–894. Shafto, P., Coley, J. D., & Vitkin, A. (2007). Availability in category-based induction. In A. Feeney & E. Heit (Eds.), Inductive reasoning: Experimental, developmental, and computational approaches (pp. 114–136). Cambridge University Press. Shafto, P., Kemp, C., Bonawitz, E. B., Coley, J. D., & Tenenbaum, J. B. (2008). Inductive reasoning about causally transmitted properties. Cognition, 109, 175–192. Sloman, S. A. (1993). Feature-based induction. 
Cognitive Psychology, 25, 231–280. Sloman, S. A. (1994). When explanations compete: The role of explanatory coherence on judgments of likelihood. Cognition, 52, 1–21. Sloutsky, V. M., & Fisher, A. V. (2004). Induction and categorization in young children: A similarity-based model. Journal of Experimental Psychology: General, 133, 166–188. Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10, 309–318. Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207–232. Vitkin, A., Coley, J. D., & Feigin, K. (2005). Accessibility of taxonomic and script knowledge in the domain of food. Paper presented at the 46th annual meeting of the Psychonomic Society, Toronto. Vitkin, A. Z., Vasilyeva, N. Y., & Coley, J. D. (2007). Experience and the development of flexible inductive reasoning in biology. Paper presented at the annual meeting of British Psychological Society 2007 Developmental Section Conference. Plymouth, UK.
CHAPTER SIX

From Uncertainly Exact to Certainly Vague: Epistemic Uncertainty and Approximation in Science and Engineering Problem Solving

Christian D. Schunn

Contents
1. Introduction
2. Linguistic Pragmatics of Uncertainty and Approximation
3. Coding Approximation and Uncertainty from Speech
3.1. Conversation Coding in Engineering Design Team Meetings
3.2. Conversation and Interview Coding in Science and Applied Science Data Analysis
4. Coding Uncertainty from Gestures
5. Uncertainty, Approximation, and Expertise
6. From Uncertainty to Approximation via Spatial Reasoning
6.1. Uncertainty and Verbally Coded Spatial Transformations in Basic and Applied Science
6.2. Association of Uncertainty and Approximation with Spatial Gestures in Basic Science
6.3. From Approximation to Uncertainty via Mental Simulations in Engineering Design
7. Summary and Discussion
8. Future Directions
Acknowledgments
References
Abstract

Epistemic uncertainty is a huge area of scholarship. It has captured the minds of scholars in psychology and many domain-specific studies of reasoning and problem solving. What does it mean to resolve uncertainty? This chapter explores the idea that resolution of uncertainty in complex science and engineering fields frequently ends with approximations rather than precise answers. The chapter begins by examining language to motivate the core distinction between uncertainty and approximation. Then, the chapter explores whether the distinction can be defended empirically in reliable and valid coding of speech and gesture data in multiple science and engineering domains. Novice/Expert changes in uncertainty and approximation levels are also explored. Finally, three examinations of temporal patterns of co-occurrence with uncertainty and approximation are presented in multiple problem-solving domains to provide an overall model of uncertainty being transformed to approximation through spatial reasoning and mental simulations.

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53006-8. © 2010 Elsevier Inc. All rights reserved.
1. Introduction

Studies of behavior in the real world have consistently found that uncertainty has a large influence on behavior. For example, there is a whole subdiscipline of naturalistic decision making focused on judgment under uncertainty (e.g., Klein, 1989). Indeed, there are many pragmatic implications for better understanding uncertainty. For example, the ways in which experts reason about uncertainty in future forecasts under different actions, the ways in which experts choose to communicate this uncertainty to the voting public or the future voting public (in schools), and the ways in which the public understand the uncertainty will also influence critical decisions being made by politicians today (Friday, 2003). While much progress has been made, there is still much to be learned about how uncertainty influences behavior.

There are several taxonomies of uncertainty types in existence. Some come from psychology judgment and decision-making research (Berkeley & Humphreys, 1982; Howell & Burnett, 1978; Kahneman & Tversky, 1982; Krivohlavy, 1970; Lipshitz & Strauss, 1997; Musgrave & Gerritz, 1968; Trope, 1978). Others come from a broad array of particular disciplines, such as geography (Abbaspour, Delavar, & Batouli, 2003), ecology (Regan, Colyvan, & Burgman, 2002; Regan, Hope, & Ferson, 2002), finance (Rowe, 1994), management (Priem, Love, & Shaffer, 2002), geospatial information systems (Plewe, 2002), law (Walker, 1991, 1998), acoustics (Egan, Schulman, & Greenberg, 1961), medicine (Brashers et al., 2003; Hall, 2002), consumer choice (Sheer & Cline, 1995; Urbany, Dickson, & Wilkie, 1989), driving behavior (Vlek & Hendrickx, 1988), educational research (Webster & Bond, 2002), negotiation (Bottom, 1998), military tactics (Cohen, Freeman, & Thompson, 1998), and statistics. The sheer number of such domain-specific accounts makes clear how complex and central uncertainty resolution is to problem solving.
These taxonomies typically emphasize the different sources of uncertainty: the reasons why a problem solver might be uncertain. A different issue from the sources of informational uncertainty (objective ambiguity in the existing information) is psychological uncertainty (Jousselme, Maupin, & Bossé, 2003), the internal feeling of being uncertain about information, which may or may not be objectively uncertain. Presumably, it is this internal state that directly influences behavior: making choices (Kahneman & Tversky, 1982), avoiding situations, or driving new problem solving aimed at reducing the uncertainty levels (Trickett, Trafton, & Schunn, 2009). Of course, the underlying source of informational uncertainty may also influence behaviors aimed at reducing the psychological uncertainty. For example, Lipshitz and Strauss (1997) found that decision makers react differently to three different types of uncertainty: inadequate understanding, incomplete information, and undifferentiated alternatives. Inadequate understanding is addressed by collecting more information; incomplete information is typically addressed through assumption-based reasoning; and undifferentiated alternatives are resolved by weighing pros and cons in more depth. But there still remains the question: what is the psychological nature of the uncertainty itself?

In this chapter, I would like to argue for a distinction not previously emphasized in discussions of uncertainty: the difference between psychological uncertainty and psychological approximation, referred to as uncertainty and approximation for the rest of the chapter. Uncertainty is the lack of knowledge about possible states (e.g., is the temperature 18 °C or 19 °C?). Approximation declares a state as falling within a range (e.g., the temperature is between 18 °C and 19 °C). At first blush, this distinction appears bizarre and without conceptual merit. From an information theoretic or logical perspective, there is no difference between the two. However, I will argue that this distinction is a critical psychological distinction in science and engineering problem solving.
I will show that uncertainty and approximation are discriminable constructs in behavior, that they systematically occur in different places, and that common problem-solving strategies in science and engineering serve primarily to convert uncertainty into approximation. Thus, to ignore this seemingly nondistinction is to ignore a core feature of very important types of problem solving. Further, psychological research coding uncertainty from speech or gestures will likely falsely include approximation behaviors with uncertainty behaviors unless the distinction between uncertainty and approximation is salient.
2. Linguistic Pragmatics of Uncertainty and Approximation

To first provide some intuitions regarding the difference between uncertainty and approximation, consider the following everyday conversational examples, focusing on Speaker 2’s responses.

(1) Speaker 1: How old is she?
    Speaker 2: 40? She was born in January of 1969.
(2) Speaker 1: How old is she?
    Speaker 2: Early forties.
(3) Speaker 1: How old is she?
    Speaker 2: Forty plus or minus 2.
(4) Speaker 1: How old is she?
    Speaker 2: Early forties?

In (1), speaker 2 has all the information required to provide a precise answer to the question, actually provides a precise answer (40) that is accurate (in 2009), and yet is psychologically uncertain, as indicated by giving the answer in a question format. By contrast, in (2), speaker 2 provides an approximate answer (early forties), but with no indicated psychological uncertainty. Example (3) is a more academic-speak response with the same key characteristics as (2): approximation but no indicated uncertainty. Example (4) shows that one can have both approximation and uncertainty.

From a pragmatics perspective, speaker 2’s responses in (2) and (3) are quite reasonable in that they answer the question with precision that is likely sufficient for speaker 1’s needs and they set clear bounds on the possible actual values. By contrast, speaker 2’s response in (1) of "40?" does not set bounds on the possible actual values, leaving open the possibility of a much wider range of actual age.

Human languages contain many categorical terms that represent approximations on quantitative entities. For example, 50s, 19th century, teenage, early childhood, average height, room temperature, steep, and next door represent approximate quantities of age, time, height, temperature, slope, and location. Moreover, each of those terms is much more approximate than what humans can perceive psychologically. That is, we could think and express ourselves more precisely than with those terms, but we on occasion choose not to. Interestingly, both uncertainty and approximation can be indicated through the use of hedge words added to more precise terms, although the two use different hedge words.
Consider the following two examples.

(5) Speaker 1: How old is she?
    Speaker 2: Maybe 40.
(6) Speaker 1: How old is she?
    Speaker 2: Almost 40.

In (5), speaker 2 uses the hedge "maybe" to indicate uncertainty in the precise response with no provided bounds on how far the answer could be off, whereas in (6), speaker 2 uses the hedge "almost" to indicate approximation in the precise response, and pragmatic conventions suggest the age is less than 40 and unlikely to be more than 1 or 2 years below 40 (i.e., it might be 38 or 39). Overall there appear to be many more ways of expressing uncertainty through hedge words than through direct terms indicating approximate or uncertain quantities, perhaps reflecting subdimensions of uncertainty (e.g., probability distributions or average versus peak intensity) or approximations that do not have convenient linguistic labels (e.g., temperatures between 14 and 16 °C, or ages between 43 and 45). As a result, our coding from speech tends to focus on hedge words.

The examples above have generally focused on uncertainty and approximation cases that are not informationally equivalent, in that the possible range for the uncertainty cases was larger than the possible range for the approximation cases. There are two important points to note about this observation. First, the definitional difference is NOT about relative ambiguity in quantity. Reverse cases are possible: one could be uncertain whether the temperature is 14 or 15 °C and one could assert an approximation of 13–18 °C. Uncertainty is about psychologically not knowing something, whereas approximation is about asserting a range. Second, it happens to be the case that problem solving tends to reduce the possible range for which one is uncertain to a smaller range that is the approximation. For example, a problem solver might begin with an uncertainty of a very general form (what is the temperature?) or of a wide range (what is the temperature, but knowing that it is a Fall afternoon in New York) and then, through some data collection from various sources and reasoning, finish with a smaller possible range of 14–16 °C. In other words, problem solving (especially in engineering and science, for which some level of precision is required) serves to move information ambiguity from unacceptable levels to acceptable levels for the task at hand. This point will be further examined in Section 6.
3. Coding Approximation and Uncertainty from Speech

In a different sense of pragmatics, the distinction of approximation versus uncertainty is useful to psychologists (or various other scientists of behavior) only if the distinction can be made reliably from observed behavior and is associated with interesting patterns of behavior. Focusing on the first issue, in a number of projects we have found that uncertainty and approximation can be reliably coded from free speech, either in the form of think-alouds during problem solving or in the form of natural conversations.
3.1. Conversation Coding in Engineering Design Team Meetings

In Christensen and Schunn (2009), we coded for uncertainty and approximation from the many hours of conversation transcripts of an innovative engineering design team during their weekly design team meetings. Our approach to coding uncertainty and approximation was syntactical with verification, building on a hedge-word uncertainty coding approach developed in Trickett, Trafton, Saner, and Schunn (2007). Examples of uncertainty hedge words are "probably," "sort of," "guess," "maybe," "possibly," "don’t know," "[don’t] think," "[not] certain," and "believe." Examples of approximation hedge words are "pretty much," "virtually," "generally," "frequently," "usually," "normally," "basically," and "almost." (Actually, we searched for the Danish equivalents of these terms, as the team being studied was Danish.) In either the uncertainty or approximation cases, each instance of the hedge words was examined to make sure it was being used in an uncertainty or approximation sense; if so, the segment containing these hedge words was coded as "uncertainty present" or "approximation present." Interrater reliability for this approach was extremely high, with kappas of 0.95 for uncertainty coding and 0.96 for approximation coding.

As a simple validation of each construct and the distinction between the two, we also looked at the adjacency relationships between codes from one transcript segment to the next. The assumption is that mental states of uncertainty or approximation are "sticky" in that they will tend to continue longer in time than just one segment. Uncertainty and approximation are conceptualized as being about particular quantities and thus co-occurrence will not be perfect, but conversations will tend to continue regarding a given quantity, so there should be some continuity. As can be seen in Table 1, this continuity was clearly shown for both approximation and uncertainty (both trends are statistically significant). Further, taking into account the base rates of uncertainty and approximation, there was no tendency for approximation to immediately follow uncertainty or vice versa.
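As a rough illustration of this two-step procedure, the following Python sketch flags segments that contain candidate hedge words (leaving the verification step to a human coder) and then tabulates next-segment rates of the kind used in the adjacency analysis. It is only a sketch under stated assumptions: the hedge lists are a subset of the English glosses quoted above (the study itself searched Danish equivalents), and the function names `flag_candidates` and `next_segment_rates`, the whole-word matching rule, and the treatment of segments carrying both codes are illustrative choices rather than the procedure actually used in the study.

```python
import re
from collections import Counter

# Subset of the hedge words quoted in the text (English glosses only).
UNCERTAINTY_HEDGES = ["probably", "sort of", "guess", "maybe", "possibly", "don't know"]
APPROXIMATION_HEDGES = ["pretty much", "virtually", "generally", "frequently",
                        "usually", "normally", "basically", "almost"]


def _pattern(hedges):
    # Assumption: case-insensitive, whole-word match on any hedge phrase.
    alternatives = "|".join(re.escape(h) for h in hedges)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


UNCERTAINTY_RE = _pattern(UNCERTAINTY_HEDGES)
APPROXIMATION_RE = _pattern(APPROXIMATION_HEDGES)


def flag_candidates(segment):
    """Flag which hedge families appear in a segment.

    These are candidates only: a human coder still has to confirm that
    each hit is used in an uncertainty or approximation sense.
    """
    return {
        "uncertainty": bool(UNCERTAINTY_RE.search(segment)),
        "approximation": bool(APPROXIMATION_RE.search(segment)),
    }


def next_segment_rates(codes):
    """Rate of each code in the next segment, split by the current segment's code.

    `codes` is a list with one set per segment, e.g. {"uncertainty"} or set().
    Assumption: a segment carrying both codes counts toward both rows.
    """
    totals, hits = Counter(), Counter()
    for current, following in zip(codes, codes[1:]):
        for row in (current or {"neither"}):
            totals[row] += 1
            for code in following:
                hits[(row, code)] += 1
    return {
        row: {code: hits[(row, code)] / totals[row]
              for code in ("uncertainty", "approximation")}
        for row in totals
    }


# Hypothetical mini-transcript built from the chapter's examples (not study data).
demo_segments = ["Maybe 40.", "Almost 40.", "She was born in January of 1969."]
demo_codes = [{k for k, v in flag_candidates(s).items() if v} for s in demo_segments]
```

A segment flagged this way would still need to be read in context, since a word such as "generally" can appear without any approximation sense.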
Table 1 Rates of Uncertainty and Approximation in the Next Transcript Segment as a Function of their Presence in a Given Segment.

Current segment            Uncertainty in next segment    Approximation in next segment
Uncertainty (n = 247)      16%                            4%
Approximation (n = 308)    3%                             8%
Neither (n = 5616)         5%                             4%

3.2. Conversation and Interview Coding in Science and Applied Science Data Analysis

Another project involved a similar coding procedure applied to two different domains of science and two different domains of applied science (Schunn, Saner, Kirschenbaum, Trafton, & Littleton, 2007; Trickett et al., 2009).
The first domain involved conversations of earth scientists working at the Jet Propulsion Lab analyzing data as it came down from Mars from two robotic rovers, the Mars Exploration Rovers. The coded conversations were of impromptu meetings held throughout the day between groups of 2–10 scientists from several different disciplines (soil and rock scientists, geochemists, geologists, and atmosphere scientists). There were a number of video cameras off to the sides of the large data analysis rooms. The scientists had given informed consent for this video collection, but the cameras were relatively small, discreetly located, and constantly present. Thus, the scientists generally forgot about the existence of the cameras and the transcripts likely capture very typical problem-solving behaviors in this context.

The remaining three domains were 13 cognitive neuroscientists analyzing fMRI data (fMRI), 18 meteorologists making weather predictions (Weather), and 22 navy officers localizing an enemy submarine using only passive sonar (Submarine). These datasets involved cued think-alouds of novices (apprentices in the domain, not random undergraduates), intermediates, and experts. Participants were videotaped as they analyzed their data on computers (their own data in the case of fMRI, canned data in the case of Weather and Submarine). After 30–45 min of data analysis, they were then shown three or four different minute-long snippets of the videotape that corresponded to critical decision-making moments during data analysis. The scientists were asked to explain what they knew and did not know at that moment in time. Sometimes problem solvers given think-aloud instructions fall silent exactly at the interesting moments in time, especially when the task is long and complex. This cued-recall method was designed to capture additional information about these more interesting moments.
Across these four domains, we used the same hedge-word technique for coding uncertainty and approximation from the transcribed speech. In all cases, we obtained interrater reliability kappas of greater than 0.8 for both uncertainty and approximation. The know/do not know probes in the fMRI, Weather, and Submarine domain studies provide another validation of the distinction between uncertainty and approximation (and coding was done blind to question context). One would expect that there would be more uncertainty speech cues in response to the "what did you not know?" question than in response to the "what did you know?" question. An opposite pattern is expected for approximation. Figure 1 presents the results. In all three domains, the predicted pattern was obtained and statistically significant for both uncertainty and approximation codes.

Figure 1 The proportion of speech containing uncertainty or approximation (with SE bars). [Bar chart; y-axis: proportion of segments; bars compare responses to the Know question and the Not know question for uncertainty and approximation speech in fMRI (n = 13), Weather (n = 18), and Submarine (n = 22).]

Thus, uncertainty versus approximation is a distinction that can be made reliably in various science and engineering settings from verbal data in the form of think-alouds or natural conversations. Simple patterns in the data clearly suggest that uncertainty and approximation are temporally coherent within categories and temporally dissociable across categories. Finally, uncertainty and approximation speech appears under expected conditions.
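The interrater reliabilities in this section are reported simply as "kappas." Assuming these are Cohen's kappa for two coders assigning a present/absent code to each segment (the text does not name the statistic), the calculation can be sketched as follows. The function name `cohens_kappa` and the example codes are hypothetical and are not taken from the study.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same segments.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement),
    where chance agreement comes from each rater's marginal label rates.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of segments.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(label) / n) * (rater_b.count(label) / n)
                   for label in labels)
    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten segments (1 = "uncertainty present", 0 = absent).
coder_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
coder_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
```

Because kappa discounts agreement expected by chance, two coders can agree on most segments and still obtain a modest kappa when one code is rare, which is why values above 0.8 are usually read as strong agreement.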
4. Coding Uncertainty from Gestures

In science and engineering, much of the data is inherently visual–spatial or is displayed in spatial format (e.g., graphs of temperature varying with time). Thus, much of the uncertainty and approximation are expressed about visual–spatial quantities. Because science and engineering have formalized much if not all of the quantities and relationships in symbolic formats (e.g., terms for particular quantitative data patterns, equations to represent quantitative data patterns), much can be studied from coding speech from conversations and think-alouds. However, it is likely that considerable representing, reasoning, and problem solving in science and engineering is also happening in a visual–spatial, nonverbal format.

How does one measure internal problem solving on visual–spatial content? All measures of mental representations and problem solving are necessarily indirect. Verbal report is one general source of data regarding mental representation and problem solving. However, for visual–spatial content, it is a suspect source, as verbal data are generally thought to capture the contents of verbal working memory, not spatial working memory (Ericsson & Simon, 1993). Retrospective or intermittent drawings can be another source of data. However, many people are not very skilled in drawing, and it is likely that such drawings would influence reasoning more than verbal protocols would because the drawing process is much less automatic and the results of the process are more permanent (i.e., the drawing is an object that can itself be used in problem solving). Scientists and engineers do draw (by hand or via a computer) regularly, but not densely enough in time to constitute a good online measure of thinking. A third approach is to use spontaneous gestures. In addition to serving as a communicative act between speaker and listener, spontaneous gestures are thought to be an online measure of mental representations much like verbal protocols (Alibali, Bassok, Solomon, Syc, & Goldin-Meadow, 1999; Alibali & Goldin-Meadow, 1993; McNeill, 1992). In spatial tasks, in fact, it is disruptive to the problem solver to prevent gesturing from occurring. In a later section, I will consider more complex representational content of gestures. But first, I want to focus on gestures as a direct measure of uncertainty or approximation.

There are a number of taxonomies of gesture. One common distinction (McNeill, 1992) is between beat gestures (rhythmic, repetitive gestures often co-timed with speech), deictic gestures (pointing to things in the world around the speaker such as the clock on the wall over there), iconic gestures (gestures that are literal physical presentations of things absent, such as hand-shape holding an implied glass), and metaphoric gestures (a spatial representation of a nonspatial object, such as pointing behind oneself to represent back in time).
All of these gestures can have many phases (McNeill, 2005): preparation (optional), prestroke hold (optional), stroke (obligatory), stroke hold (obligatory if the stroke is static), poststroke hold (optional), and a retraction (optional). Uncertainty gestures are typically wiggling movements in the stroke of an iconic or metaphoric gesture that represents some quantity (i.e., a stroke that normally would be static). For example, a pinch indicating a size may be combined with wavering the size or wiggling the hand. In this way, the uncertainty gesture is discriminable from a beat gesture: an uncertainty gesture of this type has content beyond the movement, whereas the beat gesture does not (i.e., the hand does not indicate a size or distance or volume). However, another common form of an uncertainty gesture involves a shoulder shrug. In this case, one must rely on speech or perhaps another gesture to determine which quantity is producing uncertainty.

We have not yet coded approximation gestures, but I could easily imagine width of gestures indicating the approximations on quantities (e.g., between fingers of one hand or between hands). Further, I could easily imagine that some of the wiggling gestures that we previously coded as uncertainty gestures might actually be approximation gestures (e.g., specific movement between particular points).

In this section, the uncertainty gesture data are used as a cross-validation: do uncertainty gestures co-occur with uncertainty speech (and less so with approximation speech)? It is important to note, however, that speech and gesture need not always line up perfectly. Speech-gesture mismatches do happen and are not thought to be simply noise in interpretation; rather they are thought to signal coactivation of competing ideas/strategies (Alibali & Goldin-Meadow, 1993; Alibali et al., 1999).

We examined the overlap between uncertainty gesture and speech in the four science/applied science domains. Figure 2 presents the percentage of segments with uncertainty gestures when the segment has speech uncertainty or speech approximation present/absent in the three domains with cued-recall think-alouds. The first thing to note is that uncertainty gestures are relatively less common than uncertainty or approximation speech codes. The second thing to note is the strong cross-validation across all three domains: uncertainty gestures occurred much more often when uncertainty speech occurred (ps < 0.01 in all three), whereas uncertainty gestures had no consistent relationship to whether approximation speech occurred (only the Weather difference is statistically significant, p < 0.05, and in the reverse direction from the uncertainty speech pattern).

Figure 2 Percentage of segments (with SE bars) with uncertainty gestures as a function of uncertainty and approximation appearing in the speech segment for the domains of fMRI, Weather, and Submarine. The "Present" cases each involve approximately 300 segments and the "Absent" cases each involve approximately 1000 segments. [Bar chart; y-axis: % segments with uncertainty gesture; bars compare speech code Present versus Absent for uncertainty and approximation in each domain.]

In the naturalistic science conversation Mars data, 5.3% of segments with an uncertainty speech code had an uncertainty gesture, in comparison to 2.7% of segments without an uncertainty speech code (χ²(1) = 16.0, p < 0.001); in other words, uncertainty gestures occur twice as often in the context of uncertainty speech. There is an association between approximation statements and uncertainty gestures (χ²(1) = 6, p < 0.02), but the association is weaker; uncertainty gestures are only 50% more likely to appear in the context of approximation speech than without approximation speech. Overall, then, uncertainty speech and uncertainty gesture are clearly related, whereas uncertainty gesture and approximation speech have a smaller, ambiguous relationship, perhaps reflecting some miscoding of approximation gestures as uncertainty gestures.

To further validate that there is indeed something called an uncertainty gesture that signals an internal state of uncertainty, we can examine gesture data from the fMRI, Weather, and Submarine domains, focusing on the relative frequency of uncertainty gestures in response to the Know and Not know questions. In all three domains, 2% of segments co-occurred with an uncertainty gesture during the response to the Know question. In response to the Not know question, the rate of uncertainty gestures increased significantly (ps < 0.05) and generally more than doubled (5% fMRI, 8% Weather, and 4% Submarine).
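The chi-square statistics reported for the Mars data test whether the presence of an uncertainty gesture is associated with the presence of a speech code across segments. A minimal sketch of such a test on a 2 x 2 table is given below. It assumes a Pearson chi-square with one degree of freedom and no continuity correction (the text does not say which variant was used), and the segment counts are hypothetical, chosen only to make the example run; they are not the study's data.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

                         gesture present   gesture absent
        speech present          a                 b
        speech absent           c                 d
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("Every row and column needs at least one observation.")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical counts: 30 of 500 segments with uncertainty speech show an
# uncertainty gesture, versus 60 of 2500 segments without uncertainty speech.
statistic = chi_square_2x2(30, 470, 60, 2440)
p = p_value_1df(statistic)
```

With real data, the same table would also yield the two percentages being compared (a / (a + b) against c / (c + d)), which is how a contrast such as 5.3% versus 2.7% arises.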
5. Uncertainty, Approximation, and Expertise With multimodal affirmation of the somewhat surprising distinction between uncertainty and approximation in hand, we can now explore a third pragmatic question: whether the distinction plays a useful role in explaining behavior, in this case behavior of scientists and engineers. One intuition might be that uncertainty and approximation should differ by expertise levels, with experts showing more approximation and less uncertainty. Indeed, some expertise literature focuses on the amazing swiftness with which experts can see problems in terms of solutions features and solve problems (Chase & Simon, 1973; Chi, Feltovich, & Glaser, 1981; Gobet & Simon, 1996; Larkin, McDermott, Simon, & Simon, 1980). However, much of the expertise literature making those claims focuses on welldefined problems such as simple physics problems that are purely education tasks rather than problems an expert would actually encounter. The actual life of an engineer and scientist is much less clear-cut. Indeed, experts in
most domains deal with a very uncertain world, hence the large focus on decision making under uncertainty within naturalistic decision-making research. While an expert certainly can produce better solutions and in less time than novices in the much more ill-defined contexts of real science and engineering problem solving (Moss, Kotovsky, & Cagan, 2006; Schunn & Anderson, 1999; Voss, Tyler, & Yengo, 1983), it is not a matter of recognition of simple solutions for the expert. Issues involving uncertainty must be recognized and then resolved through complex processes, like mental simulation. It may be that novices do not even recognize what is uncertain about the current situation, treating initial point estimates as fact rather than estimates. The fMRI, Weather, and Submarine cued-recall dataset provides an opportunity to look at expertise effects on rates of uncertainty and approximation across domains to look for consistent patterns. We defined novices as those individuals having already learned enough of the task basics to be able to complete the analysis tasks on their own (e.g., analyze an fMRI dataset, make a weather prediction). Experts were those at the top performance levels. Intermediates were those with considerable experience beyond novice levels, but far from expert levels in that domain. In our participant pool for that study, only fMRI involved all three performance levels. The Weather data included novices (juniors and seniors in weather forecasting school) and experts, and the Submarine data had only intermediates and experts (both were submarine officers with field experience, but to varying degrees). Figure 3A presents the levels of uncertainty speech across the expertise levels in each domain, and Figure 3B presents the levels of approximation speech across the expertise levels in each domain. There are a few statistically significant differences, but no consistent differences across the three domains.
For example, in the submarine domain, the experts have the highest levels of uncertainty, whereas in the Weather domain they have the lowest. In all three domains, the differences by expertise level are small. The best overall conclusion to draw is that recognizing uncertainty may itself be a kind of expertise and the frequency of uncertainty comments will involve two opposing trends as a function of expertise: (1) experts likely recognize more facets of uncertainty and (2) experts are better able to resolve the uncertainty. How those opposing trends balance in aggregate will depend on the complexities of the task at hand. That is, I doubt that even a whole domain will have general patterns by expertise level on amount of uncertainty as some tasks within the domain will involve more detection challenges and others will involve more resolution challenges. In support of this idea that there are recognition and resolution elements to uncertainty, one can divide a problem-solving session into two halves (early and late). If experts recognize uncertainty more readily and then are able to resolve it, we would expect their uncertainty levels to go down over
Figure 3 The percentage of segments (with SE bars) showing (A) uncertainty speech and (B) approximation speech as a function of domain and expertise levels. [Bar charts not reproduced. Panel A: Uncertainty; panel B: Approximation. Y-axis: % of segments; x-axis: expertise level (Novice, Intermediate, Expert); series: fMRI, Weather, Submarine.]
time. By contrast, if novices are struggling to even see the issues of uncertainty and are less able to resolve these uncertainties, then we would expect novices’ uncertainty levels to go up over time. Figure 4 presents relevant uncertainty speech data from the fMRI domain. We see that uncertainty levels do go up for novices and intermediates whereas they go down (directionally but not statistically significant) for experts. Similar (small) interactions of early/late by expertise levels on uncertainty levels could also be seen in the other domains.
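The early/late comparison described above can be sketched as follows. This is only a minimal illustration: it splits a session by segment count, whereas the figure refers to early and late minutes, and both the function name and the two coded sessions are hypothetical.

```python
def early_late_rates(coded):
    """coded: booleans in time order, True where a segment carries an
    uncertainty speech code. Returns (early_rate, late_rate), splitting
    the session at its midpoint by segment count (an assumption)."""
    if len(coded) < 2:
        raise ValueError("need at least two segments to form two halves")
    mid = len(coded) // 2
    early, late = coded[:mid], coded[mid:]
    return sum(early) / len(early), sum(late) / len(late)


# HYPOTHETICAL sessions, invented to show the two opposite directions.
novice = [False, False, False, True, False, True, True, False]
expert = [True, True, False, False, False, True, False, False]

print("novice early/late:", early_late_rates(novice))
print("expert early/late:", early_late_rates(expert))
```

An early-by-expertise interaction of the kind discussed in the text would show up as the late rate exceeding the early rate for novices and the reverse for experts.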
Figure 4 The percentage of segments (with SE bars) in the fMRI domain showing uncertainty speech as a function of early and late minutes of problem solving and expertise levels. Ns for each percentage vary between 130 and 300 segments of speech. [Bar chart not reproduced. Title: fMRI uncertainty; y-axis: % of segments; x-axis: expertise level (Novice, Intermediate, Expert); series: Early, Late.]
Of course, more fine-grained coding of uncertainty detection and resolution strategies, included in this analysis of expertise effects on uncertainty and approximation, would provide a more conclusive perspective on why uncertainty is not clearly associated with expertise and appears to change in different ways over time. We have done this coding in all four science/applied science domains. We specifically looked at what indicators were used to identify sources of uncertainty. For example, uncertainty becomes apparent when different data sources (as in two weather models) produce conflicting results, or when one data source produces seemingly impossible results (as in brain activation outside the skull). A number of such general indicators could be found. We also looked at the strategies used to resolve the uncertainty. It turns out that there are a very large number of such general strategies that can be observed, some more spatial in form, others less spatial. There are some expertise differences by strategy within each of the domains, but the differences are not consistent across the domains, probably because different strategies are differentially effective within each domain. In sum, uncertainty and approximation have a complex relationship to expertise levels rather than a simple linear trend, and the relationship likely depends upon the ease with which uncertainty is detectable and resolvable in a given setting with the available tools and strategies.
6. From Uncertainty to Approximation via Spatial Reasoning
Thus far, I have focused on the differences between uncertainty and approximation: how they are not the same. Now I would like to focus on the positive relationship that they have to one another. In particular, the theoretical assertion that I would like to make is that uncertainty and approximation have an input/output relationship to one another with spatial reasoning lying in between, at least in science and engineering problem solving. The next three sections build up the evidence for this theoretical assertion. Section 6.1 examines verbal protocol evidence that uncertainty leads to mental spatial transformations. Section 6.2 uses gesture data to examine the relative temporal relationship of uncertainty, approximation, and spatial mental representations. Section 6.3 focuses on a particular kind of spatial problem solving that appears to be used to move from uncertainty to approximation in problem solving.
6.1. Uncertainty and Verbally Coded Spatial Transformations in Basic and Applied Science
In Trickett et al. (2007), we used the syntactic approach to coding uncertainty in speech and then also coded the speech for the presence of spatial transformations. Spatial transformations are mental operations a person performs on an internal representation or an external visualization (on paper or computer screen). Typical spatial transformations are creating a mental image, adding features to or deleting features from an image, rotating or moving an object, or making comparisons between different views. Table 2 provides examples of uncertainty codes and spatial transformations from utterances. In one study, we examined the relative co-occurrence of spatial transformations with uncertainty in speech for an expert (over 16 years of experience) making a weather forecast while giving a think-aloud (approximately 50 min of speech to analyze). We found that the rate of spatial transformations was almost twice as high during speech with uncertainty markers as in speech without uncertainty markers. Follow-up work with more experts and novices (although still trained in weather forecasting) found that both experts and novices showed this pattern but that the effect is much larger in experts than in novices. In the second study of Trickett et al. (2007), a more rigorous test was conducted using the fMRI and Weather cued-recall dataset described earlier, but in a slightly different way. Here, spatial transformations were coded from the think-aloud speech of the problem solver doing the initial fMRI data analysis or weather forecast. Then relative levels of uncertainty were coded
Table 2 Examples of Spatial Transformations and Certain and Uncertain Utterances with Indications of Uncertainty in Bold and Spatial Transformation in Italics (adapted from Trickett et al., 2007).
Utterance | Code | Spatial transformations (ST)
Nogaps [a mathematical model] has some precipitation over the Vancouver/Canada border (while viewing a visualization) | Certain | No ST
This is valid today | Certain | No ST
Possibly some rain over Port Angeles | Uncertain | No ST
And then uh, at Port Angeles, there’s gonna be some rain up at the north, and if that sort of sneaks down, we could see a little bit of restriction of visibility, but only down to 5 miles at the worst | Uncertain | ST: mentally moving rain [sneaks down]
I don’t think the uh front’s gonna get to Whidbey Island, but it should be sitting right about over Port Angeles right around 0Z this evening | Uncertain | ST: mentally moving front/animation
from the responses to the cued-recall questions. A minute of problem solving was determined to be a high-uncertainty minute if the cued-recall phase for that minute generated a high percentage of uncertainty speech codes whereas the minute of problem solving was determined to be a low-uncertainty minute if the cued-recall phase generated a low percentage of uncertainty speech codes. Thus, spatial transformations and relative uncertainty levels were coded from different datasets (and also by coders from different labs). Further, in our prior analyses, uncertainty speech was coded at the utterance-by-utterance level, whereas the underlying uncertainty is likely more pervasive (i.e., the speech codes may be considered as the tip of the uncertainty iceberg). This designation of entire minutes as being high or low uncertainty addresses this issue. Indeed, using this approach to examining uncertainty against spatial transformations, we found that spatial transformations were over four times greater during high-uncertainty minutes than during low-uncertainty minutes.
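The minute-level designation described above can be sketched in a few lines. The text does not state the cutoff that separated a "high" from a "low" percentage of uncertainty codes, so the median split used here is an assumption, as are the function names and all of the numbers.

```python
from statistics import median


def split_minutes(uncertainty_pct):
    """Label each minute 'high' or 'low' uncertainty.
    uncertainty_pct maps minute -> % of cued-recall segments coded uncertain.
    ASSUMPTION: a median split; the chapter does not give the actual cutoff."""
    cutoff = median(uncertainty_pct.values())
    return {m: ("high" if pct > cutoff else "low")
            for m, pct in uncertainty_pct.items()}


def st_rate_by_level(labels, st_counts):
    """Mean number of spatial transformations per minute, by uncertainty level."""
    totals = {"high": [0, 0], "low": [0, 0]}  # level -> [ST count, minutes]
    for minute, level in labels.items():
        totals[level][0] += st_counts.get(minute, 0)
        totals[level][1] += 1
    return {level: (count / n if n else float("nan"))
            for level, (count, n) in totals.items()}


# HYPOTHETICAL data for four minutes of problem solving.
uncertainty_pct = {1: 5.0, 2: 30.0, 3: 10.0, 4: 40.0}  # from cued recall
st_counts = {1: 1, 2: 4, 3: 0, 4: 5}                   # from think-aloud

labels = split_minutes(uncertainty_pct)
rates = st_rate_by_level(labels, st_counts)
print(labels)
print(rates, "ratio:", rates["high"] / rates["low"])
```

Keeping the two inputs as separate mappings mirrors the design point made in the text: the uncertainty levels and the spatial transformations come from different datasets and are joined only by the minute of problem solving.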
6.2. Association of Uncertainty and Approximation with Spatial Gestures in Basic Science
In addition to potentially capturing uncertainty or approximation in thinking, gestures can also capture spatial problem solving. If spatial problem solving takes place between uncertainty and approximation, then we should
see more spatial gestures between uncertainty and approximation. But what kind of gestures should we expect to see? There are many different kinds of spatial representations. The spatial reasoning literatures (in cognitive psychology, developmental psychology, and cognitive neuroscience) frequently make distinctions between large scale and small scale, egocentric and exocentric (or allocentric), and between two-dimensional and three-dimensional visual–spatial representations. The work described in the prior section suggests that spatial transformations are frequently used by problem solvers to resolve uncertainty. The cognitive neuroscience literature has suggested for multiple decades that the distinction between a ventral (“what” or object type information) pathway and a dorsal (“where” or object location information) pathway is critical in thinking about visual–spatial processing (Ungerleider & Mishkin, 1982). Later work (for a review, see Kosslyn, Ganis, & Thompson, 2001) has suggested that the parietal lobe (part of the where pathway) is heavily involved during spatial transformations (e.g., during mental rotation). Other neuroscience work has suggested that the parietal lobes are specifically involved in small 3D representations of space (Previc, 1998). By inference, one would expect to see high numbers of small 3D manipulation gestures following uncertainty speech and preceding approximation speech if mental transformations are doing the work of going from uncertainty to approximation and these gestures map onto mental transformations of this type. We have tested exactly this prediction in the Mars data described earlier. In addition to coding uncertainty gestures, we also coded for several other kinds of spatial and nonspatial gestures. The most common spatial gestures were small-scale 3D gestures. Based on a theoretical framework I have developed elsewhere (Harrison & Schunn, 2002), these are called manipulative gestures.
Specifically, manipulative gestures are gestures that place objects and activity in a nearby space, such that the problem solver can actually manipulate or place the imaginary objects. Examples of manipulative gestures included one-handed gestures of a brain region (a cupped hand facing up) and two-handed gestures showing dust billowing over a small crater lip (the left hand flat and held still at an angle to represent the crater lip and the right hand swooping over the left with fingers wiggling to show the billowing dust). Gestures in which the hand shape suggests placing or holding as opposed to strictly pointing were also coded as manipulative. To examine the relative temporal arrangement of uncertainty speech and manipulative gestures, we divided speech segments into several different types: segments with uncertainty speech (exact), segments that have uncertainty 1–5 segments before the current one (before), segments that have uncertainty 1–5 segments after the current one (after), segments with both before and after relationships (both), and then segments not near uncertainty speech (distant), which can be thought to establish a base rate of spatial gestures. We then examined the rate of manipulative gestures during each of
these segment types. The same analysis was also done for gestures’ temporal relationship to approximation speech codes. Figure 5 presents the results of this analysis. Focusing on manipulative gestures relative to uncertainty speech, the highest rates of manipulative gestures occur when the uncertainty speech occurs before the current segment. The “during” cases (both and exact) have lower rates of manipulative gestures, and the after case has a manipulative gesture rate similar to segments distant from any uncertainty speech. Thus, it appears that uncertainty speech occurs primarily before manipulative gestures and not after. For approximation speech and manipulative gestures, a different pattern appears. Here manipulative gestures are elevated anywhere near approximation speech, but particularly right during it. Thus, the approximation representations appear to occur simultaneously with the spatial transformation work. Overall, these data are consistent with the view that uncertainty leads to spatial transformations that produce approximation results.
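The segment typing used in this analysis (exact, before, after, both, distant) lends itself to a short sketch. The function names are hypothetical, the tiny coded sequence is invented, and giving "exact" precedence over the other labels is an assumption about how overlapping cases were handled.

```python
from collections import defaultdict


def label_by_proximity(has_code, window=5):
    """Label each segment by where the nearest speech code falls.
    'before' means a code occurred 1..window segments before the current
    segment; 'after' means one occurs 1..window segments after it.
    ASSUMPTION: a segment that itself carries the code is always 'exact'."""
    labels = []
    for i, coded in enumerate(has_code):
        if coded:
            labels.append("exact")
            continue
        code_before = any(has_code[max(0, i - window):i])
        code_after = any(has_code[i + 1:i + 1 + window])
        if code_before and code_after:
            labels.append("both")
        elif code_before:
            labels.append("before")
        elif code_after:
            labels.append("after")
        else:
            labels.append("distant")
    return labels


def gesture_rate_by_label(labels, has_gesture):
    """Proportion of segments with a manipulative gesture for each label."""
    counts = defaultdict(lambda: [0, 0])  # label -> [gesture segments, segments]
    for label, gesture in zip(labels, has_gesture):
        counts[label][0] += int(gesture)
        counts[label][1] += 1
    return {label: g / n for label, (g, n) in counts.items()}


# HYPOTHETICAL ten-segment transcript, with a window of 2 for readability.
T, F = True, False
uncertainty = [F, T, F, F, F, F, F, F, T, F]
gestures = [F, F, T, T, F, F, F, F, F, T]

labels = label_by_proximity(uncertainty, window=2)
print(labels)
print(gesture_rate_by_label(labels, gestures))
```

The "distant" label plays the role described in the text: its gesture rate serves as the base rate against which the other segment types are compared.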
6.3. From Approximation to Uncertainty via Mental Simulations in Engineering Design
The Christensen and Schunn (2009) examination of uncertainty and approximation in engineering design also examined the temporal relationships of uncertainty and approximation relative to mental problem solving.
Figure 5 The proportion of speech segments (with SE bars) with manipulative gestures as a function of whether uncertainty speech occurred before (within five segments), after (within five segments), both before and after, or exactly in the segment. Each proportion is based on between 300 and 600 segments of data, except the “distant” proportions, which are based on 1300 segments. [Bar chart not reproduced. Y-axis: proportion of manipulative gestures; x-axis: location of speech code (Distant, Before (1–5), Both, Exact, After (1–5)); series: Uncertainty, Approximation.]
In particular, we focused on a kind of problem solving that was quite frequent in engineering design team meetings: mental simulations. These mental simulations happened approximately once every two minutes on average during the meetings. In the part of the meetings in which the conversation was focused on active design of the product (vs. future planning), the rate went up to one per minute. The coding scheme for mental simulations was adapted from the coding scheme developed by Trickett and Trafton (2007) for coding scientists’ mental simulations. A mental simulation is a mentally constructed model of a situation, building upon objects in memory or mental modifications of objects currently present. A defining feature of a mental simulation is that something is “running,” that is, that the process alters the representation. The simulation is not just asking a “what if” question. It also provides an answer about whether something will work, what a resulting feature will be, etc. Mental simulations involve a sequence of three critical elements: creating an initial representation, running the representation (elements or functions are changed, added, or deleted), and a final changed representation. Each segment was coded as “mental simulation” or “no mental simulation,” along with the separate steps. Table 3 presents an example mental simulation from the transcripts coded into three components. The interrater reliability for coding mental simulations was quite high, kappa = 0.9.
Table 3 An Example Mental Simulation from the Engineering Design Domain (from Christensen & Schunn, 2009).
Step | Utterance
Initial representation | Could you add something so that you couldn’t close this thing because there would be something in the way when you try to fold this way. . .
Run | But if this thing goes this way, then it is in a position to allow the ear to enter. . . But then I just don’t know how it should be folded. . . ’cause if it is folded this way then it will come out here. . .then it should be folded unevenly somehow. . .You should fold it oblique.
Changed representation | It wouldn’t make any difference one way or the other. It would fold the same way, and come out on this side the same way.
Figure 6 presents the rate of uncertainty and approximation speech as a function of step during a mental simulation. The base rate of (speech coded) uncertainty is 8% in this dataset. The rate of uncertainty speech was statistically significantly higher than the base rate at the initial representation and during the simulation run, but not during the resulting representation. By contrast, approximation speech was at baseline levels (3%) at the initial representation step, and rose to significantly higher levels by the resulting representation. Thus, the temporal patterns are perfectly consistent with the hypothesis that mental simulations have the effect of turning uncertainty into approximations. More recently, Linden and Christensen (2009) coded for uncertainty and mental simulations in a different engineering design dataset and found exactly the same results: a reduction of uncertainty in the initial representation down to base levels of uncertainty by the resulting representation state of the mental simulation.
Figure 6 Percentage of segments with uncertainty and approximation by mental simulation sequential step, with SE bars (from Christensen & Schunn, 2009). [Bar chart not reproduced. Y-axis: percentage; x-axis: mental simulation sequential steps (Initial representation, Simulation run, Resulting representation); series: Uncertainty, Approximation.]
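The comparison of each simulation step against the base rate can be illustrated with a simple sketch. The 8% base rate for uncertainty speech is taken from the text; everything else is an assumption, including the choice of a one-sample z statistic (the chapter does not say which test was used), the function name, and the per-step counts.

```python
import math


def z_vs_base_rate(k, n, base_rate):
    """One-sample z statistic for k coded segments out of n, tested
    against a fixed base rate (normal approximation to the binomial)."""
    if n <= 0 or not 0 < base_rate < 1:
        raise ValueError("need n > 0 and a base rate strictly between 0 and 1")
    standard_error = math.sqrt(base_rate * (1 - base_rate) / n)
    return (k / n - base_rate) / standard_error


BASE_UNCERTAINTY = 0.08  # base rate of uncertainty speech reported in the text

# HYPOTHETICAL (coded segments, total segments) for each simulation step.
steps = {
    "initial representation": (40, 200),
    "simulation run": (36, 200),
    "resulting representation": (16, 200),
}

for step, (k, n) in steps.items():
    z = z_vs_base_rate(k, n, BASE_UNCERTAINTY)
    flag = "above base rate" if z > 1.96 else "not distinguishable from base rate"
    print(f"{step}: {k / n:.0%} uncertain, z = {z:.2f} ({flag})")
```

With invented counts of this shape, the first two steps sit well above the base rate and the final step returns to it, which is the qualitative pattern the text describes for uncertainty speech.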
7. Summary and Discussion
Epistemic uncertainty is a huge area of scholarship. It has captured the minds of scholars in psychology and many domain-specific studies of reasoning and problem solving, presumably because uncertainty is ubiquitous or nearly so in real-world problem solving. With all the rich distinctions that could be made about uncertainty, I began this chapter with a
different psychological distinction that seemed on first inspection a nondistinction: the distinction between vaguely uncertain and certainly vague. Indeed, when I began empirical investigations into uncertainty in problem solving, I assumed that uncertainty was the start state and precise certainty was the end state. That is, I assumed that a problem solver moved along a continuum of precision, with initial states involving little precision and final states involving high precision. Yet, examination of problem-solving transcripts hinted at a different transformation: from uncertainty to imprecision, or, as I call it now, approximation. In the early coding work on the uncertainty/approximation distinction, we had many arguments within the lab about what the distinction even was and how it could be coded with any conceptual integrity. Yet, the initial intuition about the need for such a distinction appeared to have merit. The distinction can be defined psychologically, even though the logical or information theoretic definitions are lacking. More importantly perhaps to researchers who are empirically rather than philosophically oriented, the distinction could be coded reliably in real problem-solving transcripts, and cross-validation investigations were also very successful. Of course, subtle demand characteristics of context might have created these distinctions in the minds of the coders. For example, it is hard to hide from the coders the context of participants being asked what they knew versus what they did not know. Even with the questions themselves being hidden, the participants often repeat the question verbatim or with minor rephrasings. However, the same pattern was observed in many different datasets, which involved many different coders (spread across labs in different cities), and cross-validations of different forms.
Furthermore, we focused on more syntactic approaches to coding uncertainty and approximation to reduce the possible influence of situational demand characteristics on our results. Finally, we did not find that expertise levels had clear associations with uncertainty levels, even though some of the coders had strong expectations that there would be such patterns. Thus, effects of coder expectations on coding behavior were not strong enough to create results through expectations alone. Perhaps most persuasive and interesting are the patterns of uncertainty and approximation against reasoning indicators. We found clear temporal patterns: (1) uncertainty invokes mental spatial transformations; (2) spatial gestures seem to reside between verbal uncertainty and verbal approximation; and (3) mental simulations seem to reside between verbal uncertainty and verbal approximation. Before declaring victory in this appeal for a new general distinction, I want to return to the information theoretic/logical basis or nonbasis for the distinction. In cognitive science, there is a general view that cognition is but computation. Further, considerable recent theorizing has focused on the optimality or rationality of human cognition (Anderson, 1990; Gigerenzer, 2000; Griffiths & Tenenbaum, 2006). It should make the reader nervous to
accept a distinction as the basis of rational, expert problem solving when the computational/logical basis of the distinction is fundamentally flawed. As I noted before, the definition for uncertainty and approximation could not be made simply on the basis of logical informational ambiguity. That is, one could imagine uncertain cases that had less ambiguity than other approximation cases. However, I also noted that, empirically, problem-solving processes would generally reduce the underlying ambiguity as the problem solver moved from uncertainty to approximation. Therein lies the true rational basis of this mode of processing. The computational work of Forbus (1997) building running conceptual simulations (called qualitative reasoning) shows how approximate quantitative answers can be derived from incomplete information. To extend a computational framework to the current proposal, the idea is as follows. A problem solver is working on a task and discovers that the informational ambiguity is above some threshold such that a critical decision/inference cannot be made (e.g., will a design choice produce a satisfactory outcome?). A state of uncertainty is thus taken on, which motivates problem-solving processes (such as spatial transformations or mental simulations) to reduce the underlying ambiguity. When the ambiguity is sufficiently reduced to enable decision making, then the resulting ambiguity is declared an approximation. I should also add an important caveat. While uncertainty frequently resolves in approximation in science and engineering, I am not claiming that it always results in approximation; sometimes it just ends in more uncertainty and the problem solvers move on, and sometimes it even ends in precise certainty. Although the world of scientists and engineers is complex enough that exact, certain values are not the norm, it does happen. A final caveat involves my focus on science and engineering. 
Many psychologists avoid rich real domains because of the difficulties in obtaining access to participants and the complexities of studying real tasks. Those psychologists who do study real domains tend to pick a particular domain to study. I have presented data from many different domains, including several basic sciences, several applied sciences, and engineering design. Hopefully, the case is now persuasively made for science and engineering. But certainly the space of domains involving informational uncertainty is much broader still. I suspect that similar distinctions will be relevant in these other domains, but that remains an empirical question for others to examine.
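The threshold account proposed earlier in this section (ambiguity above a threshold triggers a state of uncertainty, which motivates ambiguity-reducing processes until a decision becomes possible) can be expressed as a toy sketch. Everything here is an illustrative assumption rather than a model from the chapter: ambiguity is treated as a single number such as the width of a plausible range, the reduction process is an arbitrary function, and the outcome labels are hypothetical.

```python
def resolve(ambiguity, threshold, reduce_step, max_steps=10):
    """Toy version of the threshold account.
    Returns (outcome, remaining_ambiguity, steps_used), where outcome is:
      'accepted'      - ambiguity never exceeded the threshold
      'approximation' - ambiguity reduced enough to allow a decision
      'certain'       - ambiguity eliminated entirely
      'uncertain'     - effort exhausted with ambiguity still too high
    """
    if ambiguity <= threshold:
        return "accepted", ambiguity, 0
    for step in range(1, max_steps + 1):
        # e.g., one spatial transformation or one mental simulation
        ambiguity = reduce_step(ambiguity)
        if ambiguity <= 0:
            return "certain", 0.0, step
        if ambiguity <= threshold:
            return "approximation", ambiguity, step
    return "uncertain", ambiguity, max_steps


def halve(width):
    """HYPOTHETICAL reduction process: each pass halves the plausible range."""
    return width / 2


print(resolve(8.0, 2.0, halve))                            # range narrowed enough to decide
print(resolve(1.0, 2.0, halve))                            # value simply accepted
print(resolve(8.0, 2.0, lambda width: width, max_steps=3))  # no progress made
```

The four outcomes correspond to the caveats in the text: most values are simply accepted, uncertainty frequently resolves into approximation, and it sometimes ends in continued uncertainty or, more rarely, in precise certainty.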
8. Future Directions
I have attempted to provide a simple and rational account for what problem solvers do with uncertain information, but many questions remain. For example, we have at best a very incomplete understanding of how
information uncertainty is detected by the problem solver. In science and engineering, the problem solver might encounter hundreds to thousands of quantities, all of which may involve some uncertainty, and yet psychological uncertainty is not triggered for all of those values. Most values are simply accepted. What raises the uncertainty hairs of the problem solver in these complex settings? A related question involves what it means, exactly, to have the uncertainty hairs raised. We know that information ambiguity is troubling to problem solvers. It motivates them to reduce the ambiguity, and the ambiguity-reduction procedures appear to be useful for problem-solving success. But this behavioral description does not precisely unpack the mental state of uncertainty. Is it purely cognitive or does it have a core emotional component? Does it have underlying phenomenological primitives or is psychological uncertainty a foundational concept? As mentioned in Section 1, we know that uncertainty derived from different factors produces different behaviors, but that, by itself, does not answer the phenomenological question. Cognitive neuroscience may provide some interesting data on this front. We know that relative predictability of outcomes is a key variable in predicting the reactions of certain brain areas (e.g., the anterior cingulate cortex or the basal ganglia), and this relative predictability is heavily implicated in learning. Another future direction involves the qualitative. Thus far I have emphasized psychological uncertainty about quantitative dimensions. What about qualitative dimensions? Perhaps the enemy will come by plane or by train. Perhaps it will snow or it will rain. Psychological uncertainty is clearly relevant to these qualitative ambiguities. What about approximation?
Let us briefly consider some of the hedge words that we used for coding approximation in speech: “pretty much,” “virtually,” “generally,” “frequently,” “usually,” “normally,” “basically,” and “almost.” All of these hedges could be applied to the qualitative ambiguities in enemy transportation method or precipitation type. Semantically, those qualifiers would be ones of probability, which is a quantitative dimension attached to discrete qualitative states. Indeed, many of the things that were coded in our datasets as approximations involved these sorts of probabilistic hedges to qualitative issues. The task for future research is to fathom whether approximation on quantities and approximate probabilities on qualities are actually the same basic thing.
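As a rough illustration of how the hedge words listed above could support a first-pass screen of transcripts, the following sketch flags segments containing any of them. The word list is exactly the partial list given in the text; the function name and the example sentences are hypothetical, and simple keyword matching is an assumption that would not replace the judgments of trained coders.

```python
import re

# Hedge words quoted in the text as some of those used for coding approximation.
APPROXIMATION_HEDGES = (
    "pretty much", "virtually", "generally", "frequently",
    "usually", "normally", "basically", "almost",
)

_HEDGE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(h) for h in APPROXIMATION_HEDGES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(segment):
    """Return the approximation hedges found in a speech segment, in order."""
    return [match.group(1).lower() for match in _HEDGE_PATTERN.finditer(segment)]


# HYPOTHETICAL segments covering a quantity and a qualitative outcome.
print(find_hedges("The front is pretty much over Port Angeles"))
print(find_hedges("It will almost certainly rain, and usually by noon"))
print(find_hedges("Visibility is 5 miles"))
```

The second example shows the case raised in the text: the hedges attach a rough probability to a qualitative outcome (rain) rather than to a quantity.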
ACKNOWLEDGMENTS
The reported projects were supported by grants from ONR (N000140610053, N000140210113, and N000140310061) and NSF (SBE-0738071). These projects were intense collaborations with Bo Christensen, Greg Trafton, Susan Trickett, Susan Kirschenbaum, Lelyn Saner, and Tsunhin Wong, and involved additional hard coding work by Melanie Shoup and Mike Knepp.
REFERENCES
Abbaspour, R. A., Delavar, M. R., & Batouli, R. (2003). The issue of uncertainty propagation in spatial decision making. In K. Virrantaus & H. Tveite (Eds.), Proceedings of the Scandinavian research conference on geographical information science (pp. 57–65). Helsinki, Finland: Helsinki University of Technology.
Alibali, M. W., Bassok, M., Solomon, K. O., Syc, S. E., & Goldin-Meadow, S. (1999). Illuminating mental representations through speech and gesture. Psychological Science, 10(4), 327–333.
Alibali, M. W., & Goldin-Meadow, S. (1993). Gesture-speech mismatch and mechanisms of learning: What the hands reveal about a child’s state of mind. Cognitive Psychology, 25(4), 468–523.
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: L. Erlbaum Associates.
Berkeley, D., & Humphreys, P. (1982). Structuring decision problems and the “bias heuristic”. Acta Psychologica, 50(3), 201–252.
Bottom, W. P. (1998). Negotiator risk: Sources of uncertainty and the impact of reference points on negotiated agreements. Organizational Behavior and Human Decision Processes, 76(2), 89–112.
Brashers, D. E., Neidig, J. L., Russell, J. A., Cardillo, L. W., Haas, S. M., Dobbs, L., et al. (2003). The medical, personal, and social causes of uncertainty in HIV illness. Issues in Mental Health Nursing, 24, 497–522.
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4, 55–81.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121–152.
Christensen, B. T., & Schunn, C. D. (2009). The role and impact of mental simulation in design. Applied Cognitive Psychology, 23(3), 327–344.
Cohen, M. S., Freeman, J. T., & Thompson, B. (1998). Critical thinking skills in tactical decision making: A model and a training strategy. In J. A. Cannon-Bowers & E. Salas (Eds.), Making decisions under stress: Implications for individual and team training (pp. 155–189). Washington, DC: American Psychological Association.
Egan, J. P., Schulman, A. I., & Greenberg, G. E. (1961). Memory for waveform and time uncertainty in auditory detection. Journal of the Acoustical Society of America, 33, 779–781.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (2nd ed.). Cambridge, MA: MIT Press.
Forbus, K. D. (1997). Qualitative reasoning. The Computer Science and Engineering Handbook, 715–733.
Friday, E. W. (2003). Communicating uncertainties in weather and climate information: A workshop summary. Washington, DC: National Academies Press.
Gigerenzer, G. (2000). Adaptive thinking: Rationality in the real world. Oxford: Oxford University Press.
Gobet, F., & Simon, H. A. (1996). Recall of random and distorted chess positions: Implications for the theory of expertise. Memory & Cognition, 24(4), 493–503.
Griffiths, T. L., & Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition. Psychological Science, 17(9), 767–773.
Hall, K. H. (2002). Reviewing intuitive decision-making and uncertainty: The implications for medical education. Medical Education, 36, 216–224.
Harrison, A. M., & Schunn, C. D. (2002). ACT-R/S: A computational and neurologically inspired model of spatial reasoning. Paper presented at the 24th annual meeting of the cognitive science society. Mahwah, NJ: Erlbaum.
Howell, W. C., & Burnett, S. A. (1978). Uncertainty measurement: A cognitive taxonomy. Organizational Behavior and Human Decision Processes, 22(1), 45–68.
Jousselme, A.-L., Maupin, P., & Bossé, E. (2003). Uncertainty in a situation analysis perspective. Paper presented at the 6th annual conference on information fusion, Cairns, Australia.
Kahneman, D., & Tversky, A. (1982). Variants of uncertainty. Cognition, 11(2), 143–157.
Klein, G. A. (1989). Strategies of decision making. Military Review, 56–64 (May).
Kosslyn, S. M., Ganis, G., & Thompson, W. L. (2001). Neural foundations of imagery. Nature Reviews Neuroscience, 2, 635–642.
Krivohlavy, J. (1970). Subjective probability in experimental games. Acta Psychologica, 34(2–3), 229–240.
Larkin, J. H., McDermott, J., Simon, D., & Simon, H. (1980). Expert and novice performance in solving physics problems. Science, 208, 140–156.
Linden, J. B., & Christensen, B. T. (2009). Analogical reasoning and mental simulation in design: Two strategies linked to uncertainty resolution. Design Studies, 3, 169–186.
Lipshitz, R., & Strauss, O. (1997). Coping with uncertainty: A naturalistic decision-making analysis. Organizational Behavior and Human Decision Processes, 69(2), 149–163.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Moss, J., Kotovsky, K., & Cagan, J. (2006). The role of functionality in the mental representations of engineering students: Some differences in the early stages of expertise. Cognitive Science, 30(1), 65–93.
Musgrave, B. S., & Gerritz, K. (1968). Effects of form of internal structure on recall and matching with prose passages. Journal of Verbal Learning and Verbal Behavior, 7(6), 1088–1094.
Plewe, B. (2002). The nature of uncertainty in historical geographic information. Transactions in GIS, 6(4), 431–456.
Previc, F. H. (1998). The neuropsychology of 3-D space. Psychological Bulletin, 124(2), 123–164.
Priem, R. L., Love, L. G., & Shaffer, M. A. (2002). Executives’ perceptions of uncertainty sources: A numerical taxonomy and underlying dimensions. Journal of Management, 28(6), 725–746.
Regan, H. M., Colyvan, M., & Burgman, M. A. (2002). A taxonomy and treatment of uncertainty for ecology and conservation biology. Ecological Applications, 12(2), 618–628.
Regan, H. M., Hope, B. K., & Ferson, S. (2002). Analysis and portrayal of uncertainty in a food web exposure model. Human and Ecological Risk Assessment, 8(7), 1757–1777.
Rowe, W. D. (1994). Understanding uncertainty. Risk Analysis, 14, 743–750.
Schunn, C. D., & Anderson, J. R. (1999). The generality/specificity of expertise in scientific reasoning. Cognitive Science, 23(3), 337–370.
Schunn, C. D., Saner, L. D., Kirschenbaum, S. K., Trafton, J. G., & Littleton, E. B. (2007). Complex visual data analysis, uncertainty, and representation. In M. C. Lovett & P. Shah (Eds.), Thinking with data. Mahwah, NJ: Erlbaum.
Sheer, V. C., & Cline, R. J. (1995). Testing a model of perceived information adequacy and uncertainty reduction in physician/patient interactions. Journal of Applied Communication Research, 23, 44–59.
Trickett, S. B., & Trafton, J. G. (2007). “What if. . .”: The use of conceptual simulations in scientific reasoning. Cognitive Science, 31(5), 843–875.
Trickett, S. B., Trafton, J. G., Saner, L. D., & Schunn, C. D. (2007). “I don’t know what is going on there”: The use of spatial transformations to deal with and resolve uncertainty in complex visualizations. In M. C. Lovett & P. Shah (Eds.), Thinking with data. Mahwah, NJ: Erlbaum.
Trickett, S. B., Trafton, J. G., & Schunn, C. D. (2009). How do scientists respond to anomalies? Different strategies used in basic and applied science. Topics in Cognitive Science, 1, 711–729.
252
Christian D. Schunn
Trope, Y. (1978). Inferences of personal characteristics on the basis of information retrieved from one’s memory. Journal of Personality and Social Psychology, 36(2), 93–106. Ungerleider, L. G., & Mishkin, M. (1982). Two cortical visual systems. In D. J. Ingle, M. A. Goodale, & R. J. W. Mansfield (Eds.), Analysis of visual behavior. Cambridge, MA: MIT Press. Urbany, J. E., Dickson, P. R., & Wilkie, W. L. (1989). Buyer uncertainty and information search. Journal of Consumer Research, 16(2), 208–215. Vlek, C., & Hendrickx, L. (1988). Statistical risk versus personal control as conceptual bases for evaluating (traffic) safety. In T. Rothengatter & R. de Bruin (Eds.), Road user behaviour: Theory and research (pp. 139–151). Assen, Netherlands: Van Gorcum & Co B.V. Voss, J. F., Tyler, S. W., & Yengo, L. A. (1983). Individual differences in the solving of social science problems. In R. F. Dillon & R. R. Schmeck (Eds.), Individual differences in cognition, Vol. 1 (pp. 205–232). New York: Academic. Walker, V. R. (1991). The siren songs of science: Toward a taxonomy of scientific uncertainty for decision makers. Connecticut Law Review, 23, 567. Walker, V. R. (1998). Risk regulation and the "faces" of uncertainty. Risk: Health, Safety, & Environment, 9, 27–38. Webster, A., & Bond, T. (2002). Structuring uncertainty: Developing an ethical framework for professional practice in educational psychology. Educational and Child Psychology, 19(1), 16–29.
CHAPTER SEVEN
Event Perception: A Theory and Its Application to Clinical Neuroscience
Jeffrey M. Zacks and Jesse Q. Sargent

Contents
1. Introduction
2. Event Segmentation Theory
   2.1. Prior Evidence
3. Schizophrenia
   3.1. Cognitive Deficits
   3.2. Schizophrenia and Event Segmentation
4. Obsessive-Compulsive Disorder
   4.1. Cognitive Disturbances
   4.2. Obsessive-Compulsive Disorder and Event Segmentation
5. Parkinson’s Disease
   5.1. Cognitive Deficits
   5.2. Parkinson’s Disease and Event Segmentation
6. Lesions of the Prefrontal Cortex
   6.1. Cognitive Deficits
   6.2. Prefrontal Lesions and Event Segmentation
7. Aging
   7.1. Prefrontal Cortex
   7.2. Midbrain Neuromodulatory Systems
   7.3. Episodic Memory and Situation Model Construction
   7.4. Aging and Event Segmentation
8. Alzheimer’s Disease
   8.1. Brain Changes and Cognitive Deficits
   8.2. Alzheimer’s Disease and Event Segmentation
9. Conclusions
Acknowledgments
References
Abstract

The chunking of continuous ongoing activity into discrete events is a central component of perception and cognition. It plays important roles in attention, cognitive control, and memory. Here, we review a theory of how the mind/brain segments ongoing activity into meaningful events. The theory proposes that event segmentation arises because perceptual systems make predictions about the near future. These predictions are guided by working memory representations, and when predictions fail, memory representations are updated. We apply the theory to six conditions in clinical neuroscience: schizophrenia, obsessive-compulsive disorder, Parkinson’s disease, lesions of the prefrontal cortex, aging, and Alzheimer’s disease. This analysis makes novel suggestions for interventions to address these conditions, and points the way to new avenues of research.

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53007-X. © 2010 Elsevier Inc. All rights reserved.
1. Introduction

For humans to experience the world as structured, the brain must organize the cacophonous wash of information that comes in through the senses. One powerful organizational principle is chunking: grouping a contiguous region of the input space under one representation. Chunking occurs at many stages in the central nervous system. Close to the sensory surfaces, a single neuron in the primary visual cortex may code for information gathered from many individual photoreceptor cells in the retina and thus represent an extended, coherent feature in the visual environment, such as a line at a certain orientation (Hubel & Wiesel, 1968; Marr & Ullman, 1981). At later stages, information in memory appears to be organized so that some specific items are associated or grouped with other specific items. For example, items that are learned in close temporal proximity are more likely to be recognized later if they are again presented in close temporal proximity (e.g., Faust, Balota, & Spieler, 2001; Underwood, 1957).

In this chapter, we present a theory of how the human brain chunks the continuous stream of experience associated with everyday life into discrete episodes, or events. If asked to recall yesterday’s activities, one might organize the description into separate chunks such as going to the grocery store, doing the laundry and calling a friend. We propose that this organization is not just the result of conscious efforts to present an orderly description. Rather, as a function of perceiving and experiencing those episodes, a network of specific brain regions automatically inserts boundaries between discrete events as they occur. As a fundamental element of normal perceptual processing, this type of segmentation is suggested to be at the center of attention, action control, online memory updating, and episodic memory encoding.
Because these functions are so important for everyday functioning, conditions that affect event segmentation may produce significant changes in cognition. Accordingly, this chapter investigates event segmentation in relation to cognitive disorders in which everyday event understanding is disturbed. We hope this will prove useful in organizing the facts about
deficits of higher level cognition, providing testable hypotheses, and suggesting specific interventions that might not emerge from other theoretical perspectives. We begin with an overview of the theory and then consider how it may be useful in efforts to understand several neuropsychological disorders and the cognitive changes associated with healthy aging.
2. Event Segmentation Theory

Event Segmentation Theory (EST) describes how and why our nervous systems segment ongoing experience into discrete episodes (Zacks, Speer, Swallow, Braver, & Reynolds, 2007; see also Kurby & Zacks, 2008; Swallow & Zacks, 2008). For example, consider what might happen during a typical visit to a coffee shop: you wait in line, you give your order, you pay, you put cream in your coffee, you leave. Different people will generate somewhat different lists of activities, but all are able to describe experience across time as organized into distinct units, and overall there will be considerable agreement across individuals regarding what those units are. EST proposes that this happens because, as part of normal perceptual processing, humans automatically segment episodes into units. In fact, EST suggests that the ongoing segmentation of experience is at the center of cognitive control, working memory (WM) updating, and storage and retrieval from episodic memory.

The core components of EST, corresponding hypothesized neurophysiological structures, and the basic flow of information are illustrated in Figure 1. Reference to Figure 1 may be helpful as we describe the components of EST and review some of the relevant empirical evidence below. For a more detailed presentation of the neurocognitive account, see Zacks et al. (2007). For a more detailed computational presentation and computer simulation results, see Reynolds, Zacks, and Braver (2007).

EST starts from the supposition that some of the most important products of perception and comprehension are predictions about what will happen in the near future. Prediction is front and center in many contemporary accounts of perceptual processing (Enns & Lleras, 2008), learning (Schultz & Dickinson, 2000), and language (Elman, 2009). Good predictions are adaptive because they allow one to plan actions more successfully (e.g., avoiding hazards or intercepting desired objects).
Also, good predictions can facilitate efficient perceptual processing. For example, if a pitcher winds up and completes a throwing motion, the perceptual system anticipates that the ball will fly out of the pitcher’s hand toward home plate. In the absence of such anticipation, perceiving the ball whizzing through the air would be much more difficult; in fact, one might miss it altogether!

Figure 1. Schematic depiction of the model, with hypotheses about the neurophysiological structures corresponding to the different components of the model. Thin gray arrows indicate the flow of information between processing areas, which are proposed to be due to long-range excitatory projections. Dashed lines indicate projections that lead to the resetting of event models. PFC, prefrontal cortex; IT, inferotemporal cortex; MT+, human MT complex; pSTS, posterior superior temporal sulcus; ACC, anterior cingulate cortex; SN, substantia nigra; VTA, ventral tegmental area; LC, locus coeruleus; A1, primary auditory cortex; S1, primary somatosensory cortex; V1, primary visual cortex. (Adapted with permission from Zacks et al., 2007.)
[Diagram components: sensory inputs (A1, V1, S1, ...); perceptual processing (IT, MT+, pSTS, ...); event models (lateral PFC); event schemata (lateral PFC); predicted future inputs and error detection (ACC, SN, VTA, LC, ...).]

According to EST, prediction is abetted by WM representations called event models. Event models may be thought of as representations of what-is-happening-now. EST suggests that all perceptual input is processed in the context of a currently activated conception of what-is-happening-now. Our conceptualization of event models borrows heavily from work on situation models in discourse comprehension (e.g., Zwaan & Radvansky, 1998). Event models represent those aspects of a situation that are consistent within an event, while ignoring those aspects that vary haphazardly from moment to moment. Such representations are helpful not only for prediction but also because they allow the disambiguation of ambiguous sensory information and the filling-in of missing information. For example, at a baseball game an event model would represent the location of the baseball while it is hidden in the pitcher’s glove. We have proposed that event models are maintained in lateral prefrontal cortex (PFC).

Event models combine current perceptual information with information acquired very recently in the present context, and with patterns of information learned over a lifetime of experience. For example, if you have never seen a baseball game, the first time the pitcher sets up to throw, you may have very little idea where the ball will go. As the pitch count goes up, your expectation that each upcoming pitch will go to home plate increases. However, if you are an experienced baseball fan, each pitch in an at-bat is perceived in the context of an event model informed by relatively stable long-term semantic memory about what happens at ball games. In EST, these long-term weight-based representations are referred to as event schemata. In contrast, event models are activation-based WM representations. So, the content of an event model may overlap at any given time with a particular event schema, but when an event model ceases to have predictive value, it can be rapidly and completely updated to reflect the changing situation. We propose that event schemata as well as event models are implemented by the lateral PFC. A number of studies suggest that representations of events are maintained in the anterior, lateral PFC (e.g., Grafman, 1995; Schwartz et al., 1995; Wood & Grafman, 2003). We review some of this evidence in more detail in Section 6. The exact nature of the interaction between event models and event schemata is currently a topic of active research.

So, while event models may be informed by current perceptual information, they can also influence how the perceptual system processes that incoming information (see Figure 1). For example, as described above, information provided by event models allows the visual system to anticipate the flight of a baseball before it is released by the pitcher. However, event models may facilitate processing of all types of sensory information across numerous, distributed brain regions. Perceptual analysis is accomplished by hierarchically organized neural systems specialized for vision, hearing, touch, and the other sensory modalities. For example, in the visual system (Felleman & Van Essen, 1991), information is initially represented in terms of simple local visual features in the early visual areas (V1 and V2, in the posterior occipital cortex). Successive processing stages form representations that are increasingly extended in space and time. Two broad streams process information important for object identification and for motor control relatively separately (Goodale, 1993).
Features relevant to object identity and category are differentially represented in inferior temporal cortex (IT), whereas features related to motion and grasping are differentially represented in dorsal regions including the human MT complex (MT+) and the posterior superior temporal sulcus (pSTS). Although there is communication between the streams and massive feedback throughout the system, these systems can be described as hierarchically organized, following a rough posterior-to-anterior spatial organization. Many of the classical studies characterizing these perceptual systems were conducted in nonhuman primates and relied on radically simplified stimuli. However, recent neuroimaging studies have shown similar responses in these areas across individuals during movie viewing (Bartels & Zeki, 2004; Hasson, Nir, Levy, Fuhrmann, & Malach, 2004; Hasson, Yang, Vallines, Heeger, & Rubin, 2008). EST proposes that event models bias processing in these streams. As we shall see shortly, EST also proposes that the updating of event models regulates processing over time in these streams.

A critical feature of event models is that they need to be protected from moment-to-moment changes in sensory and perceptual information. Updating one’s event model to delete the baseball when it disappeared
from sight would clearly be counterproductive. However, event models have to be updated eventually in order to be useful—the baseball game model will not be helpful at a gas station! The question is, when and how can event models be updated adaptively? EST’s answer is that event models are updated in response to transient increases in prediction error, mediated by systems in the anterior cingulate cortex (ACC) and midbrain neuromodulatory systems. The ACC maintains predictions and constantly compares them to actual inputs, producing an online error signal. Studies have shown this region to be sensitive to the commission of overt errors and to covertly measured cognitive conflict (e.g., Botvinick, Braver, Barch, Carter, & Cohen, 2001) and to the learning of sequential behaviors (Koechlin, Danek, Burnod, & Grafman, 2002; Procyk, Tanaka, & Joseph, 2000). When prediction error increases suddenly, this is detected by monitoring systems in the midbrain, which broadcast a global reset signal to the cortex. This system may include dopamine-based signaling subserved by the substantia nigra (SN) and ventral tegmental area (VTA) and norepinephrine-based signaling subserved by the locus coeruleus (LC). Neurons in the SN and VTA are sensitive to errors in reward prediction (e.g., Schultz, 1998). Dopamine cells in the SN and VTA project broadly to the frontal cortex, both directly and through the striatum, providing a mechanism for a reset signal such as is posited by EST. The LC has been implicated in regulating the sensitivity of an organism to external stimuli (e.g., Usher, Cohen, Servan-Schreiber, Rajkowski, & Aston-Jones, 1999). It also has broad connections to the cortex, these based on norepinephrine rather than dopamine. The reset signal transiently opens an input gate on the event models, exposing them to the early stages of sensory and perceptual processing (see Figure 1). 
This produces a short burst of increased activity in the perceptual processing stream and the event models settle into new states. As the event models are updated, predictions become more adaptive and errors decrease. The system returns to a stable configuration. A schematic representation of the temporal dynamics of the error-based updating process is shown in Figure 2.

Figure 2. Temporal dynamics of event segmentation. Most of the time prediction error is relatively low and event models are stable. As a model becomes less adaptive, prediction error increases. In response, information from sensory and perceptual processing is gated into the model, updating its contents. After updating, error declines and the model settles into a new state.
[Plot of prediction error against time, annotated in sequence: prediction error low, event models stable; an unpredicted change occurs, prediction error increases; event models updated; prediction error returns to low level.]

According to this account, event segmentation is an ongoing concomitant of everyday experience, which happens without intent and not necessarily with awareness. The processing that occurs at event boundaries can be viewed both as focal attention and as memory updating. An appropriate (stable) event model is a WM buffer whose outputs bias processing in the perceptual processing stream. The opening up of event models’ inputs is a form of focal attention, and the settling into a new state is a form of memory updating. Event segmentation in and of itself is not the goal of the system; instead, it is a by-product of mechanisms evolved in support of a more efficient, predictive perceptual system.

Importantly for understanding how EST applies to daily experience, the theory holds that event segmentation occurs simultaneously at multiple time scales. Consider the coffee shop example given above. If one’s event model for going to a coffee shop generates predictions consistent with all the distinct units of activity typically involved (e.g., waiting, ordering, paying), then no error signal would be generated, the model would be stable throughout the episode, and no event boundaries would occur. So, how does EST explain the segmentation of going to a coffee shop into distinct units of activity? We may consider events as hierarchical representations. The event “going to a coffee shop” is at a higher level in the hierarchy than the events “waiting in line” and “ordering”. Lower-level aspects of an event representation are sensitive to prediction error signals integrated over shorter time scales. So, when it comes time to place an order, the “waiting in line” level of the hierarchical event representation generates some degree of prediction error. That hierarchical level becomes unstable until the “ordering” model is instantiated, at which point the error signal decreases. Meanwhile, at a higher level of the event representation, “going to a coffee shop” is insensitive to such short-lived error signals. Higher levels are sensitive to error signals integrated over longer time scales. When one leaves the coffee shop, it is likely that there will be a more prolonged increase in error. The resulting prolonged error signal causes instability at a higher level of the hierarchical representation, and “going to a coffee shop” is abandoned for a more adaptive model. In accordance with this explanation, we would expect models at higher hierarchical levels to make less specific predictions. Also, we would expect boundaries between events at higher hierarchical levels to align with boundaries between events at lower levels.
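The error-gated updating and multiple-time-scale account described above can be made concrete with a small toy simulation. The sketch below is only an illustration under strong simplifying assumptions and is not the simulation reported by Reynolds, Zacks, and Braver (2007): the sensory stream is a made-up sequence of numbers, each level of the hierarchy holds a single predicted value, and the class name `EventModelLevel` and its parameters (`tolerance`, `rate`, `threshold`) are hypothetical. A larger `tolerance` stands in for the less specific predictions of higher levels, and a smaller `rate` stands in for error integrated over a longer time scale.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EventModelLevel:
    """One level of a toy event hierarchy (all fields are illustrative assumptions)."""
    name: str
    tolerance: float      # how specific the prediction is (larger = less specific)
    rate: float           # weight of the newest error in the running average, 0..1
    threshold: float      # integrated error needed to open the input gate
    model: Optional[float] = None          # current prediction for the next input
    integrated_error: float = 0.0
    boundaries: List[int] = field(default_factory=list)

    def step(self, t: int, x: float) -> None:
        if self.model is None:             # first observation initialises the model
            self.model = x
            return
        # Prediction error: 1 if the input falls outside what the model predicts.
        error = 1.0 if abs(x - self.model) > self.tolerance else 0.0
        # Integrate error over a time scale set by `rate` (small rate = long window).
        self.integrated_error = (1 - self.rate) * self.integrated_error + self.rate * error
        if self.integrated_error > self.threshold:
            # Gate opens: reset the model from the current input, record a boundary.
            self.model = x
            self.integrated_error = 0.0
            self.boundaries.append(t)


# Made-up one-dimensional "sensory" stream for a coffee shop visit:
# 1 = waiting, 2 = ordering, 3 = paying, 9 = a one-sample surprise, 10 = back on the street.
stream = [1, 1, 1, 1, 2, 2, 2, 9, 2, 2, 3, 3, 3, 10, 10, 10, 10]

fine = EventModelLevel("fine", tolerance=0.5, rate=1.0, threshold=0.5)
coarse = EventModelLevel("coarse", tolerance=5.0, rate=0.3, threshold=0.5)

for t, x in enumerate(stream):
    fine.step(t, x)
    coarse.step(t, x)

print("fine boundaries:  ", fine.boundaries)
print("coarse boundaries:", coarse.boundaries)
```

In this toy run the fine level marks a boundary at every change in the stream, including the one-sample surprise, whereas the coarse level ignores the brief surprise and updates only once the change on leaving the shop has persisted, one step after the corresponding fine boundary. The thresholds and rates were chosen by hand to show this qualitative pattern and carry no empirical weight.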
2.1. Prior Evidence

EST makes a number of claims about behavior and brain function, some of which are consistent with previous research and some of which have been tested directly. First, EST predicts that event segmentation is an ongoing part of normal perceptual processing. Evidence for this proposal comes from
behavioral and functional magnetic resonance imaging (fMRI) studies. In a typical event segmentation paradigm, participants watch movies of actors engaged in everyday activities (e.g., doing laundry) and are instructed to press a button whenever they believe one meaningful unit of activity has ended and another has begun (Newtson, 1973). When instructions direct attention to larger (coarse grain) or smaller (fine grain) units of activity, the behavioral data are thought to reflect ongoing event segmentation at higher or lower levels of hierarchical event representation. Studies have demonstrated that segmentation of videos using this method shows both stable intersubject agreement, and stable individual differences over a period of more than a year (Newtson, 1976; Speer, Swallow, & Zacks, 2003; Zacks et al., 2007). Furthermore, observers spontaneously group fine-grained event boundaries into hierarchically organized coarse-grained events (Newtson, 1976; Zacks, Tversky, & Iyer, 2001). That is, coarse grain boundaries tend to correspond to a subset of fine grain boundaries, which supports the view that event segmentation occurs simultaneously at multiple time scales. The reliability and structure of the data from the segmentation task support the suggestion that this paradigm is capturing an ongoing feature of normal perception. Ultimately, however, these results prove only that individuals can segment ongoing experience into units. Evidence that individuals do segment experience in the course of normal day-to-day perception comes from neurophysiological studies. Using fMRI, Zacks et al. (2001) first monitored participants’ brain activity during passive viewing of simple movies. Afterward, participants segmented the movies by indicating whenever, in their view, one meaningful unit of activity had ended and another had begun. 
During passive viewing, a collection of regions transiently increased in activity at those moments that viewers later identified as event boundaries. These regions included areas in lateral posterior cortex (including the inferior and superior temporal sulci and ventral temporal cortex), medial posterior cortex (including the cuneus and precuneus), and lateral frontal cortex. Similar results have been generated using several variations of this general paradigm (Speer, Zacks, & Reynolds, 2007; Speer et al., 2003; Zacks, Swallow, Vettel, & McAvoy, 2006). Second, EST predicts that perceptual processing increases at event boundaries. The fact that brain activity transiently increases at event boundaries is consistent with this prediction—particularly suggestive are the increases in posterior regions associated with perceptual processing. It has been shown that memory for perceptual details at or around event boundaries is better than that for details associated with event middles (Newtson & Engquist, 1976; Schwan, Garsoffky, & Hesse, 2000). Also, EST suggests that if the surface structure of events is consistent with the underlying event structure, then event segmentation mechanisms should operate more efficiently, and again, memory for the episode should improve. This too has
been borne out in the laboratory (e.g., Schwan & Garsoffky, 2004). For example, Boltz (1992) showed participants a feature film with no commercial breaks, with breaks that corresponded to event boundaries, or with breaks placed at nonboundaries. Recall of activity and memory for the temporal order of events in the movie were improved by the breaks at event boundaries and reduced by the breaks at nonboundaries. Further support for the suggestion that segmenting events in a manner that corresponds to their intrinsic structure improves memory for those events comes from a study of individual differences. Zacks, Speer, Vettel, and Jacoby (2006) found that group-typical segmentation of movies, which may be assumed to reflect intrinsic structure, predicted better performance on subsequent memory tests after controlling for overall cognitive level.

Another prediction of EST is that information associated with the current event model, and thus active in WM, should be more accessible than information associated with a previously active model. When using text material, event boundaries can be induced by imposing a change such as a temporal break (e.g., “...a day later...”) or a shift of spatial location (e.g., “the detective burst into the room”). Such shifts result in the perception of an event boundary for films as well (Zacks, Speer, & Reynolds, 2009). Numerous studies using text comprehension have shown results consistent with this prediction (e.g., Bower & Rinck, 2001; Zwaan & Radvansky, 1998). For example, Speer and Zacks (2005) required participants to read narratives and showed that memory for items in the narrative was lower when a temporal break intervened between the mention of the item and the test. Similar results have recently been obtained with movies (Swallow, Zacks, & Abrams, 2009).
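Two quantities discussed in this section, the tendency of coarse boundaries to coincide with fine boundaries and the degree to which one viewer's segmentation is group-typical, can be illustrated with simple scores computed from button-press times. The sketch below is a minimal example under our own assumptions and does not reproduce the specific measures used in the studies cited above: the function names (`alignment`, `agreement`), the 1-s tolerance and bin size, and all press times are hypothetical.

```python
from math import sqrt


def alignment(coarse, fine, tolerance=1.0):
    """Fraction of coarse boundaries lying within `tolerance` seconds of any fine boundary."""
    hits = sum(any(abs(c - f) <= tolerance for f in fine) for c in coarse)
    return hits / len(coarse)


def to_bins(times, duration, bin_size=1.0):
    """Binary vector with a 1 in every time bin that contains at least one button press."""
    n_bins = int(duration / bin_size)
    bins = [0] * n_bins
    for t in times:
        bins[min(int(t / bin_size), n_bins - 1)] = 1
    return bins


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agreement(individual, others, duration, bin_size=1.0):
    """Correlation between one viewer's binned presses and the proportion of
    other viewers who pressed in each bin (a stand-in for group-typical segmentation)."""
    own = to_bins(individual, duration, bin_size)
    binned = [to_bins(o, duration, bin_size) for o in others]
    group_norm = [sum(col) / len(binned) for col in zip(*binned)]
    return pearson(own, group_norm)


# Made-up press times (in seconds) for a 20-s clip.
fine_presses = [2.1, 5.0, 7.9, 11.2, 14.8, 18.0]
coarse_presses = [5.3, 11.0, 16.5]
other_viewers = [
    [2.0, 5.2, 11.1, 15.0],
    [2.4, 5.1, 8.0, 11.3, 18.2],
    [5.0, 11.4, 14.9],
]

align_score = alignment(coarse_presses, fine_presses)  # 2 of 3 coarse presses are near a fine press
agree_score = agreement(fine_presses, other_viewers, duration=20.0)
print(f"alignment = {align_score:.2f}, agreement = {agree_score:.2f}")
```

A real analysis would also need a chance baseline, because a viewer who presses the button very often will score well on both measures regardless of where the presses fall.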
In sum, EST proposes that predictions about the near future are guided by WM representations of the current event, which are updated in response to transient increases in prediction error. This updating includes upregulation of the perceptual processing pathways feeding into event models. The experience of an error spike and consequent updating is perceived as a boundary between meaningful events. Thus, event segmentation is an ongoing perceptual mechanism standing at the center of attention, cognitive control, and memory. It is subserved by a distributed set of brain mechanisms described above (see Figure 1). If one or more of these is selectively affected by a disorder or age-related process, it may have substantial consequences for cognition.

In the following sections, we apply EST to the analysis of six conditions in clinical neuroscience. We have selected these conditions based on the overlap between the neurocognitive mechanisms implicated in each and the mechanisms of event segmentation as proposed by EST. The six are: schizophrenia, obsessive-compulsive disorder (OCD), Parkinson’s disease (PD), lesions of the PFC, aging, and Alzheimer’s disease (AD). Our selections are necessarily heuristic and surely incomplete. However, we think the analysis shows the
potential for EST to provide new insights regarding major cognitive deficits associated with these disorders.
3. Schizophrenia

Schizophrenia is a developmental neurocognitive disorder that affects approximately 1% of adults (Bresnahan et al., 2000). In most cases, it is diagnosed in early adulthood and has consequences throughout adult life. Schizophrenia has classically been characterized by positive symptoms, which include hallucinations, delusions, and paranoia, and by negative symptoms, which include flattened affect, reduced volition, and anhedonia. However, it has become increasingly clear that cognitive impairments are a prominent part of the disease, and that these have profound effects on people’s lives. In a review and meta-analysis, Green, Kern, Braff, and Mintz (2000) examined the relations between cognitive deficits and functional outcomes. The cognitive variables studied included secondary (long-term) verbal memory, immediate verbal memory, executive control, and vigilance. The functional outcome measures included success in psychosocial skill acquisition, social problem-solving, and daily functioning such as occupational functioning and independent living. All of the cognitive variables were related to functional outcomes, accounting for 20–40% of the variance across individuals. Thus, cognitive performance is a major predictor of the ability of persons with schizophrenia to maintain employment, build social ties, and live independently.

The etiology and pathophysiology of schizophrenia are complex and not fully understood. Schizophrenia selectively affects the PFC, as well as the hippocampus and thalamus (Harrison, 1999). The neurotransmitter dopamine has been shown to play a major role in the disorder, though its functions are still not fully known. Dopamine’s role in schizophrenia has been reviewed by Guillin, Abi-Dargham, and Laruelle (2007). Early research focused on the D2 dopamine receptor, which is widely expressed in the midbrain where large numbers of dopamine neurons are found.
In a classic set of studies, Creese, Burt, and Snyder (1976) discovered that the effectiveness of antipsychotic medications to treat the positive symptoms of schizophrenia was correlated with their ability to occupy D2 receptor sites. Current theory holds that abnormally high activity of D2 receptors in the midbrain reduces the effectiveness of glutamate, an excitatory neurotransmitter. More recently, attention has focused on D1 receptors in the PFC. D1 receptors have been found to have reduced activity in the dorsolateral PFC, possibly as a compensatory response to chronic
Event Perception: A Theory and Its Application to Clinical Neuroscience
overstimulation. Reduced D1 responsiveness interferes with inhibitory local signaling based on the neurotransmitter GABA.
3.1. Cognitive Deficits
The cognitive deficits in schizophrenia are specialized rather than global. For example, implicit memory appears to be relatively well preserved (Clare, McKenna, Mortimer, & Baddeley, 1993). However, WM (the ability to store and manipulate information over short durations) is impaired. Barch (2005) reviewed the data on WM impairments in schizophrenia and proposed that a specific component of WM is affected by schizophrenia. Baddeley’s (1986) WM theory proposes that WM is implemented by a set of passive storage systems and a central executive that manages the updating and transformation of information in the storage systems. Barch argued that the data suggest no impairment in the maintenance of auditory information, possible impairment of visuospatial information, and substantial impairment of the central executive. This central executive impairment is associated with functional differences in the PFC. In particular, Barch (2006) and colleagues have proposed that the ability to maintain cognitive representations of task set and use them to guide behavior is impaired in schizophrenia. This conception of the central executive is consistent with Baddeley’s (2000) recent proposal of an episodic buffer, a component of the central executive that maintains integrated multimodal representations of the current behavioral episode.
3.2. Schizophrenia and Event Segmentation
In terms of EST, the neurochemical and cognitive disturbances identified in schizophrenia could produce two different effects on event understanding. If D2 hyperactivity impairs the effectiveness of long-range excitatory projections from the midbrain, this would be expected to impair event model updating. If D1 hypoactivity in the PFC affects the maintenance and use of information, this should be reflected as an impaired ability to maintain information in event models. This proposal fits with the behavioral findings that the central executive may be impaired in schizophrenia. In particular, it is consistent with the proposal that task set representations are affected by the disease. Both of these possibilities (impaired event model updating and impaired maintenance) would be expected to lead to deficits in event segmentation. There is very little direct evidence on event perception in schizophrenia, but the existing data support the existence of an event segmentation deficit. Zalla, Verlut, Franck, Puzenat, and Sirigu (2004) asked outpatients with schizophrenia and healthy controls to view movies of everyday activities and segment them into fine and coarse events. Patients and controls
Jeffrey M. Zacks and Jesse Q. Sargent
identified similar numbers of events, and their fine-grained event boundaries were located in similar locations. However, the patients tended to identify coarse-grained boundaries in normatively incorrect locations. This tendency was correlated with schizophrenic symptomatology. In a pilot study in our laboratory (Zacks & Barch, unpublished data), we replicated the finding that persons with schizophrenia segmented in a less normative fashion than healthy controls. Persons with schizophrenia also showed impaired memory for the temporal order of events, and impaired recognition memory for pictures taken from the events. In the future, it will be important to follow up these results to determine if schizophrenia produces a selective deficit in event model updating or maintenance. If updating is selectively impaired, it may be possible to remediate this by teaching explicit strategies for identifying event boundaries, or by explicitly highlighting event boundaries in texts or films. If event model maintenance is selectively impaired, it may be possible to intervene by teaching explicit strategies to rehearse key information such as characters and task goals, or by providing external memory aids to support maintenance. Finally, in the meta-analysis described above, Green et al. (2000) note that the mechanism by which the cognitive deficits associated with schizophrenia lead to lower functional outcome scores remains unclear. Here, it is interesting to consider that event segmentation mechanisms may be closely related to performance on functional outcome measures. For example, in one functional outcome measure, patients are required to interpret vignettes depicting interpersonal interactions. The ability to form and maintain appropriate event models would seem to be central to this task. This suggests that measures of event segmentation ability might be particularly informative regarding the ability of schizophrenics to function independently in society. 
In sum, cognitive dysfunction is a salient component of schizophrenia and a major predictor of the disease’s impact on a person. Neurophysiological studies implicate the dopaminergic system, including midbrain D2 receptors and prefrontal D1 receptors. Disruption of either system could produce disorders of event segmentation and memory. The limited available evidence supports the existence of such disorders and suggests points of intervention to remediate them.
4. Obsessive-Compulsive Disorder
OCD is a psychiatric condition characterized by persistent intrusive thoughts and compulsive behaviors. The obsessive thoughts often have to do with threats to safety and threats of contamination. Compulsive behaviors often relate to alleviating these threats (e.g., compulsive washing associated with obsessive thoughts about dirt or disease), but in some cases
the behaviors appear to be unrelated to the obsessive concerns (Boyer & Lienard, 2008). For a time, a dominant view of the neurochemical mechanism of OCD was that it was caused by hypoactivity of the neurotransmitter serotonin. This was motivated largely by the finding that the symptoms of OCD were ameliorated by serotonin reuptake inhibitors (SRIs), antidepressant drugs that strengthen the effects of serotonin in the synapse. However, the effects of SRIs are widespread and complex, and some studies that have directly intervened in the action of serotonin have cast doubt on its being the primary causal mechanism. As a result, some attention recently has focused on a possible role of dopamine in OCD (Fornaro et al., 2009). Another possibility that has been proposed is that a circuit involving the orbitofrontal cortex (OFC), the SN, and the basal ganglia is dysregulated, leading innate motor programs to be triggered inappropriately (Rapoport, 1990), or to hypersensitivity of attentional systems to environmental threats (Saxena & Rauch, 2000). Both serotonin and dopamine play important roles in this circuit. Huey et al. (2008) have presented a psychological and neuroanatomical model of OCD that is particularly relevant to the current discussion because of the central role played by event representations. This model suggests that the PFC supports goal-oriented, structured sequences of events (structured event complexes, or SECs). Once this type of event representation is activated, a network of neural systems involved in reward and emotional processing (e.g., OFC, limbic system) supports a motivational signal, experienced as anxiety, that abates upon completion of the SEC. The authors suggest that OCD patients do not experience the relief from anxiety normally associated with completion of an SEC. Obsession is the behavioral manifestation of the neural signal that an SEC has been “left hanging.”
4.1. Cognitive Disturbances
Evans and Leckman (2006) have recently reviewed the epidemiology, symptomatology, and neurophysiology of OCD. They note that the intrusive thoughts associated with OCD are similar to those experienced by healthy controls; they are just more frequent and more difficult to dismiss. Obsessive behaviors vary over the lifespan in healthy persons, being most prominent in early childhood (2–6), at puberty, and after becoming a new parent. Early childhood and puberty are also the peak times of onset of clinical OCD. Together, these patterns suggest that persons with OCD do not have disordered representations of events, objects, or persons; rather they have a disruption in the ability to control the unwanted influence of some of these representations. Evans and Leckman propose that OCD arises from the dysregulation of evolutionarily adaptive systems for threat monitoring and avoidance.
What is the nature of this dysregulation? Current accounts propose hyperactive monitoring of overt behavioral errors or of covert conflict between information processing streams (van Veen & Carter, 2002). Degree of obsessive thought is correlated with errors on the Wisconsin Card Sort Test (WCST). The WCST requires one to sort cards according to the number, shape, or color of objects on the card. One must discover a rule for sorting cards based on feedback, and then adapt one’s performance when the experimenter covertly changes the rule. This task would seem to be quite sensitive to dysfunction in error detection mechanisms. However, there is no evidence that increased WCST errors among OCD patients are related to error or conflict processing, and the relationship could reflect broader cognitive impairments. There are stronger data linking OCD to selective deficits in motor inhibition and response suppression (Evans & Leckman, 2006). However, the strongest data come from studies that have measured neurophysiological responses to situations that produce errors or high conflict. These studies have focused on the ACC. As we have described in Section 2, the ACC is associated with monitoring conflict between information streams, and in EST it is proposed to support the evaluation of prediction error. In one study, Gehring, Himle, and Nisenson (2000) asked persons with OCD and healthy controls to perform the Stroop task. In this task, participants are shown color words printed in various ink colors (either congruent or incongruent with the color named by the word) and asked to name the ink color, rather than to read the word. This requires suppressing a prepotent response to read the word and produces slow responses or errors, depending on the task constraints. Gehring and colleagues used electroencephalography to measure a correlate of error processing, the error-related negativity (ERN), during task performance. 
The ERN is a negative-going wave that is found just after people commit errors in simple cognitive tasks. It is strongest over frontocentral electrodes, and is thought to originate in the ACC. The Stroop task requires a high degree of cognitive control and leads to frequent errors, accompanied by ERNs. In persons with OCD, these responses were exaggerated. In a functional MRI experiment from this group, persons with OCD and controls performed a flanker task, in which they had to respond to the identity of the central character in an array while ignoring the characters on either side. Like the Stroop task, the flanker task produces a conflict between response tendencies driven by the target stimulus information and those driven by the to-be-ignored information. Also, like the Stroop task, it produces many errors. Both the control and patient groups showed increased activity in the dorsal portion of the ACC on trials in which they made errors. However, the OCD patients also showed significant increases in the rostral ACC. Overt errors do not appear to be necessary to produce activation in the ACC, nor to dissociate the neurophysiological response of control and OCD participants to task performance (van Veen & Carter, 2002). In one study, Ursu, Stenger, Shear,
Jones, and Carter (2003) asked participants with OCD and healthy controls to perform a version of the continuous performance task. In this task, participants view a sequence of alphabetic characters and are asked to respond only when a particular two-step subsequence is presented (e.g., an A followed by an X). The stimulus set was constructed such that when the first character (A) appeared, the second (X) was quite likely. This establishes a strong prepotent tendency to respond to the following character. Thus, if the following character is a nontarget (e.g., Y), there is a conflict between the prepotent response and the correct nonresponse. On such trials, persons with OCD showed larger responses in the ACC than controls, even when they successfully withheld their responses. A striking feature of all three of these studies, which used three different tasks, is that the behavioral performance of persons with OCD did not differ substantively from the controls. Thus, the neurophysiological markers of exaggerated error or conflict processing were present even when there was no evidence of “compulsive” behavioral performance.
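To make the structure of the continuous performance task concrete, the following Python sketch generates cue-probe pairs in which an A cue is usually followed by an X probe, and labels the high-conflict case in which an A is followed by a nontarget. It is only an illustration: the function names, the non-A cue "B", and all probabilities are hypothetical choices, not parameters reported by Ursu et al. (2003).

```python
import random


def make_ax_cpt_trials(n_pairs, p_a=0.8, p_x_given_a=0.8, seed=0):
    """Generate (cue, probe) pairs for an AX-type continuous performance task.

    All probabilities are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_pairs):
        # "B" stands in for any non-A cue (a hypothetical label).
        cue = "A" if rng.random() < p_a else "B"
        if cue == "A":
            # After an A, the X probe is made likely, building a prepotent response.
            probe = "X" if rng.random() < p_x_given_a else "Y"
        else:
            probe = "X" if rng.random() < 0.5 else "Y"
        trials.append((cue, probe))
    return trials


def classify(cue, probe):
    """Label a cue-probe pair by the response it calls for."""
    if cue == "A" and probe == "X":
        return "target"                    # respond
    if cue == "A":
        return "high-conflict nontarget"   # prepotent response must be withheld
    return "nontarget"                     # withhold


if __name__ == "__main__":
    for cue, probe in make_ax_cpt_trials(10):
        print(cue, probe, classify(cue, probe))
```

In this sketch, the A-then-Y pairs correspond to the trials on which a prepotent response conflicts with the correct nonresponse.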
4.2. Obsessive-Compulsive Disorder and Event Segmentation
In terms of EST, we consider three possible pathways by which OCD could be related to event segmentation. One possibility is that compulsive behavior results from attending to or integrating prediction error signals at an abnormally short time scale, which should cause one to experience events as segmented at an abnormally fine grain. Boyer and Lienard (2008) propose that this is the case and that it accounts for the ritualized character of compulsive behaviors. If one attends to events on a very fine time scale, one should neglect their relation to larger events and the larger goals of one’s actions (Vallacher & Wegner, 1987). Boyer and Lienard propose that this shift to a fine grain of event segmentation is adaptive because it occupies WM, which reduces the intrusion of obsessive thoughts. A recent study by Zor et al. (2009) provides support for this possibility. In this study, participants with OCD were videotaped performing activities that formed the basis for their compulsive behavior, for example, filling a pet’s bowl, lighting a cigarette, or blowing one’s nose. For each patient participant, a control participant was videotaped performing the same activities. The low-level actions (e.g., checking the bowl’s position, waving hands) were coded from each videotape. The patient group performed many more actions than the controls and repeated actions more often. Importantly, these “extra” actions tended to be idiosyncratic and apparently nonfunctional, such as waving one’s hands when filling a pet’s bowl. This result suggests that the patients were attending to the activity at a low level that neglected the goal relevance of the individual actions. A straightforward prediction from this proposal is that patients with OCD should segment activity into finer-grained events than control participants. This could be tested using the
behavioral tasks described previously. (One caveat is worth mentioning: Grain of segmentation in explicit event-marking tasks is quite sensitive to instructions and participants’ interpretations of those instructions. These effects presumably affect the output processes involved in performing the explicit task, such as deciding when to press a response key and executing the response, rather than affecting the ongoing segmentation process. So, if comparing patients to controls, one would want to minimize task demands that could affect segmentation grain and use converging measures to help distinguish between differences in the mechanisms of ongoing segmentation and differences in task-specific output processes.) A second possibility is that obsessions and compulsive behavior result from a chronically high prediction error signal or a too-low threshold for error-based gating. (In EST, these two components are not uniquely identifiable, because raising or lowering the mean error signal can be compensated for by raising or lowering the gating threshold by the same amount.) This accords with OCD patients’ frequent report that things “just don’t feel right.” It is also consistent with the findings of exaggerated error and conflict signals in the ACC, described above. Exaggerated prediction error responses could result in more frequent activation of the error-based gating mechanism and therefore more frequent event boundaries. At the same time, if prediction error signals are chronically elevated, this would reduce the ability of the error-based updating system to distinguish between intervals of low and high prediction error. This in turn should produce unreliable, idiosyncratic event segmentation. Thus, this proposal predicts that the segmentation of patients with OCD should be more idiosyncratic than that of controls as well as finer grained.
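The point that a chronically elevated error signal and a too-low gating threshold cannot be told apart can be illustrated with a toy threshold gate. The Python sketch below is an assumption-laden simplification, not EST's actual gating mechanism: `event_boundaries`, the error trace, and the threshold values are all made up for illustration.

```python
def event_boundaries(errors, threshold):
    """Return the time steps at which prediction error exceeds the gating threshold."""
    return [t for t, e in enumerate(errors) if e > threshold]


# Hypothetical prediction-error trace in arbitrary integer units.
errors = [10, 20, 90, 30, 10, 80, 20]
threshold = 50
shift = 25

baseline = event_boundaries(errors, threshold)                       # [2, 5]

# Chronically elevated error with an unchanged threshold: more boundaries.
elevated = [e + shift for e in errors]
elevated_boundaries = event_boundaries(elevated, threshold)          # [2, 3, 5]

# Raising the threshold by the same amount restores the baseline pattern.
compensated = event_boundaries(elevated, threshold + shift)          # [2, 5]

# Lowering the threshold instead of raising the error gives the same
# extra boundaries, so the two accounts make identical predictions here.
lowered = event_boundaries(errors, threshold - shift)                # [2, 3, 5]

print(baseline, elevated_boundaries, compensated, lowered)
```

Only the difference between the error signal and the threshold matters in this toy gate, which is why segmentation behavior alone could not distinguish the two accounts.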
Failing to segment activity into the proper event units would be expected to reduce the effectiveness of actions and could produce the sorts of interruptions and perseverations reported by Zor et al. (2009). A final possibility is that persons with OCD have disordered event schemata. Schemata are long-term memory representations implemented by synaptic weight changes. These representations reflect commonly activated event models. For example, the event schema representing making toast is built up over a lifetime of experience making toast. Now, suppose that one began performing some nonfunctional perseverative behavior, such as tapping the toaster three times, whenever one made toast. Eventually, the schema for making toast would include this nonfunctional behavior. In this case, the presence of the toaster-tapping in the event schema would represent a source of compulsive behavior, in addition to whatever other sources may exist. Although this does not explain the initial appearance of the compulsive behavior, it may be a mechanism by which such behavior becomes difficult to expunge. In sum, the clinical profile and pathophysiology of OCD suggest it may involve a dysregulation of mechanisms for monitoring error or conflict. Such dysregulation could affect event segmentation directly or indirectly,
through its long-term impacts on event schemata. If so, event segmentation measures may prove valuable for better understanding the mechanisms of OCD or for diagnosing it. We believe this is a fertile area for future research.
5. Parkinson’s Disease
PD is a neurological condition characterized by a disorder of movement (Binder, Hirokawa, & Windhorst, 2009). Patients with PD have tremor, rigidity, postural instability, and bradykinesia (difficulty initiating movements combined with slow movement execution). Bradykinesia is often the most debilitating motor symptom. PD is diagnosed by a cluster of these motor symptoms combined with a finding that the individual responds to medications that increase the effectiveness of dopamine. PD is also associated with nonmotor symptoms including loss of smell, depression, anxiety, autonomic dysfunction, sleep disturbance, and cognitive deficits. The etiology of PD is not fully understood; most likely, PD can arise through multiple pathways (Olanow & Tatton, 1999). It occurs occasionally in middle age, but becomes more prevalent after age 60. The motor symptoms are the result of a dramatic reduction in the projection of dopamine cells from the SN to the striatum. These pathways form part of a set of thalamocortical loops, which are thought to be important for the online control of movement and cognition. However, PD is also associated with more diffuse lesions to subcortical and cortical structures. PD often is accompanied by frank dementia, particularly in the later stage of the disease (Aarsland et al., 2001). Dementia in PD is characterized by deficits in executive control, visuospatial processing, and personality disorder (particularly depression). The mechanisms of PD dementia are not well understood. The fact that it is not well controlled by dopamine agonist medications suggests that PD dementia may be caused by lesions other than those to the dopamine cells in the SN described above. Because this dementia is relatively global and its mechanisms are not currently well understood, its relevance to event perception is limited. In earlier or milder cases, the cognitive deficits are more focal and therefore may be more informative.
Thus, we focus here on cognitive deficits in PD patients without dementia.
5.1. Cognitive Deficits
In a comprehensive review, Taylor and Saint-Cyr (1995) described the primary cognitive deficit in PD as an impairment specific to selecting an action plan when the environment provides cues for multiple potential action plans. For example, patients with PD typically are impaired on the WCST. A key characteristic of this task is that the cues provided by the card
underdetermine the correct response. PD patients are also impaired on a version of the Tower of Hanoi task. The Tower of Hanoi is a puzzle in which participants must move a stack of discs of various sizes from one of three pegs to another, subject to two rules: One can only move one disc at a time, and a larger disc can never be placed on a smaller disc. In this case, several moves are possible on each turn and the participant must hold multiple evaluations in mind to select a better move. Taylor and Saint-Cyr propose that the cognitive deficits can be understood neurobiologically in terms of two thalamocortical loops projecting from the SN. Both loops project through the basal ganglia to the cortex, primarily the PFC. Whereas the motor dysfunction in PD may be due to damage to projections from the SN to the putamen of the basal ganglia, the cognitive deficit may be due to projections from the SN to the caudate nucleus, as well as direct projections to the supplementary motor area and the dorsolateral PFC. A recent study focused, in particular, on cognitive deficits in PD patients without dementia (Green et al., 2002). Patients and controls were administered a battery of cognitive tests. Patients had relatively preserved short-term memory span and long-term recognition memory. However, impairments were frequently observed in the WCST and in fluency tasks (e.g., naming as many animals as possible within a 1-min interval). These deficits were interpreted as reflecting damage to the “cognitive” thalamocortical loops. However, patients also were frequently impaired on judgments of line orientation and the acquisition of new verbal memories; these deficits do not fit as well with this interpretation. Persons with PD are impaired at learning new sequential behaviors (Seger, 1994). For example, in the serial reaction time task, participants are cued to press one of several keys by the onset of a light above the key.
Trials follow each other rapidly, and a repeating sequence of keys can be embedded in the string of trials. Performance improves over time for two reasons: Participants become faster at responding to the light, and they learn to anticipate the sequence of keypresses. This can be seen by contrasting the condition with the repeating sequence to a condition in which each light follows the previous one randomly. Performance in this control condition improves somewhat with practice, but not as much as in the sequential condition. Sequence learning in the serial reaction time task often occurs without participants becoming aware of the repeating sequence, particularly if the sequence is relatively long. Patients with PD show substantially reduced sequence learning in this and related tasks. In sum, in the early stages of PD, there may be a relatively selective deficit due to selective damage to the thalamocortical loops. This may result in impairments of action selection when multiple potential actions are possible, and in learning associations among actions in these conditions.
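The sense in which the Tower of Hanoi offers several candidate actions on each turn can be shown by enumerating the legal moves from a given configuration under the two rules stated above. The Python sketch below is illustrative only; the function name `legal_moves` and the list-of-lists peg representation are our own assumptions, not part of any task used in the cited studies.

```python
def legal_moves(pegs):
    """List the legal (source_peg, destination_peg) moves for a configuration.

    `pegs` is a list of three lists; each inner list gives disc sizes from
    bottom to top. Only the top disc of a peg can move, and it may not be
    placed on a smaller disc.
    """
    moves = []
    for src, src_peg in enumerate(pegs):
        if not src_peg:
            continue  # nothing to move from an empty peg
        disc = src_peg[-1]
        for dst, dst_peg in enumerate(pegs):
            if dst != src and (not dst_peg or dst_peg[-1] > disc):
                moves.append((src, dst))
    return moves


start = [[3, 2, 1], [], []]
print(legal_moves(start))       # [(0, 1), (0, 2)]

midgame = [[3], [2], [1]]
print(legal_moves(midgame))     # [(1, 0), (2, 0), (2, 1)]
```

Even in this three-disc example, the solver faces two or three competing moves at every step, which is the kind of situation in which the action-selection impairment is proposed to appear.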
5.2. Parkinson’s Disease and Event Segmentation
In terms of EST, a primary lesion to the dopaminergic projections from the SN would be expected to produce a deficit in updating event models. The deficits of PD patients in the WCST accord well with this possibility. However, we will see that similar deficits can be produced by lesions to frontal cortex, which we interpret as selectively affecting event model maintenance or event schemata (see Section 6). This task is not well suited to teasing apart event model updating from maintenance. Similarly, impairments in action selection and sequential learning are consistent with a deficit in event model updating, but do not discriminate this possibility from numerous others. A pair of studies by Zalla and colleagues (Zalla et al., 1998, 2000) strongly suggest that event schemata are intact in patients with PD, dissociating their performance from that of patients with prefrontal lesions. In one study (Zalla et al., 1998), patients with PD were given cards describing steps in everyday activities such as toasting bread and going to the movies. On each trial, 20 cards were given, 5 for each of 4 activities. Participants were asked to sort the cards such that the steps for each activity were segregated and ordered. Whereas frontal lesion patients frequently mixed steps from the different activities, PD patients were able to segregate the activities and order the cards. However, their performance was quite slow, and when distractor steps were included (which did not belong to any of the activities), PD patients were less able to set these aside. Zalla et al. concluded that whereas the frontal lobe patients had deficient event representations, the PD patients had intact event knowledge but had difficulty shifting their cognitive set in order to deploy that knowledge efficiently in the task.
In the second study (Zalla et al., 2000), PD patients and patients with frontal lobe lesions were asked to generate lists of the steps involved in a similar set of everyday activities. Again, the frontal lobe group showed evidence of impaired event knowledge, producing fewer correct steps and failing to place them in the correct order. The PD group showed neither impairment. However, they were less able to identify which steps were important for completing the activity. Zalla et al. interpret this as an impairment in action selection. Little is known about event segmentation in PD. EST predicts that if cortical updating due to dopamine signaling is impaired, then patients with PD should show disorganized event segmentation. This should be evident in segmentation behavior: Patients with PD should show reduced segmentation agreement. Moreover, patients with PD should show reduced evoked brain responses at event boundaries, reflecting reduced updating. This should hold whether normative event boundaries or those identified by the patient are used to estimate the evoked responses.
Thus, the deficits in event understanding observed in PD are consistent with the hypothesis that dopamine-based updating is impaired in this disorder. However, the tasks that have been used thus far do not differentiate this possibility from the possibility that event model maintenance may be impaired. This is an important question for future research.
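One way the predicted reduction in segmentation agreement could be quantified is sketched below in Python. The sketch assumes that each viewer's button presses have been binned into fixed time bins (1 for a boundary, 0 otherwise) and scores an individual by the Pearson correlation between that individual's bins and the proportion of a comparison group marking each bin. This is a simplified stand-in, not the scoring procedure of any cited study; the function names and toy data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("agreement is undefined when a sequence has no variance")
    return cov / sqrt(var_x * var_y)


def segmentation_agreement(individual, group):
    """Correlate one viewer's binned boundaries with the group norm.

    `individual` is a list of 0/1 values, one per time bin; `group` is a
    list of such lists from the comparison viewers.
    """
    norm = [mean(bin_values) for bin_values in zip(*group)]
    return pearson(individual, norm)


# Toy data: three comparison viewers and two hypothetical individuals.
group = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
]
normative_viewer = [0, 1, 0, 0, 1, 0]
idiosyncratic_viewer = [1, 0, 0, 0, 0, 1]

print(segmentation_agreement(normative_viewer, group))
print(segmentation_agreement(idiosyncratic_viewer, group))
```

Under this simplified measure, a viewer whose boundaries fall where most others place them scores near 1, whereas a viewer with idiosyncratic boundaries scores near or below 0.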
6. Lesions of the Prefrontal Cortex
Lesions to the PFC produce cognitive disturbances that are at once subtle and profound. On the one hand, prefrontal lesions rarely produce dramatic deficits in sensation, perception, or movement control (though lesions to the immediately posterior parts of frontal cortex produce profound motor deficits). On the other hand, prefrontal lesions often produce disorders of intentional action that interfere greatly with everyday functioning. There is a large literature on the cognitive deficits associated with prefrontal lesions (for reviews, see Fuster, 1997; Grafman, 1995). Here, we focus on those aspects of cognition that are most relevant for event understanding.
6.1. Cognitive Deficits
Persons with prefrontal lesions frequently suffer from particular forms of apraxia, or disorder of action. Whereas persons with posterior lesions are more likely to experience apraxias in which they are unable to pick up objects or perform body movements on command, persons with prefrontal lesions often have intact ability to perform simple actions but deficits in the ability to organize these actions effectively. Schwartz and her colleagues have described this as action disorganization syndrome (Schwartz, 2006; Schwartz et al., 1995). One potential cause of action disorganization syndrome is damage to the long-term memory representations supporting structured action. Evidence that such knowledge depends critically on the PFC comes from several sources. Grafman and colleagues have suggested the PFC stores representations of typical actions called structured event complexes (SECs; e.g., Grafman, 1995, 1999; Sirigu et al., 1998). SECs correspond closely to the event schemata described above (see Section 2). They are structured representations that capture information about the actions that make up an activity, their relations, the social structure of the activity, and the activity’s characteristic physical setting and objects. It is posited not only that SECs are stored in PFC, but that they are stored with category-specific localization. Using fMRI, Wood and Grafman (2003) showed that when participants made classification judgments about whether single words belonged to particular semantic categories, PFC activation patterns differed from those observed when judgments were made about whether action phrases
belonged to particular SECs (e.g., going out to dinner). Furthermore, patterns of PFC activation differed depending on whether the classified items were social in nature or not. Similar results have been reported by Zanini and colleagues (Zanini, 2008; Zanini, Rumiati, & Shallice, 2002). One of the factors that distinguish SECs from other types of memory representations is the inclusion of information regarding the sequencing of behaviors over time. For example, Sirigu et al. (1996) examined the selection and temporal organization of actions among normal controls and patients with lesions to either the PFC or to more posterior regions. This study used the same paradigm that Zalla et al. (1998, 2000) used to measure event knowledge in patients with PD (see Section 5). Participants were given cards printed with the steps in a set of four everyday activities, and asked to sort the cards to separate the activities and place the steps in order. Patients with PFC lesions were more likely to place steps out of order, and more likely to intrude steps from one activity into another. Research by Humphreys and colleagues has directly compared action observation and action performance, suggesting that a common deficit in event knowledge can impair both (Humphreys & Forde, 1998; Humphreys, Forde, & Riddoch, 2001). A second potential cause of action disorganization syndrome is disruption to the ability to maintain representations of one’s current actions and goals online. It has been proposed that the PFC maintains representations of one’s current goals and task (e.g., Miller & Cohen, 2001; Mushiake et al., 2009). This proposal is based in part on the finding that PFC neurons exhibit sustained firing during memory and other tasks (e.g., Fuster & Alexander, 1971; Levy & Goldman-Rakic, 2000). In human fMRI studies, sustained activity is found in PFC when participants attempt to maintain information over a delay (Wager & Smith, 2003). 
Some individual cells in monkey PFC are sensitive to which task the monkey is to perform, independent of the sensory input (e.g., Muhammed, Wallis, & Miller, 2006), and in human fMRI experiments PFC is sensitive to the complexity and timescale of task instructions (Koechlin & Summerfield, 2007). Norman and Shallice (1986) proposed a model in which the posterior cortex stores representations of low-level actions, and the PFC is selectively involved when multiple low-level actions compete for activation. In these cases, competition has to be resolved using event knowledge and maintenance of current goals. This theory has been implemented recently as a computational model, which can reproduce the qualitative features of action disorganization syndrome (Cooper & Shallice, 2000, 2006). Another very different computational model proposes that goal maintenance and competition resolution are combined in a single processing framework that uses similarity structure learned from previous experiences to resolve competition (Botvinick & Plaut, 2004, 2006). Although they differ dramatically in their computational architecture, both models propose that knowledge
Jeffrey M. Zacks and Jesse Q. Sargent
about event structure and maintenance of current task information is subserved by the PFC. More generally, the available data support the view that PFC is important both for long-term knowledge about events and for the online maintenance of task and goal information. An important open question is whether these two functions are neurophysiologically dissociated. In terms of EST, event knowledge, or SECs, corresponds to event schemata, and current task and goal representations correspond to event models.
6.2. Prefrontal Lesions and Event Segmentation
The data on cognitive deficits associated with prefrontal lesions have two straightforward implications for event segmentation. First, according to EST, impairments to event schemata should reduce one’s ability to use previous experience to form adaptive event models. Thus, disordered event schemata should reduce one’s ability to use knowledge to support WM and long-term memory encoding. This is not a terribly original conclusion; it is one shared with many current theories of WM and long-term memory. More specific to EST, impaired event schemata should negatively affect one’s ability to identify normative event boundaries, particularly for activities that are familiar and thus should have strong support from schemata in control participants. Second, EST proposes that impairments to event models should affect event segmentation and memory because impaired event models should be less effective in biasing predictions. Memory for recently encountered information should be particularly impaired, specifically information encountered within the current event. Segmentation should be broadly impaired. Again, the conclusion that memory should be impaired is not original, but the conclusion that event segmentation should be affected is. Importantly, disruption of event schemata and disruption of event models should produce two qualitatively different event segmentation deficits. Disordered event schemata should produce stronger impairments for more familiar activities, those for which one has a schema. Disordered event models should produce global impairments in event segmentation. More speculatively, one might guess that disordered event schemata would selectively impair segmentation at coarser temporal grains, because coarse-grained segmentation may be more sensitive to top-down influence (Zacks & Tversky, 2001). If both event schemata and event models were impaired, one would expect to see both types of impairment.
To our knowledge, there has been only one study of event segmentation in patients with frontal lobe lesions (Zalla, Pradat-Diehl, & Sirigu, 2003). In this experiment, participants with PFC lesions and healthy controls segmented two short movies of everyday activities at coarse and fine temporal grains. The patient group did not differ significantly from the controls in their fine segmentation, but their coarse segmentation was less
well ordered and delayed relative to the controls. The fact that coarse segmentation was selectively affected suggests, albeit weakly, that these patients had impaired event schemata. The fact that fine segmentation did not show obvious impairment suggests, again weakly, that the patient group’s event models may have been intact. Clearly, there is a need for more data on the effect of PFC lesions on event segmentation. It would be particularly valuable to vary the familiarity of the activities to be segmented, and to directly compare event knowledge with segmentation. If impairments in segmentation track impairments in event knowledge and both are caused by PFC lesions, this would support EST’s proposal that event schemata subserved by the PFC contribute to forming adaptive event models. It also would be valuable to combine event segmentation measures with measures of the memory functions of event models. The available data strongly suggest that memory for within-event information is impaired by PFC lesions (e.g., Müller & Knight, 2006). If the degree of this impairment tracks impairment in event segmentation, this would support EST’s proposal that event models bias perceptual prediction. More specifically, memory impairments should predict segmentation impairments above and beyond impairments attributable to deficits in event knowledge. In sum, lesions to the PFC are likely to be of profound consequence for event segmentation. Although there are few data that bear directly on this possibility, those that exist are consistent with it. This is important in its own right, but also is important for thinking about other conditions that affect the PFC. We turn now to two such circumstances: adult aging and AD.
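One way to make the "above and beyond" prediction concrete is as a first-order partial correlation. The symbols here are illustrative labels, not notation from the studies cited: M is a within-event memory score, S is a segmentation score, and K is an event knowledge score, all measured in the same patients.

```latex
r_{MS \cdot K} \;=\; \frac{r_{MS} - r_{MK}\, r_{SK}}{\sqrt{\left(1 - r_{MK}^{2}\right)\left(1 - r_{SK}^{2}\right)}}
```

Under this reading, the prediction is that the partial correlation of memory with segmentation stays reliably different from zero once event knowledge is held constant. A multiple regression of S on M and K would serve the same purpose.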
7. Aging
While clearly neither a neuropsychological nor a cognitive disorder, normal aging has been associated with a host of changes in brain and behavior. Perhaps the most concrete age-related change is reduction in brain weight and volume. Postmortem studies show that total brain weight declines by about 2% per decade over progression from early to late adulthood (Kemper, 1994), and in vivo volumetric MRI studies show median correlations between brain volume and age to be about 0.5 (Raz, 1996). However, reduced volume, and age-related changes in general, occur differentially across different brain regions. Here, we will focus on changes in two brain systems that are relevant to event understanding: the PFC and neuromodulatory systems in the midbrain.
7.1. Prefrontal Cortex
Although some regions (e.g., primary sensory cortices) show very little age-associated shrinkage, reduction in PFC volume is severe (e.g., Raz et al., 1997). Perhaps more meaningful, age-associated reductions in synaptic density and dendritic arborization (e.g., Liu, Erikson, & Brun, 1996) and in resting cerebral blood flow (e.g., Shaw et al., 1984) are greatest in the PFC. Evidence that physiological changes are most pronounced in PFC accords with findings showing age-related cognitive deficits specifically in tasks that are thought to depend on the PFC. For example, WM tasks measure ability to maintain information in a readily available state while simultaneously performing other cognitive operations of varying complexity. Numerous studies have shown age-related deficits in WM tasks (e.g., Belleville, Rouleau, & Caza, 1998; Hartman, Dumas, & Nielsen, 2001; Verhaeghen & Salthouse, 1997; see Hasher & Zacks, 1988, for review). Damage to lateral PFC regions impairs performance on a range of WM tasks (e.g., Baldo & Shimamura, 2000; D’Esposito & Postle, 1999; Goldman-Rakic, 1987; Hartley et al., 1998). Also, neuroimaging studies show that, during the retention interval of WM tasks, dorsolateral PFC increases in activity as the degree of concurrent information processing increases (see Cabeza & Nyberg, 2000; D’Esposito & Postle, 1999; D’Esposito et al., 1998, for reviews). Attentional control, which also shows age-related decline, is another specific cognitive function thought to depend on PFC (see Posner & Petersen, 1990, for a review). There are a number of different tasks that are used to measure attentional control. Selective attention tasks require deployment of attention to a particular channel (e.g., left or right ear).
Focused attention tasks might require maintenance of attention on a particular target or region of space, while divided attention tasks might tap the ability to monitor several stimuli at once, or to rapidly switch attention between multiple targets. Performance on tasks that assess attentional control declines with age. For example, divided attention costs have been shown to be greater in older than in younger adults (e.g., Hartley, 1992, 1993). Hasher, Zacks, and colleagues have presented an inhibition-deficit view of cognitive aging. According to this view, many age-related cognitive deficits are due to a decreased ability to limit access to WM and delete unwanted information from WM (e.g., Hasher & Zacks, 1988; Hasher, Zacks, & May, 1999; Zacks & Hasher, 1994). For example, participants were presented with italicized passages containing distracting text (in regular font) and instructed to read the italicized and ignore the regular font text (Connelly, Hasher, & Zacks, 1991). Older adults showed slower reading times and poorer comprehension, indicating reduced ability to focus attention on only the relevant portions of the text. Providing support for this interpretation, recent results show that older adults actually retain more of the to-be-ignored material as evidenced by
implicit memory tests (Thomas & Hasher, 2009, submitted for publication). Recent work by Hasher and colleagues suggests that declines in prefrontally mediated inhibition of distracting information are responsible for age-related declines in episodic memory (Healey, Campbell, & Hasher, 2008; Stevens, Hasher, Chiew, & Grady, 2008). In sum, the volume and structural integrity of the PFC decline with age. These declines are associated with reduced WM capacity and attentional control.
7.2. Midbrain Neuromodulatory Systems
Neuromodulatory systems whose neurons have cell bodies in the midbrain may undergo significant age-related changes. As described in Section 2, neurons in the anterior LC signal with norepinephrine, project broadly to the forebrain, and may code error signals. These cells show attrition with age (e.g., Chan-Palay & Asan, 1989a,b; McGeer & McGeer, 1989). Evidence of age-related decreases in the dopamine system comes from several findings. First, postmortem studies have shown an age-related decrease in the number of dopamine neurons (Fearnley & Lees, 1991). Also, D2 receptor binding in the striatum has been shown to decline with age (Sakata, Farooqui, & Prasad, 1992). In a particularly relevant study (Volkow et al., 1998), striatal D2 receptor binding in adults ranging from 24 to 86 years of age was assessed using positron emission tomography (PET). A cognitive battery including the WCST was also administered. Consistent with previous findings, D2 receptor binding in caudate and putamen decreased with age. In addition, a significant relationship between receptor binding and cognitive performance remained even after controlling for the effects of age. This strengthens the observed relationship between decreased dopaminergic system activity and cognitive deficits. In sum, midbrain neuromodulatory systems involved in signaling errors show age-related declines, and these may be related to changes in cognitive function.
7.3. Episodic Memory and Situation Model Construction
We have established that age-related differences in WM and attentional control are substantial and have been associated with differences in specific brain structures. Age-related differences in episodic memory are also substantial. However, the medial temporal lobes, which are critical to episodic memory formation (Squire & Zola-Morgan, 1991), undergo minimal change with healthy aging (Head, Snyder, Girton, Morris, & Buckner, 2005; Raz, 2000). One possibility is that age-related declines in episodic memory are due to changes in controlled processing during encoding and retrieval, which may be mediated by the PFC (Healey et al., 2008; Stevens et al., 2008).
Older adults have particular difficulty remembering contextual aspects of studied material. For example, memory is poorer for perceptual details such as the color, case, or font in which target material appeared (e.g., Kausler & Puckett, 1981; Naveh-Benjamin & Craik, 1995), location of target material (e.g., Chalfonte & Johnson, 1996; Uttl & Graf, 1993), temporal order of target material (Dumas & Hartman, 2003; Kausler, Salthouse, & Saults, 1988), and even whether the target material was presented visually or auditorily (Light, La Voie, Valencia-Laver, Albertson-Owens, & Mead, 1992). Accordingly, older adults are also less likely to correctly identify the source of a memory, for example, whether a stimulus was seen or imagined (e.g., Norman & Schacter, 1997). This age-related deficit in source memory has been tied to differences in activity in the PFC (e.g., Swick, Senkfor, & Van Petten, 2006). Although aging is associated with significant deficits in memory for events, particularly for their contextual details, some aspects of event memory show striking preservation. There is evidence that reading and comprehending prose is facilitated by the construction of situation models, and that older adults rely on situation models during comprehension as much as younger adults. Situation models are higher level representations that describe the gist of the situation described in the text (e.g., Zwaan & Radvansky, 1998). For example, reading the sentence “She entered the hotel lobby” might result in the formation of a situation model wherein there is a hotel lobby with a reception desk and elevators, even though these contextual details were not in the text. Rather, they were supplied by semantic memory for what makes up a hotel lobby. Zwaan and Radvansky (1998) distinguish between a current model, which represents the current state of affairs and is updated at boundaries between events, and an integrated model of the current event together with all the previous ones.
The final integrated model (or complete model) determines later episodic memory. In the terms of EST, current models correspond to event models, and semantic memory for events is provided by event schemata. Studies have shown that these situation models are maintained and updated to similar extents by younger and older adults. For example, in a study by Morrow, Leirer, Altieri, and Fitzsimmons (1994), younger and older participants read narratives that described a protagonist moving from room to room. When reading was interrupted by probe questions about certain objects mentioned in the texts, answers were faster and more accurate for objects that were closer to the protagonist’s current location, for both younger and older adults. This suggests that readers in both age groups maintained spatial situation models that were updated to reflect the protagonist’s current location. While a number of studies on discourse processing show older adults are able to construct and maintain situation models (e.g., Radvansky & Curiel, 1998; Radvansky, Zacks, & Hasher, 1996), some suggest that the use of such models may be more demanding
for older adults (Morrow et al., 1994; Morrow, Stine-Morrow, Leirer, Andrassy, & Kahn, 1997). It is important to distinguish between the proposal that older adults rely heavily on situation models and the proposal that situation model processing is unaffected by aging. The data seem clear that older adults rely at least as heavily on situation models as younger adults. One possibility is that older adults’ construction and use of situation models is relatively intact, and reliance on them is an adaptive response to compensate for deficits in other processing domains (Radvansky & Dijkstra, 2007). However, it is also possible that older adults’ situation models are impaired but still exert a heavy influence on comprehension. This could come about because older adults prioritize global gist in comprehension over the processing of fine details (Stine-Morrow, Gagne, Morrow, & DeWall, 2004). It could also come about because it is difficult to implement comprehension strategies that do not rely heavily on situation models, even if they would be adaptive. In our view, the currently available data provide strong evidence that older adults rely heavily on situation models, but are less convincing in showing that those situation models are not negatively impacted by aging.
7.4. Aging and Event Segmentation
The neurocognitive changes associated with aging make contact with the mechanisms of event segmentation at multiple points. The data reviewed above suggest three ways in which event segmentation may change with aging. The first two possibilities follow directly from the preceding discussion of the effects of PFC lesions on event understanding (see “Frontal Lobe Lesions,” above). First, PFC dysfunction may indicate that event model maintenance is impaired in aging. As illustrated previously, both WM and attentional control are associated with PFC function. Moreover, current theories suggest that attentional control plays a central role in determining WM capacity (Baddeley, 1986; Kane et al., 2004; McCabe, Roediger, McDaniel, Balota, & Hambrick, 2006). But what is attentional control? One view is that attentional control is the ability to maintain task-relevant information in the face of distracting sensory stimulation (Darowski, Helder, Zacks, Hasher, & Hambrick, 2008). Another view is that attentional control is the ability to maintain a representation of one’s current task and goals (Braver & Cohen, 2001; Miller & Cohen, 2001). These proposals lead to the suggestion that changes in the ability to maintain appropriate event models and update them adaptively could be at the core of age-related differences in attentional control, accounting for some of the age differences in cognitive performance. Second, PFC dysfunction may indicate that event schemata are impaired with aging. This is possible, but seems less likely than the possibility that event models are impaired. One reason to doubt that event schemata are
impaired in older adults is that other domains of semantic knowledge, such as those measured by vocabulary tests, show no impairments—rather, older adults often show better performance than younger adults (Verhaeghen, 2003). Further, older adults’ scripts for everyday events do not differ systematically in their structure or content from those of younger adults (Rosen, Caplan, Sheesley, Rodriguez, & Grafman, 2003). Finally, as we have shown above, older adults appear to make as heavy use of situational knowledge as do younger adults in text comprehension and memory. Third, reductions in the efficacy of the D2 or norepinephrine systems could produce deficits in the ability to update event models in response to spikes in prediction error. Deficits in either prediction error calculation or in error-based updating would be expected to introduce noise into the timecourse of event model updating. Although a simple change to the system, such a deficit would have cascading effects: If event models are updated at inappropriate times they will form less adaptive representations of the current situation. These representations should be reflected in poorer comprehension and performance online, and in poorer later memory. We believe that the available data suggest most strongly the possibility of age-related declines in the maintenance of event models, in their updating in response to prediction error spikes, or both. Either possibility predicts that event segmentation should become less reliable and less adaptive with age. Support for this proposal comes from a study using the event segmentation paradigm described above (Zacks, Speer, et al., 2006). Older and younger participants watched movies of actors engaged in everyday activities (e.g., making a bed) and indicated when they believed one natural meaningful unit of activity had ended and another had begun. 
Then participants performed an order memory task in which they were given 12 cards with still pictures taken from each movie, randomly ordered, and asked to sort them into the order in which they had occurred in the movie. Participants also performed a recognition memory task for each movie. On each trial, participants were shown one picture from the movie they had viewed and one picture from a similar movie, and asked to choose the picture from the movie they had seen. Finally, participants also completed a psychometric battery, including a measure of semantic memory for event order, the Picture Arrangement subtest of the WAIS (Wechsler, 1997). In the Picture Arrangement test, participants are given a set of cartoon drawings for a common activity (e.g., going fishing) and asked to sort them into the order in which they typically occur. Thus, whereas the order memory test is a measure of one’s episodic memory for the order of events in a particular experienced activity, the Picture Arrangement test is a measure of semantic knowledge about how events typically unfold. This may be said to measure the accuracy and depth of participants’ event schemata. There were no systematic differences in boundary location between older and younger adults, which allowed calculation of segmentation
[Figure 3: two scatterplots, with points marked by group (healthy, mild dementia). Left panel: order memory error plotted against segmentation agreement (r = −0.41 and r = −0.32). Right panel: recognition memory accuracy plotted against segmentation agreement.]
Figure 3 Event segmentation in older adults correlates with memory for event order (left) and recognition memory (right). For recognition memory, this relationship remained statistically significant after controlling for clinical dementia status and psychometric performance. (Data from Zacks, Speer, et al., 2006; Zacks, Swallow, et al., 2006.)
agreement scores by comparing each individual’s segmentation to that of the sample as a whole. Segmentation agreement was lower for older adults than younger adults; in other words, older adults’ segmentation was more variable. Older adults also had poorer order memory and recognition memory. Most importantly, for older adults, after controlling for global psychometric performance, segmentation agreement was significantly correlated with memory scores. Thus, those older adults who identified boundaries in a more normative fashion showed better memory for the movies (see Figure 3). Although further work is needed, it appears that age-related dysfunction of event segmentation mechanisms may be a causal factor in age-related episodic memory problems. Picture Arrangement scores were significantly lower for the older than for the younger adults, and among the older adults these scores correlated with memory scores. One possibility is that in addition to maintenance problems, semantic event schemata information available to inform event models is reduced in older adults. However, as reviewed previously there is some evidence that semantic information about events is preserved with aging. Another possibility is that the variance shared between Picture Arrangement scores and episodic memory scores reflects not knowledge about events, but shared cognitive operations between the Picture Arrangement task and the memory tasks. In particular, both the order memory test and the Picture Arrangement test require participants to sort cards with pictures on them in temporal order. This may depend heavily on WM and attentional control.
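The segmentation agreement measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published scoring procedure: it assumes boundaries are binned into 1-s intervals, that agreement is the Pearson correlation between one viewer's binned boundaries and the proportion of the whole sample marking each bin, and that no further rescaling is applied. The function names are hypothetical.

```python
from math import ceil, sqrt


def bin_boundaries(times_s, duration_s, bin_s=1.0):
    """Return a 0/1 list marking which time bins contain a boundary."""
    n_bins = ceil(duration_s / bin_s)
    bins = [0] * n_bins
    for t in times_s:
        # Clamp so a press at the very end falls in the last bin.
        bins[min(int(t // bin_s), n_bins - 1)] = 1
    return bins


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def segmentation_agreement(all_times, duration_s, bin_s=1.0):
    """Correlate each viewer's binned boundaries with the sample norm.

    The norm is assumed to be the proportion of all viewers (including
    the viewer being scored) who marked a boundary in each bin.
    """
    binned = [bin_boundaries(t, duration_s, bin_s) for t in all_times]
    n = len(binned)
    norm = [sum(col) / n for col in zip(*binned)]
    return [pearson(b, norm) for b in binned]


if __name__ == "__main__":
    # Hypothetical button-press times (seconds) for three viewers.
    viewers = [[5, 12, 20], [5, 12, 21], [2, 9, 16]]
    for i, score in enumerate(segmentation_agreement(viewers, duration_s=25)):
        print(f"viewer {i}: agreement = {score:.2f}")
```

In this toy input the third viewer places boundaries where no one else does and so receives a lower score than the first two. A real analysis would need to decide whether to leave the scored viewer out of the norm and whether to correct for the number of boundaries each viewer marks.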
Current work in our laboratory is further exploring the relations between age, event segmentation, and memory (Kurby & Zacks, under review). As noted previously, observers spontaneously group fine-grained events hierarchically into coarse events (Zacks et al. 2001; see Section 2). This hierarchical organization is weakened in older adults compared to younger adults. Moreover, like segmentation agreement, hierarchical organization predicts subsequent memory within the older adult group. In sum, these experiments show that older adults do not segment activity in as reliable or as organized a fashion as younger adults. Across individuals, the ability to segment well predicts later memory performance. This is consistent with EST’s proposal that event models partly determine episodic memory encoding. However, the available data do not do much to tell us which of the systems affected by aging is responsible for the changes in event segmentation and memory performance. We noted previously that event model maintenance and error-based updating are good candidates for mechanisms that undergo changes due to the aging process. In future research, it will be important to test directly which of these mechanisms is affected. One possibility is to measure evoked brain responses at event boundaries during passive viewing. We know that a substantial subset of healthy older adults, when asked to perform an explicit segmentation task, segment events at less normative and less effective points in time. We have proposed that the posterior cortical responses at event boundaries may reflect the consequences of error-based updating (Speer et al., 2003). If older adults have impaired updating, this would predict that older adults with poor event segmentation would show reduced responses in these areas at event boundaries during passive viewing. Alternatively, if event model maintenance is impaired, this would predict intact responses at event boundaries during passive viewing. 
Another possibility is to directly measure the activity of the error signaling system. Current research in our laboratory is characterizing the response of this system in younger adults using fMRI (Kurby, Zacks, & Haroutunian, 2009); in the future we plan to extend these studies to explore age differences.
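The notion of hierarchical organization mentioned above, in which coarse boundaries tend to coincide with fine boundaries, can be illustrated with a simple alignment score. This is a sketch under assumptions of our own, not the measure used in the studies cited: it compares the observed mean distance from each coarse boundary to the nearest fine boundary with the distance expected if coarse boundaries fell uniformly at random over the movie. All function names are hypothetical.

```python
def expected_nearest_distance(fine_s, duration_s):
    """Mean distance from a uniformly random time point to the nearest
    fine boundary. Assumes at least one fine boundary, all within
    [0, duration_s].

    A gap of length g between two boundaries contributes g**2 / 4 to the
    integral of the distance; an edge segment of length e contributes
    e**2 / 2.
    """
    pts = sorted(fine_s)
    total = pts[0] ** 2 / 2 + (duration_s - pts[-1]) ** 2 / 2
    for a, b in zip(pts, pts[1:]):
        total += (b - a) ** 2 / 4
    return total / duration_s


def observed_nearest_distance(coarse_s, fine_s):
    """Mean distance from each coarse boundary to its nearest fine boundary."""
    return sum(min(abs(c - f) for f in fine_s) for c in coarse_s) / len(coarse_s)


def alignment_score(coarse_s, fine_s, duration_s):
    """Expected minus observed distance; larger values mean coarse
    boundaries sit closer to fine boundaries than chance would predict."""
    return (expected_nearest_distance(fine_s, duration_s)
            - observed_nearest_distance(coarse_s, fine_s))


if __name__ == "__main__":
    fine = [10, 20, 30, 40]  # hypothetical fine boundaries (seconds)
    print(alignment_score([20, 40], fine, duration_s=50))  # aligned coarse boundaries
    print(alignment_score([15, 35], fine, duration_s=50))  # misaligned coarse boundaries
```

A score computed per participant in this way could then be related to later memory, in the same manner as segmentation agreement.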
8. Alzheimer’s Disease
AD is a progressive neurodegenerative disease associated with old age. The earliest neuropsychological symptoms typically cited are deficits in episodic memory (e.g., Huff et al., 1987; Welsh, Butters, Hughes, Mohs, & Heyman, 1992). However, more recently it has been suggested that attentional control deficits may be observed even earlier in the disease course (e.g., Balota & Faust, 2001; Tse, Balota, Moynan, Duchek, & Jacoby, in press). As the disease progresses, memory is affected more globally
and eventually all higher order cognitive processes break down resulting in symptoms such as disorientation and loss of speech. In some respects, the changes in behavior associated with early AD resemble accelerations in the changes associated with normal aging (Storandt & Beaudreau, 2004). For example, episodic memory problems are associated with both normal aging and AD, and it is primarily the more severe memory loss that distinguishes AD. In fact, a recent study showed that 20–40% of a sample of healthy older adults had the neuropathological markers of AD and that even in this sample, the degree to which these markers were present at autopsy was correlated with premorbid cognitive function (Price et al., 2009). This raises interesting questions about the relationship between the cognitive and brain changes associated with normal aging and those associated with early-stage AD. However, cognitive deficits that are qualitatively unique to AD have also been identified (e.g., Johnson, Storandt, & Balota, 2003). In any case, research has shown a clear pattern of brain changes and cognitive deficits associated with AD.
8.1. Brain Changes and Cognitive Deficits
Definitive diagnosis of AD requires postmortem identification of characteristic intraneuronal neurofibrillary changes (tangles) and extracellular amyloid deposits (plaques) in the brain. Through postmortem examination of healthy and diseased brains, Braak and Braak (1991) identified six stages of AD development on the basis of the distribution pattern of neurofibrillary tangles (NFTs) and neuropil threads (NTs). Stage I is associated with the appearance of NFTs and NTs in the entorhinal cortex in the medial temporal lobe. In stage II, the hippocampus is also affected. Stages III and IV are marked by denser accumulation of markers in these areas and some spreading to other limbic structures. In stage V, neocortical association areas are affected and by stage VI primary cortical areas are affected as well. The potential causal relationship between the appearance of these neuropathological markers and the clinical course of AD is complex and not fully understood. For example, although amyloid plaques are commonly thought to be causally related to AD (e.g., Hardy & Higgins, 1992), they are found in significant percentages of cognitively normal older adults (e.g., Arriagada, Marzloff, & Hyman, 1992; Mintun et al., 2006; Price et al., 2009; Sperling et al., 2009). However, there is little doubt that the progression of AD is marked by the accumulation of these markers in specific areas (e.g., Berg et al., 1998; Martin et al., 1987; Price & Morris, 2004), and that the presence of these markers in a particular region is associated with neuronal dysfunction in that region (e.g., Berg et al., 1998; Hardy, 2002). For example, Kanne, Balota, McKeel, Storandt, and Morris (1998) showed evidence that accumulation of cored senile plaques (late-stage amyloid deposits) in specific brain areas was associated with deficits on specific cognitive tasks believed to
involve those areas. A large sample of participants with mild and very mild AD completed a cognitive test battery. A factor analysis identified three factors: a mental control/frontal factor, a memory-verbal/temporal factor, and a visuospatial/parietal factor. Forty-one of these participants came to autopsy an average of 5.1 years after testing. The relative density of senile plaques in each region was correlated with performance on that region’s putative corresponding psychometric factor. This study provides some support for the idea that the cognitive changes associated with AD provide indicators of which structures are accumulating neuropathological markers and failing in their functional duties. Further support comes from imaging techniques that allow antemortem examination of AD-related brain changes. In vivo amyloid deposition can be examined using a radiological contrast compound (C-PIB) that binds specifically to amyloid plaques and can be imaged using PET. For example, Klunk et al. (2004) showed that AD is associated with C-PIB uptake in the frontal cortex, particularly the medial portion, in temporal and occipital cortices, and in the striatum as well. Using fluorodeoxyglucose PET (FDG-PET) to examine patterns of glucose metabolism in the brain, the authors also showed that these regions were associated with reduced glucose metabolism. Subsequently, Buckner et al. (2005) presented converging measures showing AD pathology in a similar network of brain regions. In addition to atrophy in the MTL, early AD was associated with atrophy (as identified by structural MRI), amyloid deposition, and reduced metabolism in precuneus, posterior cingulate, and lateral temporal and parietal regions. It is noteworthy that atrophy in the MTL and precuneus was observed in very early stages, and even in healthy converters who were not diagnosed until later. Work involving radiological contrast compounds that bind to NFTs is in very early stages of development.
Already, there is some evidence that binding of compounds with an affinity for both plaques and tangles across temporal, parietal, posterior cingulate, and frontal regions differentiates between normal controls and AD patients better than FDG-PET or brain volume as measured by MRI (Small et al., 2006). As the in vivo imaging of amyloid plaques and NFTs improves, a clearer picture of the relationship between the accumulation of these markers in specific areas and the clinical course of AD will emerge (for more see Hardy & Higgins, 1992; Price & Morris, 2004). The general progression of AD neuropathology identified by Braak and Braak (1991), from medial temporal structures, throughout the limbic system, cortical association areas, and eventually to the entire neocortex is supported by imaging studies of brain volume (Devanand et al., 2007; Henneman et al., 2009) and metabolism (Dickerson & Sperling, 2008; Li et al., 2008). This is in keeping with the observation of episodic memory deficits in early AD (e.g., Huff et al., 1987; Welsh et al., 1992). However,
there is also evidence suggesting that the precuneus shows atrophy, and the medial frontal cortex accumulates amyloid very early in the disease course (e.g., Buckner et al., 2005). Both of these regions have been associated with attention (e.g., Mao, Zhou, Zhou, & Han, 2007; Nagahama et al., 1999; Thienel et al., 2009). Accordingly, deficits in attentional control are observed in very early-stage AD (Perry & Hodges, 1999; Rizzo, Anderson, Dawson, Myers, & Ball, 2000; Tse et al., in press) and even identify healthy older adults who will subsequently convert to AD (e.g., Balota et al., 2010; Twamley, Ropacki, & Bondi, 2006). Although the neurophysiological correlates of changes in attention in AD are not currently well understood (Hirao et al., 2005; Johnson et al., 1998), the literature does indicate that changes in attention and the precuneus, as well as changes in memory and the MTL, may characterize early and even preclinical AD.
Recently, researchers have been particularly interested in a network of regions that show greater activity during rest or in passive control conditions than during focused cognitive tasks. These include a set of midline regions in the anterior and posterior cortex and regions in lateral parietal cortex. Dubbed the “default mode network” (DMN; Raichle et al., 2001), this network has been proposed to subserve a set of tasks performed on an ongoing basis to sustain normal functioning. Interestingly, the brain regions identified above as particularly vulnerable to early amyloid deposition (i.e., MTL, medial parietal and prefrontal areas) show considerable overlap with the DMN. The DMN appears to increase in activity during episodic and autobiographical memory retrieval, and decrease in activity when attention to external stimuli is required (e.g., Shulman et al., 1997; Svoboda, McKinnon, & Levine, 2006; Wagner, Shannon, Kahn, & Buckner, 2005). 
Within the DMN, AD patients show increased amyloid accumulation and disrupted neural activity, for example, decreased connectivity (e.g., Bai et al., 2008; Buckner et al., 2005; Greicius, Srivastava, Reiss, & Menon, 2004). Even in older adults without dementia, high levels of amyloid deposition in the DMN have been associated with abnormal neural activity in this network during memory tasks as measured by fMRI (Sperling et al., 2009). While work relating the DMN to AD is in early stages of development, results to date support the connection between biomarker deposition in the DMN and cognitive dysfunction observed in AD. In sum, evidence suggests that the MTL and the precuneus are affected earliest in the course of AD, followed by other cortical regions such as the posterior cingulate, temporoparietal region and the medial frontal cortex (e.g., Buckner et al., 2005). These brain changes correspond, at least partially, to the cognitive changes in the disease: Episodic memory and attention are selectively affected early on; further deterioration in these areas is observed in the middle stages, and in the late stages cognition is globally impaired.
8.2. Alzheimer’s Disease and Event Segmentation
This progression suggests that the effects of early-stage AD on event segmentation should resemble exaggerated versions of the effects of aging. Event segmentation itself may be little affected by selective lesions to the MTL memory system. However, such lesions predict that event segmentation has an exaggerated effect on memory accessibility. Among healthy adults, the ability to remember details from a narrative is reduced if the narrative includes a change likely to trigger an event boundary (e.g., temporal or spatial shift) since the mention of such details (e.g., Speer & Zacks, 2005). Given the importance of the MTL for retrieval of items no longer maintained in WM, or no longer in the current event model, we would expect even poorer memory for details requiring retrieval across event boundaries among early AD patients.
There is also reason to believe that AD-related neuropathology in medial posterior regions, particularly the precuneus and the posterior cingulate, would have negative consequences for event segmentation mechanisms. As described previously, research in our laboratory suggests that these regions are part of a network involved in event segmentation, which shows transient increases in activity when perceivers experience event boundaries during comprehension (see Section 2 above). We suggest that these posterior regions may be important either for detecting changes in the various dimensions that define events (e.g., time, space, actors, goals, etc.), or in providing inputs to event models when error-based gating mechanisms update a current event model. Either way, AD-related dysfunction in the posterior cingulate and precuneus might be expected to interfere with the updating of event models that no longer provide accurate predictions. Given that event models serve to guide attention, this could manifest as the type of attention problems observed in very early AD. 
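The proposal that impaired updating leaves an outdated event model in place can be illustrated with a toy sketch. This is not the Reynolds et al. (2007) model or any published implementation; the function name `run_event_model`, the `threshold` and `lesion` parameters, and the input stream are all hypothetical and chosen only for illustration. The event model here simply predicts that the next observation will match the stored one, and a spike in prediction error opens the gate and replaces the model. The assumed "virtual lesion" attenuates the error signal that reaches the gate.

```python
def run_event_model(stream, threshold=1.0, lesion=0.0):
    """Toy error-gated updating; returns (boundaries, total_error).

    All names and parameters are illustrative assumptions, not part of EST.
    """
    model = stream[0]      # current event model: predicts "more of the same"
    boundaries = []        # time steps at which the model was updated
    total_error = 0        # accumulated prediction error
    for t in range(1, len(stream)):
        error = abs(stream[t] - model)
        total_error += error
        # Hypothetical virtual lesion: attenuate the error signal at the gate.
        if error * (1.0 - lesion) > threshold:
            model = stream[t]      # gate opens: rebuild model from current input
            boundaries.append(t)
    return boundaries, total_error


# Made-up input: three "events", each with stable features.
stream = [0, 0, 0, 5, 5, 5, 1, 1, 1]
intact = run_event_model(stream)
lesioned = run_event_model(stream, lesion=0.9)
print(intact)    # ([3, 6], 9)
print(lesioned)  # ([], 18)
```

In this sketch the intact system updates at both changes and accumulates little error, whereas the lesioned system never updates and keeps predicting from a model that no longer fits, which is the pattern the paragraph above describes. Whether attenuated gating is the right way to model AD-related dysfunction is an open assumption.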
Although we have focused on how AD-related brain changes might affect event segmentation mechanisms, it is also possible that such mechanisms might be preserved, particularly earlier in the disease course. This possibility is supported by the fact that there is relatively little overlap between the brain regions associated with EST (see Figure 1) and those affected by early-stage AD pathology described above. Previous work in our laboratory with older adults, both healthy and with very mild AD, suggests that individual differences in event segmentation predict event memory independently of clinical dementia status (Zacks, Speer, et al., 2006). Work is currently underway, using larger sample sizes, which will enable us to ask whether the strength of the relationship between event segmentation and event memory varies across levels of clinical dementia status. If this relationship is as strong among early-stage AD patients as among healthy older adults, this would suggest that some mechanisms of event segmentation are independent of those degraded in the early stages of the disease. The finding that mechanisms of event segmentation
are robust against the moderate neural lesions of early-stage AD would have an important clinical application: Event segmentation would be an attractive target for training to remediate memory deficits. One possibility is that deliberate attention to event segmentation itself will improve memory encoding. In addition, imaging data will afford the opportunity to ask whether structural integrity in certain brain regions mediates the relation between event segmentation and memory. According to EST, effects in PFC would suggest that early dementia affects either the formation of event models or the use of event knowledge. Effects in posterior cortex would suggest early dementia affects either the processes of detecting an event boundary or of updating an event model. In the later stages of AD, damage to neural integrity is widespread, and deficits in cognition are comparably broad. Early in the disease progression, the encoding of new memories is affected but the retrieval of previously learned material is preserved (e.g., Huff et al., 1987; Welsh et al., 1992). As the disease progresses, access to autobiographical memories declines. In the later stages, even the most overlearned semantic associations are lost. At this point, in addition to the frontal maintenance problems discussed above, it is likely that reliable event schemata are no longer available or accessible. Accordingly, the perceptual guidance provided by event models is likely to be severely limited. This represents a fundamental breakdown of the event segmentation system and would have wide ranging deleterious consequences such as those observed in advanced AD, for example, disorientation. However, at this stage in the disease, global cognitive function has deteriorated to the point where drawing connections to EST may be of limited value. In sum, the brain changes associated with early AD may lead to attention and memory problems by way of disruption of event segmentation mechanisms. 
Alternatively, it may be that event segmentation abilities, or certain aspects thereof, are relatively well preserved in AD. In the latter case, clinical efforts to maximize the cognitive burden carried by particularly well-preserved event segmentation mechanisms may reduce attention and memory problems. Work is currently underway that will begin to address these possibilities.
9. Conclusions
We have reviewed a complex and diverse set of clinical neuroscientific circumstances, and there are many more we have had to set aside for lack of space. A heuristic overview of the pattern of deficits we have observed is provided in Table 1. We would like to emphasize that the set of mechanisms we have examined, as well as the set of conditions, is selective. For example, we have not discussed the role of the medial temporal episodic memory system (Cohen & Eichenbaum, 1995) in event understanding and
Table 1. Overview of Potential Event Segmentation Mechanism Impairments.

Mechanism                      Schizophrenia  Obsessive-compulsive disorder  Parkinson’s disease  Frontal lobe lesions  Aging  Alzheimer’s disease
Sensory-perceptual processing  0              0                              0                    0                     0      0
Prediction monitoring          0              +                              0                    0                     0      +
Error-based updating           +              +                              +                    0                     +      +
Event models                   ++             0                              0                    +                     +      +
Event schemata                 0              +                              0                    ++                    0      +

+: Suggestive evidence for impairment; ++: strong evidence for impairment; 0: not yet tested.
memory, nor have we considered persons who experience anterograde amnesia after damage to this system. For none of the conditions we have examined do we find evidence for deficits in sensory-perceptual processing. However, in other conditions—for example, visual form agnosia or motion blindness—deficits in sensory-perceptual processing are clearly evident and likely have important consequences for event segmentation. We believe the picture that emerges from this review underwrites a strong message: The mechanisms of event segmentation provide a valuable framework for understanding cognitive dysfunction. This provides an exciting leverage point for clinical diagnosis and treatment. People, including those of us who are aging or coping with a neurological or neuropsychiatric condition, tend to care about their ability to comprehend the everyday events around them, to remember those events later, and to plan adaptive actions. Theory-driven interventions that may improve event comprehension and memory have the potential to substantially improve quality of life. As we have described throughout the chapter, researchers coming from a range of theoretical perspectives are applying such interventions to a range of clinical problems. We are hopeful that the current chapter illustrates how EST may contribute to this effort. However, the basic science base underlying such interventions needs extending on at least two fronts. First, there is an urgent need for many more data on event understanding in clinical populations and in healthy aging. One can draw inferences about the mechanisms of event segmentation from the available data concerning attention, memory, and performance. However, such inferences are necessarily weak and invite direct verification. Second, there is a need for formal models that make fine-grained predictions about the consequences of specific neurological changes for specific aspects of event segmentation and memory. 
An initial step in this direction was taken with the computational model of Reynolds et al. (2007). This model was a connectionist implementation of the core architecture of EST. It would be valuable to extend this model to produce moment-by-moment predictions for event perception and memory. Virtual lesions could then be applied to the model, and the model’s performance could be directly compared with that of patients from the groups discussed here. Such comparisons would provide powerful means to constrain theories of event understanding and to characterize the cognitive dysfunction in these conditions. Clearly, there is much work to be done. We believe this is an exciting time for researchers studying deficits in higher cognition. New landscapes of theory and methods are opening up—the lens of event segmentation that we have applied here can encompass only a small field of view over this terrain. Basic scientists who wish to better understand how people comprehend and remember the everyday events that make up their lives have a lot to gain by taking up this exploration. Those with disorders of event perception also stand to benefit from this endeavor.
ACKNOWLEDGMENTS
Preparation of this chapter was supported in part by NIH grants R01-MH70674 and R01AG031150 and by NSF grant BCS-0236651, all to Jeff Zacks. The authors thank Dave Balota, Jordan Grafman, Joe Magliano, G. A. Radvansky, and Rose Zacks for helpful comments on the manuscript.
REFERENCES
Aarsland, D., Andersen, K., Larsen, J., Lolk, A., Nielsen, H., & Kragh-Sorensen, P. (2001). Risk of dementia in Parkinson’s disease: A community-based, prospective study. Neurology, 56, 730–736.
Arriagada, P. V., Marzloff, K., & Hyman, B. T. (1992). Distribution of Alzheimer type pathologic changes in nondemented elderly individuals matches the pattern in Alzheimer’s disease. Neurology, 42, 1681–1688.
Baddeley, A. (1986). Dementia and working memory. Quarterly Journal of Experimental Psychology, 38A, 603–618.
Baddeley, A. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4(11), 417–423.
Bai, F., Zhang, Z., Yu, H., Shi, Y., Yuan, Y., Zhu, W., et al. (2008). Default-mode network activity distinguishes amnestic type mild cognitive impairment from healthy aging: A combined structural and resting-state functional MRI study. Neuroscience Letters, 438, 111–115.
Baldo, J. V., & Shimamura, A. P. (2000). Spatial and color working memory in patients with lateral prefrontal cortex lesions. Psychobiology, 28, 156–167.
Balota, D. A., & Faust, M. (2001). Attention in dementia of the Alzheimer’s type. In F. Boller & S. Cappa (Eds.), Handbook of neuropsychology, Vol. 6 (pp. 51–80). Elsevier Science No. 2.
Balota, D. A., Tse, C. S., Hutchison, K. A., Spieler, D. H., Duchek, J. M., & Morris, J. C. (2010). Predicting conversion to dementia of the Alzheimer type in a healthy control sample: The power of errors in Stroop color naming. Psychology and Aging, 25, 208–218.
Barch, D. M. (2005). The relationships between cognition, motivation and emotion in schizophrenia: How much and how little we know. Schizophrenia Bulletin, 31, 875–881.
Barch, D. (2006). What can research on schizophrenia tell us about the cognitive neuroscience of working memory? Neuroscience, 139, 73–84.
Bartels, A., & Zeki, S. (2004). Functional brain mapping during free viewing of natural scenes. Human Brain Mapping, 21, 75–85.
Belleville, S., Rouleau, N., & Caza, N. (1998). Effect of normal aging on the manipulation of information in working memory. Memory & Cognition, 26(3), 572–583.
Berg, L., McKeel, D. W., Miller, J. P., Storandt, M., Rubin, E. H., Morris, J. C., et al. (1998). Clinicopathologic studies in cognitively healthy aging and Alzheimer’s disease: Relation of histologic markers to dementia severity, age, sex, and apoE genotype. Archives of Neurology, 55(3), 326–335.
Binder, M. D., Hirokawa, N., & Windhorst, U. (2009). Encyclopedia of neuroscience. Berlin, Heidelberg: Springer.
Boltz, M. (1992). Temporal accent structure and the remembering of filmed narratives. Journal of Experimental Psychology: Human Perception and Performance, 18, 90–105.
Botvinick, M., Braver, T., Barch, D., Carter, C., & Cohen, J. (2001). Conflict monitoring and cognitive control. Psychological Review, 108(3), 624–652.
Botvinick, M. M., & Plaut, D. C. (2004). Doing without schema hierarchies: A recurrent connectionist approach to routine sequential action and its pathologies. Psychological Review, 111, 394–429.
Botvinick, M., & Plaut, D. (2006). Short-term memory for serial order: A recurrent neural network model. Psychological Review, 113, 201–233.
Bower, G. H., & Rinck, M. (2001). Selecting one among many referents in spatial situation models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 81–98.
Boyer, P., & Lienard, P. (2008). Ritual behavior in obsessive and normal individuals: Moderating anxiety and reorganizing the flow of action. Current Directions in Psychological Science, 17, 291–294.
Braak, H., & Braak, E. (1991). Neuropathological stageing of Alzheimer-related changes. Acta Neuropathologica (Berlin), 82, 239–259.
Braver, T. S., & Cohen, J. D. (2001). Working memory, cognitive control, and the prefrontal cortex: Computational and empirical studies. Cognitive Processing, 2, 25–55.
Bresnahan, M. A., Brown, A. S., Schaefer, C. A., Begg, M. D., Wyatt, R. J., & Susser, E. S. (2000). Incidence and cumulative risk of treated schizophrenia in the prenatal determinants of schizophrenia study. Schizophrenia Bulletin, 26, 297–308.
Buckner, R. L., Snyder, A. Z., Shannon, B. J., LaRossa, G., Sachs, R., Fotenos, A. F., et al. (2005). Molecular, structural, and functional characterization of Alzheimer’s disease: Evidence for a relationship between default activity, amyloid, and memory. Journal of Neuroscience, 25, 7709–7717.
Cabeza, R., & Nyberg, L. (2000). Imaging cognition II: An empirical review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience, 12, 1–47.
Chalfonte, B. L., & Johnson, M. K. (1996). Feature memory and binding in younger and older adults. Memory & Cognition, 24, 403–416.
Chan-Palay, V., & Asan, E. (1989a). Alterations in catecholamine neurons of the locus coeruleus in senile dementia of the Alzheimer type and in Parkinson’s disease with and without dementia and depression. Journal of Comparative Neurology, 287(3), 373–392.
Chan-Palay, V., & Asan, E. (1989b). Quantitation of catecholamine neurons in the locus coeruleus in human brains of normal young and older adults and in depression. Journal of Comparative Neurology, 287(3), 357–372.
Clare, L., McKenna, P. J., Mortimer, A. M., & Baddeley, A. D. (1993). Memory in schizophrenia: What is impaired and what is preserved? Neuropsychologia, 31(11), 1225–1241.
Cohen, N. J., & Eichenbaum, H. (1995). Memory, amnesia, and the hippocampal system. Cambridge, MA: MIT Press.
Connelly, S. L., Hasher, L., & Zacks, R. T. (1991). Age and reading: The impact of distraction. Psychology and Aging, 6, 533–541.
Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive Neuropsychology, 17, 297–338.
Cooper, R., & Shallice, T. (2006). Hierarchical schemas and goals in the control of sequential behavior. Psychological Review, 113, 887–916.
Creese, I., Burt, D. R., & Snyder, S. H. (1976). Dopamine receptor binding predicts clinical and pharmacological potencies of antischizophrenic drugs. Science, 192, 481–483.
Darowski, E. S., Helder, E., Zacks, R. T., Hasher, L., & Hambrick, D. Z. (2008). Age-related differences in cognition: The role of distraction control. Neuropsychology, 22, 638–644.
D’Esposito, M., Aguirre, G. K., Zarahn, E., Ballard, D., Shin, R. K., & Lease, J. (1998). Functional MRI studies of spatial and non-spatial working memory. Cognitive Brain Research, 7, 1–13.
D’Esposito, M., & Postle, B. R. (1999). The dependence of span and delayed response performance on prefrontal cortex. Neuropsychologia, 37, 1303–1315.
Devanand, D. P., Pradhaban, G., Liu, X., Khandji, A., De Santi, S., Segal, S., et al. (2007). Hippocampal and entorhinal atrophy in mild cognitive impairment: Prediction of Alzheimer disease. Neurology, 68(11), 828–836.
Dickerson, B. C., & Sperling, R. A. (2008). Functional abnormalities of the medial temporal lobe memory system in mild cognitive impairment and Alzheimer’s disease: Insights from functional MRI studies. Neuropsychologia, 46, 1624–1635.
Dumas, J., & Hartman, M. (2003). Adult age differences in temporal and item memory. Psychology and Aging, 18(3), 573–586.
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science, 33, 547–582.
Enns, J., & Lleras, A. (2008). What’s next? New evidence for prediction in human vision. Trends in Cognitive Sciences, 12, 327–333.
Evans, D. W., & Leckman, J. F. (2006). Origins of obsessive-compulsive disorder: Developmental and evolutionary perspectives. In D. Cicchetti & D. Cohen (Eds.), The Handbook of Developmental Psychopathology (2nd ed.). NY: Wiley.
Faust, M. E., Balota, D. A., & Spieler, D. H. (2001). Building episodic connections: Changes in episodic priming with age and dementia. Neuropsychology, 15(4), 626–637.
Fearnley, J. M., & Lees, A. J. (1991). Ageing and Parkinson’s disease: Substantia nigra regional selectivity. Brain, 114, 2283–2301.
Felleman, D. J., & Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47.
Fornaro, M., Gabrielli, F., Albano, C., Fornaro, S., Rizzato, S., Mattei, C., et al. (2009). Obsessive-compulsive disorder and related disorders: A comprehensive survey. Annals of General Psychiatry, 8, 13.
Fuster, J. M. (1997). The prefrontal cortex: Anatomy, physiology, and neuropsychology of the frontal lobe. Philadelphia: Lippincott-Raven.
Fuster, J. M., & Alexander, G. E. (1971). Neuron activity related to short term memory. Science, 173, 652–654.
Gehring, W. J., Himle, J., & Nisenson, L. G. (2000). Action-monitoring dysfunction in obsessive-compulsive disorder. Psychological Science, 11(1), 1–6.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum & V. Mountcastle (Eds.), Handbook of Physiology, Vol. 5 (pp. 373–417). Bethesda, MD: American Physiological Society.
Goodale, M. A. (1993). Visual pathways supporting perception and action in the primate cerebral cortex. Current Opinion in Neurobiology, 3, 578–585.
Grafman, J. (1995). Similarities and distinctions among current models of prefrontal cortical functions. Annals of the New York Academy of Sciences, 769, 337–368.
Grafman, J. (1999). Experimental assessment of adult frontal lobe function. In B. L. Miller & J. L. Cummings (Eds.), The human frontal lobes: Functions and disorders (pp. 321–344). New York: Guilford Press.
Green, M., Kern, R., Braff, D., & Mintz, J. (2000). Neurocognitive deficits and functional outcome in schizophrenia: Are we measuring the “right stuff”? Schizophrenia Bulletin, 26, 119–136.
Greicius, M. D., Srivastava, G., Reiss, A. L., & Menon, V. (2004). Default-mode network activity distinguishes Alzheimer’s disease from healthy aging: Evidence from functional MRI. Proceedings of the National Academy of Sciences of the United States of America, 101, 4637–4642.
Guillin, O., Abi-Dargham, A., & Laruelle, M. (2007). Neurobiology of dopamine in schizophrenia. International Review of Neurobiology, 78, 1–39.
Hardy, J. (2002). The amyloid hypothesis of Alzheimer’s disease: Progress and problems on the road to therapeutics. Science, 297, 353–356.
Hardy, J. A., & Higgins, G. A. (1992). Alzheimer’s disease: The amyloid cascade hypothesis. Science, 256, 184–185.
Harrison, P. J. (1999). The neuropathology of schizophrenia: A critical review of the data and their interpretation. Brain, 122, 593–624.
Hartley, A. A. (1992). Attention. In F. I. M. Craik & T. A. Salthouse (Eds.), The handbook of aging and cognition (pp. 3–49). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hartley, A. A. (1993). Evidence for the selective preservation of spatial selective attention in old age. Psychology and Aging, 8, 371–379.
Hartley, A. A., Speer, N., Jonides, J., Reuter-Lorenz, P., Smith, E. E., Marshuetz, C., et al. (1998). Do age related impairments in specific working memory systems result in greater reliance on the central executive? In: Cognitive Neuroscience Society annual meeting abstract program: A supplement of the Journal of Cognitive Neuroscience, 88.
Hartman, M., Dumas, J., & Nielsen, C. (2001). Age differences in updating working memory: Evidence from the delayed matching to sample task. Aging, Neuropsychology, and Cognition, 8, 14–35.
Hasher, L., & Zacks, R. T. (1988). Working memory, comprehension and aging: A review and a new view. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (pp. 193–225). San Diego, CA: Academic Press.
Hasher, L., Zacks, R. T., & May, C. P. (1999). Inhibitory control, circadian arousal, and age. In D. Gopher & A. Koriat (Eds.), Attention & performance, XVII, cognitive regulation of performance: Interaction of theory and application (pp. 653–675). Cambridge, MA: MIT Press.
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject synchronization of cortical activity during natural vision. Science, 303(5664), 1634–1640.
Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. Journal of Neuroscience, 28, 2539–2550.
Head, D., Snyder, A. Z., Girton, L. E., Morris, J. C., & Buckner, R. L. (2005). Frontal–hippocampal double dissociation between normal aging and Alzheimer’s disease. Cerebral Cortex, 15, 732–739.
Healey, M. K., Campbell, K. L., & Hasher, L. (2008). Cognitive aging and increased distractibility: Costs and potential benefits. Progress in Brain Research, 169, 353–363.
Henneman, W. J. P., Sluimer, J. D., Barnes, J., van der Flier, W. M., Sluimer, I. C., Fox, N. C., et al. (2009). Hippocampal atrophy rates in Alzheimer disease: Added value over whole brain volume measures. Neurology, 72, 999–1007.
Hirao, K., Ohnishi, T., Hirata, Y., Yamashita, F., Mori, T., Moriguchi, Y., et al. (2005). The prediction of rapid conversion to Alzheimer’s disease in mild cognitive impairment using regional cerebral blood flow. Neuroimage, 28(4), 1014–1021.
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195(1), 215–243.
Huey, E. D., Zahn, R., Krueger, F., Moll, J., Kapogiannis, D., Wasserman, E. M., et al. (2008). A psychological and neuroanatomical model of obsessive-compulsive disorder. The Journal of Neuropsychiatry and Clinical Neurosciences, 20, 390–408.
Huff, F. J., Becker, J. T., Bell, S. H., Nebbes, R. D., Holland, A. L., & Boller, F. (1987). Cognitive deficits and clinical diagnosis of Alzheimer’s disease. Neurology, 37, 1119–1124.
Humphreys, G. W., & Forde, E. M. E. (1998). Disordered action schema and action disorganisation syndrome. Cognitive Neuropsychology, 15, 771–811.
Humphreys, G. W., Forde, E. M. E., & Riddoch, M. J. (2001). The planning and execution of everyday actions. In The handbook of cognitive neuropsychology: What deficits reveal about the human mind (pp. 565–589). Philadelphia: Psychology Press.
Johnson, D. K., Storandt, M., & Balota, D. A. (2003). A discourse analysis of logical memory recall in normal aging and in dementia of the Alzheimer’s type. Neuropsychology, 17, 82–92.
Johnson, K. A., Jones, K., Holman, B. L., Becker, J. A., Spiers, P. A., Satlin, A., et al. (1998). Preclinical prediction of Alzheimer’s disease using SPECT. Neurology, 50, 1563–1571.
Kane, M., Hambrick, D., Tuholski, S., Wilhelm, O., Payne, T., & Engle, R. (2004). The generality of working memory capacity: A latent-variable approach to verbal and visuospatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189–217.
Kanne, S. M., Balota, D. A., McKeel, D., Storandt, M., & Morris, J. (1998). Relating anatomy to function in Alzheimer’s disease: Neuropsychological profiles predict regional neuropathology five years later. Neurology, 50, 979–985.
Kausler, D. H., & Puckett, J. M. (1981). Adult age differences in memory for sex of voice. Journal of Gerontology, 36, 44–50.
Kausler, D. H., Salthouse, T. A., & Saults, J. S. (1988). Temporal memory over the adult lifespan. American Journal of Psychology, 101, 207–215.
Kemper, T. L. (1994). Neuroanatomical and neuropathological changes during aging and in dementia. In M. L. Albert & E. J. E. Knoepfel (Eds.), Clinical Neurology of Aging (2nd ed., pp. 3–67). New York: Oxford University Press.
Klunk, W. E., Engler, H., Nordberg, A., Wang, Y., Blomqvist, G., Holt, D. P., et al. (2004). Imaging brain amyloid in Alzheimer’s disease with Pittsburgh compound B. Annals of Neurology, 55, 306–319.
Koechlin, E., Danek, A., Burnod, Y., & Grafman, J. (2002). Medial prefrontal and subcortical mechanisms underlying the acquisition of motor and cognitive action sequences in humans. Neuron, 35(2), 371–381.
Koechlin, E., & Summerfield, C. (2007). An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11, 229–235.
Kurby, C. A., & Zacks, J. M. (2008). Segmentation in the perception and memory of events. Trends in Cognitive Sciences, 12, 72–79.
Kurby, C. A., & Zacks, J. M. (under review). Age differences in the perception of hierarchical structure in events. Journal of Experimental Psychology: Human Perception and Performance.
Kurby, C. A., Zacks, J. M., & Haroutunian, N. (2009). Event boundaries and everyday clairvoyance. In: Poster presentation (#3156) at annual meeting of psychonomic society. Boston, MA.
Levy, R., & Goldman-Rakic, P. S. (2000). Segregation of working memory functions within the dorsolateral prefrontal cortex. Experimental Brain Research, 133, 23–32.
Li, Y., Rinne, J. O., Mosconi, L., Pirraglia, E., Rusinek, H., DeSanti, S., et al. (2008). Regional analysis of FDG and PIB-PET images in normal aging, mild cognitive impairment, and Alzheimer’s disease. European Journal of Nuclear Medicine and Molecular Imaging, 35(12), 2169–2181.
Light, L. L., La Voie, D., Valencia-Laver, D., Albertson-Owens, S. A., & Mead, G. (1992). Direct and indirect measures of memory for modality in younger and older adults. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1284–1297.
Liu, X., Erikson, C., & Brun, A. (1996). Cortical synaptic changes and gliosis in normal aging, Alzheimer’s disease and frontal lobe degeneration. Dementia, 7, 128–134.
Mao, L., Zhou, B., Zhou, W., & Han, S. (2007). Neural correlates of covert orienting of visual spatial attention along vertical and horizontal dimensions. Brain Research, 1136(1), 142–153.
Marr, D., & Ullman, S. (1981). Directional selectivity and its use in early visual processing. Proceedings of the Royal Society B: Biological Sciences, 211(1183), 151–180.
Martin, E., Wilson, R., Penn, R., Fox, J. H., Clasen, R. A., & Savoy, S. M. (1987). Cortical biopsy results in Alzheimer’s disease: Correlation with cognitive deficits. Neurology, 37(7), 1201–1204.
McCabe, D. P., Roediger, H. L., McDaniel, M. M., Balota, D. A., & Hambrick, D. Z. (2006). The relationship between working memory capacity and frontal-lobe functioning: An adult life span study. In: Biennial Cognitive Aging Conference. Atlanta, GA.
Event Perception: A Theory and Its Application to Clinical Neuroscience
295
McGeer, P. L., & McGeer, E. G. (1989). Amino acid neurotransmitters. In G. J. Siegel, B. W. Agranoff, R. W. Albers, & P. W. Molinoff (Eds.), Basic neurochemistry: Molecular, cellular, and medical aspects (pp. 311–332). New York: Raven. Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202. Mintun, M. A., Larossa, G. N., Sheline, Y. I., Dence, C. S., Lee, S. Y., Mach, R. H., et al. (2006). [11C]PIB in a nondemented population: Potential antecedent marker of Alzheimer disease. Neurology, 67, 446–452. Morrow, D. G., Leirer, V. O., Altieri, P. A., & Fitzsimmons, C. (1994). Age differences in creating spatial mental models from narratives. Language and Cognitive Processes, 9, 203–220. Morrow, D. G., Stine-Morrow, E. A. L., Leirer, V. O., Andrassy, J. M., & Kahn, J. (1997). The role of reader age and focus of attention in creating situation models from narratives. Journal of Gerontology: Psychological Sciences, 52B, 73–80. Muhammad, R., Wallis, J. D., & Miller, E. K. (2006). A comparison of abstract rules in the prefrontal cortex, premotor cortex, inferior temporal cortex, and striatum. Journal of Cognitive Neuroscience, 18, 974–989. Mu¨ller, N., & Knight, R. (2006). The functional neuroanatomy of working memory: Contributions of human brain lesion studies. Neuroscience, 139, 51–58. Mushiake, H., Sakamoto, K., Saito, N., Inui, T., Aihara, K., & Tanji, J. (2009). Involvement of the prefrontal cortex in problem solving. International Review of Neurobiology, 85, 1–11. Nagahama, N., Okada, T., Katsumi, Y., Hayashi, T., Yamauchi, H., Sawamoto, N., et al. (1999). Transient neural activity in the medial superior frontal gyrus and precuneus time locked with attention shift between object features. NeuroImage, 10, 193–199. Naveh-Benjamin, M., & Craik, F. I. M. (1995). Memory for context and its use in item memory: Comparisons of younger and older persons. Psychology and Aging, 10, 284–293. Newtson, D. 
(1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28, 28–38. Newtson, D. (1976). Foundations of attribution: The perception of ongoing behavior (pp. 223–248). Hillsdale, New Jersey: Lawrence Erlbaum Associates. Newtson, D., & Engquist, G. (1976). The perceptual organization of ongoing behavior. Journal of Experimental Social Psychology, 12, 436–450. Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behaviour. In R. Davidson, G. Schwartz, & D. Shapiro (Eds.), Consciousness and self regulation: Advances in research and theory, Vol. 4 (pp. 1–18). New York: Plenum. Norman, K. A., & Schacter, D. L. (1997). False recognition in older and younger adults: Exploring the characteristics of illusory memories. Memory & Cognition, 25, 838–848. Olanow, C., & Tatton, W. (1999). Etiology and pathogenesis of Parkinson’s disease. Annual Review of Neuroscience, 22, 123–144. Perry, R. J., & Hodges, J. R. (1999). Attention and executive deficits in Alzheimer’s disease. A critical review. Brain, 122, 383–404. Posner, M. I., & Peterson, S. E. (1990). The attention system of the human brain. Annual Review of Neuroscience, 13, 25–42. Price, J. L., & Morris, J. C. (2004). So what if tangles precede plaques? Neurobiology of Aging, 25, 721–723. Price, J. L., McKeel, D. W., Jr, Buckles, V. D., Roe, C. M., Xiong, C., Grundman, M., et al. (2009). Neuropathology of nondemented aging: Presumptive evidence for preclinical Alzheimer disease. Neurobiology of Aging, 30(7), 1026–1036. Procyk, E., Tanaka, Y. L., & Joseph, J. P. (2000). Anterior cingulate activity during routine and non-routine sequential behaviors in macaques. Nature Neuroscience, 3(5), 502–508. Radvansky, G. A., & Curiel, J. M. (1998). Narrative comprehension and aging: The fate of completed goal information. Psychology and Aging, 13, 69–79.
296
Jeffrey M. Zacks and Jesse Q. Sargent
Radvansky, G. A., & Dijkstra, K. (2007). Aging and situation model processing. Psychonomic Bulletin & Review, 14, 1027–1042. Radvansky, G. A., Zacks, R. T., & Hasher, L. (1996). Fact retrieval in younger and older adults: The role of mental models. Psychology and Aging, 11, 258–271. Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America, 98, 676–682. Rapoport, J. L. (1990). Obsessive compulsive disorder and basal ganglia dysfunction. Psychological Medicine, 20, 465–469. Raz, N. (1996). Neuroanatomy of aging brain: Evidence from structural MRI. In E. D. Bigler (Ed.), Neuroimaging II: Clinical applications (pp. 153–182). New York: Academic Press. Raz, N. (2000). Aging of the brain and its impact on cognitive performance: Integration of structural and functional findings. Handbook of Aging and Cognition, 2, 1–90. Raz, N., Gunning, F. M., Head, D., Dupuis, J. H., McQuain, J. M., Briggs, S. D., et al. (1997). Selective aging of human cerebral cortex observed in vivo: Differential vulnerability of the prefrontal gray matter. Cerebral Cortex, 7, 268–282. Reynolds, J. R., Zacks, J. M., & Braver, T. S. (2007). A computational model of event segmentation from perceptual prediction. Cognitive Science, 31, 613–643. Rizzo, M., Anderson, S. W., Dawson, J., Myers, R., & Ball, K. (2000). Visual attention impairments in Alzheimer’s disease. Neurology, 54, 1954–1959. Rosen, V., Caplan, L., Sheesley, L., Rodriguez, R., & Grafman, J. (2003). An examination of daily activities and their scripts across the adult lifespan. Behavioral Research Methods, Instruments & Computers, 35, 32–48. Sakata, M., Farooqui, S. M., & Prasad, C. (1992). Post transcriptional regulation of loss of rat striatal D2 dopamine receptor during aging. Brain Research, 575, 309–314. Saxena, S., & Rauch, S. L. (2000). 
Functional neuroimaging and the neuroanatomy of obsessive-compulsive disorder. Psychiatric Clinics of North America, 23, 563–586. Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27. Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annual Review of Neuroscience, 23, 473–500. Schwan, S., & Garsoffky, B. (2004). The cognitive representation of filmic event summaries. Applied Cognitive Psychology, 18, 37–55. Schwan, S., Garsoffky, B., & Hesse, F. W. (2000). Do film cuts facilitate the perceptual and cognitive organization of activity sequences? Memory & Cognition, 28(2), 214–223. Schwartz, M. F. (2006). The cognitive neuropsychology of everyday action and planning. Cognitive Neuropsychology, 23, 202–221. Schwartz, M. F., Montgomery, M. W., Fitzpatrick-DeSalme, E. J., Ochipa, C., Coslett, H., & Mayer, N. (1995). Analysis of a disorder of everyday action. Cognitive Neuropsychology, 12, 863–892. Seger, C. A. (1994). Implicit learning. Psychological Bulletin, 115, 163–196. Shaw, T. G., Mortel, K. F., Meyer, J. S., Rogers, R. L., Hardenberg, J., & Cutaia, M. M. (1984). Cerebral blood flow changes in benign aging and cerebrovascular disease. Neurology, 34, 855–862. Shulman, G. L., Fiez, J. A., Corbetta, M., Buckner, R. L., Miezin, F. M., Raichle, M. E., et al. (1997). Common blood flow changes across visual tasks: Decreases in cerebral cortex. Journal of Cognitive Neuroscience, 9, 648–663. Sirigu, A., Cohen, L., Zalla, T., Pradat-Diehl, P., VanEeckhout, P., Grafman, J., et al. (1998). Distinct frontal regions for processing sentence syntax and story grammar. Cortex, I34, 771–778. Sirigu, A., Zalla, T., Pillon, B., Grafman, J., Agid, Y., & Dubois, B. (1996). Encoding of sequence and boundaries of scripts following prefrontal lesions. Cortex, 32, 297–310.
Event Perception: A Theory and Its Application to Clinical Neuroscience
297
Small, G. W., Kepe, V., Ercoli, L. M., Siddarth, P., Bookheimer, S. Y., Miller, K. J., et al. (2006). PET of brain amyloid and tau in mild cognitive impairment. New England Journal of Medicine, 355, 2652–2663. Speer, N. K., Swallow, K. M., & Zacks, J. M. (2003). Activation of human motion processing areas during event perception. Cognitive, Affective & Behavioral Neuroscience, 3, 335–345. Speer, N. K., & Zacks, J. M. (2005). Temporal changes as event boundaries: Processing and memory consequences of narrative time shifts. Journal of Memory and Language, 53, 125–140. Speer, N. K., Zacks, J. M., & Reynolds, J. R. (2007). Human brain activity time-locked to narrative event boundaries. Psychological Science, 18, 449–455. Sperling, R. A., LaViolette, P. S., O’Keefe, K., O’Brien, J., Rentz, D. M., Pihlajamaki, M., et al. (2009). Amyloid deposition is associated with impaired default network function in older persons without dementia. Neuron, 63, 178–188. Squire, L., & Zola-Morgan, S. (1991). The medial temporal lobe memory system. Science, 253, 1380–1386. Stevens, W. D., Hasher, L., Chiew, K. S., & Grady, C. L. (2008). A neural mechanism underlying memory failure in older adults. Journal of Neuroscience, 28(48), 12820–12824. Stine-Morrow, E. A. L., Gagne, D. D., Morrow, D. G., & DeWall, B. (2004). Age differences in rereading. Memory & Cognition, 32, 696–710. Storandt, M., & Beaudreau, S. (2004). Do reaction time measures enhance diagnosis of earlystage dementia of the Alzheimer type? Archives of Clinical Neurology, 19, 119–124. Svoboda, E., McKinnon, M. C., & Levine, B. (2006). The functional neuroanatomy of autobiographical memory: A meta-analysis. Neuropsychologia, 44, 2189–2208. Swallow, K. M., & Zacks, J. M. (2008). Sequences learned without awareness can orient attention during the perception of human activity. Psychonomic Bulletin & Review, 15(1), 116–122. Swallow, K. M., Zacks, J. M., & Abrams, R. A. (2009). 
Event boundaries in perception affect memory encoding and updating. Journal of Experimental Psychology: General, 138, 236–257. Swick, D., Senkfor, A. J., & Van Petten, C. (2006). Source memory retrieval is affected by aging and prefrontal lesions: Behavioral and ERP evidence. Brain Research, 1107(1), 161–176. Taylor, A. E., & Saint-Cyr, J. A. (1995). The neuropsychology of Parkinsons-Disease. Brain and Cognition, 28, 281–296. Thienel, R., Voss, B., Kellermann, T., Reske, M., Halfter, S., Sheldrick, A. J., et al. (2009). Nicotinic antagonist effects on functional attention networks. International Journal of Neuropsychopharmacology, 12(10), 1295–1305. Tse, C. S., Balota, D. A., Moynan, S. C., Duchek, J. M., & Jacoby, L. L. (2010). The utility of placing recollection in opposition to familiarity in early discrimination of healthy aging and very mild dementia of the Alzheimer’s type. Neuropsychology, 24(1), 49–67. Twamley, E. W., Ropacki, S. A., & Bondi, M. W. (2006). Neuropsychological and neuroimaging changes in preclinical Alzheimer’s disease. Journal of International Neuropsychological Society, 12, 707–735. Underwood, B. J. (1957). Interference and forgetting. Psychological Review, 64, 49–64. Ursu, S., Stenger, V. A., Shear, M. K., Jones, M. R., & Carter, C. S. (2003). Overactive action monitoring in obsessive-compulsive disorder: Evidence from functional magnetic resonance imaging. Psychological Science, 14, 347–353. Usher, M., Cohen, J. D., Servan-Schreiber, D., Rajkowski, J., & Aston-Jones, G. (1999). The role of the locus coeruleus in the regulation of cognitive performance. Science, 283, 549–554. Uttl, B., & Graf, P. (1993). Episodic spatial memory in adulthood. Psychology and Aging, 8, 257–273.
298
Jeffrey M. Zacks and Jesse Q. Sargent
Vallacher, R. R., & Wegner, D. M. (1987). What do people think they’re doing? Action identification and human behavior. Psychological Review, 94, 3–15. van Veen, V., & Carter, C. S. (2002). The anterior cingulate as a conflict monitor: fMRI and ERP studies. Physiology & Behavior, 77, 477–482. Verhaeghen, P. (2003). Aging and vocabulary score: A meta-analysis. Psychology and Aging, 18, 332–339. Verhaeghen, P., & Salthouse, T. A. (1997). Meta-analyses of age cognition relations in adulthood: Estimates of linear and non-linear age effects and structural models. Psychological Bulletin, 122, 231–249. Volkow, N. D., Gur, R. C., Wang, G. J., Fowler, J. S., Moberg, P. J., Ding, Y. S., et al. (1998). Association between decline in brain dopamine activity with age and cognitive and motor impairment in healthy individuals. American Journal of Psychiatry, 155(3), 344–349. Wager, T. D., & Smith, E. E. (2003). Neuroimaging studies of working memory: A metaanalysis. Cognitive, Affective & Behavioral Neuroscience, 3, 255–274. Wagner, A. D., Shannon, B. J., Kahn, I., & Buckner, R. L. (2005). Parietal lobe contributions to episodic memory retrieval. Trends in Cognitive Science, 9, 445–453. Wechsler, D. (1997). Wechsler Adult Intelligence Scale (3rd ed.). San Antonio, TX: The Psychological Corporation. Welsh, K. A., Butters, N., Hughes, J. P., Mohs, R. C., & Heyman, A. (1992). Detection and staging of dementia1 in Alzheimer’s disease: Use of the neuropsychological measures developed for the consortium to establish a registry for Alzheimer’s disease. Archives of Neurology, 49, 448–452. Wood, J. N., & Grafman, J. (2003). Human prefrontal cortex: Processing and representational perspectives. Nature Reviews Neuroscience, 4, 139–147. Zacks, J. M., Braver, T. S., Sheridan, M. A., Donaldson, D. I., Snyder, A. Z., Ollinger, J. M., et al. (2001). Human brain activity time-locked to perceptual event boundaries. Nature Neuroscience, 4, 651–655. Zacks, J., Speer, N., & Reynolds, J. R. (2009). 
Segmentation in reading and film comprehension. Journal of Experimental Psychology: General, 138, 307–327. Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R. (2007). Event perception: A mind/brain perspective. Psychological Bulletin, 133, 273–293. Zacks, J. M., Speer, N. K., Vettel, J. M., & Jacoby, L. L. (2006). Event understanding and memory in healthy aging and dementia of the Alzheimer type. Psychology and Aging, 21, 466–482. Zacks, J. M., Swallow, K. M., Vettel, J. M., & McAvoy, M. P. (2006). Visual motion and the neural correlates of event perception. Brain Research, 1076, 150–162. Zacks, J. M., & Tversky, B. (2001). Event structure in perception and conception. Psychological Bulletin, 127, 3–21. Zacks, J. M., Tversky, B., & Iyer, G. (2001). Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology: General, 130, 29–58. Zacks, R. T., & Hasher, L. (1994). Directed ignoring: Inhibitory regulation of working memory. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory mechanisms in attention, memory, and language (pp. 241–264). New York, NY: Academic Press. Zalla, T., Pradat-Diehl, P., & Sirigu, A. (2003). Perception of action boundaries in patients with frontal lobe damage. Neuropsychologia, 41, 1619–1627. Zalla, T., Sirigu, A., Pillon, B., Dubois, B., Agid, Y., & Grafman, J. (2000). How patients with Parkinson’s disease retrieve and manage cognitive event knowledge. Cortex, 36, 163–179. Zalla, T., Sirigu, A., Pillon, B., Dubois, B., Grafman, J., & Agid, Y. (1998). Deficient in evaluating pre-determinated sequences of script events in patients with Parkinson’s disease. Cortex, 34, 621–628.
Event Perception: A Theory and Its Application to Clinical Neuroscience
299
Zalla, T., Verlut, I., Franck, N., Puzenat, D., & Sirigu, A. (2004). Perception of dynamic action in patients with schizophrenia. Psychiatry Research, 128, 39–51. Zanini, S. (2008). Generalised script sequencing deficits following frontal lobe lesions. Cortex, 44, 140–149. Zanini, S., Rumiati, R., & Shallice, T. (2002). Action sequencing deficit following frontal lobe lesion. Neurocase, 8, 88–99. Zor, R., Keren, H., Hermesh, H., Szechtman, H., Mort, J., & Eilam, D. (2009). Obsessivecompulsive disorder: A disorder of pessimal (non-functional) motor behavior. Acta Psychiatrica Scandinavica, 120, 288–298. Zwaan, R. A., & Radvansky, G. A. (1998). Situation models in language comprehension and memory. Psychological Bulletin, 123(2), 162–185.
CHAPTER EIGHT

Two Minds, One Dialog: Coordinating Speaking and Understanding

Susan E. Brennan, Alexia Galati, and Anna K. Kuhlen

Contents
1. Introduction: The Joint Nature of Language Processing
2. Dialog: Beyond Transcripts
3. Process Models of Dialog
   3.1. The Message Model
   3.2. Two-Stage Models
   3.3. The Collaborative View and the Grounding Model
4. The Role of Cues in Grounding
5. Partner-Specific Processing
   5.1. Global and Local Adaptations
   5.2. Speakers Adapt Utterances for Their Addressees
   5.3. Addressees Adapt Utterance Interpretations to Speakers
   5.4. Simple or “One-Bit” Partner Models
6. Neural Bases of Partner-Adapted Processing
   6.1. Mirroring
   6.2. Theory of Mind
   6.3. Distinguishing a Partner’s Perspective from One’s Own: The Role of Executive Control
   6.4. Mentalizing Versus Mirroring
   6.5. Cues Hypothesized to Support Partner-Adapted Processing
7. Conclusions
Acknowledgments
References
Abstract

In this chapter, we consider communication as a joint activity in which two or more interlocutors share or synchronize aspects of their private mental states and act together in the world. We summarize key experimental evidence from our own and others’ research on how speakers and addressees take one another into account while they are processing language. Under some circumstances, production and comprehension are adjusted to a partner’s perspective or characteristics in the early moments of processing, in a flexible and probabilistic fashion. We advocate studying the coordination and integration of cognitive products and processes both between and within the minds of interlocutors. We then discuss recent evidence from electrophysiology and imaging studies (relevant to Theory of Mind and to mirroring) that has begun to illuminate brain networks that underlie the coordination of joint and individual processing during communication.

Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53008-1. © 2010 Elsevier Inc. All rights reserved.
1. Introduction: The Joint Nature of Language Processing

The scientific study of language has been shaped by the assumption that the human language faculty evolved for thinking rather than for communicating (e.g., Chomsky, 1965, 1980). This “language-as-product” tradition takes language itself as the object of study, focusing on grammatical knowledge and the core processes for recovering linguistic structure from sentences. This common focus has given generations of psycholinguists and other cognitive scientists license to concentrate on the study of the linguistic representation and processing in the mind and brain of a lone (and largely generic) native speaker, independent of context. As a result, a great deal is known about how individuals store, organize, and access knowledge in the mental lexicon; how individuals parse sentences and resolve syntactic ambiguity; and how individuals plan and articulate utterances.

But there is more to language processing than these (seemingly) autonomous processes, as has been demonstrated by those who work within the “language-as-action” tradition (e.g., Brennan & Clark, 1996; Clark, 1992; Clark & Wilkes-Gibbs, 1986; Fussell & Krauss, 1989, 1991, 1992; Glucksberg, Krauss, & Weisberg, 1966; Hanna, Tanenhaus, & Trueswell, 2003; Krauss, 1987; Schober & Clark, 1989). Consider three students, Leah, Dale, and Adam, who are trying to recall a scene from an excerpt of a movie¹ that they recently watched together, in which the protagonist is forced to wear an odd and embarrassing object:

. . .
Leah: um. . . then he gets punished or whatever?
Dale: what was that, a wreath or—
Leah: yeah it was some kind of browny—
Adam: yeah it was some kind of straw thing or something
Leah: mhm
Dale: around his neck
Leah: so that everybody knew what he did or something?
Adam: straw wreath
Dale: yeah
. . .
(excerpted from Brennan & Ohaeri, 1999)

¹ The scene comes from a John Sayles movie, The Secret of Roan Inish.
Even though this transcript bears little resemblance to the idealized sentences typical of playwrights’ scripts, psycholinguists’ stimuli, or linguists’ grammaticality judgments, it unfolds in an orderly way. The three partners rapidly succeed in establishing consensus as they share a focus of attention, cue one another’s memories, and ratify one another’s proposals about what to include in the product they are constructing together: their joint memory of the event. In doing this, they even complete one another’s utterances. The product represented by this transcript reflects a process by which both memory recall and speaking are grounded in action conducted jointly, rather than achieved by minds working alone. Such data from studies of language-as-action (Clark, 1992; Tanenhaus & Trueswell, 2004) focus on language use in physical or communicative contexts. This particular spontaneous exchange comes from a large corpus recorded in an experimental study of collaborative recollection (Ekeocha & Brennan, 2008). It is so typical of everyday conversation as to seem rather unremarkable and yet at the same time, displays a level of coordination between partners that is astonishing in its virtuosity.

There is a growing trend within cognitive science to examine human cognition in social contexts, either pairwise or in small groups. This includes recall of memories (e.g., Ekeocha & Brennan, 2008; Harris, Paterson, & Kemp, 2008; Hollingshead, 1998; Weldon & Bellinger, 1997), collaborative visual search (e.g., Brennan, Chen, Dickinson, Neider, & Zelinsky, 2007; Neider, Chen, Dickinson, Brennan, & Zelinsky, 2005), decision making (e.g., Kiesler & Sproull, 1992; Wiley & Jensen, 2006), learning (e.g., Wiley & Bailey, 2006), two-person motor activities (e.g., Sebanz, Bekkering, & Knoblich, 2006; Sebanz & Knoblich, 2009), and of course, psycholinguistic processing in dialog.
Some have argued that processing may be qualitatively different in the context of dialog than in monologue because both speech comprehension and speech planning systems are active at once (e.g., Pickering & Garrod, 2004). Others argue that, at least initially, language processes in dialog are identical to language processes in monologue because conversational partners process language from their own “egocentric” perspectives in which early processing is encapsulated from partner-specific information (e.g., Barr & Keysar, 2002; Keysar, Barr, Balin, & Brauner, 2000; Keysar, Barr, Balin, & Paek, 1998; Kronmüller & Barr, 2007), followed by a second stage in which they can take their partner’s perspective into account. We take the view that processing in dialog can be explained by ordinary memory processes (Horton & Gerrig, 2002, 2005a, 2005b; Metzing & Brennan, 2003) and argue that these processes need not be encapsulated, but under some circumstances, are adapted flexibly and rapidly to the perspective of a conversational partner.
In addition to the coordination that takes place interpersonally, between partners, language processes are also coordinated intrapersonally, within the mind of an individual with many processes conducted in parallel: For instance, an individual speaker simultaneously plans and articulates an utterance while monitoring an addressee’s reactions, and an individual addressee simultaneously listens to and interprets an utterance moment by moment while preparing what to say next, or even how to contribute to what the speaker is saying. This appears to require that various subprocesses of planning, parsing, interpretation, articulation, and monitoring must be able to share information and influence one another in a rather fine-grained way. Even though key capabilities that make human communication possible—such as the language faculty itself, the ability to mentalize about another person’s mental state (or Theory of Mind—ToM), and the ability to respond rapidly and automatically to sensorimotor cues from human motion, speech, and other behaviors—may to some extent be supported by neural circuits thought to be distinct (Van Overwalle & Baetens, 2009), behavioral evidence suggests that there is close integration of these underlying processes (and their products), both within and between the minds of interlocutors. This, we argue, is what the study of language processing should aim to map, model, and explain. In this chapter, we consider language processing in communicative contexts as a joint activity in which two or more interlocutors share or synchronize aspects of their private mental states and act together in the world. We summarize key experimental evidence from our own and others’ research on how speakers and addressees take one another into account during communication. Under some circumstances, interlocutors can adjust to information about a partner’s characteristics, needs, or knowledge in the early moments of processing. 
The accumulating evidence suggests that cognitive processing is probabilistic and flexible in how it adapts to partner-specific information (Brennan & Hanna, 2009; Jurafsky, 1996; MacDonald, 1994; Tanenhaus & Trueswell, 1995). We then discuss the evidence from electrophysiology and imaging studies that has begun to illuminate the neural architecture supporting joint and individual processing during communication.
2. Dialog: Beyond Transcripts

As evident from the example of the three students recalling a movie together, the process of coordinating meaning leaves behind striking evidence in the dialog transcript. A transcript is an analyzable product that can provide evidence about how interpersonal coordination unfolds, as one utterance seems to shape what is said next. Transcripts show that successive
utterances produced by interlocutors often display recognizable contingency. One speaker may complete another’s utterance by adding an installment that seamlessly continues its syntactic structure, as in our opening example (for studies of collaborative completions, see DuBois, 1974; Lerner, 1996; Wilkes-Gibbs, 1986). Many important descriptive insights about structural phenomena in conversation such as turn-taking, repair, and co-construction of utterances have been presented by ethnomethodologists who analyze detailed transcripts of naturally occurring conversations (e.g., Goodwin, 1981; Jefferson, 1973; Sacks, Schegloff, & Jefferson, 1974). Although a transcript can be informative, it is only an artifact of the processes that generate it; people who overhear a conversation (including those who analyze it later) may not understand it in the same way that participants do (Kraut, Lewis, & Swezey, 1982; Schober & Clark, 1989). Psycholinguists who study dialog are interested in systematically probing the processes from which a transcript emerges. To understand what people might intend when they say what they say, psychologists (e.g., Clark, 1992; Glucksberg et al., 1966) have wrestled conversation into the laboratory in order to test hypotheses about language use and processing (often inspired by insights from conversation analysts). Experimental control and reliability are achieved by assigning different pairs of subjects to complete the same task in which they refer to, look at, pick up, and move objects. By observing such task-oriented dialog, the experimenter has access not only to the transcript, but also to physical evidence of what speakers mean and what addressees understand. 
This has led to conclusions about the underlying cognitive mechanisms of phenomena such as lexical choice and variability, perspective taking, distribution of initiative, conversational repair, the accumulation of common ground between partners, and audience design, or tailoring an utterance to a particular partner. Consider these three excerpts from the transcript of a referential communication experiment in which two naïve partners could hear but not see each other (Stellmann & Brennan, 1993). Partners A and B each had a duplicate set of 12 cards displaying abstract geometric objects. The matcher (B) needed to arrange his cards in the same order as the director’s (A’s) cards. They did this for the first time in Trial 1, after which the cards were scrambled and matched again repeatedly (Trials 2 and 3):

Trial 1:
A: ah boy this one ah boy alright it looks kinda like, on the right top there’s a square that looks diagonal
B: uh huh
A: and you have sort of another like rectangle shape, the– like a triangle, angled, and on the bottom it’s ah I don’t know what that is, glass shaped
B: alright I think I got it
A: it’s almost like a person kind of in a weird way
B: yeah like like a monk praying or something
A: right yeah good great
B: alright I got it
(etc. – they match about a dozen other cards)

Trial 2:
B: 9 is that monk praying
A: yup
(etc. – they match other cards)

Trial 3:
A: number 4 is the monk
B: ok
This matching task elicits data about interlocutors’ spontaneous productions (from the transcript) and interpretations (from observing physical evidence provided by when and where the matcher moves the cards). The combination of behavioral evidence in the context of an experimentally controlled setting, synchronized with speech documented in the transcript, has provided powerful evidence for common ground or partially and mutually shared mental representations that presumably accumulate in the minds of both partners as they interact (whether in a laboratory experiment or in everyday conversation). Grounding enables partners to achieve a joint perspective on an object, such that referring to it becomes more efficient over time. The process of grounding typically results in entrainment, or convergence and synchronization between partners on various linguistic and paralinguistic levels—including in wording, syntax, speaking rate, gestures, eye-gaze fixations, body position, postural sway, and sometimes pronunciation (e.g., Branigan, Pickering, & Cleland, 2000; Brennan & Clark, 1996; Giles & Powesland, 1975; Levelt & Kelter, 1982; Shockley, Richardson, & Dale, 2009). Transcripts of different pairs of partners referring repeatedly to the same object demonstrate that there is less variability in the wording and perspectives associated with objects within a particular dialog than between dialogs (Brennan & Clark, 1996). In one experiment, 13 pairs each created, entrained on, and consistently reused one of 13 different perspectives for the geometric tangram figure in Figure 1 (Stellmann & Brennan, 1993). The perspective that two interlocutors ground during a dialog, then, is another kind of joint product that emerges from interpersonal interaction. At the same time, interlocutors who share a communicative goal can be flexible in revising jointly achieved perspectives when necessary. 
And they can be extremely flexible in what they are willing to negotiate an expression or even a single word to mean.
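The claim that grounding makes referring more efficient over time can be illustrated with a rough count on the “monk” excerpts quoted above. The following is a minimal sketch, not the analysis used in the original study: it assumes that the number of turns and whitespace-separated words per trial is an acceptable crude proxy for collaborative effort, and the names `trials` and `summarize` are introduced here only for illustration.

```python
# Turns from the three "monk" excerpts quoted above (Stellmann & Brennan, 1993),
# typed in by hand; speaker labels are omitted because only effort is counted.
trials = {
    1: [
        "ah boy this one ah boy alright it looks kinda like, on the right top "
        "there's a square that looks diagonal",
        "uh huh",
        "and you have sort of another like rectangle shape, the- like a triangle, "
        "angled, and on the bottom it's ah I don't know what that is, glass shaped",
        "alright I think I got it",
        "it's almost like a person kind of in a weird way",
        "yeah like like a monk praying or something",
        "right yeah good great",
        "alright I got it",
    ],
    2: ["9 is that monk praying", "yup"],
    3: ["number 4 is the monk", "ok"],
}


def summarize(turns):
    """Return (number of turns, number of whitespace-separated words)."""
    n_words = sum(len(turn.split()) for turn in turns)
    return len(turns), n_words


# Assumption: fewer turns and words to identify the same card means less effort.
summary = {trial: summarize(turns) for trial, turns in trials.items()}

for trial, (n_turns, n_words) in sorted(summary.items()):
    print(f"Trial {trial}: {n_turns} turns, {n_words} words")
```

On these excerpts the drop occurs between Trial 1 and Trial 2; Trials 2 and 3 need the same number of turns, and the shortening from “that monk praying” to “the monk” is partly masked by the card numbers, which is one reason raw word counts are only a coarse measure.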
“A bat”
“The candle”
“The anchor”
“The rocket ship”
“The Olympic torch”
“The Canada symbol”
“The symmetrical one”
“Shapes on top of shapes”
“The one with all the shapes”
“The bird diving straight down”
“The airplane flying straight down”
“The angel upside down with sleeves”
“The man jumping in the air with bell bottoms on”

Figure 1. Perspectives vary across conversations.
Although a transcript can vividly illustrate some of these interpersonal products of interactive dialog, it often says little about how language processing unfolds incrementally and intrapersonally (within the mind of a participant). A major methodological advance has been the “visual worlds” paradigm pioneered by Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy (1995). This experimental paradigm measures the looking behavior of listeners who wear unobtrusive, head-mounted eye trackers while hearing prerecorded or scripted utterances that refer to visible objects; it measures indirect evidence of processing at a fine temporal grain, computed from the proportions of looks to an object within a defined epoch, in order to uncover the time course of lexical, prosodic, syntactic, semantic, and pragmatic processing (e.g., Altmann & Kamide, 2007). Some recent studies have merged the visual worlds eyetracking paradigm with referential communication tasks done jointly by two spontaneously interacting partners (e.g., Brown-Schmidt, 2009; Brown-Schmidt, Gunlogson, & Tanenhaus, 2008; Hanna & Brennan, 2007; Kraljic & Brennan, 2005). This approach has the potential to uncover not only how processing unfolds online within an individual engaged in dialog, but also how processing is coordinated incrementally between individuals.
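The dependent measure described above, the proportion of looks to an object within a defined epoch, can be sketched as follows. This is only an illustration under stated assumptions: the gaze samples are invented, the 50 ms sampling interval and the epoch boundaries are arbitrary, and `proportion_of_looks` is a hypothetical helper rather than part of any eye-tracking package.

```python
from collections import Counter


def proportion_of_looks(samples, epoch_start_ms, epoch_end_ms):
    """Proportion of gaze samples on each object within [epoch_start_ms, epoch_end_ms).

    `samples` is a list of (time_ms, fixated_object) pairs, with time measured
    from the onset of the critical word (an assumption of this sketch).
    """
    in_epoch = [obj for t, obj in samples if epoch_start_ms <= t < epoch_end_ms]
    if not in_epoch:
        return {}
    counts = Counter(in_epoch)
    return {obj: n / len(in_epoch) for obj, n in counts.items()}


# Invented samples for a single trial, one every 50 ms.
samples = [
    (0, "competitor"), (50, "competitor"), (100, "target"), (150, "target"),
    (200, "target"), (250, "target"), (300, "other"), (350, "target"),
]

early = proportion_of_looks(samples, 0, 200)    # first 200 ms after word onset
late = proportion_of_looks(samples, 200, 400)   # next 200 ms

print(early)
print(late)
```

In an actual study such proportions would be aggregated over trials and participants before epochs are compared; the single-trial version here is meant only to show what is being counted.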
3. Process Models of Dialog
What is the nature of dialog? All experimental studies of collaborative cognition rely on some notion, often entirely implicit, of what it means to participate in a dialog or to otherwise process information along with a partner (Kuhlen & Brennan, 2008). Some studies rely on the mere presence of one or more partners who may not be allowed to interact; this approach
presumes that the effect of interpersonal collaboration is strictly motivational. Others allow a partner to contribute to the interaction only once, which decouples coordination processes from language processing. These approaches seem to assume that collaboration is based on a unidirectional exchange of information: While one conversational partner speaks, the other listens passively. Some studies control the timing, order, or kinds of contributions that partners may make during a task (e.g., Basden, Basden, & Henry, 2000; Wright & Klumpp, 2004); while this may be desirable for controlling variation due to behavioral contingencies, it removes partners’ ability to take initiative, treats what may be meaningful coordinating signals as noise, and probably rules out any but the simplest sorts of coordination of the processes under study. Some psycholinguistic studies of dialog gain control by using confederates (whether human or simulated). But unless a confederate is doing the task for real, with actual communicative needs, the confederate’s behavior can differ in troubling ways from the spontaneous behavior of a naïve partner. For instance, when a confederate plays the role of an addressee over and over in a study about speech production, she may know what the speaker is about to say better than the speaker himself does, and her feedback and nonverbal cues, if not carefully characterized and controlled, are very likely to communicate her lack of a need for information (Brennan & Williams, 1995; Kuhlen & Brennan, 2010; Lockridge & Brennan, 2002). For that reason, we are wary of using confederates in the addressee role unless they are actually doing a task with the subject.
Most of our studies of language use and processing have used pairs of truly naïve speakers and addressees (e.g., Bortfeld & Brennan, 1997; Brennan, 1990, 1995, 2004; Brennan & Clark, 1996; Brennan & Ohaeri, 1999; Brennan et al., 2007; Ekeocha & Brennan, 2008; Galati & Brennan, 2010a; Hanna & Brennan, 2007; Kraljic & Brennan, 2005; Lockridge & Brennan, 2002). Some have had copresent confederate speakers who interact mostly spontaneously with naïve addressees, producing only certain critical utterances according to a partial script (e.g., Hwang, Brennan, & Huffman, 2007; Metzing & Brennan, 2003). A few have used prerecorded utterances but without any pretense that a live speaker is present (e.g., Perryman & Brennan, 2009). The point is that one partner’s behavior shapes another’s during dialog or during collaboration more generally (Kuhlen & Brennan, 2010), and this should be acknowledged when confederates are employed (Kuhlen & Brennan, 2008). In this section, we describe three influential views of processing in dialog, each of which makes quite different assumptions about its essential aspects.
3.1. The Message Model
The message model of communication (or, as Pickering & Garrod call it in their 2004 critique, the autonomous transmission model) is intuitively plausible and widely assumed among the cognitive sciences (e.g., Akmajian,
Demers, & Harnish, 1987). This model is derived from information theory (MacKay, 1983; Shannon & Weaver, 1949; Wiener, 1965), in which information is defined in probabilistic terms; what is less probable is more informative. Communication involves the transmission and reception of information, which flows at a particular rate through a channel. One agent, a sender, encodes a message into a language and transmits it to another, a recipient, who decodes it; the two agents can communicate as long as they both have the same set of encoding and decoding rules (e.g., a language). Feedback (e.g., ‘‘backchannels’’ in conversation; Yngve, 1970) regulates the flow of information. The message model is consistent with the conduit metaphor (see critique by Reddy, 1979), in which words are treated like packages of meaning sent by speakers to listeners. It is difficult to think formally about communication without invoking the conduit metaphor and other information theoretic terms (Eden, 1983). The approach represented by the message model decouples coordination from language per se, and it does not require that one partner recognizes an intention to communicate in the other. It has been used to model interactions between humans, between nonhumans, between mechanical processes, and between humans and machines (Wiener, 1965). But it is difficult to see how the message model could explain the tightly coordinated exchange among Leah, Dale, and Adam, in which their contributions defy relegating them to roles of sender or receiver, and meanings have no simple mapping but are negotiated so fluidly and flexibly. As these three recall the movie together, they coauthor a jointly recalled and articulated product (rather than formulating and sending signals autonomously). They all recognize a common goal. 
And in the first trial from the ‘‘monk praying’’ example, Partner A was the one who knew the identity of the target objects (and so should be considered to be the sender of the message), and yet it is B (the recipient) who ended up proposing the perspective that they entrain upon. As Figure 1 illustrates, there is no predictable mapping of perspective or label to object. We argue (as do Reddy, 1979; Schober, 1998) that words do not ‘‘contain’’ their meanings; even labels for common objects that are highly conventional can turn out to be negotiable. This means that there is no guaranteed 1:1 mapping of meaning to word, even for basic level terms. As Brennan and Clark (1996) showed in a series of referential communication studies, once speakers have entrained upon a perspective for a common object (e.g., calling a shoe the man’s loafer to distinguish it from other shoes), they often continue to use the over-informative term even when this level of detail is no longer necessary (when the man’s loafer is the only shoe). In fact, native speakers of English may even produce wildly nonidiomatic referring expressions (e.g., the chair in which I shake my body for a rocking chair or the chair with five little tires on the bottom for an office chair) to maintain a perspective that has been mutually achieved with a non-native speaker (Bortfeld & Brennan, 1997). The message model does not account
for such flexibility. Because we are interested in understanding how people coordinate joint actions interpersonally and how they coordinate joint action with language processing intrapersonally, we find that the message model presents an unsatisfying view of communication.
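The information-theoretic premise behind the message model, that what is less probable is more informative, is conventionally expressed as Shannon surprisal: an event with probability p carries -log2(p) bits. The chapter does not state this formula, so the following is only a minimal illustration of that standard definition, with a function name chosen for this sketch.

```python
import math

def surprisal_bits(p):
    """Information content, in bits, of an event that occurs with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log2(p)

# A less probable signal carries more information than a more probable one.
common = surprisal_bits(0.5)    # 1 bit
rare = surprisal_bits(0.125)    # 3 bits
print(common, rare)
```

Note that this measure depends only on the probability of a signal, not on who produces it or for whom, which is one way to see why the model has no place for negotiated or partner-specific meanings.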
3.2. Two-Stage Models
Several accounts of cognitive processing in dialog can be grouped together because they presume that processing is conducted in two distinct stages. According to the ‘‘interactive alignment’’ model (Pickering & Garrod, 2004), language processing in a dialog setting is fundamentally different from language processing in monologue because in dialog, both the speech production and speech comprehension systems are active at once, with the two systems assumed to have parity of representations. The interactive alignment model further assumes that interlocutors routinely come to achieve shared mental representations through a ‘‘direct’’ process of priming. Priming is proposed as the mechanism that explains convergent linguistic behaviors both between and within interlocutors such as lexical entrainment, shared perspectives, and the reuse of syntactic forms. According to this account, interlocutors converge on shared terms (such as in our earlier ‘‘monk praying’’ example) simply because one partner’s utterance primes another’s. Interpersonally, alignment is claimed to be direct and automatic. As the basis for such imitation, Pickering and Garrod (p. 188) invoke the human mirror system (to be discussed in Section 6), as well as the fact that the same brain areas (Brodmann’s Areas 44 and 45; see Iacoboni et al., 1999) are implicated in both language processing and imitation. On Pickering and Garrod’s view, processing in dialog defaults to what is assumed to be automatic and inflexible, driven by priming. The interactive alignment model is compatible with two-stage proposals by Keysar and colleagues (e.g., the ‘‘monitoring and adjustment’’ theory: Horton & Keysar, 1996 and ‘‘perspective adjustment’’ theory: Keysar, Barr, & Horton, 1998) that assume that early processes in dialog are unable to take account of a partner.
On these proposals, interlocutors often share the same context, knowledge, or informational needs, so that what appears to be audience design (when one partner seems to take the other’s knowledge or mental state into account) is actually done for the self (Brown & Dell, 1987). As with the interactive alignment model, the first stage of these models is fast, automatic, and encapsulated from all but ‘‘egocentric’’ information, followed by an inferential stage that can accommodate partner-specific information, but more slowly. On these approaches, such mentalizing about a partner (or deploying ‘‘full common ground’’ to plan or process an utterance) is thought to be computationally expensive (e.g., Pickering & Garrod, 2004, p. 180), and therefore either optional or else
invoked only when necessary for a repair: ‘‘normal conversation does not routinely require modeling the interlocutor’s mind’’ (Pickering & Garrod, 2004, p. 180). The interactive alignment theory further assumes that, intrapersonally or within the mind of an individual, priming at one level of linguistic processing (e.g., phonological) leads directly to alignment at another level (e.g., lexical representation), and that this automatically results in shared representations between partners at all levels of linguistic processing (Pickering & Garrod, 2004). But for this proposal to work, both interlocutors would have to be exact copies of one another. The problem is that presumably any conceptual networks that undergo priming within an individual’s mind will have been sculpted by their idiosyncratic experiences and memories, and so it seems unlikely that shared meanings can be reached simply by priming (see Schober, 2004 for a related critique). Priming is simply the underlying currency by which language and memory are purchased, with multiple elements being primed at a given moment. As we will argue in Section 5.3, priming is not a satisfying explanation for convergent behaviors such as entrainment because such behaviors have a partner-specific component. Note that not all of the theories that assign a prominent role to priming in order to account for convergent behavior agree that priming results in shared mental representations. In the ‘‘coordinative structures’’ proposal (Shockley et al., 2009), which focuses on convergent behaviors such as gaze patterns, body sway, and postural coordination, the authors argue that at least for these behavioral adjustments, executive control (and presumably mentalizing) does not play a role (p. 315) since these behaviors happen too rapidly, and since postural mimicry and sway are largely unconscious. 
The question remains, then, whether linguistic and communicative behaviors can also be aligned at multiple levels of linguistic processing without involving executive control and without achieving aligned mental representations.
3.3. The Collaborative View and the Grounding Model
Like the interactive alignment model, the grounding model views dialog as fundamentally different from monologue, but for different reasons (see Clark & Brennan, 1991 for discussion; see Cahn & Brennan, 1999; Clark & Schaefer, 1989 for formal models of grounding). According to this view, spoken communication is conducted not only as a kind of joint activity, but as a collaboration (Clark, 1992; Clark & Wilkes-Gibbs, 1986). On this view, words do not ‘‘contain’’ meanings, there are no ‘‘default’’ contexts, and entrainment and understanding are not automatic byproducts of priming. Rather, communicative signals are intended to be recognized as such by communicating partners. Meanings are coordinated
through grounding, the interactive process by which people in dialog seek and provide evidence that they understand one another (Brennan, 1990, 2004). Evidence used for grounding can be explicit, such as a backchannel response (uhuh) or clarification question, or it can be implicit, such as displaying continuing attentiveness via eye contact or continuing with a next relevant utterance. Interlocutors spontaneously provide evidence of what they themselves understand; they also monitor one another for such evidence, and when it is not forthcoming (or else not what they expect), they seek it out. Depending on their purposes and the task at hand, they set higher or lower grounding criteria for the form, strength, and amount of evidence they seek or provide at any particular point (Brennan, 1990, 2004; Clark & Brennan, 1991; Clark & Schaefer, 1989; Clark & Wilkes-Gibbs, 1986; Wilkes-Gibbs, 1986). According to Clark and Schaefer’s (1989) grounding model, Partner A cannot know whether her utterance (‘‘number 4 is the monk’’) constitutes a contribution to the conversation (and to the common ground she is accruing with Partner B) until there is some evidence, verbal or nonverbal, about how (or whether) Partner B has heard and understood it (‘‘ok’’). On this model, each contribution to a conversation has a presentation phase (an utterance) and an acceptance phase (the evidence that comes after it). A speaker evaluates her addressee’s response against the response she expected; she can then refashion her utterance and re-present it, or even revise her original intention so that it now converges with the one her addressee seems to be recognizing or proposing. Elsewhere we have conceptualized grounding as a process of joint hypothesis testing (Brennan, 1990, 2004), by which an addressee also forms incremental interpretations or meaning hypotheses as an utterance unfolds (Krauss, 1987) and then tests and revises them as more evidence accrues.
From the speaker’s perspective, the unfolding utterance embodies her hypothesis about what she believes might induce her addressee to recognize and take up her intention at a particular moment. Experimental studies of grounding often observe pairs of interlocutors doing a joint task, such as matching duplicate objects (as with the three trials in our previous example in which Partners A and B became increasingly efficient while discussing tangram figures). What began as a provisional, complex, and possibly incoherent proposal for a suitable perspective on an object (Trial 1 in our previous example) was ratified during the grounding process; both partners converged on an efficient and streamlined label for a perspective built on their common ground (Trials 2 and 3). Both took responsibility for making sure communication succeeds, not just Partner A (the one who knew the target configuration):
A: it’s almost like a person kind of in a weird way
B: yeah like like a monk praying or something
According to the assumptions of the message model, which assumes that communication is about one person who has information transmitting it to another who does not have it, this should not happen. According to the collaborative view, this is not unusual.
Sometimes it is not clear whether partner-adapted processing is due to cues produced during the grounding process, or to the explicit representation of a partner’s perspective. An early study that documented partner-adapted referring during referential communication (Brennan & Clark, 1996) had pairs of naïve speakers establish referential precedents during spontaneous conversation (e.g., using the high heel, to distinguish one shoe from several); after that, speakers either continued to interact with the same partner or else were paired with a new one to match the same objects. When continuing with the same partners, speakers continued to use the same terms they had entrained upon even when this was over-informative (e.g., when there was only one shoe in the set). But they tended to switch to the unadorned basic level term (e.g., shoe) when interacting with a brand new partner who had not matched the objects before. This partner-specific effect may have been shaped by speakers mentalizing about what their partners knew, by cues that partners presented about their knowledge or needs during the dialog, or by both of these factors in combination. These two sources of information may be independent, or they may interact.
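The two-phase structure of a contribution in the grounding model described in this section (a presentation phase followed by an acceptance phase, evaluated against a grounding criterion) can be sketched as a small data structure. This is an illustrative simplification under stated assumptions, not Clark and Schaefer's (1989) formal model: the class name, the function name, and the boolean flag standing in for the grounding criterion are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Contribution:
    """One contribution: a presentation plus the evidence that follows it."""
    presentation: str
    evidence: List[str] = field(default_factory=list)
    grounded: bool = False

    def accept(self, evidence: str, meets_criterion: bool) -> None:
        # Acceptance phase: record the partner's evidence; the contribution
        # counts as grounded only once the evidence meets the current
        # grounding criterion (reduced here to a hypothetical boolean).
        self.evidence.append(evidence)
        if meets_criterion:
            self.grounded = True

def ground(presentation: str, responses: List[Tuple[str, bool]]) -> Contribution:
    """Run the acceptance phase over a sequence of partner responses."""
    contribution = Contribution(presentation)
    for evidence, meets_criterion in responses:
        contribution.accept(evidence, meets_criterion)
        if contribution.grounded:
            break
    return contribution

# The chapter's example: Partner A's utterance is ratified by Partner B's "ok".
done = ground("number 4 is the monk", [("ok", True)])
# A presentation followed only by insufficient evidence remains ungrounded.
pending = ground("it's almost like a person", [("hm?", False)])
print(done.grounded, pending.grounded)
```

The sketch leaves out what the text emphasizes most: either partner may refashion the presentation or propose a new perspective during the acceptance phase, so responsibility for the outcome is shared.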
4. The Role of Cues in Grounding
Experimental work within the grounding framework has focused on coordination by examining the role of nonlinguistic and nonverbal cues, including elements that other traditions have considered mere noise: either a product not worth studying or one too difficult to study systematically. These elements include paralinguistic cues (both verbal and nonverbal) such as acknowledgments or eye contact (Schober & Clark, 1989). Paralinguistic cues may be used in a variety of ways, such as to display an addressee’s continued attention to (or confusion about, or alignment with) an utterance, to signal a speaker’s degree of commitment toward what she is saying, to invite an addressee to participate in completing an utterance, to capture the addressee’s attention, to display a speaker’s awareness of a speech disfluency or other problem in speaking, or to initiate or invite a repair (e.g., Brennan & Williams, 1995; Clark & Fox Tree, 2002; Goodwin, 1981). Additional evidence of a partner’s understanding comes from incremental progress in whatever joint task interlocutors are doing (Brennan, 1990). During the process of grounding, interlocutors produce and monitor paralinguistic cues and monitor one another’s instrumental behavior in order to seek and provide evidence that they understand one another.
We propose that the use of such cues in grounding facilitates the kind of intrapersonal ‘‘mind reading’’ needed for interlocutors to conclude that they are both talking about the same thing. These paralinguistic signals (track 2 or secondary signals; Clark, 1994, 1996) provide information about the ongoing utterance itself (as distinct from track 1 signals, which encode the ‘‘official business’’ of the utterance; Clark, 1994, 1996). The interactive alignment model (Pickering & Garrod, 2004), along with its cousins (Barr & Keysar, 2002; Dell & Brown, 1991; Horton & Keysar, 1996; Keysar, Barr, & Horton, 1998), ignores any early or automatic role that such cues may play in shaping language processing in dialog (largely ruling out the kind of flexible collaboration that such signals could help achieve, and instead focusing on what is achieved by automatic, ‘‘dumb’’ priming). Most versions of the message model allow a role for backchannel cues limited to regulating the rate of information flow rather than modeling how the evidence provided by a partner may collaboratively shape the incremental products of dialog. Of the models we have reviewed here, only the grounding model assigns a major role to such cues. Are such cues really communicative? An essential aspect of communication is the ability of one person to recognize another’s intention to communicate. This, according to Grice (1957), is what differentiates natural information (e.g., smoke is a symptom caused by fire) from non-natural (e.g., a smoke signal may be recognizable as an intentional communicative act). What starts out developmentally as a natural cue, such as a cry of pure distress produced by a baby who is hungry, develops into an intentional display intended to be communicative, as when a child cries to get her parents’ attention. 
Although savvy parents can tell the difference, sometimes the distinction between natural and non-natural cues is ambiguous (see Harding, 1982 for more on relevant cues in development). A cue may serve both communicative and instrumental purposes; it is not always easy to differentiate communicative from noncommunicative behavior. Consider the production of um and uh, short elements sometimes known as ‘‘fillers.’’ Clark and Fox Tree (2002) have argued that such signals are communicative, that they can facilitate processing, and in fact, that um contrasts with uh in much the same way that lexical items do. However, facilitation may be due to the time that elapses while the filler is produced rather than to its phonetic form (Brennan & Schober, 2001). Moreover, a cue can facilitate processing for an addressee without being communicative. Consider three criteria that must be met for a cue to be ‘‘communicative’’ (proposed by Brennan & Williams, 1995):
Criterion 1. The cue must be potentially informative; that is, it must encode information.
Criterion 2. The addressee must be able to process the cue and recover the information.
Criterion 3. Finally, the cue must be able to be modified by the speaker’s intentions. This does not require that the speaker be consciously aware of planning or modifying the cue per se, but only that the cue be shaped by the speaker’s intentions toward the addressee or what they are doing together.
We acknowledge that some paralinguistic cues may be produced communicatively while others may not be; nevertheless, even the cues that do not meet Criterion 3 can still serve a coordinating function, helping partners in conversation seek and provide evidence about what each other intends and understands. Consider the phenomenon of ‘‘Feeling of Knowing’’ (Hart, 1965), the metalinguistic ability to assess one’s own knowledge. Speakers can display their confidence (or lack thereof) when they answer a question, via the latency to their answer, the use of rising intonation, a filler such as uh or um, and self-speech (Smith & Clark, 1993). Speakers who display uncertainty while recalling an answer or certainty when saying ‘‘I don’t know’’ are likely to fail to recognize the answer later on a multiple choice test. This satisfies Criterion 1; the paralinguistic cue displays reliable information about what the speaker really knows. It turns out that these cues are also interpretable by addressees (as a ‘‘Feeling of Another’s Knowing,’’ Brennan & Williams, 1995; Swerts & Krahmer, 2005), satisfying Criterion 2 and potentially aiding coordination. However, such cues may simply emerge from the speakers’ own ease or difficulty in recalling, planning, and articulating an answer; whether they are actually communicative or not depends on whether speakers modify the cues based on their intentions toward their addressees.
One way to test for Criterion 3 is to have speakers answer questions that are either sincere (the speaker knows that the partner who asked the question does not know the answer) or rhetorical (the speaker knows that the partner knows the answer, similar to a student answering a question posed by a teacher; Brennan & Kipp, 1996; Brennan, Kuhlen, & Ratra, 2010). So far we have focused our discussion of cues on their potential as interpersonal signals in the process of grounding, as revealed in dialog transcripts. In the next section, we consider evidence for partner-specific impacts as revealed by the time course of eye gaze and other behaviors synchronized with linguistic evidence.
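The three criteria discussed in this section can be read as a simple decision rule: a cue that meets all three counts as communicative, whereas a cue that meets only the first two may still serve a coordinating function. The sketch below encodes that reading. The class, field, and label names are assumptions made for illustration, and the example cue values are hypothetical rather than reported findings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cue:
    name: str
    informative: bool          # Criterion 1: the cue encodes information
    recoverable: bool          # Criterion 2: the addressee can recover it
    shaped_by_intention: bool  # Criterion 3: modifiable by speaker's intentions

def classify(cue: Cue) -> str:
    """Apply the three criteria in order and label the cue."""
    if cue.informative and cue.recoverable:
        if cue.shaped_by_intention:
            return "communicative"
        # Meets Criteria 1 and 2 only: can still help partners coordinate.
        return "coordinating only"
    return "neither"

# Hypothetical example values, for illustration only.
delay = Cue("answer latency", informative=True, recoverable=True,
            shaped_by_intention=False)
smoke = Cue("smoke signal", informative=True, recoverable=True,
            shaped_by_intention=True)
print(classify(delay), "|", classify(smoke))
```

Whether a given cue (for example, a filler such as um) actually meets Criterion 3 is an empirical question, so the boolean for that criterion stands in for the outcome of an experiment like the sincere versus rhetorical question manipulation described above.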
5. Partner-Specific Processing
It is clear from the evidence in a dialog’s transcript that speakers tailor their utterances to what they know about addressees, and that addressees tailor their interpretations to what they know about speakers. What is not so clear is how and when they do this. The models of interactive
communication described in Section 3 make very different predictions about partner-adapted processing. Recall that according to the message model, processing language in dialog is not so different from processing in monologue; interlocutors take discrete turns, with one listening while the other is speaking and vice versa. Partner-adapted processing is not an issue because words map simply onto meanings; rules of encoding and decoding guarantee successful communication, as long as the transmission channel is not noisy or otherwise defective. The recognition of communicative intention is beside the point. According to the interactive alignment model, processing in dialog is distinctly different from processing in monologue, with an individual’s production and comprehension systems both active at the same time during dialog, so that processing is assisted by an assumed parity between representations for speaking and representations for interpretation. One interlocutor’s behavior primes another’s, such that convergence of their mental representations is largely automatic. Like the two-stage interactive alignment model, the monitoring and adjustment model predicts that processing, at least initially, is automatic and inflexible; people with different perspectives or knowledge default to processing in a way that is not adapted to a partner, and they take account of ‘‘full common ground’’ only later (if ever), as a kind of slow inference or repair. Grounding, on the other hand, assigns an essential role to recognizing and signaling communicative intent; dialog can be viewed as a highly coordinated hypothesis-testing activity that individuals engage in together, where one partner’s presentation (their hypothesis of what their partner will understand) plays a dual role by providing the other person with evidence of how the previous utterance has been understood. Products such as utterances and perspectives are jointly constructed. 
This sort of model supposes that partner-specific processing is flexible and ‘‘smart,’’ as well as highly incremental. In Section 5, we consider experimental evidence about the products and timing of partner-adapted processing in dialog. We discuss some of our own and others’ behavioral and eye-tracking data that are relevant to the agenda of uncovering a cognitive architecture that could support such effects.
5.1. Global and Local Adaptations
It is useful to categorize partner-specific information into two sources: (1) information from a more or less global model of a partner or their characteristics, mentally represented from prior personal experience, from expectations, or else from a stereotype, and (2) feedback that becomes available locally online, from cues that emerge as the dialog unfolds. The first source of information involves some degree of mentalizing about the partner and their intentions. It is available in some form at the start of
the dialog (whether in detailed or else quite rudimentary form), and it may or may not be updated as the dialog unfolds. The second source consists of evidence emerging during the interaction about the context or the partner’s needs, perceived from verbal and nonverbal cues. Whether a particular kind of cue evokes mentalizing, and when such mentalizing might occur, depends on the attributions made to the cue (as we will see presently). Presumably if a cue satisfies all the criteria to be considered as communicative (including being able to be mediated by intention, as outlined in Section 4), mentalizing is involved; if the cue satisfies only the first two (is informative and can be perceived), then it may support interpersonal coordination but not involve mentalizing. Both global and local sources of partner-specific information have the potential to guide production of utterances. In one study (Brennan, 1991), students were led to believe they were interacting via text with either a remotely located student or else a computer that could interpret natural language; the task was to retrieve information to fill in the missing cells of a spreadsheet database about hypothetical students and their characteristics. The answers were provided by a confederate (blind to whether she was assumed to be human or computer), were entirely rule-based, and in a given dialog, took the form of either short elliptical and telegraphic turns, or else complete sentences that reused syntax and word choice from the students’ original questions. Those who believed they were communicating with a natural language interface began the dialogs by typing telegraphic utterances, whereas those who believed they were communicating with a remotely located person began with longer, grammatical sentences. 
But this global force for audience design was trumped midway through the session by the remote partner’s online feedback; by the end of the sessions, students’ questions converged in form with their partners’ answers (to either short utterances or complete sentences), regardless of whether the partner was believed to be human or computer. Although this pattern of adaptation was true for some kinds of measures (e.g., lexical choice and syntactic form), it was not true for all measures. For instance, students used third-person pronouns relevant to the task equally often in all conditions (e.g., Where does he work?), showing that they expected their (human or computer) partner to model connectedness of utterances within the dialog context, but they rarely used first- or second-person pronouns with computer partners compared to with humans (e.g., Can you tell me whether. . .?), suggesting that they did not expect to have social context with computers. Often, local cues (e.g., feedback about the informational needs of a conversational partner) corroborate the information available through global cues (e.g., about a partner’s identity). This can make it challenging to tease apart effects of these two potentially independent factors, and most studies do not attempt to do so. In a recent study (Kuhlen & Brennan, 2010), we teased apart expectations about a partner from cues. Speakers
learned jokes in the form of brief stories and told them to addressees who also were naïve subjects. The instructions led speakers to expect either attentive addressees (who would have to retell the jokes later), or distracted addressees (working on a secondary task while listening to the jokes). As expected, attentive addressees gave more feedback than distracted addressees. Thus, while (globally) expecting attentive or distracted addressees, some speakers encountered behavior contrary to their expectation (based on local cues in the form of addressee feedback). We found that the tellings of the jokes were shaped both by speakers’ expectations and by addressees’ cues. Speakers with attentive addressees told the jokes with more vivid detail than those with distracted addressees, but only when they expected attentive addressees. Speakers with distracted addressees put less time into the task than did those with attentive addressees, but only when they had expected the distracted addressees to be attentive (when the initial expectation did not match the unfolding evidence). These results suggest that feedback cues are interpreted against prior expectations or attributions about a partner. A similar pattern of partner-specific adaptations was found in speakers’ speech-accompanying gestures (Kuhlen, Galati, & Brennan, 2010). Independent of adjustments made in speaking, speakers gestured more frequently when their expectations were consistent with addressees’ feedback, supporting the idea that speakers put more effort into narrating when their global expectations of addressees’ needs are matched by local cues provided by addressees in the interaction. Moreover, speakers used more gestures that were produced in the body’s periphery when narrating to attentive addressees whom they had also expected to be attentive, supporting the idea that consistency between local and global cues is associated with more vivid narration.
These results suggest that global information established prior to the interaction is updated by local cues provided within the interaction in a highly interactive manner, resulting in a cascade of adjustments in speakers’ narrating style that affects both speech and gesture. A clear example of cues intended by one partner to be recognized by the other as communicative (and recognized by the other partner as such) comes from Brennan’s (1990) study (reported in Brennan, 2004). Pairs of subjects in adjoining cubicles discussed target locations on identical maps displayed on networked computer screens. The task was for the matcher to get his car icon parked in the same target location displayed on the director’s screen. In one condition, the director could visually monitor the progress of the matcher’s car; in the other, she could not. In both conditions, they could talk freely; in both, the matcher saw only his car icon displayed over the map. Over 80 trials with different targets, whether the director could see the matcher’s movements toggled every 10 trials (and the matcher was informed of this switch at the start of each block of 10 trials). So the director had local cues of what the matcher understood, updated moment by moment, while
the matcher had only global information (that he needed to keep in memory) about what his partner could see. When they could not visually monitor the matchers’ progress in the task, directors proposed descriptions in installments, and matchers responded verbally to clarify, modify, and eventually, ratify descriptions of the target location. Meaning was established incrementally and opportunistically, with both partners sharing the responsibility for doing so (as with the earlier dialog about the tangram that looked like a monk). The matcher’s icon typically arrived at the correct target location early in the trial, but the partners still needed additional verbal turns during which they grounded their meaning. It was up to the matcher to propose when he thought he understood well enough for current purposes and go on to the next trial. In contrast, when the director could monitor the matcher’s icon’s movements, she took the responsibility for determining when the matcher indeed understood the target location, and since this was based on direct visual evidence, she took responsibility for proposing when to go on to the next trial, sometimes suspending speaking midword as soon as the matcher reached the target, as here (note: asterisks denote overlapping speech):
Director: ok now we’re gonna go over to M-Memorial Church? and park right in Memor- right there that’s *good.*
Matcher: *that’s* rude to park in the church.
Director: hheh heh
Grounding with visual evidence was much more efficient, although partners adjusted their effort so that performance was equally accurate with and without visual evidence. What is particularly striking is that even though matchers’ screens appeared the same to them regardless of what condition they were in (there were no cues to remind them of what directors could see), they easily adapted to what they knew about their unseen partners’ perceptual context by providing or withholding backchannels; when they knew the directors could see their cars, they used their icon moves not only as instrumental acts for doing the task, but also as communicative acts (Brennan, 2004). Each time the visual evidence condition toggled, matchers adapted to this global partner-specific information immediately (almost always without discussion). Directors packaged location descriptions into installments and grounded these with the online local cues provided by matchers’ icon movements. So in this study, directors used
local cues provided moment by moment by their partners; these were verbal when they could not see their partners’ moves, and visual when they could. At the same time, matchers, who were aware when their moves could or could not be seen, used that simple bit of information to guide whether to produce backchannels or not.
5.2. Speakers Adapt Utterances for Their Addressees
Interlocutors often share considerable context beyond being speakers of the same language, including that due to previously established common ground or to being copresent in the same perceptual environment. Therefore, what might appear to be a case of a speaker tailoring an utterance to an addressee’s needs or knowledge may occur simply because that is what is easiest for the speaker to do. For example, within a discourse the first articulation of a word (when it represents new and sometimes unpredictable information) tends to be longer in duration and more intelligible than repeated mentions of the same word (or other uses in which it is more predictable) (Bard et al., 2000; Fowler & Housum, 1987; Lieberman, 1963; McAllister, Potts, Mason, & Marchant, 1994; Samuel & Troicki, 1998). Listeners can pick up on attenuation as a marker of information status, such that when they hear an initially ambiguous word that is destressed, they assume that it refers to the given item in an array (that also includes a new distractor with the same phonological onset); but when the word is stressed, they assume that it refers to the new item (Dahan, Tanenhaus, & Chambers, 2002). The question is whether variations due to attenuation are communicative, for the benefit of the addressee (as assumed by Nooteboom, 1991; Samuel & Troicki, 1998), or whether this is a generic sort of variation produced automatically by speakers (Dell & Brown, 1991) that would likely occur without any addressee present. To establish that a variation in speaking is not egocentric but is produced truly as a form of audience design, “for” a partner, the perspectives of the speaker and the addressee must be distinguishable (for discussion, see Keysar, 1997; Lockridge & Brennan, 2002).
Moreover, the speaker must be aware of her addressee’s distinct perspective or needs in time to incorporate this information into speaking; if relevant information about the addressee’s distinct perspective is not available in time, then a failure to incorporate it does not constitute a fair test of whether the early stages of speaking are egocentric (Horton & Gerrig, 2005a, 2005b; Kraljic & Brennan, 2005). When telling stories, speakers leave out some details and include others; for example, they are more likely to mention atypical instruments and omit typical ones (which are implicitly associated with a particular verb or situation). A study by Brown and Dell (1987) tested whether this typicality effect is egocentric, or else driven by the needs of particular addressees. Eighty speakers read silently and then recounted aloud to a confederate addressee
very short stories in which an instrument (either typical or atypical in association with a main verb) played a key role; the confederate either had or did not have a picture illustrating the main action and instrument (and the speaker subject knew what the addressee could see). Whether the addressee could see the instrument or not had no effect on whether and how speakers mentioned it; Brown and Dell concluded that the typicality effect was not an adjustment to the addressee’s needs, but simply automatic for the speakers. However, their addressees (both of them) heard the same stories over and over, so actually knew them better than the speakers did; it is possible that the cues they provided signaled this. A subsequent study by Lockridge and Brennan (2002) had speakers tell similar stories, but to naïve addressees who had never heard the stories before, and who saw or did not see the pictures. Speakers were more likely to mention atypical instruments, to mention them early (within the same clause as the action verb), and to mark them as indefinite, when speaking to addressees without pictures than to addressees with pictures. This suggests that when addressees have real needs (and presumably signal them somehow), speakers take this into account in the syntactic choices they make early in an utterance (Lockridge & Brennan, 2002). In another study, we examined the extent to which speakers attenuated elements of a longer story “for” themselves or “for” their addressees (Galati & Brennan, 2010a). Twenty naïve speakers spontaneously told and retold the same Road Runner cartoon story twice to one naïve addressee and once to another (counterbalanced for order: Addressee1/Addressee1/Addressee2 or Addressee1/Addressee2/Addressee1). This design enabled us to tease apart tellings of the story that were new versus old to speakers from those that were new versus old to addressees.
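The logic of this counterbalanced design can be made concrete with a small illustrative sketch. It is not taken from the study materials; the function name `label_tellings` and the labels `A1` and `A2` are hypothetical. The sketch marks each of the three tellings as new or old to the speaker and, separately, as new or old to the current addressee.

```python
def label_tellings(order):
    """Label each telling in one speaker's session.

    `order` lists the addressee for each successive telling,
    e.g. ["A1", "A1", "A2"]. Names are hypothetical placeholders.
    """
    already_heard = set()  # addressees who have heard the story so far
    rows = []
    for position, addressee in enumerate(order, start=1):
        rows.append({
            "telling": position,
            "addressee": addressee,
            # The story is new to the speaker only on the first telling.
            "new_to_speaker": position == 1,
            # It is new to the addressee if they have not heard it yet.
            "new_to_addressee": addressee not in already_heard,
        })
        already_heard.add(addressee)
    return rows


for order in (["A1", "A1", "A2"], ["A1", "A2", "A1"]):
    for row in label_tellings(order):
        print(order, row)
```

In both orders the second and third tellings are old to the speaker, but one of them is new to its addressee and the other is old. Comparing those two retellings is what separates for-the-speaker from for-the-addressee attenuation.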
We found that attenuation was mainly due to whether the material was new or old to the addressee rather than to the speaker; stories retold to the same (old) addressee were attenuated compared to those retold to the new addressee. This was true for a variety of linguistic units, including number of words, amount of detail, and number of events realized in the stories. Although lexically identical expressions by the same speaker were no different in length when addressed to a new versus an old addressee, expressions that had been addressed to new partners were more intelligible to a later group of listeners than when they had been addressed to addressees who had heard them before. This study provides strong evidence that attenuation is driven at least in part by the needs of addressees (in fact, it found little if any evidence for speaker-driven attenuation). The findings contrast sharply with those of Bard et al. (2000), who found that attenuation in articulation of repeated expressions depended on speakers’ experience rather than addressees’ (although it should be noted that their study did not tease apart speakers’ from addressees’ perspectives; all addressees were hearing the expressions for the first time). We found a similar pattern of partner-specific attenuation in these speakers’ gestures (Galati & Brennan, 2010b). Speakers produced fewer
representational gestures overall in retellings to old addressees than to new addressees. The gestures produced in stories retold to old addressees were also smaller and less precise than those retold to new addressees (a for-the-addressee effect), although gestures were also attenuated over time (the only comparison from this experimental corpus that showed any for-the-speaker effect). These data support the conclusion that gesture production is guided by both the needs of addressees and automatic processes by which speakers do what is easiest for themselves. Although Bard et al. (2000) (in their measures of duration and intelligibility) found no audience design effect at the grain of pronunciation of repeated words, Bard and Aylett (2000) did find audience design at the grain of referring expressions; their speakers marked expressions as definite when appropriate given the addressee’s knowledge. The authors proposed a “dual-process model” in which automatic processes are modular and cannot take partner-specific context into account while other, more flexible processes can. But given the audience design effects on articulation that we found in Galati and Brennan (2010a), the modularity claim seems hard to defend. It may be that audience design effects on articulation are either produced inconsistently or that they are difficult to detect. On the other hand, a pattern of variable findings would be consistent with a system whose architecture allowed information to be incorporated into planning in a probabilistic (constraint-based) fashion (e.g., Jurafsky, 1996; MacDonald, 1994; Tanenhaus & Trueswell, 1995). A claim of modularity based on a null finding of audience design might be convincing if no stone has been left unturned, and if the information in question is available early enough to impact planning (for discussion, see Brennan & Hanna, 2009; Kraljic & Brennan, 2005). Variability in pronunciation is influenced by multiple factors. Hwang et al.
(2007) examined the extent to which articulation may be governed by priming as well as by a conversational partner’s communicative needs, using Korean-born speakers of English as a second language (L2). Ambiguities arise when non-native speakers fail to make L2 phonetic contrasts that are absent in their native language (L1). Korean speakers lack the voicing contrast /b/–/p/ (“mob” vs. “mop”) and the vowel contrast /æ/–/ɛ/ (“pat” vs. “pet”), so that when they speak Korean-accented English, the first words in each of these pairs are likely to be neutralized to sound like the second words. In two referential communication experiments, subjects who were Korean speakers of English spontaneously produced target words (e.g., “mob”). A confederate partner either primed the target words with a rhyming word (e.g., asking “What is below hob?”) or did not prime them, and the referential contexts required pragmatically distinguishing two contrasting words (“mob” adjacent to “mop” in the array), or did not. The Korean speakers produced more English-like phonetic targets in both the priming and pragmatic conditions (vowel duration was used to
signal both contrasts). Moreover, Korean speakers were primed to make the disambiguating contrast when interacting with an English speaker but not with another Korean speaker of English. These results show that Korean speakers speaking English (L2) can be led to produce a phonetic contrast that they do not have in L1 both when they are primed to do so and when their addressees need them to do so to resolve an ambiguous expression. Sections 5.1 and 5.2 have reviewed some studies of audience design in which interlocutors with distinct perspectives incorporate their partners’ knowledge or needs rather than ignoring them or taking them into account at a late stage of processing. But we have not yet addressed the question of how perspectives (whether of self or other) are suppressed, selected, or updated moment by moment.
5.3. Addressees Adapt Utterance Interpretations to Speakers
According to the grounding framework, just as speakers design utterances for their addressees, addressees interpret utterances in the context of what they know about speakers. This means that the same words may be interpreted differently depending on who utters them. In a referential communication experiment that incorporated interaction between confederate speakers and naïve addressees, addressees’ initial looks to familiar target objects (that they had previously grounded during interaction with a speaker) were delayed by a few hundred ms when the same speaker uttered an entirely new expression for the familiar object, but not when a new speaker uttered the same new expression (Metzing & Brennan, 2003). The conclusion was that speakers and addressees ground “conceptual pacts” or shared perspectives that are not only partner-specific but also quite flexible: Addressees were quick to abandon the precedent of a familiar expression when interacting with a new speaker; their first looks to the target were not delayed when the new speaker used the new expression. This finding, that addressees experience interference or slowed processing when a conceptual pact (previously grounded with a particular speaker) is broken, has been replicated with young children, who show the effect when a speaker abandons a precedent for a new term without any apparent reason, but not when a new speaker introduces a new term (Matthews, Lieven, & Tomasello, 2008). These findings and related findings (e.g., Brown-Schmidt, 2009; Nadig & Sedivy, 2002) are incompatible with interactive alignment theory that seeks to explain convergence from priming alone (Pickering & Garrod, 2004), and in which the speaker’s identity should not matter. Addressees do not inflexibly map expressions onto referents; within a pragmatic context (Grice, 1975), the identity of the speaker can be part of what is represented.
Finally, global information (specifically about a speaker) can interact with local information (from cues that emerge during dialog or speaking). That is, listeners interpret cues against the attributions that they make about
those cues (Kuhlen & Brennan, 2010). For instance, when listeners hear a speaker’s disfluency just before a referring expression, they interpret it online as evidence that the speaker is in the process of saying something difficult (Arnold, Tanenhaus, Altmann, & Fagnano, 2004)—unless they have a stable attribution for the disfluency (the speaker has agnosia; Arnold, Hudson-Kam, & Tanenhaus, 2007).
5.4. Simple or “One-Bit” Partner Models
It may be no coincidence that experiments that show audience design early in processing involve partner-specific information that is not only clear, but also already computed and quite simple. In such experiments, what a partner needs is often captured by only two alternatives: my partner can see what I’m doing, or not (Brennan, 2004; Nadig & Sedivy, 2002); my partner can reach the object she’s talking about, or not (Hanna & Tanenhaus, 2004); my partner has a picture of what we’re discussing, or not (Lockridge & Brennan, 2002); my partner and I have spoken about this before, or not (Galati & Brennan, 2010a; Matthews et al., 2008; Metzing & Brennan, 2003); my partner is currently gazing at this object, or not (Hanna & Brennan, 2007); my partner needs to distinguish this referent from a competitor, or not (Hwang et al., 2007); my partner is a young child, as opposed to older (Shatz & Gelman, 1973); or my partner is a native speaker of English, or not (Bortfeld & Brennan, 1997). In these situations, an interlocutor may represent information in working memory about a partner’s state as a simple either/or cue that can be flexibly updated as the situation changes. The findings of audience design in these situations demonstrate that a “partner model” need not entail a detailed record of all of the knowledge one partner has about what the other is likely to know (as well as what the other does not know, as pointed out in a critique by Polichak & Gerrig, 1998). In contrast, a simple “one-bit” model that does not require complex inferences or elaborate maintenance or updating could facilitate rapid partner-adapted processing, even when two partners have distinct perspectives (Brennan & Hanna, 2009; Galati & Brennan, 2010a).
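To illustrate how little machinery a one-bit partner model requires, the following sketch represents the matcher’s situation in the map task (Brennan, 2004) as a single Boolean that is overwritten whenever the visibility condition toggles. It is a deliberately simplified illustration under stated assumptions, not a model proposed or tested in the studies reviewed here: the class and function names are hypothetical, the backchannel policy is deterministic (real behavior is presumably probabilistic), and the choice of which condition comes first is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class OneBitPartnerModel:
    """Hypothetical partner model holding a single either/or cue."""
    partner_can_see_my_moves: bool

    def update(self, partner_can_see_my_moves: bool) -> None:
        # Overwrite the one bit; no history and no inference are stored.
        self.partner_can_see_my_moves = partner_can_see_my_moves

    def give_verbal_backchannel(self) -> bool:
        # Assumed policy: respond verbally only when icon moves
        # cannot serve as visible evidence of understanding.
        return not self.partner_can_see_my_moves


def simulate_session(n_trials: int = 80, block_size: int = 10,
                     starts_visible: bool = True) -> list[bool]:
    """Return the backchannel decision for each trial.

    Visibility toggles at the start of every block, mirroring the
    80-trial, 10-trial-block structure described for the map task.
    `starts_visible` is an assumption made for illustration only.
    """
    model = OneBitPartnerModel(starts_visible)
    decisions = []
    for trial in range(n_trials):
        if trial > 0 and trial % block_size == 0:
            model.update(not model.partner_can_see_my_moves)
        decisions.append(model.give_verbal_backchannel())
    return decisions


decisions = simulate_session()
print(decisions[0], decisions[10], sum(decisions))
```

The point of the sketch is that adapting to the partner costs one assignment per block and one lookup per trial, with nothing resembling a detailed record of the partner’s knowledge.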
In the next section, we consider evidence from brain imaging studies about the neural circuits that may support partner-adapted processing, both by interpreting local cues and by maintaining simple models of interlocutors’ intentions, perspectives, or communicative needs.
6. Neural Bases of Partner-Adapted Processing
Our cognitive/behavioral research program has followed the assumption that partner-specific adaptation during communication can be explained by general principles of memory and cognitive processing, rather
than by special cognitive modules that either give priority to an egocentric perspective (Horton & Keysar, 1996; Keysar, Barr, Balin, et al., 1998; Keysar, Barr, & Horton, 1998; Keysar et al., 2000; Pickering & Garrod, 2004) or automatically restrict referential interpretation to what is in common ground (a position attributed to Clark & Carlson, 1981 by Barr & Keysar, 2002). Our studies and others that allow for spontaneous interaction between interlocutors (e.g., Brown-Schmidt, 2009; Brown-Schmidt et al., 2008; Hanna & Brennan, 2007; Kraljic & Brennan, 2005) demonstrate that partner-specific effects can emerge early in processing, and show no evidence for modular or two-stage (early egocentric, late partner-specific) processing models. We find that the evidence supports a cognitive architecture for language processing and communication that combines the available information in a parallel, constraint-based, and probabilistic fashion (Brennan & Hanna, 2009; Horton & Gerrig, 2002, 2005b; MacDonald, 1994; Metzing & Brennan, 2003; Tanenhaus & Trueswell, 1995). However, the behavioral evidence does not tell us precisely how such flexible, partner-adapted processing is achieved in the brain. Imaging studies have revealed multiple neural circuits that appear to aid and abet everyday communication. These circuits handle a wide variety of cues and functions. Cues relevant to communication include gesture, eye gaze, nonlinguistic verbal cues, contrastive stress and other prosodic cues, and disfluencies. Relevant functions that may make use of these cues include speaking, linguistic parsing, postural and motor coordination during joint action, monitoring a partner’s orientation or attention, evoking person stereotypes and other world knowledge, and last but certainly not least, mentalizing about their intentions or beliefs (Theory of Mind). 
Mapping the circuits that underlie these functions and discovering how these functions could work together requires deploying cognitive/behavioral tasks that preserve the essential aspects of communication. In this section, we discuss some recent and intriguing findings about the neural underpinnings of language and communicative processing that are potentially relevant to a more complete account of adaptive processing.
6.1. Mirroring
The idea that the production of speech relies on the same motor routines and representations as the interpretation of speech has been around for a long time (Galantucci, Fowler, & Turvey, 2006; Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). Over the past decade and a half, much evidence has accumulated that people perceive and understand the actions of others by relying on their own motor routines, using a common coding for both. Individual mirror neurons, activated both when an action is performed and when it is observed, have been identified in primates (di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992) and
are presumed to exist in humans (Iacoboni et al., 1999). The human “mirror system” comprises a network that includes areas in the premotor cortex (PMC) and parietal cortex (in particular, the anterior intraparietal sulcus, aIPS), with input from the posterior superior temporal sulcus (pSTS) (Van Overwalle & Baetens, 2009). The existence of a more or less direct perception–action link is proposed to help people detect each other’s goals by helping them simulate another’s state, as “nature’s way of getting the observer into the same ‘mental shoes’ as the target” (Gallese & Goldman, 1998, p. 497). As Sebanz and Knoblich (2008) have pointed out, the mirror system has been misunderstood by some as being an inflexible mechanism that automatically supports mimicry, and hyped by others as being the explanation for all of social cognition. Recent accounts by these authors and others (e.g., Bekkering et al., 2009) argue that the truth is somewhere in between: the mirror system is recruited for rapid processing of a wide variety of cues and provides input to many kinds of processes, including those that support language, communication, and other forms of joint action. The perception–action links that the mirror system provides do not support only mirroring; arguably, most of the actions that people do jointly involve complementary or noncongruent actions rather than imitative or congruent ones, and the mirror system is more active during the preparation of complementary than imitative actions (Newman-Norlund, van Schie, van Zuijlen, & Bekkering, 2007).
Currently, the value of the mirror system is presumed to be in facilitating the recognition of a partner’s goal and the monitoring of outcomes of actions, rather than literally reproducing specific action primitives; otherwise it would be of limited use, as much of the time people would not be well served by an imitation reflex and since the mapping of action to goal is not 1:1 (for discussion, see Bekkering et al.; Van Overwalle & Baetens, 2009). In this more flexible role—as a way to understand (but not mimic) the perspective of another by simulation—the mirror system could support partner-adapted processing by monitoring cues about a partner or about the objects in a task and rapidly updating the state of a simple or ‘‘one-bit’’ partner model. This would explain why interlocutors sometimes adapt rapidly to their partners’ needs or knowledge rather than defaulting to behavior that appears to be egocentric.
6.2. Theory of Mind
Designing an utterance or action with regard to what a partner knows, or recognizing an utterance or other action as communicative (Grice, 1957, 1975), presumably involves mentalizing, or attributing intention to another.
Mentalizing involves neural circuitry that is usually thought to include (1) the medial prefrontal cortex (mPFC),² (2) the bilateral temporoparietal junction (TPJ),³ and (3) the precuneus (BA 7)⁴ (e.g., Ciaramidaro et al., 2007; Van Overwalle & Baetens, 2009; Vogeley et al., 2001). These areas are often considered to be core parts of a ToM network, activated during tasks that require taking into account another person’s mental state. The classic ToM task (as tested on children in various stages of development) involves having a child witness an actor learning of the location of a hidden object, witness the object being rehidden in a different location unknown to the actor, and then predict where the actor will look for the object (Wimmer & Perner, 1983). The majority of imaging studies that aim to probe the ToM network in adults have been conducted in noninteractive settings in which subjects in an fMRI scanner read text stories about characters’ true or false beliefs, perspectives, intentions, or motivations, compared to texts about characters’ physical characteristics or objects (that do not require ToM to understand; Saxe & Kanwisher, 2003). Generally speaking, mentalizing has been proposed to be a fast, automatic process rather than a slow, inferential one (Kampe, Frith, & Frith, 2003; Scholl & Leslie, 1999). This is consistent with the behavioral findings reported earlier, that interlocutors adapt processing to their partners from the early moments of processing. Within the ToM mentalizing circuit, the TPJ appears to be implicated whenever there are early and automatic inferences about another’s goal, with the mPFC implicated during inferences about another’s traits that unfold more slowly (Van der Cruyssen, Van Duynslaeger, Cortoos, & Van Overwalle, 2009; Van Duynslaeger, Van Overwalle, & Verstraeten, 2007; see Van Overwalle & Baetens, 2009 for discussion).
So ToM as a network may underlie not only immediate partner-adapted processing (in the TPJ region), but also the slower, inferential adjustments to a partner that may unfold after an initially “egocentric” response (in the mPFC).
² Some studies label this Theory of Mind area as the anterior paracingulate cortex within the mPFC.
³ Although note that some studies label this as the posterior STS, which extends to the TPJ.
⁴ Some studies (Gallagher & Frith, 2003; Gallagher et al., 2002; Kampe et al., 2003) implicate the temporal poles (BA 38) in the ToM circuit.
6.2.1. Distinguishing Kinds of Intentions: Private, Social, and Communicative
Some of the variability in findings about the shape of the network hypothesized to underlie ToM may be due to lack of precision in fMRI imaging, and some may be due to limitations in the kinds of intentions depicted in the stimulus stories. A study by Ciaramidaro et al. (2007) took a nuanced look at the neural bases of ToM, by having individuals in the scanner read short comic strips that distinguished (1) the private intentions of characters from their social intentions toward other characters, and (2) within these social
intentions, communicative from noncommunicative intentions. The ToM areas mPFC and TPJ were both found to be crucial, but were activated differentially depending on the kind of intention being recognized. The right TPJ and precuneus were active in the processing of all types of prior intentions, with the anterior paracingulate cortex in the mPFC and the left TPJ active when processing social intention; in fact, in these comparisons the left TPJ was active only when processing communicative intention. The evidence from this study suggests four (rather than three) core parts for the ToM network, with distinct roles for both the left and right TPJ areas. It is possible that previous studies that failed to find a clear role for the left TPJ during mentalizing used stimuli that did not require recognizing communicative intentions; the authors suggest that the left TPJ may be a fourth ToM area activated by the recognition of intentions that are specifically intended as communicative.
6.2.2. Joint Activation During Interpersonal Interaction
While mentalizing about the intentions of characters in a story almost certainly overlaps with the mentalizing involved in thinking about an interlocutor’s knowledge or communicative needs, a reading task probably misses some of the essential aspects of interacting with a partner in dialog. For instance, most ToM stimuli texts are written about characters in the third person rather than the first or second person, and most fMRI scanner tasks do not probe contingently unfolding social interaction between partners, with a few notable exceptions. A few studies have used interactive games with real or simulated partners. In one series, neural activation was examined while pairs of partners played a “tacit communication game” (Noordzij et al., 2009) in which “senders” invented new ways of conveying their communicative intentions to “receivers” using entirely graphical means.
Senders had to figure out how to move icons so that receivers could distinguish instrumental moves from moves intended to instruct them about where they should move their own icons. The perspectives of the two partners were known (by both) to be different, with one person’s icon being inherently more ambiguous than the other (a triangle that could be oriented in three ways, a rectangle that could be oriented in two ways, or a circle for which orientation did not matter). In this task, communication was interactive, incremental, and graphical; both communicative and control trials evoked identical motor actions and graphics so that activation related to the planning and interpretation of communicative intent could be distinguished from that related to noncommunicative signals, visual motion, and hand movements. In each session, fMRI data were collected from either the sender or the receiver. Remarkably, during communicative trials senders and receivers both showed activation in one of the same ToM regions: the right pSTS, but not in the left pSTS. This right activation was modulated by the degree of ambiguity in
the communicative signal (e.g., a sender’s circle could not easily depict how a receiver should orient their own triangle), but not by visual appearance or sensorimotor complexity. In addition to the right pSTS, mentalizing about communicative intent coactivated the mPFC. That the same ToM circuitry implicated in recognizing a partner’s (the sender’s) intention is also implicated in predicting how best to signal one’s own intention to a partner (the receiver) suggests that there is a kind of functional parity between signaling one’s own and interpreting a partner’s intentionality. One puzzle in comparing this study with the previous one (Ciaramidaro et al., 2007) is that Noordzij et al. (2009) reported no differential activation whatsoever for communicative action in the left pSTS (which extends into the left TPJ, the region where Ciaramidaro et al. did find activation associated with communicative intention). Whether this apparent inconsistency is due to a task difference remains to be settled. Noordzij et al.’s interactive task differentiated first- and second-person communicative intentions from instrumental acts, whereas Ciaramidaro et al.’s reading task differentiated third-person communicative intentions from other (ToM-associated) intentions. The interactive task required participants to generate communicative intentions as well as to recognize them, whereas they needed only to recognize them in the reading task. And the interactive task used graphical communication, whereas the reading task used language. There are so few imaging studies of communicative intention that it is difficult to interpret the implications of these task differences, but one speculative possibility is that the left TPJ might link ToM activation to language processing networks in the left temporal lobe.
6.2.3. Interactions with Human Versus Computer Partners
ToM is associated with predicting the behavior of conspecifics (e.g., Ciaramidaro et al., 2007; Van Overwalle & Baetens, 2009).
But does it matter whether an interacting partner is human or computer? Several imaging studies have been conducted using tasks in which subjects interacted with computers or human partners (or else ones they believed to be human) in a prisoner’s dilemma or other payoff game. In one such investigation (Gallagher, Jack, Roepstorff, & Frith, 2002), subjects who believed they were playing a (competitive) rock–paper–scissors game with either a computer or another person showed more activation in only one of the ToM areas with human than computer partners, the anterior paracingulate cortex (mPFC). In another investigation (Rilling, Sanfey, Aronson, Nystrom, & Cohen, 2004), subjects playing interactive games and receiving feedback from supposed partners showed activation in two of the main ToM areas, the mPFC and posterior STS; these areas were activated whether subjects believed their partners were human or computer. The cues that subjects received during the sessions were identical (and automatically generated) in both partner conditions, and ToM was activated in both
kinds of sessions, but activation was higher when subjects believed they interacted with humans (Rilling et al., 2004). This difference in activation may reflect partner-adapted processing that distinguishes human from machine partners, or it may emerge simply from different levels of engagement in the task; but either way, it documents the influence of the same sort of global partner-identity variable that has emerged in behavioral studies (e.g., Brennan, 1991). Recent studies by Krach et al. (2008, 2009) have found consistent results, with activation in the mPFC and right TPJ when interactive games were played with (supposed) human or computer partners; however, in Krach et al. (2009), the first of these ToM areas was more activated when the partner was believed to be human than when it was believed to be a computer. When people played with one of four kinds of (simulated) partners (human, anthropomorphic robot, functional robot, or computer process), the more human-like the partner, the more activation there was in both of these ToM areas (Krach et al., 2008). So the difference between interacting with a human partner and a computer partner may be quantitative rather than qualitative (at least for part of the ToM network). These studies suggest to us that under some circumstances ToM processing may be flexible enough to model varieties of an intelligent partner’s “mind” that need not even be human, an idea relevant to the field of “intelligent” computer–human interaction (Don, Brennan, Laurel, & Schneiderman, 1992).
6.3. Distinguishing a Partner’s Perspective from One’s Own: The Role of Executive Control
Stimulus stories that require recognizing a single character’s intention presumably require less complex mentalizing than referential communication studies that require distinguishing two perspectives (e.g., one’s own from one’s partner’s or one’s private knowledge from common ground shared with a partner), especially when the two perspectives may in fact be inconsistent (Galati & Brennan, 2010a; Hanna et al., 2003; Metzing & Brennan, 2003; Nadig & Sedivy, 2002). Distinguishing privately held information from common ground presumably requires such mentalizing, as well as executive control to select the appropriate perspective and/or to suppress the inappropriate one. In addition, during dynamic communicative interaction, there is the challenge of keeping track of how a partner’s perspective (or else common ground) changes over time. Imaging studies show that the mentalizing network is recruited when people explicitly prevent themselves from imitating another’s behavior (Van Overwalle & Baetens, 2009), perhaps facilitating the differentiation of self from other (Brass, Derrfuss, & von Cramon, 2005; Brass, Zysset, & von Cramon, 2001; for discussion, see Van Overwalle & Baetens). A study by Vogeley et al. (2001) attempted to distinguish egocentric (SELF) processing from ToM by comparing activation associated with stories about the
intentions of another person to stories about the reader’s own perspective. Consistent with other studies, Vogeley et al. found ToM to implicate the mPFC.⁵ But reasoning about one’s own perspective led to additional activation in the right inferior temporoparietal cortex that did not appear to be associated with ToM (Vogeley et al., 2001). These authors conclude that the right TPJ “is involved in computing an egocentric reference frame” (p. 179), and that ToM and SELF interact in the right prefrontal cortex, an area that has been associated with executive control processes. To the extent that taking another’s perspective requires inhibiting one’s own, executive control seems to play a role by inhibiting responses that are either overlearned or imitative (Brass et al., 2005). Concerning imitation, there is some evidence that what has been proposed by some to be a largely automatic tendency to imitate (governed by the mirror system; see, e.g., Pickering & Garrod, 2004) is routinely mediated by executive control, so that people can avoid imitating others when such behavior might be costly or inappropriate. Imitative finger gestures are actually initiated more quickly when working memory load is increased (with a two-back task) than without such load (Van Leeuwen, van Baaren, Martin, Dijksterhuis, & Bekkering, 2009), suggesting that executive control is the rule (for restraining this sort of imitation from the start) rather than the exception (for adjusting this behavior later in planning). More evidence for the importance of executive control in suppressing egocentric behavior comes from Brown-Schmidt’s (2009) visual worlds eye-tracking study of communication. To test the role of executive control, individual differences were first measured using a Stroop task.
Then subjects interacted with a confederate partner to do a referential communication task that included both shared and privileged information; subjects had to differentiate what they knew from what the partner knew. Interaction was mostly unscripted, with the confederate partner asking the subject for information using expressions that were temporarily ambiguous between an object they could both see and one that only the subject could see. Some of the time, immediately after the partner asked for information, their display would disappear so that the task would be interrupted before the subject could respond (thus interrupting the grounding process), and 2 s later, the display would reappear and the task would resume. This innovative manipulation aimed to test whether subjects closely monitored the grounding process in order to keep track of what their partners were actually likely to know. The findings were clear: Subjects who were better at suppressing Stroop interference were better able to restrict themselves to considering shared (rather than private) information in the early moments of responding to their partner’s temporarily ambiguous questions. And they were better able to keep track of which expressions had been verbally grounded (and could therefore be assumed to be in common ground) as opposed to which had been uttered but interrupted before being grounded (these were treated as referring to information that was still private). This is a remarkable demonstration of not only the role of executive control in perspective taking, but also the ability of interlocutors to keep detailed track of the mutual knowledge product resulting from the grounding process. If these kinds of interactive tasks could be probed with imaging, the workings of the ToM network might be further clarified. It may be possible to use imaging to delineate a role for the mentalizing system in influencing executive control over other neural circuits (including those associated with the mirror system). Such findings would be consistent with the choice and timing evidence from our and Brown-Schmidt’s experiments and could provide a mechanism by which partner-adapted information that has already been perceived or computed could have an early impact.
⁵ Vogeley et al. (2001) also found ToM activation in the left temporopolar cortex.
6.4. Mentalizing Versus Mirroring
Recall that the goal of this review is to better understand how speakers and addressees take one another into account during processing. The behavioral evidence of adaptive processing that we wish to explain emerges from not only cues that unfold during interaction (locally driven) but also simple models of a partner (globally driven; see Section 5.1). This distinction can be mapped onto its neural counterpart, the mirror system (driven by sensorimotor resonance by which one partner simulates another’s perspective) versus the mentalizing system (which involves more conceptual perspective taking). How might the mentalizing and mirroring systems work together to support flexible partner-adapted processing? The answer is not clear. In a comprehensive meta-analysis of over 200 fMRI studies, Van Overwalle & Baetens (2009) considered three possibilities: (1) that mentalizing and mirroring might show anatomical overlap and share a functional core, (2) that they might not overlap but both be active during the same sorts of tasks, or (3) that they might be activated independently. They found the mirroring and mentalizing systems to be “rarely concurrently active” (p. 564), and so concluded that they are complementary, with neither subserving the other. This conclusion does not seem like the end of the story, however. These authors acknowledge “the lack of clear anatomical definitions for the pSTS and the TPJ” and warn that the overlap in their patterns of activation “cautions against making any strong distinction between them” (p. 568). Recall that the TPJ is implicated in rapid mentalizing. However, the seeds of an answer may exist in Noordzij et al.’s (2009) study, which aimed to distinguish mentalizing from mirror networks. Here, the right pSTS was activated not only in recognizing communicative actions, but also in planning actions intended to be recognized as
communicative (see Section 6.2.2). The right pSTS, traditionally associated with the mirror system, appeared to participate in a ToM pattern of activation that included mPFC activation, as well as coinciding with the deactivation of the mirror system’s sensorimotor areas (which were most deactivated during planning communicative action). Unfortunately this study is too new to have been covered in the meta-analysis; however, it causes us to question, for two reasons, Van Overwalle & Baetens’ conclusion that mirroring and mentalizing are independent. First, it may have been premature to conclude that pSTS activation is indicative only of mirroring and not of mentalizing (especially given Van Overwalle & Baetens’ own caveat), and so the two systems may share a functional core after all. Second, it is probable that few if any studies in the meta-analysis involved interactive communication between partners (the analysis did not include the other studies deploying interactive tasks that we have surveyed here: Krach et al., 2009; Rilling et al., 2004). So it may be that deploying measures that preserve essential aspects of communicative interaction (e.g., Suda et al., 2010) along with tasks that evoke recognition and planning of communicative intentions could show more clearly how these two essential networks might work together.
6.5. Cues Hypothesized to Support Partner-Adapted Processing
In this section, we consider several cues relevant to spoken communication. As we have argued from eye-tracking and other behavioral evidence (e.g., Brennan & Hanna, 2009; Metzing & Brennan, 2003), partner-adapted processing can be both rapid and flexible. Thus it makes sense to investigate not only mentalizing as a facilitator of such behaviors, but also the role of cues or local signals about a partner’s needs. Just as affordances in the environment appear to directly support behavior (Gibson, 1977; Norman, 2002), the evidence that unfolds either as feedback from a partner or progress in a joint task could shape an individual’s behavior “for the partner.” Reconsider (from Section 4) the three criteria for a cue to be “communicative”: it must be informative, it must be able to be perceived, and it should be able to be modified by the originator’s intentions (Brennan & Williams, 1995). It can be a challenge to set up behavioral studies of communication that satisfy the last criterion. The “tacit communication game” of Noordzij et al. (2009) accomplished this quite well and found a clear dissociation for moves that signal intention versus (instrumental) moves that do not, when the moves employ otherwise identical perceptual/motor actions. As the neural network(s) associated with processing communicative intentions (whether from local cues or global knowledge) become better understood, imaging may be able to illuminate communicative processing in ways that are impossible with behavioral studies alone.
6.5.1. Processing Cues That Initiate Social Interaction
A dialog begins when one partner recognizes another’s intention to communicate. Calling a partner’s name (an auditory cue) and making eye contact (a visual cue) signal the initiation of social interaction. Both of these cues activate the mPFC (in particular, the right paracingulate cortex) and the left temporal pole of an addressee (Kampe et al., 2003), suggesting that these regions are part of a multimodal circuit that supports recognizing a partner’s intention to communicate.
6.5.2. Voice Cues to Partner Identity
Because fMRI studies address anatomical localization but not event-related timing, it is particularly useful to consider electrophysiological evidence from event-related potentials (ERPs) in order to consider the time course with which partner-specific information may have an effect. New evidence from electrophysiological data demonstrates that listeners integrate the content of an utterance with stereotypic information about its speaker from the earliest moments of utterance processing (Van Berkum, van den Brink, Tesink, Kos, & Hagoort, 2008). In Van Berkum et al.’s study, listeners heard utterances (in Dutch) whose content was either congruent or incongruent with stereotypes evoked by the voices in which they were spoken, such as: statements odd for a child speaker but not for an adult (Every evening I drink some wine before I go to sleep), odd for a man but not a woman (I recently had a check-up at the gynecologist in the hospital), or odd for a speaker with a lower-class accent but not an upper-class one (In my free time I enjoy listening to piano music by Chopin). Voice-incongruent utterances evoked reliable N400 waves right from the acoustic onsets of relevant words, at the same early point in time as lexically based semantic anomalies evoke N400s when other semantic information is integrated (Van Berkum et al., 2008).
It is remarkable that this incongruity effect of utterance content and speaker stereotype was cued entirely by prerecorded voices (with each presented in a block). It is certainly possible that physical copresence with an interacting speaker in dialog could yield even stronger partner-specific effects, if ERP could be used in this kind of situation. Van Berkum and colleagues next localized this speaker-specific effect. Generally speaking, recognition of a speaker’s identity and characteristics (as evident in the voice) is associated with activation in the right anterior superior temporal sulcus (STS) or temporal pole. Presumably that area provides inputs into language processing in Broca’s Area (BA 44 and 45 in the inferior frontal gyrus, IFG). An fMRI study using the same stimuli as Van Berkum et al. (2008) found more activation in the left IFG (or Brodmann’s Areas 45/47) as well as the right IFG (BA 47) for voice-incongruent sentences than for voice-congruent sentences (although IFG was activated for both kinds of sentences; Tesink et al., 2008). This was interpreted as
reflecting effort to unify lexico-semantic information from the utterance with the world knowledge stereotype evoked by the speaker’s voice. Sentences in which voice and message were coherent led to enhanced activation in the bilateral superior temporal cortex (STC, BA 22 extending into BA 41), the right lingual gyrus (BA 18), and the right posterior cingulate cortex (PCC, BA 29). These regions were construed to form a “unification network” for combining linguistic and extralinguistic information, with STC activation proposed to be specific to the congruence between voice and message (as opposed to semantic coherence in general; Tesink et al., 2008). This study did not report any activation in the ToM network. Finally, autism is associated with (and sometimes diagnosed by) ToM deficits. In another study, Tesink and colleagues tested listeners with and without autism spectrum disorder (ASD) using the same voice-incongruent and congruent stimuli. Again, the listeners were able to detect the voice-incongruent messages, showing more activation in the right IFG (BA 47) for speaker-incongruent than congruent messages (Tesink et al., 2009). However, this activation was stronger in listeners with ASD than without; their increased right hemisphere activation in this area over that of non-ASD listeners was interpreted as evidence of compensation, or more effortful processing (perhaps due to difficulty in evoking stereotypes). In addition, non-ASD listeners showed more activity than did ASD listeners in the right ventral mPFC (BA 10) and right ACC (anterior cingulate cortex, BA 24/32) regions (Tesink et al., 2009).
7. Conclusions
Psycholinguistic studies of dialog that preserve as many of the natural aspects of spontaneous interpersonal communication as possible (while at the same time achieving sufficient control) have found evidence that speakers and addressees can adapt to each other from the early moments of processing. That is, processing need not be encapsulated from relevant partner-specific information that is straightforward and known in advance. Under some circumstances, speakers can adjust immediately to their addressees’ needs or perspectives, even when these are distinct from their own. The following, we propose, are useful design considerations for experimental studies that aim to uncover the cognitive and/or neural bases of language processing in communicative contexts, and in particular, partner-specific processing:
• To the extent that an experimental task affords behavioral, eye-tracking, or imaging evidence that can be measured independently from evidence in the stimulus events or transcript, this gives the experimenter a window into subjects’ cognitive processing.
• The “language game” that subjects are asked to play should be well characterized and staged such that it does not exclude the behavior that it aims to study. To this end, imaging studies with tasks that require subjects to communicate should yield valid data about the kind of processing that underlies language-as-action.
• Especially useful is evidence that unfolds moment by moment and can be synchronized with events or a transcript, or that can be collected from two interacting partners and synchronized.
• To experimentally distinguish “for-the-self” from “for-the-other” processing, partners doing a joint task must (at least at some point in the task) have perspectives, needs, or knowledge states that can be operationally distinguished from each other’s.
• Unless the goal is to study perspective taking under cognitive load, information about one partner’s needs must be available to the other partner in a timely enough fashion to be incorporated into speech planning, articulation, or interpretation; otherwise, one cannot conclude that behavior that seems to be egocentric is actually egocentric.
• It may be useful for an experimental design to distinguish local (sensorimotor) cues from global cues that are updated less often, or at least to take this distinction into account.
• It may be useful to characterize cues as to whether they consist of signals intended to be recognized as communicative (in the Gricean sense), or whether they are simply informative. This may determine whether they activate the mentalizing system.
When thinking about how to model partner-adapted processing, it is productive to consider fMRI and electrophysiology data alongside eye-tracking and behavioral studies of communication. We anticipate that timing data from electrophysiology studies and anatomical data from imaging studies have potential to clarify process models that would otherwise be ambiguous. Each approach can shape and inform the kinds of questions that the other can ask, as well as the kinds of cognitive models that it makes sense to propose. Ultimately, plausible cognitive models must be guided by neurological constraints. The distinction between local cues and global partner models that we have developed in our behavioral studies seems to map naturally onto the mirror system and the mentalizing network, respectively. Our findings about how local and global sources of information shape one another to achieve partner-adapted processing lead us to seek out ways in which the mirror and mentalizing systems coexist in the service of language and communicative processing. Executive control appears to play an important role in both kinds of systems: for instance, to inhibit mimicry in the mirror system when necessary, and to select, suppress, or update a global perspective, especially when more than one perspective is implicated in the context (e.g., self vs. other).
The mirror system automatically processes social cues that are sensorimotor in nature (e.g., voice, gaze, body motion, backchannels), whereas ToM underlies more conceptual modeling of a partner’s perspectives, needs, and intentions. It remains to be established whether and how these circuits interact. But given the range of processes they support and the likely importance of these processes in interpersonal communication, we expect that they do interact. Previous imaging studies (e.g., as surveyed by Van Overwalle & Baetens, 2009) have failed to clearly establish how they may work together, but this does not mean they are independent, especially since many of the tasks currently in use (especially for ToM) are based on an impoverished notion of what constitutes dialog. Most of the tasks employed so far in ToM studies have not involved interpersonal interaction (or first- or second-person communicative intent); progress could accelerate with more sophistication in the kinds of language tasks that imagers employ. Another challenge is that sometimes it is difficult to determine exactly which anatomical areas are activated in a particular study. There is much that is unknown about the potential connectivity among regions and about the time course of their activation. And it is extremely difficult to stage an experiment in a scanner that involves speaking; perhaps new experimental techniques, such as near-infrared spectroscopy (Suda et al., 2010), will make it easier to use tasks that preserve the essence of spoken (or even face-to-face) dialog. We also expect that new evidence from imaging studies will help to clarify how ToM and mirroring neural circuits work in concert with those traditionally associated with language, with profound implications for neural models of joint processing both within and between the minds of language users.
Understanding how brain networks interact may promote a more nuanced understanding of why communication failures occur, of individual differences in perspective taking, and of the neural basis of communication deficits. In closing, we suggest that to study language use based entirely on individual cognitive processes is to overlook a ubiquitous and astonishing human skill: the coordination of the behavior and mental states of interacting individuals. Interpersonal coordination is so pervasive that it is worthy of scientific investigation in its own right. This skill proceeds in parallel (and is closely integrated) with traditional psycholinguistic processing. For that reason, we advocate studying language processing along with interpersonal coordination in order to understand what it is that minds actually do when communicating.
ACKNOWLEDGMENTS
We thank Richard Gerrig, Arthur Aron, and Hoi-Chung Leung for their comments and the Gesture Focus Group for many helpful discussions. This material is based upon work supported by NSF under Grants IIS-0527585 and ITR-0325188. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
Akmajian, A., Demers, R. A., & Harnish, R. M. (1987). Linguistics: An introduction to language and communication (2nd ed.). Cambridge, MA: MIT Press. Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57, 502–518. Arnold, J. E., Hudson-Kam, C. L., & Tanenhaus, M. K. (2007). If you say thee uh—you’re describing something hard: The on-line attribution of disfluency during reference comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 914–930. Arnold, J. E., Tanenhaus, M. K., Altmann, R., & Fagnano, M. (2004). The old and thee, uh, new. Psychological Science, 15, 578–581. Bard, E. G., Anderson, A. H., Sotillo, C., Aylett, M., Doherty-Sneddon, G., & Newlands, A. (2000). Controlling the intelligibility of referring expressions in dialogue. Journal of Memory and Language, 42, 1–22. Bard, E. G., & Aylett, M. (2000). Accessibility, duration, and modeling the listener in spoken dialogue. In: Proceedings of Gotalog 2000, 4th Workshop on the Semantics and Pragmatics of Dialogue, Gotalog, Sweden: Gothenburg University. Barr, D. J., & Keysar, B. (2002). Anchoring comprehension in linguistic precedents. Journal of Memory and Language, 46, 391–418. Basden, B. H., Basden, D. R., & Henry, S. (2000). Costs and benefits of collaborative remembering. Applied Cognitive Psychology, 14, 497–507. Bekkering, H., de Bruijn, E. R. A., Cuijpers, R. H., Newman-Norlund, R., van Schie, H. T., & Meulenbroek, R. (2009). Joint action: Neurocognitive mechanisms supporting human interaction. Topics in Cognitive Science (Special Issue on Joint Action), 1, 340–352. Bortfeld, H., & Brennan, S. E. (1997). Use and acquisition of idiomatic expressions in referring by native and non-native speakers. Discourse Processes, 23, 119–147. Branigan, H. P., Pickering, M. J., & Cleland, A. A. (2000).
Syntactic coordination in dialogue. Cognition, 75, B13–B25. Brass, M., Derrfuss, J., & von Cramon, D. Y. (2005). The inhibition of imitative and overlearned responses: A functional double dissociation. Neuropsychologia, 43, 89–98. Brass, M., Zysset, S., & von Cramon, D. Y. (2001). The inhibition of imitative response tendencies. NeuroImage, 14, 1416–1423. Brennan, S. E. (1990). Speaking and providing evidence for mutual understanding. Unpublished doctoral dissertation, Stanford University, Stanford, CA. Brennan, S. E. (1991). Conversation with and through computers. User Modeling and User-Adapted Interaction, 1, 67–86. Brennan, S. E. (1995). Centering attention in discourse. Language and Cognitive Processes, 10, 137–167. Brennan, S. E. (2004). How conversation is shaped by visual and spoken evidence. In J. Trueswell & M. Tanenhaus (Eds.), Approaches to studying world-situated language use: Bridging the language-as-product and language-as-action traditions (pp. 95–129). Cambridge, MA: MIT Press. Brennan, S. E., Chen, X., Dickinson, C. A., Neider, M. B., & Zelinsky, G. J. (2007). Coordinating cognition: The costs and benefits of shared gaze during collaborative search. Cognition, 106, 1465–1477. Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 6, 1482–1493. Brennan, S. E., & Hanna, J. E. (2009). Partner-specific adaptation in dialogue. Topics in Cognitive Science (Special Issue on Joint Action), 1, 274–291.
Brennan, S. E., & Kipp, E. G. (1996). An addressee’s knowledge affects a speaker’s use of fillers in question-answering. In: Abstracts of the Psychonomic Society, 37th Annual Meeting (p. 24), Chicago, IL. Brennan, S. E., Kuhlen, A. K., & Ratra, B. (2010). Audience design in answering rhetorical versus sincere questions (in preparation). Brennan, S. E., & Ohaeri, J. O. (1999). Why do electronic conversations seem less polite? The costs and benefits of hedging. In: Proceedings, International Joint Conference on Work Activities, Coordination, and Collaboration (WACC’99) (pp. 227–235), San Francisco, CA: ACM. Brennan, S. E., & Schober, M. F. (2001). How listeners compensate for disfluencies in spontaneous speech. Journal of Memory and Language, 44, 274–296. Brennan, S. E., & Williams, M. (1995). The feeling of another’s knowing: Prosody and filled pauses as cues to listeners about the metacognitive states of speakers. Journal of Memory and Language, 34, 383–398. Brown, P. M., & Dell, G. S. (1987). Adapting production to comprehension: The explicit mention of instruments. Cognitive Psychology, 19, 441–472. Brown-Schmidt, S. (2009). The role of executive function in perspective taking during online language comprehension. Psychonomic Bulletin & Review, 16, 893–900. Brown-Schmidt, S., Gunlogson, C., & Tanenhaus, M. K. (2008). Addressees distinguish shared from private information when interpreting questions during interactive conversation. Cognition, 107, 1122–1134. Cahn, J. E., & Brennan, S. E. (1999). A psychological model of grounding and repair in dialog. In: Proceedings, AAAI Fall Symposium on Psychological Models of Communication in Collaborative Systems (pp. 25–33), North Falmouth, MA: American Association for Artificial Intelligence. Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press. Chomsky, N. (1980). Rules and representations. Behavioral and Brain Sciences, 3(1), 1–62. 
Ciaramidaro, A., Adenzato, M., Enrici, I., Erk, S., Pia, L., Bara, B. G., et al. (2007). The intentional network: How the brain reads varieties of intentions. Neuropsychologia, 45, 3105–3113. Clark, H. H. (1992). Arenas of language use. Chicago, IL: University of Chicago Press. Clark, H. H. (1994). Managing problems in speaking. Speech Communication, 15, 243–250. Clark, H. H. (1996). Using language. Cambridge MA: Cambridge University Press. Clark, H. H., & Brennan, S. E. (1991). Grounding in communication. In L. B. Resnick, J. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–149). Washington, DC: APA. Reprinted in R. M. Baecker (Ed.), Groupware and computer-supported cooperative work: Assisting human–human collaboration (pp. 222–233). San Mateo, CA: Morgan Kaufman Publishers. Clark, H. H., & Carlson, T. B. (1981). Context for comprehension. In J. Long & A. Baddeley (Eds.), Attention and performance, Vol. IX (pp. 313–330). Hillsdale, NJ: Erlbaum. Clark, H. H., & Fox Tree, J. E. (2002). Using uh and um in spontaneous speaking. Cognition, 84, 73–111. Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13, 259–294. Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1–39. Dahan, D., Tanenhaus, M. K., & Chambers, C. G. (2002). Accent and reference resolution in spoken language comprehension. Journal of Memory and Language, 47, 292–314. Dell, G. S., & Brown, P. M. (1991). Mechanisms for listener-adaptation in language production: Limiting the role of the ‘Model of the Listener’. In D. Napoli & J. Kegl (Eds.), Bridges between psychology and linguistics: A Swarthmore Festschrift for Lila Gleitman (pp. 105–129). Hillsdale, NJ: Erlbaum.
340
Susan E. Brennan et al.
di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V., & Rizzolatti, G. (1992). Understanding motor events: A neurophysiological study. Experimental Brain Research, 91, 176–180. Don, A., Brennan, S., Laurel, B., & Schneiderman, B. (1992). Anthropomorphism: From Eliza to Terminator 2. In: Proceedings, CHI’92, Human Factors in Computing Systems, Monterey, CA (pp. 67–70). DuBois, J. (1974). Syntax in mid-sentence. In C. Fillmore, G. Lakoff, & R. Lakoff (Eds.), Berkeley studies in syntax and semantics, Vol. 1 (pp. III.1–III.25). Berkeley, CA: University of California. Eden, M. (1983). Cybernetics. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 409–439). New York, NY: John Wiley & Sons. Ekeocha, J. O., & Brennan, S. E. (2008). Collaborative recall in face-to-face and electronic groups. Memory, 16, 245–261. Fowler, C. A., & Housum, J. (1987). Talkers signaling ‘new’ and ‘old’ words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language, 26, 489–504. Fussell, S. R., & Krauss, R. M. (1989). Understanding friends and strangers: The effects of audience design on message comprehension. European Journal of Social Psychology, 19, 509–525. Fussell, S. R., & Krauss, R. M. (1991). Accuracy and bias in estimates of others’ knowledge. European Journal of Social Psychology, 21, 445–454. Fussell, S. R., & Krauss, R. M. (1992). Coordination of knowledge in communication: Effects of speakers’ assumptions about what others know. Journal of Personality and Social Psychology, 62, 378–391. Galantucci, B., Fowler, C. A., & Turvey, M. T. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13(3), 361–377. Galati, A., & Brennan, S. E. (2010a). Attenuating information in spoken communication: For the speaker, or for the addressee? Journal of Memory and Language, 62, 35–51. Galati, A., & Brennan, S. E. (2010b). Audience design in the production of gesture (under review). 
Gallagher, H. L., & Frith, C. D. (2003). Functional imaging of ‘theory of mind’. Trends in Cognitive Sciences, 7, 77–83. Gallagher, H. L., Jack, A. I., Roepstorff, A., & Frith, C. D. (2002). Imaging the intentional stance in a competitive game. NeuroImage, 16, 814–821. Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mindreading. Trends in Cognitive Sciences, 2, 493–501. Gibson, J. J. (1977). The theory of affordances. In R. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing. Hillsdale, NJ: Lawrence Erlbaum Associates. Giles, H., & Powesland, P. F. (1975). Speech styles and social evaluation. London: Academic Press. Glucksberg, S., Krauss, R., & Weisberg, R. (1966). Referential communication in nursery school children: Method and some preliminary findings. Journal of Experimental Child Psychology, 3, 333–342. Goodwin, C. (1981). Conversational organization: Interaction between speakers and hearers. New York, NY: Academic Press. Grice, H. P. (1957). Meaning. Philosophical Review, 66, 377–388. Grice, H. P. (1975). Logic and conversation (from the William James lectures, Harvard University, 1967). In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). New York, NY: Academic Press. Hanna, J. E., & Brennan, S. E. (2007). Speakers’ eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of Memory and Language, 57, 596–615. Hanna, J. E., & Tanenhaus, M. K. (2004). Pragmatic effects on reference resolution in a collaborative task: Evidence from eye movements. Cognitive Science, 28, 105–115.
Two Minds, One Dialog: Coordinating Speaking and Understanding
341
Hanna, J. E., Tanenhaus, M. K., & Trueswell, J. C. (2003). The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language, 49, 43–61. Harding, C. (1982). Development of the intention to communicate. Human Development, 25, 140–151. Harris, C. B., Paterson, H. M., & Kemp, R. I. (2008). Collaborative recall and collective memory: What happens when we remember together? Memory, 16, 213–230. Hart, J. T. (1965). Memory and the feeling-of-knowing experience. Journal of Educational Psychology, 56, 208–216. Hollingshead, A. B. (1998). Retrieval processes in transactive memory systems. Journal of Personality and Social Psychology, 74, 659–671. Horton, W. S., & Gerrig, R. J. (2002). Speaker’s experiences and audience design: Knowing when and knowing how to adjust utterances to addressees. Journal of Memory and Language, 47, 589–606. Horton, W. S., & Gerrig, R. J. (2005a). Conversational common ground and memory processes in language production. Discourse Processes, 40, 1–35. Horton, W. S., & Gerrig, R. J. (2005b). The impact of memory demands on audience design during language production. Cognition, 96, 127–142. Horton, W. S., & Keysar, B. (1996). When do speakers take into account common ground? Cognition, 59, 91–117. Hwang, J., Brennan, S. E., & Huffman, M. K. (2007). How non-native speakers make phonetic adjustments to partners in dialogue. In: Abstracts of the Psychonomic Society, 48th Annual Meeting (p. 88 ), Long Beach, CA. Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of human imitation. Science, 286, 2526–2528. Jefferson, G. (1973). A case of precision timing in ordinary conversation: Overlapped tagpositioned address terms in closing sequences. Semiotica, 9, 47–96. Jurafsky, D. (1996). A probabilistic model of lexical and syntactic disambiguation. Cognitive Science, 20, 137–194. Kampe, K. K. W., Frith, C. D., & Frith, U. (2003). 
‘‘Hey John’’: Signals conveying communicative intention toward the self activate brain regions associated with ‘‘mentalizing’’, regardless of modality. Journal of Neuroscience, 23, 5258–5263. Keysar, B. (1997). Unconfounding common ground. Discourse Processes, 24, 253–270. Keysar, B., Barr, D. J., Balin, J. A., & Brauner, J. S. (2000). Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science, 11, 32–38. Keysar, B., Barr, D. J., Balin, J. A., & Paek, T. S. (1998). Definite reference and mutual knowledge: Process models of common ground in comprehension. Journal of Memory and Language, 39, 1–20. Keysar, B., Barr, D. J., & Horton, W. S. (1998). The egocentric bias of language use: Insights from a processing approach. Current Directions in Psychological Science, 7, 46–50. Kiesler, S., & Sproull, L. (1992). Group decision making and communication technology. Organizational Behavior and Human Decision Processes, 52, 96–123. Krach, S., Blu¨mel, I., Marjoram, D., Lataster, T., Krabbendam, L., Weber, J., et al. (2009). Are women better mindreaders? Sex differences in neural correlates of mentalizing detected with functional MRI. BMC Neuroscience, 10, 9. Krach, S., Hegel, F., Wrede, B., Sagerer, G., Binkofski, F., & Kircher, T. (2008). Can machines think? Interaction and perspective taking with robots investigated via fMRI. PLoS ONE, 3, e2597. Kraljic, T., & Brennan, S. E. (2005). Using prosody and optional words to disambiguate utterances: For the speaker or for the addressee? Cognitive Psychology, 50, 194–231.
342
Susan E. Brennan et al.
Krauss, R. M. (1987). The role of the listener: Addressee influences on message formulation. Journal of Language and Social Psychology, 6, 81–98. Kraut, R. E., Lewis, S. H., & Swezey, L. W. (1982). Listener responsiveness and the coordination of conversation. Journal of Personality and Social Psychology, 43, 718–731. Kronmu¨ller, E., & Barr, D. J. (2007). Perspective-free pragmatics: Broken precedents and the recovery-from-preemption hypothesis. Journal of Memory and Language, 56, 436–455. Kuhlen, A. K., & Brennan, S. E. (2008). Addressees shape speaking: When confederates may be hazardous to your data. In: Abstracts of the Psychonomic Society, 49th Annual Meeting (p. 6), Chicago, IL. Kuhlen, A. K., & Brennan, S. E. (2010). Anticipating distracted addressees: How speakers’ expectations and addressees’ feedback influence storytelling. Discourse Processes, (in press). Kuhlen, A. K., Galati, A., & Brennan, S. E. (2010). Gesturing integrates top-down and bottom-up information: Effects of speakers’ expectations and addressees’ feedback (under review). Lerner, G. H. (1996). On the ‘‘semi-permeable’’ character of grammatical units in conversation: Conditional entry into the turn space of another speaker. In E. Ochs, E. A. Schegloff, & S. Thompson (Eds.), Interaction and grammar (pp. 238–276). Cambridge, MA: Cambridge University Press. Levelt, W. J. M., & Kelter, S. (1982). Surface form and memory in question answering. Cognitive Psychology, 14, 78–106. Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967). Perception of speech code. Psychological Review, 74, 431–461. Lieberman, P. (1963). Some effects of context on the intelligibility of hearing and deaf children’s speech. Language and Speech, 24, 255–264. Lockridge, C. B., & Brennan, S. E. (2002). Addressees’ needs influence speakers’ early syntactic choices. Psychonomic Bulletin & Review, 9, 550–557. MacDonald, M. C. (1994). Probabilistic constraints and syntactic ambiguity resolution. 
Language and Cognitive Processes, 9, 157–201. MacKay, D. M. (1983). The wider scope of information theory. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 485–492). New York, NY: John Wiley & Sons. Matthews, D. E., Lieven, E. V. M., & Tomasello, M. (2008). What’s in a manner of speaking? Children’s sensitivity to partner-specific referential precedents. In: Proceedings of the LONDIAL Workshop on the Semantics and Pragmatics of Dialog, London, UK. McAllister, J., Potts, A., Mason, K., & Marchant, G. (1994). Word duration in monologue and dialogue speech. Language and Speech, 37, 393–405. Metzing, C., & Brennan, S. E. (2003). When conceptual pacts are broken: Partner-specific effects in the comprehension of referring expressions. Journal of Memory and Language, 49, 201–213. Nadig, A. S., & Sedivy, J. S. (2002). Evidence of perspective-taking constraints in children’s online reference resolution. Psychological Science, 13, 329–336. Neider, M. B., Chen, X., Dickinson, C. A., Brennan, S. E., & Zelinsky, G. J. (2005). Sharing eyegaze is better than speaking in a time-critical consensus task. In: Abstracts of the Psychonomic Society, 46th Annual Meeting (p. 72), Toronto, Canada. Newman-Norlund, R. D., van Schie, H. T., van Zuijlen, A. M. J., & Bekkering, H. (2007). The mirror neuron system is more active during complementary compared with imitative action. Nature Neuroscience, 10, 817–818. Noordzij, M. L., Newman-Norlund, S. E., de Ruiter, J. P., Hagoort, P., Levinson, S. C., & Toni, I. (2009). Brain mechanisms underlying human communication. Frontiers in Human Neuroscience, 3, 1–13. Nooteboom, S. G. (1991). Perceptual goals of speech production. In: Proceedings of the 12th International Congress of Phonetic Sciences, Aix-en-Provence, France, August 19–24 (pp. 107–110), Vol. 1.
Two Minds, One Dialog: Coordinating Speaking and Understanding
343
Norman, D. A. (2002). The design of everyday things. New York, NY: Basic Books. Perryman, G. A., & Brennan, S. E. (2009). Effects of multiple speakers (copresent or not) on dialog context. In: Abstracts of the Psychonomic Society, 50th Annual Meeting, Boston, MA . Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27, 167–226. Polichak, J. W., & Gerrig, R. J. (1998). Common ground and everyday language use: Comments on Horton and Keysar (1996). Cognition, 66, 183–189. Reddy, M. J. (1979). The conduit metaphor—A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and thought (pp. 284–297). Cambridge, UK: Cambridge University Press. Rilling, J. K., Sanfey, A. G., Aronson, J. A., Nystrom, L. E., & Cohen, J. D. (2004). The neural correlates of theory of mind within interpersonal interactions. NeuroImage, 22, 1694–1703. Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50, 696–735. Samuel, S. G., & Troicki, M. (1998). Articulation quality is inversely related to redundancy when children or adults have verbal control. Journal of Memory and Language, 39, 175–194. Saxe, R., & Kanwisher, N. (2003). People thinking about thinking people: The role of the temporo-parietal junction in ‘‘theory of mind’’ NeuroImage, 19, 1835–1842. Schober, M. F. (1998). Conversational evidence for rethinking meaning. Social Research, 65, 511–534. Schober, M. F. (2004). Just how aligned are interlocutors’ representations? Behavioral and Brain Sciences, 27, 209–210. Schober, M. F., & Clark, H. H. (1989). Understanding by addressees and overhearers. Cognitive Psychology, 21, 211–232. Scholl, B. J., & Leslie, A. M. (1999). Modularity, development and ‘theory of mind’. Mind & Language, 14, 131–153. Sebanz, N., Bekkering, H., & Knoblich, G. (2006). Joint action: Bodies and minds moving together. 
Trends in Cognitive Science, 10, 70–76. Sebanz, N., & Knoblich, G. (2008). From mirroring to joint action. In I. Wachsmuth, M. Lenzen, & G. Knoblich (Eds.), Embodied communication (pp. 129–150). Oxford: Oxford University Press. Sebanz, N., & Knoblich, G. (2009). Prediction in joint action: What, when, and where. Topics in Cognitive Science, 1, 353–367. Shannon, C., & Weaver, W. (1949). The mathematical theory of communication. Urbana, IL: University of Illinois Press. Shatz, M., & Gelman, R. (1973). The development of communication skills: Modifications in the speech of young children as a function of listener. Monographs of the Society for Research in Child Development, 38, 1–38. Shockley, K., Richardson, D., & Dale, R. (2009). Conversation and coordinative structures. Topics in Cognitive Science, 1, 305–319. Smith, V. L., & Clark, H. H. (1993). On the course of answering questions. Journal of Memory and Language, 32, 25–38. Stellmann, P., & Brennan, S. E. (1993). Flexible perspective-setting in conversation. In: Abstracts of the Psychonomic Society, 34th Annual Meeting (p. 20), Washington, DC. Suda, M., Takei, Y., Aoyama, Y., Narita, K., Sato, T., Fukuda, M., et al. (2010). Frontopolar activation during face-to-face conversation: An in situ study using near-infrared spectroscopy. Neuropsychologia, 48, 441–447. Swerts, M., & Krahmer, E. (2005). Audiovisual prosody and feeling of knowing. Journal of Memory and Language, 53, 81–94. Tanenhaus, M. K., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632–1634.
344
Susan E. Brennan et al.
Tanenhaus, M. K., & Trueswell, J. C. (1995). Sentence comprehension. In J. Miller & P. Eimas (Eds.), Handbook of perception and cognition, Vol. 11: Speech language and communication. San Diego, CA: Academic Press. Tanenhaus, M. K., & Trueswell, J. C. (2004). Eye movements as a tool for bridging the language-as-product and language-as-action traditions. In J. C. Trueswell & M. K. Tanenhaus (Eds.), Approaches to studying world-situated language use: Bridging the languageas-product and language-action traditions (pp. 3–37). Cambridge, MA: MIT Press. Tesink, C. M. J. Y., Buitelaar, J. K., Petersson, K. M., van der Gaag, R. J., Kan, C. C., Tendolkar, I., et al. (2009). Neural correlates of pragmatic language comprehension in autism spectrum disorders. Brain, 132, 1941–1952. Tesink, C. M. J. Y., Petersson, K. M., van Berkum, J. J. A., van den Brink, D., Buitelaar, J. K., & Hagoort, P. (2008). Unification of speaker and meaning in language comprehension: An fMRI study. Journal of Cognitive Neuroscience, 21, 2085–2099. Van Berkum, J. J. A., van den Brink, D., Tesink, C. M. J. Y., Kos, M., & Hagoort, P. (2008). The neural integration of speaker and message. Journal of Cognitive Neuroscience, 20, 580–591. Van der Cruyssen, L., Van Duynslaeger, M., Cortoos, A., & Van Overwalle, F. (2009). ERP time course and brain areas of spontaneous and intentional goal inferences. Social Neuroscience, 4, 165–184. Van Duynslaeger, M., Van Overwalle, F., & Verstraeten, E. (2007). Electrophysiological time course and brain areas of spontaneous and intentional trait inferences. Social Cognitive and Affective Neuroscience, 2, 174–188. Van Leeuwen, M. L., van Baaren, R. B., Martin, D., Dijksterhuis, A., & Bekkering, H. (2009). Executive functioning and imitation: Increasing working memory load facilitates behavioural imitation. Neuropsychologia, 47, 3265–3270. Van Overwalle, F., & Baetens, K. (2009). Understanding others’ actions and goals by mirror and mentalizing systems: A meta-analysis. 
NeuroImage, 48, 564–584. Vogeley, K., Bussfeld, P., Newen, A., Herrmann, S., Happe, F., Falkai, P., et al. (2001). Mind reading: Neural mechanisms of theory of mind and self perspective. NeuroImage, 14, 170–181. Weldon, M. S., & Bellinger, K. D. (1997). Collective memory: Collaborative and individual processes in remembering. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1160–1175. Wiener, N. (1965). Cybernetics—or control and communication in the animal and the machine (2nd ed.). Cambridge, MA: The MIT Press (originally published in 1948). Wiley, J., & Bailey, J. (2006). Effects of collaboration and argumentation on learning from web pages. In A. M. O’Donnell, C. E. Hmelo-Silver, & G. Erkens (Eds.), Collaborative learning, reasoning, and technology (pp. 297–321). Hillsdale, NJ: Erlbaum. Wiley, J., & Jensen, M. (2006). When three heads are better than two. In: Proceedings, CogSci 2006, 28th Annual Conference of the Cognitive Science Society (p. 3275), Vancouver, CA: Cognitive Science Society. Wilkes-Gibbs, D. (1986). Collaborative processes of language use in conversation. Unpublished doctoral dissertation, Stanford University, Stanford, CA. Wimmer, H., & Perner, J. (1983). Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition, 13, 103–128. Wright, D. B., & Klumpp, A. (2004). Collaborative inhibition is due to the product, not the process, of recalling in groups. Psychonomic Bulletin & Review, 11, 1080–1083. Yngve, V. H. (1970). On getting a word in edgewise. In: Papers from the 6th Regional Meeting of the Chicago Linguistic Society (pp. 567–578), Chicago, IL: Chicago Linguistic Institute.
CHAPTER NINE

Retrieving Personal Names, Referring Expressions, and Terms of Address

Zenzi M. Griffin

Contents
1. Introduction
2. Psychological Research on Personal Name Production
2.1. How Difficult Are Personal Names?
2.2. Why Are Personal Names So Difficult?
3. Personal Names and Reference Across Cultures
3.1. What Are Names Like Cross-Culturally?
3.2. How Are People Referred to?
4. Direct Address in Spoken Language
4.1. Forms of Direct Address
4.2. Factors Influencing Choice of Address Form
5. Conclusion
Acknowledgments
References
Abstract
Why is it more difficult to recall the names of celebrities and old acquaintances than other words that one seldom uses? Several factors related to the way information about people is structured and how words are produced conspire to make personal names particularly difficult to retrieve. In contrast, expressions such as descriptive nicknames, kinship terms, and titles appear easier to retrieve. A review of how people are named, referred to, and addressed across cultures and situations suggests that there is a broad range in the relative difficulty of producing terms and that several social variables must be considered in a full account of name retrieval.
Psychology of Learning and Motivation, Volume 53. ISSN 0079-7421, DOI: 10.1016/S0079-7421(10)53009-3. © 2010 Elsevier Inc. All rights reserved.

1. Introduction

Personal names are among the most difficult words to learn and retrieve. At the same time, they are often extremely important for social interactions. Several properties of personal names contribute to their difficulty. Rather than enter into the seemingly endless debate about the “meaning” of personal names (for novel perspectives, see, e.g., Allerton, 1987; Lévi-Strauss, 1966; Miller & Johnson-Laird, 1976), the question to be addressed here is how they are retrieved in speaking. That is, what processes and retrieval cues do speakers use to generate personal names, and how do these cues differ, if at all, from those used to retrieve other words?

There is a large existing literature on the use of names as opposed to third-person pronouns such as he¹ or she for referring to individuals in text and speech (for review, see Arnold, 2008). Much of this literature concentrates on the ability of a comprehender to recognize whom a speaker refers to and the extent to which other factors drive a speaker’s choice of referring expression (e.g., Arnold & Griffin, 2007). In contrast, no work in psycholinguistics has addressed the use of names as forms of direct address (i.e., vocative usage), in which speakers say the name of the person they are talking to, as in How’s it going, Brian? Moreover, people do not always address or refer to people by name. Speakers often use other terms such as titles (e.g., Doctor), kinship terms (Mom), nicknames (Fridge), endearments (Sweetie), and pronouns (you). Another goal of this chapter is to explore when these other forms of direct address are used and develop hypotheses about how they are retrieved.

First, I review the literature on the learning and retrieval of personal names in comparison to other words such as object names (see also Valentine, Brennen, & Brédart, 1996). As most of the work on name production has been carried out in the United States and United Kingdom, the generalizations and assumptions about the forms of personal names are not universally valid. For the initial portion of this chapter, I nevertheless discuss this work as if it does generalize broadly.
Then I briefly review naming and address systems from other cultures, drawing heavily on work from anthropology and sociolinguistics, and hypothesize about the relative ease of production for referring expressions and forms of direct address in these systems. Based on the information currently available, the terms used in other cultures may be easier to retrieve than the types of names typically studied or they may be difficult for different reasons.
¹ Italics will be used to indicate reference to the form of a word rather than its referent. Quotes denote usage.

2. Psychological Research on Personal Name Production

2.1. How Difficult Are Personal Names?

Complaints about the difficulty of learning and remembering personal names are common (e.g., Cohen & Faulkner, 1986). On occasion, speakers fail to retrieve the names of familiar people although they have identified them correctly and can provide specific information about them. Such tip-of-the-tongue (TOT) states occur when someone is confident that he/she knows the name of a person or concept but nonetheless cannot retrieve it. When in a TOT state, speakers often know the first sound or letter of the sought-for name and how many syllables it has (Brown & McNeill, 1966). In one study, 130 adults were asked to record each instance of TOT states over 4 weeks (Burke, MacKay, Worthley, & Wade, 1991). Although TOT states were relatively uncommon (averaging one occurrence per week for college-age adults), the majority of these retrieval failures were for proper names, and typically for acquaintances known for several years but with whom the speaker had not been in recent contact. Other diary studies report similar TOT frequencies for personal names (Cohen & Faulkner; Gollan, Bonanni, & Montoya, 2005; Young, Hay, & Ellis, 1985). In laboratory studies too, speakers are more likely to report TOT states for the names of familiar celebrities than for familiar objects (e.g., Gollan et al., 2005). Unfortunately, it is not possible to equate the familiarity of objects and faces (Semenza, 2006) or labels² and celebrity names on all of the variables that one would want. Even taking this into account, personal names appear much more vulnerable to forgetting or retrieval blocks. Alas, for example, students forget the names of famous cognitive psychologists faster than they forget facts and concepts from cognitive psychology (Conway, Cohen, & Stanhope, 1991). Patients suffering from cognitive impairments tend to show greater decrements in their ability to retrieve proper names than common nouns (for review, see Yasuda, Nakamura, & Beckman, 2000). Even the cognitive deficits associated with extremely high altitude may impair recall of first names while sparing memory for common nouns (Pelamatti, Pascotto, & Semenza, 2003).³

Many studies have compared learning names for unfamiliar faces with the learning of other information such as occupations, hometowns, or hobbies. Again and again these studies show that personal names take longer to learn and are less likely to be recalled than other information (Cohen & Faulkner, 1986; Crook & West, 1990; McWeeny, Young, Hay, & Ellis, 1987). Personal names truly do seem particularly difficult to learn and retrieve.

² Typically the literature refers to object names, but these will be referred to as labels here in an attempt at clarity.
³ In this study, all of the common nouns came from different categories (e.g., body parts, vegetables, furniture) whereas nothing was done to make the personal names more distinctive. So, differences in recall may be due to greater item similarity within the list of personal names than within the list of common nouns. Note that this was not the case for the original study to use this technique with personal names (Hittmair-Delazer, Denes, Semenza, & Mantovan, 1994).
2.2. Why Are Personal Names So Difficult?

Speaking involves conceptualizing the content of an utterance, selecting a term to use, retrieving the sounds of the selected word, and planning and executing the motor movements to articulate it (see, e.g., Griffin & Ferreira, 2006; Levelt, 1989; Meyer & Belke, 2007). Difficulties can arise at any of these stages, but only the stages preceding articulatory planning may account for TOT states. Theorists and researchers have noted several ways in which personal names differ from other words. Some of these differences appear real and are likely to lead to slower or less successful retrieval of personal names. Other proposed differences have not yet been substantiated; still others appear unlikely to hinder word retrieval. The evidence for these proposals is reviewed in the following sections.

2.2.1. Individuality, Uniqueness, and Arbitrariness

One of the most common assertions about proper nouns as opposed to object names is that the former pick out individuals (or tokens) rather than categorizing them as exemplars of a type (e.g., Hittmair-Delazer et al., 1994; Semenza & Zettin, 1989). This one-to-one relationship is often cited as a reason that personal names are more difficult to produce than object names and has been incorporated into most models of face naming to account for the retrieval difficulty of proper nouns (Burke et al., 1991; Burton, Bruce, & Johnston, 1990; Valentine et al., 1996). One prediction, then, is that other unique information should be equally difficult to retrieve. Harris and Kay (1995) methodically evaluated the extent to which a patient with proper-name anomia was impaired in retrieving information about people. Consistent with her diagnosis, the patient categorized and sorted celebrity photos perfectly but was unable to name many photos of celebrities and even photos of her friends. However, she had no problems generating unique information about celebrities (e.g., Why was Salman Rushdie in the news?) or her friends (such as which seat they habitually occupied at daycare). Her ability to recall identifying information about individuals, but not their names, suggests that uniqueness alone is not to blame for the vulnerability of personal names to retrieval problems (see also Hittmair-Delazer et al., 1994).

Furthermore, not all proper nouns appear as difficult to retrieve as personal names are. For example, place names (e.g., New York, London, Sweden) and the names of monuments and masterpieces (e.g., the Statue of Liberty, the Mona Lisa) are often less impaired by brain damage than personal names are (Ghika-Schmid & Nater, 2003; Saetti, Marangolo, De Renzi, Rinaldi, & Lattanzi, 1999; Warrington & Clegg, 1993). A review of over 10 cases of proper-name anomia showed that place names were only impaired for the patients with the most severe celebrity face-naming impairments (Hanley & Kay, 1998). As the researchers note, this could be due to place names (at least those that were used in these particular tests) being acquired earlier and used more often than the celebrity names to which they were compared. Other researchers have hypothesized that such differences may be due to famous geographical locations often being more meaningful, as reflected in their use as adjectives such as Parisian, and therefore having more connections between conceptual and word representations (Cohen & Burke, 1993).⁴ Alternatively, the existence of adjectival word representations for place names may support the retrieval of nominal word representations (Semenza, 1997). So, despite the individuality and relative arbitrariness of place names, it seems that they may acquire meaningful associations to their names that support their retrieval. Note that the Mona Lisa is also used as a modifier, as in a Mona Lisa smile, and monuments often have meaningful words in them, such as statue and liberty, which may help their recall. That place names are often less impaired than personal names are further suggests that individuality and uniqueness alone do not make personal names particularly difficult to recall.

Interestingly, one ability that seems just as impaired as retrieving personal names in proper-name anomia⁵ is retrieval of previously well-known telephone numbers and addresses (Harris & Kay, 1995; Saetti et al., 1999). Telephone numbers and addresses are extremely arbitrary, suggesting that arbitrariness and meaningfulness are important contributors to retrieval difficulty.⁶

⁴ Names of famous people can also be used as modifiers (Cohen & Burke, 1993), but this may be less often the case for the celebrities used in face-naming studies.
⁵ Many patients with proper-name anomia also have extreme difficulty in paired-associate learning for unrelated words, implying that arbitrariness could be an issue (Hittmair-Delazer et al., 1994), but there are several exceptions (Ghika-Schmid & Nater, 2003; Saetti et al., 1999). As Saetti et al. point out, it is likely that participants vary in their tendency and ability to spontaneously generate meaningful associations between unrelated words.
⁶ Building names such as the Eiffel Tower are not generally extended to anything else and may have fewer associations than other famous unique things. Two experiments involving people with closed-head injuries and controls found that celebrities and buildings (matched on rated ease of naming) were named equally slowly and inaccurately (Milders, 2000).

2.2.2. Word Forms

Learning someone’s occupation is typically a matter of linking a familiar concept and familiar word form (e.g., psychologist) to a new individual. Although one comes across new occupations occasionally (perhaps psycholinguist), new occupations are encountered far less frequently than novel first or last names. One study found that only 2% of first names in Oklahoma were meaningful words such as Ruby, Faith, or Violet (Alford, 1988). Moreover, even novel occupations typically contain familiar components (e.g., psycho-, lingu-, -ist) that make their forms and meanings somewhat familiar. In contrast, Brennen (1993) argues that learning someone’s name often involves learning a new word form (e.g., Zenzi) in addition to an association between the new name and the unfamiliar individual.
350
Zenzi M. Griffin
Indeed, people are much more successful at learning associations between names and faces when the forms of the names are common (e.g., Mr. Mitchell ) rather than so uncommon as to be unique (e.g., Mr. Marland; James & Fogler, 2007). Also, consistent with Brennen’s argument, when novel word forms such as ryman and crumpler are used for occupations in learning studies, occupations show no advantage in recall over surnames (Bruce, Burton, & Walker, 1994; Cohen, 1990b). In language production, words that share sounds with many other words are easier to retrieve than those that are more unique (e.g., Dell & Gordon, 2003; Harley & Bown, 1998; Vitevitch, 2002). Brennen (1993) posited that personal names vary much more widely in their phonological forms than other words do. Intuitively this assertion seems plausible. Greater variation may be due to many first and last names often coming from other languages that allow different sound sequences (e.g., Knut, Antje), due to the wide range of choices for first names among English speakers (e.g., using city names, combining parts of both parents’ first names into one for a child), or simply the combinations of sounds in even common names perhaps differing from those in other words (e.g., for a monosyllable, there are relatively few words that differ from John by one sound). Further, this possible difference between the forms of personal names and other words entails that the same amount of partial phonological information will be more constraining when trying to retrieve an occupation than a person’s name (Brennen). For example, the first syllable /bei/ is compatible with only a few occupations (baker, bailiff) but many potential and plausible surnames (Bay, Bader, Baker, Bale, Baines, Bates, Beattie, Bateman, etc.). Such a difference could make a TOT state for a surname harder to resolve than one for an occupation or other class of words with more regular forms. 
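The notion of words that "differ by one sound" can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the lexicon is a toy list with rough, hand-made sound transcriptions (not a pronouncing dictionary and not any study's materials), and the function names are hypothetical. It treats two forms as neighbors when one can be turned into the other by substituting, adding, or deleting a single sound, and counts how many neighbors a form has.

```python
from typing import Dict, List, Tuple

Phones = Tuple[str, ...]  # a word form as a sequence of sound symbols


def differ_by_one_sound(a: Phones, b: Phones) -> bool:
    """True if a and b differ by exactly one substituted, added, or deleted sound."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: dropping one sound from the longer form
    # must yield the shorter form (addition or deletion).
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood(word: str, lexicon: Dict[str, Phones]) -> List[str]:
    """Alphabetical list of the word's one-sound neighbors within the lexicon."""
    target = lexicon[word]
    return sorted(
        other
        for other, phones in lexicon.items()
        if other != word and differ_by_one_sound(target, phones)
    )


# Toy lexicon with approximate transcriptions (assumed for illustration only).
TOY_LEXICON: Dict[str, Phones] = {
    "cat": ("k", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "cut": ("k", "ah", "t"),
    "at": ("ae", "t"),
    "cast": ("k", "ae", "s", "t"),
    "dog": ("d", "ao", "g"),
    "cassandra": ("k", "ah", "s", "ae", "n", "d", "r", "ah"),
    "cooper": ("k", "uw", "p", "er"),
}

if __name__ == "__main__":
    for w in ("cat", "cassandra", "cooper"):
        print(w, len(neighborhood(w, TOY_LEXICON)), neighborhood(w, TOY_LEXICON))
```

In this toy lexicon the short form has several neighbors and the longer forms have none, but that outcome depends entirely on which words were included; a real estimate would require a full pronunciation lexicon.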
When phonological neighbors are defined as words that differ by just one sound, the phonological neighborhoods of multisyllabic words tend to be less dense than the neighborhoods for monosyllables (Harley & Bown, 1998). Even with other definitions of neighborhoods and attempting to control for the important variables that covary with length, longer words appear more vulnerable to speech errors (Goldrick, Folk, & Rapp, 2010). First and last names are often multisyllabic (e.g., Cassandra and Cooper, respectively). An implication of being longer, with sparser phonological neighborhoods would be that personal names receive less support from similar words during retrieval and therefore be more prone to TOTs and speech errors, just as these factors affect common nouns (Goldrick et al.; Harley & Bown). Consistent with this, retrieval failures occur more often for the names of celebrities who are known by three-part names (e.g., Martin Luther King) than equally familiar celebrities that are known by two-part names (e.g., Clint Eastwood; Hanley & Chapman, 2008). In sum, personal names may be longer and more phonotactically diverse than other types of words. These two variables are known to affect word
retrieval for common nouns and may be part of what makes personal names particularly difficult to learn and retrieve.

2.2.3. Descriptiveness and Meaning

Experimental work on name learning shows that when properties of word forms are controlled by only using name-occupation homonyms, it is still more difficult to learn someone’s name than their occupation. That is, it is easier to recall that a new person in a photo is a potter by profession than that her last name is Potter (James, 2004; McWeeny et al., 1987). One account for this paradox is that learning someone’s occupation provides a wealth of information (e.g., a potter is likely to be artistic, good at working with her hands, and not squeamish about dirt) whereas learning a name provides less information. Indeed, first names typically provide little information beyond statistical clues about sex (e.g., Cassidy, Kelly, & Sharoni, 1999), age cohort (e.g., Todd & Robert, 2009), and ethnicity (e.g., Rymes, 1996). Surnames provide even less information. Cohen (1990b) tested the extent to which surnames are treated as meaningless by comparing their recall to that of real and novel possessions in a learning study. For a particular photographed face, participants would hear, “This man is called Mr. Hobbs. He is a pilot. He has a blick/dog” (with the order of properties counterbalanced). The novel possessions thus had no meaning connected to them to aid in recall. As usual, participants recalled occupations more accurately than surnames. Names and nonword possessions were recalled equally poorly, whereas real-word possessions were recalled as accurately as occupations were. Cohen’s result is consistent with the idea that people treat surnames as meaningless. Also consistent with this, participants sometimes report spontaneously using mental imagery to learn occupations but do not seem to do so spontaneously for names (Bruce et al., 1994; McWeeny et al., 1987).
In another experiment, Cohen found that participants were better able to learn homonymic surnames such as Baker when they were paired with meaningless occupation names like ryman than when paired with real occupations like lawyer.⁷ Cohen argued that the mismatch in information contained in potentially meaningful surnames and real occupations (or other properties) discourages people from treating surnames as meaningful. In other words, it is difficult to imagine that Potter is a potter while remembering that she is a psychologist. That said, studies of name learning indicate that instructions to try “to give meaning” to names increase recall (Brooks, Friedman, Gibson, & Yesavage, 1993; Milders, Deelman, & Berg, 1998; Morris, Fritz, Jackson, Nichol, & Roberts, 2005).⁸

⁷ Ideally, one would like to see control conditions with matched familiar but meaningless surnames like Hobbes.
⁸ Unfortunately, this strategy is difficult to apply outside of the lab (Morris et al., 2005).

This suggests that meaningful associations for a proper
noun facilitate its retrieval (Warrington & Clegg, 1993) even though such associations are likely to be nonsensical and possibly inconsistent with the person’s other characteristics. Of course, surnames such as Potter and Baker were originally bestowed on people who had such occupations. The greater ease of recall for real occupations than surnames suggests that such names were easier to retrieve when they were descriptive (i.e., when Potter was a potter) than they are now. Likewise, a case study of an anomic aphasic is consistent with the idea that descriptive names are easier to recall than less descriptive ones. The patient was described as having preserved information about familiar people, good comprehension, and fluent grammatical speech, but impaired ability to retrieve personal names despite good recognition ability for faces (Flude, Ellis, & Kay, 1989). Notably, among the few names or parts of names he was able to produce were “The Queen Mother,” “Princess” for Princess Anne, and “Prince of” for the Prince of Wales. That is, when names included titles or roles, they were more readily retrieved. Other studies further indicate that descriptive names are retrieved more readily than nondescriptive ones. For example, people find it easier to name pictures of well-known cartoon characters that have descriptive names such as the Pink Panther and Spider-man than matched characters with nondescriptive names like Homer Simpson and Garfield (Brédart & Valentine, 1998; Fogler & James, 2007). Along the same lines, the only two famous individuals (out of 22) that an Italian proper-name anomic (Hittmair-Delazer et al., 1994) was able to name from descriptions had descriptive names: Superman and Batman. At the same time, he was able to provide additional specific information about 20 of the individuals and match a name to each face perfectly from a choice of three names.
So, personal names that are descriptive of their bearers are more easily retrieved than nondescriptive names. Unfortunately, many names are not descriptive. However, this does not mean that speakers do not make use of information about a person to retrieve the person’s name.

2.2.4. Features and Representational Structure

In general, when a speaker accidentally substitutes one word for another, the substituting word is highly related in meaning to the intended word. A representative substitution is “It’s at the bottom, I mean, the top of the stack of books” (Fromkin, 1971). Such observations are used to argue that the first stage of word production involves selecting word representations based on correspondence to semantic features (for review, see Griffin & Ferreira, 2006). Although personal names provide little information about their bearers, there is evidence that properties of their owners are used to retrieve them. Substitution errors typically result in the name of a person with shared characteristics. For example, errors in celebrity face naming by
both college students and anomics tend to be names of people who share nationality and profession, such as calling President Kennedy “Reagan” or “Eisenhower” or calling Elizabeth I “Mary of Scotland” (Brédart & Valentine, 1992; Cipolotti, McNeil, & Warrington, 1993; Lucchelli, Muggia, & Spinnler, 1997).⁹ In a name-learning experiment where occupations and other information were also presented, learners’ naming errors showed a tendency to confuse the names of individuals who had the same occupation (Fraas et al., 2002). Physical similarity also may play a role even when there is no reason to suspect that the substitution is due to mistaken identity. For example, survey respondents’ parents were more likely to mistakenly call them by the name of their siblings when they were of the same sex, close in age, or similar in appearance (Griffin & Wangerman, 2008). When parents accidentally addressed their children by names other than those of siblings, the source of the substitution was nearly always another family member such as the speaker’s own sibling, spouse, or a family pet. In sum, the substitution errors for personal names suggest that the characteristics and roles of a person are used to retrieve the person’s name. However, the representations of people and objects are likely to differ in several ways that may affect the ease of word retrieval. Retrieving a name for something requires distinguishing it from other similar things and the features that are unique to it are important for achieving this. Connectionist networks trained to map between concepts and word forms develop stronger connections between word forms and distinctive features than between word forms and shared features (e.g., Cree, McNorgan, & McRae, 2006). This translates into faster retrieval for items with more distinctive features. In the context of retrieving names based on photographs or line drawings, visual distinctiveness is important.
For example, rushing people to label objects disproportionately increased substitution errors for objects belonging to visually similar categories such as animals, fruits, and vegetables, rather than artifacts (Lloyd-Jones & Nettlemill, 2007). Substitutions tended to be visually and semantically related to the target, as in asparagus for celery. Indicating that retrieval of personal names is likewise sensitive to visual similarity, faces that are rated as more distinctive are named faster than less distinctive ones (e.g., Jack Nicholson vs. Mel Gibson; Valentine & Moore, 1995). In labeling animals, the speaker just needs to distinguish between categories, such as lions versus tigers. In naming faces, speakers must make a more fine-grained distinction, within the category of human.

⁹ However, within the category of musicians, US versus UK nationality is not strong enough to produce release from proactive interference unless the categories are explicitly mentioned (Darling & Valentine, 2005).

As a consequence, faces should be slower to activate appropriate name representations, be more prone to errors in name
selection, and be more susceptible to impairment, just as the literature on proper naming deficits in brain-damaged patients suggests (Yasuda et al., 2000). If people tend to share many nonvisual features, this could also contribute to difficulty retrieving their names. When people are asked to list the features of various object concepts (e.g., dog), the distributions of features provided for animals and artifacts differ systematically. Animals tend to have fewer distinctive or informative features as well as more highly correlated features (i.e., features that co-occur for multiple concepts) than artifacts do (e.g., Cree & McRae, 2003; Cree et al., 2006; Tyler, Moss, Durrant-Peatfield, & Levy, 2000). These differences between feature distributions have been hypothesized to account for category-specific deficits in general semantic knowledge and word retrieval (e.g., greater impairment for processing animals than artifacts) with brain damage (McRae, de Sa, & Seidenberg, 1997; Tyler et al.). So, an important question is how the distribution of features for individual people compares to other categories. A learning study by Cohen (1990a) further illustrates the importance of shared versus distinctive features in name learning. Participants studied a list of 18 statements that each linked an attribute to a common male first name with the goal of being able to supply the correct name when given the associated attributes. One condition paired each name with a unique attribute (no-fan), while in another condition, each name was paired with three unique attributes (fan-in). In the crossed-fan condition, each name was paired with two attributes that were each linked to one other name (see Figure 1). Once participants could go through the entire list of attributes and generate the correct names for all of them, they performed a speeded attribute verification test (e.g., George likes cats: true or false?) and then an untimed cued recall test (e.g., __________ likes beer and is a doctor).
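The three pairing schemes just described can be made concrete with a small sketch. The names and attributes are borrowed from Figure 1, but the specific name-attribute assignments below are illustrative assumptions rather than Cohen's actual stimulus lists, and the helper names are hypothetical. The sketch counts how many names each attribute is linked to (its fan) and checks that each name's set of attributes is nonetheless unique.

```python
from collections import Counter
from typing import Dict, FrozenSet

Pairings = Dict[str, FrozenSet[str]]  # name -> set of attributes

# Hypothetical pairings for illustration only (not the original stimuli).
NO_FAN: Pairings = {  # one unique attribute per name
    "John": frozenset({"is a waiter"}),
    "Peter": frozenset({"has a car"}),
    "Alan": frozenset({"is thin"}),
}
FAN_IN: Pairings = {  # three unique attributes per name
    "John": frozenset({"is a waiter", "has a car", "is thin"}),
    "Peter": frozenset({"likes beer", "is a doctor", "has a bike"}),
}
CROSSED_FAN: Pairings = {  # two attributes per name, each shared with one other name
    "David": frozenset({"likes beer", "is a doctor"}),
    "Tim": frozenset({"is a doctor", "likes films"}),
    "Alan": frozenset({"likes films", "likes beer"}),
}


def attribute_fan(pairings: Pairings) -> Counter:
    """Number of names that each attribute is linked to."""
    return Counter(attr for attrs in pairings.values() for attr in attrs)


def max_fan(pairings: Pairings) -> int:
    """Largest number of names sharing any single attribute."""
    return max(attribute_fan(pairings).values())


def combinations_unique(pairings: Pairings) -> bool:
    """True if no two names have exactly the same set of attributes."""
    return len(set(pairings.values())) == len(pairings)


if __name__ == "__main__":
    for label, cond in (("no-fan", NO_FAN), ("fan-in", FAN_IN), ("crossed-fan", CROSSED_FAN)):
        print(label, "max fan:", max_fan(cond), "unique combinations:", combinations_unique(cond))
```

Under these assumed pairings, only the crossed-fan scheme has attributes linked to more than one name, even though every name still has a unique combination of attributes.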
In both tests, performance was equally fast and highly accurate for the two conditions in which names were paired with unique attributes (fan-in and no-fan), despite the difference in the number of attributes associated with each name. Participants were slowest and least accurate in verifying and retrieving names in the crossed-fan condition with its intermediate number of attributes per name. Thus, sharing attributes with others impaired name retrieval even though the combination of attributes linked to each individual was unique. Another learning study found that uncommon occupations such as jockey were better recalled than common ones like secretary when participants were cued by a photo (Stanhope & Cohen, 1993).¹⁰

¹⁰ List composition modulates this effect. Uncommon stimuli must be in the minority for the effect to appear (Stanhope & Cohen, 1993).

Figure 1 Diagram of conditions in Cohen’s (1990b) fan effect experiment, in which names (David, Tim, Alan, John, Peter) are linked to attributes (likes beer, is a doctor, likes films, has a bike, is a waiter, has a car, is thin) under the no-fan, fan-in, and crossed-fan arrangements. Recall was poorest in the crossed-fan condition.

In real life, people seem likely to share
many characteristics and have few, if any, unique attributes, and this may hinder retrieval of personal names. Certainly work in social psychology shows that people who share important features are perceived as similar and may be confused (e.g., Andersen & Berk, 1998; Fitzsimons & Shah, 2009). However, further work is required to establish the extent to which differences in distinctive features may account for differences between retrieving personal and object names. One potential complication in comparing distinctive properties across categories is that features that appear relatively distinctive for a person are likely to involve their roles in a relational structure. For example, Ragni Lantz has the distinction of being the one and only person who is my mother. However, the property of being a mother is not distinctive at all. Insofar as representing the unique property of being my mother relies on the nonunique relation <mother-of-X>, it might not function as a distinctive feature or at least not as effectively as, say, a feature like <moos> does for identifying cows. Recall that in the name substitution errors summarized above, the characteristics shared by the target person and owner of the substituting name were often extrinsic to the person (i.e., shared social roles such as governing the same country, having the same occupation, or belonging to the same family) rather than intrinsic features such as hair color or personality traits. The mental representations for relational properties differ from those of feature-based ones in many ways (Markman & Stilwell, 2001) that are likely to impact word retrieval. For example, relational concepts provide second order partitions of the world that are more complex and acquired later than categories based on intrinsic feature distinctions (Gentner & Kurtz, 2005). Furthermore, Gentner and Kurtz point out that labeling isolated objects emphasizes intrinsic features rather than relational ones. 
For example, a hammer is primarily for pounding nails but when it is presented by itself, its physical appearance provides the primary cues for identification and word retrieval. By extension to face
naming, if relational information is more critical in person representations and the retrieval of personal names, one should expect relatively poorer retrieval of personal names than of object labels even if other variables could be held equal. Note that this argument does not bear on whether mental representations for people are richer than those for objects, but rather posits that the information that distinguishes among people is likely to be represented in a more complex fashion than that which distinguishes objects. A priori, the results of priming and interference studies should be helpful for investigating the mental representations for people. Many studies have investigated priming between celebrities in face naming and found weak categorical priming for people but strong associative priming (e.g., Brennen & Bruce, 1991; Carson & Burton, 2001; Darling & Valentine, 2005; Young, Ellis, Flude, McWeeny, & Hay, 1986; Young, Flude, Hellawell, & Ellis, 1994), which differs from what is typically seen for labeling objects (e.g., Lupker, 1979). However, interpreting the results of these celebrity-naming experiments is difficult. One large limitation is that categorical relatedness for people is typically defined as sharing an occupation or nationality in such studies. So, actors such as Tom Hanks and Demi Moore should affect the naming of John Wayne more than they would the naming of a politician (Carson & Burton, 2001). Although occupations or nationalities are clear categories, they may not be terribly important for how people conceptualize celebrities (e.g., witness the porous boundary between performer and politician, and the many successful Canadians on American TV). Association between celebrities is typically due to marriage or other co-occurrence, which intuitively results in stronger associations than found between objects.
In addition, as Carson and Burton (2001) pointed out, association often was confounded with categorical relatedness in celebrity-naming studies. So, for example, Prince Charles was associated with Princess Di, Stan Laurel was associated with Oliver Hardy, whereas for objects, a mouse was associated with cheese. Without some way of quantifying categorical or associative relatedness, there is no sound basis for comparing the degree of priming between people with the priming between objects. Another source of information about the semantic representations of words comes from their distribution in texts. For example, the words lemon and lime tend to appear in similar sentence contexts with words like pie, tree, squeeze, and wedge, while the word laptop does not. Large-scale analyses of word co-occurrence patterns yield vector representations that reflect these distribution patterns (Landauer & Dumais, 1997; Lund & Burgess, 1996). The more similar the vectors for two words, the more similar their meanings. For example, such measures of similarity have predicted the magnitude of priming from one word to another in a word recognition task (Lund & Burgess) and distinguished synonyms from foils on a vocabulary test (Landauer & Dumais). Such representations can be calculated for proper
names that appear in texts as well as for common nouns (Burgess & Conley, 1999). The most similar words for 20 personal names (e.g., Thomas) were compared to the most similar words for 20 common nouns of the same frequency as the names (e.g., dollar). The difference between the representations of personal names and their neighbors was smaller than the difference between common nouns and their neighbors. In other words, fewer features distinguished the use of personal names than common nouns. This result supports the idea that the representations of people that are used for name retrieval are less distinctive than those of other nouns. In summary, another source of the greater susceptibility of personal names to retrieval deficits may lie in how people are mentally represented. One likely source of difficulty is a predominance of shared features and dearth of distinctive ones. In addition, many distinguishing features may be extrinsic to the person, involving their relationships to other concepts or people, which may make their semantic representations more elaborate in a way that hampers retrieval.

2.2.5. Multiple Names

Researchers have suggested that one of the reasons personal names are so difficult to retrieve is that there are so few alternative ways to refer to individuals (e.g., Brédart, 1993; Cohen & Faulkner, 1986). That is, people often have a given first name, a middle name that only their parents and the government know, and a surname. Unlike objects, they do not have synonyms (e.g., sofa for couch), superordinate terms (furniture), or subordinate category names (sectional) that may substitute. Even insofar as a speaker knows different parts of a person’s name, conventions seem likely to preclude the use of an alternative such as suddenly addressing an acquaintance by their last name instead of their first name. Of course, some people are known by truncated¹¹ versions of their given names (e.g., Vic for Victor).
However, given the phonological overlap between long and truncated name forms, if one version can be retrieved, the other might be easily retrieved as well. In contrast, when someone is known by a nickname or pseudonym, speakers may not know the given name at all and, even if they did, it may not function well as a reference or form of address (e.g., Aceto, 2002; Dorian, 1970). That said, a study of male college students found that the more intimate a student was with someone, the more names he used for him (Brown & Ford, 1964). For example, a close friend of James Scoggin would call him “Scoggin,” “Jim,” “James,” and “Scoggs.” Although intuitively it might seem that more choices are to be preferred over few, the reverse is often the case for ease of production.

¹¹ I reserve the term nickname for names that are not standard truncations of given names, such as calling someone named Robert Red but not Bob.

For example, the time needed to retrieve an object label increases with its number of
context-appropriate labels (e.g., Bates et al., 2003). For example, people take longer to label an object that can be called TV or television than to label a matched object with a single, dominant label like tooth. This influence of codability or name agreement is greater than other variables such as word frequency, age of acquisition, phonological neighborhood size, and so on (Bates et al.; Bonin, Chalard, Méot, & Fayol, 2002; Snodgrass & Yuditsky, 1996). Furthermore, relative to monolinguals, bilinguals are more prone to TOT states for common nouns for which they may know or at least recognize labels across two languages. However, in both elicited and naturally occurring speech, monolinguals and bilinguals are equally prone to TOT states for proper names, for which they probably have one representation (Gollan et al., 2005). Moreover, common words do not even need to be relatively synonymous to compete for selection. For example, sharing a superordinate category allows giraffe and zebra to compete (for review, see Griffin & Ferreira, 2006; Vitkovitch, Humphreys, & Lloyd-Jones, 1993). There is surprisingly little and equivocal evidence on whether multiple names have a similar effect in retrieving personal names. To study the effect of having multiple names, some studies have made use of famous actors who play famous characters. A series of celebrity face-naming studies (Brédart, 1993) compared name retrieval for actors who were strongly associated with a particular character (Harrison Ford playing Indiana Jones) with equally familiar actors who were not associated with any particular character name (e.g., Woody Allen). A norming study indicated that the names of the actors and the characters were similar in familiarity. If having two names associated with a face were similar to having two labels associated with an object, one would expect slower and less successful naming for Harrison Ford.
However, the proportions of correct responses and retrieval failures were similar for retrieving the actors’ names from the two groups.¹² It may have been the case that the instruction to name the actor rather than the character successfully eliminated interference from the character’s name. Counter to that hypothesis, when participants were free to produce either actors’ or characters’ names as responses, they were still significantly more successful and suffered fewer TOT states for celebrities with a famous character than those without. Although character names are not synonymous with actors’ names, this last result suggests that having more names to choose among is facilitatory for person naming although it is not for object naming. Moreover, the same benefit of multiple known names occurred in a comparison between the same set of famous actors with famous characters (Harrison Ford/Indiana Jones) and photographs of actors whose names were not known but who played well-known characters (Richard Anderson as MacGyver).

¹² Another study failed to replicate this result and found a cost for having multiple names. However, the familiarity of the actors’ names was not controlled across conditions although other important factors were (Stevenage & Lewis, 2005).

Insofar as
characters’ names can be considered as close to actors’ names as multiple labels for an object are to one another, these studies suggest a discrepancy between the effect of multiple names on object labeling and face naming. However, another series of studies manipulated the availability of alternative names for actors and objects and found evidence of competition among multiple names for both types of words. On target trials in an initial experiment, participants named a photograph of an actor (e.g., John Cleese) after providing one of three responses to the same photograph of the actor on an earlier trial (Valentine, Hollis, & Moore, 1999). Having previously produced both the actor’s name and the name of a famous character that the actor played (e.g., John Cleese and Basil Fawlty) dramatically slowed latencies to later produce the actor’s name alone, relative to conditions in which participants previously either named the actor alone or in addition to the name of the TV program (e.g., John Cleese and Fawlty Towers). The conclusion was that the associated character’s name competed with the actor’s name for selection. Because the names of TV programs do not fall into the category of personal names, they did not compete with the actors’ names despite also being produced in response to the photograph. As the design of that experiment differed from existing studies of object labeling, somewhat analogous experiments were carried out with objects that differed in their labels in British and American English (Valentine & Darling, 2006). For example, the same object is labeled a lorry in British English and a truck in American English. For practice using the names, objects were repeatedly presented with labels written below. 
Responses were significantly slower and less accurate when participants had been trained to use two different labels during practice rather than one, regardless of whether they were asked to only use British labels at test or were free to use either British or American ones. So, common nouns from different dialects showed a multiple name competition effect just as celebrity names did using a similar paradigm. A clear difference between the studies that showed no cost of having multiple names for a face as opposed to those that did is whether participants were asked to switch responses used for repeated stimuli. Participants in a TOT study (Cross & Burke, 2004) generated names from celebrity photographs and based on descriptions with word stems (e.g., The flower girl from the musical ‘My Fair Lady’ whom Prof. Higgins transforms into a fashionable lady presentable to society, Eli____ Do____). Unlike those in the John Cleese experiments, these participants were more likely to successfully name a celebrity’s photograph (e.g., Audrey Hepburn) after previously generating an associated character name from a fill-in-the-blank question (Eliza Doolittle) than after an unrelated name. Altogether, this suggests that repeating stimuli and asking for different responses may be necessary to get competition between multiple personal names. In contrast, for objects, researchers consistently find a cost for having multiple potential labels regardless of whether the objects required multiple responses within the study or not.
It is not obvious which experimental situation is more likely to generalize to the everyday use of personal names.

2.2.6. Name Frequency and Age of Acquisition

For objects, the impact of greater word usage is relatively simple. People are faster and more accurate to label objects that have more frequently used labels (Oldfield & Wingfield, 1964). Frequency of use is typically estimated from how often a word appears in a corpus of text or speech. Frequently used words are typically learned at earlier ages, so much work has tried to determine whether effects that were attributed to frequency of use were actually due to differences in age of acquisition (Carroll & White, 1973). Researchers normally use ratings to estimate how early words are learned and these ratings tend to correlate highly with the ages at which children can accurately label objects (Morrison, Chappell, & Ellis, 1997). Although frequency and age of acquisition are highly correlated, evidence suggests that both variables affect retrieval (see, e.g., Brysbaert & Ghyselinck, 2006). People are faster to produce earlier acquired personal names and have fewer TOT states for them (Bonin, Perret, Méot, Ferrand, & Mermillod, 2008; Moore & Valentine, 1998). However, establishing the role of usage frequency for personal names is complicated. Early diary studies found that people reported TOT states most often for names of friends and acquaintances (Cohen & Faulkner, 1986). This suggested a paradox in which the commonly used names are the most prone to problems, which would be the opposite of the frequency effect found for object labels. The researchers suggested that the apparent paradox could simply be due to the high frequency of retrieval attempts for the names of friends and acquaintances. That is, one rarely attempts to retrieve names of relatively unfamiliar people and so there are fewer occasions for failure.
The results of an experiment that controlled for the number of retrieval attempts supported this explanation (Brédart, 1996). Less familiar celebrities elicited more TOT states than more familiar ones did. Furthermore, a speaker may simply entirely forget infrequently used names, in which case they will not result in TOT states. Unfortunately, the initial observation has been referred to as a reverse frequency or reverse familiarity effect on some occasions (Brédart; Cohen, 1990a), creating some confusion about how frequency affects the retrieval of personal names. Another difficulty is that only the frequency of surnames has been used as a measure. When dependent measures are based on the production of first and last names, one might not expect to see an effect of surname frequency (Bonin et al., 2008). On the other hand, it is a bit odd to produce bare surnames for celebrities who are known by their first and last names, so one might not expect robust effects in surname production. Then there is the issue of how to estimate name frequency. Researchers tend to use how often a surname appears in a telephone book or census (James, 2004; James & Fogler, 2007; Moore & Valentine, 1998; Valentine &
Personal Names
Moore, 1995). This measure is probably correlated with frequency of exposure because the more Johnsons there are, the more likely one is to hear the name Johnson. However, there are relatively few people referred to by the name Madonna but the name is used relatively frequently (albeit periodically). Frequency measures for object labels reflect how often the words appear in print or speech rather than how many objects bear the label. So, the variable typically referred to as name frequency in the context of personal names is actually more a reflection of name ambiguity (e.g., Cohen, 1990a). Researchers have found that participants are faster and more accurate to name celebrities that they rate as more familiar, that is, ones that they have encountered more frequently than others (Moore & Valentine). However, rated familiarity appears to conflate exposure to the person with exposure to the name, so it is a very rough measure. In summary, age of acquisition and familiarity of personal names affect how quickly and accurately they may be retrieved just as they do for object labels, but researchers have not tested a measure of frequency of use for names that resembles the measure used for object labels. Many personal names may be more difficult to retrieve than common nouns if their forms are acquired by speakers later in life (Brennen, 1993). However, it has not been established whether personal names typically differ from object names and other words in age of acquisition, familiarity, or frequency, so they may or may not be at a disadvantage in this regard.
2.2.7. Name Ambiguity
A prominent difference between personal names and common nouns is that a personal name may be shared by many completely unrelated individuals. For example, I know more than one person named Dan. For common nouns, the most analogous case is that of homonyms such as bank meaning financial institution or the shore of a river.
Just as meeting one Dan will do nothing to help you recognize another one, learning one meaning of bank will not help recognition of the other. Having multiple unrelated meanings affects the speed and accuracy of producing a word. That is, the processing of homonyms (bank) and homophones (week/weak) differs from the processing of words with only one meaning or related senses (e.g., balcony). The speed and accuracy of producing a homophone is affected by how often the unintended meaning is typically used (Dell, 1990; Jescheniak & Levelt, 1994; Jescheniak, Meyer, & Levelt, 2003).13 Priming studies further indicate that having a shared form has processing consequences. For example, the word dance is related to one meaning of ball but not the bouncy, round meaning.14
13 The degree to which this holds is controversial (Caramazza, Costa, Miozzo, & Bi, 2001).
14 Lemma or abstract word representations connect meanings, syntactic information, and phonological forms. Different homonym meanings are assumed to have different abstract word representations that likely feed into shared phonological forms (Dell, 1990).
When participants named a picture of a round ball, hearing dance as distractor sped up their responses relative to hearing an unrelated distractor word (Cutting & Ferreira, 1999). Presumably, dance activates the dance meaning of ball, which spreads activation to a shared phonological form /ball/, allowing it to be produced more rapidly. Homophone priming also occurs when personal names and common nouns share a common form. Producing the word pit as in cherry pit in response to a definition increased the probability of correctly retrieving Brad Pitt’s last name and reduced TOT states (Burke, Locantore, Austin, & Chae, 2004). It is not entirely clear whether these effects are due to multiple meaning representations becoming active when one is intended or just having multiple inputs converging on a single phonological form, but either way, there are implications for shared personal names. Retrieval of a person’s name should be affected by having the same form as another person’s name. For example, the ease of producing the name Dan should be sensitive to the use of the name for all Dans even if they do not share higher-level representations (i.e., person or word representations). To the extent that person representations that share a name are processed like the unrelated meanings of homophones, priming the characteristics of one Dan could make the name of another Dan faster and more likely to be retrieved. However, two humans have much more in common than two homophone concepts, so it may be premature to make predictions that assume that they are equally unrelated. Words that are similar in meaning like lion and tiger compete, slowing retrieval. Their level of competition is modulated by their degree of similarity (Vigliocco, Vinson, Damian, & Levelt, 2002). Substitution errors in the retrieval of personal names suggest that the names of similar people may also compete for selection (e.g., Brédart, 1993; Griffin & Wangerman, 2008).
If so, perhaps paradoxically, the representations for two different individuals sharing the same name may interfere with one another, slowing retrieval and making it less likely to succeed (see Figure 2). Indeed, noun phrase representations for Kate Bush and George Bush are predicted to inhibit one another in Node Structure Theory (Valentine & Moore, 1995). To recap, retrieving a name that is shared by many individuals should be different than retrieving a unique name. Currently, it is not clear what the effect of name ambiguity is, independent of other variables that tend to covary like frequency. So, name ambiguity may or may not be a disadvantage for personal names relative to other words.
2.2.8. Vocabulary Size and Age
The longer a person lives, the bigger their vocabulary tends to get and particularly their knowledge of uncommon words (Alwin & McCammon, 2001; Verhaeghan, 2003). This vocabulary difference at least partially explains age-related increases in TOT states for common
[Figure 2 diagram: four levels of representation (Feature, Person, Word, Form). Feature nodes: likes travel, has a Ph.D., rows, cycles, is vegan, is thin. Person nodes: person1 to person5. Name nodes: Dan, Fahad, David, Andrew, Drew; one node is marked "?".]
Figure 2 Simplified diagram of representations for five men. If person representations compete with one another for selection, representations for individuals with the same name (person 1 and 2) may interfere with each other more than representations for equally similar individuals with different names (e.g., person 3 and 4). Note that there could be different word representations for people with the same name (i.e., two Dan word representations) and competition could occur at this level alone or in addition to the person level.
nouns (Bock, 1977; Dahlgren, 1998; Gollan & Brown, 2006). Just as people learn more uncommon words with age, they also learn more personal names. Higher numbers of known personal names may result in proactive interference in learning new names or greater interference between known names during retrieval (Brooks et al., 1993). Another contributor to poorer learning and retrieval of proper names among older adults may be their use of less effective mnemonics for learning names or decreased abilities to carry out mnemonics in real-world interactions (Brooks et al.). So, changes in knowledge and processing that are associated with increasing age tend to make personal names harder to retrieve.
2.2.9. Summary
People are highly similar, both visually and in what can be predicated of them. Their distinguishing features often involve complex relationships (e.g., Barack Obama is not the first president in the world, nor is he the first US president or first multiracial/Black president, but he is the first official multiracial/Black US president). In other words, the perceptual and semantic spaces for people may be very dense with only complex
constellations of cues to distinguish them. So, the retrieval cues for people’s names may overlap with one another more than the cues for other categories of words do, making word selection more difficult. Even the names of countries and cities seem more meaningful than most personal names as suggested by their seemingly greater use as informative modifiers (e.g., a New York minute or an Austin music festival) and their higher recall in learning studies (Cohen & Faulkner, 1986). Furthermore, many personal names may be at a disadvantage due to later age of acquisition, lower frequency of use, greater length, greater phonological diversity, and higher ambiguity relative to common nouns (Brennen, 1993; Cohen, 1990a). However, studies have not directly compared word types on these dimensions. In addition, having multiple names for the same referent may or may not be more common for people than for objects and the effect of multiple names for people on retrieval is currently unclear. So, there are many possible reasons why personal names should be more difficult to retrieve than other words or information. Surprisingly, the fact that personal names pick out individuals rather than labeling categories does not seem to be a strong contributing factor to their difficulty.
3. Personal Names and Reference Across Cultures
The naming system in some societies seems to have evolved primarily in order to differentiate individuals, in which case other means must be found to categorize them, to place them in the social matrix. In other societies, the naming system seems to have evolved primarily to categorize individuals, so that additional means must be found to differentiate them.
(Alford, 1988, p. 69)
Speakers seem to take into account their audience’s knowledge when determining how to refer to things (e.g., Olson, 1970). The emphasis in psycholinguistic studies of reference has been on how speakers successfully differentiate between potential referents. First mentions of a person are typically in the form of a description (A guy) or some form of name (Steve), while subsequent references are more likely to use third-person pronouns such as he (see Smith, Noda, Andrews, & Jucker, 2005; Stivers, Enfield, & Levinson, 2007). So, the words we use to refer to a person vary considerably even within a single conversation. Although it is implicit in contrasting forms, less attention has been paid to the category information conveyed by different forms beyond their specificity. However, a referring expression for a person conveys a great deal of information about the referent, as well as the speaker and the relationship between the two (e.g., Befu & Norbeck, 1958).
Generalizations about personal names up until this point have primarily reflected the types of names one finds in British and American mainstream culture. These properties do not generalize across the world or even across communities within the United Kingdom and the United States. Sociolinguists and anthropologists have documented various naming practices and identified many social and pragmatic factors that influence choice of referring expression (for review, see Alford, 1988; Stivers et al., 2007). In this section, I briefly discuss naming practices and speculate about their potential consequences for production based on the properties of the names. To preview the results, possession of multiple names and use of descriptive names or nicknames are much more common than one would expect based on the assumptions made in the existing psychological literature on name retrieval.
3.1. What Are Names Like Cross-Culturally?
Alford (1988) studied naming practices across a probability sample of 60 nonindustrialized societies. He found that people had a first or given name in every society sampled, but in small communities, individuals often had only one component to their names. At the other extreme, in 5% of societies, names had four or more components.
3.1.1. Family and Clan Names
Consistent with the use of names to categorize their bearers, many names indicated family or clan membership. Family surnames appeared in 33% of Alford’s (1988) sample and names that conveyed clan or lineage in 15%. Among many Native American and Australian tribes, people receive names that are associated with their clan’s totem (Lévi-Strauss, 1966). For example, members of an Osage Black Bear clan were dubbed, “Flashing-eyes (of the black bear), Tracks-on-the-prairies, Ground-cleared-of-grass, Black-bear-woman, Fat-on-the-skin of the black bear” (p. 173). In China, family-related information is marked on multiple names of an individual (Yau, 1996). In addition to a shared family name, the first character of a given name is traditionally taken from a family poem and shared by all male members of a generation. In Los Angeles, a complete gang name includes both a nickname and gang affiliation, where the nickname may be descriptive or even refer to a senior gang member who helped violently initiate the person into the gang (Rymes, 1996). As Lévi-Strauss argues, when names identify individuals as members of a class, the type/token distinction between proper names and common nouns is blurred. In using surnames to convey family membership, names in the United States and United Kingdom are similar to those in many other cultures.
3.1.2. Descriptive and Meaningful Names
In two-thirds of Alford’s (1988) sampled societies, children typically received meaningful names. The most popular source for a meaningful name is the physical or behavioral characteristics of a child, making the name descriptive as well. In some communities, bestowing a name is customarily delayed for years after a child is born in order to allow characteristics to emerge. Animal names are also common, as it is hoped that children will take on their desirable characteristics. Children also receive names based on circumstances or events at the time of their birth. For example, among the Nuer of Sudan (Evans-Pritchard, 1948/1964), a child born during a drought was named Reath [drought]. Another Nuer child was named Met [to deceive], because the child’s father bent the truth while courting the child’s mother. Derogatory protective names were used in 21% of societies to ward off bad luck (Alford). Names that are tailored to individuals in this way may be more likely to be unique. All else being equal, descriptive names and meaningful names should be far easier to learn and less prone to TOT states than nondescriptive and meaningless names. Just as the pinkness of the Pink Panther facilitated recall of his name (Brédart & Valentine, 1998; Fogler & James, 2007), the mental representation of a person should provide good retrieval cues for a descriptive personal name. Having a story behind an episode-based name should make it memorable by creating a rich mental representation with many retrieval cues. At the same time though, meaningful names may be prone to semantically related word substitutions in a way that nonmeaningful names are spared. For example, someone named Blizzard may be prone to being called Snowfall by mistake. Alternatively, the meanings of meaningful names may cease to be processed after the person and name become familiar (Brennen, 2000).
On the beneficial side, the word or words that comprise meaningful names are part of speakers’ normal vocabulary, so the names have forms that are similar in frequency and age of acquisition to those of common nouns. Furthermore, unique names avoid potential problems associated with name ambiguity. Thus, given names in many societies are likely to be more memorable and accessible than the ones typically studied by psychologists, but may be relatively more prone to semantic errors.
3.1.3. Nonunique and Multiple Names
In societies with small sets of first and last names to draw on, many people end up sharing their entire names. Alford (1988) found that some form of alternative or nickname was commonly used in 75% of societies, and particularly in those where personal names did not uniquely specify individuals. For example, in a few small communities in Ireland, people so commonly share legal first and last names like Catherine Mullen and Pádraig Ó Conghaile that their names are useless for reference (Lele, 2009).
Instead, referring expressions (bynames) include the Gaelic version of a person’s first name followed by either an ancestor’s name (e.g., Pádraig Mháire Mhóir [Pádraig descendant of Big Mary]) or a prominent property of the referent (e.g., Jockan Ruá [Red-headed Jockan]). A similar situation and solution arises in some villages in the Scottish Highlands, where for example, there were once 13 William Mackays in one school and three surnames shared by the majority of the inhabitants (Dorian, 1970). Likewise, legal first and last names fail to uniquely identify people in some Caribbean communities, so nicknames and license plate numbers are used for reference instead (Manning, 1974). Although some of the Caribbean nicknames may be nonsense words or arbitrarily connected to their bearers, they are often based on the individual’s personality, physical characteristics, or experiences. The Kamsá of Southwestern Colombia also have legal names that are shared by many members of the community (McDowell, 1981). So, they instead refer to one another within the community using ugly names that pick out distinctive characteristics (e.g., height or weight) or behaviors (e.g., a man who called all vegetables “yuca” was referred to by the word for the yuca plant). Descriptive and episode-based nicknames should be functionally the same as descriptive and episode-based names, and should be easily learned and retrieved. In communities where ancestors are well known, adding an ancestor to a name may make it more memorable by adding retrieval cues. On the other hand, a name composed of two nondescriptive personal names may be more difficult to retrieve than a single name because either name may suffer retrieval failure. Recall that celebrities with three-part names resulted in more TOT states than those with two-part names (Hanley & Chapman, 2008).
Moreover, Gaelic bynames place the ambiguous name first in the sequence, followed by the name that differentiates between potential referents. As a result, the initial name may cue the wrong second name, delaying retrieval or resulting in speech errors (Sevald & Dell, 1994). On the positive side, when legal names are so ambiguous that other names are nearly always used for reference instead, many people may only know a person’s nickname (see Aceto, 2002 for further examples). If so, there will be no processing consequences associated with having multiple names for a person. Other naming practices also result in an individual having multiple names. For example, among the Nuer, children often receive one personal name from their father’s side of the family and then another from their mother’s side (Evans-Pritchard, 1948/1964). An effort is made to make the names semantically related. For example, a child named Mun [earth] by one side was named Tiop [earth mixed with manure and ashes] by the other side of the family. A particular speaker is only likely to produce whichever name is appropriate for their side of the family or the village that they are in. However, the semantic relatedness and knowledge that both names apply to the same referent may cause the names to interfere with one another in production.
Belonging to multiple cultures and using more than one language can also result in multiple names for individuals. For example, a given or legal name in many spoken languages may not be easily expressed in a sign language of the deaf. As a result, signers dub people with name signs. Within the first weeks of starting school in Greece, the United States, or China, deaf children from hearing families typically receive a name sign from an older schoolmate or an assertive peer, and then often carry the name sign for life (Kourbetis & Hoffmeister, 2002; Supalla, 1992; Yau, 1996). Name signs based on physical characteristics are very common across signed languages. For example, a study of 200 users of Greek sign language (Kourbetis & Hoffmeister) found that 92% of them had descriptive names, of which 55% expressed physical characteristics like having a scar or curly hair and 21% expressed personality traits. Likewise, a survey of users of the sign language of the Netherlands reported that 75% had descriptive name signs, such as the sign for smile for someone who smiled frequently (Schuit, 2009). Name signs may also be nondescriptive, based on a nondescriptive given name. For example, Supalla’s name sign in American sign language was an S handshape for Samuel that moved from one side of the chin to the other to distinguish it from the name sign of his brother Steve, which was an S handshape touching the side of the chin twice. However, in French sign language, Supalla was given a name sign denoting his pointed nose. Celebrities receive name signs if their given names are too unwieldy to fingerspell. For example, the name sign for Mao Zedong in Chinese sign language is composed of the sign for hair, which is the literal meaning of Mao, plus a gesture that alludes to his facial mole (Yau). In many societies, people receive an additional name or undergo a name change when their social relationships or status change (Kendall, 1980).
The most familiar version of this in the United States and United Kingdom is the custom of women taking on their husbands’ surnames in place of their original surname or in addition to their surname. Often individuals receive a new name or nickname explicitly as part of entering adulthood or around adolescence. These may be related to the person’s characteristics or related to objects strongly associated with the person, as in Nuer ox-names (Evans-Pritchard, 1948/1964). In a third of Alford’s (1988) sample, parents take on the name of a child (teknonymy), being called the equivalent of father-of-X or mother-of-X. Among some Africans, X, the child referred to in a teknonym, is the oldest one that still lives at home. As a result, a parent may go through a sequence of names over a few years. The Penan of Borneo even have a default name that can be used in a teknonym until a child receives a name (Needham, 1954). The Penan also take new names when a family member dies. The death name specifies the relationship between the referent and the dead family member(s) as teknonyms do. So, a man may go from being referred to as Tama Jalong [father of Jalong] to Uyung Jalong [first born child Jalong is dead].
In sum, many cultures have naming systems that result in an individual having multiple names either sequentially, simultaneously, or both (see also Rymes, 1996). Name changes are highly likely to slow name retrieval and decrease accuracy, as old names interfere with new ones. As reviewed earlier, the effect of simultaneous multiple names is unclear, and it is likely to be modulated by the phonological and semantic relationships between names, their relative frequencies, their descriptiveness, and the strength of contextual cues associated with different uses.
3.2. How Are People Referred to?
Names serve different purposes. So, even though a person may possess a name with three or more components, it is unlikely that the entire name will actually be used for more than occasional official documents or ceremonies. One question is how individuals are referred to when people who know them discuss them. In Alford’s (1988) sample, people were primarily referred to by some portion of their given name in 46% of societies, by possessed kinship terms (e.g., my uncle, your aunt) in 46%, and by nickname in the remainder. Use of kinship terms was more common when personal names did not already contain genealogical information in the form of surnames or patronyms (i.e., John’s son). Not surprisingly, frequent users of kinship terms tended to be kin-centered societies (see also Stivers et al., 2007). Because kinship terms depend on the meaningful, systematic relationships between people, they could be relatively easy to retrieve. People readily conceptualize messages relative to themselves (e.g., Ertel, 1977; Keysar, Barr, & Horton, 1998; MacWhinney, 1977), so the relationship between a referent and the speaker should be particularly salient and easy to represent. On the other hand, although a speaker might only refer to her aunt as my aunt, she must be able to recognize the references made to that person using a wide array of other kinship terms (e.g., mother, sister). Not only are all of these terms associated with the individual, but also they are all semantically related by being labels for female family members. So, again, the meaningfulness of the term should facilitate its retrieval but at the same time it introduces a number of semantically related terms that may interfere with it. Moreover, it may be advantageous in conversation to associate an individual to the addressee or someone other than oneself, for example by saying the equivalent of Your sister is causing a scene rather than My aunt (Stivers et al., 2007).
However, retrieving a form that associates a relative to someone other than the speaker is likely to take extra time and increase the risk of speech error. In some cases, a further complication is the presence of taboos on name use. Having unique names in a society is associated with having name taboos (Alford, 1988). Unique names are often treated as extremely intimate and
sometimes sacred. Because they may be so evocative of the person named, in the wrong hands they may be used for bad magic and hence their use is avoided on most occasions (like US social security numbers). To the extent that thinking of a person activates a taboo name, speakers may have difficulty producing an alternative form and the anxiety about violating a taboo may increase the interference between names. Analyses of conversations suggest that speakers prefer to produce short referring expressions and let the addressee provide feedback if the referent is not recognized rather than produce unnecessarily elaborate descriptions (Sacks & Schegloff, 1979). However, speakers need to take into consideration not only whether an addressee is likely to identify the referent of a particular referring expression, but also whether the expression is an appropriate way to refer to the person given the addressee (e.g., Allerton, 1996; Murphy, 1988; Stivers et al., 2007). When speaking to someone of a lower status, it is common for a higher status speaker to refer to another person using the form of reference or address that would be appropriate for the lower status addressee to use rather than the speaker’s own term of address (Dickey, 1997). So, for example, a professor refers to another professor by title and last name (e.g., Dr. Markman) when speaking to an undergraduate, but by a shortened version of the professor’s name under normal circumstances (Art). Another example is when a parent refers to their child’s other parent as the equivalent of Mommy or Daddy only when speaking to their child or in its presence (Befu & Norbeck, 1958; Dickey). These situations share some characteristics with situations in the experimental work on perspective taking and audience design in object reference (see Brennan & Hanna, 2009). That work suggests that overcoming one’s own perspective may be effortful but vary with cultural practice (e.g., Keysar et al., 1998; Wu & Keysar, 2007). 
The broader social context of speech is also important. Some expressions should only be used if the referent is not present. Derogatory nicknames are the most obvious example (Crozier, 2002; McDowell, 1981). As a result, the moment when a person’s physical presence is likely to make their nickname most easily retrievable is also the moment when use of the nickname is least appropriate. Even if the referent is not present, social context matters. Dorian (1970) remarks on the difficulties determining whether an addressee was someone who would take offense when a byname was used to refer to another person and the difficulty coming up with another form of reference. On the other hand, speakers may avoid using first names or nicknames as referring expressions when speaking to someone who is less intimate with the referent (Dickey, 1997; Murphy, 1988). Whether a speaker is in the presence of in-group members and can be considered a member of the in-group is very important in selecting a referring expression and avoiding conflict (see Allerton, 1996).
In summary, whether or not an addressee can identify whom a speaker is referring to may be less important than making sure that the referring expression used is appropriate. Among other considerations, choice of an appropriate expression depends on the speaker’s relationship to the referent, the speaker’s relationship to the addressee, the relationship between the addressee and the referent, the presence of overhearers including the referent, the social context, and what the speaker wishes to express or emphasize about the referent and their relationship. Although this seems like it might require a great deal of calculation, it is not clear how much explicit reasoning is actually involved in selecting a referring expression. Contextual cues and implicit memory processes may help make appropriate expressions available (Horton & Gerrig, 2005). Further research is needed but support comes from the finding that even common nouns that have previously been used in conversation with a person are more quickly retrieved in the presence of the same person than with a different person, although the speaker is not speaking to them (Horton, 2007). The next section considers the forms used when the referent is the addressee.
4. Direct Address in Spoken Language
Vocatives are terms that refer to the addressee of an utterance, that is, the person to whom a speaker speaks (Zwicky, 1974). Speakers often address people by name when trying to get their attention15 (e.g., Hey Jennifer, over here!) and when distinguishing them as the intended addressees of an utterance rather than others within earshot. Not surprisingly, these are usually the first two functions listed for vocatives (Leech, 1999; Zwicky). These functions probably account for vocatives being used more frequently in multiparty conversations than in two-party ones (Leech). Similarly, vocatives occur more often in utterances that introduce a change in topic than in those that continue with the same topic (Wilson & Zeitlyn, 1995). What is more interesting however is the third function of vocatives, which has to do with establishing and maintaining social bonds (Leech, 1999; Zwicky, 1974). When addressing someone for this function, the form of address such as a title, name, endearment, nickname, etc., is critical. In American culture, one risks offense by avoiding use of someone’s name because it often means that the name (and by implication the person) has been forgotten (e.g., Fiske, 1978). Indeed, one feels like a callous jerk when
15 Writers vary considerably in the functions they consider as a vocative use and their willingness to use the term vocative in the absence of vocative case marking. Here I will use direct address and vocative for all reference to a speaker’s addressee.
372
Zenzi M. Griffin
greeted by name and unable to reciprocate. In other societies, addressing someone by their name may be an affront, particularly one’s in-laws (Alford, 1988; Kasanga, 2009). Speech communities vary considerably in their preferred forms of address as well as the conventions that dictate their canonical use. Many variables, including the nature of the relationship between interlocutors, determine the form of address.
4.1. Forms of Direct Address
Forms of direct address differ in many ways from referring expressions. Grammatically, they typically occur at the boundaries of an utterance (Leech, 1999; McCormick & Richardson, 2006). Particularly when the speaker already has the addressee’s attention, the form of address does not need to specify the person as clearly as a referring expression would. Address forms are discussed below roughly in order of specificity. Nonspecific terms may be more easily retrieved than more specific ones, because they are likely to be used more frequently (e.g., you vs. man vs. John) and because they require that less information about the addressee be accurately retrieved. However, richer, more meaningful representations of potential referents may support retrieval of more specific terms.
4.1.1. Address Avoidance and Second-Person Pronouns
Because forms of address are so laden with social meaning (Zwicky, 1974), it is often tempting to avoid them altogether to prevent a social blunder. Indeed, address avoidance in English is common when people are unsure about the appropriate form of address (Ervin-Tripp, 1972). For example, one study found that advanced graduate students were more likely than starting graduate students to avoid explicitly addressing faculty by name (Little & Gelles, 1975). Presumably, the advanced students were past the point where they could comfortably call many faculty members by title and surname (e.g., Dr. Peña) as the starting graduate students did, but not at a point where they felt comfortable using first names. Even if one avoids addressing someone by name or title, one is likely to require a second-person pronoun eventually in a conversation, even if it is simply to ask Would you like fries with that? Modern English speakers are lucky in this regard. You is a high-frequency word that works for both individuals and groups. One need know nothing about the addressee to use it.
A speaker will not reveal much about his or her relationship to the addressee by the word you alone (although the degree of politeness in the remaining utterance is likely to provide clues about social distance). That said, ‘‘Hey you!’’ as an attention getter is considered rude and there are constructions that seem to exist just to help speakers avoid such pronouns (Brown & Levinson, 1987).
The situation is far more complex in languages that have multiple second-person pronouns, such as French, Russian, and German, where the form may vary systematically with the degree of intimacy between speaker and addressee as well as their relative status (Brown & Gilman, 1960/1970). Standards for second-person pronoun use changed dramatically in Europe over the twentieth century (e.g., Paulston, 1976). Traditionally, the T-pronoun (tu, ty, du) is informal and used with intimate equals in a casual setting, such as close friends. The V-pronoun (vous, vy, Sie) is formal and required for those of higher status due to wealth, occupation, and age, and in more formal settings. However, even when a speaker is close enough to an addressee to use the T-pronoun, the speaker might switch to the V-pronoun to express respect or contempt (see also Braun, 1988; Ervin-Tripp, 1972). So, while the second-person pronoun in English is insensitive to the subtleties of social interaction, the second-person pronouns of other languages may require speakers to take into account their relationship to the addressee and what they wish to express about that relationship in the utterance. Although this would be habitual for a native speaker, one would expect the selection of a pronoun to be slowed when cues conflict.
4.1.2. Insults and Endearments
When someone cuts you off in traffic, terms of address may readily come to mind. Indeed, curses may not only come to mind, but even be uttered involuntarily. An early writer on aphasia noted how well preserved the ability to swear often was in cases of severe brain damage (Jackson, 1866/1958). All that seems needed to generate many insults is indignation. Tailoring the form of the curse to the properties of the addressee (e.g., gender, race, nationality, language) may be optional. On the other end of the emotional spectrum, endearments may likewise be relatively indifferent to the individual characteristics of the addressee.
So, depending on the speaker’s preference, out may come Sweetie, Love, Honey, Snookums, Honeybunny, etc. Alas, psycholinguistic data on insult and endearment production are lacking, but the particularly strong emotions that are involved are likely to facilitate production relative to other forms of address.
4.1.3. Familiarizers and Fictive Kinship Terms
Forms of address such as pal, buddy, chum, man, dude, comrade, or mate appear to be used primarily with strangers to reduce social distance and express solidarity (Brown & Levinson, 1987). For example, there is a famous song from the Great Depression, ‘‘Brother, can you spare a dime?’’ including a verse with ‘‘Buddy, can you spare a dime?’’ (Harburg & Gorney, 1931). Although some familiarizers appear gender-specific, many are used by both genders to address both genders anyway. Indeed, the features of the addressee seem less relevant to their retrieval than the friendly sentiment that
the speaker wants to convey. A survey of mate usage among Australians suggested that young men and women saw it ‘‘as a friendly term or as a term of endearment, used within a relaxed, informal or casual context’’ to address primarily men but also women (Rendle-Short, 2009, p. 253). Men also were likely to use mate when they had forgotten the addressee’s name. Many also just said they used it ‘‘out of habit,’’ which further suggests that it is easily retrievable. Familiarizers seem relatively specific to speech communities and age cohorts, so it seems unlikely that a particular speaker would experience interference from buddy when retrieving dude. Furthermore, familiarizers may often be components of idiomatic expressions so that the other parts of the expression support their retrieval.
In many societies, fictive kinship terms are frequently used to address people. For example, a Nuer man will typically call an older man outside of his family the equivalent of father and a younger man my son (Evans-Pritchard, 1948/1964). A study of 74 youth from various countries in Africa found that they used fictive kinship terms such as aunt and uncle about 75% of the time when addressing older adults outside of their families (Kasanga, 2009). In the United States, one seems most often to hear bro, brother, sister, and pop. Such terms should be quite easy to retrieve because they only require knowing the gender and relative age of the addressee, and perhaps whether they belong to one’s cultural group. For example, Navajo traditionally use fictive kinship terms corresponding to my aunt for women older than themselves and corresponding to my grandmother for yet older looking Navajo women (Fiske, 1978). When kinship terms are used so often for address (albeit with different addressees), their forms should be quite easy to retrieve.
4.1.4. Occupational Titles and Other Categorizations
Occupational titles such as Doctor and President as well as military rank (e.g., Private, General) are clear reminders of social roles. Use of a title may indicate that the speaker acknowledges the addressee’s role or that the speaker expects the addressee to live up to the expectations associated with the role. Brown and Levinson (1987) noted that aside from greetings and such, former Assistant Attorney General Henry Petersen only addressed President Nixon as Mr. President when broaching very sensitive matters, such as giving bad news, assurances, and suggestions, and asking about touchy subjects like indictments. Occupational titles are meaningful words. The features associated with the concept of doctor (e.g., wearing a stethoscope) can be used to identify people as doctors and retrieve the form of address Doctor. On the other hand, occupations are relational concepts so their conceptual representations are more complex than those for simple objects and this may affect word retrieval. In addition to their meaningfulness, the word forms for occupational titles are often acquired during childhood and may be quite
high in frequency. When used as forms of address, occupational titles may have fewer potential competitors than when used as category labels or for reference. For example, one can address all types of physicians, veterinarians, and dentists as Doctor whereas addressing them by their specialties is not an option (e.g., *Thank you, Dentist). In Jordanian Arabic, anyone who has made the pilgrimage to Mecca may be addressed as hadze or hadzi [pilgrim] (Braun, 1988). Like occupational titles, address forms that are related to simple categorizations should be relatively easy to retrieve. In sum, these categorization-like terms should be much easier to retrieve than personal names but may be more difficult than more general solidarity terms like mate or dude because they are more specific.
4.1.5. Kinship Terms
Forms of Mommy and Daddy are typically among the first 10 words that children learn across cultures (Tardif et al., 2008). Kinship terms are the dominant form of address for 49% of Alford’s (1988) sampled societies. Among English speakers, kinship terms are mostly used to address members of generations older than the speaker (e.g., Dad, Grandma) whereas first names are typically used for members of the same generation or younger, such as one’s siblings and children (Allerton, 1996; Dickey, 1997). In many other cultures, kinship terms extend to a larger group of people, are more specific (i.e., specifying matrilineal or patrilineal descent, or birth order), and are used more frequently than in the United Kingdom and United States. In some communities, speakers occasionally address people using the kinship term that the addressee would normally use to address the speaker (i.e., inverse-kinship terms, bipolar terms). For example, in Kuwaiti Arabic, a father might address his daughter as /yuba/ [my father] when cajoling or placating her (Yassin, 1977). Other inverse-kinship terms in Kuwaiti can be used to express affection, mild rebuke, or condescension.
Such address inversions further emphasize the relationship between the speaker and the addressee by using a term that takes the addressee’s perspective. Braun (1988) noted that address inversion clearly indicates that forms of direct address are not about identifying addressees but rather about emphasizing the relationship between speaker and addressee. With kin, speakers often know an addressee’s name even if it is not appropriate to use it in a specific community or context. So, the availability of multiple forms of address may affect retrieval times. Moreover, as mentioned earlier, multiple related kin terms may become available and compete for selection. That said, the forms for addressing kin (e.g., Mom, Mother), referring to kin (my mom, your mother), or categorizing family relationships (a mom, a mother) are often similar or identical within a language, which, in addition to their early acquisition, should make them easy to retrieve.
4.1.6. Nicknames
Nicknames were the primary form of address in 19% of Alford’s sample. Across cultures, nicknames tend to be based on a person’s physical characteristics, characteristic behaviors, embarrassing episodes, or humorous variations on a person’s given name (e.g., Aceto, 2002; Manning, 1974). In many communities, men receive nicknames more often than women do (Kendall, 1980). Members of the same age cohort are the ones who tend to coin and use nicknames (e.g., Evans-Pritchard, 1948/1964). Drawing attention to personal information is associated either with acceptance and intimacy, as when nicknames are used among friends, or with aggression and hostility, as when they are part of name calling and bullying (see Crozier & Skliopidou, 2002). Nicknames that are intended to be hurtful tend to pick out the most distinctive physical aspect of the person, especially weight (Crozier & Dimmock, 1999). Racial labels and animal names are also common in name calling. Because nicknames are so often descriptive, they should be easier to retrieve than nondescriptive personal names are, but unlike kin terms and occupational titles, they are particular to an individual so they are likely used less frequently. To the extent that they use common words or morphemes (e.g., Shorty, Red), the forms of nicknames should be easier to retrieve than personal names are.
4.1.7. Names and Teknonyms
Names were the primary form of address in 32% of Alford’s (1988) sample. As reviewed earlier, the difficulty of learning and retrieving names depends on variables such as their descriptiveness, meaningfulness, frequency of use, application to multiple people, and perhaps the number of alternative names for the addressee.
Teknonyms such as Abu Ali [father of Ali] may be somewhat more difficult than personal names to retrieve, because they require retrieving the name of the relative whom the speaker is not currently addressing in addition to verbalizing the relationship between the addressee and the relative. On the other hand, in addressing someone as father-of-X, there may be freedom to choose the name of whichever child is easiest to retrieve to form the teknonym (Evans-Pritchard, 1948/1964). Thus, if a person had four children, a speaker may have four potential ways of creating a teknonym. Also, a teknonym may be a better match for how an individual is conceptualized than their personal name is. For example, the father of a friend may be primarily thought of in terms of the friend and paternal relationship, so Abu Ali would be easier to retrieve than a title with a last name such as Mr. Smith.
4.2. Factors Influencing Choice of Address Form
In a large corpus study of vocatives (excluding you) in British and American everyday speech, full or truncated first names (e.g., James or Jimmy) were used in 64% of instances and a title with a surname (Mr. Spock) in a further
2% (Leech, 1999). The remaining uses included familiarizers such as dude (14%), kinship terms (10%), and endearments (5%). In contrast, an analysis of speech in academic settings (which included you) found that the most commonly used vocatives were group terms such as guys, followed closely by second-person pronouns such as you and you guys (McCormick & Richardson, 2006). About 15% of address forms were names, 7% honorifics (sir), 6.5% familiarizers (dude), and 4% endearments (baby). An analysis of an American middle-class family dinner conversation found that children addressed their parents almost equally often with you and kinship terms (Mom, Dad), whereas parents used you consistently with each other as well as the majority of the time in addressing their children (Wilson & Zeitlyn, 1995). The children addressed each other equally often by first name and with you. Naturally, the relative frequency of different forms of address depends heavily on the contexts that speech is drawn from.
In a classic paper, Brown and Ford (1964) described some criteria that affected the choice of address for Americans in the mid-twentieth century. Upon introduction, people of similar status started off addressing each other by title and last name (Mr. Braddock) and then they quickly tended to shift to reciprocal use of first names. Nonreciprocal forms of address are still common when there is a difference in status due to age or occupation between speakers such as student–teacher, employee–employer, and server–customer. The movie The Graduate (1967) included wonderful examples of asymmetrical address. Although they have known them their entire lives, the college-age children address their parents’ friends by title and last name (Mr. or Mrs. Robinson) and in return are addressed by first name. So, the older woman, Mrs. Robinson, addresses her young lover as ‘‘Benjamin,’’ while he persists in calling her ‘‘Mrs. 
Robinson.’’ Increased intimacy typically results in reciprocal use of first names (Brown & Ford), but the continued mismatch in address forms emphasizes the status difference and emotional distance between the two. Strangers or new acquaintances typically have a single form of address. Across cultures, as a relationship grows more intimate, there is often a shift in the form of address used (Befu & Norbeck, 1958; Brown & Ford, 1964; Kasanga, 2009). When there is a difference in status, the higher status person may need to invite the lower status one to use a more familiar form of address. Germans and Swedes even had an informal ceremony to mark the occasion of shifting from the use of formal second-person pronouns to the more familiar pronoun (Brown & Ford; Paulston, 1976). Use of forms that are associated with intimacy is not taken lightly. Imagine that a stranger addressed you by a pet name that only a significant other used. As Beidelman (1974) points out, ‘‘Unwarranted use of a name would thus represent an invasion of a person’s social space, esteem, dignity, privacy, and therefore an abuse’’ (p. 282). Unfortunately, cultural differences in address terms can be difficult to reconcile and may result in discomfort or offense (Bargiela et al., 2002).
As people become closer, the number of forms they use to address one another often increases (Brown & Ford, 1964; Jonz, 1975). Brown and Ford asked 32 male undergraduates at MIT to each list four men of approximately the same age that he had met about one year before and all the different names he used to address them. A greater number of names used to address a man were associated with greater self-disclosure to the addressee. The researchers compared this to other areas of vocabulary where greater importance and interest in a domain leads to finer lexical distinctions within it (e.g., skiers have more terms for snow than nonskiers). In part, this increase in address forms is likely due to interacting with the person under a broader range of circumstances. Having multiple names for an individual provides a rich means of expressing variations in how the person is considered (Brown, 1959/1970). To the extent that contextual or emotional cues selectively activate particular address forms, having more forms available may not be a problem. Further complicating matters, people might use formal terms of address for someone in overt speech but more intimate terms when thinking about the person, particularly as an object of secret affection (Friedrich, 1972). In such cases, it seems particularly likely that covert forms would interfere with retrieving overt ones.
Shifts in vocative forms convey the speaker’s current attitude about the relationship, the addressee, or the message to be conveyed (e.g., Rendle-Short, 2009). A clear example of this is the use of endearments like Sweetie or Love, particularly when providing emotional support. In the United States, parents typically address their children by first name or a truncated version of it. However, they may address their children by their full names (first, middle, and last) when displeased by their behavior (Brown & Ford, 1964). 
Although the use of full names is canonically associated with formal occasions and politeness, it connotes distance when used by chiding parents. Likewise, college students report that they are more likely to use formal terms like Mother and Father rather than Mom and Dad when in conflict with their parents than in casual situations (Lewis, 1965). Several researchers note that speakers shift to more intimate or kinship terms for family members when requesting something from an addressee (Ervin-Tripp, 1972; Kendall, 1980). An anthropologist’s informant expressed the motivation succinctly, ‘‘If you call someone maya [parallel cousin], they have to treat you right’’ (Kendall, p. 266). In contrast, speakers may switch to more polite forms such as title and last name when requesting something of nonkin (Brown & Levinson, 1979; Jonz, 1975). Indeed, use of address forms is an important part of softening threats to the ‘‘face’’ of one’s addressee (Brown & Levinson, 1987). In emergency situations, there is no time for considering the merits of alternative forms of address. Jonz reported that bare titles were particularly likely to be used in the military when under fire. Although the use of first names and diminutives is standardly associated with intimacy and the use of terms like buddy or mate often implies solidarity
and equality, these terms may also be used to connote condescension or hostility (Brown & Ford, 1964; Ervin-Tripp, 1972; Rendle-Short, 2009). So, the form that a vocative takes not only reflects the respective social roles of the speaker and the addressee, but also communicates the speaker’s current attitude about the relationship and the addressee. At present far too little is known about the production processes behind such ironic use of words to speculate about the processing involved (but see Hancock, 2004). Interestingly, name signs are not used for direct address among signers (Schuit, 2009). Second-person pronouns are in the lexicons of signed languages, but I have as yet found no data about variations in politeness for these or other variations in address forms. This summary only scratches the surface of work that has been done on personal address (see, e.g., Braun, 1988; Philipsen & Huspek, 1985). Very little is known about how these considerations come into play in online word production. Clearly, understanding the processes that underlie the production of address terms will require the field to consider speakers’ intentions, emotions, attitudes, and social interactions at a deeper level than it previously has.
5. Conclusion
The study of how speakers retrieve personal names has thus far primarily addressed a small subset of name forms (mostly nondescriptive surnames) in relatively unrepresentative domains of name usage (labeling faces and referring in narration). On the other hand, abundant information is available from anthropology and sociolinguistics about naming systems and the variables that affect the choice of terms for personal reference and address. Given the recent shift to studying language processes in communicative settings and the increase in experimental methods for doing so (for review, see Griffin & Crew, 2010), the field is ripe for beginning to address the production of socially relevant language in communicative contexts. Social psychology can provide information about person perception, but challenges will lie in developing production models that can capture the importance and subtleties of social relationships, communicative intentions, and discourse factors.
ACKNOWLEDGMENTS
Thanks to Tamar Gollan, Brian Ross, and Cognition and Communication Lab members Callan Cooper, Cassandra Jacobs, and Madeline Clark for comments on drafts of this chapter and to the many people who have discussed names with me. I am especially grateful to the people who stimulated my interest in this area by emailing me to ask about the cause of substitution errors for personal names.
REFERENCES
Aceto, M. (2002). Ethnic personal names and multiple identities in Anglophone Caribbean speech communities in Latin America. Language in Society, 31(4), 577–608. Alford, R. D. (1988). Naming and identity: A cross-cultural study of personal naming practice. New Haven, CT: HRAF Press. Allerton, D. J. (1987). The linguistic and sociolinguistic status of proper names: What are they, and who do they belong to? Journal of Pragmatics, 11(1), 61–92. Allerton, D. J. (1996). Proper names and definite descriptions with the same reference: A pragmatic choice for language users. Journal of Pragmatics, 25(5), 621–633. Alwin, D. F., & McCammon, R. J. (2001). Aging, cohorts, and verbal ability. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 56(3), S151–S161. Andersen, S. M., & Berk, M. S. (1998). The social-cognitive model of transference: Experiencing past relationships in the present. Current Directions in Psychological Science, 7(4), 109–115. Arnold, J. E. (2008). Reference production: Production-internal and addressee-oriented processes. Language and Cognitive Processes, 23(4), 495–527. Arnold, J. E., & Griffin, Z. M. (2007). The effect of additional characters on choice of referring expression: Everyone counts. Journal of Memory and Language, 56, 521–536. Bargiela, F., Boz, C., Gokzadze, L., Hamza, A., Mills, S., & Rukhadze, N. (2002). Ethnocentrism, politeness and naming strategies. In: Working Papers on the Web, Vol. 3: Linguistic Politeness and Context. Retrieved December 1, 2009, from http://extra.shu.ac.uk/wpw/politeness/bargiela.htm. Bates, E., D’Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., et al. (2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10(2), 344–380. Befu, H., & Norbeck, E. (1958). Japanese usages of terms of relationship. Southwestern Journal of Anthropology, 14(1), 66–86. Beidelman, T. O. (1974). Kaguru names and naming. 
Journal of Anthropological Research, 30(4), 281–293. Bock, J. K. (1977). The effect of a pragmatic presupposition on syntactic structure in question answering. Journal of Verbal Learning and Verbal Behavior, 16, 723–734. Bonin, P., Chalard, M., Méot, A., & Fayol, M. (2002). The determinants of spoken and written picture naming latencies. British Journal of Psychology, 93(1), 89–114. Bonin, P., Perret, C., Méot, A., Ferrand, L., & Mermillod, M. (2008). Psycholinguistic norms and face naming times for photographs of celebrities in French. Behavior Research Methods, 40(1), 137–146. Braun, F. (1988). Terms of address: Problems of patterns and usage in various languages and cultures. Berlin: Mouton de Gruyter. Brédart, S. (1993). Retrieval failures in face naming. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 351–366). Hillsdale, NJ: Lawrence Erlbaum Associates. Brédart, S. (1996). Person familiarity and name-retrieval failures: How are they related? Cahiers de Psychologie Cognitive/Current Psychology of Cognition, 15(1), 113–120. Brédart, S., & Valentine, T. (1992). From Monroe to Moreau: An analysis of face naming errors. Cognition, 45(3), 187–223. Brédart, S., & Valentine, T. (1998). Descriptiveness and proper name retrieval. Memory, 6(2), 199–206. Brennan, S. E., & Hanna, J. E. (2009). Partner-specific adaptation in dialog. Topics in Cognitive Science, 1(2), 274–291. Brennen, T. (1993). The difficulty with recalling people’s names: The plausible phonology hypothesis. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 409–431). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brennen, T. (2000). On the meaning of personal names: A view from cognitive psychology. Names, 48(2), 139–146. Brennen, T., & Bruce, V. (1991). Context effects in the processing of familiar faces. Psychological Research, 53, 296–304. Brooks, J. O., III, Friedman, L., Gibson, J. M., & Yesavage, J. A. (1993). Spontaneous mnemonic strategies used by older and younger adults to remember proper names. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 393–407). Hillsdale, NJ: Lawrence Erlbaum Associates. Brown, P., & Levinson, S. C. (1979). Social structure, groups and interaction. In H. Giles & K. R. Scherer (Eds.), Social markers in speech (pp. 291–341). Cambridge, UK: Cambridge University Press. Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language usage. Cambridge, UK: Cambridge University Press. Brown, R. (1959/1970). A review of Nabokov’s Lolita. In R. Brown (Ed.), Psycholinguistics: Selected papers by Roger Brown (pp. 370–376). New York, NY: Free Press. Brown, R., & Ford, M. (1964). Address in American English. In D. Hymes (Ed.), Language in culture and society: A reader in linguistics and anthropology (pp. 234–244). New York, NY: Harper & Row. Brown, R., & Gilman, A. (1960/1970). Pronouns of power and solidarity. In R. Brown (Ed.), Psycholinguistics: Selected papers by Roger Brown. New York, NY: Free Press. Brown, R., & McNeill, D. (1966). The ‘‘tip-of-the-tongue’’ phenomenon. Journal of Verbal Learning and Verbal Behavior, 5, 325–337. Bruce, V., Burton, A. M., & Walker, S. (1994). Testing the models? New data and commentary on Stanhope & Cohen (1993). British Journal of Psychology, 85(3), 335–349. Brysbaert, M., & Ghyselinck, M. (2006). The effect of age of acquisition: Partly frequency related, partly frequency independent. Visual Cognition, 13(7), 992–1011. Burgess, C., & Conley, P. (1999). Representing proper names and objects in a common semantic space: A computational model. Brain and Cognition, 40(1), 67–70. Burke, D. 
M., Locantore, J. K., Austin, A. A., & Chae, B. (2004). Cherry pit primes Brad Pitt: Homophone priming effects on young and older adults’ production of proper names. Psychological Science, 15(3), 164–170. Burke, D. M., MacKay, D. G., Worthley, J. S., & Wade, E. (1991). On the tip of the tongue: What causes word finding failures in young and older adults? Journal of Memory and Language, 30(5), 542–579. Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81, 361–380. Caramazza, A., Costa, A., Miozzo, M., & Bi, Y. C. (2001). The specific-word frequency effect: Implications for the representation of homophones in speech production. Journal of Experimental Psychology: Learning Memory and Cognition, 27(6), 1430–1450. Carroll, J. B., & White, M. N. (1973). Word frequency and age of acquisition as determinants of picture-naming latency. Quarterly Journal of Experimental Psychology, 25, 85–95. Carson, D. R., & Burton, A. M. (2001). Semantic priming of person recognition: Categorial priming may be a weaker form of the associative priming effect. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 54(4), 1155–1179. Cassidy, K. W., Kelly, M. H., & Sharoni, L. J. (1999). Inferring gender from name phonology. Journal of Experimental Psychology: General, 128(3), 362–381. Cipolotti, L., McNeil, J. E., & Warrington, E. K. (1993). Spared written naming of proper nouns: A case report. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 289–311). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. Cohen, G. (1990a). Recognition and retrieval of proper names: Age differences in the fan effect. European Journal of Cognitive Psychology, 2(3), 193–204.
Cohen, G. (1990b). Why is it difficult to put names to faces? British Journal of Psychology, 81(3), 287–297. Cohen, G., & Burke, D. M. (1993). Memory for proper names: A review. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 249–263). Hillsdale, NJ: Lawrence Erlbaum Associates. Cohen, G., & Faulkner, D. (1986). Memory for proper names: Age differences in retrieval. British Journal of Developmental Psychology, 4(2), 187–197. Conway, M. A., Cohen, G., & Stanhope, N. (1991). On the very long-term retention of knowledge acquired through formal education: Twelve years of cognitive psychology. Journal of Experimental Psychology: General, 120(4), 395–409. Cree, G. S., McNorgan, C., & McRae, K. (2006). Distinctive features hold a privileged status in the computation of word meaning: Implications for theories of semantic memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32(4), 643–658. Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns). Journal of Experimental Psychology: General, 132(2), 163–201. Crook, T. H., & West, R. L. (1990). Name recall performance across the adult life-span. British Journal of Psychology, 81(3), 335–349. Cross, E. S., & Burke, D. M. (2004). Do alternative names block young and older adults’ retrieval of proper names? Brain and Language, 89(1), 174–181. Crozier, W. R., & Dimmock, P. S. (1999). Name-calling and nicknames in a sample of primary school children. British Journal of Educational Psychology, 69, 505–516. Crozier, W. R., & Skliopidou, E. (2002). Adult recollections of name-calling at school. Educational Psychology, 22(1), 113–124. Cutting, J. C., & Ferreira, V. S. (1999). Semantic and phonological information flow in the production lexicon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 318–344. Dahlgren, D. J. (1998). 
Impact of knowledge and age on tip-of-the-tongue rates. Experimental Aging Research, 24(2), 139–153. Darling, S., & Valentine, T. (2005). The categorical structure of semantic memory for famous people: A new approach using release from proactive interference. Cognition, 96(1), 35–65. Dell, G. S. (1990). Effects of frequency and vocabulary type on phonological speech errors. Language and Cognitive Processes, 5, 313–349. Dell, G. S., & Gordon, J. K. (2003). Neighbors in the lexicon: Friends or foes? In N. O. Schiller & A. S. Meyer (Eds.), Phonetics and phonology in language comprehension and production (pp. 9–38). Berlin: Mouton de Gruyter. Dickey, E. (1997). Forms of address and terms of reference. Journal of Linguistics, 33(2), 255–274. Dorian, N. C. (1970). A substitute name system in the Scottish Highlands. American Anthropologist, 72(2), 303–319. Ertel, S. (1977). Where do the subjects of sentences come from? In S. Rosenberg (Ed.), Sentence production: Developments in research and theory (pp. 141–167). Hillsdale, NJ: Lawrence Erlbaum Associates. Ervin-Tripp, S. M. (1972). Alternation and co-occurrence. In J. J. Gumperz & D. Hymes (Eds.), Directions in sociolinguistics: The ethnography of communication (pp. 218–250). New York, NY: Holt, Rinehart and Winston. Evans-Pritchard, E. E. (1948/1964). Nuer modes of address. In D. Hymes (Ed.), Language in culture and society (pp. 221–227). New York, NY: Harper & Row. Fiske, S. (1978). Rules of address: Navajo women in Los Angeles. Journal of Anthropological Research, 34(1), 72–91.
Personal Names
Fitzsimons, G. M., & Shah, J. Y. (2009). Confusing one instrumental other for another: Goal effects on social categorization. Psychological Science, 20(12), 1468–1472. Flude, B. M., Ellis, A. W., & Kay, J. (1989). Face processing and name retrieval in an anomic aphasic: Names are stored separately from semantic information about familiar people. Brain and Cognition, 11(1), 60–72. Fraas, M., Lockwood, J., Neils-Strunjas, J., Shidler, M., Krikorian, R., & Weiler, E. (2002). ‘What’s his name?’ A comparison of elderly participants’ and undergraduate students’ misnamings. Archives of Gerontology and Geriatrics, 34(2), 155–165. Fogler, K. A., & James, L. E. (2007). Charlie Brown versus Snow White: The effects of descriptiveness on young and older adults’ retrieval of proper names. Journals of Gerontology: Series B: Psychological Sciences and Social Sciences, 62(4), 201–207. Friedrich, P. (1972). Social context and semantic feature: The Russian pronominal usage. In J. Gumperz & D. Hymes (Eds.), Directions in sociolinguistics (pp. 273–300). New York, NY: Holt, Rinehart and Winston. Fromkin, V. A. (1971). The non-anomalous nature of anomalous utterances. Language, 47, 27–52. Gentner, D., & Kurtz, K. (2005). Relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, & P. W. Wolff (Eds.), Categorization inside and outside the lab (pp. 151–175). Washington, DC: American Psychological Association. Ghika-Schmid, F., & Nater, B. (2003). Anomia for people’s names, a restricted form of transient epileptic amnesia. European Journal of Neurology, 10(6), 651–654. Goldrick, M., Folk, J. R., & Rapp, B. (2010). Mrs. Malaprop’s neighborhood: Using word errors to reveal neighborhood structure. Journal of Memory and Language, 62, 113–134. Gollan, T. H., Bonanni, M. P., & Montoya, R. (2005). Proper names get stuck on bilingual and monolingual speakers’ tip of the tongue equally often. Neuropsychology, 19(3), 278–287. Gollan, T. H., & Brown, A. S. (2006). 
From tip-of-the-tongue (TOT) data to theoretical implications in two steps: When more TOTs means better retrieval. Journal of Experimental Psychology: General, 135(3), 462–483. Griffin, Z. M., & Crew, C. (2010). Research in language production. In M. Spivey, M. Joanisse, & K. McRae (Eds.), Cambridge handbook of psycholinguistics. Cambridge, UK: Cambridge University Press (in press). Griffin, Z. M., & Ferreira, V. S. (2006). Properties of spoken language production. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 21–59). London: Elsevier. Griffin, Z. M., & Wangerman, T. (2008). “Lisa, Patty, Selma, Snowball ... Maggie!” Names that parents call their children by mistake. Poster presented at the 5th International Workshop on Language Production, Annapolis, MD. Hancock, J. T. (2004). Verbal irony use in face-to-face and computer-mediated conversations. Journal of Language and Social Psychology, 23(4), 447–463. Hanley, J. R., & Chapman, E. (2008). Partial knowledge in a tip-of-the-tongue state about two- and three-word proper names. Psychonomic Bulletin & Review, 15(1), 156–160. Hanley, J. R., & Kay, J. (1998). Proper name anomia and anomia for the names of people: Functionally dissociable impairments? Cortex, 34(1), 155–158. Harburg, E. Y., & Gorney, J. (1931). Brother, can you spare a dime. (lyrics by Yip Harburg, music by Jay Gorney). Harley, T. A., & Bown, H. E. (1998). What causes a tip-of-the-tongue state? Evidence for lexical neighborhood effects in speech production. British Journal of Psychology, 89, 151–174. Harris, D. M., & Kay, J. (1995). I recognize your face but I can’t remember your name: Is it because names are unique? British Journal of Psychology, 86(3), 345–358.
Hittmair-Delazer, M., Denes, G., Semenza, C., & Mantovan, M. C. (1994). Anomia for people’s names. Neuropsychologia, 32(4), 465–476. Horton, W. S. (2007). The influence of partner-specific memory associations on language production: Evidence from picture naming. Language and Cognitive Processes, 22(7), 1114–1139. Horton, W. S., & Gerrig, R. J. (2005). Conversational common ground and memory processes in language production. Discourse Processes, 40(1), 1–35. Jackson, H. J. (1866/1958). Notes on the physiology and pathology of language. In J. Taylor (Ed.), Selected writings of John Hughlings Jackson. London: Staples Press (originally published in 1866, Vol. 2, pp. 121–128). James, L. E. (2004). Meeting Mr. Farmer versus Meeting a Farmer: Specific effects of aging on learning proper names. Psychology and Aging, 19(3), 515–522. James, L. E., & Fogler, K. A. (2007). Meeting Mr. Davis vs Mr. Davin: Effects of name frequency on learning proper names in young and older adults. Memory, 15(4), 366–374. Jescheniak, J.-D., & Levelt, W. J. M. (1994). Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 824–843. Jescheniak, J. D., Meyer, A. S., & Levelt, W. J. M. (2003). Specific-word frequency is not all that counts in speech production: Comments on Caramazza, Costa, et al. (2001) and new experimental data. Journal of Experimental Psychology: Learning, Memory, & Cognition, 29(3), 432–438. Jonz, J. G. (1975). Situated address in the United States Marine Corps. Anthropological Linguistics, 17(2), 68–77. Kasanga, L. A. (2009). Language socialization: The naming of non-kin adults by African children and preadolescents in intercultural encounters. Intercultural Pragmatics, 6(1), 85–114. Kendall, M. B. (1980). Exegesis and translation: Northern Yuman names as texts. Journal of Anthropological Research, 36(3), 261–273. Keysar, B., Barr, D. J., & Horton, W. S. 
(1998). The egocentric basis of language use: Insights from a processing approach. Current Directions in Psychological Science, 7(2), 46–50. Kourbetis, V., & Hoffmeister, R. J. (2002). Name signs in Greek sign language. American Annals of the Deaf, 147(3), 35–43. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240. Leech, G. (1999). The distribution and function of vocatives in American and British English conversation. In H. Hasselgård & S. Oksefjell (Eds.), Out of corpora: Studies in honour of Stig Johansson (pp. 107–118). Amsterdam: Rodopi. Lele, V. (2009). “It’s not really a nickname, it’s a method”: Local names, state intimates, and kinship register in the Irish Gaeltacht. Journal of Linguistic Anthropology, 19(1), 101–116. Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press. Lévi-Strauss, C. (1966). The savage mind. Chicago, IL: University of Chicago Press. Lewis, L. S. (1965). Terms of address for parents and some clues about social relationships in the American family. The Family Life Coordinator, 14(2), 43–46. Little, C. B., & Gelles, R. J. (1975). The social psychological implications of form of address. Sociometry, 38(4), 573–586. Lloyd-Jones, T. J., & Nettlemill, M. (2007). Sources of error in picture naming under time pressure. Memory & Cognition, 35(4), 816–836. Lucchelli, F., Muggia, S., & Spinnler, H. (1997). Selective proper name anomia: A case involving only contemporary celebrities. Cognitive Neuropsychology, 14(6), 881–900. Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208.
Lupker, S. J. (1979). The semantic nature of response competition in the picture–word interference task. Memory & Cognition, 7, 485–495. MacWhinney, B. (1977). Starting points. Language, 53, 152–168. Manning, F. E. (1974). Nicknames and number plates in the British West Indies. Journal of American Folklore, 87(344), 123–132. Markman, A. B., & Stilwell, C. H. (2001). Role-governed categories. Journal of Experimental & Theoretical Artificial Intelligence, 13, 329–358. McCormick, J., & Richardson, S. (2006). Vocatives in MICASE [Electronic Version]. MICASE Kibbitzers, 12, Retrieved November 18, 2008, from http://micase.elicorpora.info/micase-kibbitzers/12-vocatives-in-micase. McDowell, J. H. (1981). Toward a semiotics of nicknaming the Kamsá example. Journal of American Folklore, 94(371), 1–18. McRae, K., de Sa, V. R., & Seidenberg, M. S. (1997). On the nature and scope of featural representations of word meaning. Journal of Experimental Psychology: General, 126(2), 99–130. McWeeny, K. H., Young, A. W., Hay, D. C., & Ellis, A. W. (1987). Putting names to faces. British Journal of Psychology, 78(2), 143–149. Meyer, A. S., & Belke, E. (2007). Word form retrieval in language production. In M. G. Gaskell (Ed.), Oxford handbook of psycholinguistics (pp. 471–487). Oxford: Oxford University Press. Milders, M. (2000). Naming famous faces and buildings. Cortex, 36(1), 139–145. Milders, M., Deelman, B., & Berg, I. (1998). Rehabilitation of memory for people’s names. Memory, 6(1), 21–36. Miller, G. A., & Johnson-Laird, P. N. (1976). Language and Perception. Cambridge, MA: Harvard University Press. Moore, V., & Valentine, T. (1998). The effect of age of acquisition on speed and accuracy of naming famous faces. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 51(3), 485–513. Morris, P. E., Fritz, C. O., Jackson, L., Nichol, E., & Roberts, E. (2005). Strategies for learning proper names: Expanding retrieval practice, meaning and imagery. 
Applied Cognitive Psychology, 19(6), 779–798. Morrison, C. M., Chappell, T. D., & Ellis, A. W. (1997). Age of acquisition norms for a large set of object names and their relation to adult estimates and other variables. Quarterly Journal of Experimental Psychology A, 50, 528–559. Murphy, G. L. (1988). Personal reference in English. Language in Society, 17(3), 317–349. Needham, R. (1954). The system of teknonyms and death-names of the Penan. Southwestern Journal of Anthropology, 10(4), 416–431. Oldfield, R. C., & Wingfield, A. (1964). The time it takes to name an object. Nature, 202, 1031–1032. Olson, D. R. (1970). Language and thought: Aspects of a cognitive theory of semantics. Psychological Review, 77, 257–273. Paulston, C. B. (1976). Pronouns of address in Swedish: Social class semantics and a changing system. Language in Society, 5(3), 359–386. Pelamatti, G., Pascotto, M., & Semenza, C. (2003). Verbal free recall in high altitude: Proper names vs common names. Cortex, 39(1), 97–103. Philipsen, G., & Huspek, M. (1985). A bibliography of sociolinguistic studies of personal address. Anthropological Linguistics, 27(1), 94–101. Rendle-Short, J. (2009). The address term mate in Australian English: Is it still a masculine term? Australian Journal of Linguistics, 29(2), 245–268. Rymes, B. (1996). Naming as social practice: The case of Little Creeper from Diamond Street. Language in Society, 25(2), 237–260.
Sacks, H., & Schegloff, E. A. (1979). Two preferences in the organization of reference to persons in conversation and their interaction. In G. Psathas (Ed.), Everyday language: Studies in ethnomethodology (pp. 15–21). New York, NY: Halsted (Irvington). Saetti, M. C., Marangolo, P., De Renzi, E., Rinaldi, M. C., & Lattanzi, E. (1999). The nature of the disorder underlying the inability to retrieve proper names. Cortex, 35(5), 675–685. Schuit, J. (2009). What’s in a name sign? Name signs in sign language of the Netherlands (NGT). In A. Ender, M. Matter & F. Tissot (Eds.), Proceedings der 39 Studentischen Tagung Sprachwissenschaft (StuTS) (pp. 21–34). Bern: Universitaet Bern Arbeitspapiere. Semenza, C. (1997). Proper-name-specific aphasias. In H. Goodglass & A. Wingfield (Eds.), Anomia: Neuroanatomical and cognitive correlates (pp. 115–134). San Diego, CA: Academic Press. Semenza, C. (2006). Retrieval pathways for common and proper names. Cortex, 42(6), 884–891. Semenza, C., & Zettin, M. (1989). Evidence from aphasia for the role of proper names as pure referring expressions. Nature, 342(6250), 678–679. Sevald, C. A., & Dell, G. S. (1994). The sequential cuing effect in speech production. Cognition, 53(2), 91–127. Smith, S. W., Noda, H. P., Andrews, S., & Jucker, A. H. (2005). Setting the stage: How speakers prepare listeners for the introduction of referents in dialogues and monologues. Journal of Pragmatics, 37(11), 1865–1895. Snodgrass, J. G., & Yuditsky, T. (1996). Naming times for the Snodgrass and Vanderwart pictures. Behavior Research Methods, Instruments, & Computers, 28, 516–536. Stanhope, N., & Cohen, G. (1993). Retrieval of proper names: Testing the models. British Journal of Psychology, 84(1), 51–65. Stevenage, S. V., & Lewis, H. G. (2005). By which name should I call thee? The consequences of having multiple names. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 58(8), 1447–1461. Stivers, T., Enfield, N. J., & Levinson, S. 
C. (2007). Person reference in interaction. In N. J. Enfield & T. Stivers (Eds.), Person reference in interaction: Linguistic, cultural, and social perspectives (pp. 1–20). Cambridge, UK: Cambridge University Press. Supalla, S. J. (1992). The book of name signs: Naming in American Sign Language. San Diego, CA: Dawn Sign Press. Tardif, T., Fletcher, P., Liang, W. L., Zhang, Z. X., Kaciroti, N., & Marchman, V. A. (2008). Baby’s first 10 words. Developmental Psychology, 44(4), 929–938. Todd, M. G., & Robert, L. G. (2009). How you named your child: Understanding the relationship between individual decision making and collective outcomes. Topics in Cognitive Science, 1(4), 651–674. Tyler, L. K., Moss, H. E., Durrant-Peatfield, M. R., & Levy, J. P. (2000). Conceptual structure and the structure of concepts: A distributed account of category-specific deficits. Brain and Language, 75(2), 195–231. Valentine, T., Brennen, T., & Brédart, S. (1996). The cognitive psychology of proper names: On the importance of being Ernest. London: Routledge. Valentine, T., & Darling, S. (2006). Competitor effects in naming objects and famous faces. European Journal of Cognitive Psychology, 18(5), 686–707. Valentine, T., Hollis, J., & Moore, V. (1999). The nominal competitor effect: When one name is better than two. In M. Hahn & S. C. Stoness (Eds.), Proceedings of the 21st Annual Meeting of the Cognitive Science Society (pp. 749–754). Mahwah, NJ: Lawrence Erlbaum Associates. Valentine, T., & Moore, V. (1995). Naming faces: The effects of facial distinctiveness and surname frequency. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 48(4), 849–878.
Verhaeghen, P. (2003). Aging and vocabulary scores: A meta-analysis. Psychology and Aging, 18, 332–339. Vigliocco, G., Vinson, D. P., Damian, M. F., & Levelt, W. J. M. (2002). Semantic distance effects on object and action naming. Cognition, 85, B61–B69. Vitevitch, M. S. (2002). The influence of phonological similarity neighborhoods on speech production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28(4), 735–747. Vitkovitch, M., Humphreys, G. W., & Lloyd-Jones, T. J. (1993). On naming a giraffe a zebra: Picture naming errors across different object categories. Journal of Experimental Psychology: Learning, Memory, & Cognition, 19(2), 243–259. Warrington, E. K., & Clegg, F. (1993). Selective preservation of place names in an aphasic patient: A short report. In G. Cohen & D. M. Burke (Eds.), Memory for proper names (pp. 281–288). Hillsdale, NJ: Lawrence Erlbaum Associates. Wilson, A. J., & Zeitlyn, D. (1995). The distribution of person-referring expressions in natural conversation. Research on Language and Social Interaction, 28(1), 61–92. Wu, S., & Keysar, B. (2007). The effect of culture on perspective taking. Psychological Science, 18(7), 600–606. Yassin, M. A. F. (1977). Bi-polar terms of address in Kuwaiti Arabic. Bulletin of the School of Oriental and African Studies, University of London, 40(2), 297–301. Yasuda, K., Nakamura, T., & Beckman, B. (2000). Review. Aphasiology, 14(11), 1067–1089. Yau, S.-C. (1996). The weight of tradition in the formation of the name signs of the deaf in China. Diogenes, 44(3), 55–65. Young, A. W., Ellis, A. W., Flude, B. M., McWeeny, K. H., & Hay, D. C. (1986). Face–name interference. Journal of Experimental Psychology: Human Perception and Performance, 12, 466–475. Young, A. W., Flude, B. M., Hellawell, D. J., & Ellis, A. W. (1994). The nature of semantic priming effects in the recognition of familiar people. British Journal of Psychology, 85, 393–411. Young, A. W., Hay, D. C., & Ellis, A. W. (1985). 
The faces that launched a thousand slips: Everyday difficulties and errors in recognizing people. British Journal of Psychology, 76(4), 495–523. Zwicky, A. (1974). Hey, whatsyourname! Chicago Linguistic Society, 10, 787–801.
Subject Index
A Action disorganization syndrome. See Prefrontal cortex (PFC) Adaptive memory domain-specific mnemonic process, potential candidates, 3–4 memory theory and nature’s criterion encoding–retrieval match, 13–14 episodic future thought, 16–18 levels of processing, 14–16 rational analysis, memory, 18–20 stone-age brain, remembering, 20–21 ancestral priorities, survival processing, 23–24 cognitive adaptations, 21–23 domain-specific knowledge systems, 25 mnemonic adaptations, 24 multiple module, 26 shallow perceptual dimensions, 26 survival processing paradigm emotional processing, 8–9 proportion correct recall, words, 6–7 scenarios, 5 special adaptation, 11–12 thematic processing, 9–11 taboo words, 4 temporal context, 2 Age-invariance, 84–85 Aging episodic memory and situation model, 277–279 event segmentation, 279–282 midbrain neuromodulatory systems, 277 prefrontal cortex, 276–277 Alzheimer’s disease (AD) brain changes and cognitive deficits, 283–285 event segmentation, 286–287 symptoms, 282–283 Anterior cingulate cortex (ACC), 258 Argument evaluation, 185, 191–193, 203–204 B Blindness, 54–55 C Category-based induction inference. See also Inductive inference generation cognitive functions, 221
forced-choice/argument-evaluation, 219 induction process, 184 inductive reasoning, 221–222 premise categories, 223 taxonomic relations, 220 Causal inferences, 215–216 Chronic déjà vu, 55–56 Creativity implications, 173–174 retrieving analogies brainstorming, 155–156 incubation/preparedness effects, 157 social factors, 156 Cued-recall method, 233 D Deficient-processing effects homographic repetition, 77 presentation rate, 78 same-sense repetition, 77 testing effect, 117–119 word puzzles, 77 Déjà vu research aging, 56–57 dreams, 57 implicit memory explanation episodic experience, 43–46 gestalt familiarity explanation, 49–51 hypnosis, 51–52 single-element familiarity explanation, 46–49 Jamais vu, 58–59 physiological explanation neural transmission asynchrony, 52–53 surgical elicitation, 53–54 surgical elimination, 53 reincarnation and extra sensory perception, 35 reports, anomalous individuals blindness, 54–55 chronic déjà vu, 55–56 single vs. multiple causes, 57–58 split perception Jacoby and Whitehouse’s design, 37–38 modern cognitive science, 36 peripheral priming possibility, 41 pre-experiment source rating, 41 superficial glance, shallow processing, 42–43 symbols, 38–41
Dialog processing confederation, 308 different perspectives, 306–307 grounding process, 306 process model collaborative view, 311–313 grounding, 311–313 message, 308–310 two-stage, 310–311 transcripts, 304–307 Direct address, spoken language address avoidance and second-person pronouns, 372–373 address form choice, factors, 376–379 familiarizers and fictive kinship terms, 373–374 insults and endearments, 373 kinship terms, 375 names and teknonyms, 376 nicknames, 376 occupational titles, 374–375 social bonds, 371 vocatives, 371 Domain knowledge acquisition implications, 175–176 retrieving analogies complex declarative learning, 158 progressive alignment, 159 salient surface property, 159 social guidance, 159 tradeoff, 158 Dual-process model, 322 E Episodic memory, 277–279 Epistemic uncertainty and approximation demand characteristics, 247 distinction, 246–247 expertise function, 238 vs. novices, 238–239 submarine domain, 238 uncertainty detection and resolution strategies, 240 gestures coding cross-validation, 236 taxonomies, 235 uncertainty speech code, 237 visual–spatial content, 234–235 linguistic pragmatics, 229–231 patterns, 247 psychological uncertainty vs. approximation, 229 qualitative reasoning, 248 spatial reasoning mental simulations, engineering design, 244–246
spatial gestures, 242–244 verbally coded spatial transformations, 241–242 speech coding conversation and interview coding, science data analysis, 232–234 conversation coding, engineering design team, 231–232 types, 229 Error-related negativity (ERN), 266 Event perception aging episodic memory and situation model, 277–279 midbrain neuromodulatory systems, 277 possibilities, 279–282 prefrontal cortex, 276–277 Alzheimer’s disease (AD) brain changes and cognitive deficits, 283–285 symptoms, 282–283 event segmentation theory (EST) behavior and brain function, 259–260 components, 256 event models, 255 event schemata, 257 predictions, 255 temporal dynamics, 258–259 working memory (WM) representation, 261 obsessive-compulsive disorder (OCD) cognitive disturbances, 265–267 neurochemical mechanism, 265 possibilities, 267–269 serotonin and dopamine, 265 Parkinson’s disease (PD) cognitive deficits, 269–270 symptoms, 269 prefrontal cortex (PFC) cognitive deficits, 272–274 prefrontal lesions, 274–275 Schizophrenia cognitive deficits, 263 cognitive dysfunction, 264 neurotransmitter dopamine, 262 Event Segmentation Theory (EST) aging, 279–282 Alzheimer’s Disease (AD), 286–287 behavior and brain function, 259–260 components, 256 event models, 255 event schemata, 257 Obsessive-Compulsive Disorder (OCD), 267–269 overview of, 288 Parkinson’s Disease, 271–272 predictions, 255 prefrontal cortex (PFC), 274–275
Schizophrenia, 263–264 temporal dynamics, 258–259 WM representation, 261 Extrinsic inferences, 215 F Fitness-relevant processing domain-specific mnemonic process, potential candidates, 3–4 survival processing paradigm emotional processing, 8–9 proportion correct recall, words, 6–7 scenarios, 5 special adaptation, 11–12 thematic processing, 9–11 taboo words, 4 G Glenberg surface, 81, 86–87, 90, 103, 107 Grounding model, 311–313 Guide inductive reasoning, 204 H High-familiarity symbols, 38–39 I Implicit memory interpretation episodic experience, 43–46 gestalt familiarity explanation, 49–51 hypnosis, 51–52 single-element familiarity explanation, 46–49 Inductive inference generation categorical induction process, 184 category-based induction cognitive functions, 221 forced-choice/argument-evaluation, 219 inductive reasoning, 221–222 premise categories, 223 taxonomic relations, 220 causal relations, 189–190, 202 contextual relations, 202 extrinsic similarity, 188–189 induction and relations, 187 novel properties, 191 open-ended method, 186–187 premise relations effects argument evaluation, 191–193 coding, 196–198 multiple regressions analyses, 200 privileged taxonomic inferences, 201 relative frequency inferences, 198–199 relative salience of conceptual relations, 193–194 research design and procedure, 195–196 property effects argument evaluation, 203–204
causal inferences, 215–216 coding, 206–207 extrinsic inferences, 215 gene, 210–211 premise pair, 212 relative frequency inferences, 207–208 research design and procedure, 206 salience shared habitat, 213 substance and disease, 208–210 taxonomic inferences, 214–215 salient relations, 191 salient spatiotemporal or causal relations, 195 salient taxonomic relations, 195 taxonomic similarity, 187–188 Intention invariance incidental learning effect, 82–84 intentional-learning, 83 rehearsal borrowing, 81 J Jamais vu, 58–59 L Lag effect, 65, 69–70, 78, 86–87, 101–103, 107 Language processing dialog collaborative view, 311–313 confederation, 308 different perspectives, 306–307 grounding process, 306 message model, 308–310 transcripts, 304–307 two-stage models, 310–311 partner-adapted processing human vs. computer partner interactions, 329–330 joint activation, 328–329 mentalizing vs. mirroring system, 332–333 mirroring system, 325–326 private, social, and communicative intentions, 327–328 processing cues, 334 role of executive control, 330–332 voice cues, 334–335 partner-specific processing addressees adapt utterance, 323–324 global and local adaptations, 316–320 “one-bit” partner models, 324 speakers adapt utterances, 320–323 role of cue, 313–315 Linguistic pragmatics, 229–231 List-strength effects, 68, 107–108 incidental learning and mixed lists, 79–80 SAM/REM model, 104–106, 111 Low-familiarity symbols, 38–41
M
Message model, 308–310 Mirroring systems human “mirror system”, 326 vs. mentalizing, 332–333 N Naïve theories, 216 Neural transmission asynchrony, 52–53 Novel open-ended induction task, 185, 190, 216 Novel symbols, 38–41 O Obsessive-compulsive disorder (OCD) cognitive disturbances, 265–267 event segmentation, 267–269 neurochemical mechanism, 265 serotonin and dopamine, 265 One-bit partner models, 324 P Parkinson’s disease (PD) cognitive deficits, 269–270 event segmentation, 270–271 symptoms, 269 thalamocortical loops, 269 Partner-adapted processing cues hypothesize processing cues, 334 voice cues, 334–335 mentalizing vs. mirroring system, 332–333 mirroring system, 325–326 role of executive control, 330–332 theory of mind (ToM) human vs. computer partner interactions, 329–330 joint activation, 328–329 private, social, and communicative intentions, 327–328 Partner-specific processing addressees adapt utterance, 323–324 global and local adaptations, 316–320 “one-bit” partner models, 324 speakers adapt utterances, 320–323 Personal names contextual cues and implicit memory process, 371 derogatory nicknames, 370 descriptive and meaningful names, 366 family and clan names, 365 kinship terms, 369 nonunique and multiple names descriptive and episode-based nicknames, 367 Gaelic version, first name, 367 name signs, 368
nicknames and license plate numbers, reference, 367 teknonymy, 368 tip-of-the-tongue (TOT) states, 367 psychological research cognitive impairment, patients, 347 cognitive psychology, 347 descriptiveness and meaning, 351–352 features and representational structure, 352–357 individuality, uniqueness, and arbitrariness, 348–349 multiple names, 357–360 name ambiguity, 361–362 name frequency and acquisition age, 360–361 tip-of-the-tongue (TOT) state, 347–348 vocabulary size and age, 362–363 word forms, 349–351 taboo, 369–370 unique names, 369–370 Predator–prey relations, 204 Prefrontal cortex (PFC) aging, 276–277 cognitive deficits, 272–274 event segmentation, 274–275 Premise relations effects argument evaluation, 191–193 coding, 196–198 multiple regressions analyses, 200 premise pair, 212 privileged taxonomic inferences, 201 relative frequency inferences, 198–199 relative salience of conceptual relations, 193–194 research design and procedure, 195–196 salience shared habitat, 213 Problem solving implications, 173–174 retrieving analogies base rates, 154 domain experts, 154 higher quality retrievals, 155 learning phase, 152 mathematics test scores, 154 potential generalization, 153 self-explanation, 153 transfer effects, 153 Process model grounding, 311–313 message, 308–310 two-stage, 310–311 R Real spacing effect, 65–66, 73, 80–81. See also Spacing effect Recency effects, 67–68, 73, 108
Rehearsal-borrowing effect “deep” mnemonics, 71 encoding strategy, 71–72 later memory test, 69–70 “modal model” paper, 69 rehearse-aloud protocols, 69 rote rehearsal and borrowing hypothesis, 73–74 story mnemonic, 71–72 Retrieving analogies ambiguity and contextual variability, 161–162 context specificity, 161–162 creativity brainstorming, 155–156 incubation/preparedness effects, 157 social factors, 156 domain knowledge acquisition complex declarative learning, 158 progressive alignment, 159 salient surface property, 159 social guidance, 159 tradeoff, 158 encoding specificity, 160, 163 exemplars, encode, 160 generic encodings, 165 LISA model, 164 problem solving base rates, 154 domain experts, 154 higher quality retrievals, 155 learning phase, 152 mathematics test scores, 154 potential generalization, 153 self-explanation, 153 transfer effects, 153 retrieval time autobiographical memory, 168–169 controlled memory set studies, 170–171 MAC/FAC simulation modeling, 171–172 role bindings, 164 Reverse spacing effect, 78, 98. See also Spacing effect S Salient spatiotemporal/causal relations, 195 Salient taxonomic relations, 195 Schizophrenia cognitive deficits, 263 event segmentation, 263–264 neurotransmitter dopamine, 262 Situation model, 277–279 Spacing effect age-invariance, 84–85 contextual variability, 87–91 Glenberg surface, 86–87 hybrid accounts, 101–103 intention invariance
incidental learning effect, 82–84 intentional-learning, 83 rehearsal borrowing, 81 pedagogical ecology approach, 137 recognition, spacing benefits final free recall test, 91 incidental-learning condition, 93 semantic and perceptual priming accounts, cued-memory tasks, 95–101 species invariance, 85–86 spotting impostors deficient-processing effects, 77–79 impostor effects and confounds, 80 list-strength effects, 79–80 primacy and recency buffers (see Zero-sum effect) recency effects, 67–68 rehearsal effects and strategy-switching, 68–74 study-phase retrieval account context strength, 107 cued-recall spacing effects, 106 FRAN, model, 108 incidental background stimuli, 108 list-strength effect, 104–106, 109–111 SAM/REM model, 106, 111 U-shaped curve, 108 verbal theory, 106–107 and testing, educational contexts advocacy, 127 distant transfer, 129–130 educational outcome improvement, 127 in-class discussions, 131 individual differences, 134–135 learner improvement, 131–134 remembering and learning, 128 rote memory, 127–128 theories and key phenomena, 103–104 Spatial reasoning mental simulations, engineering design, 244–246 spatial gestures mental transformations, 243 speech segment, 243–244 spatial transformations cued-recall phase, 242 fMRI data analysis, 241 Species invariance, 81, 85–86 Split perception, déjà vu Jacoby and Whitehouse’s design, 37–38 modern cognitive science, 36 peripheral priming possibility, 41 pre-experiment source rating, 41 superficial glance, shallow processing, 42–43 symbols, 38–41 Study-phase retrieval account context strength, 107 cued-recall spacing effects, 106
FRAN, model, 108 incidental background stimuli, 108 list-strength effect, 104–106, 109–111 SAM/REM model, 107, 112 U-shaped curve, 108 verbal theory, 106–107 Survival processing paradigm emotional processing, 8–9 proportion correct recall, words, 6–7 scenarios, 5 special adaptation, 11–12 thematic processing, 9–11 T Taxonomic inferences, 214–215 Testing effect deficient-processing accounts, 117–119 encoding variability, 122–123 forgetting rate, 115 integrated stimuli, 124–125 learning aid, educational practice, 113 pedagogical ecology approach, 137 recitation method, 113 restudy condition, 122 retention interval, 115–117 retrieval effort and desirable-difficulties framework, 121–122 and spacing, educational contexts advocacy, 127 distant transfer, 129–130 educational outcome improvement, 127 in-class discussions, 131
individual differences, 134–135 learner improvement, 131–134 remembering and learning, 128 rote memory, 127–128 transfer-appropriate processing accounts ACT-R theory, 120 free recall test, 119 long retention interval, 119 Theory of mind (ToM) human vs. computer partner interactions, 329–330 joint activation, 328–329 private, social, and communicative intentions, 327–328 Transfer-appropriate processing accounts ACT-R theory, 120 free recall test, 119 long retention interval, 119 U Uncertainty/approximation distinction, 247 V Visual–spatial processing, 243 W Wisconsin card sort test (WCST), 266 Working memory (WM) theory, 263 Z Zero-sum effect, 68, 74–77
CONTENTS OF RECENT VOLUMES
Volume 40 Different Organization of Concepts and Meaning Systems in the Two Cerebral Hemispheres Dahlia W. Zaidel The Causal Status Effect in Categorization: An Overview Woo-kyoung Ahn and Nancy S. Kim Remembering as a Social Process Mary Susan Weldon Neurocognitive Foundations of Human Memory Ken A. Paller Structural Influences on Implicit and Explicit Sequence Learning Tim Curran, Michael D. Smith, Joseph M. DiFranco, and Aaron T. Daggy Recall Processes in Recognition Memory Caren M. Rotello Reward Learning: Reinforcement, Incentives, and Expectations Kent C. Berridge Spatial Diagrams: Key Instruments in the Toolbox for Thought Laura R. Novick Reinforcement and Punishment in the Prisoner’s Dilemma Game Howard Rachlin, Jay Brown, and Forest Baker Index
Volume 41 Categorization and Reasoning in Relation to Culture and Expertise Douglas L. Medin, Norbert Ross, Scott Atran, Russell C. Burnett, and Sergey V. Blok On the Computational Basis of Learning and Cognition: Arguments from LSA Thomas K. Landauer Multimedia Learning Richard E. Mayer Memory Systems and Perceptual Categorization Thomas J. Palmeri and Marci A. Flanery
Conscious Intentions in the Control of Skilled Mental Activity Richard A. Carlson Brain Imaging Autobiographical Memory Martin A. Conway, Christopher W. Pleydell-Pearce, Sharon Whitecross, and Helen Sharpe The Continued Influence of Misinformation in Memory: What Makes Corrections Effective? Colleen M. Seifert Making Sense and Nonsense of Experience: Attributions in Memory and Judgment Colleen M. Kelley and Matthew G. Rhodes Real-World Estimation: Estimation Modes and Seeding Effects Norman R. Brown Index
Volume 42 Memory and Learning in Figure–Ground Perception Mary A. Peterson and Emily Skow-Grant Spatial and Visual Working Memory: A Mental Workspace Robert H. Logie Scene Perception and Memory Marvin M. Chun Spatial Representations and Spatial Updating Ranxiao Frances Wang Selective Visual Attention and Visual Search: Behavioral and Neural Mechanisms Joy J. Geng and Marlene Behrmann Categorizing and Perceiving Objects: Exploring a Continuum of Information Use Philippe G. Schyns From Vision to Action and Action to Vision: A Convergent Route Approach to Vision, Action, and Attention Glyn W. Humphreys and M. Jane Riddoch Eye Movements and Visual Cognitive Suppression David E. Irwin
What Makes Change Blindness Interesting? Daniel J. Simons and Daniel T. Levin Index
Volume 43 Ecological Validity and the Study of Concepts Gregory L. Murphy Social Embodiment Lawrence W. Barsalou, Paula M. Niedenthal, Aron K. Barbey, and Jennifer A. Ruppert The Body’s Contribution to Language Arthur M. Glenberg and Michael P. Kaschak Using Spatial Language Laura A. Carlson In Opposition to Inhibition Colin M. MacLeod, Michael D. Dodd, Erin D. Sheard, Daryl E. Wilson, and Uri Bibi Evolution of Human Cognitive Architecture John Sweller Cognitive Plasticity and Aging Arthur F. Kramer and Sherry L. Willis Index
Volume 44 Goal-Based Accessibility of Entities within Situation Models Mike Rinck and Gordon H. Bower The Immersed Experiencer: Toward an Embodied Theory of Language Comprehension Rolf A. Zwaan Speech Errors and Language Production: Neuropsychological and Connectionist Perspectives Gary S. Dell and Jason M. Sullivan Psycholinguistically Speaking: Some Matters of Meaning, Marking, and Morphing Kathryn Bock Executive Attention, Working Memory Capacity, and a Two-Factor Theory of Cognitive Control Randall W. Engle and Michael J. Kane Relational Perception and Cognition: Implications for Cognitive Architecture and the Perceptual-Cognitive Interface Collin Green and John E. Hummel An Exemplar Model for Perceptual Categorization of Events Koen Lamberts
On the Perception of Consistency Yaakov Kareev Causal Invariance in Reasoning and Learning Steven Sloman and David A. Lagnado Index
Volume 45 Exemplar Models in the Study of Natural Language Concepts Gert Storms Semantic Memory: Some Insights From Feature-Based Connectionist Attractor Networks Ken McRae On the Continuity of Mind: Toward a Dynamical Account of Cognition Michael J. Spivey and Rick Dale Action and Memory Peter Dixon and Scott Glover Self-Generation and Memory Neil W. Mulligan and Jeffrey P. Lozito Aging, Metacognition, and Cognitive Control Christopher Hertzog and John Dunlosky The Psychopharmacology of Memory and Cognition: Promises, Pitfalls, and a Methodological Framework Elliot Hirshman Index
Volume 46 The Role of the Basal Ganglia in Category Learning F. Gregory Ashby and John M. Ennis Knowledge, Development, and Category Learning Brett K. Hayes Concepts as Prototypes James A. Hampton An Analysis of Prospective Memory Richard L. Marsh, Gabriel I. Cook, and Jason L. Hicks Accessing Recent Events Brian McElree SIMPLE: Further Applications of a Local Distinctiveness Model of Memory Ian Neath and Gordon D. A. Brown What is Musical Prosody? Caroline Palmer and Sean Hutchins Index
Volume 47 Relations and Categories Viviana A. Zelizer and Charles Tilly Learning Linguistic Patterns Adele E. Goldberg Understanding the Art of Design: Tools for the Next Edisonian Innovators Kristin L. Wood and Julie S. Linsey Categorizing the Social World: Affect, Motivation, and Self-Regulation Galen V. Bodenhausen, Andrew R. Todd, and Andrew P. Becker Reconsidering the Role of Structure in Vision Elan Barenholtz and Michael J. Tarr Conversation as a Site of Category Learning and Category Use Dale J. Barr and Edmundo Kronmüller Using Classification to Understand the Motivation-Learning Interface W. Todd Maddox, Arthur B. Markman, and Grant C. Baldwin Index
Volume 48 The Strategic Regulation of Memory Accuracy and Informativeness Morris Goldsmith and Asher Koriat Response Bias in Recognition Memory Caren M. Rotello and Neil A. Macmillan What Constitutes a Model of Item-Based Memory Decisions? Ian G. Dobbins and Sanghoon Han Prospective Memory and Metamemory: The Skilled Use of Basic Attentional and Memory Processes Gilles O. Einstein and Mark A. McDaniel Memory is More Than Just Remembering: Strategic Control of Encoding, Accessing Memory, and Making Decisions Aaron S. Benjamin The Adaptive and Strategic Use of Memory by Older Adults: Evaluative Processing and Value-Directed Remembering Alan D. Castel Experience is a Double-Edged Sword: A Computational Model of the Encoding/Retrieval Trade-Off With Familiarity
Lynne M. Reder, Christopher Paynter, Rachel A. Diana, Jiquan Ngiam, and Daniel Dickison Toward an Understanding of Individual Differences In Episodic Memory: Modeling The Dynamics of Recognition Memory Kenneth J. Malmberg Memory as a Fully Integrated Aspect of Skilled and Expert Performance K. Anders Ericsson and Roy W. Roring Index
Volume 49 Short-term Memory: New Data and a Model Stephan Lewandowsky and Simon Farrell Theory and Measurement of Working Memory Capacity Limits Nelson Cowan, Candice C. Morey, Zhijian Chen, Amanda L. Gilchrist, and J. Scott Saults What Goes with What? Development of Perceptual Grouping in Infancy Paul C. Quinn, Ramesh S. Bhatt, and Angela Hayden Co-Constructing Conceptual Domains Through Family Conversations and Activities Maureen Callanan and Araceli Valle The Concrete Substrates of Abstract Rule Use Bradley C. Love, Marc Tomlinson, and Todd M. Gureckis Ambiguity, Accessibility, and a Division of Labor for Communicative Success Victor S. Ferreira Lexical Expertise and Reading Skill Sally Andrews Index
Volume 50 Causal Models: The Representational Infrastructure for Moral Judgment Steven A. Sloman, Philip M. Fernbach, and Scott Ewing Moral Grammar and Intuitive Jurisprudence: A Formal Model of Unconscious Moral and Legal Knowledge John Mikhail Law, Psychology, and Morality Kenworthey Bilz and Janice Nadler
Protected Values and Omission Bias as Deontological Judgments Jonathan Baron and Ilana Ritov Attending to Moral Values Rumen Iliev, Sonya Sachdeva, Daniel M. Bartels, Craig Joseph, Satoru Suzuki, and Douglas L. Medin Noninstrumental Reasoning over Sacred Values: An Indonesian Case Study Jeremy Ginges and Scott Atran Development and Dual Processes in Moral Reasoning: A Fuzzy-trace Theory Approach Valerie F. Reyna and Wanda Casillas Moral Identity, Moral Functioning, and the Development of Moral Character Darcia Narvaez and Daniel K. Lapsley “Fools Rush In”: A JDM Perspective on the Role of Emotions in Decisions, Moral and Otherwise Terry Connolly and David Hardman Motivated Moral Reasoning Peter H. Ditto, David A. Pizarro, and David Tannenbaum In the Mind of the Perceiver: Psychological Implications of Moral Conviction Christopher W. Bauman and Linda J. Skitka Index
Volume 51 Time for Meaning: Electrophysiology Provides Insights into the Dynamics of Representation and Processing in Semantic Memory Kara D. Federmeier and Sarah Laszlo Design for a Working Memory Klaus Oberauer When Emotion Intensifies Memory Interference Mara Mather Mathematical Cognition and the Problem Size Effect Mark H. Ashcraft and Michelle M. Guillaume Highlighting: A Canonical Experiment John K. Kruschke
The Emergence of Intention Attribution in Infancy Amanda L. Woodward, Jessica A. Sommerville, Sarah Gerson, Annette M. E. Henderson, and Jennifer Buresh Reader Participation in the Experience of Narrative Richard J. Gerrig and Matthew E. Jacovina Aging, Self-Regulation, and Learning from Text Elizabeth A. L. Stine-Morrow and Lisa M. S. Miller Toward a Comprehensive Model of Comprehension Danielle S. McNamara and Joe Magliano Index
Volume 52 Naming Artifacts: Patterns and Processes Barbara C. Malt Causal-Based Categorization: A Review Bob Rehder The Influence of Verbal and Nonverbal Processing on Category Learning John Paul Minda and Sarah J. Miles The Many Roads to Prominence: Understanding Emphasis in Conversation Duane G. Watson Defining and Investigating Automaticity in Reading Comprehension Katherine A. Rawson Rethinking Scene Perception: A Multisource Model Helene Intraub Components of Spatial Intelligence Mary Hegarty Toward an Integrative Theory of Hypothesis Generation, Probability Judgment, and Hypothesis Testing Michael Dougherty, Rick Thomas, and Nicholas Lange The Self-Organization of Cognitive Structure James A. Dixon, Damian G. Stephen, Rebecca Boncoddo, and Jason Anastas Index